CN113362830A

CN113362830A - Starting method, control method, system and storage medium of voice assistant

Info

Publication number: CN113362830A
Application number: CN202110698819.XA
Authority: CN
Inventors: 邓晓斌; 肖小斌
Original assignee: Guang'an Yige Electronics Co ltd; Huaying Yingsheng Electronics Co ltd; Shenzhen Shengbida Communication Co ltd
Current assignee: Guang'an Yige Electronics Co ltd; Huaying Yingsheng Electronics Co ltd; Shenzhen Shengbida Communication Co ltd
Priority date: 2021-06-23
Filing date: 2021-06-23
Publication date: 2021-09-07

Abstract

The application relates to the technical field of voice assistants, and in particular to a starting method, a control method, a system and a storage medium for a voice assistant. The starting method comprises the following steps: receiving a voice wake-up signal; judging whether a preset wake-up word exists in the voice wake-up signal and whether the biometric information of the voice wake-up signal is consistent with preset standard biometric information; if so, waking up the voice assistant and entering a voice assistant screen display mode; acquiring a voiceprint signal in the screen display mode to obtain a volume value of the voiceprint signal; judging whether the volume value meets a first preset condition and, if so, acquiring face image information; and judging whether the face image information meets a second preset condition and, if so, starting the voice assistant and entering a voice assistant control mode. This multi-stage recognition and judgment effectively reduces the possibility that the voice assistant is started by mistake.

Description

Starting method, control method, system and storage medium of voice assistant

Technical Field

The present application relates to the field of voice assistant technologies, and in particular, to a method, a system, a terminal, and a storage medium for controlling a voice assistant.

Background

With the rapid development of mobile terminals and internet technologies, most mobile terminals now support voice assistants, which provide functions such as voice control and information query through voice interaction such as intelligent conversation and instant question answering. At present, the voice assistant on a mobile terminal usually has to be woken up by the user, generally by speaking a specific voice wake-up word; for example, the voice wake-up word of the voice assistant Siri is "hey Siri".

However, when a user wakes up a voice assistant in a public place or a noisy environment, the voice assistants of other people are easily woken up by mistake, so the false wake-up rate is high.

Disclosure of Invention

In order to reduce the false wake-up rate of the voice assistant, the application provides a starting method, a control method, a system and a storage medium of the voice assistant.

In a first aspect, the method for starting a voice assistant provided by the present application adopts the following technical solutions:

a starting method of a voice assistant comprises the following steps:

receiving a voice wake-up signal;

judging whether a preset wake-up word exists in the voice wake-up signal and whether the biometric information of the voice wake-up signal is consistent with preset standard biometric information; if so, waking up the voice assistant and entering a voice assistant screen display mode;

acquiring a voiceprint signal in a voice assistant screen display mode to obtain a volume value of the voiceprint signal;

judging whether the volume value meets a first preset condition or not, and if so, acquiring face image information;

and judging whether the face image information meets a second preset condition and, if so, starting the voice assistant and entering a voice assistant control mode.

With this technical solution, whether to wake up the voice assistant is decided by recognizing the wake-up word and comparing the biometric information. After the voice assistant is woken up, the volume value of the voiceprint signal and the face image information are checked against the preset conditions, and the voice assistant is started only when both conditions are met. This multi-stage recognition and judgment effectively reduces the possibility that the voice assistant is started by mistake.

Optionally, the method for setting the standard biometric information includes:

acquiring user voice data;

extracting voiceprint information of the user voice data;

and training on the voiceprint information to obtain the standard biometric information of the user.

Optionally, the determining whether the volume value meets a first preset condition specifically includes:

and if the volume value is within a preset range and the distance of the sound source of the voiceprint signal is smaller than a preset distance threshold value, determining that the volume value meets a first preset condition.

Optionally, the determining whether the face image information meets a second preset condition specifically includes:

and if the face image information is consistent with the preset face standard image information, determining that the face image information meets a second preset condition.

In a second aspect, the present application provides a method for controlling a voice assistant, which adopts the following technical solutions:

a control method of a voice assistant, based on the starting method of the voice assistant described above, comprises the following steps:

receiving a first voice control instruction in a voice assistant control mode, and determining an application and a corresponding operation corresponding to the first voice control instruction;

and controlling the application corresponding to the first voice control instruction to execute corresponding operation through the voice assistant.

Optionally, the method further includes:

and in the voice assistant control mode, if no voice instruction is received within a preset time, automatically closing the voice assistant.

In a third aspect, the present application provides a control system of a voice assistant, which adopts the following technical solutions:

a control system for a voice assistant, comprising:

the wake-up signal receiving module is used for receiving the voice wake-up signal;

the primary judgment module is used for judging whether a preset wake-up word exists in the voice wake-up signal and whether the biometric information of the voice wake-up signal is consistent with preset standard biometric information; if so, waking up the voice assistant and entering a voice assistant screen display mode;

the voiceprint signal acquisition module is used for acquiring a voiceprint signal in the voice assistant screen display mode to obtain a volume value of the voiceprint signal;

the face image information acquisition module is used for judging whether the volume value meets a first preset condition or not, and if so, acquiring face image information;

and the secondary judgment module is used for judging whether the face image information meets a second preset condition and, if so, starting the voice assistant and entering a voice assistant control mode.

In a fourth aspect, the present application provides an intelligent terminal, which adopts the following technical scheme:

an intelligent terminal comprises a memory and a processor, wherein the memory is stored with a computer program which can be loaded by the processor and can execute the control method of the voice assistant.

In a fifth aspect, the present application provides a computer-readable storage medium, which adopts the following technical solutions:

a computer-readable storage medium storing a computer program capable of being loaded by a processor and executing the control method of the voice assistant as described above.

In summary, the present application includes at least one of the following beneficial technical effects:

whether to wake up the voice assistant is decided by recognizing the wake-up word and comparing the biometric information; after the voice assistant is woken up, the volume value of the voiceprint signal and the face image information are checked against the preset conditions, and the voice assistant is started only when the preset conditions are met, so that multi-stage recognition and judgment effectively reduces the possibility that the voice assistant is started by mistake.

Drawings

Fig. 1 is a flow chart of a method for starting a voice assistant according to an embodiment of the present application.

Fig. 2 is a flow chart of a control method of a voice assistant according to an embodiment of the present application.

Fig. 3 is a block diagram illustrating a wake-up system of a voice assistant according to an embodiment of the present application.

Description of reference numerals: 1. wake-up signal receiving module; 2. primary judgment module; 3. voiceprint signal acquisition module; 4. face image information acquisition module; 5. secondary judgment module.

Detailed Description

The present embodiments are only illustrative and not restrictive. After reading this specification, those skilled in the art may modify the embodiments as required without making an inventive contribution. The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present application.

The present application is described in further detail below with reference to the attached drawing figures.

An embodiment of the application discloses a starting method of a voice assistant. Referring to Fig. 1, the starting method of the voice assistant includes the following steps:

S10, acquiring a voice wake-up signal through a microphone of the mobile terminal;

S11, judging whether a preset wake-up word exists in the voice wake-up signal and whether the biometric information of the voice wake-up signal is consistent with preset standard biometric information; if so, waking up the voice assistant and entering a voice assistant screen display mode;

Specifically, the received voice wake-up signal is denoised and converted into text, the text is fed to the wake-up word recognition model as input data, and the model recognizes whether a wake-up word exists in the voice wake-up signal.
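The final check in this step, deciding whether the recognized text contains the wake-up word, can be illustrated with a minimal sketch. It assumes the denoising and speech-to-text stages have already produced a transcript; plain token matching stands in for the trained recognition model, and the function names `normalize` and `contains_wake_word` are hypothetical.

```python
import re

def normalize(text):
    """Lower-case a transcript and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains_wake_word(transcript, wake_word):
    """Return True if the wake-up word's tokens appear contiguously in the transcript.

    Assumption: simple token matching stands in for the wake-up word
    recognition model described in the text.
    """
    words = normalize(transcript)
    target = normalize(wake_word)
    if not target:
        return False
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))
```

Matching whole tokens rather than substrings avoids accepting a phrase such as "they serious" for the wake-up word "hey Siri".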

The wake-up word recognition model generally adopts a GMM-HMM model: a Hidden Markov Model (HMM) characterizes the state transitions between voice units, and a Gaussian Mixture Model (GMM) characterizes the state output probability of each voice unit; together they serve as the wake-up word acoustic model of the voice unit. Taking a phoneme as the voice unit, for example, a triphone unit may be used during modeling to represent each phoneme in its context. During training, a large amount of voice data is first collected and the acoustic features of the corresponding voice units are extracted; the wake-up word acoustic model of each voice unit is then trained using the acoustic features of that voice unit and of its context-dependent voice units.
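How the two parts of a GMM-HMM fit together at scoring time can be sketched as follows. This is an illustrative toy, not the model of the application: the GMM gives each state's output probability for a feature frame, and the HMM forward algorithm combines those with the transition probabilities into a likelihood for the whole frame sequence. Diagonal covariances, log-domain arithmetic, and all function names are assumptions; training is not shown.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def gmm_log_likelihood(x, components):
    """State output log-probability for a list of (weight, mean, var) components."""
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in components])

def forward_log_likelihood(frames, log_init, log_trans, state_gmms):
    """HMM forward algorithm: log P(frames | model), summed over all state paths."""
    n = len(state_gmms)
    alpha = [log_init[s] + gmm_log_likelihood(frames[0], state_gmms[s]) for s in range(n)]
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[p] + log_trans[p][s] for p in range(n)])
            + gmm_log_likelihood(x, state_gmms[s])
            for s in range(n)
        ]
    return logsumexp(alpha)
```

A wake-up word detector built this way would compare the score of the wake-up word model against a background model or a threshold; that decision rule is not specified in the text and is left out here.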

The state output probability of the voice unit can also be represented by a Deep Neural Network (DNN). The structure of the neural network is determined when the acoustic model is constructed, for example a feedforward neural network, a convolutional neural network, a recurrent neural network, or a combination of these; the network generally has 3 to 8 hidden layers, each generally with 2048 nodes. The model is then trained on a large amount of collected voice data to obtain the state output probability of each voice unit, namely the wake-up word recognition model of the voice unit.
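The feedforward variant can be sketched as a forward pass that maps one acoustic feature vector to a probability for each state. The sketch below uses randomly initialized weights and deliberately small layers so that it runs quickly in pure Python; the ReLU activation, the softmax output, the layer sizes in the usage note, and all names are assumptions, and training is not shown.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Randomly initialized weight matrix (n_out rows of n_in) and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def affine(layer, x):
    """Compute W x + b for one layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def build_network(n_features, n_hidden_layers, hidden_size, n_states, seed=0):
    """Stack the hidden layers and one output layer with one node per state."""
    rng = random.Random(seed)
    sizes = [n_features] + [hidden_size] * n_hidden_layers + [n_states]
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def state_posteriors(network, features):
    """Forward pass: ReLU hidden layers followed by a softmax over states."""
    h = features
    for layer in network[:-1]:
        h = [max(0.0, v) for v in affine(layer, h)]
    return softmax(affine(network[-1], h))
```

For example, `build_network(n_features=13, n_hidden_layers=3, hidden_size=16, n_states=5)` gives a three-hidden-layer network; the text's typical hidden width of 2048 would be used in place of 16 in a real model, normally with a numerical library rather than Python lists.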

In addition, the standard biometric information is set as follows: voice data of the user is acquired, voiceprint information is extracted from the voice data, and the voiceprint information is used for training to obtain the standard biometric information of the user.
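Enrollment and the later consistency check can be sketched as follows. The text does not say how the voiceprint is represented or trained, so this sketch assumes each utterance has already been reduced to a fixed-length feature vector, uses the mean of the enrollment vectors as the stored reference, and compares by cosine similarity against a threshold. The function names and the 0.8 threshold are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def enroll(utterance_features):
    """Build the stored reference from several enrollment utterances.

    Assumption: averaging stands in for the training step in the text.
    """
    return mean_vector(utterance_features)

def matches_reference(features, reference, threshold=0.8):
    """Treat the speaker as consistent with the reference above the threshold."""
    return cosine_similarity(features, reference) >= threshold
```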

S12, acquiring a voiceprint signal in a voice assistant screen display mode to obtain a volume value of the voiceprint signal;

The voiceprint signal is collected through a microphone of the mobile terminal and converted into a digital signal by an audio analog-to-digital converter built into the microphone, from which the volume value of the voiceprint signal is obtained.
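One common way to turn digitized samples into a single volume value is the root-mean-square level in decibels relative to full scale (dBFS). The text does not state which measure is used, so the sketch below is an assumption; it also assumes signed 16-bit PCM samples, and the function name `rms_dbfs` is hypothetical.

```python
import math

def rms_dbfs(samples, full_scale=32768):
    """RMS level of PCM samples in dB relative to full scale.

    Assumption: signed 16-bit samples, so full scale is 32768.
    Returns -inf for silence or an empty buffer.
    """
    if not samples:
        return -math.inf
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return -math.inf
    return 10 * math.log10(mean_square / (full_scale ** 2))
```

A buffer whose samples sit at half of full scale yields about -6 dBFS. Converting this to an absolute sound pressure level would require a microphone calibration that the text does not describe.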

S13, judging whether the volume value meets a first preset condition, and if so, acquiring face image information;

Specifically, if the volume value is within a preset range and the distance of the sound source of the voiceprint signal is smaller than a preset distance threshold, the volume value is determined to meet the first preset condition. The distance of the sound source can be determined from an attenuation formula for sound in air; the preset range and the preset distance threshold can be set empirically.

S14, judging whether the face image information meets a second preset condition and, if so, starting the voice assistant and entering a voice assistant control mode.
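The first preset condition can be sketched as below. The text does not give the attenuation formula, so this sketch assumes free-field spherical spreading, in which the level drops by 20·log10(r/r_ref) dB relative to a reference distance, and assumes a known speech level at 1 m. The reference level, the volume range, the distance threshold, and the function names are all hypothetical values chosen for illustration.

```python
def estimate_distance_m(measured_db, reference_db=60.0, reference_distance_m=1.0):
    """Invert L(r) = L_ref - 20*log10(r / r_ref) to estimate source distance.

    Assumption: free-field spherical spreading and a known level at r_ref.
    """
    return reference_distance_m * 10 ** ((reference_db - measured_db) / 20.0)

def meets_first_condition(volume_db, volume_range=(50.0, 80.0),
                          max_distance_m=1.5, reference_db=60.0):
    """True if the volume is within range and the estimated source is close enough."""
    low, high = volume_range
    distance = estimate_distance_m(volume_db, reference_db)
    return low <= volume_db <= high and distance < max_distance_m
```

With these example settings a 60 dB reading maps to 1 m and passes, while a 52 dB reading is within the volume range but maps to roughly 2.5 m and fails. Because a single fixed reference level ties distance directly to volume, a practical system would likely need an independent distance cue; the text leaves this open.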

Specifically, the face image information is obtained through a camera of the mobile terminal, whether the obtained face image information is consistent with the preset face standard image information or not is judged through a face recognition algorithm, and if yes, the face image information is determined to meet a second preset condition.

In the face recognition algorithm, after a face is detected and the key facial feature points are located, the main face region is cropped out, preprocessed, and fed to the back-end recognition algorithm, which compares it with the preset face standard image.
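Steps S10 to S14 form a chain of gates in which each later check runs only if the earlier ones pass. The following sketch shows that control flow with the individual checks passed in as functions. The `Mode` and `StartupChecks` names are hypothetical, and remaining in screen display mode when the volume or face check fails is an assumption, since the text does not state what happens in that case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Mode(Enum):
    ASLEEP = "asleep"
    SCREEN_DISPLAY = "screen display"
    CONTROL = "control"

@dataclass
class StartupChecks:
    """Pluggable checks; each returns True when its condition is met."""
    has_wake_word: Callable[[bytes], bool]
    voiceprint_matches: Callable[[bytes], bool]
    volume_ok: Callable[[bytes], bool]
    face_matches: Callable[[bytes], bool]

def start_assistant(checks, wake_audio, followup_audio, capture_face):
    """Run the S11 to S14 gates and return the mode the assistant ends up in."""
    # S11: wake-up word and voiceprint must both pass to wake the assistant.
    if not (checks.has_wake_word(wake_audio) and checks.voiceprint_matches(wake_audio)):
        return Mode.ASLEEP
    # S12/S13: the follow-up audio must satisfy the first preset condition.
    if not checks.volume_ok(followup_audio):
        return Mode.SCREEN_DISPLAY  # assumption: stay in screen display mode
    # S13: the camera is used only after the volume check has passed.
    face_image = capture_face()
    # S14: the face must satisfy the second preset condition.
    if checks.face_matches(face_image):
        return Mode.CONTROL
    return Mode.SCREEN_DISPLAY  # assumption: stay in screen display mode
```

Passing `capture_face` as a function rather than an image keeps the camera off unless the earlier gates succeed, which mirrors the order of the steps.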

Based on the starting method of the voice assistant, the embodiment of the application discloses a control method of the voice assistant.

Referring to fig. 2, the control method of the voice assistant includes the steps of:

S20, receiving a first voice control instruction in the voice assistant control mode, and determining the application and the operation corresponding to the first voice control instruction;

When the face image information meets the second preset condition, it is determined that the user needs to perform voice input on the mobile terminal, and the mobile terminal enters the voice assistant control mode. Voiceprint information is acquired through the microphone of the mobile terminal and subjected to voice recognition to generate the first voice control instruction, and the corresponding application and operation are determined according to the first voice control instruction.

S21, controlling, through the voice assistant, the application corresponding to the first voice control instruction to execute the corresponding operation.

In addition, in the voice assistant control mode, if no voice instruction is received within a preset time, the voice assistant is automatically closed.
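The control mode can be sketched as a small session object that maps a recognized instruction to an application and operation (S20), runs the operation (S21), and closes itself after a period without instructions. The `ControlSession` class, the command table format, and the 10-second default are hypothetical; speech recognition is assumed to have already produced the instruction text.

```python
import time

class ControlSession:
    """Voice assistant control mode with instruction dispatch and idle timeout."""

    def __init__(self, commands, idle_timeout_s=10.0, clock=time.monotonic):
        self.commands = commands          # instruction text -> (application, operation)
        self.idle_timeout_s = idle_timeout_s
        self.clock = clock                # injectable so the timeout can be tested
        self.last_activity = clock()
        self.active = True

    def tick(self):
        """Close the assistant if no instruction arrived within the timeout."""
        if self.active and self.clock() - self.last_activity > self.idle_timeout_s:
            self.active = False
        return self.active

    def handle(self, instruction_text):
        """S20/S21: resolve the instruction and execute the matching operation."""
        if not self.tick():
            return None                   # assistant already closed
        entry = self.commands.get(instruction_text.strip().lower())
        if entry is None:
            return None                   # unrecognized instruction
        application, operation = entry
        self.last_activity = self.clock()
        return application, operation()
```

A command table such as `{"play music": ("music", start_playback)}` would route the instruction "play music" to a hypothetical `start_playback` function of the music application.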

The embodiment of the present application further discloses a wake-up system of a voice assistant, referring to fig. 3, including:

the wake-up signal receiving module 1 is used for receiving the voice wake-up signal;

the primary judgment module 2 is used for judging whether a preset wake-up word exists in the voice wake-up signal and whether the biometric information of the voice wake-up signal is consistent with preset standard biometric information; if so, waking up the voice assistant and entering a voice assistant screen display mode;

the voiceprint signal acquisition module 3 is used for acquiring the voiceprint signal in a voice assistant screen display mode to obtain the volume value of the voiceprint signal;

the face image information acquisition module 4 is used for judging whether the volume value meets a first preset condition or not, and if so, acquiring face image information;

and the secondary judgment module 5 is used for judging whether the face image information meets a second preset condition and, if so, starting the voice assistant and entering a voice assistant control mode.

An embodiment of the application also discloses an intelligent terminal comprising a memory and a processor, wherein the memory stores a computer program that can be loaded by the processor to execute the starting method and the control method of the voice assistant described above.

Based on the same inventive concept, an embodiment of the present application further discloses a computer-readable storage medium storing a computer program that can be loaded and executed by a processor to implement the steps of the starting method and the control method of the voice assistant.

The computer-readable storage medium includes, for example, various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

It is clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division into functional modules is merely an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules or units is merely a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.

In addition, in this application, each functional unit in each embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.

The above embodiments are intended only to describe the technical solutions of the present application in detail and to help in understanding its method and core idea; they should not be construed as limiting the present application. Those skilled in the art will appreciate that various modifications and substitutions can be made without departing from the scope of the present disclosure.

Claims

1. A method for starting a voice assistant is characterized by comprising the following steps:

receiving a voice wake-up signal;

2. The method for starting up a voice assistant according to claim 1, wherein the method for setting the standard biometric information comprises:

acquiring user voice data;

extracting voiceprint information of the user voice data;

3. The method as claimed in claim 1, wherein the determining whether the volume value satisfies a first predetermined condition specifically comprises:

4. The method for starting a voice assistant according to claim 1, wherein the determining whether the face image information satisfies a second preset condition specifically comprises:

5. A method for controlling a voice assistant, the method being based on the method for starting the voice assistant according to any one of claims 1 to 4, comprising:

6. The method of claim 5, further comprising:

7. A control system for a voice assistant, comprising:

8. An intelligent terminal, characterized by comprising a memory and a processor, the memory storing a computer program which can be loaded by the processor to perform the method according to any one of claims 1 to 6.

9. A computer-readable storage medium, characterized by storing a computer program which can be loaded by a processor to execute the method according to any one of claims 1 to 6.