CN107678793A - Voice assistant startup method and device, terminal, and computer-readable storage medium
- Publication number
- CN107678793A (application number CN201710828831.1A)
- Authority
- CN
- China
- Prior art keywords
- terminal
- voice assistant
- microphone
- user
- ambient sound
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44505—Configuring for program initiating, e.g. using registry, configuration files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
The present invention provides a voice assistant startup method and device, a terminal, and a computer-readable storage medium. The voice assistant startup method is applied to a terminal and includes: detecting whether a preset area of the terminal and a microphone of the terminal are blocked; if the preset area of the terminal is not blocked and the microphone of the terminal is blocked, collecting ambient sound through the microphone; and if the ambient sound meets a preset condition, starting the voice assistant on the terminal and inputting the ambient sound to the voice assistant as a voice operation instruction. The invention can start the voice assistant simply and quickly, avoids cumbersome operations and the embarrassment that a stilted wake-up word causes the user, and makes human-computer interaction more natural and convenient.
Description
Technical field
The present invention relates to the field of intelligent voice technology, and in particular to a voice assistant startup method and device, a terminal, and a computer-readable storage medium.
Background
In the prior art, a user usually has to press a physical button or open the application corresponding to the voice assistant in order to wake the voice assistant, which is cumbersome.
A user can also wake the voice assistant with a specific wake-up word, for example "Hey, Siri". Such a fixed wake-up word is rather stilted and can easily embarrass the user in public, so the human-computer interaction is not natural enough.
Summary of the invention
In view of the above, it is necessary to provide a voice assistant startup method and device, a terminal, and a computer-readable storage medium that can start a voice assistant simply and quickly, avoid cumbersome operations and the embarrassment that a stilted wake-up word causes the user, and make human-computer interaction more natural and convenient.
A voice assistant startup method, the method including:
detecting whether a preset area of a terminal and a microphone of the terminal are blocked;
if the preset area of the terminal is not blocked and the microphone of the terminal is blocked, collecting ambient sound through the microphone; and
if the ambient sound meets a preset condition, starting the voice assistant on the terminal and inputting the ambient sound to the voice assistant as a voice operation instruction.
According to a preferred embodiment of the present invention, the preset area is located at the top of the terminal and the microphone is located at the bottom of the terminal; or
the preset area and the microphone are located at opposite positions on the terminal.
According to a preferred embodiment of the present invention, the method further includes:
judging whether the distance between the terminal and the user meets a first preset condition;
detecting whether the user is speaking; and
if the distance between the terminal and the user meets the first preset condition and the user is speaking, determining that the ambient sound meets the preset condition.
According to a preferred embodiment of the present invention, detecting whether the user is speaking includes:
obtaining a lip image of the user through a camera device of the terminal;
monitoring changes in the lip image; and
if the change in the lip image meets a second preset condition, determining that the user is speaking.
According to a preferred embodiment of the present invention, inputting the ambient sound to the voice assistant as a voice operation instruction includes:
performing noise reduction on the background sound in the ambient sound using dual-microphone noise reduction to obtain the speech in the ambient sound, and inputting the speech to the voice assistant as the voice operation instruction;
the method further includes:
controlling the voice assistant to perform the corresponding function according to the voice operation instruction.
A voice assistant starting device, the device including:
a detection unit, configured to detect whether a preset area of a terminal and a microphone of the terminal are blocked;
a collecting unit, configured to collect ambient sound through the microphone if the preset area of the terminal is not blocked and the microphone of the terminal is blocked; and
a start unit, configured to, if the ambient sound meets a preset condition, start the voice assistant on the terminal and input the ambient sound to the voice assistant as a voice operation instruction.
According to a preferred embodiment of the present invention, the preset area is located at the top of the terminal and the microphone is located at the bottom of the terminal; or
the preset area and the microphone are located at opposite positions on the terminal.
According to a preferred embodiment of the present invention, the device further includes:
a judging unit, configured to judge whether the distance between the terminal and the user meets a first preset condition;
the detection unit, further configured to detect whether the user is speaking; and
a determining unit, configured to determine that the ambient sound meets the preset condition if the distance between the terminal and the user meets the first preset condition and the user is speaking.
According to a preferred embodiment of the present invention, the detection unit is specifically configured to:
obtain a lip image of the user through a camera device of the terminal;
monitor changes in the lip image; and
if the change in the lip image meets a second preset condition, determine that the user is speaking.
According to a preferred embodiment of the present invention, the start unit inputting the ambient sound to the voice assistant as a voice operation instruction includes:
performing noise reduction on the background sound in the ambient sound using dual-microphone noise reduction to obtain the speech in the ambient sound, and inputting the speech to the voice assistant as the voice operation instruction;
the device further includes:
a control unit, configured to control the voice assistant to perform the corresponding function according to the voice operation instruction.
A terminal, the terminal including a processor, where the processor implements the voice assistant startup method when executing a computer program stored in a memory.
A computer-readable storage medium storing a computer program, where the computer program implements the voice assistant startup method when executed by a processor.
As can be seen from the above technical solutions, the present invention detects whether a preset area of a terminal and a microphone of the terminal are blocked; if the preset area of the terminal is not blocked and the microphone of the terminal is blocked, collects ambient sound through the microphone; and if the ambient sound meets a preset condition, starts the voice assistant on the terminal and inputs the ambient sound to the voice assistant as a voice operation instruction. The invention can therefore start the voice assistant simply and quickly, avoids cumbersome operations and the embarrassment that a stilted wake-up word causes the user, and makes human-computer interaction more natural and convenient.
Brief description of the drawings
Fig. 1 is a flowchart of a preferred embodiment of the voice assistant startup method of the present invention.
Fig. 2 is a functional block diagram of a preferred embodiment of the voice assistant starting device of the present invention.
Fig. 3 is a schematic structural diagram of a terminal that implements a preferred embodiment of the voice assistant startup method of the present invention.
Description of main reference numerals
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flowchart of a preferred embodiment of the voice assistant startup method of the present invention. Depending on requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.
The voice assistant startup method is applied to one or more terminals. A terminal is a device that can automatically perform numerical computation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The terminal may be any electronic product capable of human-computer interaction with a user, for example a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet Protocol television (IPTV), a smart wearable device, and the like. The network in which the terminal is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
S10: detect whether the preset area of the terminal and the microphone of the terminal are blocked.
In at least one embodiment of the present invention, the preset area may be located at the top of the terminal and the microphone at the bottom of the terminal; alternatively, the preset area and the microphone are located at opposite positions on the terminal. For example, when the microphone is located at the top of the terminal, the preset area is located at the bottom of the terminal.
In at least one embodiment of the present invention, whether the preset area of the terminal and the microphone of the terminal are blocked can be detected in one or a combination of the following ways:
(1) Detecting, with distance sensors of the terminal, whether the preset area of the terminal and the microphone of the terminal are blocked. For example, one distance sensor is installed near the preset area of the terminal and another near the microphone of the terminal. Each sensor measures its distance to the human body, and the terminal judges whether that distance is less than or equal to a first preset distance, thereby determining whether the preset area and the microphone are blocked. Preferably, if a distance sensor (for example, the one installed near the preset area of the terminal) detects a distance to the human body that is less than or equal to the first preset distance, the position corresponding to that sensor is judged to be blocked; if the detected distance is greater than the first preset distance, the position corresponding to that sensor is judged not to be blocked. For example, if the distance between the human body and the sensor installed near the preset area is greater than the first preset distance, the preset area of the terminal is judged not to be blocked; if the sensor installed near the microphone detects a distance to the human body that is less than or equal to the first preset distance, the microphone of the terminal is judged to be blocked.
(2) Detecting, with light sensors of the terminal, whether the preset area of the terminal and the microphone of the terminal are blocked. For example, one light sensor is installed near the preset area of the terminal and another near the microphone of the terminal. Each sensor detects whether light reaching it is blocked, thereby determining whether the preset area and the microphone are blocked. Preferably, if a light sensor (for example, the one installed near the preset area of the terminal) detects that light is blocked, the position corresponding to that sensor is judged to be blocked; if it detects that no light is blocked, the position corresponding to that sensor is judged not to be blocked. For example, if the light sensor installed near the preset area detects that no light is blocked, the preset area of the terminal is judged not to be blocked; if the light sensor installed near the microphone detects that light is blocked, the microphone of the terminal is judged to be blocked.
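The two detection modes above can be sketched as follows. This is a minimal illustration and not the patented implementation: the threshold value, the function names, and the rule that either sensor alone is enough when both are present are all assumptions, since the text only states that the modes may be used alone or in combination.

```python
from typing import Optional

# Hypothetical value for the "first preset distance"; the source gives no number.
FIRST_PRESET_DISTANCE_CM = 3.0


def blocked_by_distance(distance_cm: float,
                        threshold_cm: float = FIRST_PRESET_DISTANCE_CM) -> bool:
    """Mode (1): blocked when the body is within the first preset distance."""
    return distance_cm <= threshold_cm


def position_blocked(distance_cm: Optional[float] = None,
                     light_blocked: Optional[bool] = None) -> bool:
    """Combine whichever sensor readings are available for one position.

    Assumption: when both sensors are present, either one reporting an
    occlusion is enough to treat the position as blocked.
    """
    votes = []
    if distance_cm is not None:
        votes.append(blocked_by_distance(distance_cm))
    if light_blocked is not None:
        # Mode (2): the light sensor directly reports whether light is blocked.
        votes.append(light_blocked)
    if not votes:
        raise ValueError("at least one sensor reading is required")
    return any(votes)
```

A caller would evaluate `position_blocked` once with the readings taken near the preset area and once with the readings taken near the microphone.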
S11: if the preset area of the terminal is not blocked and the microphone of the terminal is blocked, collect ambient sound through the microphone.
In at least one embodiment of the present invention, when it is detected that the preset area of the terminal is not blocked and the microphone of the terminal is blocked, it can be inferred that someone is close to the microphone of the terminal and may be using the microphone to input speech to the terminal. At this point, ambient sound is collected through the microphone.
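The trigger rule of step S11 reduces to a single boolean test. The sketch below assumes a hypothetical `record` callback that returns audio samples; the source does not describe how recording is performed.

```python
from typing import Callable, Optional, Sequence


def should_collect_ambient_sound(preset_area_blocked: bool,
                                 microphone_blocked: bool) -> bool:
    """S11 trigger: preset area uncovered AND microphone covered."""
    return (not preset_area_blocked) and microphone_blocked


def collect_if_triggered(preset_area_blocked: bool,
                         microphone_blocked: bool,
                         record: Callable[[], Sequence[float]]
                         ) -> Optional[Sequence[float]]:
    """Call the (hypothetical) recorder only when the S11 trigger holds."""
    if should_collect_ambient_sound(preset_area_blocked, microphone_blocked):
        return record()
    return None
```

Requiring the preset area to be uncovered distinguishes a user speaking into the microphone from a terminal that is fully covered, for example in a pocket.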
In at least one embodiment of the present invention, noise reduction is performed on the background sound in the ambient sound using dual-microphone noise reduction to obtain the speech in the ambient sound.
In at least one embodiment of the present invention, dual-microphone noise reduction means that two microphones are installed on the terminal. One is the microphone used during ordinary calls, which collects the voice; the other is arranged at the top of the terminal body and is dedicated to collecting background noise. By applying dual-microphone noise reduction to the background sound in the ambient sound, interference from the noise around the terminal can be effectively suppressed and the clarity of the speech in the ambient sound is greatly improved, so that the speech in the ambient sound can be obtained more reliably.
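The source does not specify the noise-reduction algorithm. As an illustration only, the sketch below uses a deliberately simple one-coefficient canceller: it estimates by least squares how strongly the noise microphone's signal leaks into the voice microphone and subtracts that scaled copy. Real dual-microphone systems typically use adaptive or frequency-domain filters; the function name and the signal model are assumptions.

```python
from typing import List, Sequence


def dual_mic_denoise(primary: Sequence[float],
                     reference: Sequence[float]) -> List[float]:
    """Subtract the noise picked up by the reference microphone.

    primary:   samples from the voice microphone (speech + leaked noise)
    reference: samples from the noise microphone (assumed noise only)
    """
    if len(primary) != len(reference):
        raise ValueError("both channels must have the same number of samples")
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        # Silent reference channel: nothing to cancel.
        return list(primary)
    # Least-squares estimate of the noise leakage gain.
    gain = sum(p * r for p, r in zip(primary, reference)) / ref_energy
    return [p - gain * r for p, r in zip(primary, reference)]
```

When the speech is uncorrelated with the noise, the estimated gain matches the true leakage and the output approximates the speech alone.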
S12: if the ambient sound meets the preset condition, start the voice assistant on the terminal and input the ambient sound to the voice assistant as a voice operation instruction.
In at least one embodiment of the present invention, before starting the voice assistant on the terminal and inputting the ambient sound to the voice assistant as a voice operation instruction, the terminal judges whether the ambient sound meets the preset condition.
In at least one embodiment of the present invention, the terminal judges whether the ambient sound meets the preset condition as follows: it judges whether the distance between the terminal and the user meets a first preset condition, and it detects whether the user is speaking. If the distance between the terminal and the user meets the first preset condition and the user is speaking, the ambient sound is determined to meet the preset condition.
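Step S12 can be sketched as a gate in front of the assistant. The `VoiceAssistant` class below is a hypothetical stand-in used only to make the control flow concrete; it is not an API described in the source.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class VoiceAssistant:
    """Hypothetical stand-in for the terminal's voice assistant."""
    running: bool = False
    received: List[Sequence[float]] = field(default_factory=list)

    def start(self) -> None:
        self.running = True

    def input_instruction(self, sound: Sequence[float]) -> None:
        self.received.append(sound)


def step_s12(assistant: VoiceAssistant,
             ambient_sound: Sequence[float],
             distance_ok: bool,
             user_speaking: bool) -> bool:
    """Start the assistant only when both sub-conditions hold."""
    # Preset condition: first preset condition (distance) AND user speaking.
    if not (distance_ok and user_speaking):
        return False
    assistant.start()
    assistant.input_instruction(ambient_sound)
    return True
```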
In at least one embodiment of the present invention, whether the distance between the terminal and the user meets the first preset condition can be judged in one or a combination of the following ways:
(1) Judging, with a distance sensor of the terminal, whether the distance between the terminal and the user meets the first preset condition.
For example, a distance sensor is installed on the terminal and measures the distance between the terminal and the user. When the distance between the terminal and the user is less than or equal to a second preset distance, the distance between the terminal and the user is judged to meet the first preset condition.
It should be noted that the second preset distance may be 50 cm, 100 cm, and so on. The second preset distance may be configured at the factory by a person skilled in the art based on experience, or customized by the user of the terminal according to usage habits; the present invention does not limit this.
(2) Judging, with a light sensor of the terminal, whether the distance between the terminal and the user meets the first preset condition.
For example, a light sensor is installed on the terminal. When the light sensor detects that light reaching the terminal is blocked, the distance between the terminal and the user is judged to meet the first preset condition.
(3) Judging, with a temperature sensor of the terminal, whether the distance between the terminal and the user meets the first preset condition.
For example, a temperature sensor is installed on the terminal. When someone approaches the terminal, the temperature sensor detects a change in the temperature of the terminal, from which it is judged that the distance between the terminal and the user meets the first preset condition.
(4) Judging, with a camera device of the terminal, whether the distance between the terminal and the user meets the first preset condition.
For example, the terminal is provided with a camera device, which captures a facial image of the user. If, in the facial image, the distance between the user's eyes is greater than or equal to a preset value, the distance between the terminal and the user is judged to meet the first preset condition.
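Modes (1) and (4) above lend themselves to a short sketch. The 50 cm limit is one of the example values given in the text; the pixel threshold for eye spacing and the eye coordinates are hypothetical, and locating the eyes in the image is assumed to be done by some face detector outside this sketch.

```python
import math
from typing import Tuple

SECOND_PRESET_DISTANCE_CM = 50.0  # example value from the text
MIN_EYE_SPACING_PX = 120.0        # hypothetical preset value

Point = Tuple[float, float]


def near_by_distance_sensor(distance_cm: float,
                            limit_cm: float = SECOND_PRESET_DISTANCE_CM) -> bool:
    """Mode (1): the user is within the second preset distance."""
    return distance_cm <= limit_cm


def eye_spacing_px(left_eye: Point, right_eye: Point) -> float:
    """Euclidean distance between the two eye centres in the image."""
    return math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])


def near_by_camera(left_eye: Point, right_eye: Point,
                   min_spacing_px: float = MIN_EYE_SPACING_PX) -> bool:
    """Mode (4): a closer face appears larger, so the eyes are further apart."""
    return eye_spacing_px(left_eye, right_eye) >= min_spacing_px
```

Mode (4) works because the apparent spacing of the eyes in the image grows as the face approaches the camera, so a minimum spacing acts as a maximum distance.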
In at least one embodiment of the present invention, detecting whether the user is speaking includes: obtaining a lip image of the user through the camera device of the terminal; monitoring changes in the lip image; and, if the change in the lip image meets a second preset condition, determining that the user is speaking.
In at least one embodiment of the present invention, the change in the lip image meets the second preset condition in one or a combination of the following cases:
(1) The change frequency of the lip image is greater than or equal to a preset frequency.
For example, if the terminal detects that the change frequency of the lip image is 20 Hz, which is greater than the preset frequency of 15 Hz, the terminal determines that the change in the lip image meets the second preset condition.
It should be noted that the preset frequency may be configured at the factory by a person skilled in the art based on experience, or customized by the user of the terminal according to usage habits; the present invention does not limit this.
(2) The number of times the lip image changes within a preset duration is greater than or equal to a preset count.
For example, if the terminal detects that the lip image changes at least 5 times (the preset count) within 1 second (the preset duration), the terminal determines that the change in the lip image meets the second preset condition.
It should be noted that, likewise, the preset duration and the preset count may be configured at the factory by a person skilled in the art based on experience, or customized by the user of the terminal according to usage habits; the present invention does not limit this.
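The two cases above can be sketched over a list of timestamps at which a lip change was observed. How a "change" is extracted from successive lip images is not described in the source, so the sorted timestamp list is an assumed input; the 15 Hz, 1 second, and 5 changes figures are the example values from the text.

```python
from typing import Sequence

PRESET_FREQUENCY_HZ = 15.0  # example value from the text
PRESET_DURATION_S = 1.0     # example value from the text
PRESET_COUNT = 5            # example value from the text


def change_frequency_hz(change_times: Sequence[float]) -> float:
    """Average lip changes per second over the observed span."""
    if len(change_times) < 2:
        return 0.0
    span = change_times[-1] - change_times[0]
    return (len(change_times) - 1) / span if span > 0 else 0.0


def max_changes_in_window(change_times: Sequence[float],
                          window_s: float = PRESET_DURATION_S) -> int:
    """Largest number of changes falling inside any window of window_s seconds.

    change_times must be sorted in ascending order.
    """
    best = 0
    start = 0
    for end, t in enumerate(change_times):
        # Shrink the window from the left until it spans at most window_s.
        while t - change_times[start] > window_s:
            start += 1
        best = max(best, end - start + 1)
    return best


def lip_change_meets_second_condition(change_times: Sequence[float]) -> bool:
    """Case (1) OR case (2); the text allows either or a combination."""
    return (change_frequency_hz(change_times) >= PRESET_FREQUENCY_HZ
            or max_changes_in_window(change_times) >= PRESET_COUNT)
```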
In at least one embodiment of the present invention, inputting the ambient sound to the voice assistant as a voice operation instruction includes: performing noise reduction on the background sound in the ambient sound using dual-microphone noise reduction to obtain the speech in the ambient sound, and inputting the speech to the voice assistant as the voice operation instruction.
In at least one embodiment of the present invention, after obtaining the speech in the ambient sound, the terminal inputs the speech to the voice assistant as the voice operation instruction and controls the voice assistant to perform the corresponding function according to the voice operation instruction.
For example, when the terminal performs dual-microphone noise reduction on the background sound in the ambient sound and obtains the speech "Help me check today's weather", the terminal takes "Help me check today's weather" as the voice operation instruction, the voice assistant is started, and the terminal controls the voice assistant to execute the instruction "query weather".
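The mapping from a recognised instruction to a function can be illustrated with a keyword table. This is an assumption-laden sketch: the source does not say how speech is recognised or how instructions are matched, so the transcript string, the keyword table, and the handler are all hypothetical.

```python
from typing import Callable, Dict, Optional


def query_weather() -> str:
    """Hypothetical handler standing in for the weather function."""
    return "query weather"


# Hypothetical keyword-to-handler table; a real assistant would use
# speech recognition and intent parsing instead of substring matching.
INTENT_KEYWORDS: Dict[str, Callable[[], str]] = {
    "weather": query_weather,
}


def execute_voice_instruction(transcript: str) -> Optional[str]:
    """Run the first handler whose keyword appears in the transcript."""
    text = transcript.lower()
    for keyword, handler in INTENT_KEYWORDS.items():
        if keyword in text:
            return handler()
    return None
```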
In summary, the present invention detects whether the preset area of the terminal and the microphone of the terminal are blocked; if the preset area of the terminal is not blocked and the microphone of the terminal is blocked, collects ambient sound through the microphone; and if the ambient sound meets the preset condition, starts the voice assistant on the terminal and inputs the ambient sound to the voice assistant as a voice operation instruction. The invention can therefore start the voice assistant simply and quickly, avoids cumbersome operations and the embarrassment that a stilted wake-up word causes the user, and makes human-computer interaction more natural and convenient.
Fig. 2 is a functional block diagram of a preferred embodiment of the voice assistant starting device of the present invention. The voice assistant starting device 11 includes a detection unit 100, a collecting unit 101, a start unit 102, a judging unit 103, a determining unit 104, and a control unit 105. The function of each unit is described in detail below.
The detection unit 100 detects whether the preset area of the terminal and the microphone of the terminal are blocked.
In at least one embodiment of the present invention, the preset area may be located at the top of the terminal and the microphone at the bottom of the terminal; alternatively, the preset area and the microphone are located at opposite positions on the terminal. For example, when the microphone is located at the top of the terminal, the preset area is located at the bottom of the terminal.
In at least one embodiment of the present invention, the detection unit 100 can detect whether the preset area of the terminal and the microphone of the terminal are blocked in one or a combination of the following ways:
(1) Detecting, with distance sensors of the terminal, whether the preset area of the terminal and the microphone of the terminal are blocked. For example, one distance sensor is installed near the preset area of the terminal and another near the microphone of the terminal. Each sensor measures its distance to the human body, and the detection unit judges whether that distance is less than or equal to a first preset distance, thereby determining whether the preset area and the microphone are blocked. Preferably, if a distance sensor (for example, the one installed near the preset area of the terminal) detects a distance to the human body that is less than or equal to the first preset distance, the position corresponding to that sensor is judged to be blocked; if the detected distance is greater than the first preset distance, the position corresponding to that sensor is judged not to be blocked. For example, if the distance between the human body and the sensor installed near the preset area is greater than the first preset distance, the preset area of the terminal is judged not to be blocked; if the sensor installed near the microphone detects a distance to the human body that is less than or equal to the first preset distance, the microphone of the terminal is judged to be blocked.
(2) Detecting, with light sensors of the terminal, whether the preset area of the terminal and the microphone of the terminal are blocked. For example, one light sensor is installed near the preset area of the terminal and another near the microphone of the terminal. Each sensor detects whether light reaching it is blocked, thereby determining whether the preset area and the microphone are blocked. Preferably, if a light sensor (for example, the one installed near the preset area of the terminal) detects that light is blocked, the position corresponding to that sensor is judged to be blocked; if it detects that no light is blocked, the position corresponding to that sensor is judged not to be blocked. For example, if the light sensor installed near the preset area detects that no light is blocked, the preset area of the terminal is judged not to be blocked; if the light sensor installed near the microphone detects that light is blocked, the microphone of the terminal is judged to be blocked.
If the preset area of the terminal is not blocked and the microphone of the terminal is blocked, the collecting unit 101 collects ambient sound through the microphone.
In at least one embodiment of the present invention, when the detection unit 100 detects that the preset area of the terminal is not blocked and the microphone of the terminal is blocked, it can be inferred that someone is close to the microphone of the terminal and may be using the microphone to input speech to the terminal. At this point, the collecting unit 101 collects ambient sound through the microphone.
In at least one embodiment of the present invention, the collecting unit 101 performs noise reduction on the background sound in the ambient sound using dual-microphone noise reduction to obtain the speech in the ambient sound.
In at least one embodiment of the present invention, dual-microphone noise reduction means that two microphones are installed on the terminal. One is the microphone used during ordinary calls, which collects the voice; the other is arranged at the top of the terminal body and is dedicated to collecting background noise. By applying dual-microphone noise reduction to the background sound in the ambient sound, the collecting unit 101 can effectively suppress interference from the noise around the terminal and greatly improve the clarity of the speech in the ambient sound, so that the speech in the ambient sound can be obtained more reliably.
If the ambient sound meets the preset condition, the start unit 102 starts the voice assistant on the terminal and inputs the ambient sound to the voice assistant as a voice operation instruction.
In at least one embodiment of the present invention, before the start unit 102 starts the voice assistant on the terminal and inputs the ambient sound to the voice assistant as a voice operation instruction, the judging unit 103 judges whether the ambient sound meets the preset condition.
In at least one embodiment of the present invention, the judging unit 103 judges whether the ambient sound meets the preset condition as follows: the judging unit 103 judges whether the distance between the terminal and the user meets the first preset condition, and the detection unit 100 detects whether the user is speaking. If the distance between the terminal and the user meets the first preset condition and the user is speaking, the determining unit 104 determines that the ambient sound meets the preset condition.
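The cooperation of the units described so far can be sketched as a small composition of callables. The class and field names are hypothetical, each callable stands in for a unit whose internals are described elsewhere in the text, and the control unit 105 is left out of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class VoiceAssistantStartingDevice:
    """Hypothetical wiring of units 100 to 104 as plain callables."""
    # Detection unit 100: returns (preset_area_blocked, microphone_blocked).
    detect_occlusion: Callable[[], Tuple[bool, bool]]
    # Collecting unit 101: returns the collected ambient sound samples.
    collect_sound: Callable[[], Sequence[float]]
    # Judging unit 103: does the distance meet the first preset condition?
    distance_ok: Callable[[], bool]
    # Detection unit 100 (second role): is the user speaking?
    user_speaking: Callable[[], bool]
    # Start unit 102: starts the assistant and feeds it the sound.
    start_assistant: Callable[[Sequence[float]], None]

    def run_once(self) -> bool:
        """One pass through the flow; returns True if the assistant started."""
        area_blocked, mic_blocked = self.detect_occlusion()
        if area_blocked or not mic_blocked:
            return False
        sound = self.collect_sound()
        # Determining unit 104: both sub-conditions must hold.
        if not (self.distance_ok() and self.user_speaking()):
            return False
        self.start_assistant(sound)
        return True
```

Passing the units in as callables keeps the sketch independent of any particular sensor or audio API.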
In at least one embodiment of the present invention, the judging unit 103 can judge whether the distance between the terminal and the user meets the first preset condition in one or a combination of the following ways:
(1) Judging, with a distance sensor of the terminal, whether the distance between the terminal and the user meets the first preset condition.
For example, a distance sensor is installed on the terminal and measures the distance between the terminal and the user. When the distance between the terminal and the user is less than or equal to a second preset distance, the judging unit 103 judges that the distance between the terminal and the user meets the first preset condition.
It should be noted that the second preset distance may be 50 cm, 100 cm, and so on. The second preset distance may be configured at the factory by a person skilled in the art based on experience, or customized by the user of the terminal according to usage habits; the present invention does not limit this.
(2) judge whether the distance between the terminal and the user meet by the light sensor of the terminal
One preparatory condition.
Such as:Light sensor is installed in the terminal, when detecting the terminal by the light sensor
When light is blocked, then the judging unit 103 judges that the distance between the terminal and the user meet the first default bar
Part.
(3) judge whether the distance between the terminal and the user meet by the temperature sensor of the terminal
One preparatory condition.
Such as:Temperature sensor is installed in the terminal, when someone is close to the terminal, the temperature sensor will
Detect the change of the temperature of the terminal, the judging unit 103 so as to judge between the terminal and the user away from
From meeting the first preparatory condition.
(4) judge whether the distance between the terminal and the user meet first by the camera device of the terminal
Preparatory condition.
Such as:The terminal is provided with camera device, and the face-image of the user is gathered by the camera device.If
In the face-image, the distance between eyes of the user are more than or equal to preset value, then the judging unit 103 is sentenced
Disconnected the distance between the terminal and the user meet the first preparatory condition.
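The four ways above can be sketched as a single check over whatever sensor readings are available. This is only an illustration under stated assumptions: the `SensorReadings` type, the field names, the temperature and eye-distance thresholds, and the rule that every available reading must agree are all hypothetical; the source gives only the 50 cm and 100 cm examples for the second preset distance and leaves the combination rule open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SensorReadings:
    """Readings that may be available; None means the sensor was not used."""
    distance_cm: Optional[float] = None           # (1) distance sensor
    light_blocked: Optional[bool] = None          # (2) light sensor
    temperature_change_c: Optional[float] = None  # (3) temperature sensor
    eye_distance_px: Optional[float] = None       # (4) eye spacing in the camera image


def meets_first_preset_condition(
    readings: SensorReadings,
    second_preset_distance_cm: float = 50.0,  # example value from the text
    min_temperature_change_c: float = 0.5,    # hypothetical threshold
    min_eye_distance_px: float = 120.0,       # hypothetical preset value
) -> bool:
    """Return True if every available reading indicates the user is close."""
    checks = []
    if readings.distance_cm is not None:
        checks.append(readings.distance_cm <= second_preset_distance_cm)
    if readings.light_blocked is not None:
        checks.append(readings.light_blocked)
    if readings.temperature_change_c is not None:
        checks.append(abs(readings.temperature_change_c) >= min_temperature_change_c)
    if readings.eye_distance_px is not None:
        checks.append(readings.eye_distance_px >= min_eye_distance_px)
    # With no readings at all there is no evidence, so the condition is not met.
    return bool(checks) and all(checks)
```

Requiring all available readings to agree is a conservative choice that reduces accidental starts; an implementation could equally accept any single reading.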
In at least one embodiment of the present invention, the detection unit 100 detecting whether the user is speaking includes: obtaining a lip image of the user through the camera device of the terminal; monitoring changes in the lip image; and, if the change in the lip image meets a second preset condition, the determining unit 104 determining that the user is speaking.
In at least one embodiment of the present invention, the change in the lip image meeting the second preset condition includes one of the following situations or a combination of several of them:
(1) The change frequency of the lip image is greater than or equal to a preset frequency.
For example: the detection unit 100 detects that the change frequency of the lip image is 20 Hz, which is greater than the preset frequency of 15 Hz, so the determining unit 104 determines that the change in the lip image meets the second preset condition.
It should be noted that the preset frequency may be configured at the factory by the relevant technical personnel based on experience, or customized by the user of the terminal according to usage habits; the present invention does not limit this.
(2) The number of times the lip image changes within a preset duration is greater than or equal to a preset number of times.
For example: when the detection unit 100 detects that the number of changes of the lip image within a preset duration of 1 second is greater than or equal to the preset number of 5, the determining unit 104 determines that the change in the lip image meets the second preset condition.
It should be noted that, similarly, the preset duration and the preset number of times may be configured at the factory by the relevant technical personnel based on experience, or customized by the user of the terminal according to usage habits; the present invention does not limit this.
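The two situations above can be sketched as follows, assuming that lip-image changes have already been detected and are available as a sorted list of timestamps in seconds. How a change is detected from camera frames is outside this sketch. The function names are hypothetical; the defaults of 15 Hz, 1 second and 5 times are the example values from the text, and treating either situation as sufficient is an assumption.

```python
def change_frequency_hz(change_times):
    """Average rate of lip-image changes over the observed span, in Hz."""
    if len(change_times) < 2:
        return 0.0
    span = change_times[-1] - change_times[0]
    return (len(change_times) - 1) / span if span > 0 else 0.0


def changes_within(change_times, now, duration_s):
    """Count changes that fall in the interval (now - duration_s, now]."""
    return sum(1 for t in change_times if now - duration_s < t <= now)


def lips_indicate_speech(
    change_times,
    now,
    preset_frequency_hz=15.0,
    preset_duration_s=1.0,
    preset_count=5,
):
    """Return True if either situation (1) or situation (2) holds."""
    by_frequency = change_frequency_hz(change_times) >= preset_frequency_hz
    by_count = changes_within(change_times, now, preset_duration_s) >= preset_count
    return by_frequency or by_count
```

For instance, changes every 0.05 s give a rate of about 20 Hz and satisfy situation (1), while five changes spread over the last second satisfy situation (2) even though their rate is well below 15 Hz.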
In at least one embodiment of the present invention, inputting the ambient sound to the voice assistant as a voice operation instruction includes: performing noise reduction on the background sound in the ambient sound by dual-microphone noise reduction to obtain the speech in the ambient sound, and inputting the speech to the voice assistant as the voice operation instruction.
In at least one embodiment of the present invention, after the speech in the ambient sound is obtained, the control unit 105 inputs the speech to the voice assistant as the voice operation instruction, and controls the voice assistant to perform the corresponding function according to the voice operation instruction.
For example: when the terminal performs noise reduction on the background sound in the ambient sound by dual-microphone noise reduction and obtains the speech "help me check today's weather" from the ambient sound, the terminal takes "help me check today's weather" as the voice operation instruction, the voice assistant is started, and the control unit 105 can control the voice assistant to execute the instruction "query weather".
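The mapping from the spoken instruction to the function the assistant performs can be illustrated with a simple keyword lookup. This sketch assumes speech recognition has already turned the speech into text, which the source does not describe; the table contents and the names `KEYWORD_TO_COMMAND` and `resolve_command` are hypothetical.

```python
# Hypothetical keyword table; a real assistant would use its own
# language-understanding component rather than substring matching.
KEYWORD_TO_COMMAND = {
    "weather": "query_weather",
    "alarm": "set_alarm",
}


def resolve_command(utterance, table=None):
    """Return the command for the first keyword found in the utterance, or None."""
    lookup = KEYWORD_TO_COMMAND if table is None else table
    text = utterance.lower()
    for keyword, command in lookup.items():
        if keyword in text:
            return command
    return None
```

With this table, the example utterance "help me check today's weather" resolves to the "query_weather" command, which corresponds to the "query weather" instruction in the text.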
In summary, the present invention can detect whether the preset area of the terminal and the microphone of the terminal are blocked; if the preset area of the terminal is not blocked and the microphone of the terminal is blocked, collect ambient sound through the microphone; and if the ambient sound meets the preset condition, start the voice assistant in the terminal and input the ambient sound to the voice assistant as a voice operation instruction. The present invention can therefore start the voice assistant simply and quickly, avoiding cumbersome operating procedures and the awkwardness that rigid wake-up words cause users, and making human-computer interaction more natural and easier.
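The overall flow summarized above can be sketched as one function. The sensor and assistant operations are passed in as callables because the source does not define their interfaces; all names here are hypothetical.

```python
def try_start_voice_assistant(
    preset_area_blocked,
    microphone_blocked,
    capture_sound,
    meets_preset_condition,
    start_assistant,
):
    """Run the three steps of the method; return True if the assistant was started.

    capture_sound:          callable returning the collected ambient sound
    meets_preset_condition: callable taking the sound and returning a bool
    start_assistant:        callable taking the sound as the voice instruction
    """
    # Step 1: proceed only if the preset area is clear and the microphone is covered.
    if preset_area_blocked or not microphone_blocked:
        return False
    # Step 2: collect ambient sound through the microphone.
    sound = capture_sound()
    # Step 3: start the assistant only if the sound meets the preset condition.
    if not meets_preset_condition(sound):
        return False
    start_assistant(sound)
    return True
```

Passing the operations in as callables keeps the sketch independent of any particular platform and makes each step easy to test in isolation.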
FIG. 3 is a schematic structural diagram of a terminal of a preferred embodiment of the present invention for implementing the voice assistant starting method.
The terminal 1 is a device capable of automatically performing numerical computation and/or information processing according to instructions set or stored in advance. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The terminal 1 may also be, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote control, a touch pad, a voice-control device or the like, for example a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet Protocol Television (IPTV), a smart wearable device, and the like.
The terminal 1 may also be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server.
The network in which the terminal 1 is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
In one embodiment of the present invention, the terminal 1 includes, but is not limited to, a memory 12, a processor 13, and a computer program that is stored in the memory 12 and can run on the processor 13, such as a voice assistant starting program. Those skilled in the art will understand that the schematic diagram is merely an example of the terminal 1 and does not limit the terminal 1. The terminal 1 may include more or fewer components than shown, may combine certain components, or may use different components; for example, the terminal 1 may also include input/output devices, network access devices, buses, and the like.
The processor 13 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 13 is the computing core and control center of the terminal 1; it connects the various parts of the whole terminal 1 through various interfaces and lines, and executes the operating system of the terminal 1 and the installed application programs, program code, and the like.
The processor 13 executes the operating system of the terminal 1 and the installed application programs. The processor 13 executes the application program to implement the steps in each of the above voice assistant starting method embodiments, for example steps S10, S11 and S12 shown in FIG. 1.
Alternatively, when executing the computer program, the processor 13 implements the functions of the modules/units in each of the above device embodiments, for example: detecting whether the preset area of the terminal and the microphone of the terminal are blocked; if the preset area of the terminal is not blocked and the microphone of the terminal is blocked, collecting ambient sound through the microphone; and if the ambient sound meets the preset condition, starting the voice assistant in the terminal and inputting the ambient sound to the voice assistant as a voice operation instruction.
For example, the computer program may be divided into one or more modules/units, which are stored in the memory 12 and executed by the processor 13 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution of the computer program in the terminal 1. For example, the computer program may be divided into the detection unit 100, the collecting unit 101, the start unit 102, the judging unit 103, the determining unit 104 and the control unit 105.
The memory 12 may be used to store the computer program and/or modules. The processor 13 implements the various functions of the terminal 1 by running or executing the computer program and/or modules stored in the memory 12 and by calling the data stored in the memory 12. The memory 12 may mainly include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the mobile phone (such as audio data or a phone book). In addition, the memory 12 may include high-speed random access memory, and may also include non-volatile memory, for example a hard disk, internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
The memory 12 may be an external memory and/or an internal memory of the terminal 1. Further, the memory 12 may be a circuit with a storage function that has no physical form within an integrated circuit, such as a RAM (random-access memory) or a FIFO (first in, first out) buffer. Alternatively, the memory 12 may be a memory with a physical form, such as a memory module or a TF card (Trans-flash card).
If the integrated modules/units of the terminal 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the above method embodiments of the present invention may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The distance sensor 14 is used to detect, by sensing the distance to a human body, whether the preset area of the terminal 1 and the microphone 18 of the terminal are blocked, and to judge, by detecting the distance between the terminal 1 and the user, whether that distance meets the first preset condition.
The light sensor 15 is used to detect, by sensing whether light is blocked, whether the preset area of the terminal 1 and the microphone 18 of the terminal are blocked, and to judge, by detecting whether the light reaching the terminal 1 is blocked, whether the distance between the terminal 1 and the user meets the first preset condition.
The temperature sensor 16 is used to judge, by detecting a change in temperature, whether the distance between the terminal 1 and the user meets the first preset condition.
The camera device 17 is used to capture the facial image of the user.
The microphone 18 is used to collect ambient sound.
With reference to FIG. 1, the memory 12 in the terminal 1 stores a plurality of instructions to implement a voice assistant starting method, and the processor 13 can execute the plurality of instructions to: detect whether the preset area of the terminal and the microphone of the terminal are blocked; if the preset area of the terminal is not blocked and the microphone of the terminal is blocked, collect ambient sound through the microphone; and if the ambient sound meets the preset condition, start the voice assistant in the terminal and input the ambient sound to the voice assistant as a voice operation instruction.
According to a preferred embodiment of the present invention, the preset area is located at the top of the terminal and the microphone is located at the bottom of the terminal; or
the preset area and the microphone are located at opposite positions on the terminal.
According to a preferred embodiment of the present invention, the instructions executed by the processor 13 further include:
judging whether the distance between the terminal and the user meets the first preset condition;
detecting whether the user is speaking;
if the distance between the terminal and the user meets the first preset condition and the user is speaking, determining that the ambient sound meets the preset condition.
According to a preferred embodiment of the present invention, the instructions executed by the processor 13 further include:
obtaining a lip image of the user through the camera device of the terminal;
monitoring changes in the lip image;
if the change in the lip image meets the second preset condition, determining that the user is speaking.
According to a preferred embodiment of the present invention, the instructions executed by the processor 13 further include:
performing noise reduction on the background sound in the ambient sound by dual-microphone noise reduction to obtain the speech in the ambient sound, and inputting the speech to the voice assistant as the voice operation instruction.
According to a preferred embodiment of the present invention, the instructions executed by the processor 13 further include:
controlling the voice assistant to perform the corresponding function according to the voice operation instruction.
Specifically, for the way the processor 13 implements the above instructions, refer to the description of the relevant steps in the embodiment corresponding to FIG. 1; details are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in each embodiment of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be implemented in other specific forms without departing from its spirit or essential characteristics.
Therefore, the embodiments should be regarded in every respect as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and range of equivalency of the claims are therefore intended to be embraced in the present invention. Any reference sign in the claims should not be construed as limiting the claim concerned.
Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by one unit or device through software or hardware. Words such as "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of the present invention.
Claims (10)
1. A voice assistant starting method, characterized in that the method comprises:
detecting whether a preset area of a terminal and a microphone of the terminal are blocked;
if the preset area of the terminal is not blocked and the microphone of the terminal is blocked, collecting ambient sound through the microphone; and
if the ambient sound meets a preset condition, starting a voice assistant in the terminal, and inputting the ambient sound to the voice assistant as a voice operation instruction.
2. The voice assistant starting method according to claim 1, characterized in that:
the preset area is located at the top of the terminal, and the microphone is located at the bottom of the terminal; or
the preset area and the microphone are located at opposite positions on the terminal.
3. The voice assistant starting method according to claim 1, characterized in that the method further comprises:
judging whether the distance between the terminal and the user meets a first preset condition;
detecting whether the user is speaking;
if the distance between the terminal and the user meets the first preset condition and the user is speaking, determining that the ambient sound meets the preset condition.
4. The voice assistant starting method according to claim 3, characterized in that detecting whether the user is speaking comprises:
obtaining a lip image of the user through a camera device of the terminal;
monitoring changes in the lip image;
if the change in the lip image meets a second preset condition, determining that the user is speaking.
5. The voice assistant starting method according to claim 1, characterized in that inputting the ambient sound to the voice assistant as a voice operation instruction comprises:
performing noise reduction on the background sound in the ambient sound by dual-microphone noise reduction to obtain the speech in the ambient sound, and inputting the speech to the voice assistant as the voice operation instruction;
and the method further comprises:
controlling the voice assistant to perform the corresponding function according to the voice operation instruction.
6. A voice assistant starting device, characterized in that the device comprises:
a detection unit, configured to detect whether a preset area of a terminal and a microphone of the terminal are blocked;
a collecting unit, configured to collect ambient sound through the microphone if the preset area of the terminal is not blocked and the microphone of the terminal is blocked; and
a start unit, configured to start a voice assistant in the terminal if the ambient sound meets a preset condition, and to input the ambient sound to the voice assistant as a voice operation instruction.
7. The voice assistant starting device according to claim 6, characterized in that:
the preset area is located at the top of the terminal, and the microphone is located at the bottom of the terminal; or
the preset area and the microphone are located at opposite positions on the terminal.
8. The voice assistant starting device according to claim 6, characterized in that the device further comprises:
a judging unit, configured to judge whether the distance between the terminal and the user meets a first preset condition;
the detection unit, further configured to detect whether the user is speaking;
a determining unit, configured to determine that the ambient sound meets the preset condition if the distance between the terminal and the user meets the first preset condition and the user is speaking.
9. A terminal, characterized in that the terminal comprises a processor, the processor being configured to implement the voice assistant starting method according to any one of claims 1-5 when executing a computer program stored in a memory.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the voice assistant starting method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710828831.1A CN107678793A (en) | 2017-09-14 | 2017-09-14 | Voice assistant starts method and device, terminal and computer-readable recording medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710828831.1A CN107678793A (en) | 2017-09-14 | 2017-09-14 | Voice assistant starts method and device, terminal and computer-readable recording medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107678793A true CN107678793A (en) | 2018-02-09 |
Family
ID=61136436
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710828831.1A Pending CN107678793A (en) | 2017-09-14 | 2017-09-14 | Voice assistant starts method and device, terminal and computer-readable recording medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107678793A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108712566A (en) * | 2018-04-27 | 2018-10-26 | 维沃移动通信有限公司 | A kind of voice assistant awakening method and mobile terminal |
CN108965562A (en) * | 2018-07-24 | 2018-12-07 | Oppo(重庆)智能科技有限公司 | Voice data generation method and relevant apparatus |
CN109166575A (en) * | 2018-07-27 | 2019-01-08 | 百度在线网络技术(北京)有限公司 | Exchange method, device, smart machine and the storage medium of smart machine |
CN110265007A (en) * | 2019-05-11 | 2019-09-20 | 出门问问信息科技有限公司 | Control method, control device and the bluetooth headset of voice assistant system |
WO2020087895A1 (en) * | 2018-10-29 | 2020-05-07 | 华为技术有限公司 | Voice interaction processing method and apparatus |
CN112890782A (en) * | 2021-02-07 | 2021-06-04 | 深圳云基智能科技有限公司 | Human body temperature information monitoring terminal and monitoring method based on smart phone |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1215658A2 (en) * | 2000-12-05 | 2002-06-19 | Hewlett-Packard Company | Visual activation of voice controlled apparatus |
CN104103274A (en) * | 2013-04-11 | 2014-10-15 | 纬创资通股份有限公司 | Speech processing apparatus and speech processing method |
CN104428832A (en) * | 2012-07-09 | 2015-03-18 | Lg电子株式会社 | Speech recognition apparatus and method |
CN104820556A (en) * | 2015-05-06 | 2015-08-05 | 广州视源电子科技股份有限公司 | Method and device for waking up voice assistant |
- 2017-09-14 CN CN201710828831.1A patent/CN107678793A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1215658A2 (en) * | 2000-12-05 | 2002-06-19 | Hewlett-Packard Company | Visual activation of voice controlled apparatus |
CN104428832A (en) * | 2012-07-09 | 2015-03-18 | Lg电子株式会社 | Speech recognition apparatus and method |
CN104103274A (en) * | 2013-04-11 | 2014-10-15 | 纬创资通股份有限公司 | Speech processing apparatus and speech processing method |
CN104820556A (en) * | 2015-05-06 | 2015-08-05 | 广州视源电子科技股份有限公司 | Method and device for waking up voice assistant |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108712566A (en) * | 2018-04-27 | 2018-10-26 | 维沃移动通信有限公司 | A kind of voice assistant awakening method and mobile terminal |
CN108965562A (en) * | 2018-07-24 | 2018-12-07 | Oppo(重庆)智能科技有限公司 | Voice data generation method and relevant apparatus |
CN108965562B (en) * | 2018-07-24 | 2021-04-13 | Oppo(重庆)智能科技有限公司 | Voice data generation method and related device |
CN109166575A (en) * | 2018-07-27 | 2019-01-08 | 百度在线网络技术(北京)有限公司 | Exchange method, device, smart machine and the storage medium of smart machine |
WO2020087895A1 (en) * | 2018-10-29 | 2020-05-07 | 华为技术有限公司 | Voice interaction processing method and apparatus |
US11620995B2 (en) | 2018-10-29 | 2023-04-04 | Huawei Technologies Co., Ltd. | Voice interaction processing method and apparatus |
CN110265007A (en) * | 2019-05-11 | 2019-09-20 | 出门问问信息科技有限公司 | Control method, control device and the bluetooth headset of voice assistant system |
CN110265007B (en) * | 2019-05-11 | 2020-07-24 | 出门问问信息科技有限公司 | Control method and control device of voice assistant system and Bluetooth headset |
CN112890782A (en) * | 2021-02-07 | 2021-06-04 | 深圳云基智能科技有限公司 | Human body temperature information monitoring terminal and monitoring method based on smart phone |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107678793A (en) | Voice assistant starts method and device, terminal and computer-readable recording medium | |
US10394328B2 (en) | Feedback providing method and electronic device for supporting the same | |
US10754938B2 (en) | Method for activating function using fingerprint and electronic device including touch display supporting the same | |
US9329676B2 (en) | Control method and electronic device | |
CN109313519A (en) | Electronic equipment including force snesor | |
CN107193471B (en) | Unlocking control method and related product | |
CN108701043A (en) | A kind of processing method and processing device of display | |
CN103150109A (en) | Touch event model for web pages | |
WO2014026480A1 (en) | Method and apparatus for realizing display of component's content | |
WO2018006374A1 (en) | Function recommending method, system, and robot based on automatic wake-up | |
CN106651338A (en) | Method for payment processing and terminal | |
KR20180074983A (en) | Method for obtaining bio data and an electronic device thereof | |
CN110955364B (en) | Application program recommendation method and electronic equipment | |
CN106156583A (en) | A kind of method of speech unlocking and terminal | |
CN110210395B (en) | Vein image acquisition method and related product | |
CN109739423B (en) | Alarm clock setting method and flexible terminal | |
CN107402625A (en) | Touch screen scanning method, device, terminal and computer-readable recording medium | |
CN109756818A (en) | Dual microphone noise-reduction method, device, storage medium and electronic equipment | |
CN110191303A (en) | Video call method and Related product based on screen sounding | |
WO2017161814A1 (en) | Control method for terminal and terminal | |
CN107450811A (en) | Touch area amplification display method and system | |
CN106529231A (en) | User touch operation identification method and terminal | |
CN108519846B (en) | Image editing processing method and terminal | |
CN107132927A (en) | Input recognition methods and device and the device for identified input character of character | |
CN110162372B (en) | Virtual key creation method and related equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180209 |