CN105869639A - Speech recognition method and system

Info

Publication number: CN105869639A
Application number: CN201610165978.2A
Authority: CN
Inventors: 房少杰
Original assignee: Guangdong Genius Technology Co Ltd
Current assignee: Guangdong Genius Technology Co Ltd
Priority date: 2016-03-21
Filing date: 2016-03-21
Publication date: 2016-08-17

Abstract

The invention discloses a speech recognition method and system. The method comprises the following steps: detecting that the distance to the user's face is less than or equal to a preset distance; recognizing that the mouth shape of the user's face changes; and recognizing the recorded speech. With the method and system, speech recognition starts automatically when the user speaks facing the device, so the user's speech is recognized without a separate activation operation, which improves the user experience.

Description

Speech recognition method and system

Technical field

The present invention relates to the technical field of data storage, and in particular to a speech recognition method and system.

Background art

How speech recognition is activated is critical to the overall speech recognition experience, and a well-designed activation method also helps greatly in avoiding noise interference. Existing speech recognition activation methods mainly fall into two kinds. The first activates speech recognition after a touch operation, for example by tapping a button on the screen, by a defined gesture such as swiping the screen, or by pressing a physical key. This approach requires the user to operate the device by hand, which is neither convenient nor intelligent and reduces the user's willingness to use the function; it is especially inconvenient in some situations, such as while driving. The second lets the user speak a simple preset command, as on Huawei's smart watch, whose speech recognition mode is opened by saying the preset command "hello, Android" to the watch. This approach feels unnatural and less intelligent, and it adds a command recognition step before speech recognition, which also lowers efficiency.

An urgent problem is therefore how to let the system recognize speech automatically once the user talks facing the device, so that no separate activation action is needed and speech recognition starts by itself when the user begins to speak, making speech recognition more convenient and intelligent and improving the user experience.

Summary of the invention

The invention provides a speech recognition method and system that perform speech recognition according to the distance to the user's face and changes in the user's mouth shape. When the user speaks facing the device, speech recognition starts automatically and the user's speech is recognized, which removes the activation operation and improves the user experience.

To achieve this, the present invention adopts the following technical solutions.

In one aspect, a speech recognition method is provided, comprising:

detecting that the distance to the user's face is less than or equal to a preset distance;

recognizing that the mouth shape of the user's face changes; and

recognizing the recorded speech.

Preferably, detecting that the distance to the user's face is less than or equal to the preset distance comprises: detecting, by a camera, that the distance to the user's face is less than or equal to the preset distance.

Before detecting that the distance to the user's face is less than or equal to the preset distance, the method further comprises: detecting a hand-raising action and turning on the camera.

Preferably, detecting that the distance to the user's face is less than or equal to the preset distance comprises:

detecting, by an infrared sensor, that the distance to an object is less than or equal to the preset distance; and

determining, by a camera, that the object is the user's face.

Preferably, after detecting that the distance to the user's face is less than or equal to the preset distance, the method further comprises: starting recording.

Preferably, recognizing the recorded speech comprises: discarding the recording made before the change in the mouth shape of the user's face was recognized, and recognizing the recorded speech with the recording at the moment the change was recognized as the starting point.

Preferably, after recognizing the recorded speech, the method further comprises: responding to the recognized voice command.

In another aspect, a speech recognition system is provided, comprising:

a distance detection module, configured to detect that the distance to the user's face is less than or equal to a preset distance;

a mouth-shape recognition module, configured to recognize that the mouth shape of the user's face changes; and

a speech recognition module, configured to recognize the recorded speech.

Preferably:

the distance detection module is specifically configured to detect, by a camera, that the distance to the user's face is less than or equal to the preset distance;

and the system further comprises: an activation module, configured to detect a hand-raising action and turn on the camera.

Preferably, the distance detection module is specifically configured to:

detect, by an infrared sensor, that the distance to an object is less than or equal to the preset distance; and

determine, by a camera, that the object is the user's face.

Preferably, the system further comprises:

a recording start module, configured to start recording after the distance detection module detects that the distance to the user's face is less than or equal to the preset distance; and

a response module, configured to respond to the recognized voice command.

The speech recognition module is specifically configured to: discard the recording made before the change in the mouth shape of the user's face was recognized, and recognize the recorded speech with the recording at the moment the change was recognized as the starting point.

Compared with the prior art, the beneficial effects of the invention are as follows. The invention detects that the distance to the user's face is less than or equal to a preset distance, recognizes that the mouth shape of the user's face changes, and recognizes the recorded speech. By performing speech recognition according to the distance to the user's face and changes in mouth shape, the invention starts speech recognition automatically when the user speaks facing the device and recognizes the user's speech, which removes the activation operation and improves the user experience.

Brief description of the drawings

To explain the technical solutions in the embodiments of the invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from the content of the embodiments and these drawings without creative effort.

Fig. 1 is a flowchart of a first embodiment of the speech recognition method provided in the detailed description of the invention.

Fig. 2 is a flowchart of a second embodiment of the speech recognition method provided in the detailed description of the invention.

Fig. 3 is a flowchart of a third embodiment of the speech recognition method provided in the detailed description of the invention.

Fig. 4 is a block diagram of a first embodiment of the speech recognition system provided in the detailed description of the invention.

Fig. 5 is a block diagram of a second embodiment of the speech recognition system provided in the detailed description of the invention.

Fig. 6 is a block diagram of a third embodiment of the speech recognition system provided in the detailed description of the invention.

Detailed description

To make the technical problem solved, the technical solutions adopted, and the technical effects achieved by the invention clearer, the technical solutions of the embodiments are described in further detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.

Refer to Fig. 1, which is a flowchart of the first embodiment of the speech recognition method provided in the detailed description of the invention. As shown in the figure, the method comprises:

Step S101: detect that the distance to the user's face is less than or equal to a preset distance.

When the user wants to control the device by voice, the user talks close to the device, which means the user's face approaches the device. To improve the quality of the recorded speech, the device checks whether the distance between the device and the user's face is less than or equal to the preset distance. The preset distance may be, for example, 5 cm, 10 cm, or 15 cm, and can be set according to the device and the actual usage environment. The device may be a large smart device, a wearable portable device such as a smart watch or smart band, or a non-wearable portable device such as a mobile phone or tablet.
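The distance check in step S101 can be sketched as follows. This is only an illustration: the function name `within_preset_distance` and the 10 cm default are assumptions chosen for the example (10 cm is one of the values the text mentions), and how the distance itself is measured is left to the sensor.

```python
def within_preset_distance(distance_cm: float, preset_cm: float = 10.0) -> bool:
    """Return True when the measured distance is less than or equal to the preset.

    `distance_cm` is assumed to come from the device's distance sensing
    (camera or infrared sensor); `preset_cm` is configurable per device.
    """
    if distance_cm < 0:
        raise ValueError("distance must be non-negative")
    return distance_cm <= preset_cm
```

A watch might call `within_preset_distance(d, preset_cm=5.0)` while a tablet uses a larger preset, in line with the text's note that the value depends on the device and usage environment.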

Step S102: recognize that the mouth shape of the user's face changes.

To avoid recording ambient noise when the face is close but the user is not speaking, which would lower the recognition rate, mouth-shape recognition is performed. If the user's mouth shape is recognized as changing with a speaking action, the current point in time is taken as the starting point of the control speech.
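One way to picture the mouth-shape check in step S102 is a per-frame comparison. The sketch below assumes that some face-tracking component (not specified in the source) already yields one mouth-opening value per video frame; the function name, the value scale, and the threshold of 0.1 are all hypothetical.

```python
from typing import Optional, Sequence


def first_mouth_change(openings: Sequence[float], threshold: float = 0.1) -> Optional[int]:
    """Return the index of the first frame whose mouth opening differs from the
    previous frame by more than `threshold`, or None if the mouth never moves.

    `openings` holds one hypothetical mouth-opening measurement per frame.
    """
    for i in range(1, len(openings)):
        if abs(openings[i] - openings[i - 1]) > threshold:
            return i
    return None
```

The returned frame index, divided by the camera frame rate, gives the point in time that the text uses as the starting point of the control speech.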

Step S103: recognize the recorded speech.

Recognizing the recorded speech comprises: discarding the recording made before the change in the mouth shape of the user's face was recognized, and recognizing the recorded speech with the recording at the moment the change was recognized as the starting point. Discarding the recording before the starting point removes, to some extent, the effect of ambient noise on speech recognition and improves the recognition rate.
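Discarding the recording before the starting point amounts to slicing the sample buffer. The following is a minimal sketch, assuming the recording is held as a sequence of PCM samples and the starting point is known in seconds from the start of recording; the function and parameter names are illustrative.

```python
from typing import List, Sequence


def trim_before_start(samples: Sequence[int], sample_rate: int, start_s: float) -> List[int]:
    """Drop every sample recorded before `start_s` seconds.

    `start_s` is the moment the mouth-shape change was recognized,
    measured from the start of the recording.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    start_index = max(0, int(round(start_s * sample_rate)))
    return list(samples[start_index:])
```

Only the returned samples would then be passed to the recognizer, so noise captured between the start of recording and the first mouth movement never reaches it.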

In summary, this embodiment performs speech recognition according to the distance to the user's face and changes in mouth shape, taking the recording at the moment the change in mouth shape was recognized as the starting point of speech recognition. This removes, to some extent, the effect of ambient noise on speech recognition and improves the recognition rate. When the user speaks facing the device, speech recognition starts automatically and the user's speech is recognized, which removes the activation operation and improves the user experience.

Refer to Fig. 2, which is a flowchart of the second embodiment of the speech recognition method provided in the detailed description of the invention. As shown in the figure, the method comprises:

Step S201: detect a hand-raising action and turn on the camera.

For wearable portable devices such as smart watches and smart bands, the user has to raise a hand to use voice control, which produces a corresponding hand-raising action. Raising a hand does not necessarily mean the user wants voice control, however, so face recognition is also needed once a hand-raising action is detected: if a hand-raising action is detected and a face is also recognized, the user wants voice control. This embodiment uses the camera for face recognition and distance monitoring, so the camera is turned on when a hand-raising action is detected. The hand-raising action can be detected with an acceleration sensor; this is prior art and is not described further here.

Step S202: detect, by the camera, that the distance to the user's face is less than or equal to the preset distance.

When the distance between the user's face and the device is less than or equal to the preset distance, the user wants voice control. The camera is used for face recognition and distance detection, to detect that the distance to the user's face is less than or equal to the preset distance. The preset distance may be, for example, 5 cm, 10 cm, or 15 cm, and can be set according to the device and the actual usage environment. Step S202 is a more specific implementation of step S101 of the first method embodiment (detecting that the distance to the user's face is less than or equal to the preset distance) for wearable portable devices such as smart watches and smart bands.

Step S203: start recording.

Recording starts once the distance to the user's face is detected to be less than or equal to the preset distance.

Step S204: recognize that the mouth shape of the user's face changes.

Step S205: recognize the recorded speech.

Step S206: respond to the recognized voice command.

The device responds to the recognized voice command. The voice command may be, for example, opening an application, closing an application, making a phone call, or sending a message.
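Responding to a recognized command is naturally expressed as a lookup from command text to handler. The sketch below is an assumption-laden illustration: the command phrases, the handler return strings, and the `make_dispatcher` helper are all hypothetical and stand in for whatever the device actually does.

```python
from typing import Callable, Dict


def make_dispatcher(handlers: Dict[str, Callable[[], str]]) -> Callable[[str], str]:
    """Build a function that routes recognized command text to its handler."""

    def dispatch(command: str) -> str:
        # Normalize the recognizer output before looking it up.
        handler = handlers.get(command.strip().lower())
        if handler is None:
            return "unrecognized command"
        return handler()

    return dispatch


# Hypothetical command set mirroring the examples in the text.
dispatch = make_dispatcher({
    "open camera app": lambda: "opening camera app",
    "make a call": lambda: "starting call",
    "send a message": lambda: "composing message",
})
```

Keeping the table separate from the dispatch logic lets a device add or remove commands without touching the recognition path.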

In this embodiment, the camera is turned on when a hand-raising action is detected and is used for face recognition and distance monitoring. When the camera detects that the distance to the user's face is less than or equal to the preset distance, recording starts. The recording made before the change in the mouth shape of the user's face was recognized is discarded, the point in time at which the user's mouth shape is recognized as changing with a speaking action is taken as the starting point of the control speech, the recorded speech is recognized, and the recognized voice command is responded to. After the user raises a hand and speaks close to the device, the device can respond to the voice command immediately without any prior activation action. The whole process is natural and efficient, and it removes, to some extent, the effect of ambient noise on speech recognition and improves the recognition rate.
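The ordering of steps S201 to S205 can be modeled as a small state machine. This is a sketch under stated assumptions: the state and event names are invented for illustration, and events arriving out of order are simply ignored, which the source does not specify.

```python
from enum import Enum, auto
from typing import Iterable


class State(Enum):
    IDLE = auto()          # camera off, waiting for a hand-raising action
    CAMERA_ON = auto()     # camera monitoring face and distance
    RECORDING = auto()     # recording started, waiting for mouth movement
    RECOGNIZING = auto()   # speech from the starting point is being recognized


class Event(Enum):
    HAND_RAISED = auto()
    FACE_IN_RANGE = auto()
    MOUTH_CHANGED = auto()


TRANSITIONS = {
    (State.IDLE, Event.HAND_RAISED): State.CAMERA_ON,
    (State.CAMERA_ON, Event.FACE_IN_RANGE): State.RECORDING,
    (State.RECORDING, Event.MOUTH_CHANGED): State.RECOGNIZING,
}


def step(state: State, event: Event) -> State:
    """Advance one step; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events: Iterable[Event]) -> State:
    """Feed a sequence of events through the machine, starting from IDLE."""
    state = State.IDLE
    for event in events:
        state = step(state, event)
    return state
```

For example, a mouth movement seen before the face is in range does not start recognition, which matches the requirement that recording begins only after the distance condition is met.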

Refer to Fig. 3, which is a flowchart of the third embodiment of the speech recognition method provided in the detailed description of the invention. As shown in the figure, the method comprises:

Step S301: detect, by an infrared sensor, that the distance to an object is less than or equal to the preset distance.

When the user wants to control the device by voice, the user talks close to the device. An infrared sensor can therefore be used to detect whether an object has come within the preset distance, which is equivalent to using the infrared sensor to detect whether the distance between the device and the object is less than or equal to the preset distance.

Step S302: determine, by the camera, that the object is the user's face.

When the infrared sensor detects that the distance between the device and an object is less than or equal to the preset distance, an object is close, but this does not necessarily mean that voice control is wanted. Other situations are possible: an object may simply have been placed in front of the device, or the device may have been placed on top of an object. The camera is therefore also used to determine that the object is the user's face, which shows that it is the user who is close to the device and wants to control it by voice. Steps S301 and S302 together are a more specific implementation of step S101 of the first method embodiment (detecting that the distance to the user's face is less than or equal to the preset distance).

The device may be a large smart device, a wearable portable device such as a smart watch or smart band, or a non-wearable portable device such as a mobile phone or tablet. The preset distance may be, for example, 5 cm, 10 cm, or 15 cm, and can be set according to the device and the actual usage environment. After the infrared sensor detects that the distance to the object is less than or equal to the preset distance, the camera is turned on and used to determine that the object is the user's face.
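The two-stage check of steps S301 and S302 can be sketched as below. The infrared reading gates the camera, so the camera check runs only when something is already close. The function name and the `camera_sees_face` callback are hypothetical stand-ins for the device's real sensor and face-detection code.

```python
from typing import Callable


def face_in_range(
    ir_distance_cm: float,
    camera_sees_face: Callable[[], bool],
    preset_cm: float = 10.0,
) -> bool:
    """Return True only if an object is within the preset distance AND
    the camera confirms that the object is the user's face.

    `camera_sees_face` is called lazily, mirroring the text: the camera
    is turned on only after the infrared sensor reports a nearby object.
    """
    if ir_distance_cm > preset_cm:
        return False  # nothing close enough; the camera stays off
    return camera_sees_face()
```

Passing the camera check as a callback keeps the power-hungry step out of the common case where nothing is near the device.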

Step S303: start recording.

Step S304: recognize that the mouth shape of the user's face changes.

Step S305: recognize the recorded speech.

Step S306: respond to the recognized voice command.

In this embodiment, the infrared sensor detects that the distance to an object is less than or equal to the preset distance, and the camera determines that the object is the user's face. After this combination of infrared sensor and camera detects that the distance between the device and the user's face is less than or equal to the preset distance, recording starts. The recording made before the change in the mouth shape of the user's face was recognized is discarded, the point in time at which the user's mouth shape is recognized as changing with a speaking action is taken as the starting point of the control speech, the recorded speech is recognized, and the recognized voice command is responded to. After the user speaks facing the device, the device can respond to the voice command immediately without any prior activation action. The whole process is natural and efficient, and it removes, to some extent, the effect of ambient noise on speech recognition and improves the recognition rate.

The following are embodiments of the speech recognition system provided in the detailed description of the invention. The system embodiments are implemented on the basis of the method embodiments above; for anything not fully described here, refer to the method embodiments.

Refer to Fig. 4, which is a block diagram of the first embodiment of the speech recognition system provided in the detailed description of the invention. As shown in the figure, the system comprises:

Distance detection module 41, configured to detect that the distance to the user's face is less than or equal to a preset distance.

Mouth-shape recognition module 42, configured to recognize that the mouth shape of the user's face changes.

Speech recognition module 43, configured to recognize the recorded speech.

The speech recognition module 43 is specifically configured to: discard the recording made before the change in the mouth shape of the user's face was recognized, and recognize the recorded speech with the recording at the moment the change was recognized as the starting point. Discarding the recording before the starting point removes, to some extent, the effect of ambient noise on speech recognition and improves the recognition rate.

Refer to Fig. 5, which is a block diagram of the second embodiment of the speech recognition system provided in the detailed description of the invention. As shown in the figure, the system comprises:

Activation module 51, configured to detect a hand-raising action and turn on the camera.

Distance detection module 52, configured to detect, by the camera, that the distance to the user's face is less than or equal to the preset distance.

When the distance between the user's face and the device is less than or equal to the preset distance, the user wants voice control. The camera is used for face recognition and distance detection, to detect that the distance to the user's face is less than or equal to the preset distance. The preset distance may be, for example, 5 cm, 10 cm, or 15 cm, and can be set according to the device and the actual usage environment.

Recording start module 53, configured to start recording after the distance detection module 52 detects that the distance to the user's face is less than or equal to the preset distance.

Mouth-shape recognition module 54, configured to recognize that the mouth shape of the user's face changes.

Speech recognition module 55, configured to recognize the recorded speech.

The speech recognition module 55 is specifically configured to: discard the recording made before the change in the mouth shape of the user's face was recognized, and recognize the recorded speech with the recording at the moment the change was recognized as the starting point. Discarding the recording before the starting point removes, to some extent, the effect of ambient noise on speech recognition and improves the recognition rate.

Response module 56, configured to respond to the recognized voice command.

In this embodiment, the camera detects that the distance to the user's face is less than or equal to the preset distance, the point in time at which the user's mouth shape is recognized as changing with a speaking action is taken as the starting point of the control speech, the recorded speech is recognized, and the recognized voice command is responded to. After the user raises a hand and speaks facing the device, the device can respond to the voice command immediately without any prior activation action. The whole process is natural and efficient, and it removes, to some extent, the effect of ambient noise on speech recognition and improves the recognition rate.

Refer to Fig. 6, which is a block diagram of the third embodiment of the speech recognition system provided in the detailed description of the invention. As shown in the figure, the system comprises:

Distance detection module 61, configured to detect, by an infrared sensor, that the distance to an object is less than or equal to the preset distance, and to determine, by the camera, that the object is the user's face.

When the user wants to control the device by voice, the user talks close to the device. An infrared sensor can therefore be used to detect whether an object has come within the preset distance, which is equivalent to using the infrared sensor to detect whether the distance between the device and the object is less than or equal to the preset distance. The device may be a large smart device, a wearable portable device such as a smart watch or smart band, or a non-wearable portable device such as a mobile phone or tablet. The preset distance may be, for example, 5 cm, 10 cm, or 15 cm, and can be set according to the device and the actual usage environment. After the infrared sensor detects that the distance to the object is less than or equal to the preset distance, the camera is turned on and used to determine that the object is the user's face.

Recording start module 62, configured to start recording after the distance detection module detects that the distance to the user's face is less than or equal to the preset distance.

Mouth-shape recognition module 63, configured to recognize that the mouth shape of the user's face changes.

Speech recognition module 64, configured to recognize the recorded speech.

Response module 65, configured to respond to the recognized voice command.

In summary, with the speech recognition system provided by this embodiment, after the user speaks facing the device, the device can respond to the voice command immediately without any prior activation action. The whole process is natural and efficient, and it removes, to some extent, the effect of ambient noise on speech recognition and improves the recognition rate.

The technical principle of the invention has been described above with reference to specific embodiments. These descriptions are intended only to explain the principle of the invention and must not be construed in any way as limiting its scope of protection. Based on the explanation here, those skilled in the art can arrive at other specific implementations of the invention without creative effort, and these implementations also fall within the scope of protection of the invention.

Claims

1. A speech recognition method, characterized by comprising:

recognizing that the mouth shape of the user's face changes;

recognizing the recorded speech.

2. The method according to claim 1, characterized in that detecting that the distance to the user's face is less than or equal to the preset distance comprises: detecting, by a camera, that the distance to the user's face is less than or equal to the preset distance;

3. The method according to claim 1, characterized in that detecting that the distance to the user's face is less than or equal to the preset distance comprises:

determining, by a camera, that the object is the user's face.

4. The method according to claim 1, characterized in that, after detecting that the distance to the user's face is less than or equal to the preset distance, the method further comprises: starting recording.

5. The method according to claim 1, characterized in that recognizing the recorded speech comprises: discarding the recording made before the change in the mouth shape of the user's face was recognized, and recognizing the recorded speech with the recording at the moment the change was recognized as the starting point.

6. The method according to claim 1, characterized in that, after recognizing the recorded speech, the method further comprises: responding to the recognized voice command.

7. A speech recognition system, characterized by comprising:

a speech recognition module, configured to recognize the recorded speech.

8. The system according to claim 7, characterized in that:

9. The system according to claim 7, characterized in that the distance detection module is specifically configured to:

determine, by a camera, that the object is the user's face.

10. The system according to claim 7, characterized by further comprising:

a response module, configured to respond to the recognized voice command;