CN110428838A - Voice information recognition method, apparatus, and device - Google Patents
Voice information recognition method, apparatus, and device
- Publication number
- CN110428838A (application number CN201910707528.5A)
- Authority
- CN
- China
- Prior art keywords
- information
- user
- voice instruction
- environment
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Embodiments of the present invention disclose a voice information recognition method, apparatus, and device. The method includes: continuously monitoring and recognizing information to be recognized within a set environment region, where the information to be recognized includes environmental voice information, user facial information, user gaze information, and user lip-movement information; and, if it is determined from the user facial information and the user lip-movement information, or from the user facial information, the user gaze information, and the user lip-movement information, that the environmental voice information includes voice instruction information issued by a target user, responding to the voice instruction information. The technical solution of the embodiments of the present invention can improve voice interaction efficiency.
Description
Technical field
Embodiments of the present invention relate to the technical field of data processing, and in particular to a voice information recognition method, apparatus, and device.
Background art
Speech recognition technology is used to recognize a voice signal input by a user as a signal corresponding to a predetermined instruction, and can be applied in many fields.
Current speech recognition systems based on speech recognition technology usually require the user to start them manually, or require the device to detect a wake word automatically, in order to trigger the subsequent speech recognition and interaction. When a voice dialogue task is completed, the system quickly returns to a state in which it must be reawakened.
In the course of implementing the present invention, the inventors found that existing speech recognition systems have the following defect: they cannot simulate a real face-to-face conversation scenario, and requiring a wake-up step for every interaction reduces interaction efficiency.
Summary of the invention
Embodiments of the present invention provide a voice information recognition method, apparatus, and device, which improve voice interaction efficiency.
In a first aspect, an embodiment of the present invention provides a voice information recognition method, comprising:
continuously monitoring and recognizing information to be recognized within a set environment region, wherein the information to be recognized includes environmental voice information, user facial information, user gaze information, and user lip-movement information; and
if it is determined, from the user facial information and the user lip-movement information, or from the user facial information, the user gaze information, and the user lip-movement information, that the environmental voice information includes voice instruction information issued by a target user, responding to the voice instruction information.
In a second aspect, an embodiment of the present invention further provides a voice information recognition apparatus, comprising:
a to-be-recognized information monitoring module, configured to continuously monitor and recognize information to be recognized within a set environment region, wherein the information to be recognized includes environmental voice information, user facial information, user gaze information, and user lip-movement information; and
a voice instruction response module, configured to respond to voice instruction information if it is determined, from the user facial information and the user lip-movement information, or from the user facial information, the user gaze information, and the user lip-movement information, that the environmental voice information includes the voice instruction information issued by a target user.
In a third aspect, an embodiment of the present invention further provides a terminal device, comprising:
one or more processors; and
a storage apparatus, configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the voice information recognition method provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer storage medium on which a computer program is stored, the program, when executed by a processor, implementing the voice information recognition method provided by any embodiment of the present invention.
Embodiments of the present invention continuously monitor and recognize the environmental voice information, user facial information, user gaze information, and user lip-movement information within a set environment region, and respond to voice instruction information when it is determined, from the user facial information, the user gaze information, and the user lip-movement information, that the environmental voice information includes the voice instruction information issued by a target user. This solves the problem of low voice interaction efficiency in existing speech recognition systems and improves voice interaction efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of a voice information recognition method provided in Embodiment 1 of the present invention;
Fig. 2a is a flowchart of a voice information recognition method provided in Embodiment 2 of the present invention;
Fig. 2b is a flowchart of a voice information recognition method provided in Embodiment 2 of the present invention;
Fig. 3 is a schematic diagram of a voice information recognition apparatus provided in Embodiment 3 of the present invention;
Fig. 4 is a structural schematic diagram of a terminal device provided in Embodiment 4 of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the present invention, not to limit it.
It should also be noted that, for ease of description, the accompanying drawings show only the parts related to the present invention rather than the entire structure. It should be mentioned that, before the exemplary embodiments are discussed in greater detail, some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as sequential processing, many of these operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations can be rearranged. The processing can be terminated when its operations are completed, but it can also have additional steps not included in the figures. The processing can correspond to a method, a function, a procedure, a subroutine, a subprogram, or the like.
Embodiment one
Fig. 1 is a flowchart of a voice information recognition method provided in Embodiment 1 of the present invention. This embodiment is applicable to the case of performing speech recognition based on multi-dimensional information to be recognized. The method can be executed by a voice information recognition apparatus, which can be implemented in software and/or hardware and can generally be integrated in a terminal device. Accordingly, as shown in Fig. 1, the method includes the following operations:
S110: Continuously monitor and recognize information to be recognized within a set environment region, wherein the information to be recognized includes environmental voice information, user facial information, user gaze information, and user lip-movement information.
Here, the set environment region can be the environment region to which the speech recognition system is applied, for example the interior of a vehicle or an indoor space. The voice information recognition method provided by the embodiments of the present invention can be used in a vehicle to recognize voice instructions through speech recognition technology, or indoors to perform clock-in attendance through speech recognition technology; the embodiments of the present invention do not limit the concrete form of the set environment region. The information to be recognized can be the information detected by the speech recognition system that is to be processed, including but not limited to environmental voice information, user facial information, user gaze information, and user lip-movement information. The environmental voice information is the voice information collected in the set environment region, including but not limited to users' voice information and noise in the environment, for example audio information. The user facial information is the facial information of a user, the user gaze information is information such as the gaze direction of a user's eyes, and the user lip-movement information is the lip motion information of a user.
In the embodiments of the present invention, the speech recognition system no longer performs speech recognition based on voice information alone, but achieves accurate speech recognition through multi-dimensional information including voice, face, gaze, and lip movement. Meanwhile, to avoid having to start via manual wake-up by the user or automatic wake-word detection by the device, the speech recognition system can continuously monitor and recognize the environmental voice information, user facial information, user gaze information, and user lip-movement information within the set environment region.
S120: If it is determined, from the user facial information and the user lip-movement information, or from the user facial information, the user gaze information, and the user lip-movement information, that the environmental voice information includes voice instruction information issued by a target user, respond to the voice instruction information.
Here, the target user can be the user on whom the speech recognition system needs to perform speech recognition, for example the driver in a vehicle, or a person clocking in before an attendance terminal device; the embodiments of the present invention do not limit the specific identity type of the target user. The voice instruction information is instruction information in voice form.
Accordingly, after obtaining the environmental voice information, user facial information, user gaze information, and user lip-movement information in the set environment region, the speech recognition system can process the various kinds of information to be recognized together, determine from the processing result whether the environmental voice information in the set environment region includes voice instruction information issued by the target user, and, when it does, respond to the voice instruction information. Specifically, the speech recognition system can judge from the user facial information and the user lip-movement information whether the environmental voice information includes voice instruction information issued by the target user. If the speech recognition system cannot make this judgment from the user facial information and the user lip-movement information alone, it can further combine the user gaze information to judge whether the environmental voice information includes voice instruction information issued by the target user.
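The decision flow of S120 can be sketched as follows. This is a minimal illustration only, not the patented implementation; the type and all field names (`face_detected`, `lip_moving`, `gaze_on_device`, `speech_text`) are hypothetical, chosen to mirror the face-and-lip-first, gaze-as-fallback logic described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """One snapshot of the multi-dimensional information to be recognized."""
    face_detected: bool              # user facial information present
    lip_moving: bool                 # user lip-movement information present
    gaze_on_device: Optional[bool]   # user gaze information (None = not needed/inconclusive)
    speech_text: str                 # recognized environmental voice information


def is_target_instruction(obs: Observation) -> bool:
    """Decide whether the environmental voice contains a target user's instruction.

    First judge from face + lip movement alone; when a gaze observation is
    available, additionally require the gaze to be on the device, as in S120.
    """
    if not (obs.face_detected and obs.lip_moving):
        return False                  # nobody visibly speaking: ignore the audio
    if obs.gaze_on_device is None:
        return bool(obs.speech_text)  # face + lip movement suffice
    return obs.gaze_on_device and bool(obs.speech_text)
```

The point of the sketch is the ordering: visual evidence gates the audio, so ambient noise with no matching face and lip movement never triggers a response.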
Embodiments of the present invention continuously monitor and recognize the environmental voice information, user facial information, user gaze information, and user lip-movement information within a set environment region, and respond to voice instruction information when it is determined, from the user facial information, the user gaze information, and the user lip-movement information, that the environmental voice information includes the voice instruction information issued by a target user. This solves the problem of low voice interaction efficiency in existing speech recognition systems and improves voice interaction efficiency.
Embodiment two
Fig. 2a is a flowchart of a voice information recognition method provided in Embodiment 2 of the present invention. This embodiment is elaborated on the basis of the above embodiment, and provides specific implementations of continuously monitoring the information to be recognized within the set environment region and of responding to the voice instruction information. Accordingly, as shown in Fig. 2a, the method of this embodiment can include:
S210: Continuously monitor and recognize the information to be recognized within the set environment region.
Here, S210 can specifically include the following operations:
S211: Continuously monitor the environmental voice information, the user facial information, the user gaze information, and the user lip-movement information, and recognize the environmental voice information.
In the embodiments of the present invention, in order to improve voice interaction efficiency, when the speech recognition system starts it can simultaneously start the speech recognition module, the face recognition module, the gaze information recognition module, and the lip-movement information recognition module, so as to obtain the environmental voice information, user facial information, user gaze information, and user lip-movement information at the same time. Meanwhile, the collected environmental voice information can be recognized first. Optionally, the environmental voice information can be obtained by collecting sound signals with hardware devices such as a microphone, and facial images can be collected with hardware devices such as a camera to obtain the user facial information, user gaze information, and user lip-movement information.
S212: If it is determined that the environmental voice information in the set environment region includes user voice information, recognize the user facial information, the user gaze information, and the user lip-movement information.
Here, the user voice information is the voice information issued by a user in the set environment region.
In order to reduce the data processing load of the speech recognition system, in the embodiments of the present invention the system recognizes the collected user facial information, user gaze information, and user lip-movement information only when it detects that the environmental voice information includes user voice information.
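The gating described in S212 can be sketched as below; this is a hedged illustration under assumed interfaces, where `vad` (a voice-activity detector returning a boolean) and the `recognizers` mapping are hypothetical placeholders for the system's recognition modules.

```python
def process_frame(audio_chunk, video_frame, vad, recognizers):
    """Run the visual recognizers only when the voice-activity detector
    reports user speech in the environmental audio (S212); otherwise skip
    the heavier face/gaze/lip processing entirely."""
    if not vad(audio_chunk):   # no user voice detected: do no vision work
        return None
    return {name: rec(video_frame) for name, rec in recognizers.items()}
```

The design choice here is load reduction: the cheap audio check runs continuously, while the expensive per-frame vision models run only on demand.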
S220: If it is determined, from the user facial information and the user lip-movement information, or from the user facial information, the user gaze information, and the user lip-movement information, that the environmental voice information includes voice instruction information issued by a target user, respond to the voice instruction information.
Here, S220 can specifically include the following operations:
S221: If it is determined from the recognition results of the user facial information and the user lip-movement information that the target user is present in a specific environment region within the set environment region, perform speech recognition on the user voice information of the target user.
Here, the specific environment region can be a particular coordinate region within the set environment region. For example, the specific environment region can be the driver's seat region in a vehicle, or the region in front of a device running the speech recognition system indoors; the embodiments of the present invention are not limited in this respect.
It can be understood that at least one user may be present in the set environment region. Therefore, in order to prevent users' conversations and external noise from falsely triggering the recognition of the speech recognition system, after detecting that the environmental voice information includes user voice information, the system can judge, from the recognition results of the user facial information and the user lip-movement information, whether the target user is present in the specific environment region within the set environment region. If it is determined that the target user is present in the specific environment region within the set environment region, speech recognition can then be performed on the user voice information of the target user.
S222: Determine from the speech recognition result whether the environmental voice information includes voice instruction information issued by the target user.
S223: If it is determined that the environmental voice information includes voice instruction information issued by the target user, respond to the voice instruction information.
Accordingly, after performing speech recognition on the user voice information of the target user, the system can determine from the speech recognition result whether the environmental voice information includes voice instruction information issued by the target user. When it is determined that the environmental voice information includes the voice instruction information issued by the target user, the voice instruction information is responded to.
In an optional embodiment of the present invention, determining from the recognition results of the user facial information and the user lip-movement information that the target user is present in the specific environment region within the set environment region may include: when it is determined that only the user facial information and the user lip-movement information of the target user are included in the specific environment region, determining that only the target user is present in the specific environment region. Determining from the speech recognition result whether the environmental voice information includes voice instruction information issued by the target user may include: if the speech recognition result matches a preset voice instruction set, determining that the environmental voice information includes the voice instruction information issued by the target user.
Here, the preset voice instruction set can be the voice instruction set involved in the concrete application scenario of the speech recognition system. Illustratively, if the speech recognition system is applied in the automotive field, the preset voice instruction set can include but is not limited to "navigate", "open music", "open contacts", and the like; if the speech recognition system is applied in the field of attendance, the preset voice instruction set can include but is not limited to "clock in", "clock out", and the like.
In the embodiments of the present invention, if the speech recognition system determines that the specific environment region includes only the user facial information and user lip-movement information of the target user, this shows that only the target user is present in the specific environment region, and the target user has a high probability of having issued a voice instruction to the speech recognition system. In this case, the obtained speech recognition result can be matched against the preset voice instruction set in the system; if the match succeeds, it is determined that the target user has issued voice instruction information, and the system can respond directly.
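The matching step described above can be sketched as a simple set lookup. The instruction strings below are illustrative examples only, borrowed from the automotive scenario in the text, and the normalization (lowercasing, trimming) is an assumption of this sketch rather than anything the patent specifies.

```python
# Hypothetical preset voice instruction set for the automotive scenario.
PRESET_INSTRUCTIONS = {"navigate", "open music", "open contacts"}


def match_instruction(recognized_text: str, preset=PRESET_INSTRUCTIONS):
    """Match a speech recognition result against the preset voice instruction
    set; a hit means the target user issued a voice instruction (S222)."""
    text = recognized_text.strip().lower()
    return text if text in preset else None
```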
In an optional embodiment of the present invention, determining from the recognition results of the user facial information and the user lip-movement information that the target user is present in the specific environment region within the set environment region may include: if it is determined that the set environment region includes the user facial information of at least two users, and the specific environment region includes the user facial information and user lip-movement information of the target user, determining that the target user is present in the specific environment region. Determining from the user facial information, the user gaze information, and the user lip-movement information that the environmental voice information includes voice instruction information issued by the target user may include: if the user gaze information of the target user matches a first preset gaze region, determining that the environmental voice information includes the voice instruction information issued by the target user.
Here, the first preset gaze region can be a gaze region set according to actual needs; preferably, the first preset gaze region can be set according to the relative position of the target user and the speech recognition system, for example the in-vehicle central control region calibrated with an angular coordinate system based on the driver's eye coordinates.
It can be understood that when multiple users are present in the set environment region, the speech recognition system can recognize the user facial information of multiple users. In this case, if the speech recognition system detects user facial information and user lip-movement information in the specific environment region, it can still determine that the target user is present in the specific environment region. However, because multiple users are present, the speech recognition system cannot determine whether the target user is issuing a voice instruction to the system or talking with other users. Therefore, the speech recognition system can further combine the user's gaze information for recognition.
Specifically, when the speech recognition system determines that multiple users including the target user are present, it can match the recognized user gaze information of the target user against the first preset gaze region. If the user gaze information of the target user matches the first preset gaze region, this shows that the target user's gaze lies within the first preset gaze region, and it can then be determined that the target user is issuing a voice instruction to the speech recognition system.
In an optional embodiment of the present invention, the method can also include: if the user gaze information of the target user matches a second preset gaze region, determining that the environmental voice information does not include voice instruction information issued by the target user; or, if the user gaze information of the target user matches a third preset gaze region, performing speech recognition on the user voice information of the target user and determining from the speech recognition result whether the environmental voice information includes voice instruction information issued by the target user.
Here, the second preset gaze region and the third preset gaze region can likewise be gaze regions set according to actual needs. Optionally, the second preset gaze region can be a region that deviates far from the first preset gaze region, and there can be one or more second preset gaze regions; the embodiments of the present invention are not limited in this respect. The third preset gaze region can be a region that deviates only slightly from the first preset gaze region, for example the region corresponding to the driver looking straight ahead.
Accordingly, if the user gaze information of the target user matches the second preset gaze region, this indicates that the target user is not issuing a voice instruction to the speech recognition system, for example in a scenario where the target user is talking with other users, and it can then be determined that the environmental voice information does not include voice instruction information issued by the target user. If the user gaze information of the target user matches the third preset gaze region, the speech recognition system cannot accurately determine whether the target user has issued a voice instruction. In this case, the speech recognition system can perform speech recognition on the user voice information of the target user and determine from the speech recognition result whether the environmental voice information includes voice instruction information issued by the target user.
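The three-region gaze logic above can be sketched as follows. The representation of a gaze region as a one-dimensional angle interval `(lo, hi)` is a simplifying assumption of this sketch (a real system would use 2D or 3D gaze geometry), and the region bounds in the test are invented for illustration.

```python
def classify_gaze(gaze_angle, first, second, third):
    """Map a gaze angle to a handling decision per the three preset gaze
    regions: first -> respond directly, second -> ignore the speech,
    third -> fall back to speech recognition. Regions are (lo, hi) tuples."""
    def within(region):
        lo, hi = region
        return lo <= gaze_angle <= hi

    if within(first):
        return "respond"            # gaze on the device: treat as instruction
    if within(second):
        return "ignore"             # gaze far away: conversation, not a command
    if within(third):
        return "recognize_speech"   # ambiguous gaze: let speech recognition decide
    return "ignore"
```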
In an optional embodiment of the present invention, the method can also include: when it is determined that user facial information and user lip-movement information are included only in a nonspecific environment region of the set environment region, not performing speech recognition on the user voice information.
Here, the nonspecific environment region is the region within the set environment region other than the specific environment region, for example the front passenger region and the rear seat region in a vehicle.
Accordingly, if the speech recognition system detects user facial information and user lip-movement information only in the nonspecific environment region of the set environment region, this indicates that users other than the target user are speaking, for example rear-seat users in a vehicle talking with each other. In this case, the speech recognition system can directly shield the detected user voice and not recognize it, thereby effectively filtering out external noise.
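The shielding rule can be sketched as below; the region labels (`driver_seat`, `rear_seat`) and the per-region detection dictionary are hypothetical names invented for this illustration.

```python
def should_recognize_speech(detections, specific_regions=frozenset({"driver_seat"})):
    """Given per-region face/lip-movement detections, decide whether to run
    speech recognition: skip it when speaking users (face + moving lips)
    appear only in nonspecific regions, filtering out-of-scope chatter."""
    speaking = {region for region, d in detections.items()
                if d["face"] and d["lip"]}
    return bool(speaking & specific_regions)
```

Usage follows the scenario in the text: rear-seat conversation alone yields no recognition, while a speaking driver does.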
In an alternate embodiment of the present invention where, the default phonetic order collection include pre-set business phonetic order collection with
And default starting phonetic order collection;It is described that the phonetic order information is responded, if may include: the speech recognition
As a result match with the pre-set business phonetic order collection, it is determined that the environment voice messaging includes that the target user issues
Phonetic order information, and the phonetic order information is directly responded;If institute's speech recognition result with it is described pre-
If starting phonetic order collection matches, it is determined that the environment voice messaging includes the phonetic order letter that the target user issues
Breath, and when determining that the phonetic order information and the pre-set business phonetic order collection match, the phonetic order is believed
Breath is directly responded;Otherwise, it prompts the target user to re-enter voice according to the pre-set business phonetic order collection to refer to
Enable information.
Wherein, pre-set business phonetic order collection can be the common phonetic order in speech recognition system applied business field
Collection, default starting phonetic order collection can be the phonetic order collection for waking up starting speech recognition system, and such as " hello, XX ".
In the embodiment of the present invention, if the speech recognition system determines that the speech recognition result matches the preset service voice instruction set, the target user is issuing a voice instruction to the speech recognition system, and the voice instruction information can be responded to directly. If the speech recognition system determines that the speech recognition result matches the preset wake-up voice instruction set, the target user is attempting to start the speech recognition system, and the system can wake up automatically to respond to the target user. Further, if the voice instruction information that matches the preset wake-up voice instruction set also matches the preset service voice instruction set, the voice instruction information can be responded to directly; if it does not match the preset service voice instruction set, the target user can be prompted with the service scope of speech recognition supported by the system and guided to re-enter voice instruction information according to the preset service voice instruction set.
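The matching logic above can be sketched as follows. This is a minimal illustration only: the two instruction sets, their contents, and exact-string matching are illustrative assumptions, not details given in the patent, which leaves the matching method open.

```python
# Illustrative instruction sets; real systems would use richer matching
# than exact string lookup (assumption for this sketch).
SERVICE_INSTRUCTIONS = {"navigate home", "play music", "open window"}
WAKE_INSTRUCTIONS = {"hello, xx"}

def handle_recognition_result(text: str) -> str:
    """Decide how the system reacts to one speech recognition result."""
    text = text.strip().lower()
    if text in SERVICE_INSTRUCTIONS:
        return "respond"          # service instruction: respond directly
    if text in WAKE_INSTRUCTIONS:
        return "wake"             # wake-up instruction: start the system
    # Neither set matched: prompt the user with the supported service scope
    # and ask for the instruction to be re-entered.
    return "prompt_reenter"
```

For example, `handle_recognition_result("Hello, XX")` falls into the wake-up branch, while an unrecognized utterance leads to the re-entry prompt.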
In an alternative embodiment of the present invention, the set environment region is an in-vehicle environment region.
Optionally, the voice information recognition method provided by the embodiment of the present invention can be applied to an in-vehicle speech recognition system, which continuously monitors and recognizes the environment voice information, user face information, user gaze information and user lip movement information in the in-vehicle environment region, and recognizes and responds to the voice information issued by the target user according to the recognition results. Here, the target user may be the driver in the driver's seat region.
Fig. 2b is a flowchart of a voice information recognition method provided by Embodiment 2 of the present invention. In a specific example, as shown in Fig. 2b, the speech recognition system is applied in an in-vehicle scenario. Through the natural manner of human-computer interaction, the system combines multi-dimensional information from speech recognition, face recognition, gaze tracking and semantic understanding to intelligently analyze the speech behavior of the user, realizing full-time wake-up-free voice interaction: the driver can interact by voice at any time without waking up the system, and normal in-vehicle conversation and noise will not be misrecognized, thereby further reducing safety risks. The detailed process is as follows:
Step 1: The speech recognition system starts. The microphone begins to collect in-vehicle environment voice information and inputs the collected audio stream to a voice dictation engine, where the voice dictation engine is used to recognize the collected environment voice information. At the same time, the camera is started to perform face recognition, gaze tracking and lip movement recognition.
Step 2: Once the voice dictation engine begins to output text results, they are compared against the face recognition and lip movement recognition results; once the voice dictation engine stops outputting text results, the comparison with the face and lip movement results also stops.
If a face is detected only in the driver's seat region of the picture, it is determined that only the driver is in the vehicle; if the driver also shows lip movement, the process proceeds to Step 3. The driver's seat region is a coordinate region calibrated in advance within the camera's video picture area after the in-vehicle camera is fixed in place; this region corresponds to the location of the driver's head in the picture.
If the driver shows lip movement and faces are also detected in regions other than the driver's seat region, it is determined that there are occupants in the vehicle besides the driver, and the process proceeds to Step 4.
If only faces in the other regions show lip movement, the speech recognition system ignores the text results output by the voice dictation engine and does not respond.
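The branching in Step 2 can be sketched as a small dispatch function. The face-record format and region labels below are assumptions for illustration; the patent only fixes the three outcomes (proceed to Step 3, proceed to Step 4, or ignore the dictation output).

```python
def branch_on_faces(faces, driver_region="driver"):
    """
    faces: list of dicts like {"region": "driver" | "other", "lip_moving": bool},
    one entry per detected face. Returns the Step-2 branch to take.
    """
    driver_faces = [f for f in faces if f["region"] == driver_region]
    other_faces = [f for f in faces if f["region"] != driver_region]
    driver_lip = any(f["lip_moving"] for f in driver_faces)

    if driver_faces and not other_faces and driver_lip:
        return "step3"   # only the driver is present and the driver's lips move
    if driver_lip and other_faces:
        return "step4"   # driver's lips move, but other occupants are present
    # Only passengers speaking, or nobody speaking: discard dictation output.
    return "ignore"
```

A call such as `branch_on_faces([{"region": "driver", "lip_moving": True}])` selects the single-occupant path of Step 3.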
Step 3: The entire dictated text, from the start of text output to the end of text input, is fed into a semantic understanding model for semantic calibration, and whether it matches the wake-up word X is checked.
If the semantic calibration result falls within one of the N service domains supported by the speech recognition system, the driver is considered not to be talking to himself but to be issuing an instruction to the speech recognition system, and the system needs to respond to the instruction.
If the semantic calibration result does not fall within the N service domains supported by the speech recognition system, the driver is considered not to be issuing an instruction to the system, and the system does not respond.
If the semantic calibration result is found to contain the wake-up word X with an exact match, the speech recognition system gives a corresponding response regardless of whether the semantics fall within the service domains supported by the system.
Specifically, when the semantic calibration result contains the wake-up word X and falls within one of the N supported service domains, the system responds directly; when the result contains the wake-up word X but does not belong to the N supported service domains, the system prompts the user with the service domains it supports and guides the user to re-enter the voice instruction.
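The decision table of Step 3 can be sketched as below. The wake word, domain list, and keyword-based "semantic calibration" are all stand-in assumptions; the patent does not specify the semantic understanding model.

```python
WAKE_WORD = "hello xx"  # stand-in for wake-up word X (assumption)
SERVICE_DOMAINS = {"navigation", "music", "climate"}  # the N supported domains (assumption)

def classify_domain(text: str):
    """Toy stand-in for the semantic calibration model: keyword -> domain."""
    keywords = {"navigate": "navigation", "play": "music", "temperature": "climate"}
    for keyword, domain in keywords.items():
        if keyword in text.lower():
            return domain
    return None

def decide_response(text: str) -> str:
    in_scope = classify_domain(text) in SERVICE_DOMAINS
    if WAKE_WORD in text.lower():
        # Wake word present: always react; guide re-entry if out of scope.
        return "respond" if in_scope else "prompt_reenter"
    # No wake word: respond only to in-scope instructions, else treat as chatter.
    return "respond" if in_scope else "no_response"
```

Note how an in-scope utterance is answered even without the wake word, which is the wake-up-free behavior described above.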
Step 4: During the period from when the voice dictation engine begins outputting text to when it finishes, the gaze deflection of the driver is detected. The deflection is defined as (α, β, γ), where α, β and γ are the angles to the three axes of a coordinate system whose x-axis is the longitudinal direction of the vehicle, whose y-axis is the lateral direction and whose z-axis is the vertical direction, with the origin at the center of the driver's head.
If the driver's gaze deflection coordinates are detected to fall within the in-vehicle center console region A, where A = [(α1, β1, γ1), (α2, β2, γ2), (α3, β3, γ3), (α4, β4, γ4)], the driver is considered to be issuing a voice instruction that the system needs to respond to. Here, A is the pre-calibrated region, in angle coordinates, occupied by the speech recognition system carrier such as the center console head unit or a center console voice robot; correspondingly, the four vertices of region A correspond to the four coordinate triples (α1, β1, γ1), (α2, β2, γ2), (α3, β3, γ3) and (α4, β4, γ4), whose values can be set adaptively according to the size of the vehicle and the specific location of region A in the vehicle.
If the driver's gaze is detected to be directed toward the front passenger region B or the rear-row region C, the driver is considered not to be issuing a voice instruction to the in-vehicle speech recognition system, and no response is needed. Here, B = [(α5, β5, γ5), (α6, β6, γ6), (α7, β7, γ7), (α8, β8, γ8)] and C = [(α9, β9, γ9), (α10, β10, γ10), (α11, β11, γ11), (α12, β12, γ12)]; similarly, the four vertices of region B correspond to the coordinate triples (α5, β5, γ5) through (α8, β8, γ8), and the four vertices of region C correspond to (α9, β9, γ9) through (α12, β12, γ12). The coordinate values of regions B and C can be set adaptively according to the size of the vehicle and the specific locations of the front passenger region and the rear-row region in the vehicle.
If the driver's gaze from the driver's seat region is detected to be directed horizontally toward the front region D, the speech recognition system cannot determine from gaze alone whether the driver is issuing a voice instruction to the system, and must decide by recognizing the speech according to Step 3. Here, D = [(α13, β13, γ13), (α14, β14, γ14), (α15, β15, γ15), (α16, β16, γ16)], and the four vertices of region D likewise correspond to the coordinate triples (α13, β13, γ13) through (α16, β16, γ16), whose values can be set adaptively according to actual needs. The embodiment of the present invention does not limit the specific coordinate values of the center console region A, the front passenger region B, the rear-row region C or the front region D.
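The gaze test of Step 4 can be sketched as below. For simplicity the four-vertex regions A-D are approximated here by axis-aligned (α, β, γ) ranges; all numeric values are illustrative assumptions and would in practice be calibrated per vehicle, as the embodiment states.

```python
# Illustrative angle ranges (degrees) standing in for the calibrated
# four-vertex regions A-D; values are assumptions, not from the patent.
REGIONS = {
    "A_center_console":  ((15, 45), (-35, -10), (-35, -5)),
    "B_front_passenger": ((46, 90), (-10, 10), (-10, 10)),
    "C_rear_row":        ((120, 200), (-10, 10), (-10, 10)),
    "D_front":           ((-10, 10), (-10, 10), (-10, 10)),
}

def gaze_region(alpha: float, beta: float, gamma: float) -> str:
    """Return which calibrated region the gaze deflection falls in."""
    for name, ((a_lo, a_hi), (b_lo, b_hi), (g_lo, g_hi)) in REGIONS.items():
        if a_lo <= alpha <= a_hi and b_lo <= beta <= b_hi and g_lo <= gamma <= g_hi:
            return name
    return "unknown"

def gaze_decision(alpha: float, beta: float, gamma: float) -> str:
    region = gaze_region(alpha, beta, gamma)
    if region == "A_center_console":
        return "respond"            # driver is addressing the head unit
    if region in ("B_front_passenger", "C_rear_row"):
        return "no_response"        # driver is talking to another occupant
    return "defer_to_semantics"     # region D or unknown: fall back to Step 3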
It can be seen that when the voice information recognition method provided by the embodiment of the present invention is applied to the automotive field, it can combine visual and auditory information to determine the driver's speaking intention, while effectively filtering out speech in the in-vehicle environment that is not directed at the system. It can thus realize wake-word-free voice interaction similar to conversation between people, substantially improving interaction efficiency and experience and reducing safety risks caused by inefficient interaction.
It should be noted that any combination of the technical features in the above embodiments also falls within the protection scope of the present invention.
Embodiment three
Fig. 3 is a schematic diagram of a voice information recognition device provided by Embodiment 3 of the present invention. As shown in Fig. 3, the device includes an information-to-be-recognized monitoring module 310 and a voice instruction information response module 320, in which:
the information-to-be-recognized monitoring module 310 is used to continuously monitor and recognize the information to be recognized in a set environment region, where the information to be recognized includes environment voice information, user face information, user gaze information and user lip movement information; and
the voice instruction information response module 320 is used to respond to voice instruction information if it is determined, according to the user face information and the user lip movement information, or according to the user face information, the user gaze information and the user lip movement information, that the environment voice information includes the voice instruction information issued by a target user.
The embodiment of the present invention continuously monitors and recognizes the environment voice information, user face information, user gaze information and user lip movement information in a set environment region, and responds to voice instruction information when it determines, according to the user face information, user gaze information and user lip movement information, that the environment voice information includes voice instruction information issued by a target user. This solves the problem of inefficient voice interaction in existing speech recognition systems and improves voice interaction efficiency.
Optionally, the information-to-be-recognized monitoring module 310 includes: an information monitoring unit, used to continuously monitor the environment voice information, the user face information, the user gaze information and the user lip movement information, and to recognize the environment voice information; and an information recognition unit, used to recognize the user face information, the user gaze information and the user lip movement information if it is determined that the environment voice information in the set environment region includes user voice information. The voice instruction information response module 320 includes: a voice recognition unit, used to perform speech recognition on the user voice information of the target user if it is determined, according to the recognition results of the user face information and the user lip movement information, that the target user is present in a specific environment region within the set environment region; and an instruction information determination unit, used to determine according to the speech recognition result whether the environment voice information includes voice instruction information issued by the target user.
Optionally, the voice recognition unit is specifically used to determine that only the target user is present in the specific environment region when it is determined that only the user face information and user lip movement information of the target user are included in the specific environment region; and the instruction information determination unit is specifically used to determine that the environment voice information includes the voice instruction information issued by the target user if the speech recognition result matches a preset voice instruction set.
Optionally, the voice recognition unit is specifically used to determine that the target user is present in the specific environment region if it is determined that the set environment region includes the user face information of at least two users and the specific environment region includes the user face information and user lip movement information of the target user; and the instruction information determination unit is specifically used to determine that the environment voice information includes the voice instruction information issued by the target user if the user gaze information of the target user matches a first preset gaze region.
Optionally, the device further includes an instruction information determining module, used to: determine that the environment voice information does not include voice instruction information issued by the target user if the user gaze information of the target user matches a second preset gaze region; or, if the user gaze information of the target user matches a third preset gaze region, perform speech recognition on the user voice information of the target user and determine according to the speech recognition result whether the environment voice information includes voice instruction information issued by the target user.
Optionally, the device further includes a voice information recognition module, used to refrain from performing speech recognition on the user voice information when it is determined that user face information and user lip movement information are included only in a nonspecific environment region within the set environment region.
Optionally, the preset voice instruction set includes a preset service voice instruction set and a preset wake-up voice instruction set. The voice instruction information response module 320 is specifically used to: determine that the environment voice information includes the voice instruction information issued by the target user and respond to the voice instruction information directly if the speech recognition result matches the preset service voice instruction set; determine that the environment voice information includes the voice instruction information issued by the target user if the speech recognition result matches the preset wake-up voice instruction set, and respond to the voice instruction information directly when it is determined that the voice instruction information matches the preset service voice instruction set; and otherwise, prompt the target user to re-enter voice instruction information according to the preset service voice instruction set.
Optionally, the set environment region is an in-vehicle environment region.
The above voice information recognition device can execute the voice information recognition method provided by any embodiment of the present invention, and has the corresponding functional modules and beneficial effects of the executed method. For technical details not described in detail in this embodiment, reference can be made to the voice information recognition method provided by any embodiment of the present invention.
Since the voice information recognition device introduced above is a device that can execute the voice information recognition method in the embodiments of the present invention, those skilled in the art can, based on the described method, understand the specific implementation of the device of this embodiment and its various variations; therefore, how the device realizes the method is not discussed in detail here. Any device used by those skilled in the art to implement the voice information recognition method in the embodiments of the present invention falls within the scope of protection of this application.
Embodiment four
Fig. 4 is a structural schematic diagram of a terminal device provided by Embodiment 4 of the present invention. Fig. 4 shows a block diagram of a terminal device 412 suitable for implementing the embodiments of the present invention. The terminal device 412 shown in Fig. 4 is only an example and should not impose any restriction on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 4, the terminal device 412 takes the form of a general-purpose computing device. The components of the terminal device 412 may include, but are not limited to: one or more processors 416, a storage device 428, and a bus 418 connecting the different system components (including the storage device 428 and the processors 416).
The bus 418 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The terminal device 412 typically comprises a variety of computer-system-readable media. These media can be any usable media that can be accessed by the terminal device 412, including volatile and non-volatile media, and removable and non-removable media.
The storage device 428 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 430 and/or cache memory 432. The terminal device 412 can further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 434 can be used for reading from and writing to non-removable, non-volatile magnetic media (not shown in Fig. 4, commonly referred to as a "hard disk drive"). Although not shown in Fig. 4, a disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc (DVD-ROM) or other optical media), can be provided. In these cases, each drive can be connected to the bus 418 through one or more data media interfaces. The storage device 428 may include at least one program product having a set of (for example, at least one) program modules that are configured to carry out the functions of the embodiments of the present invention.
A program 436 having a set of (at least one) program modules 426 may be stored, for example, in the storage device 428. Such program modules 426 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 426 generally carry out the functions and/or methods in the embodiments described in the present invention.
The terminal device 412 can also communicate with one or more external devices 414 (such as a keyboard, a pointing device, a camera, a display 424, etc.), with one or more devices that enable a user to interact with the terminal device 412, and/or with any device (such as a network card, a modem, etc.) that enables the terminal device 412 to communicate with one or more other computing devices. Such communication can be carried out through an input/output (I/O) interface 422. Furthermore, the terminal device 412 can also communicate through a network adapter 420 with one or more networks, such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet. As shown, the network adapter 420 communicates with the other modules of the terminal device 412 through the bus 418. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the terminal device 412, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, data backup storage systems, etc.
The processor 416 executes various functional applications and data processing by running the programs stored in the storage device 428, for example realizing the voice information recognition method provided by the above embodiments of the present invention.
That is, when executing the program, the processing unit realizes: continuously monitoring and recognizing the information to be recognized in a set environment region, where the information to be recognized includes environment voice information, user face information, user gaze information and user lip movement information; and, if it is determined according to the user face information and the user lip movement information, or according to the user face information, the user gaze information and the user lip movement information, that the environment voice information includes voice instruction information issued by a target user, responding to the voice instruction information.
Embodiment five
Embodiment 5 of the present invention also provides a computer storage medium storing a computer program which, when executed by a computer processor, is used to execute any voice information recognition method of the above embodiments of the present invention: continuously monitoring and recognizing the information to be recognized in a set environment region, where the information to be recognized includes environment voice information, user face information, user gaze information and user lip movement information; and, if it is determined according to the user face information and the user lip movement information, or according to the user face information, the user gaze information and the user lip movement information, that the environment voice information includes voice instruction information issued by a target user, responding to the voice instruction information.
The computer storage medium of the embodiment of the present invention can adopt any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal can take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium; such a medium can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.
The program code contained on a computer-readable medium can be transmitted with any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention can be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In scenarios involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the invention is not limited to the specific embodiments described herein; various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to the above embodiments and may include more other equivalent embodiments without departing from the inventive concept, the scope of the invention being determined by the scope of the appended claims.
Claims (13)
1. A voice information recognition method, characterized by comprising:
continuously monitoring and recognizing information to be recognized in a set environment region; wherein the information to be recognized includes environment voice information, user face information, user gaze information and user lip movement information;
if it is determined, according to the user face information and the user lip movement information, or according to the user face information, the user gaze information and the user lip movement information, that the environment voice information includes voice instruction information issued by a target user, responding to the voice instruction information.
2. The method according to claim 1, characterized in that continuously monitoring and recognizing the information to be recognized in the set environment region comprises:
continuously monitoring the environment voice information, the user face information, the user gaze information and the user lip movement information, and recognizing the environment voice information;
if it is determined that the environment voice information in the set environment region includes user voice information, recognizing the user face information, the user gaze information and the user lip movement information;
determining, according to the user face information and the user lip movement information, that the environment voice information includes voice instruction information issued by the target user comprises:
if it is determined, according to the recognition results of the user face information and the user lip movement information, that the target user is present in a specific environment region within the set environment region, performing speech recognition on the user voice information of the target user;
determining, according to the speech recognition result, whether the environment voice information includes voice instruction information issued by the target user.
3. The method according to claim 2, characterized in that determining, according to the recognition results of the user face information and the user lip movement information, that the target user is present in the specific environment region within the set environment region comprises:
when it is determined that only the user face information and user lip movement information of the target user are included in the specific environment region, determining that only the target user is present in the specific environment region;
determining, according to the speech recognition result, whether the environment voice information includes voice instruction information issued by the target user comprises:
if the speech recognition result matches a preset voice instruction set, determining that the environment voice information includes the voice instruction information issued by the target user.
4. The method according to claim 2, characterized in that determining, according to the recognition results of the user face information and the user lip movement information, that the target user is present in the specific environment region within the set environment region comprises:
if it is determined that the set environment region includes the user face information of at least two users, and the specific environment region includes the user face information and user lip movement information of the target user, determining that the target user is present in the specific environment region;
determining, according to the user face information, the user gaze information and the user lip movement information, that the environment voice information includes voice instruction information issued by the target user comprises:
if the user gaze information of the target user matches a first preset gaze region, determining that the environment voice information includes the voice instruction information issued by the target user.
5. The method according to claim 4, characterized in that the method further comprises:
if the user gaze information of the target user matches a second preset gaze region, determining that the environment voice information does not include voice instruction information issued by the target user; or
if the user gaze information of the target user matches a third preset gaze region, performing speech recognition on the user voice information of the target user, and determining according to the speech recognition result whether the environment voice information includes voice instruction information issued by the target user.
6. The method according to claim 2, characterized in that the method further comprises:
when it is determined that user face information and user lip movement information are included only in a nonspecific environment region within the set environment region, refraining from performing speech recognition on the user voice information.
7. The method according to any one of claims 2-6, wherein the preset voice instruction set includes a preset service
voice instruction set and a preset wake-up voice instruction set;
the responding to the voice instruction information comprises:
if the speech recognition result matches the preset service voice instruction set, determining that the environmental voice information
includes the voice instruction information issued by the target user, and directly responding to the voice instruction information;
if the speech recognition result matches the preset wake-up voice instruction set, determining that the environmental voice information
includes the voice instruction information issued by the target user, and, upon determining that the voice instruction information matches the preset service voice
instruction set, directly responding to the voice instruction information; otherwise, prompting the target user to re-enter
voice instruction information according to the preset service voice instruction set.
8. A voice information recognition apparatus, comprising:
an information-to-be-recognized monitoring module, configured to continuously monitor and recognize information to be recognized in a preset environment region, wherein the
information to be recognized includes environmental voice information, user face information, user gaze information, and user lip movement information;
a voice instruction information response module, configured to respond to voice instruction information if it is determined, according to the user face information and the user lip
movement information, or according to the user face information, the user gaze information, and the user lip movement information, that the environmental voice information
includes the voice instruction information issued by a target user.
9. The apparatus according to claim 8, wherein the information-to-be-recognized monitoring module comprises:
an information monitoring unit, configured to continuously monitor the environmental voice information, the user face information, the user gaze
information, and the user lip movement information, and to recognize the environmental voice information;
an information recognition unit, configured to recognize the user face information, the user gaze information, and the user lip movement information
if it is determined that the environmental voice information in the preset environment region includes user voice information;
the voice instruction information response module comprises:
a speech recognition unit, configured to perform speech recognition on the user voice information of the target user if it is
determined, according to a recognition result of the user face information and the user lip movement information, that the target user is present
in a specific environment region within the preset environment region;
an instruction information determination unit, configured to determine, according to a speech recognition result, whether the environmental voice information includes the
voice instruction information issued by the target user.
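Claim 9 arranges the units as a two-stage gate: audio monitoring and visual recognition decide whether speech recognition runs at all, and the transcript then decides whether an instruction was issued. A minimal sketch under that reading (all callables are hypothetical stand-ins for the claimed units, not the patented code):

```python
def process_frame(env_audio, detect_voice, recognize_visual, asr, in_instruction_set):
    """Return the recognized instruction text, or None if no instruction
    from the target user is present in this frame of ambient audio."""
    if not detect_voice(env_audio):          # information monitoring unit
        return None
    visual = recognize_visual(env_audio)     # face, gaze, lip-movement recognition
    if not visual["target_in_specific_region"]:
        return None                          # gate: skip ASR entirely
    text = asr(env_audio)                    # speech recognition unit
    # instruction information determination unit
    return text if in_instruction_set(text) else None
```

Gating the ASR call on the visual check mirrors the claim's intent of not transcribing ambient speech that cannot be a command from the target user.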
10. The apparatus according to claim 9, wherein the speech recognition unit is specifically configured to:
when it is determined that only the user face information and the user lip movement information of the target user are present in the specific environment region,
determine that only the target user is present in the specific environment region;
the instruction information determination unit is specifically configured to:
if the speech recognition result matches a preset voice instruction set, determine that the environmental voice information includes the
voice instruction information issued by the target user.
11. The apparatus according to claim 9, wherein the speech recognition unit is specifically configured to:
if it is determined that the preset environment region includes user face information of at least two users, and the specific environment
region includes the user face information and the user lip movement information of the target user, determine that the target user is present in the
specific environment region;
the instruction information determination unit is specifically configured to:
if the user gaze information of the target user matches a first preset gaze region, determine that the environmental voice
information includes the voice instruction information issued by the target user.
12. The apparatus according to claim 11, wherein the apparatus further comprises:
an instruction information determining module, configured to: if the user gaze information of the target user matches a second preset gaze region,
determine that the environmental voice information does not include voice instruction information issued by the target user; or
if the user gaze information of the target user matches a third preset gaze region, perform speech recognition on the
user voice information of the target user, and determine, according to a speech recognition result, whether the environmental voice information includes voice
instruction information issued by the target user.
13. A terminal device, wherein the device comprises:
one or more processors; and
a storage device, configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors
implement the voice information identification method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910707528.5A CN110428838A (en) | 2019-08-01 | 2019-08-01 | A kind of voice information identification method, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110428838A true CN110428838A (en) | 2019-11-08 |
Family
ID=68412064
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910707528.5A Pending CN110428838A (en) | 2019-08-01 | 2019-08-01 | A kind of voice information identification method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110428838A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113442941A (en) * | 2020-12-04 | 2021-09-28 | 安波福电子(苏州)有限公司 | Man-vehicle interaction system |
CN114348000A (en) * | 2022-02-15 | 2022-04-15 | 安波福电子(苏州)有限公司 | Driver attention management system and method |
CN114694349A (en) * | 2020-12-30 | 2022-07-01 | 上海博泰悦臻网络技术服务有限公司 | Interaction method and interaction system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002244841A (en) * | 2001-02-21 | 2002-08-30 | Japan Science & Technology Corp | Voice indication system and voice indication program |
US20130054240A1 (en) * | 2011-08-25 | 2013-02-28 | Samsung Electronics Co., Ltd. | Apparatus and method for recognizing voice by using lip image |
CN104011735A (en) * | 2011-12-26 | 2014-08-27 | 英特尔公司 | Vehicle Based Determination Of Occupant Audio And Visual Input |
JP2015071320A (en) * | 2013-10-01 | 2015-04-16 | アルパイン株式会社 | Conversation support device, conversation support method, and conversation support program |
CN106875941A (en) * | 2017-04-01 | 2017-06-20 | 彭楚奥 | A kind of voice method for recognizing semantics of service robot |
CN109166575A (en) * | 2018-07-27 | 2019-01-08 | 百度在线网络技术(北京)有限公司 | Exchange method, device, smart machine and the storage medium of smart machine |
CN109410939A (en) * | 2018-11-29 | 2019-03-01 | 中国人民解放军91977部队 | General data maintaining method based on phonetic order collection |
CN109949812A (en) * | 2019-04-26 | 2019-06-28 | 百度在线网络技术(北京)有限公司 | A kind of voice interactive method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105501121A (en) | Intelligent awakening method and system | |
CN107516526B (en) | Sound source tracking and positioning method, device, equipment and computer readable storage medium | |
CN110428838A (en) | A kind of voice information identification method, device and equipment | |
CN105204628A (en) | Voice control method based on visual awakening | |
JP2022095768A (en) | Method, device, apparatus, and medium for dialogues for intelligent cabin | |
CN112397065A (en) | Voice interaction method and device, computer readable storage medium and electronic equipment | |
CN113486760A (en) | Object speaking detection method and device, electronic equipment and storage medium | |
CN112017650B (en) | Voice control method and device of electronic equipment, computer equipment and storage medium | |
US20230048330A1 (en) | In-Vehicle Speech Interaction Method and Device | |
US11861265B2 (en) | Providing audio information with a digital assistant | |
CN114187637A (en) | Vehicle control method, device, electronic device and storage medium | |
CN109215646A (en) | Voice interaction processing method, device, computer equipment and storage medium | |
WO2023273063A1 (en) | Passenger speaking detection method and apparatus, and electronic device and storage medium | |
CN109032345A (en) | Apparatus control method, device, equipment, server-side and storage medium | |
CN112083795A (en) | Object control method and device, storage medium and electronic equipment | |
CN111370004A (en) | Man-machine interaction method, voice processing method and equipment | |
CN111142655A (en) | Interaction method, terminal and computer readable storage medium | |
WO2023231211A1 (en) | Voice recognition method and apparatus, electronic device, storage medium, and product | |
CN109243457B (en) | Voice-based control method, device, equipment and storage medium | |
CN117789710A (en) | Voice interaction method, device and equipment for vehicle and vehicle | |
CN114598963A (en) | Voice processing method and device, computer readable storage medium and electronic equipment | |
CN115171692A (en) | Voice interaction method and device | |
CN114760417A (en) | Image shooting method and device, electronic equipment and storage medium | |
CN112017651B (en) | Voice control method and device of electronic equipment, computer equipment and storage medium | |
CN112951216B (en) | Vehicle-mounted voice processing method and vehicle-mounted information entertainment system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191108 ||