CN109941231A - Vehicle-mounted terminal device, vehicle-mounted interaction system and interaction method - Google Patents
- Publication number
- CN109941231A (application number CN201910130763.0A)
- Authority
- CN
- China
- Prior art keywords
- voice
- unit
- processing unit
- input signal
- signal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- User Interface Of Digital Computer (AREA)
Abstract
Embodiments of the invention disclose a vehicle-mounted terminal device, a vehicle-mounted interaction system and an interaction method. The vehicle-mounted terminal device includes a voice input unit, an image input unit, a voice output unit, a display unit, a processing unit and a communication unit. The units cooperate to perform intelligent speech recognition, image recognition based on machine vision, and multi-modal information interaction processing, thereby achieving accurate recognition of input information, accurate semantic understanding and personalized output, and improving the user's human-machine interaction experience.
Description
Technical field
The present invention relates to the field of the Internet of Vehicles, and more particularly to an intelligent vehicle-mounted human-machine interaction system and interaction method.
Background art
In modern society, vehicles increasingly permeate people's life, study and work, and the automobile, as an important kind of vehicle, is becoming a part of daily life. With the rapid development of Internet of Vehicles technology, connected and intelligent automobiles have become possible. An important component of Internet of Vehicles technology is the vehicle-mounted interaction system. As an important bridge between people and the automobile, it plays an irreplaceable role in the safety, comfort, performance and user experience of the automobile. The emergence of artificial intelligence technology has injected new vitality into the vehicle-mounted interaction system and started a new generation of intelligent revolution for it. The degree of intelligence of the vehicle-mounted interaction system plays a key role in improving user experience and comfort.
A mainstream interaction mode in the prior art implements common functions through a combination of mechanical buttons and a touch screen. Because mechanical buttons are cumbersome, their number and frequency of use are declining markedly, and as in-vehicle touch screens grow larger the buttons are gradually being replaced entirely by touch screens. Touch screen interaction usually requires the driver first to look at the screen and locate the touch position, and then to check the result after touching; otherwise no result feedback is obtained. Looking at the screen while the automobile is moving distracts the driver and reduces driving safety. In addition, the touch screen is generally located in the front row of the automobile, so only the driver and the front passenger can operate it; rear passengers have no interaction entry, which harms the user experience.
Another emerging interaction mode is voice interaction, in which voice is used as the input, the input is processed by speech recognition technology, and voice broadcast is used as the output. However, voice interaction places high demands on the in-vehicle environment. Various forms of in-vehicle noise, such as wind noise, tire noise, engine noise and interference from the vehicle audio system, can greatly affect the accuracy of speech recognition. For aftermarket (non-factory-installed) voice interaction systems in particular, a good voice interaction experience is difficult to obtain.
In addition, some vehicle-mounted interaction systems also provide gesture interaction and lip-reading interaction as operation entries. However, the content that gestures can express is very limited, so their utilization is low, and lip-reading recognition alone can hardly achieve a high recognition rate and often has to be combined with other recognition means such as speech recognition, so it is currently little used. Therefore, there is a need in the prior art for a vehicle-mounted interaction system and interaction method capable of achieving accurate recognition of input information, accurate semantic understanding and personalized output.
Summary of the invention
Embodiments of the present invention provide a vehicle-mounted terminal device, a vehicle-mounted interaction system and an interaction method, which can achieve accurate recognition of input information, accurate semantic understanding and personalized output, and improve the user's human-machine interaction experience.
In one aspect, an embodiment of the present invention provides a vehicle-mounted terminal device, comprising: a voice input unit for acquiring a voice input signal; an image input unit for acquiring an image input signal, the image input signal including one or more of a face image signal, an expression image signal, a lip image signal and a pupil image signal; a voice output unit for generating a sound output signal; a display unit for displaying interactive information; a processing unit for controlling the voice input unit, the image input unit, the voice output unit and the display unit, and for processing the voice input signal and the image input signal, wherein the processing unit includes a machine learning model establishing unit that can establish a machine learning model for one or more of the face image signal, the expression image signal, the lip image signal and the pupil image signal; and a communication unit for connecting to a cloud service device. The solution provided by this embodiment enables intelligent speech recognition, image recognition based on machine vision, and multi-modal information interaction processing, thereby achieving accurate recognition of input information, accurate semantic understanding and personalized output, and improving the user's human-machine interaction experience.
In a possible design, the voice input unit is further configured to remove or reduce noise; this is one of the inventive points of the present invention.
In a possible design, the vehicle-mounted terminal device further includes a sound-emitting unit for emitting the sound output signal.
In another aspect, an embodiment of the present invention provides a vehicle-mounted interaction system, which includes a cloud service device and the vehicle-mounted terminal device described in the above aspect.
In yet another aspect, an embodiment of the present invention provides an identity recognition method based on the vehicle-mounted terminal device described in the above aspect. The method comprises: the voice input unit and the image input unit respectively acquire the voice input signal and the image input signal; the processing unit extracts facial features from the image input signal; the processing unit performs face recognition and matching according to the facial features, and determines a user identity and identity feature information associated with the user identity, wherein the identity feature information includes voiceprint information; the processing unit extracts a voiceprint feature from the voice input signal; and the processing unit compares the voiceprint feature with the voiceprint information and verifies the user identity through the comparison. With the solution provided by this embodiment, image-based face recognition and voice-based voiceprint recognition are combined to verify the user identity, meeting the identity recognition accuracy required in high-security scenarios.
In a possible design, the facial features in the image input signal are extracted by machine learning and/or a neural network, so as to improve the efficiency and accuracy of image recognition; this is one of the inventive points of the present invention.
In a possible design, the voiceprint feature in the voice input signal is extracted by machine learning and/or a neural network, so as to improve the efficiency and accuracy of voiceprint recognition; this is one of the inventive points of the present invention.
In a possible design, the identity feature information includes one or more of the gender, age, personality and hobbies of the user; this is one of the inventive points of the present invention.
In a possible design, the identity feature information includes biometric information of the user; this is one of the inventive points of the present invention.
In a possible design, the identity feature information includes voiceprint information of the user; this is one of the inventive points of the present invention.
In yet another aspect, an embodiment of the present invention provides an in-vehicle positioning method based on the vehicle-mounted terminal device described in the above aspect. The method comprises: the image input unit acquires the image input signal; the processing unit extracts the lip motion of the user from the image input signal; and the processing unit determines the position region of the user in the vehicle according to the lip motion and the mapping relationship between in-vehicle position regions and the viewing-angle range of the image input unit. With the solution provided by this embodiment, image-based estimation of the in-vehicle position region can be achieved, improving positioning accuracy.
In a possible design, the method further comprises: the voice input unit acquires the voice input signal; and the processing unit performs sound source localization according to the voice input signal to determine the position of the user in the vehicle. Image-based face recognition and voice-based sound source localization are thereby combined to realize a method for locating the interacting occupant, improving the accuracy and efficiency of positioning; this is one of the inventive points of the present invention.
In a possible design, the processing unit performs sound source localization using TDOA, beamforming or high-resolution spectral estimation; this is one of the inventive points of the present invention.
In yet another aspect, an embodiment of the present invention provides a speech recognition method based on the vehicle-mounted terminal device described in the above aspect. The method comprises: the voice input unit and the image input unit respectively acquire the voice input signal and the image input signal; the processing unit performs lip-reading recognition and expression recognition according to the image input signal; the processing unit performs speech recognition according to the voice input signal; and the processing unit performs weighted synthesis of the results of the lip-reading recognition, the expression recognition and the speech recognition to generate output text. With the solution provided by this embodiment, speech recognition is combined with lip-reading recognition and expression recognition, improving the accuracy and personalization of speech recognition and improving the user experience.
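As an illustration of the weighted synthesis step, the following is a minimal sketch rather than the method of the embodiment. It assumes that each recognizer reports a confidence between 0 and 1 for every candidate text, and that the expression recognizer contributes a compatibility score for each candidate; the function name, the weights and the candidate texts are all hypothetical.

```python
def fuse_recognition_results(results, weights):
    """Weighted synthesis of per-recognizer confidence scores.

    results: dict mapping recognizer name -> {candidate text: confidence in [0, 1]}
    weights: dict mapping recognizer name -> non-negative weight (hypothetical values)
    Returns (best candidate text, dict of fused scores).
    """
    total_weight = sum(weights[name] for name in results)
    fused = {}
    for name, scores in results.items():
        share = weights[name] / total_weight  # normalise so the shares sum to 1
        for text, confidence in scores.items():
            fused[text] = fused.get(text, 0.0) + share * confidence
    best_text = max(fused, key=fused.get)
    return best_text, fused


# Hypothetical example: speech recognition alone slightly prefers the wrong text
# in a noisy cabin, and lip-reading recognition corrects it.
example_results = {
    "speech": {"open the window": 0.45, "open the windshield": 0.55},
    "lip": {"open the window": 0.80, "open the windshield": 0.20},
    "expression": {"open the window": 0.50, "open the windshield": 0.50},
}
example_weights = {"speech": 0.5, "lip": 0.3, "expression": 0.2}
```

With these assumed inputs the fused score favours "open the window", although speech recognition alone ranked it second. In practice the weights would need to be tuned, for example by lowering the speech weight when cabin noise is high.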
In a possible design, the voice input unit performs noise reduction on the voice input signal, so as to ensure the accuracy of the received voice input signal; this is one of the inventive points of the present invention.
In a possible design, the voice input unit performs noise reduction on the voice input signal by spectrum screening or a noise filter; this is one of the inventive points of the present invention.
In a possible design, the voice input unit performs noise reduction on the voice input signal by an artificial intelligence algorithm, machine learning and/or a neural network; this is one of the inventive points of the present invention.
In a possible design, the voice input unit performs noise reduction on the voice input signal according to the positioning result of the above in-vehicle positioning method; this is one of the inventive points of the present invention.
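The text does not say how the positioning result is used for noise reduction. One common technique that fits the description is delay-and-sum beamforming, sketched below as an assumption rather than as the method of the embodiment. The sketch assumes a microphone array whose per-microphone arrival delays (in whole samples) for the located speaker are already known from the positioning step; the function name is hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel on the located speaker and average.

    channels: list of equal-length sample lists, one per microphone
    delays:   arrival delay of the speaker's voice at each microphone,
              in whole samples (non-negative integers), assumed to come
              from the positioning result
    Returns the aligned average; sound from the located position adds
    coherently, while sound from other directions is attenuated.
    """
    usable = len(channels[0]) - max(delays)
    count = len(channels)
    return [
        sum(channel[n + delay] for channel, delay in zip(channels, delays)) / count
        for n in range(usable)
    ]
```

The output is shorter than the input by the largest delay, because samples without a counterpart on every channel are dropped.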
In yet another aspect, an embodiment of the present invention provides a feedback generation method based on the vehicle-mounted terminal device described in the above aspect. The method comprises: the voice input unit and the image input unit respectively acquire the voice input signal and the image input signal; the processing unit determines a user identity and identity feature information associated with the user identity according to the voice input signal and the image input signal; the processing unit performs expression recognition according to the image input signal; the processing unit performs speech recognition according to the voice input signal and the image input signal; the processing unit performs semantic understanding according to the identity feature information, the expression recognition result and the speech recognition result; and the processing unit generates a feedback result according to the result of the semantic understanding, the identity feature information and the position of the user in the vehicle. With the solution provided by this embodiment, multi-dimensional semantic understanding can be achieved based on the user's identity feature information, expression recognition result and speech recognition result, so that a personalized feedback result can be generated, significantly improving the user experience.
In a possible design, the processing unit determines the user identity and the identity feature information using the above identity recognition method; this is one of the inventive points of the present invention.
In a possible design, the processing unit recognizes the expression of the user using the above speech recognition method; this is one of the inventive points of the present invention.
In a possible design, the processing unit recognizes the voice input signal of the user according to the above speech recognition method; this is one of the inventive points of the present invention.
In a possible design, the feedback result is voice feedback or display feedback; this is one of the inventive points of the present invention.
In a possible design, the display unit is configured to output the display feedback in text and/or image form; this is one of the inventive points of the present invention.
In a possible design, the voice output unit is configured to output the voice feedback as a simulated human voice or a machine sound; this is one of the inventive points of the present invention.
According to the technical solutions provided by the embodiments of the present invention, a multi-modal vehicle-mounted interaction system achieves accurate recognition of input information, accurate semantic understanding and personalized output, improving the user's human-machine interaction experience. One of the many innovations of the present invention is integrating a system with the above multiple input signals and output units into multiple in-vehicle applications, providing occupants with a convenient riding experience.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below.
Fig. 1 is a schematic diagram of a vehicle-mounted interaction system provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of an identity recognition method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of an identity recognition method provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of an in-vehicle positioning method provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of a speech recognition method provided by an embodiment of the present invention;
Fig. 6 is a flowchart of a speech recognition method provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of a feedback generation method provided by an embodiment of the present invention;
Fig. 8 is a flowchart of a feedback generation method provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention.
The solution proposed by the embodiments of the present invention is based on the vehicle-mounted interaction system 100 shown in Fig. 1. The vehicle-mounted interaction system 100 may be installed in the front-row area or in the rear-row area of the automobile; the embodiments of the present invention impose no specific limitation. The vehicle-mounted interaction system 100 includes a vehicle-mounted terminal device 200 and a cloud service device 300. Specifically, the vehicle-mounted terminal device 200 includes a voice input unit 201, an image input unit 202, a voice output unit 203, a display unit 204, a processing unit 205 and a communication unit 206.
The voice input unit 201 may be used to acquire a voice input signal, the voice input signal being a voice instruction issued by an in-vehicle user. The voice input unit 201 may be a microphone or a microphone array. The microphone may be installed in the front-row area of the automobile interior to better receive the driver's instructions; it may also be installed in the rear-row area so that it can receive voice signals from rear passengers, improving their interaction experience. The microphone array may be distributed throughout the interior space of the automobile so that it can collect voice signals from every angle of the interior, improving the richness and comprehensiveness of sound signal acquisition. The voice input unit 201 may also be an audio acquisition device or sound pickup with higher acquisition precision.
Optionally, the voice input unit 201 has a noise reduction function. The noise reduction function can reduce or remove environmental noise and can also remove undesired interfering noise. For example, when the driver sends a voice instruction to the vehicle-mounted terminal device 200, background noise from the conversation of other occupants is reduced or removed.
Optionally, the voice input unit 201 removes the influence of noise signals from the spectrum of the signal through spectrum screening. The voice input unit 201 may also weaken the influence of noise signals through a noise filter. The voice input unit 201 may also use an artificial intelligence method, performing model training through machine learning and neural networks to obtain a noise reduction model, thereby removing the noise components from the voice signal.
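Spectrum screening is not specified further in the text. A simple form of it is spectral gating, sketched below under stated assumptions: the frame is transformed to the frequency domain, every bin whose normalised magnitude falls below an assumed noise floor is zeroed, and the frame is transformed back. The function names and the noise floor value are hypothetical, and the naive transform is used only to keep the sketch dependency-free; a real system would use an FFT library.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (O(n^2)), adequate for short frames."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse transform, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_gate(frame, noise_floor):
    """Zero every frequency bin whose magnitude, divided by the frame length,
    is below noise_floor, then return the time-domain result."""
    n = len(frame)
    gated = [c if abs(c) / n >= noise_floor else 0j for c in dft(frame)]
    return inverse_dft(gated)
```

A fixed noise floor is the crudest choice; estimating it per frequency bin from frames that contain no speech would be a natural refinement.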
The image input unit 202 is used to acquire an image input signal. The image input unit 202 may be a device with an image or video acquisition function, such as a camera or video camera, for example a color camera, a black-and-white camera, an infrared camera, a 3D camera, or any combination thereof. The image signals acquired by the image input unit 202 include, but are not limited to, the following:
Face image, including the features and expression of the occupant's face;
Expression image, including the facial actions of the user when expressing various emotions;
Lip image, including the motion characteristics of the speaker's lips, such as the opening and closing of the lips and the mouth shape;
Pupil image, including pupil-related actions, such as pupil contraction and focus position.
Optionally, the image input unit 202 also has a fill-light function for enhancing the brightness and clarity of the acquired image when ambient brightness is low.
The voice output unit 203 is used to generate a sound output signal. The sound output signal may be a simulated human voice or a machine sound, for example an alarm sound or music. The voice output unit 203 may be a loudspeaker. The voice output unit 203 can output sound signals in two modes. In the first, the voice output unit 203 is the loudspeaker of the in-vehicle audio system, and the sound output signal is delivered to occupants through that loudspeaker. In the second, the voice output unit 203 is an independent sound-emitting unit separate from the in-vehicle audio system, and the sound output signal is delivered to occupants through the independent sound-emitting unit.
The display unit 204 is used to display interactive information. The interactive information includes, but is not limited to, necessary information during human-machine interaction, the user's query information, and the answer information and expression information of question-and-answer exchanges in human-machine interaction. The display unit 204 may be a common terminal display device such as an LCD screen, an LED screen, an OLED screen or a touch screen; these are not enumerated one by one here.
The processing unit 205 is used to control the voice input unit 201, the image input unit 202, the voice output unit 203 and the display unit 204, to process the voice input signal and the image input signal, and to generate the sound signal.
It can be understood that the processing unit 205 may be a processor. The processor may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules and circuits described in connection with the disclosure of the present invention. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The communication unit 206 is used to connect to the cloud service device 300 and to exchange information between the vehicle-mounted terminal device 200 and the cloud service device 300, thereby completing cooperative processing between the local device and the network.
It should be noted that the numbers of voice input units 201 and image input units 202 included in the vehicle-mounted interaction system 100 shown in Fig. 1 are only an example, and the embodiments of the present invention are not limited thereto. For example, more voice input units 201 and image input units 202 may be included according to the needs of signal acquisition; for brevity, they are not depicted one by one in the drawings.
Optionally, the vehicle-mounted interaction system 100 further includes a sensor, which is combined with the vehicle-mounted interaction system to realize algorithm fusion. The sensor may include, for example, an inertial sensor or a speed sensor, to realize positioning and tracking of a signal or information source.
In this embodiment, the above vehicle-mounted interaction system 100 enables intelligent speech recognition, image recognition based on machine vision, and multi-modal information interaction processing, thereby achieving accurate recognition of input information, accurate semantic understanding and personalized output, and improving the user's human-machine interaction experience.
An important technology for vehicle intelligence is identity recognition, that is, acquiring and confirming the user identity so as to authorize the use of corresponding functions. For example, vehicle start authorization, in-vehicle payment authorization and identity-based personalized interaction information all require confirmation of the user's identity as a precondition. Fig. 2 and Fig. 3 respectively show a schematic diagram and a flowchart of an identity recognition method based on the vehicle-mounted interaction system 100 provided by an embodiment of the present invention. The method provided by this embodiment is described in detail below with reference to Fig. 2 and Fig. 3.
S21, the voice input unit 201 and the image input unit 202 respectively acquire a voice input signal and an image input signal.
Specifically, the voice input unit 201 acquires the voice of an occupant, and the image input unit 202 acquires the face image of the occupant.
S22, the processing unit 205 extracts the facial features from the image input signal.
Specifically, the processing unit 205 performs image recognition processing on the image input signal, locates and determines the face contour in the image input signal through feature extraction and edge detection, optimizes and enhances the face image within the face contour, performs pattern recognition according to a face model, and extracts facial feature values. The face model is established based on machine learning, so its accuracy improves continuously as the amount of machine learning increases.
S23, the processing unit 205 performs face recognition and matching according to the facial features, and determines the user identity and the identity feature information associated with the user identity.
It can be understood that the processing unit 205 searches the facial information stored in the vehicle-mounted interaction system for facial information that matches the extracted facial feature values, and performs user identity matching according to the mapping relationship between the facial information and user identities, thereby determining the identity of the user; that is, face recognition and matching succeed.
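The lookup in S23 can be pictured as a nearest-template search. The following is a minimal sketch under assumptions not stated in the text: facial feature values are fixed-length numeric vectors, similarity is measured by Euclidean distance, and a match is accepted only below a distance threshold. The function name, the threshold and the stored templates are hypothetical.

```python
import math


def match_face(feature, enrolled, max_distance=0.6):
    """Return the user identity whose stored facial template is nearest to
    `feature`, or None if even the nearest template is farther away than
    max_distance (face recognition and matching fail).

    enrolled: dict mapping user identity -> stored facial feature vector
    """
    best_id, best_distance = None, float("inf")
    for user_id, template in enrolled.items():
        distance = math.dist(feature, template)  # Euclidean distance
        if distance < best_distance:
            best_id, best_distance = user_id, distance
    return best_id if best_distance <= max_distance else None


# Hypothetical enrolled templates, for illustration only.
enrolled_faces = {
    "driver_a": [0.1, 0.9, 0.3],
    "passenger_b": [0.8, 0.2, 0.5],
}
```

Returning None corresponds to the failure case, after which the procedure can either terminate or retry.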
Optionally, the processing unit 205 constructs a training model through machine learning and neural networks to perform facial information recognition and matching, improving the efficiency and accuracy of face recognition.
Optionally, if face recognition and matching fail, the processing unit 205 terminates the identity recognition procedure. Alternatively, the processing unit 205 may perform face recognition and matching again after they fail.
In a possible implementation, the identity feature information includes personalized information of the user, for example the user's gender, age, personality and hobbies. The identity feature information may also include biometric information of the user, for example iris information, fingerprint information, pupil information and voiceprint information.
S24, the processing unit 205 extracts the voiceprint feature from the voice input signal.
It can be understood that the processing unit 205 preprocesses the voice signal and extracts the voiceprint feature from it through feature extraction and pattern recognition. The pattern recognition may be implemented with a voiceprint model, which may be established using technologies such as deep learning and neural networks; as the amount of machine learning increases, recognition accuracy improves continuously.
S25, the processing unit 205 compares the voiceprint feature with the voiceprint information, and verifies the user identity through the comparison.
Optionally, if the voiceprint feature comparison fails, the processing unit 205 terminates the identity recognition procedure. Alternatively, the processing unit 205 may perform the voiceprint feature comparison again after it fails.
In this embodiment, image-based face recognition and voice-based voiceprint recognition are combined to verify the user identity, meeting the identity recognition accuracy required in high-security scenarios (such as in-vehicle payment).
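The two-stage check of S23 to S25 can be sketched as follows. This is an illustrative sketch under assumptions the text does not fix: the voiceprint feature and the stored voiceprint information are numeric vectors, they are compared by cosine similarity, and the comparison succeeds above a threshold. The function names, the profile layout and the threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_identity(face_user_id, voice_feature, profiles, threshold=0.9):
    """Confirm the identity proposed by face recognition using the voiceprint.

    face_user_id:  identity determined in S23, or None if face matching failed
    voice_feature: voiceprint feature extracted in S24
    profiles:      dict mapping user identity -> identity feature information,
                   which must contain a "voiceprint" vector
    Returns the verified identity, or None if either stage fails.
    """
    if face_user_id is None or face_user_id not in profiles:
        return None
    stored_voiceprint = profiles[face_user_id]["voiceprint"]
    if cosine_similarity(voice_feature, stored_voiceprint) >= threshold:
        return face_user_id
    return None


# Hypothetical stored identity feature information, for illustration only.
example_profiles = {"driver_a": {"voiceprint": [1.0, 0.0, 1.0]}}
```

Because the voiceprint is only compared against the profile selected by face recognition, the second stage is a one-to-one verification rather than a search, which is the usual reason for chaining the two checks in high-security scenarios.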
Fig. 4 shows a schematic diagram of an in-vehicle positioning method based on the vehicle-mounted interaction system 100 provided by an embodiment of the present invention. The method provided by this embodiment is described in detail below with reference to Fig. 4.
S41, the image input unit 202 acquires an image input signal.
Specifically, the image input unit 202 acquires the face image of an occupant.
S42, the processing unit 205 extracts the lip motion of the user from the image input signal.
The processing unit 205 recognizes the lip motion and facial actions of the user through image recognition technology, and recognizes the in-vehicle position, emotion and lip reading of the user according to the lip motion and facial actions.
The processing unit 205 performs image recognition processing on the image input signal, locates and determines the lip motion in the image input signal through feature extraction and edge detection, performs pattern recognition according to a lip-reading model, and extracts lip motion feature values. The lip-reading model is established based on machine learning, so its accuracy improves continuously as the amount of machine learning increases. The machine learning may, for example, use a neural network model. Using the neural network model not only improves lip-reading recognition accuracy; the same in-vehicle neural network can also be used to recognize facial expressions and driver behavior. Using one neural network model to meet multiple face recognition requirements at the same time improves the utilization of the computing unit. This is one of the innovations of the present invention.
S43, the processing unit 205 determines the position region of the user in the vehicle according to the lip motion and the mapping relationship between in-vehicle position regions and the viewing-angle range of the image input unit 202.
Optionally, calibration is performed when the hardware of the image input unit 202 is installed, to determine the mapping relationship between the in-vehicle position regions and the viewing-angle range of the image input unit 202. By recognizing the user's lip motion, the viewing angle at which the lip motion was captured is determined, and the position region of the user in the vehicle is then determined according to the mapping relationship.
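The calibrated mapping can be pictured as a lookup table from viewing-angle ranges to seat regions. The sketch below is an assumption-laden illustration: it converts the horizontal pixel position of the detected lips into an approximate viewing angle with a linear model (a real camera would use its calibrated intrinsic parameters), and the region names, angle ranges and field of view are hypothetical calibration values.

```python
def pixel_to_angle(x_pixel, image_width, horizontal_fov_deg):
    """Approximate horizontal viewing angle in degrees of a pixel column,
    with 0 at the image centre. Linear approximation, for illustration only."""
    return (x_pixel / image_width - 0.5) * horizontal_fov_deg


def locate_region(angle_deg, calibration):
    """Return the in-vehicle position region whose calibrated angle range
    contains angle_deg, or None if the angle falls outside every range."""
    for region, (low, high) in calibration.items():
        if low <= angle_deg < high:
            return region
    return None


# Hypothetical calibration recorded at installation time (degrees).
example_calibration = {
    "driver_seat": (-45.0, -5.0),
    "front_passenger_seat": (5.0, 45.0),
}
```

For example, with a 1280-pixel-wide image and an assumed 90-degree field of view, lips detected at column 320 map to -22.5 degrees, which falls in the range calibrated for the driver seat.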
In a possible implementation, the positioning method further includes:
S44, the voice input unit 201 acquires a voice input signal.
S45, the processing unit 205 performs sound source localization according to the voice input signal, and determines the position of the user in the vehicle.
Optionally, the voice input unit 201 is a microphone array, and sound source localization is performed with the microphone array. The processing unit 205 uses a time difference of arrival (TDOA) algorithm to determine the sound source direction from the differences in the times at which the voice input signal reaches each microphone. The processing unit 205 may also perform sound source localization using beamforming or high-resolution spectral estimation, which is not described further here. The TDOA algorithm is used here and is applied creatively to in-vehicle sound source localization: the cabin is small, the occupants (that is, the sound sources) are close together, and other interfering noise is present. To achieve accurate localization in this environment, the technicians compared a large number of models and verified with experimental data that this algorithm effectively solves the difficulty of localizing sound sources in the vehicle. This is one of the innovations of the present invention.
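As a rough illustration of the TDOA idea for a single pair of microphones, the sketch below estimates the delay from the peak of the cross-correlation and converts it to an angle. The function names, the far-field approximation, and the speed-of-sound constant are assumptions. The patent does not specify how the time difference is computed, and a real cabin would likely need more microphones and a near-field model.

```python
import numpy as np

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air (assumed)


def estimate_delay(sig_a, sig_b, sample_rate):
    """Delay in seconds of sig_b relative to sig_a (positive if sig_b lags)."""
    # Full cross-correlation; index 0 corresponds to lag -(len(sig_a) - 1).
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag_samples / sample_rate


def direction_of_arrival(delay_s, mic_spacing_m):
    """Far-field arrival angle in degrees, measured from the array broadside.

    A positive angle means the source is on the side of microphone A.
    """
    ratio = np.clip(SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))
```

Calling `estimate_delay` on the signals of two microphones and passing the result to `direction_of_arrival` with the microphone spacing gives a bearing. With more than two microphones, the pairwise delays could be combined to estimate a position rather than only a direction.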
In this embodiment, image-based face recognition and voice-based sound source localization are combined to realize a method for locating the interacting occupant, which improves the accuracy and efficiency of localization.
Fig. 5 and Fig. 6 respectively show a schematic diagram and a flowchart of a speech recognition method based on the vehicle-mounted interactive system 100 according to an embodiment of the present invention. The method provided in this embodiment is described in detail below with reference to Fig. 5 and Fig. 6.
S51: the voice input unit 201 and the image input unit 202 acquire a voice input signal and an image input signal, respectively.
Specifically, the voice input unit 201 acquires the voice signal of an occupant, and the image input unit 202 acquires a facial image of the occupant.
S52: the processing unit 205 performs lip-reading recognition and expression recognition according to the image input signal.
The processing unit 205 recognizes the user's lip motion and facial motion through image recognition technology. The user's lip language can be recognized from the lip motion, the user's expression can be recognized from the facial motion, and the user's current mood is identified from the expression.
S53: the processing unit 205 performs speech recognition according to the voice input signal.
Optionally, noise reduction is performed on the voice input signal. Noise can be reduced by means such as spectrum screening or noise filters. An artificial intelligence algorithm can also be used, in which a noise reduction model is obtained through model training with machine learning and a neural network, to remove the noise from the voice input signal.
The processing unit 205 can also use the positioning result of the in-vehicle localization method shown in Fig. 4 to remove noise other than the sound source, achieving more accurate noise reduction.
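One simple reading of "spectrum screening" is to keep only the frequency band where speech energy is concentrated. The sketch below takes that reading, which is an assumption and not something the patent confirms. The function name and the 300 to 3400 Hz band limits are illustrative, and this is not the trained noise reduction model mentioned above.

```python
import numpy as np


def bandpass_denoise(signal, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Zero every frequency bin outside [low_hz, high_hz] and resynthesize."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Boolean mask that keeps only the assumed speech band.
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * keep, n=len(signal))
```

A hard frequency mask like this removes low-frequency engine hum and high-frequency hiss but also distorts speech outside the band. A production system would more likely use a smoother filter or the learned model described in the text.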
S54: the processing unit 205 performs a weighted synthesis of the results of the lip-reading recognition, the expression recognition, and the speech recognition to generate output text.
Optionally, the processing unit 205 may set weight values for the results of lip-reading recognition, expression recognition, and speech recognition, synthesize the recognition results according to the weight values, and express and output the synthesized result as text. The processing unit 205 may control the display unit 204 to display the text. Lip-reading recognition, expression recognition, and speech recognition each serve as a basis for recognition, but none is the sole basis and they are not treated equally; the proportion of each can be adjusted according to experimental data to optimize recognition accuracy. This is one of the innovative points of the present invention.
In this embodiment, speech recognition is assisted by lip-reading recognition and expression recognition, so that combined speech recognition is realized, which improves the precision and personalization of speech recognition and improves the user experience.
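The weighted synthesis of step S54 can be sketched as a late fusion of per-modality scores. The data layout (each recognizer returning candidate texts with confidence scores), the function name, and the example weights are assumptions for illustration. The patent only states that weights are set per modality and tuned from experimental data.

```python
def fuse_results(results, weights):
    """Pick the candidate text with the highest weighted score.

    results: {modality: {candidate_text: confidence}}
    weights: {modality: weight}; modalities without a weight contribute 0.
    """
    totals = {}
    for modality, candidates in results.items():
        weight = weights.get(modality, 0.0)
        for text, score in candidates.items():
            totals[text] = totals.get(text, 0.0) + weight * score
    if not totals:
        return None  # no recognizer produced a candidate
    return max(totals, key=totals.get)
```

For example, if speech recognition slightly prefers one transcript but lip reading strongly prefers another, a sufficiently large lip-reading weight changes the output. How expression recognition scores candidate texts is left open here, so it is treated as just another scored modality.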
Fig. 7 and Fig. 8 respectively show a schematic diagram and a flowchart of a feedback generation method based on the vehicle-mounted interactive system 100 according to an embodiment of the present invention. The method provided in this embodiment is described in detail below with reference to Fig. 7 and Fig. 8.
S71: the voice input unit 201 and the image input unit 202 acquire the voice input signal and the image input signal, respectively.
S72: the processing unit 205 determines the user identity and the identity characteristic information associated with the user identity according to the voice input signal and the image input signal.
Specifically, the processing unit 205 may use the identity recognition method shown in Fig. 2 to determine the user identity and the identity characteristic information. As described in step S23, the identity characteristic information includes personalized information about the user, for example, the user's gender, age, personality, and hobbies.
S73: the processing unit 205 performs expression recognition according to the image input signal.
It can be understood that the processing unit 205 may use the speech recognition method shown in Fig. 5 to recognize the user's expression from the user's facial motion and to identify the user's current mood from the expression.
S74: the processing unit 205 performs speech recognition according to the voice input signal and the image input signal.
Optionally, the processing unit 205 recognizes the user's voice input signal according to the speech recognition method shown in Fig. 5.
S75: the processing unit 205 performs semantic understanding according to the identity characteristic information, the expression recognition result, and the speech recognition result.
It can be understood that, based on the user's identity characteristic information, mood, and speech recognition result, the processing unit 205 can accurately understand the user's meaning and intention.
S76: the processing unit 205 generates a feedback result according to the result of the semantic understanding, the identity characteristic information, and the position of the user in the vehicle.
Optionally, the processing unit 205 obtains the position of the user in the vehicle according to the in-vehicle localization method shown in Fig. 4.
The feedback result may be voice feedback, display feedback, or a combination of the two. The feedback result further includes the orientation angle that the voice feedback and the display feedback should use; for example, the output of the feedback result is directed toward the position of the user in the vehicle. The display unit 204 is configured to output the display feedback in text and/or image form, and the voice output unit 203 is configured to output the voice feedback as a simulated human voice or a machine voice.
In this embodiment, multi-dimensional semantic understanding is realized according to the user's identity characteristic information, expression recognition result, and speech recognition result, so that a personalized feedback result can be generated, which significantly improves the user experience.
The foregoing mainly describes the solutions provided by the embodiments of the present invention from the perspective of the interaction between the parts in each step. It can be understood that, to realize the above functions, each part contains the hardware structure and/or software module corresponding to each function it performs. Those skilled in the art should readily appreciate that, in combination with the example parts and algorithm steps described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.
The steps of the methods or algorithms described in the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processing unit, or in a combination of the two. The software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, the storage medium may be connected to the processing unit so that the processing unit can read information from, and write information to, the storage medium. Optionally, the storage medium may also be integrated into the processing unit. The processing unit and the storage medium may be arranged in an ASIC, and the ASIC may be arranged in an operating terminal device.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, these functions may be stored on a computer-readable medium, or transmitted over a computer-readable medium, in the form of one or more instructions or code. Computer-readable media include computer storage media and communication media that facilitate the transfer of a computer program from one place to another.
The specific embodiments described above further describe the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention shall fall within the protection scope of the present invention.
Claims (9)
1. A vehicle-mounted terminal equipment, characterized by comprising:
a voice input unit, configured to acquire a voice input signal;
an image input unit, configured to acquire an image input signal, wherein the image input signal includes one or more of a facial image signal, an expression image signal, a lip image signal, and a pupil image signal;
a voice output unit, configured to generate a sound output signal;
a display unit, configured to display interactive information;
a processing unit, configured to control the voice input unit, the image input unit, the voice output unit, and the display unit, and to process the voice input signal and the image input signal, wherein the processing unit includes a machine learning model establishing unit, and the machine learning model establishing unit establishes one machine learning model for one or more of the facial image signal, the expression image signal, the lip image signal, and the pupil image signal; and
a communication unit, configured to connect to a cloud service device.
2. The vehicle-mounted terminal equipment according to claim 1, characterized in that the voice input unit is further configured to remove or reduce noise.
3. The vehicle-mounted terminal equipment according to any one of claims 1-2, characterized in that the vehicle-mounted terminal further includes a sounding unit, and the sounding unit is configured to emit the sound output signal.
4. A vehicle-mounted interactive system, characterized by comprising:
a cloud service device and the vehicle-mounted terminal equipment according to any one of claims 1-3.
5. An identity recognition method based on the vehicle-mounted terminal equipment according to any one of claims 1-4, characterized in that the method comprises:
the voice input unit and the image input unit acquiring the voice input signal and the image input signal, respectively;
the processing unit extracting facial features from the image input signal;
the processing unit performing face recognition and matching according to the facial features, and determining a user identity and identity characteristic information associated with the user identity, wherein the identity characteristic information includes voiceprint information;
the processing unit extracting a voiceprint feature from the voice input signal; and
the processing unit comparing the voiceprint feature with the voiceprint information, and verifying the user identity through the comparison.
6. An in-vehicle localization method based on the vehicle-mounted terminal equipment according to any one of claims 1-4, characterized in that the method comprises:
the image input unit acquiring the image input signal;
the processing unit extracting the lip motion of the user from the image input signal; and
the processing unit determining the position region of the user in the vehicle according to the lip motion and the mapping relationship between the in-vehicle position regions and the viewing-angle range of the image input unit.
7. The method according to claim 6, characterized in that the method further comprises:
the voice input unit acquiring the voice input signal; and
the processing unit performing sound source localization according to the voice input signal, and determining the position of the user in the vehicle.
8. A speech recognition method based on the vehicle-mounted terminal equipment according to claims 1-4, characterized in that the method comprises:
the voice input unit and the image input unit acquiring the voice input signal and the image input signal, respectively;
the processing unit performing lip-reading recognition and expression recognition according to the image input signal;
the processing unit performing speech recognition according to the voice input signal; and
the processing unit performing a weighted synthesis of the results of the lip-reading recognition, the expression recognition, and the speech recognition to generate output text.
9. A feedback generation method based on the vehicle-mounted terminal equipment according to claims 1-4, characterized in that the method comprises:
the voice input unit and the image input unit acquiring the voice input signal and the image input signal, respectively;
the processing unit determining a user identity and identity characteristic information associated with the user identity according to the voice input signal and the image input signal;
the processing unit performing expression recognition according to the image input signal;
the processing unit performing speech recognition according to the voice input signal and the image input signal;
the processing unit performing semantic understanding according to the identity characteristic information, the expression recognition result, and the speech recognition result; and
the processing unit generating a feedback result according to the result of the semantic understanding, the identity characteristic information, and the position of the user in the vehicle.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910130763.0A CN109941231B (en) | 2019-02-21 | 2019-02-21 | Vehicle-mounted terminal equipment, vehicle-mounted interaction system and interaction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910130763.0A CN109941231B (en) | 2019-02-21 | 2019-02-21 | Vehicle-mounted terminal equipment, vehicle-mounted interaction system and interaction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109941231A true CN109941231A (en) | 2019-06-28 |
CN109941231B CN109941231B (en) | 2021-02-02 |
Family
ID=67007623
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910130763.0A Active CN109941231B (en) | 2019-02-21 | 2019-02-21 | Vehicle-mounted terminal equipment, vehicle-mounted interaction system and interaction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109941231B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104691444A (en) * | 2013-12-09 | 2015-06-10 | 奇点新源国际技术开发(北京)有限公司 | Vehicle-mounted terminal based on electric car and vehicle-mounted terminal system |
CN106681483A (en) * | 2015-11-05 | 2017-05-17 | 芋头科技(杭州)有限公司 | Interaction method and interaction system for intelligent equipment |
CN107665295A (en) * | 2016-07-29 | 2018-02-06 | 长城汽车股份有限公司 | Identity identifying method, system and the vehicle of vehicle |
CN107933501A (en) * | 2016-10-12 | 2018-04-20 | 德尔福电子(苏州)有限公司 | A kind of automobile initiating means identified based on recognition of face and vocal print cloud |
DE102017200909A1 (en) * | 2017-01-20 | 2018-07-26 | Bayerische Motoren Werke Aktiengesellschaft | System for monitoring and controlling vehicle functions |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110335600A (en) * | 2019-07-09 | 2019-10-15 | 四川长虹电器股份有限公司 | The multi-modal exchange method and system of household appliance |
CN110641476A (en) * | 2019-08-16 | 2020-01-03 | 广汽蔚来新能源汽车科技有限公司 | Interaction method and device based on vehicle-mounted robot, controller and storage medium |
CN110444212A (en) * | 2019-09-10 | 2019-11-12 | 安徽大德中电智能科技有限公司 | A kind of smart home robot voice identification device and recognition methods |
CN110827823A (en) * | 2019-11-13 | 2020-02-21 | 联想(北京)有限公司 | Voice auxiliary recognition method and device, storage medium and electronic equipment |
EP4134949A4 (en) * | 2020-04-30 | 2023-04-05 | Huawei Technologies Co., Ltd. | In-vehicle user positioning method, on-board interaction method, on-board device, and vehicle |
CN111696548A (en) * | 2020-05-13 | 2020-09-22 | 深圳追一科技有限公司 | Method and device for displaying driving prompt information, electronic equipment and storage medium |
CN115604660A (en) * | 2021-06-25 | 2023-01-13 | 比亚迪股份有限公司(Cn) | Human-vehicle interaction method and system, storage medium, electronic device and vehicle |
CN117174092A (en) * | 2023-11-02 | 2023-12-05 | 北京语言大学 | Mobile corpus transcription method and device based on voiceprint recognition and multi-modal analysis |
CN117174092B (en) * | 2023-11-02 | 2024-01-26 | 北京语言大学 | Mobile corpus transcription method and device based on voiceprint recognition and multi-modal analysis |
CN117370961A (en) * | 2023-12-05 | 2024-01-09 | 江西五十铃汽车有限公司 | Vehicle voice interaction method and system |
CN117370961B (en) * | 2023-12-05 | 2024-03-15 | 江西五十铃汽车有限公司 | Vehicle voice interaction method and system |
CN117672180A (en) * | 2023-12-08 | 2024-03-08 | 广州凯迪云信息科技有限公司 | Voice communication control method and system for digital robot |
CN118197315A (en) * | 2024-05-16 | 2024-06-14 | 合众新能源汽车股份有限公司 | Cabin voice interaction method, system and computer readable medium |
Also Published As
Publication number | Publication date |
---|---|
CN109941231B (en) | 2021-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109941231A (en) | Vehicle-mounted terminal equipment, vehicle-mounted interactive system and exchange method | |
US11031012B2 (en) | System and method of correlating mouth images to input commands | |
CN104361276B (en) | A kind of multi-modal biological characteristic identity identifying method and system | |
US11854550B2 (en) | Determining input for speech processing engine | |
Potamianos et al. | Recent advances in the automatic recognition of audiovisual speech | |
WO2021135685A1 (en) | Identity authentication method and device | |
CN102023703B (en) | Combined lip reading and voice recognition multimodal interface system | |
CN106157956A (en) | The method and device of speech recognition | |
JP6977004B2 (en) | In-vehicle devices, methods and programs for processing vocalizations | |
CN109564759A (en) | Speaker Identification | |
CN107731233A (en) | A kind of method for recognizing sound-groove based on RNN | |
KR20010039771A (en) | Methods and apparatus for audio-visual speaker recognition and utterance verification | |
US20230129816A1 (en) | Speech instruction control method in vehicle cabin and related device | |
CN109887187A (en) | A kind of pickup processing method, device, equipment and storage medium | |
CN108376215A (en) | A kind of identity identifying method | |
CN113643707A (en) | Identity verification method and device and electronic equipment | |
CN118197315A (en) | Cabin voice interaction method, system and computer readable medium | |
CN108960191B (en) | Multi-mode fusion emotion calculation method and system for robot | |
CN110414295A (en) | Identify method, apparatus, cooking equipment and the computer storage medium of rice | |
CN110083392A (en) | Audio wakes up method, storage medium, terminal and its bluetooth headset pre-recorded | |
Abreu | Visual speech recognition for European Portuguese | |
CN114537409B (en) | Multi-sensory vehicle-mounted interaction method and system based on multi-modal analysis | |
KR102535244B1 (en) | identification system and method using landmark of part of the face and voice recognition | |
CN117273743A (en) | Identity authentication method, device, system and medium in bank large-amount transaction | |
Kar et al. | Audio-visual biometric based speaker identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20220314. Patentee after: MOMENTA (SUZHOU) TECHNOLOGY Co.,Ltd., 215100 Floor 23, Tiancheng Times Business Plaza, No. 58 Qinglonggang Road, High-Speed Rail New Town, Xiangcheng District, Suzhou, Jiangsu Province. Patentee before: MOMENTA (SUZHOU) TECHNOLOGY Co.,Ltd., Room 601-A32, Tiancheng Information Building, No. 88 South Tiancheng Road, High-Speed Rail New Town, Xiangcheng District, Suzhou City, Jiangsu Province. |