CN109941231A

CN109941231A - Vehicle-mounted terminal equipment, vehicle-mounted interactive system and interaction method

Info

Publication number: CN109941231A
Application number: CN201910130763.0A
Authority: CN
Inventors: 李林军; 耿文童; 贾百龙; 周建
Original assignee: Momenta Suzhou Technology Co Ltd
Current assignee: Momenta Suzhou Technology Co Ltd
Priority date: 2019-02-21
Filing date: 2019-02-21
Publication date: 2019-06-28
Anticipated expiration: 2039-02-21
Also published as: CN109941231B

Abstract

An embodiment of the invention discloses a vehicle-mounted terminal equipment, a vehicle-mounted interactive system and an interaction method. The vehicle-mounted terminal equipment includes a voice input unit, an image input unit, a voice output unit, a display unit, a processing unit and a communication unit. The units cooperate to perform intelligent speech recognition, image recognition based on machine vision, and multi-modal information interaction processing, so as to achieve accurate recognition of input information, accurate semantic understanding and personalized output, improving the user's human-machine interaction experience.

Description

Vehicle-mounted terminal equipment, vehicle-mounted interactive system and interaction method

Technical field

The present invention relates to the field of the Internet of Vehicles, and in particular to an intelligent vehicle-mounted human-machine interactive system and interaction method.

Background art

In modern society, vehicles increasingly permeate people's life, study and work, and the automobile, as an important kind of vehicle, is becoming a part of everyday life. With the rapid development of Internet of Vehicles technology, connected and intelligent automobiles have become possible. An important component of Internet of Vehicles technology is the vehicle-mounted interactive system. As an important bridge for interaction between people and automobiles, it plays an irreplaceable role in the safety, comfort, performance and usage experience of the automobile. The emergence of artificial intelligence technology has injected new vitality into vehicle-mounted interactive systems and started a new round of intelligent transformation. The degree of intelligence of the vehicle-mounted interactive system plays a key role in improving user experience and comfort.

A mainstream interaction mode in the prior art implements common functions through a combination of mechanical buttons and a touch screen. Because mechanical buttons are cumbersome, both their number and their frequency of use are declining markedly, and as in-vehicle touch screens grow larger, buttons are gradually being replaced entirely by touch screens. Touch-screen interaction usually requires the driver to look at the screen first to locate the touch position, and to check the result after touching, since otherwise no feedback is obtained. Looking at the screen while driving, however, distracts the driver and reduces driving safety. In addition, the touch screen is generally placed at the front of the cabin, so only the driver and the front passenger can operate it; rear passengers have no interaction entry point, which degrades the user experience.

Another emerging interaction mode is voice interaction, which takes speech as input, processes it with speech recognition technology, and uses voice broadcast as output. Voice interaction, however, places high demands on the in-vehicle environment. Various forms of in-vehicle noise, such as wind noise, tire noise, engine noise and interference from the in-vehicle audio system, can greatly affect the accuracy of speech recognition. For aftermarket (non-factory-installed) voice interaction systems in particular, a good voice interaction experience is difficult to achieve.

In addition, some vehicle-mounted interactive systems also provide gesture interaction and lip reading interaction as operation entry points. However, the content that gestures can express is very limited and their utilization rate is low, while lip reading recognition alone can hardly achieve a high recognition rate and often needs to be combined with other recognition means such as speech recognition, so it is currently seldom used. The prior art therefore needs a vehicle-mounted interactive system and interaction method capable of accurate recognition of input information, accurate semantic understanding and personalized output.

Summary of the invention

Embodiments of the invention provide a vehicle-mounted terminal equipment, a vehicle-mounted interactive system and an interaction method, which can achieve accurate recognition of input information, accurate semantic understanding and personalized output, and improve the user's human-machine interaction experience.

In one aspect, an embodiment of the present invention provides a vehicle-mounted terminal equipment, comprising: a voice input unit for acquiring a voice input signal; an image input unit for acquiring an image input signal, the image input signal including one or more of a face image signal, an expression image signal, a lip image signal and a pupil image signal; a voice output unit for generating a sound output signal; a display unit for displaying interaction information; a processing unit for controlling the voice input unit, the image input unit, the voice output unit and the display unit, and for processing the voice input signal and the image input signal, wherein the processing unit includes a machine learning model establishing unit that can establish a machine learning model for one or more of the face image signal, the expression image signal, the lip image signal and the pupil image signal; and a communication unit for connecting to a cloud service equipment. The scheme provided by this embodiment enables intelligent speech recognition, image recognition based on machine vision, and multi-modal information interaction processing, so as to achieve accurate recognition of input information, accurate semantic understanding and personalized output, improving the user's human-machine interaction experience.

In a possible design, the voice input unit is also used to remove or reduce noise. This is one of the inventive points of the present invention.

In a possible design, the vehicle-mounted terminal equipment further includes a sound-emitting unit for emitting the sound output signal.

In another aspect, an embodiment of the present invention provides a vehicle-mounted interactive system, which includes a cloud service equipment and the vehicle-mounted terminal equipment described in the above aspect.

In yet another aspect, an embodiment of the present invention provides an identity recognition method based on the vehicle-mounted terminal equipment described in the above aspect. The method comprises: the voice input unit and the image input unit acquire the voice input signal and the image input signal, respectively; the processing unit extracts facial features from the image input signal; the processing unit performs face recognition and matching according to the facial features, and determines a user identity and identity characteristic information associated with the user identity, wherein the identity characteristic information includes voiceprint information; the processing unit extracts a voiceprint feature from the voice input signal; the processing unit compares the voiceprint feature with the voiceprint information and verifies the user identity through the comparison. The scheme provided by this embodiment combines image-based face recognition with voice-based voiceprint recognition to verify the user identity, meeting the identification accuracy required in high-security scenarios.

In a possible design, the facial features in the image input signal are extracted by machine learning and/or a neural network, so as to improve the efficiency and accuracy of image recognition. This is one of the inventive points of the present invention.

In a possible design, the voiceprint feature in the voice input signal is extracted by machine learning and/or a neural network, so as to improve the efficiency and accuracy of voiceprint recognition. This is one of the inventive points of the present invention.

In a possible design, the identity characteristic information includes one or more of the gender, age, personality and hobbies of the user. This is one of the inventive points of the present invention.

In a possible design, the identity characteristic information includes biometric information of the user. This is one of the inventive points of the present invention.

In a possible design, the identity characteristic information includes the voiceprint information of the user. This is one of the inventive points of the present invention.

In yet another aspect, an embodiment of the present invention provides an in-vehicle localization method based on the vehicle-mounted terminal equipment described in the above aspect. The method comprises: the image input unit acquires the image input signal; the processing unit extracts the lip motion of the user from the image input signal; the processing unit determines the position region of the user in the vehicle according to the lip motion and the mapping between in-vehicle position regions and the viewing-angle range of the image input unit. The scheme provided by this embodiment enables image-based estimation of the in-vehicle position region, thereby improving positioning accuracy.

In a possible design, the method further includes: the voice input unit acquires the voice input signal; the processing unit performs sound source localization according to the voice input signal and determines the position of the user in the vehicle. Image-based face recognition and voice-based sound source localization are thereby combined to locate the interacting occupant, improving the accuracy and efficiency of positioning. This is one of the inventive points of the present invention.

In a possible design, the processing unit performs sound source localization using TDOA, beamforming or high-resolution spectral estimation. This is one of the inventive points of the present invention.

In yet another aspect, an embodiment of the present invention provides a speech recognition method based on the vehicle-mounted terminal equipment described in the above aspect. The method comprises: the voice input unit and the image input unit acquire the voice input signal and the image input signal, respectively; the processing unit performs lip reading recognition and expression recognition according to the image input signal; the processing unit performs speech recognition according to the voice input signal; the processing unit performs weighted synthesis of the results of the lip reading recognition, the expression recognition and the speech recognition, and generates output text. The scheme provided by this embodiment enables combined speech recognition in which lip reading recognition and expression recognition assist speech recognition, improving the accuracy and personalization of speech recognition and the user experience.

In a possible design, the voice input unit performs noise reduction on the voice input signal, so as to ensure the accuracy of the received voice input signal. This is one of the inventive points of the present invention.

In a possible design, the voice input unit performs noise reduction on the voice input signal by spectrum screening or a noise filter. This is one of the inventive points of the present invention.

In a possible design, the voice input unit performs noise reduction on the voice input signal by an artificial intelligence algorithm, machine learning and/or a neural network. This is one of the inventive points of the present invention.

In a possible design, the voice input unit performs noise reduction on the voice input signal according to the positioning result of the above in-vehicle localization method. This is one of the inventive points of the present invention.

In yet another aspect, an embodiment of the present invention provides a feedback generation method based on the vehicle-mounted terminal equipment described in the above aspect. The method comprises: the voice input unit and the image input unit acquire the voice input signal and the image input signal, respectively; the processing unit determines a user identity and identity characteristic information associated with the user identity according to the voice input signal and the image input signal; the processing unit performs expression recognition according to the image input signal; the processing unit performs speech recognition according to the voice input signal and the image input signal; the processing unit performs semantic understanding according to the identity characteristic information, the expression recognition result and the speech recognition result; the processing unit generates a feedback result according to the result of the semantic understanding, the identity characteristic information and the position of the user in the vehicle. The scheme provided by this embodiment achieves multi-dimensional semantic understanding based on the user's identity characteristic information, expression recognition result and speech recognition result, so that a personalized feedback result can be generated, significantly improving the user experience.

In a possible design, the processing unit determines the user identity and the identity characteristic information using the above identity recognition method. This is one of the inventive points of the present invention.

In a possible design, the processing unit recognizes the expression of the user using the above speech recognition method. This is one of the inventive points of the present invention.

In a possible design, the processing unit recognizes the voice input signal of the user according to the above speech recognition method. This is one of the inventive points of the present invention.

In a possible design, the feedback result is voice feedback or display feedback. This is one of the inventive points of the present invention.

In a possible design, the display unit is used to output the display feedback in text and/or image form. This is one of the inventive points of the present invention.

In a possible design, the voice output unit is used to output the voice feedback as simulated speech or machine sound. This is one of the inventive points of the present invention.

According to the technical solution provided by the embodiments of the present invention, a multi-modal vehicle-mounted interactive system achieves accurate recognition of input information, accurate semantic understanding and personalized output, improving the user's human-machine interaction experience. One of the many innovations of the present invention is a system that integrates the above multiple input signals and output units and applies them to multiple in-vehicle applications, providing occupants with a convenient riding experience.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below.

Fig. 1 is a schematic diagram of a vehicle-mounted interactive system provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of an identity recognition method provided by an embodiment of the present invention;

Fig. 3 is a flow chart of an identity recognition method provided by an embodiment of the present invention;

Fig. 4 is a schematic diagram of an in-vehicle localization method provided by an embodiment of the present invention;

Fig. 5 is a schematic diagram of a speech recognition method provided by an embodiment of the present invention;

Fig. 6 is a flow chart of a speech recognition method provided by an embodiment of the present invention;

Fig. 7 is a schematic diagram of a feedback generation method provided by an embodiment of the present invention;

Fig. 8 is a flow chart of a feedback generation method provided by an embodiment of the present invention.

Detailed description of the embodiments

The solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention.

The solution proposed by the embodiments of the present invention is based on the vehicle-mounted interactive system 100 shown in Fig. 1. The vehicle-mounted interactive system 100 can be installed in the front-row region of the automobile or in the rear-row region; the embodiments of the present invention do not specifically limit this. The vehicle-mounted interactive system 100 includes a vehicle-mounted terminal equipment 200 and a cloud service equipment 300. Specifically, the vehicle-mounted terminal equipment 200 includes a voice input unit 201, an image input unit 202, a voice output unit 203, a display unit 204, a processing unit 205 and a communication unit 206.

The voice input unit 201 can be used to acquire a voice input signal, for example a voice instruction issued by a user in the vehicle. The voice input unit 201 can be a microphone or a microphone array. The microphone can be installed in the front-row region of the vehicle interior, so as to better receive the driver's instructions; it can also be installed in the rear-row region, so as to receive voice signals from rear passengers and improve their interaction experience. The microphone array can be distributed around the entire interior space, so as to collect sound signals from every angle of the vehicle interior and make sound signal collection richer and more comprehensive. The voice input unit 201 can also be an audio collector or sound pickup with higher acquisition precision.

Optionally, the voice input unit 201 has a noise reduction function. The noise reduction function can reduce or remove environmental noise, and can also remove unwanted interfering sound. For example, when the driver sends a voice instruction to the vehicle-mounted terminal equipment 200, background noise from other occupants' conversation is reduced or removed.

Optionally, the voice input unit 201 performs spectrum screening, removing the influence of the noise signal from the spectrum of the signal. The voice input unit 201 can also weaken the influence of the noise signal with a noise filter. The voice input unit 201 can also use an artificial intelligence approach, training a model by machine learning and a neural network to obtain a noise reduction model that removes the noise components from the voice signal.
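The source does not specify how the spectrum screening is carried out. As a minimal sketch only, one simple reading is a spectral gate: transform a frame to the frequency domain, zero every bin whose magnitude falls below a noise floor, and transform back. The function names and the `noise_floor` parameter below are assumptions made for illustration, and the naive DFT is used only to keep the example self-contained.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(n^2)); adequate for short frames."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def spectral_gate(frame, noise_floor):
    """Zero every frequency bin whose magnitude is below noise_floor (hypothetical threshold)."""
    spectrum = dft(frame)
    gated = [c if abs(c) >= noise_floor else 0 for c in spectrum]
    return idft(gated)
```

In practice the noise floor would have to be estimated, for example from frames in which nobody is speaking, and a production system would use an FFT with overlapping windows rather than this direct transform.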

The image input unit 202 is used to acquire an image input signal. The image input unit 202 can be a device with an image or video acquisition function, such as a camera or video camera, for example a color camera, a black-and-white camera, an infrared camera, a 3D camera, or any combination thereof. The image signals acquired by the image input unit 202 include, but are not limited to, the following:

Face images, including the features and expressions of the occupant's face;

Expression images, including the facial movements of the user when expressing various moods;

Lip images, including motion features of the speaker's lips, such as the opening and closing of the lips and the mouth shape;

Pupil images, including pupil-related movements, such as the contraction and focus position of the pupil.

Optionally, the image input unit 202 also has a fill-light function, used to enhance the brightness and clarity of the acquired image when ambient brightness is low.

The voice output unit 203 is used to generate a sound output signal. The sound output signal can be simulated speech, or machine sound such as an alarm sound or music. The voice output unit 203 can be a loudspeaker. The voice output unit 203 can output the sound signal in two modes. In the first, the voice output unit 203 is a loudspeaker of the in-vehicle audio system, and the sound output signal is delivered to the occupants through that loudspeaker. In the second, the voice output unit 203 is an independent sound-emitting unit separate from the in-vehicle audio system, and the sound output signal is delivered to the occupants through that independent unit.

The display unit 204 is used to display interaction information, which includes but is not limited to: necessary information in the human-machine interaction process, the user's query information, answer information for questions and answers, and expression information in the human-machine interaction. The display unit 204 can be a common terminal display device such as an LCD screen, an LED screen, an OLED screen or a touch screen; these are not enumerated one by one here.

The processing unit 205 is used to control the voice input unit 201, the image input unit 202, the voice output unit 203 and the display unit 204, to process the voice input signal and the image input signal, and to generate the sound output signal.

It can be understood that the processing unit 205 can be a processor. The processor can be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, transistor logic, a hardware component, or any combination thereof. It can implement or execute the various illustrative logic blocks, modules and circuits described in connection with the disclosure of the present invention. The processor can also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

The communication unit 206 is used to connect to the cloud service equipment 300 and to exchange information between the vehicle-mounted terminal equipment 200 and the cloud service equipment 300, so as to accomplish cooperative processing between the local device and the network.

It should be noted that the number of voice input units 201 and image input units 202 included in the vehicle-mounted interactive system 100 shown in Fig. 1 is only an example, and the embodiments of the present invention are not limited to it. For example, more voice input units 201 and image input units 202 can be included as required by signal acquisition; for brevity, they are not depicted one by one in the drawings.

Optionally, the vehicle-mounted interactive system 100 further includes a sensor, which works together with the vehicle-mounted interactive system to realize algorithm fusion. The sensor can include, for example, an inertial sensor or a speed sensor, to enable positioning and tracking of a signal or information source.

In this embodiment, the above vehicle-mounted interactive system 100 enables intelligent speech recognition, image recognition based on machine vision, and multi-modal information interaction processing, so as to achieve accurate recognition of input information, accurate semantic understanding and personalized output, improving the user's human-machine interaction experience.

An important technology for vehicle intelligence is identity recognition, that is, obtaining and confirming the user's identity so as to authorize the use of the corresponding functions. Examples include vehicle start authorization, in-vehicle payment authorization and identity-based personalized interaction, all of which require the user's identity to be confirmed first. Fig. 2 and Fig. 3 respectively show a schematic diagram and a flow chart of an identity recognition method based on the vehicle-mounted interactive system 100 provided by an embodiment of the present invention. The method provided by this embodiment is described in detail below with reference to Fig. 2 and Fig. 3.

S21, the voice input unit 201 and the image input unit 202 acquire a voice input signal and an image input signal, respectively.

Specifically, the voice input unit 201 acquires the voice of the occupant, and the image input unit 202 acquires the face image of the occupant.

S22, the processing unit 205 extracts the facial features from the image input signal.

Specifically, the processing unit 205 performs image recognition processing on the image input signal: it locates and determines the face contour in the image input signal through feature extraction and edge detection, optimizes and enhances the face image within the face contour, performs pattern recognition according to a facial model, and extracts facial information feature values. The facial model is established by machine learning, so its precision keeps improving as the number of learning iterations increases.

S23, the processing unit 205 performs face recognition and matching according to the facial features, and determines the user identity and the identity characteristic information associated with the user identity.

It can be understood that the processing unit 205 searches the facial information stored in the vehicle-mounted interactive system for facial information that matches the extracted facial information feature values, and matches the user identity according to the mapping between the facial information and user identities, thereby determining the identity of the user; that is, face recognition and matching succeed.

Optionally, the processing unit 205 builds a training model by machine learning and a neural network to recognize and match facial information, so as to improve the efficiency and accuracy of face recognition.

Optionally, if face recognition and matching fail, the processing unit 205 terminates the identity recognition process. The processing unit 205 can also perform face recognition and matching again after the failure.

In a possible implementation, the identity characteristic information includes personalized information of the user, for example the gender, age, personality and hobbies of the user. The identity characteristic information can also include biometric information of the user, for example iris information, fingerprint information, pupil information and voiceprint information.

S24, the processing unit 205 extracts the voiceprint feature from the voice input signal.

It can be understood that the processing unit 205 preprocesses the voice signal and extracts the voiceprint feature from it through feature extraction and pattern recognition. The pattern recognition can be implemented with a voiceprint model, which can be established with techniques such as deep learning and neural networks; as the number of learning iterations increases, the recognition precision keeps improving.

S25, the processing unit 205 compares the voiceprint feature with the voiceprint information, and verifies the user identity through the comparison.

Optionally, if the voiceprint feature comparison fails, the processing unit 205 terminates the identity recognition process. The processing unit 205 can also perform the voiceprint feature comparison again after the failure.

In this embodiment, image-based face recognition and voice-based voiceprint recognition are combined to verify the user identity, meeting the identification accuracy required in high-security scenarios (such as in-vehicle payment).
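Steps S23 and S25 can be sketched as a two-stage check, assuming the facial features and voiceprint features have already been extracted as numeric vectors. The cosine-similarity comparison, the thresholds and the structure of the `enrolled` table are illustrative assumptions, not details given in the source.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(face_vec, voice_vec, enrolled, face_thr=0.9, voice_thr=0.9):
    """Return the verified user id, or None if either stage fails.

    enrolled maps a user id to {"face": [...], "voiceprint": [...]}
    (a hypothetical stand-in for the stored identity characteristic information).
    """
    # S23: find the enrolled user whose stored facial features match best.
    best_id, best_score = None, -1.0
    for user_id, record in enrolled.items():
        score = cosine(face_vec, record["face"])
        if score > best_score:
            best_id, best_score = user_id, score
    if best_id is None or best_score < face_thr:
        return None  # face recognition and matching failed

    # S25: verify the identity by comparing the extracted voiceprint feature
    # with the voiceprint information stored for that user.
    if cosine(voice_vec, enrolled[best_id]["voiceprint"]) < voice_thr:
        return None  # voiceprint comparison failed
    return best_id
```

Returning `None` corresponds to terminating the identity recognition process; a caller that wants the optional retry behavior could simply acquire new signals and call `identify` again.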

Fig. 4 shows a schematic diagram of an in-vehicle localization method based on the vehicle-mounted interactive system 100 provided by an embodiment of the present invention. The method provided by this embodiment is described in detail below with reference to Fig. 4.

S41, the image input unit 202 acquires an image input signal.

Specifically, the image input unit 202 acquires the face image of the occupant.

S42, the processing unit 205 extracts the lip motion of the user from the image input signal.

The processing unit 205 recognizes the lip motion and facial movements of the user by image recognition technology, and recognizes the in-vehicle position, mood and lip reading of the user according to the lip motion and facial movements.

The processing unit 205 performs image recognition processing on the image input signal: it locates and determines the lip motion in the image input signal through feature extraction and edge detection, performs pattern recognition according to a lip reading model, and extracts lip motion feature values. The lip reading model is established by machine learning, so its precision keeps improving as the number of learning iterations increases. The machine learning can, for example, use a neural network model. Using the neural network model not only improves lip reading recognition accuracy; the same in-vehicle neural network can also be used to recognize facial expressions and driver behavior. A single neural network model thus meets several face-related recognition requirements at once, improving the utilization of the computing unit. This is one of the innovations of the present invention.

S43, the processing unit 205 determines the position region of the user in the vehicle according to the lip motion and the mapping between in-vehicle position regions and the viewing-angle range of the image input unit 202.

Optionally, calibration is performed when the hardware of the image input unit 202 is installed, to determine the mapping between the in-vehicle position regions and the viewing-angle range of the image input unit 202. By recognizing the lip motion of the user, the viewing angle at which the lip motion is captured is determined, and the position region of the user in the vehicle is then determined according to the mapping.
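The calibrated mapping described above can be pictured as a lookup table from viewing-angle ranges to position regions. The sketch below assumes the lip has already been detected at some pixel column, uses a linear pixel-to-angle approximation instead of real camera intrinsics, and uses a made-up calibration table; the zone names and angle ranges are purely hypothetical.

```python
def pixel_to_angle(x_px, image_width, horizontal_fov_deg):
    """Approximate horizontal viewing angle in degrees of a pixel column.

    0 corresponds to the optical axis. This is a linear approximation;
    a real calibration would use the camera's intrinsic parameters.
    """
    return ((x_px + 0.5) / image_width - 0.5) * horizontal_fov_deg


def locate_zone(x_px, image_width, horizontal_fov_deg, zones):
    """Return the name of the calibrated zone whose angle range contains the lip."""
    angle = pixel_to_angle(x_px, image_width, horizontal_fov_deg)
    for name, low_deg, high_deg in zones:
        if low_deg <= angle < high_deg:
            return name
    return None


# Hypothetical calibration table produced when the camera is installed:
# (zone name, lower angle bound, upper angle bound), in degrees.
EXAMPLE_ZONES = [
    ("driver", -60.0, -20.0),
    ("rear_row", -20.0, 20.0),
    ("front_passenger", 20.0, 60.0),
]
```

For a 640-pixel-wide image and a 120-degree field of view, `locate_zone(100, 640, 120.0, EXAMPLE_ZONES)` would fall in the hypothetical "driver" range. Which side of the image corresponds to which seat depends entirely on how and where the camera is mounted.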

In a possible implementation, the localization method further includes:

S44, the voice input unit 201 acquires a voice input signal.

S45, the processing unit 205 performs sound source localization according to the voice input signal, and determines the position of the user in the vehicle.

Optionally, the voice input unit 201 is a microphone array, and sound source localization is realized through the microphone array. The processing unit 205 uses a time difference of arrival (TDOA) algorithm, determining the sound source direction from the differences between the times at which the voice input signal reaches each microphone. The processing unit 205 can also perform sound source localization by beamforming or high-resolution spectral estimation, which is not described further here. The TDOA algorithm is used here and is applied inventively to in-vehicle sound source localization. Because the interior space is narrow, the occupants (that is, the sound sources) are close to one another, and other interfering noise is present, accurate positioning in this environment is difficult. By comparing a large number of models and verifying with experimental data, the technicians found that the above algorithm can effectively solve the problem of sound source localization in the in-vehicle environment. This is one of the innovations of the present invention.
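To make the TDOA idea concrete, here is a minimal two-microphone sketch: the delay between the channels is estimated by brute-force cross-correlation and then converted to an arrival angle under a far-field assumption. The function names, the far-field model and the fixed speed of sound are assumptions for illustration; the source does not describe its TDOA implementation, and a cabin is close enough to the array that a far-field model is only a rough approximation.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees Celsius


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag in samples by which sig_b trails sig_a.

    The lag maximizing the cross-correlation sum(sig_a[i] * sig_b[i + lag])
    is chosen; a positive result means the sound reached microphone A first.
    """
    n = len(sig_a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += sig_a[i] * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle(delay_samples, sample_rate, mic_spacing_m):
    """Far-field arrival angle in degrees relative to the array broadside."""
    tau = delay_samples / sample_rate
    # Clamp to [-1, 1] so measurement error cannot push asin out of its domain.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing_m))
    return math.degrees(math.asin(ratio))
```

With more than two microphones, the pairwise delays can be combined to narrow the source down to a seat position rather than just a direction. Practical systems usually compute the correlation in the frequency domain for speed and robustness.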

In this embodiment, image-based facial recognition and voice-based sound source localization are combined to realize a method of locating the interacting occupant, which improves the accuracy and efficiency of localization.

Fig. 5 and Fig. 6 respectively show a schematic diagram and a flowchart of a speech recognition method based on the vehicle-mounted interactive system 100 according to an embodiment of the present invention. The method provided in this embodiment is described in detail below with reference to Fig. 5 and Fig. 6.

S51, the voice-input unit 201 and the image input unit 202 acquire a voice input signal and an image input signal, respectively.

Specifically, the voice-input unit 201 acquires the voice signal of an occupant, and the image input unit 202 acquires the facial image of the occupant.

S52, the processing unit 205 performs lip reading recognition and expression recognition according to the image input signal.

The processing unit 205 recognizes the lip motion and facial movements of the user by image recognition technology. The user's lip reading can be recognized from the lip motion, the user's expression can be recognized from the facial movements, and the user's current mood can be recognized from the expression.

S53, the processing unit 205 performs speech recognition according to the voice input signal.

Optionally, noise reduction is performed on the voice input signal. Noise can be reduced by means such as spectrum screening or a noise filter, or by an artificial intelligence algorithm in which a noise reduction model is obtained by training with machine learning and neural networks and is then used to remove the noise from the voice input signal.

The processing unit 205 can also use the positioning result of the in-vehicle localization method shown in Fig. 4 to remove noise that does not come from the sound source, thereby achieving more accurate noise reduction.

S54, the processing unit 205 performs a weighted synthesis of the results of the lip reading recognition, the expression recognition and the speech recognition, and generates output text.

Optionally, the processing unit 205 may set weight values for the results of lip reading recognition, expression recognition and speech recognition, and synthesize the recognition results according to those weight values; the synthesized recognition result is expressed and output as text. The processing unit 205 may control the display unit 204 to display the text. Lip reading recognition, expression recognition and speech recognition each serve as a basis for recognition, but none is the sole basis and they are not treated equally; the proportion of each can be adjusted according to experimental data to optimize recognition accuracy. This is one of the innovations of the present invention.
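The weighted synthesis can be sketched as a weighted vote over candidate texts. This is an illustrative assumption, not the embodiment's algorithm: the text does not say how each recognizer's result is represented, so the sketch assumes every recognizer (including expression recognition, which here scores candidates consistent with the detected mood) returns candidate texts with confidences. The function name, recognizer names and weight values are hypothetical.

```python
from collections import defaultdict


def weighted_fusion(candidates, weights):
    """Combine per-recognizer text hypotheses by weighted voting.

    candidates: dict mapping recognizer name -> list of (text, confidence)
    weights:    dict mapping recognizer name -> weight value
    Returns the text with the highest weighted score, or "" if there is none.
    """
    scores = defaultdict(float)
    for source, hypotheses in candidates.items():
        for text, confidence in hypotheses:
            # Recognizers without a configured weight contribute nothing.
            scores[text] += weights.get(source, 0.0) * confidence
    if not scores:
        return ""
    return max(scores, key=scores.get)


# Hypothetical example inputs.
example_candidates = {
    "speech": [("open the window", 0.6), ("open the window now", 0.4)],
    "lip": [("open the window", 0.7)],
    "expression": [("open the window now", 0.5)],
}
example_weights = {"speech": 0.6, "lip": 0.3, "expression": 0.1}
fused_text = weighted_fusion(example_candidates, example_weights)
```

Because the weights are plain data, they can be tuned against experimental results without changing the fusion logic, which matches the adjustment described above.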

In this embodiment, lip reading recognition and expression recognition assist speech recognition, so that combined speech recognition is realized. This improves the precision and personalization of speech recognition and improves the user experience.

Fig. 7 and Fig. 8 respectively show a schematic diagram and a flowchart of a feedback generation method based on the vehicle-mounted interactive system 100 according to an embodiment of the present invention. The method provided in this embodiment is described in detail below with reference to Fig. 7 and Fig. 8.

S71, the voice-input unit 201 and the image input unit 202 acquire the voice input signal and the image input signal, respectively.

S72, the processing unit 205 determines the user identity and the identity characteristic information associated with the user identity according to the voice input signal and the image input signal.

Specifically, the processing unit 205 can use the personal identification method shown in Fig. 2 to determine the user identity and the identity characteristic information. As described in step S23, the identity characteristic information includes personalized information about the user, for example the user's gender, age, personality and hobbies.

S73, the processing unit 205 performs expression recognition according to the image input signal.

It can be understood that the processing unit 205 can use the speech recognition method shown in Fig. 5 to recognize the user's expression from the user's facial movements, and to recognize the user's current mood from the expression.

S74, the processing unit 205 performs speech recognition according to the voice input signal and the image input signal.

Optionally, the processing unit 205 recognizes the user's voice input signal using the speech recognition method shown in Fig. 5.

S75, the processing unit 205 performs semantic understanding according to the identity characteristic information, the expression recognition result and the speech recognition result.

It can be understood that, from the user's identity characteristic information, mood and speech recognition result, the processing unit 205 can accurately understand the user's meaning and intention.

S76, the processing unit 205 generates a feedback result according to the result of the semantic understanding, the identity characteristic information and the position of the user in the vehicle.

Optionally, the processing unit 205 obtains the position of the user in the vehicle using the in-vehicle localization method shown in Fig. 4.

The feedback result may be voice feedback, display feedback, or a combination of the two. The feedback result further includes the orientation angle that the voice feedback and the display feedback should use; for example, the feedback result is output toward the position of the user in the vehicle. The display unit 204 is configured to output the display feedback in text and/or image form, and the voice output unit 203 is configured to output the voice feedback as a simulated human voice or a machine voice.
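The content of such a feedback result can be pictured as a small record that bundles the reply with its output channels and the region it should face. The sketch below is only one possible representation; the class name, field names and function signature are hypothetical and do not come from the embodiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedbackResult:
    voice_text: Optional[str]    # content for the voice output unit, if any
    display_text: Optional[str]  # content for the display unit, if any
    facing_region: str           # position region toward which output is directed


def generate_feedback(reply_text, user_region, use_voice=True, use_display=True):
    """Bundle a reply with its output channels and the user's position region."""
    if not (use_voice or use_display):
        raise ValueError("at least one output channel is required")
    return FeedbackResult(
        voice_text=reply_text if use_voice else None,
        display_text=reply_text if use_display else None,
        facing_region=user_region,
    )
```

A caller could, for instance, request voice-only feedback for a rear-seat occupant with `generate_feedback("Window opened", "rear left", use_display=False)`.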

In this embodiment, multi-dimensional semantic understanding is realized from the user's identity characteristic information, expression recognition result and speech recognition result, so that a personalized feedback result can be generated, which significantly improves the user experience.

The foregoing mainly describes the solutions provided by the embodiments of the present invention from the perspective of the interaction between the parts in each step. It can be understood that, to realize the above functions, each part contains the hardware structures and/or software modules corresponding to each function. Those skilled in the art should readily appreciate that, in combination with the parts and algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.

The steps of the methods or algorithms described in the embodiments of the present invention may be embedded directly in hardware, in a software module executed by a processing unit, or in a combination of the two. The software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. Illustratively, the storage medium may be connected to the processing unit so that the processing unit can read information from, and write information to, the storage medium. Optionally, the storage medium may also be integrated into the processing unit. The processing unit and the storage medium may be arranged in an ASIC, and the ASIC may be arranged in an operating terminal device.

Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, these functions may be stored on a computer-readable medium, or transmitted over a computer-readable medium, in the form of one or more instructions or code. Computer-readable media include computer storage media and communication media that facilitate the transfer of a computer program from one place to another.

The specific embodiments described above further explain the purpose, technical solutions and beneficial effects of the present invention in detail. It should be understood that the foregoing is merely specific embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, improvement and the like made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A vehicle-mounted terminal equipment, characterized by comprising:

a voice-input unit, configured to acquire a voice input signal;

an image input unit, configured to acquire an image input signal, the image input signal including one or more of a facial image signal, an expression image signal, a lip image signal and a pupil image signal;

a voice output unit, configured to generate a sound output signal;

a display unit, configured to display interactive information;

a processing unit, configured to control the voice-input unit, the image input unit, the voice output unit and the display unit, and to process the voice input signal and the image input signal; wherein the processing unit includes a machine learning model establishing unit, and the machine learning model establishing unit establishes a machine learning model for one or more of the facial image signal, the expression image signal, the lip image signal and the pupil image signal;

a communication unit, configured to connect with a cloud service equipment.

2. The vehicle-mounted terminal equipment according to claim 1, characterized in that the voice-input unit is further configured to remove or reduce noise.

3. The vehicle-mounted terminal equipment according to any one of claims 1-2, characterized in that the vehicle-mounted terminal equipment further comprises a sound-emitting unit, the sound-emitting unit being configured to emit the sound output signal.

4. A vehicle-mounted interactive system, characterized by comprising:

a cloud service equipment and the vehicle-mounted terminal equipment according to any one of claims 1-3.

5. A personal identification method, based on the vehicle-mounted terminal equipment according to any one of claims 1-4, characterized in that the method comprises:

the voice-input unit and the image input unit acquire the voice input signal and the image input signal, respectively;

the processing unit extracts the facial features in the image input signal;

the processing unit performs facial recognition and matching according to the facial features, and determines a user identity and identity characteristic information associated with the user identity, wherein the identity characteristic information includes voiceprint information;

the processing unit extracts the voiceprint features in the voice input signal;

the processing unit compares the voiceprint features with the voiceprint information, and verifies the user identity through the comparison.

6. An in-vehicle localization method, based on the vehicle-mounted terminal equipment according to any one of claims 1-4, characterized in that the method comprises:

the image input unit acquires the image input signal;

the processing unit extracts the lip motion of the user in the image input signal;

the processing unit determines the position region of the user in the vehicle according to the lip motion and the mapping relationship between the in-vehicle position regions and the viewing-angle range of the image input unit.

7. The method according to claim 6, characterized in that the method further comprises:

the voice-input unit acquires the voice input signal;

the processing unit performs sound source localization according to the voice input signal and determines the position of the user in the vehicle.

8. A speech recognition method, based on the vehicle-mounted terminal equipment according to claims 1-4, characterized in that the method comprises:

the processing unit performs lip reading recognition and expression recognition according to the image input signal;

the processing unit performs speech recognition according to the voice input signal;

the processing unit performs a weighted synthesis of the results of the lip reading recognition, the expression recognition and the speech recognition, and generates output text.

9. A feedback generation method, based on the vehicle-mounted terminal equipment according to claims 1-4, characterized in that the method comprises:

the processing unit determines a user identity and identity characteristic information associated with the user identity according to the voice input signal and the image input signal;

the processing unit performs expression recognition according to the image input signal;

the processing unit performs speech recognition according to the voice input signal and the image input signal;

the processing unit performs semantic understanding according to the identity characteristic information, the expression recognition result and the speech recognition result;

the processing unit generates a feedback result according to the result of the semantic understanding, the identity characteristic information and the position of the user in the vehicle.