EP3274809A1 - Procédé de commande, dispositif de commande, système et véhicule automobile comprenant un tel dispositif de commande - Google Patents

Procédé de commande, dispositif de commande, système et véhicule automobile comprenant un tel dispositif de commande

Info

Publication number
EP3274809A1
Authority
EP
European Patent Office
Prior art keywords
user
analysis
image
analysis module
acquired
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP16722267.8A
Other languages
German (de)
English (en)
French (fr)
Inventor
Achraf BEN-HAMADOU
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Valeo Comfort and Driving Assistance SAS
Original Assignee
Valeo Comfort and Driving Assistance SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Valeo Comfort and Driving Assistance SAS filed Critical Valeo Comfort and Driving Assistance SAS
Publication of EP3274809A1 publication Critical patent/EP3274809A1/fr
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60KARRANGEMENT OR MOUNTING OF PROPULSION UNITS OR OF TRANSMISSIONS IN VEHICLES; ARRANGEMENT OR MOUNTING OF PLURAL DIVERSE PRIME-MOVERS IN VEHICLES; AUXILIARY DRIVES FOR VEHICLES; INSTRUMENTATION OR DASHBOARDS FOR VEHICLES; ARRANGEMENTS IN CONNECTION WITH COOLING, AIR INTAKE, GAS EXHAUST OR FUEL SUPPLY OF PROPULSION UNITS IN VEHICLES
    • B60K35/00Instruments specially adapted for vehicles; Arrangement of instruments in or on vehicles
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60KARRANGEMENT OR MOUNTING OF PROPULSION UNITS OR OF TRANSMISSIONS IN VEHICLES; ARRANGEMENT OR MOUNTING OF PLURAL DIVERSE PRIME-MOVERS IN VEHICLES; AUXILIARY DRIVES FOR VEHICLES; INSTRUMENTATION OR DASHBOARDS FOR VEHICLES; ARRANGEMENTS IN CONNECTION WITH COOLING, AIR INTAKE, GAS EXHAUST OR FUEL SUPPLY OF PROPULSION UNITS IN VEHICLES
    • B60K35/00Instruments specially adapted for vehicles; Arrangement of instruments in or on vehicles
    • B60K35/10Input arrangements, i.e. from user to vehicle, associated with vehicle functions or specially adapted therefor
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60KARRANGEMENT OR MOUNTING OF PROPULSION UNITS OR OF TRANSMISSIONS IN VEHICLES; ARRANGEMENT OR MOUNTING OF PLURAL DIVERSE PRIME-MOVERS IN VEHICLES; AUXILIARY DRIVES FOR VEHICLES; INSTRUMENTATION OR DASHBOARDS FOR VEHICLES; ARRANGEMENTS IN CONNECTION WITH COOLING, AIR INTAKE, GAS EXHAUST OR FUEL SUPPLY OF PROPULSION UNITS IN VEHICLES
    • B60K35/00Instruments specially adapted for vehicles; Arrangement of instruments in or on vehicles
    • B60K35/20Output arrangements, i.e. from vehicle to user, associated with vehicle functions or specially adapted therefor
    • B60K35/21Output arrangements, i.e. from vehicle to user, associated with vehicle functions or specially adapted therefor using visual output, e.g. blinking lights or matrix displays
    • B60K35/211Output arrangements, i.e. from vehicle to user, associated with vehicle functions or specially adapted therefor using visual output, e.g. blinking lights or matrix displays producing three-dimensional [3D] effects, e.g. stereoscopic images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/24Speech recognition using non-acoustical features
    • G10L15/25Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60KARRANGEMENT OR MOUNTING OF PROPULSION UNITS OR OF TRANSMISSIONS IN VEHICLES; ARRANGEMENT OR MOUNTING OF PLURAL DIVERSE PRIME-MOVERS IN VEHICLES; AUXILIARY DRIVES FOR VEHICLES; INSTRUMENTATION OR DASHBOARDS FOR VEHICLES; ARRANGEMENTS IN CONNECTION WITH COOLING, AIR INTAKE, GAS EXHAUST OR FUEL SUPPLY OF PROPULSION UNITS IN VEHICLES
    • B60K2360/00Indexing scheme associated with groups B60K35/00 or B60K37/00 relating to details of instruments or dashboards
    • B60K2360/148Instrument input by voice
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60KARRANGEMENT OR MOUNTING OF PROPULSION UNITS OR OF TRANSMISSIONS IN VEHICLES; ARRANGEMENT OR MOUNTING OF PLURAL DIVERSE PRIME-MOVERS IN VEHICLES; AUXILIARY DRIVES FOR VEHICLES; INSTRUMENTATION OR DASHBOARDS FOR VEHICLES; ARRANGEMENTS IN CONNECTION WITH COOLING, AIR INTAKE, GAS EXHAUST OR FUEL SUPPLY OF PROPULSION UNITS IN VEHICLES
    • B60K2360/00Indexing scheme associated with groups B60K35/00 or B60K37/00 relating to details of instruments or dashboards
    • B60K2360/149Instrument input by detecting viewing direction not otherwise provided for
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/038Indexing scheme relating to G06F3/038
    • G06F2203/0381Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics

Definitions

  • Control method, control device, system and motor vehicle comprising such a control device
  • the present invention generally relates to the field of human-machine interfaces allowing a user to interact without contact with electrical or electronic devices.
  • It relates more particularly to a control method, a control device, a system and a motor vehicle comprising such a control device.
  • Auxiliary devices such as a radio, a GPS location system or a mobile phone are devices that the driver of the vehicle may be led to control while driving the motor vehicle.
  • More and more auxiliary electronic devices equip motor vehicles to make them easier or more pleasant to use, such as GPS location and navigation assistance devices, air-conditioning devices for the passenger compartment of the vehicle, electric window opening and closing devices, obstacle viewing and detection devices, or the radio.
  • Such devices are generally controlled by the driver of the vehicle through buttons or a touch screen. To do so, he uses at least one of his hands, which distracts him from driving the vehicle itself and means that only one of his hands is then available for manoeuvring.
  • A non-contact control device, allowing the user to control an auxiliary device without having to use one of his hands for this purpose, is therefore particularly advantageous.
  • A non-contact control device may for example be based on an automatic speech recognition system by which oral instructions given by the driver to control an auxiliary device are identified.
  • Document US 8442820 discloses an interface for contactless control of an auxiliary device present in a motor vehicle.
  • An oral instruction given by a user to control such a device is identified by the interface by an automatic speech recognition method.
  • Such an oral instruction can also be determined by reading the user's lips, that is to say by analyzing images of his lips acquired while he pronounces such an instruction.
  • The interface then combines the two instructions identified by each of these two methods, giving priority to the instruction identified by speech recognition, to produce a command for controlling such an auxiliary device.
  • the invention proposes a method for controlling a device by a user, comprising the following steps:
  • identification, by analysis of at least one image comprising at least part of the user's face and acquired by an image sensor, of a device located in the user's viewing direction,
  • the command associated with the determined oral instruction is transmitted to the device for which it is intended.
  • This arrangement is particularly advantageous when several devices to be controlled are present in the environment of the user.
  • Oral instructions such as "start-up" or "switch-off" may be common to several of these devices, and it is then crucial to transmit the determined oral instruction specifically to the device for which it is intended.
  • Combining information drawn from the user's viewing direction with an orally given instruction makes it possible to control a device more reliably and more precisely, since several pieces of control information of different origins, but aimed at the same purpose, are thus combined.
  • the invention also proposes that said oral instruction is determined by analyzing data acquired after the identification of the device and acquired for a predetermined duration.
  • The search for an oral instruction given by the user thus begins only after the user has looked at a device to be controlled, and only extends over a predetermined period. Selecting such a time interval advantageously limits the identification errors of an oral command, which could otherwise occur more frequently, for example during conversations between the driver and a passenger of the motor vehicle.
  • said spoken instruction is determined by analysis of images comprising at least a portion of the user's face acquired by the image sensor; thanks to such a method, a user who remains silent, or who has difficulty articulating words, can control such a device without having to use one of his hands for this purpose;
  • said oral instruction is determined by analysis of an acoustic signal
  • said oral instruction is determined by comparison of acquired data with selected reference data as a function of the previously identified device.
  • The reference data making it possible to identify the oral instruction can thus be selected so as to correspond to oral instructions specifically associated with the device to be controlled, which has been identified beforehand. This arrangement makes the identification of the oral instruction more reliable and efficient, because it is thus optimally adapted to each of the devices that can be controlled by the user.
  • the invention also proposes a control device comprising an image sensor and an analysis module designed: to identify a device located in a viewing direction of the user by analyzing at least one image comprising at least a portion of the face of a user and acquired by the image sensor,
  • a control device in which the image sensor is a video camera sensitive to visible domain radiation and infrared radiation is also provided.
  • the analysis module is designed to identify a device located in a viewing direction of the user by joint analysis of at least one image comprising at least a portion of the face of a user acquired by the image sensor in the infrared range, and of at least one image comprising at least a portion of the face of a user simultaneously acquired by the image sensor in the visible range.
  • the combined use of an image acquired in the visible range and an image acquired in the infrared range facilitates the identification of a device located in a viewing direction of the user.
  • the analysis module is designed to repeat said analysis of at least one image comprising at least a part of a user's face (100) as long as a device located in the user's viewing direction (100) has not been identified;
  • the analysis module is designed to determine a position and an orientation of the user's eyes by said analysis of at least one image comprising at least part of the user's face, and is furthermore designed to identify a device located in a viewing direction of the user according to said position and said orientation;
  • the analysis module is designed to acquire data after the identification of said device, for a predetermined duration, and to determine said oral instruction by analysis of said acquired data;
  • the analysis module is designed to determine said spoken instruction by image analysis comprising at least a portion of the user's face acquired by the image sensor;
  • the analysis module is designed to determine said spoken instruction by analyzing an acoustic signal
  • the analysis module is designed to determine said spoken instruction by comparing acquired data with selected reference data according to the previously identified device;
  • the analysis module is designed to determine said spoken instruction by combining the result of the analysis of an acoustic signal with the result of the analysis of images comprising at least part of the user's face acquired by the image sensor.
  • a system comprising an image sensor, an analysis module, a first device and a second device, in which the analysis module is designed:
  • the invention also proposes a motor vehicle equipped with a control device as presented above.
  • FIG. 1 shows schematically a device in which a method according to the invention can be implemented, this device being in this example embedded in a motor vehicle.
  • FIG. 2 schematically represents the main steps of such a method
  • FIG. 3 schematically shows one of these steps in more detail.
  • FIG. 1 diagrammatically shows a motor vehicle 130 and a user 100, here the driver of the motor vehicle 130.
  • the latter is required to control one or more auxiliary devices, for example:
  • a first device 121 such as a location device 121 using GPS signals (acronym for "Global Positioning System"), and
  • a second device 122 such as a radio set.
  • Other auxiliary devices may for example comprise a device controlling the headlights of the motor vehicle 130 or a device controlling the opening or closing of windows of the motor vehicle 130.
  • An image sensor 111 such as a video camera, is placed facing the face of the user 100, as shown in FIG. 1.
  • The image sensor 111 is designed and placed so that the face of the user is fully included in the field of view of the image sensor 111. It may be advantageous here to use a so-called multi-modal image sensor 111, that is to say one sensitive both to radiation in the visible range and, for example, to radiation in the infrared range. Indeed, as described hereinafter, one of the steps of the method proposed here is to locate the position of the user's face within the acquired images.
  • The information provided by additional images taken in the infrared range facilitates this localization, in particular because the body of the user (especially his face) has a temperature higher than that of his environment, and thus stands out in an image taken in the infrared range. Thanks to the images acquired in the infrared range, the user can also control the devices 121 and 122 according to the method described below at night, or in contexts in which the brightness is too low for the images acquired in the visible range to be usable.
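  • As an illustration of how the infrared and visible images can be used together, the following Python sketch (not taken from the patent; the threshold, the single-blob assumption and the name coarse_face_region are illustrative) locates the warmest region of a registered infrared frame so that the search for the face in the visible-range image can be restricted to that window:

```python
import numpy as np

def coarse_face_region(ir_frame, rel_threshold=0.8):
    """Locate the warmest blob in a registered infrared frame and return its
    bounding box, to be used as a search window in the visible-range frame.

    ir_frame: 2-D numpy array of IR intensities (hotter = larger values).
    The threshold and the single-blob assumption are illustrative only.
    """
    threshold = ir_frame.min() + rel_threshold * (ir_frame.max() - ir_frame.min())
    mask = ir_frame >= threshold                      # warm pixels (face, hands...)
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None                                   # nothing warm enough in view
    return (xs.min(), ys.min(), xs.max(), ys.max())   # (left, top, right, bottom)

# The visible-range analysis (landmark detection, pose fitting) can then be
# restricted to this window, which also keeps working in low-light scenes.
```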
  • the images acquired by the image sensor 111 are transmitted, in the form of a video stream v, to an analysis module 113.
  • An acoustic signal sensor 112, for example a microphone, is placed near the image sensor 111, facing the user 100. It delivers an electrical signal s representative of the acoustic signals it measures.
  • An analog-to-digital converter 116 converts the electrical signal s into digital form. The audio stream "a" thus produced is transmitted to the analysis module 113.
  • the analysis module 113 comprises a processor 114, such as a microprocessor, and a storage module 115, such as a rewritable non-volatile memory.
  • The analysis module 113 processes the audio stream "a" and the video stream v to identify in these streams, according to a method detailed below in the description of FIGS. 2 and 3, an oral instruction intended to control one of the previously mentioned devices. If such an instruction is identified, it is transmitted to one of the devices 121 or 122, as explained below.
  • a method according to the invention can be implemented in a control device 110 comprising the analysis module 113, the image sensor 111, the acoustic signal sensor 112 and the analog-digital converter 116 associated therewith.
  • the analysis module 113 analyzes the corresponding video stream v, during the steps ED1 to ED4, to detect whether the eyes of the user are directed to one of the devices 121 or 122.
  • the user's face is first identified in an IM image extracted from this video stream v, during step ED1.
  • the analysis of this image makes it possible to determine the position and orientation of the user's face, which is identified by a set of geometric parameters pv.
  • These parameters pv can for example correspond to coordinates locating in the image IM the position of characteristic points of said face.
  • characteristic points may for example correspond to the ends of the chin or eyebrows, or to the nostrils of the user 100. They may also correspond to the vertices of a polyhedron in which the user's face is inscribed.
  • the pv parameters make it possible to locate the face of the user in a three-dimensional way.
  • The three-dimensional character of the identification of the user's face can be obtained by optimizing the match between the two-dimensional image IM and a three-dimensional model of the user's face.
  • This three-dimensional model may for example comprise data representative of the geometry of rigid parts of the user's face, such as coordinates of points identifying the eyebrows, nose, or temples.
  • the processor 114 can for example calculate by projection, for a given orientation and at a given position of the three-dimensional model, the positions of such characteristic points in the two-dimensional image IM.
  • The position and the orientation of the user's face are then determined by seeking the position and the orientation of the three-dimensional model which optimize the match between the positions of the characteristic points resulting from the three-dimensional model and the positions of the same characteristic points identified in the image IM.
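  • A minimal sketch of such a model-fitting step is given below. It assumes a simple pinhole projection with arbitrary intrinsics (focal length, image center) and a least-squares fit with scipy; these choices, and the function names, are illustrative rather than the exact optimization used here:

```python
import numpy as np
from scipy.optimize import least_squares

def rotation_matrix(yaw, pitch, roll):
    """Rotation matrix built from three Euler angles (radians)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def project(points_3d, pose, focal=800.0, center=(320.0, 240.0)):
    """Pinhole projection of 3-D model points for a given pose.
    pose = (yaw, pitch, roll, tx, ty, tz); focal and center are assumed intrinsics."""
    R = rotation_matrix(*pose[:3])
    t = np.asarray(pose[3:], dtype=float)
    cam = np.asarray(points_3d, dtype=float) @ R.T + t   # model frame -> camera frame
    return focal * cam[:, :2] / cam[:, 2:3] + np.asarray(center)

def fit_pose(model_points, image_points, init=(0.0, 0.0, 0.0, 0.0, 0.0, 600.0)):
    """Find the pose that best matches the projected model landmarks to the
    landmarks detected in the 2-D image IM (least-squares reprojection error)."""
    image_points = np.asarray(image_points, dtype=float)
    def residuals(pose):
        return (project(model_points, pose) - image_points).ravel()
    return least_squares(residuals, np.asarray(init)).x   # (yaw, pitch, roll, tx, ty, tz)
```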
  • During step ED2, data ori representative of the position and orientation of the eyes of the user are obtained by analyzing the image IM.
  • the data pv making it possible to locate the face of the user can be used during step ED2 to extract from the IM image the zone corresponding to the eyes of the user.
  • An image of reduced size occupied mainly by the image of the eyes of the user is thus obtained, and then analyzed to determine the position and orientation of each of the eyeballs of the user.
  • The orientation of an eyeball of the user with respect to his face can for example be deduced from the position, in the image IM, of the center of the pupil of this eyeball with respect to the ends of the user's eyelids.
  • Analyzing images whose size has been reduced by selecting mainly the area occupied by the eyes of the user decreases the computation time required for this processing.
  • The reliability of the latter is also improved because, when analyzing such an image, mainly occupied by the images of the eyes of the user, the analysis module cannot confuse, for example, the centers of the pupils of the user's eyes with the centers of his nostrils.
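  • The following sketch illustrates, under assumptions of our own (a linear mapping, and pixel coordinates of the pupil center and of the two eye corners already detected in the eye thumbnail), how a horizontal eyeball orientation could be deduced from the position of the pupil relative to the ends of the eyelid:

```python
import numpy as np

def eye_orientation(pupil_center, inner_corner, outer_corner, max_angle_deg=45.0):
    """Rough horizontal orientation of one eyeball relative to the face, deduced
    from the pupil position between the two eye corners in the eye thumbnail
    (illustrative linear mapping, not the exact method of the patent).
    All inputs are (x, y) pixel coordinates inside the cropped eye image."""
    inner = np.asarray(inner_corner, dtype=float)
    outer = np.asarray(outer_corner, dtype=float)
    pupil = np.asarray(pupil_center, dtype=float)
    axis = outer - inner
    # Position of the pupil along the eye axis: 0.0 = inner corner, 1.0 = outer corner.
    s = np.dot(pupil - inner, axis) / np.dot(axis, axis)
    # Map the offset from the middle of the eye to a signed yaw angle (degrees).
    return (np.clip(s, 0.0, 1.0) - 0.5) * 2.0 * max_angle_deg
```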
  • The analysis module 113 then determines whether the user's eyes are turned towards one of the devices 121 or 122. For this, the data pv making it possible to locate the face of the user can be combined with the data ori representative of the position and orientation of his eyes, to determine towards which points in space they are turned.
  • The zone occupied in space by each of said devices is also known (the corresponding data pod are stored in the storage module 115), which finally makes it possible to determine whether the user's eyes are turned in the direction of one or the other of the devices 121 or 122.
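  • A possible implementation of this test is sketched below, assuming that each device zone is stored as an axis-aligned box in the camera frame and that the gaze is available as a ray (eye position plus direction). The slab test and the rd convention (0 = no device) follow the description above; the coordinates shown in the final comment are hypothetical:

```python
import numpy as np

def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: does the gaze half-line intersect an axis-aligned zone?"""
    direction = np.where(direction == 0, 1e-9, direction)   # avoid division by zero
    t1 = (box_min - origin) / direction
    t2 = (box_max - origin) / direction
    t_near = np.max(np.minimum(t1, t2))
    t_far = np.min(np.maximum(t1, t2))
    return t_far >= max(t_near, 0.0)

def gazed_device(eye_position, gaze_direction, device_zones):
    """Return the identifier of the device whose stored zone is crossed by the
    gaze ray, or 0 if none (the convention used for the variable rd)."""
    for device_id, (zone_min, zone_max) in device_zones.items():
        if ray_hits_box(np.asarray(eye_position, dtype=float),
                        np.asarray(gaze_direction, dtype=float),
                        np.asarray(zone_min, dtype=float),
                        np.asarray(zone_max, dtype=float)):
            return device_id
    return 0

# Example with hypothetical coordinates (metres, camera frame):
# device_zones = {1: ((0.2, -0.1, 0.6), (0.5, 0.1, 0.8)),    # GPS unit 121
#                 2: ((-0.5, -0.2, 0.5), (-0.2, 0.0, 0.7))}  # radio 122
```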
  • the image analysis technique implemented is static, in that an IM image of the video stream v is analyzed independently of the other images of this stream.
  • Another possibility, to determine whether the user's eyes are turned to one of the devices 121 or 122, is to perform a dynamic analysis of a sequence of several images.
  • Changes occurring from one image to the next make it possible to detect movements of the face or movements of the user's eyes associated with a change in the direction of his gaze.
  • Such movements can for example be detected by following the evolution, in such a sequence of images, of the position of characteristic points of the face such as the end of the chin, the ends of the eyebrows, or the centers of the pupils.
  • The analysis module 113 can then analyze such changes in the user's viewing direction, associated with said movements, to determine whether he is looking at one of the devices 121 or 122. In the situation shown in FIG. 1, the analysis module 113 may for example determine that the driver has turned his eyes towards the device 122 when a marked movement of his eyes to his right is detected.
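  • A very simple sketch of such a dynamic analysis, assuming that pupil positions are already tracked per image and using an arbitrary displacement threshold, is given below:

```python
import numpy as np

def detect_gaze_shift(pupil_positions, axis=0, min_shift_px=15.0):
    """Dynamic variant: flag a marked movement of the pupils between the first
    and last image of a short sequence. The sign convention (positive along the
    chosen image axis) and the pixel threshold are assumptions.

    pupil_positions: list of (x, y) pupil positions, one per image of the sequence.
    Returns the detected shift in pixels, or 0.0 if no marked movement.
    """
    if len(pupil_positions) < 2:
        return 0.0
    first = np.asarray(pupil_positions[0], dtype=float)
    last = np.asarray(pupil_positions[-1], dtype=float)
    shift = last[axis] - first[axis]
    return shift if abs(shift) >= min_shift_px else 0.0
```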
  • the result of the image analysis implemented during steps ED1 to ED3 is summarized by the value assigned to a variable rd.
  • This value indicates whether one of the devices 121 or 122 has been targeted by the user's gaze, and, if so, indicates which device was targeted (and therefore selected by the user as explained below).
  • The value 0 will be assigned to the variable rd if the user's eyes are not turned towards any of the auxiliary devices, the value 1 will be assigned to it if the eyes of the user are turned towards the GPS location device 121, and the value 2 will be assigned to it if the eyes of the user are turned towards the radio 122.
  • The user's gaze tracking process thus continues as long as none of the devices 121 or 122 has been selected (by orientation of the gaze of the user 100 in the direction of a device 121, 122).
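  • This tracking loop can be summarized as follows, where next_frame and analyse_frame are placeholders for the video stream access and for the analysis of steps ED1 to ED3 described above:

```python
def wait_for_device_selection(next_frame, analyse_frame):
    """Repeat the image analysis (steps ED1 to ED3) as long as rd == 0, i.e. as
    long as the user's gaze has not selected any device. `next_frame` yields
    images from the video stream v and `analyse_frame` returns rd (0, 1 or 2);
    both are placeholders, not functions defined by the patent."""
    rd = 0
    while rd == 0:
        image = next_frame()
        rd = analyse_frame(image)   # 0: no device, 1: GPS unit 121, 2: radio 122
    return rd
```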
  • When it is determined, during step ED4 carried out following step ED3, that the user has looked at one of these devices, step EIC is carried out, during which:
  • the audio stream "a” is acquired and processed by the analysis module 113, in order to detect sound sequences corresponding to oral instructions intended to control the device identified at the end of the step ED3 (and designated by the value of variable rd), and
  • the video stream v is simultaneously acquired and processed by the analysis module 113 to detect, in the image stream of the lips of the user 100, sequences corresponding to such instructions.
  • the audio and video streams are acquired for a limited (predetermined) duration. This duration can reach a few seconds; it is for example less than 10 s.
  • The analysis of these streams can be carried out continuously, as they are acquired.
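  • A sketch of this time-limited, incremental analysis is given below; read_chunk and process_chunk are placeholders, and the 5-second default is only one example of a duration of a few seconds:

```python
import time

def acquire_for_duration(read_chunk, process_chunk, duration_s=5.0):
    """Acquire and analyse the audio/video streams for a predetermined duration
    (a few seconds, e.g. under 10 s) once a device has been selected by gaze.
    `read_chunk` and `process_chunk` stand in for the stream access and the
    incremental analysis of step EIC; a result may be returned early."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        chunk = read_chunk()
        result = process_chunk(chunk)   # analysis carried out as data arrive
        if result is not None:          # an oral instruction has been identified
            return result
    return None                         # nothing identified within the window
```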
  • The analysis module 113 finally combines the results of the two analyses, audio and video, mentioned above, to determine whether the user 100 has given an oral instruction intended to control the device identified at the end of step ED3, and which instruction it is.
  • This instruction c0 is converted during step EC into a command cmd adapted to control the device identified during step ED3.
  • the command cmd is then transmitted to the latter by the analysis module 113.
  • The device can then execute this command cmd. For example, if the device identified in step ED3 is the radio 122, and if the determined instruction c0 is the "start" instruction, the radio 122 may turn on upon receiving the cmd command if it was previously off or in the standby state.
  • The cmd command received by the device can be processed and transformed by the device before being executed, for example to merge it with commands received directly via a touch screen fitted to said device.
  • The search for an oral instruction in the audio and video streams thus begins only if the user has looked at one of the devices 121 or 122, and it is limited in time. Selecting such a time interval advantageously limits the identification errors of an oral command, which could otherwise occur more frequently, for example during conversations between the driver 100 and a passenger of the motor vehicle 130.
  • The analysis of the audio stream "a" carried out during step EIC begins with a step EA1, which makes it possible to test whether this stream is usable or not. When the sound level of this stream is very low, for example, or when it mainly contains noise, it is not useful to exploit it.
  • The processing of the audio stream "a" then stops at this stage, and a variable pca indicates that no instruction has been identified in it.
  • If the characteristics of the audio stream "a" indicate that it is usable, its processing continues with a filtering step (step EA2) which makes it possible to reduce the contribution of noise in the stream "a".
  • This filtering can for example be performed by a Wiener filter, or consist of a linear filtering favoring a determined frequency range.
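  • As an illustration, a linear filtering favouring the speech frequency range could be implemented as follows (sampling rate and band edges are assumptions; a Wiener filter such as scipy.signal.wiener could be used instead, as mentioned above):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def denoise_audio(a, fs=16000, band=(80.0, 4000.0), order=4):
    """Illustrative noise reduction for the audio stream "a": a linear band-pass
    filter favouring a determined frequency range (here, typical speech
    frequencies). The sampling rate and band edges are assumptions."""
    sos = butter(order, band, btype='bandpass', fs=fs, output='sos')
    return sosfilt(sos, np.asarray(a, dtype=float))
```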
  • the audio stream af resulting from this filtering is processed during step EA3, to produce data representative of its content, hereinafter referred to as descriptors.
  • the analysis module can produce such a descriptor da by identifying phonemes in the af audio stream, that is to say by detecting the presence of a phoneme, by determining which phoneme it is, by identifying its position in the audio stream af and assigning a loudness.
  • the set of descriptors da thus extracted is then compared to reference audio descriptors dar, during step EIA.
  • The reference audio descriptors dar may include a list of phoneme sequences, each sequence of phonemes (for example the sequence / status /) corresponding to an oral instruction intended to control one of the devices 121 or 122 (this instruction being here the "start" instruction).
  • the comparison of the descriptors da and dar makes it possible to identify, in the audio stream "a", such an oral instruction.
  • The result of step EIA is summarized by data ca which indicate whether an oral instruction has been identified in the audio stream "a", which instruction it is if so, and which may also indicate a level of reliability associated with the identification of this instruction.
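  • The comparison of the extracted descriptors da with the reference descriptors dar can be sketched as follows, under the assumption that the descriptors are reduced to phoneme sequences and that an edit-distance score serves as the reliability level (both are assumptions, not the matching actually used):

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (pa != pb)))  # substitution
        prev = cur
    return prev[-1]

def identify_instruction(da, dar):
    """Compare the phoneme descriptors da extracted from the audio stream with
    the reference audio descriptors dar of the previously identified device.
    dar: {instruction: reference phoneme sequence}. Returns (instruction,
    reliability) or (None, 0.0); the scoring rule is an assumption."""
    best, best_score = None, 0.0
    for instruction, reference in dar.items():
        dist = edit_distance(da, reference)
        score = 1.0 - dist / max(len(da), len(reference), 1)
        if score > best_score:
            best, best_score = instruction, score
    return best, best_score

# Hypothetical example for the radio 122:
# dar_radio = {"start": ["s", "t", "a", "r", "t"], "stop": ["s", "t", "o", "p"]}
# identify_instruction(["s", "t", "a", "r"], dar_radio)  ->  ("start", 0.8)
```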
  • A set of descriptors dr, accessed by the processor 114, is stored in the storage module 115. This set contains all the audio descriptors and video descriptors corresponding to oral instructions for controlling the various auxiliary devices present in the vehicle, for example those mentioned above.
  • The processor selects, from these descriptors dr, audio descriptors dar and video descriptors dvr which both correspond to instructions specifically associated with the device identified in the previous step ED3.
  • Comparing the audio descriptors da extracted from the audio stream "a" with reference descriptors dar specifically associated with the previously identified device here makes the identification of an oral instruction reliable and effective, because it is thus optimally adapted to each of the devices that can be controlled by the user.
  • the analysis module 113 processes the video stream v to identify oral instructions given by the user, from the images of his lips, as is explained below.
  • This processing begins with a step EL1 in which three-dimensional representations ib describing the geometry of the lips of the user are obtained from the video stream v.
  • the analysis module first proceeds to a three-dimensional identification of the face of the user, through an analysis of the images of the video stream v similar to that made during step ED1. This identification is carried out for all the images that comprise the video stream acquired during the EIC stage.
  • the three-dimensional identification of the user's face in the field of view of the image sensor 111 makes it possible in particular to identify the position of the lips of the user and thus to then associate with each image a rectangular parallelepiped positioned three-dimensionally around them.
  • The position of each vertex of this rectangular parallelepiped in said (two-dimensional) image is then determined by projection, which makes it possible to delimit a reduced area (thumbnail) of said image, occupied mainly by an image of the lips of the user.
  • The reduced-size images thus extracted are then analyzed to produce a set ib of three-dimensional representations describing the geometry of the lips of the user over the entire duration during which the video stream has been acquired.
  • Such a three-dimensional representation can be obtained by a process similar to that used to obtain a three-dimensional representation of the face of the user, which is described above.
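  • The extraction of the lip thumbnail by projection of the parallelepiped vertices can be sketched as follows (the projection callable is the same kind of pinhole projection as used for the face pose fit above; the names and the clipping policy are illustrative):

```python
import numpy as np

def lip_thumbnail(image, box_vertices_3d, pose, project):
    """Delimit the thumbnail occupied mainly by the user's lips: the eight
    vertices of the rectangular parallelepiped positioned around the lips are
    projected into the 2-D image and their bounding rectangle is cropped.
    `project` is a callable mapping 3-D points and a pose to 2-D pixel points."""
    pts_2d = project(np.asarray(box_vertices_3d, dtype=float), pose)
    x_min, y_min = np.floor(pts_2d.min(axis=0)).astype(int)
    x_max, y_max = np.ceil(pts_2d.max(axis=0)).astype(int)
    h, w = image.shape[:2]
    x_min, x_max = max(x_min, 0), min(x_max, w)   # clip to the image borders
    y_min, y_max = max(y_min, 0), min(y_max, h)
    return image[y_min:y_max, x_min:x_max]
```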
  • Using images of reduced size reduces the computation time required for this processing.
  • The reliability of the latter is also improved because, when analyzing such an image, occupied mainly by the images of the lips of the user, the analysis module cannot confuse a characteristic point of the lips of the user with, for example, a point on the latter's temple.
  • During step EL2, descriptors representative of the shape and movement of the lips of the user are extracted from the three-dimensional representations ib produced in step EL1.
  • descriptors are extracted so as to be representative of geometries usually taken by lips during the pronunciation of phonemes, or to be representative of temporal evolutions of geometry corresponding to the pronunciation of phoneme sequences.
  • the process of extracting such descriptors is moreover designed to effectively discriminate two phonemes or two different phoneme sequences.
  • the descriptors thus extracted are gathered in the form of a vector of descriptors vdv.
  • During step EL3, the data contained therein are concatenated and put into a standardized form to produce a standardized video descriptor dvn whose format is adapted for comparison with the reference video descriptors dvr.
  • The standardized video descriptor dvn is compared with the reference video descriptors dvr during step EIV, to finally identify, in the video stream v, a spoken instruction given by the user.
  • the reference video descriptors dvr are previously selected from descriptors dr, and correspond to instructions specifically associated with the device previously identified during step ED3.
  • comparing the video descriptor dvn extracted from the stream v to reference descriptors dvr, specifically associated with the device previously identified, here makes the identification of an oral command reliable and efficient, because it is thus optimally adapted to each of the devices that can be controlled by the user.
  • The result of step EIV is summarized by data cv which indicate whether an oral instruction has been identified in the video stream v, which instruction it is if so, and which may also indicate a level of reliability associated with the identification of this instruction.
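  • Steps EL3 and EIV can be sketched as follows, under the assumption that the standardization is a zero-mean, unit-variance normalization and that the comparison with the reference descriptors dvr uses a cosine similarity returned as the reliability level (neither choice is specified by the description above):

```python
import numpy as np

def standardize_descriptors(vdv):
    """Step EL3 sketch: concatenate the lip descriptors gathered in the vector
    vdv and put them into a standardized form (here zero mean and unit variance,
    an assumption) so that the resulting dvn can be compared with dvr."""
    dvn = np.concatenate([np.asarray(d, dtype=float).ravel() for d in vdv])
    std = dvn.std()
    return (dvn - dvn.mean()) / std if std > 0 else dvn - dvn.mean()

def compare_to_references(dvn, dvr):
    """Step EIV sketch: return the reference instruction whose descriptor is
    closest to dvn, with a similarity used as a level of reliability.
    dvr: {instruction: reference descriptor of the same length as dvn}."""
    best, best_sim = None, -np.inf
    for instruction, ref in dvr.items():
        ref = np.asarray(ref, dtype=float)
        sim = float(dvn @ ref / (np.linalg.norm(dvn) * np.linalg.norm(ref) + 1e-9))
        if sim > best_sim:
            best, best_sim = instruction, sim
    return best, best_sim
```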
  • The analysis module 113 combines the result of the two analyses, audio and video, described above, in order finally to determine whether the user 100 has given an oral instruction intended to control the device identified at the end of step ED3, and which instruction it is.
  • This oral instruction is then retained as the oral instruction to be used to control the device identified at the end of step ED3, as explained below.
  • The instruction finally retained can be chosen according to the level of reliability associated with each of these identifications.
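  • A minimal arbitration rule combining the two results ca and cv is sketched below; the rule itself (agreement wins, otherwise the most reliable identification) is an assumption and not the fusion actually claimed:

```python
def fuse_identifications(ca, cv):
    """Combine the result of the audio analysis (ca) and of the video analysis
    (cv) to select the final instruction c0. Each result is either None or an
    (instruction, reliability) pair; this simple arbitration rule is only a
    sketch of one possible combination."""
    if ca is None and cv is None:
        return None
    if ca is None:
        return cv[0]
    if cv is None:
        return ca[0]
    if ca[0] == cv[0]:                             # both analyses agree
        return ca[0]
    return ca[0] if ca[1] >= cv[1] else cv[0]      # otherwise keep the most reliable one
```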
  • When an instruction c0 is finally determined at the end of step EF, it is then transmitted to the device identified at the end of step ED3.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Mechanical Engineering (AREA)
  • Chemical & Material Sciences (AREA)
  • Transportation (AREA)
  • Combustion & Propulsion (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)
  • Electric Propulsion And Braking For Vehicles (AREA)
EP16722267.8A 2015-03-27 2016-03-25 Procédé de commande, dispositif de commande, système et véhicule automobile comprenant un tel dispositif de commande Withdrawn EP3274809A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR1552618A FR3034215B1 (fr) 2015-03-27 2015-03-27 Procede de commande, dispositif de commande, systeme et vehicule automobile comprenant un tel dispositif de commande
PCT/FR2016/000059 WO2016156678A1 (fr) 2015-03-27 2016-03-25 Procédé de commande, dispositif de commande, système et véhicule automobile comprenant un tel dispositif de commande

Publications (1)

Publication Number Publication Date
EP3274809A1 true EP3274809A1 (fr) 2018-01-31

Family

ID=53496762

Family Applications (1)

Application Number Title Priority Date Filing Date
EP16722267.8A Withdrawn EP3274809A1 (fr) 2015-03-27 2016-03-25 Procédé de commande, dispositif de commande, système et véhicule automobile comprenant un tel dispositif de commande

Country Status (5)

Country Link
US (1) US10627898B2 (zh)
EP (1) EP3274809A1 (zh)
CN (1) CN107548483B (zh)
FR (1) FR3034215B1 (zh)
WO (1) WO2016156678A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019118089A1 (en) 2017-12-11 2019-06-20 Analog Devices, Inc. Multi-modal far field user interfaces and vision-assisted audio processing
EP3729421A1 (en) 2017-12-22 2020-10-28 Telefonaktiebolaget LM Ericsson (publ) Gaze-initiated voice control
CN108389573B (zh) * 2018-02-09 2022-03-08 北京世纪好未来教育科技有限公司 语种识别方法及装置、训练方法及装置、介质、终端

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6795806B1 (en) * 2000-09-20 2004-09-21 International Business Machines Corporation Method for enhancing dictation and command discrimination
CN1174337C (zh) * 2002-10-17 2004-11-03 南开大学 识别人眼注视与否的方法和装置及其应用
US7435227B2 (en) * 2004-09-13 2008-10-14 Biocognisafe (Bcs) Technologies Method and apparatus for generating an indication of a level of vigilance of an individual
JP4797588B2 (ja) * 2005-11-17 2011-10-19 アイシン精機株式会社 車両周辺表示装置
CN101090482B (zh) * 2006-06-13 2010-09-08 唐琎 一种基于图象处理和信息融合技术的驾驶员疲劳监测系统及方法
JP2008058409A (ja) * 2006-08-29 2008-03-13 Aisin Aw Co Ltd 音声認識方法及び音声認識装置
CN101441513B (zh) * 2008-11-26 2010-08-11 北京科技大学 一种利用视觉进行非接触式人机交互的系统
TWI384383B (zh) * 2009-03-02 2013-02-01 Univ Nat Central 視線察覺裝置及其方法
KR101092820B1 (ko) 2009-09-22 2011-12-12 현대자동차주식회사 립리딩과 음성 인식 통합 멀티모달 인터페이스 시스템
US8762846B2 (en) * 2009-11-16 2014-06-24 Broadcom Corporation Method and system for adaptive viewport for a mobile device based on viewing angle
CN101995731B (zh) * 2010-07-30 2013-03-13 西安理工大学 一种基于液晶的直视型防眩光成像仪及防眩光成像方法
KR20120057033A (ko) * 2010-11-26 2012-06-05 한국전자통신연구원 Iptv 제어를 위한 원거리 시선 추적 장치 및 방법
US20120169582A1 (en) * 2011-01-05 2012-07-05 Visteon Global Technologies System ready switch for eye tracking human machine interaction control system
US20120259638A1 (en) * 2011-04-08 2012-10-11 Sony Computer Entertainment Inc. Apparatus and method for determining relevance of input speech
US9223415B1 (en) * 2012-01-17 2015-12-29 Amazon Technologies, Inc. Managing resource usage for task performance
JP6001758B2 (ja) * 2012-04-27 2016-10-05 ヒューレット−パッカード デベロップメント カンパニー エル.ピー.Hewlett‐Packard Development Company, L.P. ユーザからのオーディオ入力
US9405982B2 (en) * 2013-01-18 2016-08-02 GM Global Technology Operations LLC Driver gaze detection system
KR20160005013A (ko) * 2013-03-01 2016-01-13 토비 에이비 지연 워프 시선 상호작용
US20140350942A1 (en) * 2013-05-23 2014-11-27 Delphi Technologies, Inc. Vehicle human machine interface with gaze direction and voice recognition
CN104253944B (zh) * 2014-09-11 2018-05-01 陈飞 基于目光连接的声音命令下达装置和方法
US10048749B2 (en) * 2015-01-09 2018-08-14 Microsoft Technology Licensing, Llc Gaze detection offset for gaze tracking models

Also Published As

Publication number Publication date
CN107548483A (zh) 2018-01-05
CN107548483B (zh) 2021-06-08
FR3034215B1 (fr) 2018-06-15
US20180203506A1 (en) 2018-07-19
US10627898B2 (en) 2020-04-21
FR3034215A1 (fr) 2016-09-30
WO2016156678A1 (fr) 2016-10-06

Similar Documents

Publication Publication Date Title
CN107209849B (zh) 眼睛跟踪
EP3485351B1 (en) Command processing using multimodal signal analysis
EP2483758B1 (fr) Systeme et procede de reconnaissance de gestes
US20210133468A1 (en) Action Recognition Method, Electronic Device, and Storage Medium
JP5989768B2 (ja) ビデオにおける改良型の顔面認識
US20170277257A1 (en) Gaze-based sound selection
JP2019533209A (ja) 運転者監視のためのシステム及び方法
KR20190129826A (ko) 생체 검측 방법 및 장치, 시스템, 전자 기기, 저장 매체
WO2017093439A1 (fr) Dispositif de commande vocale d'un appareil de capture d'images
KR102304701B1 (ko) 사용자의 음성 입력에 대한 답변을 제공하는 방법 및 장치
FR3038770A1 (fr) Systeme de surveillance de l'etat de vigilance d'un operateur
WO2016156678A1 (fr) Procédé de commande, dispositif de commande, système et véhicule automobile comprenant un tel dispositif de commande
US10679376B2 (en) Determining a pose of a handheld object
EP3901794A1 (fr) Procédé d'authentification ou d'identification d'un individu
EP3506241A1 (fr) Dispositif électronique d'interface entre au moins un système avionique et un ensemble de capteurs, installation avionique, procédé de communication et programme d'ordinateur associés
EP3866064A1 (fr) Procede d'authentification ou d'identification d'un individu
EP2881841B1 (fr) Procédé de reconnaissance continue de gestes d'un utilisateur d'un terminal mobile préhensible muni d'un ensemble capteur de mouvement, et dispositif associé
FR3029655A1 (fr) Dispositif d'entree notamment d'un vehicule automobile pour la saisie sans contact de la position et/ou d'un changement de position d'au moins un doigt d'une main de l'utilisateur
EP3070643B1 (fr) Procédé et système de reconnaissance d'objets par analyse de signaux d'image numérique d'une scène
EP3218219B1 (fr) Interface homme/machine et procédé de contrôle de fonctions d'un véhicule par détection de mouvement et/ou d'expression du conducteur
US20240046946A1 (en) Speech denoising networks using speech and noise modeling
EP3274913B1 (fr) Dispositif et procede pour l'acquisition biometrique de l'iris
EP3491488A1 (fr) Procédé et système de commande de l'affichage d'informations et terminal d'utilisateur mettant en oeuvre ce procédé
CN116343272A (zh) 指纹识别方法和终端设备
Martins A human-machine interface using augmented reality glasses for applications in assistive robotics.

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20171024

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20180518