CN117227740B - Multi-mode sensing system and method for intelligent driving vehicle

Info

Publication number: CN117227740B
Application number: CN202311185649.0A
Authority: CN (China)
Language: Chinese (zh); other version: CN117227740A
Inventors: 武丹丹, 章广忠, 杨煜, 徐建杭
Assignee (current and original): Nanjing Xiangshang Internet Of Vehicles Technology Co ltd
Filing date: 2023-09-14; granted publication date: 2024-03-19
Legal status: Active (granted)
Prior art keywords: data, action, eye movement, training, vehicle

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02T: Climate change mitigation technologies related to transportation
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The invention belongs to the technical field of multi-modal sensing and discloses a multi-modal sensing system and method for an intelligently driven vehicle, comprising the following steps: collecting eye movement data and head posture data, and associating the head posture data with the eye movement data to generate integrated eye movement data; collecting occupant action training data; generating a gaze adjustment instruction and a fatigue warning instruction based on analysis of the integrated eye movement data; training a machine learning model that identifies dangerous-action probabilities in real time based on the occupant action training data; and generating an action warning instruction based on the machine learning model output and processing it based on the gaze adjustment instruction and the fatigue warning instruction.

Description

Multi-mode sensing system and method for intelligent driving vehicle
Technical Field
The invention relates to the technical field of multi-modal sensing, and in particular to a multi-modal sensing system and method for an intelligently driven vehicle.
Background
A modality is a way of expressing or perceiving something; each source or form of information can be called a modality. Multi-modal data involves information received by different sensors, such as visual, auditory, tactile, and olfactory sensors.
The Chinese patent with grant publication No. CN109910900B discloses an intelligent driving method that judges which driving mode is most suitable according to the driver's overall physical fatigue and the road environment in which the vehicle is located, thereby greatly improving driving safety and relieving pressure on the driver.
However, monitoring the body's overall fatigue in that scheme is complex and tedious: the driver must spend too long putting on multiple body-worn sensors, so the vehicle cannot be entered and started quickly, and in an emergency the preparation before starting is too slow and degrades the driving experience. The scheme also ignores where the driver's gaze is directed, so the driver's dangerous driving state cannot be evaluated from multiple angles, and it does not consider how the behaviour of other people in the vehicle affects the driver.
In view of this, the present invention provides a multi-modal sensing system and method for an intelligently driven vehicle to solve the above problems.
Disclosure of Invention
To overcome the above drawbacks of the prior art, embodiments of the present invention provide a multi-modal sensing system and method for an intelligently driven vehicle.
To this end, the present invention provides the following technical solution: a multi-modal sensing method for an intelligently driven vehicle, comprising: collecting eye movement data and head posture data, and associating the head posture data with the eye movement data to generate integrated eye movement data;
collecting occupant action training data;
generating a gaze adjustment instruction and a fatigue warning instruction based on analysis of the integrated eye movement data;
training a machine learning model that identifies dangerous-action probabilities in real time based on the occupant action training data;
generating an action warning instruction based on the machine learning model output and processing based on the gaze adjustment instruction and the fatigue warning instruction.
A multi-modal sensing system for an intelligently driven vehicle, comprising:
a first data acquisition module that collects eye movement data and head posture data and associates the head posture data with the eye movement data;
a second data acquisition module that collects occupant action training data;
a data analysis module that generates a gaze adjustment instruction and a fatigue warning instruction based on analysis of the integrated eye movement data;
a model training module that trains a machine learning model identifying dangerous-action probabilities in real time based on the occupant action training data;
and a control module that generates an action warning instruction based on the machine learning model output and processes it based on the gaze adjustment instruction and the fatigue warning instruction.
An electronic device, comprising a processor and a memory, the memory storing a computer program callable by the processor; the processor executes the above multi-modal sensing method for an intelligently driven vehicle by invoking the computer program stored in the memory.
A computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the multi-modal sensing method for an intelligently driven vehicle described above.
The multi-modal sensing system for an intelligently driven vehicle has the following technical effects and advantages:
eye movement data and head posture data are collected and associated to generate integrated eye movement data; occupant action training data are collected; a gaze adjustment instruction and a fatigue warning instruction are generated based on analysis of the integrated eye movement data; a machine learning model that identifies dangerous-action probabilities in real time is trained on the occupant action training data; and an action warning instruction is generated from the model output and processed together with the gaze adjustment and fatigue warning instructions. No additional body-worn sensors are required, saving time when starting the vehicle; multi-modal, multi-sensor data are used to predict and warn of dangerous occupant actions, and the driver's gaze state and fatigue state are evaluated and analyzed in real time, reducing the probability of traffic accidents caused by driver inattention, driver fatigue, and dangerous actions inside the vehicle.
Drawings
FIG. 1 is a schematic diagram of the multi-modal sensing system for an intelligently driven vehicle according to the present invention;
FIG. 2 is a schematic diagram of the multi-modal sensing method for an intelligently driven vehicle according to the present invention;
FIG. 3 is a schematic diagram of an electronic device of the multi-modal sensing system for an intelligently driven vehicle according to the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Example 1
Referring to FIG. 1, the multi-modal sensing system for an intelligently driven vehicle according to this embodiment includes a first data acquisition module 1, a second data acquisition module 2, a data analysis module 3, a model training module 4, and a control module 5, the modules being connected by a wired and/or wireless network.
The first data acquisition module 1 is used for acquiring eye movement data and head posture data.
The eye movement data include gaze point coordinates and blink information. The gaze point coordinates give the specific location the eyes are fixating, expressed in the coordinate system of the eye tracker; they can be used to analyze the focus and movement trajectory of the driver's gaze in the in-vehicle environment.
The blink information records the driver's blinking actions, including the start time, end time, frequency, and duration of blinks; the driver's fatigue level and attention level can be estimated from it.
The eye movement data are monitored and acquired by an eye tracker installed in the vehicle. The eye tracker directly measures gaze point coordinates and blink information; it is a device purpose-built for tracking and recording eye movement that can measure eye position, pupil size, eye movements, and other parameters in real time.
Eye tracking determines the gaze point coordinates by detecting intensity changes in light reflected from the cornea, and can therefore only provide two-dimensional gaze point coordinates, i.e. positions on a screen or target plane. This alone is of limited use while driving, because the driver must watch traffic to the left, right, front, and rear in three-dimensional space in real time. If eye tracking covered only the plane straight ahead, the two-dimensional gaze coordinates would be misjudged whenever the driver turns the head to look in another direction. The first data acquisition module 1 therefore also collects head posture data and associates it with the eye movement data to obtain the driver's gaze direction in three-dimensional space.
The head posture data include the head rotation angle, i.e. the angle the head rotates about each axis in three-dimensional space: pitch (up-down), roll (tilt), and yaw (left-right).
The head posture data are obtained by combining an infrared sensor with computer vision; an exemplary acquisition process is as follows:
step S1, an infrared sensor (such as an infrared camera or a depth camera) is used for monitoring and tracking the head of a driver to obtain an infrared image; infrared sensors can detect the position and shape of an object by infrared radiation and reflection principles.
S2, face recognition and key feature point extraction; processing the infrared image through a computer vision technology to perform face detection; after the face is detected, key feature points of the face, such as eyes, mouth, nose, or the like, are identified by using a key feature point extraction algorithm, such as one of a shape model or a deep learning model.
S3, estimating the head posture; estimating the rotation angle of the head based on geometry according to key feature points of the face; specifically, according to the relative positions and the change conditions among the characteristic points, the pitch, roll and yaw angles of the head are calculated.
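As an illustrative sketch of step S3 (not code from the patent), the head rotation angles can be estimated from a handful of 2D facial key points with OpenCV's solvePnP; the 3D face-model coordinates and the pinhole camera intrinsics below are assumed placeholder values:

```python
import cv2
import numpy as np

# Generic 3D face-model points (nose tip, chin, eye corners, mouth corners),
# in millimetres; illustrative values, not calibrated to any real face.
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),          # nose tip
    (0.0, -330.0, -65.0),     # chin
    (-225.0, 170.0, -135.0),  # left eye outer corner
    (225.0, 170.0, -135.0),   # right eye outer corner
    (-150.0, -150.0, -125.0), # left mouth corner
    (150.0, -150.0, -125.0),  # right mouth corner
], dtype=np.float64)

def head_angles(image_points, frame_w, frame_h):
    """Return (pitch, roll, yaw) in degrees from six 2D key points
    given as a (6, 2) float64 array in image coordinates."""
    focal = frame_w  # crude pinhole approximation of focal length
    camera = np.array([[focal, 0, frame_w / 2],
                       [0, focal, frame_h / 2],
                       [0, 0, 1]], dtype=np.float64)
    dist = np.zeros(4)  # assume no lens distortion
    ok, rvec, _ = cv2.solvePnP(MODEL_POINTS, image_points, camera, dist)
    if not ok:
        raise RuntimeError("head pose estimation failed")
    rot, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    # Decompose the rotation matrix into Euler angles.
    sy = np.hypot(rot[0, 0], rot[1, 0])
    pitch = np.degrees(np.arctan2(rot[2, 1], rot[2, 2]))  # up-down
    yaw = np.degrees(np.arctan2(-rot[2, 0], sy))          # left-right
    roll = np.degrees(np.arctan2(rot[1, 0], rot[0, 0]))   # tilt
    return pitch, roll, yaw
```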
The head posture data are associated with the eye movement data as follows:
the head posture data and eye movement data are aligned in time so that the two data streams are sampled at the same time points;
the head posture data are then mathematically transformed (rotation, translation, scaling, and similar operations) and mapped into the same coordinate system as the eye movement data. The associated head posture data and eye movement data together form the integrated eye movement data.
It should be noted that mapping the head posture data into the same coordinate system as the eye movement data eliminates the influence of head posture on the eye movement data. Associating the two yields integrated eye movement data from which the driver's gaze direction can be determined more accurately, which matters for evaluating the driver's attention distribution, fatigue state, and the like.
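A minimal sketch of this association step, assuming eye and head samples arrive as timestamped dictionaries and that nearest-timestamp matching is acceptable (both are assumptions, not details given in the patent):

```python
import numpy as np

def rotation_matrix(pitch, roll, yaw):
    """Build R = Rz(roll) @ Ry(yaw) @ Rx(pitch); angles in degrees.
    The axis convention is an assumed choice."""
    p, y, r = np.radians([pitch, yaw, roll])
    rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
    rz = np.array([[np.cos(r), -np.sin(r), 0], [np.sin(r), np.cos(r), 0], [0, 0, 1]])
    return rz @ ry @ rx

def fuse(eye_samples, head_samples):
    """Pair each eye sample with the nearest-in-time head sample and
    rotate the tracker-frame gaze vector into the vehicle frame."""
    head_ts = np.array([h["t"] for h in head_samples])
    fused = []
    for e in eye_samples:
        h = head_samples[int(np.argmin(np.abs(head_ts - e["t"])))]
        gaze_vehicle = rotation_matrix(h["pitch"], h["roll"], h["yaw"]) @ e["gaze"]
        fused.append({"t": e["t"], "gaze": gaze_vehicle, "blink": e["blink"]})
    return fused  # the integrated eye movement data stream
```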
The first data acquisition module 1 transmits the integrated eye movement data to the data analysis module 3.
The second data acquisition module 2 collects occupant action training data. The collection process is as follows:
setting a driving dangerous action sequence; dangerous actions include: the head extends out of the window, a vehicle door is opened in the driving process, the driver leaves the steering wheel, and the co-driver member stretches his hand to rob the steering wheel, so that the driving safety is affected;
arranging a plurality of persons to sit on respective seats in the vehicle, and sequentially performing each dangerous action; when a person performs dangerous actions, the person in other seats keeps static, and the person performing dangerous actions keeps static except the body parts necessary for performing dangerous actions;
shooting video data in a vehicle in real time by using a vehicle-mounted camera, obtaining video data of each dangerous action implemented by personnel on each seat, and marking the video data as preprocessing video data; marking a dangerous action number in a dangerous action sequence as a;
a plurality of other behavior video data different from the dangerous actions are collected and marked as preprocessing interference video data, so that the dangerous actions can be distinguished by using a machine learning model. The pre-processed disturbance video data and the pre-processed video data together form the occupant motion training data.
The second data acquisition module 2 sends the occupant action training data to the model training module 4.
The data analysis module 3 analyzes the integrated eye movement data.
Using eye movement data analysis software, the front windshield is marked as the first-level region of interest on the driver-view image and the side windows as the second-level region of interest; specifically, a rectangle, polygon, or other shape can be used to represent the boundary of the front windshield and side windows.
The eye movement data analysis software may be, for example, Eye Tracking Analysis Software (ETAS); ETAS is used to mark and analyze driver eye movement data and provides an image marking tool for creating and adjusting regions of interest on the driver-view image.
The driver-view image is shot by a camera installed at the driver's seat and reflects the driver's actual field of view.
When the gaze point coordinates fall in the first-level region of interest, the driver is attending to the road ahead. When they fall in the second-level region of interest, the driver may be checking a side mirror or the road conditions to the side; this should not last too long. If the gaze stays in the second-level region too long, attention shifts away from the road ahead and becomes dispersed, and the driver may miss important information such as vehicles ahead, pedestrians, or traffic lights, raising the risk of a traffic accident. Watching the side window for a long time can also destabilize the driver's hand posture and steering-wheel control, letting the vehicle drift from its normal track and further raising the risk.
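For illustration, classifying each gaze sample against the marked regions can be as simple as point-in-rectangle tests; the rectangle representation is one of the shapes mentioned above, and the function names are ours, not ETAS's:

```python
def classify_gaze(x, y, primary_rect, secondary_rects):
    """Return 'primary', 'secondary', or 'outside' for one gaze point.

    Each rect is (x_min, y_min, x_max, y_max) in driver-view image
    coordinates; the actual boundaries come from the ROI marking step.
    """
    def inside(rect):
        x0, y0, x1, y1 = rect
        return x0 <= x <= x1 and y0 <= y <= y1

    if inside(primary_rect):        # front windshield
        return "primary"
    if any(inside(r) for r in secondary_rects):  # side windows
        return "secondary"
    return "outside"
```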
A gaze offset evaluation value eye_score is calculated by a formula combining t1, t2 and t3, where t1 is the time the gaze point coordinates are not in the first-level region of interest, t2 is the time the gaze point coordinates are in the second-level region of interest, and t3 is the time the gaze point coordinates are in neither the first-level nor the second-level region of interest.
It should be noted that when the gaze point coordinates are not in the first-level region of interest, important information ahead such as vehicles, pedestrians, and traffic lights may be missed, but the driver may still be observing the second-level region, so t1 has a moderate influence on eye_score. When the gaze point coordinates are in the second-level region of interest, the driver may be checking a side mirror or side road conditions, so t2 influences eye_score, but less than t1. When the gaze point coordinates are in neither region, the driver's gaze is not on the road at all; this has the greatest influence on road driving safety, and failing to observe road conditions for a long time raises the probability of a traffic accident.
eye_score thus comprehensively reflects how the driver's gaze is distributed across the regions and the degree of danger when the gaze deviates from the road during driving; the larger eye_score is, the higher the degree of danger.
A gaze offset evaluation threshold eye_safe is set. When eye_score < eye_safe the state is marked as a normal gaze state; when eye_score ≥ eye_safe it is marked as a dangerous gaze state, and a gaze adjustment instruction is generated and sent to the control module 5.
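Since the patent gives the eye_score formula only as a drawing, the sketch below models it as a weighted sum whose weight ordering w3 > w1 > w2 follows the relative influences described above; the specific weight values and the fixed-rate per-frame sampling are assumptions:

```python
def gaze_offset_score(labels, dt, w1=1.0, w2=0.5, w3=2.0):
    """Accumulate t1/t2/t3 from per-frame ROI labels sampled dt seconds
    apart, then combine them into eye_score as an assumed weighted sum."""
    t1 = dt * sum(1 for lab in labels if lab != "primary")   # not in first-level ROI
    t2 = dt * sum(1 for lab in labels if lab == "secondary") # in second-level ROI
    t3 = dt * sum(1 for lab in labels if lab == "outside")   # in neither ROI
    return w1 * t1 + w2 * t2 + w3 * t3

def gaze_state(eye_score, eye_safe):
    # 'dangerous' triggers a gaze adjustment instruction.
    return "normal" if eye_score < eye_safe else "dangerous"
```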
A fatigue evaluation value tired is calculated by a formula combining V1, V2 and u, where V1 is the driver's real-time blink frequency per minute, V2 is the driver's blink frequency per minute under normal conditions, and u is the number of head deflections of the driver per minute.
The number of head deflections per minute is obtained from the roll angle in the head posture data: a roll-angle threshold is set, and whenever the roll angle exceeds it the event is counted as a head deflection, which indicates possible fatigue while driving. The roll-angle threshold is usually set between 8 and 12 degrees and is tuned by staff to individual physiological differences; for example, for a person whose head moves through a small range when tired the threshold is set to 8 degrees, and for a person whose head moves through a large range it is set to 12 degrees. The more head deflections per minute, the more severe the fatigue.
Under normal conditions a person blinks about 10 to 20 times per minute; taking the average under driving conditions, the driver's normal blink frequency V2 is fixed at 15 per minute. Under prolonged fatigue or insufficient rest the number of blinks increases, because fatigue tires the body and brain and affects the normal activity of the eyes; the extra blinks keep the eyes moist and relieve eye strain. The larger the increase in blinks, the more severe the fatigue.
tired thus jointly considers the driver's blink frequency and head-deflection count in the fatigue state. The higher tired is, the more severe the driver's fatigue, and the more likely the driver is to react too slowly to road conditions and cause a traffic accident.
A fatigue evaluation threshold tired_go is set. When tired < tired_go no action is taken; when tired ≥ tired_go a fatigue warning instruction is generated and sent to the control module 5. tired_go is set by staff according to regional road conditions: when road conditions are good, slight fatigue affects driving little and tired_go is set relatively high; when road conditions are poor, tired_go is set relatively low so that the driver's fatigue is reflected more sensitively.
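Likewise, the tired formula appears only as a drawing; the sketch below combines the blink-frequency excess over V2 with the head-deflection count u under assumed weights, and counts a deflection on each rising edge of |roll| across the threshold:

```python
def fatigue_value(blink_times, roll_angles, roll_threshold=10.0,
                  v2=15.0, alpha=1.0, beta=1.0):
    """Compute a fatigue evaluation value over one minute of data.

    blink_times: blink timestamps within the minute; roll_angles:
    per-frame head roll in degrees. The combination below (weights
    alpha, beta on blink excess and deflection count) is an assumed
    reading of the patent's formula.
    """
    v1 = len(blink_times)  # real-time blinks per minute
    mags = [abs(a) for a in roll_angles]
    # Count each swing past the threshold once (rising edges only).
    u = sum(1 for prev, cur in zip(mags, mags[1:])
            if prev <= roll_threshold < cur)
    return alpha * max(v1 - v2, 0) / v2 + beta * u

def fatigue_alert(tired, tired_go):
    return tired >= tired_go  # True -> generate a fatigue warning instruction
```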
The model training module 4 trains a machine learning model that identifies dangerous-action probabilities in real time based on the occupant action training data. The training process is as follows:
setting a label corresponding to the preprocessing video data of the a-th dangerous action as 1, setting a label corresponding to the preprocessing interference video data as 0, and constructing riding member action training data as a data set of a machine learning model; the data set is divided into a training set, a verification set and a test set, wherein the training set accounts for 70% of the data set, and the verification set and the test set each account for 15% of the data set.
The training set is fed to the machine learning model as input, and the model outputs the occurrence probability of the a-th dangerous action, valued between 0 and 1. The prediction target is the occurrence probability of the a-th dangerous action in real-time in-vehicle video from the on-board camera, and the training objective is to minimize the model's loss function value; training stops once the loss function value is at or below a preset target loss value.
The machine learning model loss function may be the mean square error (MSE) or the cross entropy (CE). Illustratively, with the mean square error the model is trained to minimize the loss value mse = (1/u) * Σ_{i=1}^{u} (y_i - ŷ_i)², so that the machine learning model fits the data better, improving its performance and accuracy; here mse is the loss value, i is the index of a group of occupant action training data, u is the number of groups of occupant action training data, y_i is the label of the i-th group, and ŷ_i is the occurrence probability of the a-th dangerous action predicted from the i-th group.
The machine learning model may be either a two-stream convolutional neural network or a 3D convolutional neural network. The remaining model parameters, such as the target loss value, network depth, number of neurons per layer, activation functions, and loss-function optimization, are determined in actual engineering through continual experimental tuning.
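A minimal PyTorch sketch of the 3D-CNN variant with the MSE training objective; the layer sizes, optimizer, and learning rate are placeholder choices, since the patent leaves these to experimental tuning:

```python
import torch
from torch import nn

class Danger3DCNN(nn.Module):
    """Tiny 3D CNN emitting one occurrence probability per dangerous action.
    Depths and widths are placeholders, not values from the patent."""
    def __init__(self, num_actions):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32, num_actions), nn.Sigmoid())

    def forward(self, clips):  # clips: (batch, 3, frames, H, W)
        return self.head(self.features(clips))

def train(model, loader, target_loss=1e-3, epochs=50):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    mse = nn.MSELoss()  # the MSE loss named in the patent's example
    for _ in range(epochs):
        for clips, labels in loader:  # labels: (batch, num_actions) of 0/1
            loss = mse(model(clips), labels.float())
            opt.zero_grad()
            loss.backward()
            opt.step()
        if loss.item() <= target_loss:  # stop at the preset target loss
            break
    return model
```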
Once trained, the machine learning model can analyze dangerous actions by the driver and by other people in the vehicle in real time and output a separate occurrence probability for each dangerous action, facilitating early warning of dangerous actions.
The model training module 4 sends the output result to the control module 5.
The control module 5 generates an action warning instruction based on the model output, and performs the corresponding processing based on the fatigue warning instruction and the gaze adjustment instruction.
A dangerous-action occurrence probability threshold is set; when the occurrence probability of the a-th dangerous action is greater than or equal to this threshold, an action warning instruction is generated. The threshold is set to 80% and can be adjusted by staff according to actual conditions.
The action warning instruction uses the in-vehicle audio system to remind occupants by voice to stop the a-th dangerous action as soon as possible. It should be noted that occupants are sometimes unaware of which actions are dangerous; the warning names the specific a-th dangerous action identified, making it easy for the occupant to stop it.
The fatigue warning instruction uses in-vehicle audio to remind the driver by voice to concentrate or to pull over and rest; the gaze adjustment instruction uses in-vehicle audio to remind the driver to keep the gaze focused.
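Putting the three instruction types together, a control-module sketch might look as follows; `speak` stands in for the unspecified in-vehicle audio interface, and the threshold matches the 80% default above:

```python
DANGER_THRESHOLD = 0.80  # patent's default, adjustable by staff

def control_step(action_probs, action_names, gaze_dangerous, tired_alert, speak):
    """Issue the three instruction types from one perception cycle."""
    for a, prob in enumerate(action_probs, start=1):
        if prob >= DANGER_THRESHOLD:  # action warning instruction
            speak(f"Please stop the dangerous action: {action_names[a - 1]}")
    if gaze_dangerous:                # gaze adjustment instruction
        speak("Please keep your eyes on the road ahead")
    if tired_alert:                   # fatigue warning instruction
        speak("Please concentrate, or pull over and rest")
```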
Example 1 thus requires no additional body-worn sensors, saving time when starting the vehicle; it uses multi-modal, multi-sensor data to predict and warn of dangerous occupant actions and evaluates and analyzes the driver's gaze state and fatigue state in real time, reducing the probability of traffic accidents caused by driver inattention, driver fatigue, and dangerous actions inside the vehicle.
Example 2
Referring to FIG. 2, this embodiment provides a multi-modal sensing method for an intelligently driven vehicle; matters already described in detail in Example 1 are not repeated. The method comprises the following steps:
collecting eye movement data and head posture data, and associating the head posture data with the eye movement data to generate integrated eye movement data;
collecting occupant action training data;
generating a gaze adjustment instruction and a fatigue warning instruction based on analysis of the integrated eye movement data;
training a machine learning model that identifies dangerous-action probabilities in real time based on the occupant action training data;
generating an action warning instruction based on the machine learning model output and processing based on the gaze adjustment instruction and the fatigue warning instruction.
Further, the eye movement data include gaze point coordinates and blink information, and the head posture data include a head rotation angle.
The head posture data are associated with the eye movement data as follows: the two data streams are aligned in time so that they are sampled at the same time points; the head posture data are mathematically transformed and mapped into the same coordinate system as the eye movement data; the associated head posture data and eye movement data serve as the integrated eye movement data.
Further, the occupant action training data are collected as follows:
a sequence of dangerous driving actions is defined; several people are arranged to sit in their respective seats in the vehicle and perform each dangerous action in turn; while one person performs a dangerous action, the people in the other seats stay still, and the performer keeps every body part still except those needed for the action;
video inside the vehicle is shot in real time with the on-board camera, yielding video of each dangerous action performed from each seat, recorded as pre-processed video data; video of various other behaviours distinct from the dangerous actions is collected and recorded as pre-processed interference video data, and the pre-processed interference video data and pre-processed video data together form the occupant action training data.
Further, using eye movement data analysis software, the front windshield is marked as the first-level region of interest on the driver-view image and the side windows as the second-level region of interest;
a gaze offset evaluation value eye_score is calculated by a formula combining t1, t2 and t3, where t1 is the time the gaze point coordinates are not in the first-level region of interest, t2 is the time they are in the second-level region of interest, and t3 is the time they are in neither region;
a gaze offset evaluation threshold eye_safe is set; when eye_score < eye_safe the state is marked as a normal gaze state, and when eye_score ≥ eye_safe it is marked as a dangerous gaze state and a gaze adjustment instruction is generated.
Further, a fatigue evaluation value tired is calculated by a formula combining V1, V2 and u, where V1 is the driver's real-time blink frequency per minute, V2 is the driver's blink frequency per minute under normal conditions, and u is the number of head deflections of the driver per minute;
a fatigue evaluation threshold tired_go is set; when tired < tired_go no action is taken, and when tired ≥ tired_go a fatigue warning instruction is generated.
Further, the machine learning model is trained as follows:
the label for the pre-processed video data of the a-th dangerous action is set to 1 and the label for the pre-processed interference video data to 0, and the occupant action training data are assembled into the data set of the machine learning model; the data set is divided into a training set (70%), a validation set (15%), and a test set (15%).
The training set is fed to the machine learning model as input, and the model outputs the occurrence probability of the a-th dangerous action, valued between 0 and 1. The prediction target is the occurrence probability of the a-th dangerous action in real-time in-vehicle video from the on-board camera, and the training objective is to minimize the loss function value mse = (1/u) * Σ_{i=1}^{u} (y_i - ŷ_i)², where mse is the loss value, i is the index of a group of occupant action training data, u is the number of groups, y_i is the label of the i-th group, and ŷ_i is the occurrence probability of the a-th dangerous action predicted from the i-th group; training stops when the loss function value of the machine learning model is at or below a preset target loss value.
The machine learning model may be either a two-stream convolutional neural network or a 3D convolutional neural network.
further, setting a dangerous action occurrence probability threshold, and generating an action warning instruction when the occurrence probability of the a-th dangerous action is greater than or equal to the dangerous action occurrence probability threshold;
the action warning instruction comprises reminding members in the vehicle through the voice of the vehicle-mounted sound equipment, and stopping the a-th dangerous action as soon as possible; the fatigue warning instruction comprises a vehicle-mounted sound voice reminding driver to concentrate on attention or stop and rest by side; the sight line adjusting instruction comprises a vehicle-mounted sound to remind a driver of the concentration of the sight line.
Example 3
Referring to FIG. 3, an electronic device according to an exemplary embodiment includes a processor and a memory, the memory storing a computer program callable by the processor;
the processor executes the above multi-modal sensing method for an intelligently driven vehicle by invoking the computer program stored in the memory.
Example 4
A computer-readable storage medium according to an exemplary embodiment stores a computer program;
when the computer program runs on a computer device, it causes the computer device to perform the multi-modal sensing method for an intelligently driven vehicle described above.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center over a wired network or a wireless network. The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more sets of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of units is merely one logical division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections via interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The foregoing is merely illustrative of the present invention and does not limit it; any variation or substitution that a person skilled in the art could readily conceive falls within the scope of the present invention, whose protection scope is therefore defined by the claims.
Finally, it should be noted that the above description covers only preferred embodiments of the invention and is not intended to limit the invention to the precise forms disclosed; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall be included in its scope.

Claims (4)

1. A multi-modal sensing method for an intelligently driven vehicle, comprising:
collecting eye movement data and head posture data, and associating the head posture data with the eye movement data to generate integrated eye movement data;
collecting occupant action training data;
generating a gaze adjustment instruction and a fatigue warning instruction based on analysis of the integrated eye movement data;
training a machine learning model that identifies dangerous-action probabilities in real time based on the occupant action training data;
generating an action warning instruction based on the machine learning model output and processing based on the gaze adjustment instruction and the fatigue warning instruction;
wherein the eye movement data comprise gaze point coordinates and blink information, and the head posture data comprise a head rotation angle;
the head posture data are associated with the eye movement data as follows: the head posture data and the eye movement data are aligned in time so that the two data streams are sampled at the same time points; the head posture data are mathematically transformed and mapped into the same coordinate system as the eye movement data; the associated head posture data and eye movement data serve as the integrated eye movement data;
the occupant action training data are collected as follows:
a sequence of dangerous driving actions is defined; several people are arranged to sit in their respective seats in the vehicle and perform each dangerous action in turn; while one person performs a dangerous action, the people in the other seats stay still, and the performer keeps every body part still except those needed for the action;
video inside the vehicle is shot in real time with the on-board camera, yielding video of each dangerous action performed from each seat, recorded as pre-processed video data; video of various other behaviours distinct from the dangerous actions is collected and recorded as pre-processed interference video data, and the pre-processed interference video data and pre-processed video data together form the occupant action training data;
using eye movement data analysis software, the front windshield is marked as the first-level region of interest on the driver-view image and the side windows as the second-level region of interest;
a gaze offset evaluation value eye_score is calculated by a formula combining t1, t2 and t3, where t1 is the time the gaze point coordinates are not in the first-level region of interest, t2 is the time the gaze point coordinates are in the second-level region of interest, and t3 is the time the gaze point coordinates are in neither region;
a gaze offset evaluation threshold eye_safe is set; when eye_score < eye_safe the state is marked as a normal gaze state, and when eye_score is greater than or equal to eye_safe it is marked as a dangerous gaze state and a gaze adjustment instruction is generated;
a fatigue evaluation value tired is calculated by a formula combining V1, V2 and u, where V1 is the driver's real-time blink frequency per minute, V2 is the driver's blink frequency per minute under normal conditions, and u is the number of head deflections of the driver per minute;
a fatigue evaluation threshold tired_go is set; when tired < tired_go no action is taken, and when tired is greater than or equal to tired_go a fatigue warning instruction is generated;
the machine learning model is trained as follows:
the label for the pre-processed video data of the a-th dangerous action is set to 1 and the label for the pre-processed interference video data to 0, and the occupant action training data are assembled into the data set of the machine learning model; the data set is divided into a training set (70%), a validation set (15%), and a test set (15%);
the training set is fed to the machine learning model as input, the model outputs the occurrence probability of the a-th dangerous action, valued between 0 and 1; the prediction target is the occurrence probability of the a-th dangerous action in real-time in-vehicle video from the on-board camera, and the training objective is to minimize the loss function value mse = (1/u) * Σ_{i=1}^{u} (y_i - ŷ_i)², where mse is the loss value, i is the index of a group of occupant action training data, u is the number of groups, y_i is the label of the i-th group, and ŷ_i is the occurrence probability of the a-th dangerous action predicted from the i-th group; training stops when the loss function value of the machine learning model is at or below a preset target loss value;
the machine learning model is either a two-stream convolutional neural network or a 3D convolutional neural network;
a dangerous-action occurrence probability threshold is set; when the occurrence probability of the a-th dangerous action is greater than or equal to this threshold, an action warning instruction is generated;
the action warning instruction comprises reminding occupants by in-vehicle audio voice to stop the a-th dangerous action as soon as possible; the fatigue warning instruction comprises an in-vehicle audio voice reminder for the driver to concentrate or to pull over and rest; the gaze adjustment instruction comprises an in-vehicle audio reminder for the driver to keep the gaze focused.
2. A multi-modal sensing system of an intelligently driven vehicle implementing the method of claim 1, comprising:
a first data acquisition module (1) that collects eye movement data and head posture data and associates the head posture data with the eye movement data;
a second data acquisition module (2) that collects occupant action training data;
a data analysis module (3) that generates a gaze adjustment instruction and a fatigue warning instruction based on analysis of the integrated eye movement data;
a model training module (4) that trains a machine learning model identifying dangerous-action probabilities in real time based on the occupant action training data;
and a control module (5) that generates an action warning instruction based on the machine learning model output and processes it based on the gaze adjustment instruction and the fatigue warning instruction.
3. An electronic device, comprising a processor and a memory, the memory storing a computer program callable by the processor;
wherein the processor performs the multi-modal sensing method for an intelligently driven vehicle of claim 1 by invoking the computer program stored in the memory.
4. A computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the multi-modal sensing method for an intelligently driven vehicle of claim 1.

Priority Applications (1)

CN202311185649.0A - priority/filing date 2023-09-14 - Multi-mode sensing system and method for intelligent driving vehicle

Publications (2)

CN117227740A - published 2023-12-15
CN117227740B - granted 2024-03-19

Family

ID=89081932
Family Applications (1): CN202311185649.0A (filed 2023-09-14) - CN117227740B - Active
Country Status (1): CN

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104574817A (en) * 2014-12-25 2015-04-29 清华大学苏州汽车研究院(吴江) Machine vision-based fatigue driving pre-warning system suitable for smart phone
US9868352B1 (en) * 2012-12-17 2018-01-16 State Farm Mutual Automobile Insurance Company Systems and methodologies for real-time driver gaze location determination and analysis utilizing computer vision technology
CN107697069A (en) * 2017-10-31 2018-02-16 上海汽车集团股份有限公司 Fatigue of automobile driver driving intelligent control method
KR20180100865A (en) * 2017-03-02 2018-09-12 경북대학교 산학협력단 Warning system and method based on analysis integrating internal and external situation in vehicle
CN108647630A (en) * 2018-05-08 2018-10-12 北京优创新港科技股份有限公司 A kind of dangerous driving behavior measure of supervision and device based on video identification
CN109044363A (en) * 2018-09-04 2018-12-21 华南师范大学 Driver Fatigue Detection based on head pose and eye movement
CN109910900A (en) * 2019-04-01 2019-06-21 广东科学技术职业学院 A kind of intelligent driving system and method
CN114387587A (en) * 2022-01-14 2022-04-22 东北大学 Fatigue driving monitoring method
CN114821796A (en) * 2022-05-05 2022-07-29 上海东普信息科技有限公司 Dangerous driving behavior recognition method, device, equipment and storage medium
CN116279513A (en) * 2022-12-05 2023-06-23 航天重型工程装备有限公司 Mining vehicle auxiliary driving method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020231401A1 (en) * 2019-05-13 2020-11-19 Huawei Technologies Co., Ltd. A neural network for head pose and gaze estimation using photorealistic synthetic data


Also Published As: CN117227740A, published 2023-12-15


Legal Events

PB01 - Publication
SE01 - Entry into force of request for substantive examination
GR01 - Patent grant