CN109493305A - Method and system for superimposing human eye gaze on a foreground image - Google Patents
Method and system for superimposing human eye gaze on a foreground image
- Publication number
- CN109493305A CN109493305A CN201810991444.4A CN201810991444A CN109493305A CN 109493305 A CN109493305 A CN 109493305A CN 201810991444 A CN201810991444 A CN 201810991444A CN 109493305 A CN109493305 A CN 109493305A
- Authority
- CN
- China
- Prior art keywords
- camera
- human eye
- image
- eye
- coordinate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
Abstract
A method and device for superimposing a driver's gaze on a foreground image in the driver-assistance field. The method comprises the following steps. S1: locate the driver's eyes and obtain the three-dimensional coordinates of the eye position in the camera coordinate system. S2: capture the driver's forward field of view with a foreground camera. S3: superimpose the three-dimensional coordinates from step S1 onto the driver's forward-view image to form a composite image, whose first layer shows the driver's gaze position in real time and whose second layer shows the foreground camera's forward-view image in real time. The invention further includes an acquisition device that can collect high-quality image data for neural-network training, providing data support for gaze capture. Through the superposition of the images, the invention can support a variety of applications based on computer-processed eye-position detection. Moreover, owing to the high quality of the training data and the neural network matched to it, the accuracy of gaze detection is ensured.
Description
Technical field
This application relates to the field of image data processing with artificial intelligence, and specifically to computer processing of facial image data fused with real-time foreground images.
Background art
Determining the direction of a person's gaze involves determining the eye position and the fixation-point position. Once these two positions are known in real three-dimensional space, the line connecting them gives the direction of the gaze. In both the driver-assistance and autonomous-driving fields, however, human gaze has not yet been fully exploited.
For example, CN106886759A is concerned only with whether the head is deflected: when the head turns toward a rearview mirror, the system infers that the driver wants to see a blind spot that is hard to see from the normal driving position, and displays an image of that blind spot. This detection of gaze change is not fine-grained; when the head does not move, the system cannot detect the driver's intention at that moment.
For example, the detection system disclosed in CN104619558A can also detect changes in gaze direction and checks whether the driver's line of sight is aimed at a mirror. When the gaze is found to be aligned with a mirror, the system assumes the driver needs to observe it, and adjusts the angle of the left or right rearview mirror accordingly, so that the driver can see a wider field of view through it. The shortcoming of this prior art is that gaze-direction information can be obtained only when the eyes sweep through a large angle; that is, the detection of gaze change is still not fine-grained enough.
For example, the detection system disclosed in CN107054225A detects the driver's gaze direction; when it finds the driver staring at the display screen and/or the control keys on the vehicle control panel, it concludes that the gaze is not directed ahead of the cab, and projects the forward-view image onto the display screen so that the driver can still see objects ahead without looking forward. What this device actually detects, however, is head offset with a relatively large rotation angle; it cannot capture the gaze direction accurately.
Technical solution
To solve the above problems, and to obtain the gaze direction accurately and combine it effectively with the driving scene, the present invention provides a gaze-superposition device and method, addressing the prior art's inability to determine the gaze direction accurately and thus to combine it effectively with the driving scene.
The object of the present invention is to provide a method, built on a high-precision estimate of the eye and gaze position, for superimposing the gaze onto the forward image, together with a device that carries out the method. In this way a computer or the driver can obtain, in real time, the current position of the gaze within the forward field of view, enabling subsequent accurate human-computer interaction and gaze-tracking applications.
One aspect of the present invention comprises the following technical solution:
A method for superimposing the human gaze position on a foreground image, comprising the following steps:
S1: locate the driver's eyes, and obtain the three-dimensional coordinates of the eye position in the coordinate system of an in-cabin camera (DMS camera), together with the gaze vector;
S2: capture the driver's forward field of view with a foreground camera as the foreground image;
S3: superimpose the three-dimensional coordinates in the in-cabin camera coordinate system from step S1 onto the foreground image from step S2 to form a composite image of two layers, wherein the first layer shows the driver's gaze position in real time, and the second layer displays the foreground camera's image in real time;
wherein in step S3, calibration data are used to transform the eye position and gaze vector from the in-cabin (DMS) camera coordinate system of step S1 into the eye position and/or gaze vector in the foreground camera coordinate system;
and wherein in step S1 the gaze vector is computed with a neural network.
Preferably, the method of obtaining the three-dimensional coordinates of the eye position in the camera coordinate system in step S1 comprises the following steps:
S11. fix the foreground camera and the in-cabin camera (DMS camera), and calibrate the geometric relationship between them;
S12. acquire a facial image with the in-cabin camera (DMS camera);
S13. pass the facial image through a facial-landmark network model to obtain the pixel coordinates of five points: the two inner eye corners, the two outer eye corners, and the nose tip; then compute the center coordinates of the two eyes;
S14. from the eye centers obtained in step S13, compute the gaze vector of the acquired facial image with the eye-position neural network;
S15. from the eye centers and the nose-tip coordinate obtained in step S13, compute the rotation matrix from the three-dimensional eye coordinate system to the in-cabin camera coordinate system, and thereby obtain the three-dimensional coordinates of the eye position in the in-cabin (DMS) camera coordinate system.
Preferably, in the foreground camera coordinate system the eye-center coordinates and the gaze vector are known. Taking the eye center as the origin of a ray with the gaze vector as its direction, the intersection of that ray with the foreground image plane is the driver's fixation point in the foreground image. Let the eye coordinates in the in-cabin camera coordinate system be Point(x, y, z) and in the foreground camera coordinate system be Point(u, v, w); let the gaze vector predicted by the neural network in the in-cabin camera frame be V(rot_x, rot_y, rot_z) and in the foreground camera frame be V(rot_u, rot_v, rot_w). Transforming a position from the in-cabin camera frame into the foreground camera frame comprises the following steps:
Point(u, v, w) = R2 * Point(x, y, z) + T2;
V(rot_u, rot_v, rot_w) = R2 * V(rot_x, rot_y, rot_z);
With Point(u, v, w) as origin and V(rot_u, rot_v, rot_w) as direction vector, construct a ray; its intersection with the foreground camera's image plane is the gaze point. Here [R1 | T1] is the rotation matrix from the eye coordinate system to the in-cabin camera coordinate system, and [R2 | T2] the rotation matrix from the in-cabin camera to the foreground camera; R1, T1 are the rotation and translation transformation parameters from the eye frame to the in-cabin camera frame, and R2, T2 those from the in-cabin camera frame to the foreground camera frame.
Preferably, the gaze vector of the acquired facial image is computed in step S1 by the eye-position neural network. The network comprises five convolution modules, each using a ShuffleNet structure. The head image serves as the input layer: the input image is standardized to a size of 224*224, then convolved with a 3*3 kernel at a stride of 2 pixels using the ReLU activation function, yielding a 112*112 feature map; max pooling is then applied, downsampling with a stride of 2 pixels, to obtain a feature map of size 56*56.
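The feature-map sizes quoted above (224 to 112 to 56) follow from the standard convolution output-size formula; a small sketch, assuming a padding of 1 for the 3*3 convolution (the text does not state the padding):

```python
def conv_out(n, k, s, p):
    # standard convolution output size: floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

# 224x224 input, 3x3 kernel, stride 2, padding 1 (assumed) -> 112x112
assert conv_out(224, 3, 2, 1) == 112
# 2x2 max pooling, stride 2 -> 56x56, matching the sizes in the text
assert conv_out(112, 2, 2, 0) == 56
```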
Preferably, the five convolution modules use the shuffle-unit network structure. In the right branch of the first shuffle-unit module, the 56*56 feature map first undergoes a pointwise group convolution, then a channel shuffle, then a depthwise convolution with a 3*3 kernel at a stride of 2 pixels, and finally another pointwise group convolution.
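The channel-shuffle step named above has a well-known reshape-transpose implementation (as in the ShuffleNet design); a minimal NumPy sketch on a toy feature map:

```python
import numpy as np

def channel_shuffle(x, groups):
    # x: feature map (channels, height, width); interleave channels across groups
    c, h, w = x.shape
    return (x.reshape(groups, c // groups, h, w)
             .transpose(1, 0, 2, 3)
             .reshape(c, h, w))

x = np.arange(8 * 4 * 4).reshape(8, 4, 4)   # 8 channels in 2 groups of 4
y = channel_shuffle(x, groups=2)
# resulting channel order: 0, 4, 1, 5, 2, 6, 3, 7
```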
In another aspect, the present invention provides a system for superimposing the human gaze position on a foreground image, which executes the above superposition method.
Preferably, the system includes a computing device that extracts the eye landmark positions from the images captured by the acquisition device and computes the eye position coordinates; the corresponding fixation coordinates are obtained via the gaze-position camera, and the 3D eye position is computed in the coordinate system of the in-cabin camera (DMS camera). The computing device includes the neural network of claim 4; eye images from facial images at all head angles, together with the gaze-fixation coordinates, are fed into the network for training, so that after training the network can accurately output the gaze direction when given a facial image.
Preferably, the acquisition device includes: a plurality of brackets, comprising a plurality of horizontal brackets and a plurality of longitudinal brackets, with the plurality of cameras fixed at the crossing positions of the horizontal and longitudinal brackets; a rail structure, comprising a horizontal slide rail and a longitudinal slide rail, which can move freely in the horizontal and vertical directions on the brackets; and a light source fixed at the crossing position of the horizontal and longitudinal slide rails. The gaze-position camera is fixed to the light source, so that it moves together with the light source.
Preferably, the plurality of cameras includes cameras whose optical axes are at 45° to the vertical direction of the subject's face, which ensures that eye images can still be captured with the face turned by up to 90°.
According to another aspect of the present invention, the above method is further applied in automated driving and/or assisted driving.
Through the coordinate-transformation method described above, the present invention can accurately unify the foreground camera coordinate system with the in-cabin camera coordinate system, completing the superposition of the gaze direction onto the foreground image. This superposition enables a variety of applications of eye-position detection. For example, the driver's gaze position can be projected in real time onto the front windshield, on which other content, such as the interface of the in-car multimedia system, is already projected. The windshield then carries two layers of information. One layer is the multimedia interface, shown in head-up display (HUD) mode, whose transparent interface appears on a small region of the windshield; the other layer is the driver's gaze position, superimposed on the HUD as a translucent dot. Because both layers are translucent, the driver sees the gaze position and the multimedia information without any effect on the view of objects ahead. With this improved HUD system, precise human-computer interaction between the driver and the driving system becomes possible. In the prior art such interaction generally requires the driver to touch or press controls by hand, which increases driving risk; thanks to the high precision of the gaze tracking in this application, options on the display can be selected accurately by changes of gaze alone. The system can also be used for attention detection: when the driver's attention wanders, the gaze tends to stay fixed within some region beyond a certain time threshold, or is not directed ahead of the vehicle; by detecting the gaze position at such moments, a reminder alarm can be issued for lapses of attention. This is one of the inventive points of the invention.
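The translucent two-layer display described above amounts to alpha compositing of a gaze marker over the image; a minimal sketch, with the dot footprint, color, and alpha chosen arbitrarily for illustration:

```python
import numpy as np

def blend_marker(frame, mask, color, alpha):
    # alpha-composite a translucent marker onto the frame where mask is True:
    # out = alpha * color + (1 - alpha) * frame
    out = frame.astype(np.float32).copy()
    out[mask] = alpha * np.asarray(color, np.float32) + (1.0 - alpha) * out[mask]
    return out.astype(frame.dtype)

frame = np.full((4, 4, 3), 100, dtype=np.uint8)   # stand-in for the HUD layer
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True                              # hypothetical gaze-dot footprint
out = blend_marker(frame, mask, color=(0, 255, 0), alpha=0.5)
# marked pixels are half green, half background; unmarked pixels are untouched
```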
Using the purpose-built image acquisition device, facial images at all angles along every gaze direction can be collected efficiently in real time, and the image data collected by the device are used in conjunction with the subsequent neural-network training. This is one of the inventive points of the invention.
As for the selection of specific feature points, only a small number of landmarks are needed to determine the three-dimensional eye position accurately, which reduces the computational load while preserving precision. This is one of the inventive points of the invention.
As for data processing, the invention abandons the prior-art practice of feeding head images at each angle and eye-region images into the neural network separately and then combining the results by hand, which is prone to overfitting. Instead, the eye image and the head-pose image are fused and fed into the neural network together, letting the network itself learn how eye and head combine. This greatly improves precision: the final gaze error is no more than 3 degrees. This is one of the inventive points of the invention.
Brief description of the drawings
Fig. 1 shows the main flow of superimposing the human gaze on a foreground image;
Fig. 2 shows schematic diagram 1 of the facial three-dimensional coordinate system;
Fig. 3 shows the correspondence between the in-cabin camera coordinate system and the facial coordinate system;
Fig. 4a shows schematic diagram 2 of the facial three-dimensional coordinate system in an embodiment of the present application;
Fig. 4b shows schematic diagram 3 of the facial three-dimensional coordinate system in an embodiment of the present application;
Fig. 5 is a schematic diagram of artificial neural network learning;
Fig. 6 is a schematic diagram of neural network learning combining head and eye images;
Fig. 7a shows a schematic diagram of the bracket used to collect facial training images;
Fig. 7b shows a schematic diagram of facial training image acquisition.
Specific embodiments
To make the above objects, features, and advantages more apparent and easier to understand, the embodiments of the present application are described in further detail below with reference to the accompanying drawings and specific implementations.
As shown in Fig. 1, the method for superimposing the human gaze on a foreground image comprises the following steps:
S1: locate the driver's eyes and obtain the three-dimensional coordinates of the eye position in the in-cabin camera coordinate system;
S2: capture the driver's forward field of view with the foreground camera;
S3: superimpose the three-dimensional coordinates from step S1 onto the driver's forward-view image to form a composite image, whose first layer shows the driver's gaze position in real time and whose second layer shows the foreground camera's forward-view image in real time.
Before coordinate conversion, the two cameras (the in-cabin DMS camera and the foreground camera) must be calibrated. A concrete calibration procedure, assuming the in-cabin camera sits near the front windshield, is as follows. First, a vertical checkerboard is placed at the front of the vehicle; the foreground camera sees the checkerboard pattern directly, so its position relative to the checkerboard can be computed from the photo. Second, a mirror is placed directly opposite the DMS camera so that the DMS camera can see the checkerboard through the mirror, and an additional checkerboard is pasted on one corner of the mirror. From the mirror reflection of the front checkerboard, the position of the mirror relative to the front checkerboard can be computed; from the checkerboard on the mirror, the position of the DMS camera relative to the mirror can be computed. The position of the DMS camera relative to the front checkerboard can therefore be obtained indirectly. Finally, from the two cameras' positions relative to the front checkerboard, the positional relationship between the cameras is obtained.
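The indirect calibration chain above (DMS camera to mirror to front checkerboard, foreground camera to front checkerboard) reduces to composing and inverting rigid transforms; a sketch with hypothetical poses standing in for real pose-estimation output:

```python
import numpy as np

def rt_to_homogeneous(R, T):
    # pack rotation R (3x3) and translation T (3,) into a 4x4 rigid transform
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = T
    return M

# Hypothetical board-relative poses (in practice recovered with a PnP solve);
# the DMS pose is the one obtained indirectly through the mirror.
T_fg_from_board = rt_to_homogeneous(np.eye(3), np.array([0.0, 0.0, 1.5]))
T_dms_from_board = rt_to_homogeneous(np.eye(3), np.array([0.2, 0.0, 0.8]))

# DMS frame -> foreground frame, composed through the shared checkerboard frame
T_fg_from_dms = T_fg_from_board @ np.linalg.inv(T_dms_from_board)

p_dms = np.array([0.1, 0.0, 0.5, 1.0])   # a homogeneous point in DMS coordinates
p_fg = T_fg_from_dms @ p_dms             # the same point in foreground coordinates
```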
The method of obtaining the three-dimensional coordinates of the eye position in the in-cabin camera coordinate system in step S1 includes computing the eye centers. The specific steps are: pass the facial image through the facial-landmark network model to obtain the pixel coordinates of five points on the face (the two inner eye corners, the two outer eye corners, and the nose tip), then compute the center coordinates of the two eyes.
From the four eye points (two outer corners, two inner corners) and the nose tip, the 3D eye position in the in-cabin camera coordinate system is obtained.
The facial image passed through the landmark network yields the coordinates of the five feature points on the imaging plane: from left to right, the left outer eye corner, left inner eye corner, right inner eye corner, and right outer eye corner, plus the nose tip, denoted P1, P2, P3, P4, P5. We use the two eye-center points and the nose tip, three points in total, for the calculation. The coordinates are computed as follows:
For the five selected points, the calculation is:
left eye center:
l_eye_2D.x = (P1.x + P2.x) / 2
l_eye_2D.y = (P1.y + P2.y) / 2
right eye center:
r_eye_2D.x = (P3.x + P4.x) / 2
r_eye_2D.y = (P3.y + P4.y) / 2
nose center:
n_2D.x = P5.x
n_2D.y = P5.y
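The midpoint formulas above can be written directly; the landmark values below are made-up pixel coordinates for illustration:

```python
def eye_centers(pts):
    # pts: pixel coords of the five landmarks, P1..P4 the eye corners
    # (left outer, left inner, right inner, right outer), P5 the nose tip
    left = ((pts["P1"][0] + pts["P2"][0]) / 2, (pts["P1"][1] + pts["P2"][1]) / 2)
    right = ((pts["P3"][0] + pts["P4"][0]) / 2, (pts["P3"][1] + pts["P4"][1]) / 2)
    return left, right, pts["P5"]

# made-up landmark coordinates for illustration
pts = {"P1": (100, 200), "P2": (140, 202), "P3": (200, 202),
       "P4": (240, 200), "P5": (170, 260)}
left, right, nose = eye_centers(pts)
# left == (120.0, 201.0), right == (220.0, 201.0), nose == (170, 260)
```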
A facial three-dimensional coordinate system is then established; Fig. 2 is a schematic diagram of the facial three-dimensional coordinate system in the in-cabin image in an embodiment of the present application. The midpoint between the two eyes is taken as the coordinate origin (0, 0, 0). Taking an average face as an example, in this coordinate system:
left eye center l_eye_3D = (-0.03, 0, 0);
right eye center r_eye_3D = (0.03, 0, 0);
nose tip n_3D = (0, 0.05, 0.02); unit: meters.
Fig. 3 shows the correspondence between the in-cabin camera coordinate system and the facial three-dimensional coordinate system.
Optionally, as shown in Fig. 4a, a central point can be computed from the left outer, left inner, right inner, and right outer eye corners; this is the starting point of the ray in Fig. 4a. This point is used as the origin of the gaze, and the ray is the gaze direction.
As shown in Fig. 4b, the points on the subject's or driver's facial surface are those detected by the facial keypoint model, where the central point is obtained from the four eye-corner points. The points off the facial surface are the computed 3D points projected back onto the camera plane. The two sets essentially coincide, showing that the 3D face positions obtained by this method are highly accurate.
Connecting the left outer eye corner, the right outer eye corner, and the nose tip defines the head plane; the normal vector of this plane represents the head pose. The specific steps for computing, from the coordinates of the two eye centers and the nose tip, the rotation matrix from the facial three-dimensional coordinate system to the in-cabin camera coordinate system are:
The three pairs of corresponding coordinates are mapped to solve the rotation matrix of the transformation between the camera coordinate system and the facial coordinate system described above, via the projection model:
x' = x / z;  y' = y / z;
r² = x'² + y'²;
x'' = x' · (1 + k1·r² + k2·r⁴ + k3·r⁶) / (1 + k4·r² + k5·r⁴ + k6·r⁶) + 2·p1·x'·y' + p2·(r² + 2·x'²);
y'' = y' · (1 + k1·r² + k2·r⁴ + k3·r⁶) / (1 + k4·r² + k5·r⁴ + k6·r⁶) + p1·(r² + 2·y'²) + 2·p2·x'·y';
u = fx·x'' + Cx;  v = fy·y'' + Cy;   (1)
Here x, y, z are the transformed 3D coordinates in the camera coordinate system; X, Y, Z are the facial three-dimensional coordinates obtained in step S22; x', y' are the normalized imaging-plane coordinates; x'', y'' are the imaging-plane coordinates after the distortion model is applied; fx, fy are the camera focal lengths in the horizontal and vertical directions; Cx is the x-axis offset between the imaging-plane origin and the pixel-plane origin, and Cy the corresponding y-axis offset; k1, k2, k3, k4, k5, k6, p1, p2 are the camera distortion parameters, obtained by camera calibration; u, v are the facial pixel coordinates obtained in step S21. From u, v and the above formulas, x, y, z are obtained, and the values of R and T are then computed from formula (1).
Computing the 3D face position in the foreground camera coordinate system uses the formula:
camera(x, y, z) = [R | T] · face(x, y, z);
where R and T are the rotation and translation transformation parameters, computed from formula (1).
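The projection formulas above (normalized coordinates, rational radial distortion k1..k6, tangential distortion p1, p2, intrinsics fx, fy, Cx, Cy) can be sketched as follows; with all distortion parameters at zero the model reduces to the plain pinhole projection:

```python
def project(point, fx, fy, cx, cy, k=(0.0,) * 6, p=(0.0, 0.0)):
    # pinhole projection with the rational distortion model of formula (1)
    x, y, z = point
    xp, yp = x / z, y / z                        # normalized coordinates x', y'
    r2 = xp * xp + yp * yp
    k1, k2, k3, k4, k5, k6 = k
    p1, p2 = p
    radial = (1 + k1 * r2 + k2 * r2**2 + k3 * r2**3) / \
             (1 + k4 * r2 + k5 * r2**2 + k6 * r2**3)
    xpp = xp * radial + 2 * p1 * xp * yp + p2 * (r2 + 2 * xp * xp)
    ypp = yp * radial + p1 * (r2 + 2 * yp * yp) + 2 * p2 * xp * yp
    return fx * xpp + cx, fy * ypp + cy          # pixel coordinates (u, v)

# with zero distortion this is the plain pinhole model
u, v = project((0.1, -0.05, 1.0), fx=800, fy=800, cx=320, cy=240)
```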
In addition, a simplified calculation replaces the four eye-corner feature points of the left and right eyes with the two eye-center points, so that the three feature points are the centers of the left and right eyes and the nose tip. The midpoint of the line connecting the two eye centers serves as the starting point of the gaze. The three coordinate correspondences mapped above are then those of the left eye center, the right eye center, and the nose tip.
The sight superposition in the step S3 specifically includes:
Obtain the rotation matrix [R1 | T1] that transforms the human-eye coordinate system into the in-cabin camera coordinate system. Obtain by calibration the rotation matrix [R2 | T2] from the in-cabin camera to the foreground camera.
Let the eye coordinate under the in-cabin camera be Point (x, y, z), and the eye coordinate under the foreground camera be Point (u, v, w).
The sight vector V (rot_x, rot_y, rot_z) predicted by the neural network for the in-cabin camera becomes the sight vector V (rot_u, rot_v, rot_w) under the foreground camera; how the neural network predicts the sight vector is described in the neural-network part of this specification. Transforming a coordinate position from the in-cabin camera coordinate system to the foreground camera coordinate system includes the following steps:
Point (u, v, w) = R2* (Point (x, y, z))+T2
V (rot_u, rot_v, rot_w) = R2*V (rot_x, rot_y, rot_z)
With Point (u, v, w) as the starting point and V (rot_u, rot_v, rot_w) as the direction vector, construct a ray; it necessarily intersects the image plane of the foreground camera at one point. That point is the gaze point.
The above formulas yield the intersection point in the foreground camera coordinate system, i.e. the position of the gaze within the image shot by the foreground camera.
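The transform-and-intersect procedure above can be sketched as follows, assuming a pinhole model for the foreground camera; the intrinsic matrix K and the image-plane convention (plane at z = 1) are illustrative assumptions, not values from this patent:

```python
import numpy as np

def gaze_pixel(P, V, K, z_plane=1.0):
    """Intersect the gaze ray P + t*V with the plane z = z_plane of the
    foreground camera, then map the hit point to pixels with intrinsics K.
    Returns None if the ray is parallel to the plane or points backwards."""
    P, V = np.asarray(P, float), np.asarray(V, float)
    if abs(V[2]) < 1e-9:
        return None
    t = (z_plane - P[2]) / V[2]
    if t < 0:
        return None
    X, Y, Z = P + t * V
    u = K[0, 0] * X / Z + K[0, 2]
    v = K[1, 1] * Y / Z + K[1, 2]
    return (u, v)

# Hypothetical intrinsics for illustration.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
print(gaze_pixel([0.0, 0.0, 0.0], [0.1, 0.0, 1.0], K))
```

Here P would be Point (u, v, w) and V would be V (rot_u, rot_v, rot_w) after the [R2 | T2] transform.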
The image mentioned above may be the image of a head-up display (HUD), whose content appears on a small area of the windshield (the displayed content can be multimedia content); another layer is the driver's gaze position; specifically, the gaze position is superimposed on the HUD system as a translucent dot. Because both layers are rendered translucently, the effect is that the driver sees the gaze position and the multimedia content while his or her observation of objects ahead is not affected. Because single-camera acquisition combined with a deep-learning neural network can localize the gaze direction with high precision, high-precision attention reminders become possible while driving: the driving system detects whether the gaze position is fixed on the location that should be watched. In addition, the improved HUD system described above enables accurate human-computer interaction between the driver and the vehicle's driving system. In prior-art human-computer interaction, the driver generally has to touch or press with a hand, which increases driving risk; owing to the high precision of the gaze tracking in this application, the driver can accurately select options on the display screen merely by changing the gaze direction. It can additionally be used for attention detection: when the driver's attention wanders, the gaze position often stays fixed within some region for longer than a certain time threshold, or the gaze is not directed at the road ahead; by detecting the gaze position at such times, an inattention reminder or alarm can be issued.
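The dwell-time style of attention detection described above might be sketched as follows; the region and frame threshold are hypothetical illustrations, and the patent does not specify this exact logic:

```python
def inattentive(gaze_points, region, min_frames):
    """Return True when the gaze stays inside one image region for at least
    min_frames consecutive frames - a simple dwell-time check.
    region is (x0, y0, x1, y1) in pixels; thresholds are illustrative."""
    x0, y0, x1, y1 = region
    streak = 0
    for (u, v) in gaze_points:
        if x0 <= u <= x1 and y0 <= v <= y1:
            streak += 1
            if streak >= min_frames:
                return True
        else:
            streak = 0
    return False

# 30 frames fixed on one spot, with a 25-frame threshold, triggers a warning.
print(inattentive([(100, 100)] * 30, (90, 90, 110, 110), 25))  # → True
```

A real system would also need the complementary check that the gaze regularly returns to the road-ahead region.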
Preferably, in step S1 the three-dimensional coordinates of the eye position in the camera coordinate system are obtained using a neural network, so that the gaze can be determined accurately. The neural network involved is described as follows:
A neural network (Neural Network, NN) is a complex network system formed by a large number of simple processing units (neurons) that are widely interconnected. It reflects many essential characteristics of human brain function and is a highly complex nonlinear system. Neural networks have large-scale parallelism, distributed storage and processing, self-organization, adaptivity, and self-learning ability, and are especially suited to problems that must consider many factors and conditions simultaneously, or that involve fuzzy information.
In the problem of resolving the gaze direction, the available data sets are too small and of poor quality. Training a neural network on head pictures containing the eye region to directly obtain the user's gaze position therefore causes severe over-fitting. In practice this over-fitting manifests as the network effectively outputting the head pose rather than the actual gaze direction of the eyes. To solve this problem, this patent specifically collects a large number of head images containing the eye region together with the corresponding accurate gaze-direction data. In a data-driven method such as deep learning, the quality and quantity of the training data play a crucial role; the large amount of high-quality data collected for this patent effectively suppresses the over-fitting described above.
The network structure used by the present invention mainly comprises 5 convolution modules, each using the ShuffleNet structure. A head picture containing the eye region is first normalized to a size of 224*224 (unit: pixels); the normalized picture enters the input layer of the neural network, and after processing by the network, the final output is the predicted longitude and latitude coordinates of the gaze position, i.e. the gaze direction. During training, a predefined loss function (loss function) computes the loss between the gaze-position longitude and latitude predicted by the network and the true gaze-position longitude and latitude in the data set, and the network parameters are trained by the back-propagation (BP) algorithm. Through the large amount of high-quality data collected for this patent, the network parameters are well trained, and the gaze direction can be computed accurately from a single head picture containing the eye region.
Fig. 5 is the structure diagram of the convolutional neural network for estimating the gaze direction. Head_picture is the input layer; the input image is normalized to 224*224 and then convolved with a 3*3 kernel at a stride of 2 pixels, using the ReLU activation function, producing feature maps of size 112*112; max pooling then downsamples with a stride of 2 pixels, producing feature maps of size 56*56. The next 5 convolution modules all use the shuffle-unit network structure, so only the first two modules are described in detail. In the right branch of the first shuffle unit, the 56*56 feature maps first undergo a pointwise group convolution, then a channel shuffle (channel shuffle), then a depthwise convolution (depthwise convolution) with a 3*3 kernel at a stride of 2 pixels, and finally another pointwise group convolution. In the left branch of the first shuffle unit, the 56*56 feature maps are average-pooled with a 3*3 kernel at a stride of 2 pixels. The feature maps from the left and right branches of the 1st shuffle unit are concatenated along the channel dimension, producing a series of 28*28 feature maps. In the right branch of the second shuffle unit, the 28*28 feature maps first undergo a pointwise group convolution, then a channel shuffle, then a depthwise convolution (depthwise convolution) with a 3*3 kernel at a stride of 1 pixel, and finally another pointwise group convolution. In the left branch of the second shuffle unit, the 28*28 feature maps pass through unchanged. The feature maps from the left and right branches of the 2nd shuffle unit are added element-wise and passed through the ReLU activation function; the output feature maps remain 28*28. The 3rd shuffle unit has the same structure as the 1st and outputs 14*14 feature maps. The 4th shuffle unit has the same structure as the 2nd and outputs 14*14 feature maps. The 5th shuffle unit has the same structure as the 1st and outputs 7*7 feature maps. The 7*7 feature maps output by the 5th shuffle unit are average-pooled with a 7*7 kernel, producing a series of 1*1 feature maps, i.e. a vector. The components of this vector are combined to output the two angles of angle_size.
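The channel shuffle used in each shuffle unit above is a reshape-transpose-reshape over the channel axis, so that channels from different convolution groups are interleaved; a minimal NumPy sketch (the group count here is illustrative):

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet channel shuffle: view the C channels as (groups, C//groups),
    transpose the two axes, and flatten back, interleaving the groups."""
    n, c, h, w = x.shape
    assert c % groups == 0
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))

x = np.arange(6).reshape(1, 6, 1, 1)           # channels 0..5, 3 groups of 2
print(channel_shuffle(x, 3).ravel().tolist())  # → [0, 2, 4, 1, 3, 5]
```

After the shuffle, the following depthwise convolution mixes information across what were originally separate groups.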
Fig. 6 is a schematic diagram of training the neural network shown in Fig. 5 with data consisting of head images containing the eye region, accurately labeled with the longitude and latitude of the gaze position. During training, a head image containing the eye region is fed into gaze CNN (gaze CNN is the network structure described in Fig. 5); gaze CNN estimates the gaze-position longitude and latitude angle_gaze from the input image. According to a predefined loss function (loss function), a loss Lg is computed between the estimated angle_gaze and the true longitude-latitude label gaze_label of the gaze position in the data set, and the network parameters are trained by the back-propagation (BP) algorithm. Through the large amount of high-quality data collected for this patent, the network parameters are well trained, and the gaze direction can be computed accurately from a single head picture containing the eye region.
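One way the longitude-latitude output could map to a 3D gaze direction, and how an angular error between the predicted and true gaze positions might be computed, is sketched below; the axis convention is an assumption, since the patent does not fix it:

```python
import math

def latlong_to_vec(lat, lon):
    """Map a (latitude, longitude) gaze label, in radians, to a unit 3D
    direction - one common convention; the axes here are an assumption."""
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))

def angular_error(pred, true):
    """Angle in degrees between predicted and ground-truth gaze directions."""
    a, b = latlong_to_vec(*pred), latlong_to_vec(*true)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.degrees(math.acos(dot))

print(round(angular_error((0.0, 0.0), (0.0, math.pi / 2)), 1))  # → 90.0
```

A training loss Lg could be this angular error or a simpler squared error on (lat, lon) directly; the patent only states that a predefined loss function is used.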
Preferably, the method for superimposing the human gaze with the foreground image further includes an image data acquisition apparatus comprising:
an acquisition device having multiple cameras, namely multi-angle face-acquisition cameras and a light-source-position camera, used respectively to acquire face images from each angle and the eye images shot at the corresponding gaze position;
a computing device for extracting the eye feature-point positions in the images acquired by the acquisition device and calculating the eye position coordinates; the corresponding light-source position coordinates are additionally calculated via the light-source-position camera; the computing device is also used to build the artificial-intelligence neural network, into which the face images from each angle, the eye-position images, and the gaze-position coordinates are input for machine learning, so that after training the network correctly outputs the gaze direction when a face image is input;
a camera installed in the vehicle cab for acquiring face images while the driver drives; through the trained neural network, the gaze position corresponding to the face image is output in real time.
The image acquisition system used by the present invention includes, for the convenience of fixing the above-mentioned cameras 5, a bracket assembly 10 that holds the fixed cameras 5. The assembly 10 consists of 3 horizontally parallel brackets 2 and 3 longitudinal brackets 1; each bracket carries several bases 3, on which industrial cameras or light sources can be mounted as required.
The cross-section of each horizontal bracket 2 and longitudinal bracket 1 is rectangular; along the extension direction of each bracket, strip grooves or strip ridges are provided on all four faces of the rectangle, serving as guide rails for the bases; each base 3 has a shape matching the rail shape, such as a ridge or a groove.
Furthermore, the horizontal brackets 2 and longitudinal brackets 1 may each carry a threaded rod 4 parallel to the bracket's extension direction. The threaded rod 4 has an external thread, and at its end has an engaging portion that joins the bracket end to fix the rod 4. Each threaded rod 4 is parallel to its bracket at a fixed spacing. Each base 3, besides the engaging portion (such as a ridge or groove) that matches the bracket and lets it slide along the bracket, also has a threaded through-hole whose inner side wall carries an internal thread mating with the external thread of the rod 4. The base 3 moves as follows: a motor rotates the threaded rod, the rod acts on the internal thread of the base and drives it back and forth, and the engagement between base and bracket ensures that the base moves along the bracket's extension direction.
At the end of each threaded rod 4, in addition to the engaging portion used for fixing, a micro motor is installed; this motor can rotate the rod 4, thereby driving the base 3 back and forth along the guide rail in the rail's extension direction. Alternatively, the base 3 can be driven back and forth along the bracket's extension direction by a stepper motor.
The positional relationship between the horizontal brackets 2 and the longitudinal brackets 1 is also adjustable. Since the cross-section of a longitudinal bracket 1 is rectangular with strip grooves on all four faces, the end of a horizontal bracket 2 carries a matching ridge; the two are joined, their relative position can be adjusted as needed, and once the position is determined they are fixed by a fastening part. Alternatively, a longitudinal bracket 1 may be clamped into the groove of a horizontal bracket 2.
In addition, test lamps are mounted on the brackets to determine the position the eyes fixate. Each lamp is a small LED light source (other types of miniature light sources may of course be used); the light source is a red-green-blue LED that shows different colors according to a preset program.
The subject is asked to focus on a certain light source; the shutter button can only be pressed when a light of a certain color (for example red) appears, and at other moments pressing the button does not take a picture.
Fig. 7 b shows the schematic diagram of one group of picture of 9 cameras while bat using above-mentioned acquisition bracket.Upper right in figure
Angle is camera serial number, and in addition the lifted image of subject is scaling board, can be demarcated to each camera, which is by one
Calibration is completed in conplane scaling board with subject's face.Every Image Acquisition for completing a coordinate just carries out primary
Calibration.In the collection process, because the camera being connected with light source needs the movement such as mobile, this will cause the movement of bracket, this
One movement causes each camera to be detached from home position.Therefore in order to acquire the accuracy of image, need using a scaling board to each
Camera is re-scaled
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be referred to each other. Since the systems or devices disclosed in the embodiments correspond to the methods disclosed in the embodiments, their description is relatively brief, and the relevant points can be found in the description of the methods.
It should also be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The foregoing description of the disclosed embodiments enables those skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the application. Therefore, the application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The description of the invention has been presented for purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations are obvious to those of ordinary skill in the art. The embodiments were chosen and described to better explain the principles of the invention and its practical application, and to enable those skilled in the art to understand the invention and to design various embodiments with various modifications suited to particular uses.
Claims (10)
1. A method for superimposing a human gaze position with a foreground image, the method comprising the following steps:
S1: locating the driver's eyes, and obtaining the three-dimensional coordinates of the eye position in the in-cabin camera (DMS camera) coordinate system and the eye sight vector;
S2: shooting the driver's forward-view image with a foreground camera as the foreground image;
S3: superimposing the three-dimensional coordinates in the in-cabin camera coordinate system from the step S1 with the foreground image from the step S2 to form a superimposed image, the superimposed image comprising two layers, wherein the first layer shows the driver's gaze position in real time and the second layer shows in real time the foreground image shot by the foreground camera;
wherein in the step S3, the eye position and the sight vector in the in-cabin camera (DMS camera) coordinate system from the step S1 are transformed, via calibration data, into the eye position and/or sight vector in the foreground camera coordinate system;
wherein in the step S1 the sight vector is calculated using a neural network.
2. The method according to claim 1, wherein obtaining the three-dimensional coordinates of the eye position in the in-cabin camera coordinate system in the step S1 comprises the following steps:
S11. fixing the foreground camera and the in-cabin camera (DMS camera), and calibrating the geometric positional relationship between the foreground camera and the in-cabin camera;
S12. acquiring a facial image with the in-cabin camera;
S13. passing the facial image through a facial feature-point network model to obtain the pixel coordinates of five points on the face, namely the two inner eye corners, the two outer eye corners, and the nose tip; and calculating the center coordinates of the two eyes;
S14. from the eye center coordinates obtained in step S13, calculating the eye position through the neural network to obtain the sight vector of the acquired facial image;
S15. from the eye center coordinates and the nose-tip coordinate obtained in step S13, calculating the rotation matrix from the human-eye three-dimensional coordinate system to the in-cabin camera coordinate system, and thereby obtaining the three-dimensional coordinates of the eye position in the in-cabin camera coordinate system.
3. The method according to claim 1 or 2, wherein, in the foreground camera coordinate system, the eye center coordinates and the sight vector are known; with the eye center as the starting point and the sight vector as the direction, the intersection of the sight vector with the foreground image plane is the driver's fixation point in the foreground image; wherein the eye coordinate under the in-cabin camera coordinate system is Point (x, y, z), and the eye coordinate under the foreground camera is Point (u, v, w); the sight vector predicted by the neural network for the in-cabin camera is V (rot_x, rot_y, rot_z), and the sight vector under the foreground camera is V (rot_u, rot_v, rot_w); transforming the coordinate position under the in-cabin camera coordinate system into the coordinate position under the foreground camera coordinate system comprises the following formulas:
Point (u, v, w) = R2* (Point (x, y, z))+T2;
V (rot_u, rot_v, rot_w) = R2*V (rot_x, rot_y, rot_z);
wherein, with Point (u, v, w) as the starting point and V (rot_u, rot_v, rot_w) as the direction vector, a ray is constructed that intersects the image plane of the foreground camera at one point, which is the gaze point; wherein [R1 | T1] is the rotation matrix from the human-eye coordinate system to the in-cabin camera coordinate system, and [R2 | T2] is the rotation matrix from the in-cabin camera to the foreground camera; R1 and T1 respectively represent the rotation and translation transformation parameters from the human-eye coordinate system to the in-cabin camera coordinate system; R2 and T2 respectively represent the rotation and translation transformation parameters from the in-cabin camera coordinate system to the foreground camera coordinate system.
4. The method according to any one of claims 1 to 3, wherein in step S1 the eye position is calculated through the neural network to obtain the sight vector of the acquired facial image; the neural network comprises 5 convolution modules, each using the ShuffleNet structure; a head image serves as the input layer, the input image is normalized to a size of 224*224, convolution is then performed with a 3*3 kernel at a stride of 2 pixels using the ReLU activation function to obtain feature maps of size 112*112, and max pooling then downsamples with a stride of 2 pixels to obtain feature maps of size 56*56.
5. The method according to claim 4, wherein the 5 convolution modules use the shuffle-unit (shuffle unit) network structure; in the right branch of the first shuffle-unit module, the 56*56 feature maps first undergo a pointwise group convolution, then a channel shuffle (channel shuffle), then a depthwise convolution (depthwise convolution) with a 3*3 kernel at a stride of 2 pixels, and finally another pointwise group convolution.
6. A system for superimposing a human gaze position with a foreground image, performing the superposition method according to any one of claims 1 to 5.
7. A system for superimposing a human gaze position with a foreground image, the system comprising a computing device for extracting the eye feature-point positions in the images acquired by an acquisition device and calculating the eye position coordinates; additionally calculating, via the gaze-position camera, the corresponding gaze-position coordinates, and calculating the 3D position of the eyes in the coordinate system of the in-cabin camera (DMS camera); the computing device comprises the neural network according to claim 4, into which the eye-position images and gaze-position coordinates from the face images of each angle are input for training, so that after training the gaze direction can be accurately output when a face image is input.
8. The system according to claim 7, wherein the acquisition device comprises: multiple brackets, including multiple horizontal brackets and multiple longitudinal brackets, the multiple cameras being fixed at the crossing positions of the horizontal and longitudinal brackets; a slide-rail structure comprising a horizontal slide rail and a longitudinal slide rail, the structure being free to move on the brackets in the horizontal and vertical directions; and a light source fixed at the crossing position of the horizontal slide rail and the longitudinal slide rail; the gaze-position camera is fixed to the light source so that it moves together with the light source.
9. The system according to claim 8, wherein the multiple cameras include cameras whose optical axes are at 45° to the direction perpendicular to the person's face, which ensures that eye images can be captured with the face turned by up to 90°.
10. application of the method described in -5 in automatic Pilot and/auxiliary drive according to claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810991444.4A CN109493305A (en) | 2018-08-28 | 2018-08-28 | A kind of method and system that human eye sight is superimposed with foreground image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109493305A true CN109493305A (en) | 2019-03-19 |
Family
ID=65690313
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810991444.4A Pending CN109493305A (en) | 2018-08-28 | 2018-08-28 | A kind of method and system that human eye sight is superimposed with foreground image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109493305A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101866215A (en) * | 2010-04-20 | 2010-10-20 | 复旦大学 | Human-computer interaction device and method adopting eye tracking in video monitoring |
US20130044138A1 (en) * | 2010-03-11 | 2013-02-21 | Toyota Jidosha Kabushiki Kaisha | Image position adjustment device |
CN104238239A (en) * | 2014-09-30 | 2014-12-24 | 西安电子科技大学 | System and method for focusing cameras on basis of vision drop points |
CN106226910A (en) * | 2016-09-08 | 2016-12-14 | 邹文韬 | HUD system and image regulating method thereof |
CN107818310A (en) * | 2017-11-03 | 2018-03-20 | 电子科技大学 | A kind of driver attention's detection method based on sight |
CN108171152A (en) * | 2017-12-26 | 2018-06-15 | 深圳大学 | Deep learning human eye sight estimation method, equipment, system and readable storage medium storing program for executing |
Non-Patent Citations (5)
Title |
---|
XIANGYU ZHANG et al.: "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices", 《ARXIV》 * |
XUCONG ZHANG et al.: "It's Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation", 《ARXIV》 * |
LIU Guohua (ed.): 《HALCON Digital Image Processing》, 31 May 2018, Xidian University Press * |
LIANG Li: 《Fundamentals and Applications of Multimedia Technology》, 31 December 2007, Beijing: Beijing Publishing House * |
TU Xuyan et al.: 《Anthropomimetics and Anthropomimetic Systems》, 31 August 2013, Beijing: National Defense Industry Press * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111829472A (en) * | 2019-04-17 | 2020-10-27 | 初速度(苏州)科技有限公司 | Method and device for determining relative position between sensors by using total station |
CN112424788A (en) * | 2019-06-17 | 2021-02-26 | 谷歌有限责任公司 | Vehicle occupant engagement using three-dimensional eye gaze vectors |
CN110781718A (en) * | 2019-08-28 | 2020-02-11 | 浙江零跑科技有限公司 | Cab infrared vision system and driver attention analysis method |
CN110781718B (en) * | 2019-08-28 | 2023-10-10 | 浙江零跑科技股份有限公司 | Cab infrared vision system and driver attention analysis method |
CN111086453A (en) * | 2019-12-30 | 2020-05-01 | 深圳疆程技术有限公司 | HUD augmented reality display method and device based on camera and automobile |
CN111524176A (en) * | 2020-04-16 | 2020-08-11 | 深圳市沃特沃德股份有限公司 | Method and device for measuring and positioning sight distance and computer equipment |
CN112380935A (en) * | 2020-11-03 | 2021-02-19 | 深圳技术大学 | Man-machine cooperative perception method and system for automatic driving |
CN112380935B (en) * | 2020-11-03 | 2023-05-26 | 深圳技术大学 | Man-machine collaborative sensing method and system for automatic driving |
CN112758099A (en) * | 2020-12-31 | 2021-05-07 | 福瑞泰克智能系统有限公司 | Driving assistance method and device, computer equipment and readable storage medium |
CN112758099B (en) * | 2020-12-31 | 2022-08-09 | 福瑞泰克智能系统有限公司 | Driving assistance method and device, computer equipment and readable storage medium |
CN113844365A (en) * | 2021-11-15 | 2021-12-28 | 盐城吉研智能科技有限公司 | Method for visualizing front-view bilateral blind areas of automobile |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109492514A (en) | A kind of method and system in one camera acquisition human eye sight direction | |
CN109493305A (en) | A kind of method and system that human eye sight is superimposed with foreground image | |
CN107909061B (en) | Head posture tracking device and method based on incomplete features | |
CN107666606B (en) | Binocular panoramic picture acquisition methods and device | |
Plopski et al. | Corneal-imaging calibration for optical see-through head-mounted displays | |
CN201307266Y (en) | Binocular sightline tracking device | |
CN108171218A (en) | A kind of gaze estimation method for watching network attentively based on appearance of depth | |
CN107392120B (en) | Attention intelligent supervision method based on sight line estimation | |
CN110032278A (en) | A kind of method for recognizing position and attitude, the apparatus and system of human eye attention object | |
CN107105333A (en) | A kind of VR net casts exchange method and device based on Eye Tracking Technique | |
CN105631859B (en) | Three-degree-of-freedom bionic stereo visual system | |
CN107656619A (en) | A kind of intelligent projecting method, system and intelligent terminal | |
CN108875485A (en) | A kind of base map input method, apparatus and system | |
WO2020042542A1 (en) | Method and apparatus for acquiring eye movement control calibration data | |
CN101587542A (en) | Field depth blending strengthening display method and system based on eye movement tracking | |
CN112578564B (en) | Virtual reality display equipment and display method | |
WO2009043927A1 (en) | Apparatus for acquiring and processing information relating to human eye movements | |
CN109820524A (en) | The acquisition of self-closing disease eye movement characteristics and classification wearable system based on FPGA | |
CN107861625A (en) | Gaze tracking system and method based on 3d space model | |
CN111046734A (en) | Multi-modal fusion sight line estimation method based on expansion convolution | |
CN113449623B (en) | Light living body detection method based on deep learning | |
CN109559332A (en) | A kind of sight tracing of the two-way LSTM and Itracker of combination | |
CN109407828A (en) | One kind staring the point estimation method and system, storage medium and terminal | |
CN105631931B (en) | A kind of heart surface three-dimensional configuration line modeling system and method for low complex degree | |
CN112183200A (en) | Eye movement tracking method and system based on video image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190319 |