CN116311481A - Construction method, device and storage medium for an enhanced gaze estimation model - Google Patents


Info

Publication number
CN116311481A
Authority
CN
China
Prior art keywords
data set
face
original data
probability distribution
distribution function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310564969.0A
Other languages
Chinese (zh)
Other versions
CN116311481B (en)
Inventor
谢伟浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shijing Medical Software Co ltd
Original Assignee
Guangzhou Shijing Medical Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shijing Medical Software Co ltd filed Critical Guangzhou Shijing Medical Software Co ltd
Priority to CN202310564969.0A
Publication of CN116311481A
Application granted
Publication of CN116311481B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012 Head tracking input arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 Eye tracking input arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Ophthalmology & Optometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a construction method, an apparatus and a storage medium for an enhanced gaze estimation model. The method comprises: acquiring the original data set required for constructing the enhanced gaze estimation model; performing statistical calculation on the original data set to generate a set of probability distribution functions; sampling the original data set according to those functions to obtain a first replacement data set; for each first sample, screening from the target face data set to be converted a face image whose face attributes are consistent with those of the sample; swapping the faces of the first replacement data set using a preset face conversion technique to generate a second replacement data set; and training the enhanced gaze estimation model on the original data set combined with the second replacement data set, then outputting the trained model. The invention thus uses an artificial-intelligence face-swapping technique to improve the performance of the gaze estimation method.

Description

Construction method, device and storage medium for an enhanced gaze estimation model
Technical Field
The present invention relates to the field of human-computer interaction technologies, and in particular to a construction method and apparatus for an enhanced gaze estimation model, and a storage medium.
Background
Currently, eye gaze estimation is one of the important tasks of eye tracking and has very wide application scenarios, such as human-computer interaction, intelligent driving, emotion analysis and intention recognition. However, for an appearance-based gaze estimation model to generalize well, training usually requires collecting a large amount of eye-movement data with high accuracy, comprehensive gaze coverage, varied head poses and varied eye shapes. Eye-movement data are commonly collected either actively, by having the user fixate on specified visual targets, or passively, by means of a model-based eye-tracking device. In both cases the collection process is difficult to control, and the collected data rarely satisfy the requirements of high accuracy, uniform gaze distribution, varied head poses and varied eye shapes.
Disclosure of Invention
The invention provides a construction method, an apparatus and a storage medium for an enhanced gaze estimation model, to address the difficulty of collecting suitable eye-movement data in the prior art.
In order to solve the above problems, the present invention provides a construction method, an apparatus and a storage medium for an enhanced gaze estimation model. The method comprises:
acquiring the original data set required for constructing the enhanced gaze estimation model, and performing statistical calculation on the original data set to generate a set of probability distribution functions corresponding to the original data set; the set comprises a probability distribution function of the head pose, a probability distribution function of the gaze landing point, a probability distribution function of the distance region and a probability distribution function of the offset; the probability distribution function of the distance region is calculated from the distribution of face-to-camera distances in the original data set; the probability distribution function of the offset is calculated from the distribution of face centre positions in the original data set;
extracting samples from the located regions according to the probability distribution function set, in such a way that the number of samples extracted from a region is inversely proportional to the number of samples that region contains, to obtain a first replacement data set;
for each first sample in the first replacement data set, screening from the target face data set to be converted a face image whose face attributes are consistent with those of the first sample, as the face image to be converted corresponding to that sample; the face attributes comprise age, gender and race information;
swapping the faces of the first replacement data set according to a preset face conversion technique and the target face set images to be converted, to generate a second replacement data set;
and training the enhanced gaze estimation model on the original data set together with the second replacement data set, and outputting the trained enhanced gaze estimation model.
Preferably, performing statistical calculation on the original data set to generate the corresponding set of probability distribution functions specifically includes:
estimating, by a preset head pose estimation method, the head pose of the face in three-dimensional space in each sample image of the original data set, to obtain the head pose distribution information of the faces in the original data set;
counting the sample size in each head pose region according to the head pose distribution information and a preset head-pose region division rule, to obtain the probability distribution function of the head pose.
Preferably, performing statistical calculation on the original data set to generate the corresponding set of probability distribution functions specifically includes:
counting the gaze landing points of the original data set to obtain the gaze landing point distribution information of the original data set; and counting the sample size in each region according to the gaze landing point distribution information and a preset gaze landing point range division rule, to obtain the probability distribution function of the gaze landing point.
Preferably, performing statistical calculation on the original data set to generate the corresponding set of probability distribution functions specifically includes:
estimating, by a preset monocular ranging method, the distribution of face-to-camera distances in each sample image of the original data set, to obtain the face-to-camera distance distribution information of the original data set;
counting the sample size in each region according to the face-to-camera distance distribution information and a preset distance range division rule, to obtain the probability distribution function of the distance region.
Preferably, performing statistical calculation on the original data set to generate the corresponding set of probability distribution functions specifically includes:
counting, by a preset face detection method, the distribution of face positions relative to the centre in the original data set, to obtain the face-centre-offset distribution information of the original data set;
and counting the sample size in each region according to the face-centre-offset distribution information and a preset rule dividing the maximum offset between the face and the camera, to obtain the probability distribution function of the offset.
Preferably, extracting samples from the located regions according to the probability distribution function set, in such a way that the number of samples extracted is inversely proportional to the number of samples in the located region, to obtain the first replacement data set, specifically includes:
randomly extracting a head pose region from the head pose probability distribution function, a gaze landing point region from the gaze landing point probability distribution function, a distance region from the distance region probability distribution function, and an offset region from the offset probability distribution function;
extracting samples from the located regions in such a way that the number of samples extracted is inversely proportional to the number of samples in the located region, and locating the samples extracted from each region to original samples in the original data set, to obtain the first replacement data set.
Preferably, swapping the faces of the first replacement data set according to a preset face conversion technique and the target face set images to be converted, to generate the second replacement data set, specifically includes:
according to the preset face conversion technique, using the target face set images to be converted to swap the faces of the face set images in the first replacement data set, and migrating the source face identity information of the target face set images to be converted onto the target faces of the first replacement data set, to obtain the second replacement data set.
Preferably, after outputting the trained gaze estimation model, the method further comprises:
performing deep-learning optimization of the trained enhanced gaze estimation model on the original data set alone, and outputting the optimized enhanced gaze estimation model.
The invention further provides an apparatus for constructing an enhanced gaze estimation model, comprising:
a generation module, configured to perform statistical calculation on the original data set and generate the set of probability distribution functions corresponding to the original data set;
an extraction module, configured to extract samples according to the probability distribution function set, to obtain the first replacement data set corresponding to the original data set;
a screening module, configured to screen, from the target face data set to be converted, face images whose face attributes are consistent with those of the first replacement data set, as the target face set images to be converted;
a face-swapping module, configured to swap the faces of the first replacement data set according to a preset face conversion technique and the target face set images to be converted, to generate the second replacement data set;
and a training module, configured to train the enhanced gaze estimation model on the original data set and the second replacement data set simultaneously, and output the trained enhanced gaze estimation model.
The invention further provides a storage medium storing a computer program which, when invoked and executed by a computer, implements the above construction method for an enhanced gaze estimation model.
Implementing the invention has the following beneficial effects:
the invention adopts the head pose estimation method, the statistics of gaze landing points, the monocular ranging method and the face detection method, which reduces the difficulty of data collection; starting from an original data set with uneven gaze distribution, uneven head poses and uneven eye shapes, an artificial-intelligence face-swapping technique converts a target face set to construct a data set with uniform gaze distribution, varied head poses and varied eye shapes, thereby improving the performance of appearance-based gaze estimation. Compared with the prior art, the collection process is better controlled, and the collected data better satisfy the requirements of high accuracy, uniform gaze distribution, varied head poses and varied eye shapes.
Drawings
Fig. 1: a schematic flow chart of an embodiment of the construction method for an enhanced gaze estimation model provided by the invention;
Fig. 2: a schematic diagram of an embodiment of the apparatus for constructing an enhanced gaze estimation model provided by the invention;
Fig. 3: a schematic flow chart of an embodiment of an enhanced gaze estimation method.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the present disclosure without creative effort fall within the scope of the present disclosure.
Embodiment one:
gaze estimation is one of the important tasks of eye movement tracking, and has very wide application scenarios such as human-computer interaction, intelligent driving, emotion analysis, intention recognition and the like.
The appearance-based sight line estimation method realizes the sight line estimation function mainly by learning the mapping relation between the face information acquired by the camera and the gazing sight line. In order for an enhanced gaze estimation model to have good generalization performance, training of the model often requires collecting a large amount of eye movement data with high accuracy, comprehensive gaze coverage, various head poses, and various eye shapes. However, the current common method for collecting eye movement data adopts a method of actively collecting the eye movement data by actively watching a specified sighting target by a user or a method of passively collecting the eye movement data by means of a model-based eye movement tracking device, so that the collecting process is difficult to control, and the collected data are difficult to meet the requirements of high precision, uniform sight line distribution, various head postures, various eye shapes and the like.
Referring to Fig. 1, Fig. 1 shows a construction method for an enhanced gaze estimation model according to an embodiment of the present invention, comprising steps S1 to S5:
Step S1: acquire the original data set required for constructing the enhanced gaze estimation model, and perform statistical calculation on the original data set to generate the set of probability distribution functions corresponding to the original data set; the set comprises a probability distribution function of the head pose, a probability distribution function of the gaze landing point, a probability distribution function of the distance region and a probability distribution function of the offset; the probability distribution function of the distance region is calculated from the distribution of face-to-camera distances in the original data set; the probability distribution function of the offset is calculated from the distribution of face centre positions in the original data set.
In this embodiment, the probability distribution function set comprises the head pose probability distribution function, the gaze landing point probability distribution function, the distance region probability distribution function and the offset probability distribution function, obtained as follows:
s1.1, acquiring a head gesture probability distribution function, namely estimating the head gesture of a face in a three-dimensional space in each sample image of an original data set according to a preset head gesture estimation method to acquire head gesture distribution information of the face in the original data set; according to the head gesture distribution information and a preset head gesture area division rule, counting the sample size in each head gesture area to obtain a probability distribution function of the head gesture, wherein the probability distribution function comprises the following specific steps of:
inputting an RGB image or an IR image of a character in an original data set into a pre-trained layered prediction network, and predicting to obtain a yaw angle, a pitch angle and a roll angle of the orientation of the face posture so as to estimate the head posture of the face; wherein the hierarchical prediction network comprises: the system comprises a backbone network, a characteristic pyramid network, a dimension reduction module and a layered prediction module; the backbone network is used for extracting image space features with different sizes, the feature pyramid network is used for fusing the image space features with different sizes to obtain fused features, the dimension reduction module is used for reducing dimensions of the fused features in three different dimensions to obtain the space features of the images in three dimensions, and the different dimensions correspond to different image channel numbers; the hierarchical prediction module includes: three full connection layers; the three full-connection layers respectively predict the spatial characteristics of three dimensions, each full-connection layer predicts an angle of the face gesture orientation, so that the image areas concerned by the three angles of the face gesture orientation predicted by the layered prediction network are different, the mutual interference among the three angle predictions is reduced, and finally the head gesture distribution information of the face in the original data set is obtained;
if the model is finally suitable for yaw angles of-45 degrees to +45 degrees, pitch angles of-45 degrees to +45 degrees and roll angles of-20 degrees to +20 degrees, the model can be divided into 324 areas according to the modes of yaw angle intervals of 10 degrees, pitch angle intervals of 10 degrees and roll angle intervals of 10 degrees, then the sample size of a statistical data set in each area is calculated, and the probability distribution function of the head gesture is obtained by dividing the sample size of each area by the total sample number.
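As an illustrative sketch (not the patented network itself), the region division and normalization in the worked example above can be written as a three-dimensional histogram over (yaw, pitch, roll); the function name and ranges are chosen here for illustration:

```python
import numpy as np

def head_pose_pdf(poses, yaw=(-45, 45), pitch=(-45, 45), roll=(-20, 20), step=10):
    """Bin (yaw, pitch, roll) head poses into fixed-size angular regions
    and normalize the per-region counts into a probability distribution."""
    edges = [np.arange(lo, hi + step, step) for lo, hi in (yaw, pitch, roll)]
    counts, _ = np.histogramdd(np.asarray(poses, dtype=float), bins=edges)
    return counts / counts.sum()  # probability mass per head-pose region

# 9 yaw bins x 9 pitch bins x 4 roll bins = 324 regions, as in the example
pose_pdf = head_pose_pdf([[0.0, 0.0, 0.0], [12.0, -30.0, 15.0], [0.0, 0.0, 0.0]])
```

A real pipeline would feed in the angles predicted by the hierarchical prediction network; any samples outside the chosen ranges would need clamping or filtering first.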
By adopting the head pose estimation method to obtain the head pose distribution information of the faces in the original data set, this embodiment can estimate the user's head pose in three-dimensional space from the two-dimensional digital images of the original data set, giving a three-dimensional pose deflection angle parameter, so that the face pose distribution information in the original data set is obtained quickly and accurately.
S1.2, obtaining the gaze landing point probability distribution function: the gaze landing points of the original data set are counted to obtain the gaze landing point distribution information; the sample size in each region is then counted according to the gaze landing point distribution information and a preset gaze landing point range division rule, to obtain the gaze landing point probability distribution function. Specifically:
the gaze landing points of the original data set are counted; if, for example, the gaze lands on a screen 1200 cm wide and 900 cm high, the screen can be divided into 108 regions at 100 cm intervals; the sample size of the data set in each region is then counted, and dividing the sample size of each region by the total number of samples yields the gaze landing point probability distribution function.
S1.3, obtaining the distance region probability distribution function: the distribution of face-to-camera distances in each sample image of the original data set is estimated by a preset monocular ranging method, to obtain the face-to-camera distance distribution information; the sample size in each region is then counted according to this distribution information and a preset distance range division rule, to obtain the distance region probability distribution function. Specifically:
using monocular ranging, the distance between the face and the camera is obtained from the pixel size and the real size of the cornea (corneal diameter about 11.8 mm) together with the pixel focal length of the camera;
if, for example, the model is ultimately applied in a region 20 cm to 60 cm from the screen, four regions can be divided at 10 cm intervals; the sample size of the data set in each region is then counted, and dividing the sample size of each region by the total number of samples yields the distance region probability distribution function.
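The cornea-based monocular ranging step reduces to the similar-triangles (pinhole camera) relation distance = real size × pixel focal length / size in pixels. A minimal sketch, with illustrative input values (a real pipeline must first detect the eye and measure the cornea's pixel size):

```python
def face_distance_mm(cornea_px, focal_px, cornea_mm=11.8):
    """Similar-triangles estimate of face-to-camera distance:
    real corneal diameter (mm) * pixel focal length / corneal diameter in pixels."""
    return cornea_mm * focal_px / cornea_px

# a cornea imaged at 30 px with a 1000 px focal length sits roughly 39 cm away
dist = face_distance_mm(cornea_px=30.0, focal_px=1000.0)
```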
By adopting monocular ranging to obtain the face-to-camera distance distribution in the original data set, the face images can be captured with the camera of an electronic device; eye detection and cornea localization on the images yield the cornea information in each face image; then, combining the real cornea size and the camera focal length, the eye-to-screen distance is calculated by the geometric-similarity method of monocular ranging, which effectively reduces the statistical cost of obtaining the face-to-camera distance distribution in the original data set.
S1.4, obtaining the offset probability distribution function: the distribution of face positions relative to the centre in the original data set is counted by a preset face detection method, to obtain the face-centre-offset distribution information; the sample size in each region is then counted according to this distribution information and a preset rule dividing the maximum offset between the face and the camera, to obtain the offset probability distribution function. Specifically:
the face detection model RetinaFace is used to detect the position of the centre of the detection box in the picture, giving the offset of the face relative to the camera;
if, for example, on an original 1280×720 face picture the offset of the face relative to the camera ranges from 400 to 880 on the horizontal axis and from 200 to 520 on the vertical axis, the range can be divided into 40 cells of 60 pixels; the sample size of the data set in each cell is then counted, and dividing the sample size of each cell by the total number of samples yields the offset probability distribution function.
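Mapping a detected face-box centre to its offset cell can be sketched as below. The ranges and the 60-pixel step follow the worked example; the clamping of out-of-range centres is an assumption added for robustness, and the centre coordinates would come from a detector such as RetinaFace:

```python
def offset_cell(center, x_range=(400, 880), y_range=(200, 520), step=60):
    """Return the (x_cell, y_cell) index of a face-box centre position,
    clamping centres that fall outside the observed range."""
    def idx(v, lo, hi):
        v = min(max(v, lo), hi - 1)  # clamp onto the observed range
        return int(v - lo) // step
    return idx(center[0], *x_range), idx(center[1], *y_range)

cell = offset_cell((650, 360))  # centre of a face box on a 1280x720 frame
```

Counting cell occupancies over the data set and dividing by the total sample count then yields the offset probability distribution function, exactly as for the head pose and gaze landing point cases.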
By using face detection to obtain the distribution of face centre positions in the original data set, the position and size of the face can be judged from salient features in the face image, multiple face images of the original data set can be recognized continuously, and the convenience of obtaining the face-centre-position distribution in the original data set is effectively improved.
Step S2: extract samples from the located regions according to the probability distribution function set, in such a way that the number of samples extracted from a region is inversely proportional to the number of samples that region contains, to obtain the first replacement data set.
In this embodiment, a head pose region is randomly drawn from the head pose probability distribution function, a gaze landing point region from the gaze landing point probability distribution function, a distance region from the distance region probability distribution function, and an offset region from the offset probability distribution function. From each located region one sample is drawn, the number of draws being inversely proportional to the sample size of the region, giving samples a, b, c and d respectively. An original sample in the original data set is then located according to the head pose region corresponding to a, the gaze landing point region corresponding to b, the face-to-camera distance region corresponding to c and the face-centre-offset region corresponding to d; a number of such samples are drawn and located in the original data set to obtain the first replacement data set. The sampling method adopted in this embodiment effectively ensures the breadth and validity of the sample sources in the first replacement data set.
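The "inversely proportional" rule means under-represented regions are drawn more often, flattening the resulting distribution. A minimal sketch of the weighting (the handling of empty regions, which are given weight zero, is an assumption):

```python
def inverse_weights(region_counts):
    """Per-region sampling weights inversely proportional to each region's
    sample count, normalized to sum to 1; empty regions are never drawn."""
    inv = [0.0 if n == 0 else 1.0 / n for n in region_counts]
    total = sum(inv)
    return [w / total for w in inv]

w = inverse_weights([100, 10, 0, 1])
```

With these weights, the region holding 1 sample is drawn ten times as often as the one holding 10, and a hundred times as often as the one holding 100 (e.g. via `random.choices(range(4), weights=w)`).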
Step S3: for each first sample in the first replacement data set, screen from the target face data set to be converted a face image whose face attributes are consistent with those of the first sample, as the face image to be converted corresponding to that sample; the face attributes comprise age, gender and race information.
In this embodiment, according to the age, gender and race information of each first sample in the first replacement data set, face images whose face attributes are consistent with each first sample are screened from the target face data set to be converted, as the face set images A to be converted corresponding to each first sample in the first replacement data set.
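The attribute screening amounts to an exact match on the (age, gender, race) triple. A sketch, where the attribute values and image identifiers are hypothetical placeholders, not taken from the patent:

```python
def match_faces(sample_attrs, candidates):
    """For each first-sample attribute triple, list the candidate face
    images whose attributes match it exactly.
    `candidates` is a list of (attribute_triple, image_id) pairs."""
    return {attrs: [img for a, img in candidates if a == attrs]
            for attrs in sample_attrs}

matches = match_faces(
    [("adult", "female", "asian")],
    [(("adult", "female", "asian"), "face_01"),
     (("adult", "male", "asian"), "face_02"),
     (("adult", "female", "asian"), "face_03")])
```

A production system might relax the match (e.g. age ranges instead of exact groups), but exact triple matching is the simplest reading of "consistent with the face attributes".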
And S4, changing the face of the first replacement data set according to a preset face changing technology and a target face set image to be converted, and generating a second replacement data set.
In this embodiment, according to a preset face conversion technology, the face set image a to be converted in step S3 is used to change the face of the face set image in the first replacement data set, and the source face identity information of the target face set image to be converted is migrated to the target face of the first replacement data set, so as to obtain the second replacement data set.
It should be noted that, the "face changing technology" is not a technology means for changing a face in the real world, for example SimSwap, and the method can effectively retain attributes of a face in the first alternative dataset, including pose, expression, eye spirit, and the like, and meanwhile, migrate a face Identity (ID) in a target face set to be converted to the first alternative dataset to obtain the second alternative dataset.
This embodiment adopts a face swapping technology to transform the first replacement data set into the second replacement data set, namely: the source face identity information of the target face set images to be converted is migrated onto the target faces of the first replacement data set to generate the second replacement data set, whereby the identity information of a face can be changed while its gaze features are preserved.
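The identity-migration step can be sketched as the loop below. SimSwap's real API is not reproduced here; `swap_identity` is a hypothetical stand-in for any attribute-preserving face-swapping model.

```python
from collections import namedtuple

Sample = namedtuple("Sample", ["id", "image", "gaze_label"])

def build_second_replacement_set(first_set, targets_by_sample, swap_identity):
    """swap_identity(source_image, target_face) is a hypothetical stand-in
    for a face-swapping model such as SimSwap: it is expected to return
    source_image re-rendered with target_face's identity while keeping
    the source pose, expression and gaze."""
    second_set = []
    for sample in first_set:
        target_face = targets_by_sample[sample.id]
        swapped = swap_identity(sample.image, target_face)
        # The gaze label is inherited unchanged from the first-set sample,
        # since the swap preserves the line-of-sight features.
        second_set.append(Sample(sample.id, swapped, sample.gaze_label))
    return second_set
```

The key design point is that only the identity changes: every label attached to the first replacement data set remains valid for the second.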
Step S5: the enhanced gaze estimation model is trained using the original data set and the second replacement data set simultaneously, and the trained enhanced gaze estimation model is output.
In this embodiment, the original data set and the second replacement data set are combined, the enhanced gaze estimation model is trained on the combined data, and the trained enhanced gaze estimation model is output.
After the trained enhanced gaze estimation model is output, the method further comprises: performing deep learning optimization on the trained enhanced gaze estimation model using the original data set alone, and outputting the optimized enhanced gaze estimation model. In this embodiment, the parameters of the enhanced gaze estimation model are fine-tuned using the original data set. Because face swapping introduces some deviation, adding the second replacement data set to the original data set for combined training introduces a small amount of noise that can make the model inaccurate; further fine-tuning on the accurate original data solves this problem.
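The two-stage schedule (joint training, then fine-tuning on the original data alone) can be sketched with a toy model. A 1-D linear regressor trained by plain SGD stands in for the real gaze network, whose architecture the patent does not fix; the learning rates and epoch counts are illustrative assumptions.

```python
def sgd_train(w, data, lr, epochs):
    # Plain per-sample SGD on y ~ w[0]*x + w[1] (toy stand-in for the
    # gaze network; squared-error gradient).
    for _ in range(epochs):
        for x, y in data:
            err = w[0] * x + w[1] - y
            w[0] -= lr * err * x
            w[1] -= lr * err
    return w

def train_then_finetune(original, second_replacement):
    w = [0.0, 0.0]
    # Stage 1: joint training on original + face-swapped data.
    w = sgd_train(w, original + second_replacement, lr=0.05, epochs=200)
    # Stage 2: fine-tune on the accurate original data alone, with a
    # smaller learning rate, to wash out noise introduced by face swapping.
    w = sgd_train(w, original, lr=0.01, epochs=200)
    return w
```

The fine-tuning stage deliberately uses a smaller learning rate so that the broadly trained weights are only nudged toward the clean data, not retrained from scratch.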
Referring to fig. 2, based on the same inventive concept as the above embodiment, an embodiment of the present invention provides a device for constructing an enhanced gaze estimation model, including:
the generating module 10 is used for carrying out statistical calculation on the original data set and generating a probability distribution function set corresponding to the original data set;
an extraction module 20, configured to extract a sample from the probability distribution function set, to obtain a first alternative data set corresponding to the original data set;
the screening module 30 is configured to screen, from the target face data set to be converted, a face image having a face attribute consistent with that of the first replacement data set, as a target face set image to be converted;
the face swapping module 40 is configured to face-swap the first replacement data set according to a preset face swapping technology and the target face set images to be converted, to generate a second replacement data set;
the training module 50 is configured to train the enhanced gaze estimation model by using the original data set and the second alternative data set at the same time, and output the trained enhanced gaze estimation model.
In one embodiment, the generating module 10 is further configured to:
estimating the head pose of the face in the three-dimensional space in each sample image of the original data set according to a preset head pose estimation method to obtain head pose distribution information of the face in the original data set;
according to the head pose distribution information and a preset head pose region division rule, the sample amount in each head pose region is counted to obtain the probability distribution function of the head pose.
The gaze point conditions of the original data set are counted to obtain gaze point distribution information of the original data set; according to the gaze point distribution information and a preset gaze point range division rule, the sample amount in each region is counted to obtain the probability distribution function of the gaze point.
According to a preset monocular distance measuring method, the distribution of face-to-camera distances in each sample image of the original data set is estimated, obtaining face-to-camera distance distribution information for the original data set;
according to the face-to-camera distance distribution information and a preset distance range division rule, the sample amount in each region is counted to obtain the probability distribution function of the distance region.
According to a preset face detection method, the distribution of face positions relative to the centre position in the original data set is counted to obtain distribution information of the faces relative to the centre position; according to this distribution information and a preset maximum face-to-camera offset division rule, the sample amount in each region is counted to obtain the probability distribution function of the offset.
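Each of the four statistics above follows the same pattern: divide a measured quantity into preset regions, count the sample amount per region, and normalise into a probability distribution. A minimal histogram sketch (the bin edges shown are illustrative, not the patent's preset division rules):

```python
def region_pdf(values, bin_edges):
    """Count the sample amount falling in each region and normalise the
    counts into a probability distribution.
    bin_edges must be ascending, e.g. head-pose yaw boundaries in degrees."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts) or 1  # guard against an empty data set
    return [c / total for c in counts]
```

The same helper would serve the head pose, gaze point, distance and offset distributions, with a different preset set of bin edges for each.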
In one embodiment, the extraction module 20 is further configured to:
according to the probability distribution function set, samples are extracted from the located regions in a manner in which the number of samples extracted is inversely proportional to the sample amount in the located region, to acquire the first replacement data set, specifically comprising:
a head pose region is randomly drawn from the probability distribution function of the head pose, a gaze point region from the probability distribution function of the gaze point, a distance region from the probability distribution function of the distance region, and an offset region from the probability distribution function of the offset;
samples are drawn from the located regions in a manner in which the number of samples drawn is inversely proportional to the sample amount in the located region, and the samples drawn from each region are located back to original samples in the original data set to obtain the first replacement data set.
In one embodiment, the screening module 30 is further configured to:
according to the face attributes of each first sample in the first replacement data set, face images consistent with those face attributes are respectively screened out from the target face data set to be converted, as the face images to be converted corresponding to each first sample; the face attributes comprise age, gender and ethnicity information.
In one embodiment, the face swapping module 40 is further configured to:
according to a preset face swapping technology, the target face set images to be converted are used to swap the faces of the images in the first replacement data set, and the source face identity information of the target face set images to be converted is migrated onto the target faces of the first replacement data set, to obtain the second replacement data set.
In one embodiment, training module 50 is further to:
and (3) performing deep learning optimization on the trained enhanced vision estimation model by independently adopting the original data set, and outputting the optimized enhanced vision estimation model.
Referring to fig. 3, based on the same inventive concept as the above embodiment, an embodiment of the present invention provides an enhanced gaze estimation method, comprising:
step S10, an original data set required by the construction of an enhanced vision estimation model is obtained, statistical calculation is carried out on the original data set, and a probability distribution function set corresponding to the original data set is generated; the probability distribution function set comprises a probability distribution function of the head gesture, a probability distribution function of a sight falling point, a probability distribution function of a distance area and a probability distribution function of an offset; the probability distribution function of the distance area is obtained by calculation according to the distribution information of the face distance camera in the original data set; the probability distribution function of the offset is obtained by calculation according to the distribution condition of the center position of the face in the original data set.
Step S20: according to the probability distribution function set, samples are extracted from the located regions in a manner in which the number of samples extracted is inversely proportional to the sample amount in the located region, obtaining a first replacement data set.
Step S30: according to the face attributes of each first sample in the first replacement data set, face images consistent with those face attributes are respectively screened out from the target face data set to be converted, as the face images to be converted corresponding to each first sample; the face attributes comprise age, gender and ethnicity information.
Step S40: the first replacement data set is face-swapped according to a preset face swapping technology and the target face set images to be converted, generating a second replacement data set.
In this embodiment, by face-swapping the first replacement data set, the generated second replacement data set is a new face set built from the original data set and the target face set to be converted, so that the identity information of a face is changed while the original gaze features are preserved, thereby augmenting the eye movement data used for gaze estimation.
Correspondingly, an embodiment of the present invention further provides a computer-readable storage medium comprising a stored computer program, wherein, when the computer program runs, the device on which the computer-readable storage medium is located is controlled to execute the construction method of the enhanced gaze estimation model described above;
wherein the construction method of the enhanced gaze estimation model, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment by instructing relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the method for constructing an enhanced gaze estimation model provided by the embodiment of the present invention adopts a head pose estimation method, gaze point statistics, a monocular distance measuring method and a face detection method, so the difficulty of data acquisition can be reduced; based on an original data set with uneven gaze distribution, limited head poses and limited eye shapes, an artificial intelligence face swapping technology and a target face set to be converted are used to construct a data set with uniform gaze distribution, diverse head poses and diverse eye shapes, thereby improving the performance of appearance-based gaze estimation methods. Compared with the prior art, the acquisition process is better controlled, and the acquired data better meet requirements such as high precision, uniform gaze distribution, diverse head poses and diverse eye shapes.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and substitutions can be made by those skilled in the art without departing from the technical principles of the present invention, and these modifications and substitutions should also be considered as being within the scope of the present invention.

Claims (10)

1. A method of constructing an enhanced gaze estimation model, comprising:
acquiring an original data set required for constructing an enhanced gaze estimation model, and performing statistical calculation on the original data set to generate a probability distribution function set corresponding to the original data set; wherein the probability distribution function set comprises a probability distribution function of the head pose, a probability distribution function of the gaze point, a probability distribution function of the distance region and a probability distribution function of the offset; the probability distribution function of the distance region is calculated according to face-to-camera distance distribution information in the original data set; and the probability distribution function of the offset is calculated according to the distribution of face centre positions in the original data set;
according to the probability distribution function set, sampling from the located regions in a manner in which the number of samples drawn is inversely proportional to the sample amount in the located region, to obtain a first replacement data set;
according to the face attributes of each first sample in the first replacement data set, respectively screening out, from a target face data set to be converted, face images consistent with those face attributes, as the face images to be converted corresponding to each first sample; wherein the face attributes comprise age, gender and ethnicity information;
face-swapping the first replacement data set according to a preset face swapping technology and the target face set images to be converted, to generate a second replacement data set;
and training the enhanced gaze estimation model using the original data set and the second replacement data set, and outputting the trained enhanced gaze estimation model.
2. The method for constructing an enhanced gaze estimation model according to claim 1, wherein the performing statistical calculation on the original data set to generate a probability distribution function set corresponding to the original data set specifically comprises:
estimating, according to a preset head pose estimation method, the head pose of the face in three-dimensional space in each sample image of the original data set, to obtain head pose distribution information of the faces in the original data set;
and counting, according to the head pose distribution information and a preset head pose region division rule, the sample amount in each head pose region, to obtain the probability distribution function of the head pose.
3. The method for constructing an enhanced gaze estimation model according to claim 1, wherein the performing statistical calculation on the original data set to generate a probability distribution function set corresponding to the original data set specifically comprises:
counting the gaze point conditions of the original data set to obtain gaze point distribution information of the original data set; and counting, according to the gaze point distribution information and a preset gaze point range division rule, the sample amount in each region, to obtain the probability distribution function of the gaze point.
4. The method for constructing an enhanced gaze estimation model according to claim 1, wherein the performing statistical calculation on the original data set to generate a probability distribution function set corresponding to the original data set specifically comprises:
estimating, according to a preset monocular distance measuring method, the distribution of face-to-camera distances in each sample image of the original data set, to obtain face-to-camera distance distribution information of the original data set;
and counting, according to the face-to-camera distance distribution information and a preset distance range division rule, the sample amount in each region, to obtain the probability distribution function of the distance region.
5. The method for constructing an enhanced gaze estimation model according to claim 1, wherein the performing statistical calculation on the original data set to generate a probability distribution function set corresponding to the original data set specifically comprises:
counting, according to a preset face detection method, the distribution of face positions relative to the centre position in the original data set, to obtain distribution information of the faces relative to the centre position;
and counting, according to the distribution information of the faces relative to the centre position and a preset maximum face-to-camera offset division rule, the sample amount in each region, to obtain the probability distribution function of the offset.
6. The method for constructing an enhanced gaze estimation model according to claim 1, wherein the obtaining a first replacement data set by extracting samples from the located regions, according to the probability distribution function set, in a manner in which the number of samples extracted is inversely proportional to the sample amount in the located region specifically comprises:
randomly drawing a head pose region from the probability distribution function of the head pose, a gaze point region from the probability distribution function of the gaze point, a distance region from the probability distribution function of the distance region, and an offset region from the probability distribution function of the offset;
and drawing samples from the located regions in a manner in which the number of samples drawn is inversely proportional to the sample amount in the located region, and locating the samples drawn from each region back to original samples in the original data set, to obtain the first replacement data set.
7. The method for constructing an enhanced gaze estimation model according to claim 1, wherein the face-swapping the first replacement data set according to a preset face swapping technology and the target face set images to be converted to generate a second replacement data set specifically comprises:
using, according to a preset face swapping technology, the target face set images to be converted to swap the faces of the images in the first replacement data set, and migrating the source face identity information of the target face set images to be converted onto the target faces of the first replacement data set, to obtain the second replacement data set.
8. The method for constructing an enhanced gaze estimation model according to claim 1, further comprising, after outputting the trained enhanced gaze estimation model:
performing deep learning optimization on the trained enhanced gaze estimation model using the original data set alone, and outputting the optimized enhanced gaze estimation model.
9. A device for constructing an enhanced gaze estimation model, comprising:
a generating module, configured to perform statistical calculation on an original data set and generate a probability distribution function set corresponding to the original data set;
an extraction module, configured to extract samples from the probability distribution function set to obtain a first replacement data set corresponding to the original data set;
a screening module, configured to screen out, from a target face data set to be converted, face images whose face attributes are consistent with those of the first replacement data set, as target face set images to be converted;
a face swapping module, configured to face-swap the first replacement data set according to a preset face swapping technology and the target face set images to be converted, to generate a second replacement data set;
and a training module, configured to train the enhanced gaze estimation model using the original data set and the second replacement data set simultaneously, and output the trained enhanced gaze estimation model.
10. A storage medium, wherein a computer program is stored on the storage medium, and when the computer program is called and executed by a computer, the method for constructing an enhanced gaze estimation model according to any one of claims 1 to 8 is implemented.
CN202310564969.0A 2023-05-19 2023-05-19 Construction method, device and storage medium of enhanced vision estimation model Active CN116311481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310564969.0A CN116311481B (en) 2023-05-19 2023-05-19 Construction method, device and storage medium of enhanced vision estimation model


Publications (2)

Publication Number Publication Date
CN116311481A true CN116311481A (en) 2023-06-23
CN116311481B CN116311481B (en) 2023-08-25

Family

ID=86780139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310564969.0A Active CN116311481B (en) 2023-05-19 2023-05-19 Construction method, device and storage medium of enhanced vision estimation model

Country Status (1)

Country Link
CN (1) CN116311481B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647800A (en) * 2019-08-06 2020-01-03 广东工业大学 Eye contact communication detection method based on deep learning
WO2020135535A1 (en) * 2018-12-29 2020-07-02 华为技术有限公司 Recommendation model training method and related apparatus
CN111582059A (en) * 2020-04-20 2020-08-25 哈尔滨工程大学 Facial expression recognition method based on variational self-encoder
CN112949535A (en) * 2021-03-15 2021-06-11 南京航空航天大学 Face data identity de-identification method based on generative confrontation network
CN113947794A (en) * 2021-10-22 2022-01-18 浙江大学 Fake face changing enhancement detection method based on head posture deviation correction
WO2022143398A1 (en) * 2020-12-29 2022-07-07 华为技术有限公司 Three-dimensional model generation method and device


Also Published As

Publication number Publication date
CN116311481B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
CN107909061B (en) Head posture tracking device and method based on incomplete features
US8636361B2 (en) Learning-based visual attention prediction system and method thereof
CN103677274B (en) A kind of interaction method and system based on active vision
Martin et al. Scangan360: A generative model of realistic scanpaths for 360 images
JP2023509953A (en) Target tracking method, device, electronic device and storage medium
WO2020042542A1 (en) Method and apparatus for acquiring eye movement control calibration data
CN102831382A (en) Face tracking apparatus and method
KR20200130440A (en) A method for identifying an object in an image and a mobile device for executing the method (METHOD FOR IDENTIFYING AN OBJECT WITHIN AN IMAGE AND MOBILE DEVICE FOR EXECUTING THE METHOD)
JP5225870B2 (en) Emotion analyzer
CN105912126B (en) A kind of gesture motion is mapped to the adaptive adjusting gain method at interface
CN106846372B (en) Human motion quality visual analysis and evaluation system and method thereof
Deng et al. Learning from images: A distillation learning framework for event cameras
CN110245660B (en) Webpage glance path prediction method based on saliency feature fusion
CN111897433A (en) Method for realizing dynamic gesture recognition and control in integrated imaging display system
CN109544584B (en) Method and system for realizing inspection image stabilization precision measurement
CN116311481B (en) Construction method, device and storage medium of enhanced vision estimation model
CN113065506A (en) Human body posture recognition method and system
TWI478099B (en) Learning-based visual attention prediction system and mathod thereof
CN116382473A (en) Sight calibration, motion tracking and precision testing method based on self-adaptive time sequence analysis prediction
Yang et al. vGaze: Implicit saliency-aware calibration for continuous gaze tracking on mobile devices
CN110275608B (en) Human eye sight tracking method
KR101326644B1 (en) Full-body joint image tracking method using evolutionary exemplar-based particle filter
CN112132864A (en) Robot following method based on vision and following robot
Xiang et al. Single camera based gait analysis method with scaled and image coordinate key-points
JP2012226403A (en) Image area tracking device, image area tracking method, and computer program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant