CN115509351B - Sensory linkage situational digital photo frame interaction method and system - Google Patents

Sensory linkage situational digital photo frame interaction method and system

Info

Publication number
CN115509351B
Authority
CN
China
Prior art keywords
image
photo frame
digital photo
picture
human eye
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211130909.XA
Other languages
Chinese (zh)
Other versions
CN115509351A (en)
Inventor
李顺
王晓帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Goodview Electronic Technology Co., Ltd.
Original Assignee
Shanghai Goodview Electronic Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Goodview Electronic Technology Co., Ltd.
Priority to CN202211130909.XA
Publication of CN115509351A
Application granted
Publication of CN115509351B
Legal status: Active (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/012: Head tracking input arrangements
    • G06F 3/013: Eye tracking input arrangements
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161: Detection; localisation; normalisation
    • G06V 40/168: Feature extraction; face representation
    • G06V 40/18: Eye characteristics, e.g. of the iris
    • G06V 40/197: Matching; classification


Abstract

The invention relates to the technical field of digital photo frame interaction and discloses a sensory linkage situational digital photo frame interaction method and system. The method comprises the following steps: detect the facial pose in the binarized face image and, if a tilted face is detected, rotate the picture in the digital photo frame; detect the size of the picture and the size of the display screen of the digital photo frame, and display pictures smaller than the display screen in full screen; detect the position of the pupil based on a pupil detection algorithm; determine the area gazed at by the human eye with an electronic picture region-of-interest detection algorithm based on the pupil position and, if the user gazes for a long time, enlarge the gazed area; the digital photo frame then automatically selects and pushes pictures whose scenes are similar to the enlarged picture area. The method realizes adaptive rotation of the digital photo frame picture based on facial pose perception and enlargement of the picture area at the viewer's focus of interest based on eye pupil perception.

Description

Sensory linkage situational digital photo frame interaction method and system
Technical Field
The invention relates to the technical field of digital photo frame interaction, and in particular to a sensory linkage situational digital photo frame interaction method and system.
Background
Digital photo frames have entered many users' homes as consumer terminals. The basic function of a digital photo frame is to display and play pictures. At present, picture playback on digital photo frames is controlled through touch keys, mechanical keys and the like, so the human-computer interaction experience is limited by these manual, precise positioning operations. Existing digital photo frames only support automatic display of electronic photos and lack the ability to monitor and identify the viewer's focus of interest, adjust the photo frame intelligently and optimize the display. This patent therefore proposes a sensory linkage situational digital photo frame interaction method and system for these problems.
Disclosure of Invention
In view of the above, the invention provides a sensory linkage situational digital photo frame interaction method, which aims to: (1) determine the shooting model by combining the face region image, the camera and the world coordinate system, and determine the sideways head-tilt angle of the face pose from the human eye region image; when this tilt angle is large, the user's face is inclined and the user is looking at the picture in the digital photo frame at an angle, so the picture is rotated adaptively, with the rotation angle equal to the user's tilt angle, realizing adaptive rotation of the digital photo frame picture based on facial pose perception; (2) detect the position of the pupil in the human eye region image based on a pupil detection algorithm, establish the correspondence between the light-spot cross-ratio values and the center point of the human eye gazing area based on the position coordinates of the light spots around the pupil, and obtain the gazing area, thereby enlarging the picture area at the viewer's focus of interest based on eye pupil perception.
The invention provides a sensory linkage situational digital photo frame interaction method, which comprises the following steps:
s1: the digital photo frame utilizes a camera to shoot to obtain a face image, and the face image is preprocessed to obtain a preprocessed face image, wherein the preprocessing method comprises binarization processing and human eye area image extraction;
s2: detecting the facial posture in the binarized face image, if an inclined face is detected, rotating the picture in the digital photo frame, detecting the size of the picture and the size of a display screen of the digital photo frame, and displaying the picture with the size smaller than the size of the display screen of the digital photo frame in a full screen manner;
s3: detecting the position of a pupil in the human eye region image based on a pupil detection algorithm;
s4: determining a region watched by human eyes by using an electronic picture interest region detection algorithm based on pupil positions, and amplifying the picture in the region when the time of watching the region by a user exceeds a preset threshold;
s5: the digital photo frame automatically selects and pushes the pictures similar to the pictures in the enlarged area scene.
As a further improvement of the method:
optionally, in the step S1, the digital photo frame is shot by a camera to obtain a face image, including:
the digital photo frame comprises a display screen, a camera and a wireless communication module, wherein the display screen is used for displaying an electronic picture, the camera is used for shooting and capturing human eye images, and the wireless communication module is used for acquiring electronic picture data from a cloud end and uploading the captured human eye images to a computer terminal; the four vertex positions of the digital photo frame are provided with infrared light sources which can emit infrared light;
in the specific embodiment of the invention, the computer terminal can send a control instruction to the digital photo frame, the computer terminal determines the facial posture and the eye watching area based on the received human face image and the electronic picture displayed by the digital photo frame, and controls the digital photo frame to amplify the eye watching area or rotate the picture in the digital photo frame based on the control instruction;
The user can control whether the camera is started. When the user chooses to start the camera, the infrared light sources at the four vertexes of the digital photo frame emit infrared light, and the digital photo frame shoots face images with the camera. The shot face images form a time series, and the face image set is {I_t | t ∈ [t_0, t_e]}, where I_t is the face image shot at any moment t, t_0 denotes the initial moment of camera shooting, t_e denotes the cut-off moment, and the time interval between adjacent moments is Δt.
Optionally, the binarizing processing is performed on the face image in the step S1 to obtain a binarized face image, and the binarizing processing includes:
For any face image I_t, the binarization processing flow is as follows:
S11: perform graying processing on all pixel points of the face image I_t to obtain the gray value of each pixel point, and take the gray value of each pixel point as its pixel value to obtain the grayed face image, wherein the graying formula is:

    I_{t,g}(x, y) = 0.299·R_t(x, y) + 0.587·G_t(x, y) + 0.114·B_t(x, y)

wherein:
I_{t,g}(x, y) denotes the gray value of the pixel point I_t(x, y) in row x, column y of the face image I_t, i.e. the pixel value of pixel point I_t(x, y);
R_t(x, y), G_t(x, y), B_t(x, y) denote the values of pixel point I_t(x, y) on the R, G, B color channels respectively;
S12: perform gray stretching processing on the grayed face image I'_t, wherein the formula of the gray stretching processing is:

    g_t(x, y) = 255 · (I'_t(x, y) − I'_{t,min}) / (I'_{t,max} − I'_{t,min})

wherein:
g_t(x, y) denotes the pixel value of pixel point I_t(x, y) after gray stretching;
I'_{t,min} denotes the minimum pixel value of the grayed face image I'_t, and I'_{t,max} denotes the maximum pixel value of the grayed face image I'_t;
S13: initialize the binarization threshold T;
S14: divide the pixels of the gray-stretched face image I'_t into foreground pixels and background pixels, wherein pixels with pixel values smaller than the threshold T are divided into foreground pixels and pixels with pixel values greater than or equal to the threshold T are divided into background pixels;
calculate the average pixel values of the foreground pixels and background pixels respectively, wherein the average pixel value of the foreground pixels is m_1 and the average pixel value of the background pixels is m_2;
S15: update the threshold:

    T ← (m_1 + m_2) / 2

S16: repeat steps S14-S15 until the updated threshold is the same as the previous threshold, giving the final binarization threshold T*; set the pixel values in the gray-stretched face image I'_t that are lower than the binarization threshold T* to 0 and those that are not lower than T* to 255, obtaining the binarized face image.
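To make the binarization flow concrete, here is a minimal Python sketch of steps S11-S16. The graying weights, the mean-value threshold initialization and the (m_1 + m_2)/2 update are standard choices assumed here, since the patent's exact formula images are not reproduced in the text:

```python
import numpy as np

def binarize_face_image(rgb: np.ndarray) -> np.ndarray:
    """Iterative-threshold binarization of a face image (steps S11-S16)."""
    # S11: graying (standard luminance weights assumed; channel order assumed RGB)
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

    # S12: linear gray stretching to the full [0, 255] range
    g_min, g_max = gray.min(), gray.max()
    stretched = (gray - g_min) / max(g_max - g_min, 1e-9) * 255.0

    # S13: initialize the threshold (mean pixel value assumed)
    t = stretched.mean()

    # S14-S16: split into foreground/background and iterate to convergence
    while True:
        fg = stretched[stretched < t]         # foreground: below the threshold
        bg = stretched[stretched >= t]        # background: at or above it
        m1 = fg.mean() if fg.size else 0.0    # average foreground pixel value
        m2 = bg.mean() if bg.size else 255.0  # average background pixel value
        t_new = 0.5 * (m1 + m2)               # S15: threshold update
        if abs(t_new - t) < 1e-3:             # S16: threshold stopped changing
            break
        t = t_new

    # pixels below the final threshold -> 0, all others -> 255
    return np.where(stretched < t, 0, 255).astype(np.uint8)
```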
Optionally, the extracting an eye region image from the binarized face image in the step S1 to obtain an eye region image includes:
extracting a human eye region image from the face image subjected to binarization processing to obtain a human eye region image, wherein the human eye region image extraction process comprises the following steps:
Construct a human eye region image extraction model formed by cascading n human eye region detection classification models. The input of each human eye region detection classification model is an image region, and its output is the detection classification result of that image region in {−1, +1}: an output of −1 means the input image region is not a human eye image region, and an output of +1 means it is. The training process of a human eye region detection classification model is as follows:
collect a number of human eye region image samples and non-human-eye region image samples to train the model; the model extracts sample features and classifies them, and the parameters of the model are optimized with the goal of minimizing the mean square error of the sample classification;
according to the human eye region detection classification models after parameter optimization, calculate the weight w_i of any i-th human eye region detection classification model from its classification errors, where:
err_i denotes the number of misclassified samples of the i-th human eye region detection classification model after parameter optimization, all denotes the total number of samples, and i ∈ [1, n]; the fewer the misclassified samples, the larger the weight;
then normalize the weight w_i, wherein the normalization formula is:

    ŵ_i = (w_i − w_min) / (w_max − w_min)

wherein:
w_min denotes the minimum weight among the n human eye region detection classification models, and w_max denotes the maximum weight among the n human eye region detection classification models;
ŵ_i denotes the normalized weight of the i-th human eye region detection classification model;
perform the above parameter optimization and weight calculation on all n human eye region detection classification models to obtain the n models and the corresponding normalized weight set {ŵ_1, ŵ_2, …, ŵ_n}, where f_i(·) denotes the i-th human eye region detection classification model;
the human eye region detection classification models are cascade-combined according to the following formula:

    H(I) = sign( Σ_{i=1}^{n} ŵ_i · f_i(I) )

wherein:
H(·) denotes the cascade-combined human eye region image extraction model;
I denotes the input image data;
Divide the binarized face image into a plurality of sub-images, each the size of a normal human eye region; input each sub-image into the human eye region image extraction model, and if the model outputs +1, the sub-image is a human eye region image (a binary image).
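A small sketch of how the cascaded voting could work once the n classifiers and their normalized weights are trained; taking the sign of the weighted vote is an assumption consistent with the {−1, +1} outputs described above, and the window size is illustrative:

```python
import numpy as np

def cascade_predict(sub_image, classifiers, norm_weights) -> int:
    """Weighted combination of n eye-region detection classifiers."""
    votes = np.array([f(sub_image) for f in classifiers])  # each vote in {-1, +1}
    score = float(np.dot(norm_weights, votes))             # weighted vote
    return 1 if score >= 0 else -1

def extract_eye_regions(binary_face, classifiers, norm_weights, win=(24, 48)):
    """Slide a normal-eye-sized window over the binarized face image and
    keep the sub-images the cascade labels +1 (human eye regions)."""
    h, w = win
    regions = []
    for y in range(0, binary_face.shape[0] - h + 1, h):
        for x in range(0, binary_face.shape[1] - w + 1, w):
            sub = binary_face[y:y + h, x:x + w]
            if cascade_predict(sub, classifiers, norm_weights) == 1:
                regions.append((x, y, w, h))
    return regions
```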
Optionally, the step S2 of detecting the facial pose in the binarized human face image, if an inclined face is detected, performing adaptive rotation processing on the picture in the digital photo frame, and displaying the picture with a size smaller than the size of the display screen of the digital photo frame in a full screen manner, includes:
detecting the facial pose in the binarized face image, wherein the facial pose detection process comprises the following steps:
s21: selecting all pixel points of the eye region image to obtain a pixel point set of the eye region image:
{(x_j, y_j) | j ∈ [1, N]}

wherein:
(x_j, y_j) denotes the coordinates of the j-th pixel point in the human eye region image, and N denotes the total number of pixel points in the human eye region image;
S22: construct the camera shooting model (the pinhole imaging model):

    s · [x', y', 1]^T = K · [R | T] · [X, Y, Z, 1]^T,   K = [[f_X, 0, c_X], [0, f_Y, c_Y], [0, 0, 1]]

wherein:
s denotes the projective scale factor;
(f_X, f_Y) denotes the focal length of the camera, with f_X the focal length in the horizontal direction and f_Y the focal length in the vertical direction; (c_X, c_Y) denotes the coordinates of the optical center (principal point) in the horizontal and vertical directions respectively;
(x', y') denotes the pixel coordinates of the photographed object in the image, and (X, Y, Z) denotes the coordinates of the photographed object in the world coordinate system;
R denotes the rotation matrix and T denotes the translation matrix; the rotation matrix and the translation matrix in the model are the variables to be solved;
S23: substitute the collected pixel point set of the human eye region image and the corresponding coordinates in the world coordinate system into the camera shooting model and solve to obtain the rotation matrix r' of the human eye region image, which decomposes into rotations about the coordinate axes as:

    r' = R_z(γ) · R_y(β) · R_x(α)

wherein:
α denotes the pitch angle of the face pose, reflecting the face looking up or down;
β denotes the yaw angle of the face pose, reflecting the face turning left or right;
γ denotes the roll angle of the face pose, reflecting the sideways tilt of the face;
S24: solve for the roll angle γ of the face pose from the elements of the rotation matrix:

    γ = arctan( r'_{21} / r'_{11} )

where r'_{21} and r'_{11} denote the row-2, column-1 and row-1, column-1 elements of r' respectively;
if γ > 15°, the user's face is tilted to the right, and the digital photo frame automatically rotates the displayed picture counterclockwise by γ degrees;
if γ < −15°, the user's face is tilted to the left, and the digital photo frame automatically rotates the displayed picture clockwise by |γ| degrees;
the digital photo frame detects the size format of the electronic picture to be displayed in real time, and displays pictures smaller than the display screen of the digital photo frame in full screen. The full-screen display process is as follows: enlarge the electronic picture to be displayed to the size of the display screen of the digital photo frame, and fill the missing pixels of the enlarged electronic picture using the nearest-point (nearest-neighbor) interpolation algorithm.
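The pose-driven rotation and the full-screen fill can be sketched with OpenCV as below. cv2.solvePnP stands in for solving the shooting model for [R | T], and γ = arctan(r'_21 / r'_11) is the standard roll extraction assumed above; the helper names and the 3D eye landmark coordinates are illustrative:

```python
import math
import cv2
import numpy as np

def estimate_roll(eye_pts_2d, eye_pts_3d, fx, fy, cx, cy) -> float:
    """Solve the shooting model for [R | T], then extract the roll angle."""
    K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float64)
    ok, rvec, _ = cv2.solvePnP(eye_pts_3d, eye_pts_2d, K, None)
    R, _ = cv2.Rodrigues(rvec)                          # rotation vector -> matrix
    return math.degrees(math.atan2(R[1, 0], R[0, 0]))   # roll angle gamma

def adapt_display(picture, gamma_deg, screen_w, screen_h):
    """Rotate the picture against the face tilt, then show small pictures
    full screen using nearest-neighbor interpolation."""
    if abs(gamma_deg) > 15:
        h, w = picture.shape[:2]
        # a positive angle rotates counter-clockwise (gamma > 15);
        # a negative gamma (< -15) rotates clockwise by |gamma|
        M = cv2.getRotationMatrix2D((w / 2, h / 2), gamma_deg, 1.0)
        picture = cv2.warpAffine(picture, M, (w, h))
    if picture.shape[1] < screen_w or picture.shape[0] < screen_h:
        picture = cv2.resize(picture, (screen_w, screen_h),
                             interpolation=cv2.INTER_NEAREST)  # nearest-point fill
    return picture
```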
Optionally, the detecting, in the step S3, the position of the pupil in the extracted image of the eye region by using a pupil detection algorithm includes:
When the roll angle of the user's face is within [−15°, 15°], indicating that the image shot by the camera is a face image without facial offset, the position of the pupil in the extracted human eye region image is detected with a pupil detection algorithm. The detection flow of the pupil position is as follows:
S31: calculate the gradient g_j of any pixel point (x_j, y_j) in the human eye region image:

    g_j = ( ∂g'(x_j, y_j)/∂x , ∂g'(x_j, y_j)/∂y )

wherein:
g'(x_j, y_j) denotes the pixel value of pixel point (x_j, y_j) in the preprocessed human eye region image;
S32: construct the objective function for solving the pupil position center, and take the pixel point coordinate (x_L, y_L) that maximizes it as the pupil center:

    (x_L, y_L) = argmax { (1/|Ω|) · Σ_{j∈Ω} (h_{jL}^T · g_j)² }

    h_{jL} = (x_j − x_L, y_j − y_L)

wherein:
h_{jL} is the displacement vector from the candidate center to pixel j of the pupil to be detected;
(x_L, y_L) is the position coordinate of the center of the eye pupil;
T denotes transposition, and Ω denotes the set of pixel points in the human eye region image;
take (x_L, y_L) as the circle center and set the eye pupil radius to ξ; the circular area with center (x_L, y_L) and radius ξ is taken as the eye pupil position area.
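The objective in S32 matches the classic gradient-dot-product family of eye-center detectors, so a brute-force Python sketch looks as follows; unit-normalizing the displacement and gradient vectors and keeping only positive dot products are assumptions borrowed from that family:

```python
import numpy as np

def pupil_center(eye_gray: np.ndarray):
    """Return (x_L, y_L) maximizing (1/|Omega|) * sum((h^T g)^2)."""
    gy, gx = np.gradient(eye_gray.astype(np.float64))      # S31: pixel gradients
    mag = np.hypot(gx, gy)
    mask = mag > np.percentile(mag, 90)                    # keep strong gradients only
    ys, xs = np.nonzero(mask)
    gxs, gys = gx[mask] / mag[mask], gy[mask] / mag[mask]  # unit gradient vectors

    best, center = -1.0, (0, 0)
    H, W = eye_gray.shape
    for yl in range(H):                                    # brute-force candidate centers
        for xl in range(W):
            hx, hy = xs - xl, ys - yl                      # displacement h_jL
            norm = np.hypot(hx, hy) + 1e-9
            dots = (hx * gxs + hy * gys) / norm            # h^T g with h normalized
            score = float(np.mean(np.maximum(dots, 0.0) ** 2))
            if score > best:
                best, center = score, (xl, yl)
    return center                                          # pupil center (x_L, y_L)
```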
Optionally, the determining, in step S4, a region watched by the human eye by using an electronic picture region of interest detection algorithm based on the pupil position includes:
determining a region watched by human eyes by using an electronic picture region-of-interest detection algorithm based on pupil positions, wherein the human eye watching region determination process of the electronic picture region-of-interest detection algorithm comprises the following steps:
S41: map the eye pupil position area onto the original face image; the eye pupil position area in the original face image is the original eye pupil area. The infrared light emitted by the infrared light sources at the four vertexes of the digital photo frame forms four light spots in the original eye pupil area, whose position coordinates are, respectively, the coordinates of the two upper light spots (upper-left and upper-right) and the coordinates of the two lower light spots (lower-left and lower-right) in the original eye pupil area;
S42: calculate the cross-ratio values V_12 and V_23 of the upper and lower light spot coordinate pairs together with the pupil center, and from them obtain the center coordinate (x*, y*) of the human eye gazing area in the display screen, wherein:
W denotes the length of the digital photo frame display screen, and H denotes the width of the digital photo frame display screen;
S43: in the electronic picture displayed on the display screen, construct a rectangle centered at (x*, y*) whose length and width are preset fractions of W and H, and take the constructed rectangular area as the human eye gazing area.
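Since the exact cross-ratio formulas are only given in the patent figures, the sketch below uses a simple normalized-position stand-in for V_12 and V_23 and an assumed quarter-screen rectangle; all names and the fraction are illustrative:

```python
def gaze_center(spots: dict, pupil, screen_w, screen_h):
    """Map the four glints and the pupil center to a screen point (x*, y*)."""
    xul, yul = spots['ul']   # upper-left glint
    xur, _ = spots['ur']     # upper-right glint
    _, ydl = spots['dl']     # lower-left glint ('dr' unused in this sketch)
    px, py = pupil

    # V12: horizontal position of the pupil between the two upper glints
    v12 = (px - xul) / max(xur - xul, 1e-9)
    # V23: vertical position of the pupil between the upper and lower glints
    v23 = (py - yul) / max(ydl - yul, 1e-9)
    return v12 * screen_w, v23 * screen_h    # center of the gazing area

def gaze_rect(cx, cy, screen_w, screen_h, frac=0.25):
    """Rectangle centered at (cx, cy); side lengths are a preset fraction
    of the W x H screen (frac = 0.25 is an assumed placeholder)."""
    w, h = screen_w * frac, screen_h * frac
    return (cx - w / 2, cy - h / 2, w, h)
```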
Optionally, in the step S4, when the time that the user gazes at the area exceeds a preset threshold, the process of enlarging the picture of the area includes:
If the human eye gazing areas in the face images shot at adjacent moments are the same, the time the user has gazed at the area exceeds the preset threshold Δt, and the digital photo frame enlarges the human eye gazing area.
Optionally, in the step S5, the digital photo frame selects a picture similar to the scene of the picture enlargement area to push, including:
uploading the amplified picture area to a cloud end by the digital picture frame, extracting SIFT characteristics of the picture area by the cloud end, and taking the SIFT characteristics as a characteristic vector representing the picture area;
Extract the feature vectors of the different electronic pictures in the cloud respectively, calculate the similarity between the feature vector of each cloud electronic picture and the feature vector of the enlarged picture area, and select the cloud electronic picture with the highest similarity for pushing; the similarity calculation method is cosine similarity.
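A hedged sketch of the push step with OpenCV's SIFT: the patent only says the SIFT features serve as the region's feature vector, so mean-pooling the 128-d descriptors into a single vector (to make cosine similarity applicable) is an assumed design choice:

```python
import cv2
import numpy as np

def picture_signature(img_bgr: np.ndarray) -> np.ndarray:
    """Represent a picture (region) by a pooled 128-d SIFT vector."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    _, desc = cv2.SIFT_create().detectAndCompute(gray, None)
    if desc is None:                      # no keypoints found
        return np.zeros(128)
    return desc.mean(axis=0)              # assumed pooling of SIFT descriptors

def push_most_similar(zoomed_region, cloud_pictures):
    """Return the cloud picture with the highest cosine similarity to the
    enlarged picture region."""
    q = picture_signature(zoomed_region)
    q /= (np.linalg.norm(q) + 1e-9)
    best, best_sim = None, -1.0
    for pic in cloud_pictures:
        v = picture_signature(pic)
        sim = float(np.dot(q, v / (np.linalg.norm(v) + 1e-9)))  # cosine similarity
        if sim > best_sim:
            best, best_sim = pic, sim
    return best
```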
In order to solve the above problems, the present invention provides a sensory linkage situational digital photo frame interactive system, which is characterized in that the system comprises:
the image acquisition module is used for shooting to obtain a face image and preprocessing the face image to obtain a preprocessed face image;
the detection device is used for detecting the facial pose in the binarized face image, detecting the position of the pupil in the human eye region image based on a pupil detection algorithm, and determining the area gazed at by the human eye using an electronic picture region-of-interest detection algorithm based on the pupil position;
and the picture interaction device is used for rotating and amplifying the pictures in the digital photo frame, detecting the size of the pictures and the size of a display screen of the digital photo frame, displaying the pictures with the size smaller than the size of the display screen of the digital photo frame in a full screen manner, and automatically selecting the pictures similar to the amplified area scenes of the pictures for pushing.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
and the processor executes the instructions stored in the memory to realize the sensory linkage situational digital photo frame interaction method.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, where at least one instruction is stored in the computer-readable storage medium, and the at least one instruction is executed by a processor in an electronic device to implement the sensory linkage contextual digital photo frame interaction method described above.
Compared with the prior art, the sensory linkage situational digital photo frame interaction method provided by the invention has the following advantages:
First, this scheme proposes a picture rotation method based on facial pose interaction. The facial pose detection process is as follows: select all pixel points of the human eye region image to obtain the pixel point set {(x_j, y_j) | j ∈ [1, N]}, where (x_j, y_j) denotes the coordinates of the j-th pixel point in the human eye region image and N denotes the total number of pixel points in the human eye region image. Construct the camera shooting model (the pinhole imaging model):

    s · [x', y', 1]^T = K · [R | T] · [X, Y, Z, 1]^T,   K = [[f_X, 0, c_X], [0, f_Y, c_Y], [0, 0, 1]]

wherein: s denotes the projective scale factor; (f_X, f_Y) denotes the focal length of the camera, with f_X the focal length in the horizontal direction and f_Y the focal length in the vertical direction; (c_X, c_Y) denotes the coordinates of the optical center (principal point); (x', y') denotes the pixel coordinates of the photographed object in the image, and (X, Y, Z) denotes the coordinates of the photographed object in the world coordinate system; R denotes the rotation matrix and T the translation matrix, the variables to be solved. Substitute the collected pixel point set of the human eye region image and the corresponding coordinates in the world coordinate system into the camera shooting model to obtain the rotation matrix r' of the human eye region image, which decomposes as r' = R_z(γ)·R_y(β)·R_x(α), wherein: α denotes the pitch angle of the face pose, reflecting the face looking up or down; β denotes the yaw angle, reflecting the face turning left or right; γ denotes the roll angle, reflecting the sideways tilt of the face. Solve for the roll angle of the face pose: γ = arctan(r'_{21} / r'_{11}). If γ > 15°, the user's face is tilted to the right, and the digital photo frame automatically rotates the displayed picture counterclockwise by γ degrees; if γ < −15°, the user's face is tilted to the left, and the digital photo frame automatically rotates the displayed picture clockwise by |γ| degrees. This scheme determines the shooting model by combining the face region image, the camera and the world coordinate system, and determines the sideways head-tilt angle of the face pose from the human eye region image. A large tilt angle indicates that the user's face is inclined and that the user is looking at the picture in the digital photo frame at an angle, so the picture is rotated adaptively, with the rotation angle equal to the user's tilt angle, realizing adaptive rotation of the digital photo frame picture based on facial pose perception.
Meanwhile, this scheme proposes a method for enlarging the picture region at the viewer's focus of interest. When the roll angle of the user's face is within [−15°, 15°], indicating that the image shot by the camera is a face image without facial offset, the position of the pupil in the extracted human eye region image is detected with a pupil detection algorithm. The detection process is as follows: calculate the gradient g_j of any pixel point (x_j, y_j) in the human eye region image, g_j = (∂g'(x_j, y_j)/∂x, ∂g'(x_j, y_j)/∂y), where g'(x_j, y_j) denotes the pixel value of pixel point (x_j, y_j) in the preprocessed human eye region image. Construct the objective function for solving the pupil position center, and take as the pupil center the pixel point coordinate (x_L, y_L) that maximizes

    (1/|Ω|) · Σ_{j∈Ω} (h_{jL}^T · g_j)² ,   h_{jL} = (x_j − x_L, y_j − y_L)

where h_{jL} is the displacement vector of the pupil to be detected, (x_L, y_L) is the position coordinate of the center of the eye pupil, T denotes transposition, and Ω denotes the set of pixel points in the human eye region image. Take (x_L, y_L) as the circle center and set the eye pupil radius to ξ; the circular area with center (x_L, y_L) and radius ξ is taken as the eye pupil position area. Then determine the area gazed at by the human eye using the electronic picture region-of-interest detection algorithm based on the pupil position: map the eye pupil position area onto the original face image, where it forms the original eye pupil area; the infrared light emitted by the infrared light sources at the four vertexes of the digital photo frame forms four light spots in the original eye pupil area, with upper-left, upper-right, lower-left and lower-right spot coordinates respectively. Calculate the cross-ratio values V_12 and V_23 from the light spot coordinates and the pupil center, and from them obtain the center coordinate (x*, y*) of the human eye gazing area in the display screen, where W denotes the length of the digital photo frame display screen and H denotes its width. In the electronic picture displayed on the display screen, construct a rectangle centered at (x*, y*) whose length and width are preset fractions of W and H, and take the constructed rectangular area as the human eye gazing area. If the human eye gazing areas in the face images shot at adjacent moments are the same, the time the user has gazed at the area exceeds the preset threshold Δt, and the digital photo frame enlarges the human eye gazing area. This scheme detects the position of the pupil in the human eye region image with a pupil detection algorithm, establishes the correspondence between the light-spot cross-ratio values and the center point of the human eye gazing area based on the position coordinates of the light spots in the pupil, and obtains the gazing area, thereby enlarging the picture region at the viewer's focus of interest based on eye pupil perception.
Drawings
Fig. 1 is a schematic flow chart of a sensory linkage contextual digital photo frame interaction method according to an embodiment of the present invention;
fig. 2 is a functional block diagram of a sensory linkage situational digital photo frame interactive system according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device for implementing a sensory linkage contextual digital photo frame interaction method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a sensory linkage situational digital photo frame interaction method. The execution body of the sensory linkage situational digital photo frame interaction method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided by the embodiment of the present application. In other words, the sensory linkage situational digital photo frame interaction method may be executed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Example 1:
s1: the digital photo frame obtains a face image by shooting with a camera, and preprocesses the face image to obtain a preprocessed face image, wherein the preprocessing method comprises binarization processing and extraction of an image of a human eye region.
The step S1 is that the digital photo frame is shot by a camera to obtain a face image, and the method comprises the following steps:
the digital photo frame comprises a display screen, a camera and a wireless communication module, wherein the display screen is used for displaying electronic pictures, the camera is used for shooting and capturing human eye images, and the wireless communication module is used for acquiring electronic picture data from a cloud and uploading the captured human eye images to a computer terminal; the four vertex positions of the digital photo frame are provided with infrared light sources which can emit infrared light;
the digital photo frame can automatically rotate a picture, amplify the human eye watching area and select a picture similar to the human eye watching area scene for pushing;
The user can control whether the camera is started. When the user chooses to start the camera, the infrared light sources at the four vertexes of the digital photo frame emit infrared light, and the digital photo frame shoots face images with the camera. The shot face images form a time series, and the face image set is {I_t | t ∈ [t_0, t_e]}, where I_t is the face image shot at any moment t, t_0 denotes the initial moment of camera shooting, and t_e denotes the cut-off moment of camera shooting.
The step S1 of performing binarization processing on the face image to obtain a binarized face image includes:
For any face image I_t, the binarization processing flow comprises the following steps:
S11: perform graying processing on all pixel points of the face image I_t to obtain the gray value of each pixel point, and take the gray value of each pixel point as its pixel value to obtain the grayed face image, wherein the graying formula is:

    I_{t,g}(x, y) = 0.299·R_t(x, y) + 0.587·G_t(x, y) + 0.114·B_t(x, y)

wherein:
I_{t,g}(x, y) denotes the gray value of the pixel point I_t(x, y) in row x, column y of the face image I_t, i.e. the pixel value of pixel point I_t(x, y);
R_t(x, y), G_t(x, y), B_t(x, y) denote the values of pixel point I_t(x, y) on the R, G, B color channels respectively;
S12: perform gray stretching processing on the grayed face image I'_t, wherein the formula of the gray stretching processing is:

    g_t(x, y) = 255 · (I'_t(x, y) − I'_{t,min}) / (I'_{t,max} − I'_{t,min})

wherein:
g_t(x, y) denotes the pixel value of pixel point I_t(x, y) after gray stretching;
I'_{t,min} denotes the minimum pixel value of the grayed face image I'_t, and I'_{t,max} denotes the maximum pixel value of the grayed face image I'_t;
S13: initialize the binarization threshold T;
S14: divide the pixels of the gray-stretched face image I'_t into foreground pixels and background pixels, wherein pixels with pixel values smaller than the threshold T are divided into foreground pixels and pixels with pixel values greater than or equal to the threshold T are divided into background pixels;
calculate the average pixel values of the foreground pixels and background pixels respectively, wherein the average pixel value of the foreground pixels is m_1 and the average pixel value of the background pixels is m_2;
S15: update the threshold:

    T ← (m_1 + m_2) / 2

S16: repeat steps S14-S15 until the updated threshold is the same as the previous threshold, giving the final binarization threshold T*; set the pixel values in the gray-stretched face image I'_t that are lower than the binarization threshold T* to 0 and those that are not lower than T* to 255, obtaining the binarized face image.
In the step S1, extracting a human eye region image from the binarized human face image to obtain a human eye region image, including:
extracting a human eye region image from the face image subjected to binarization processing to obtain a human eye region image, wherein the human eye region image extraction process comprises the following steps:
Construct a human eye region image extraction model formed by cascading n human eye region detection classification models. The input of each human eye region detection classification model is an image region, and its output is the detection classification result of that image region in {−1, +1}: an output of −1 means the input image region is not a human eye image region, and an output of +1 means it is. The training process of a human eye region detection classification model is as follows:
collect a number of human eye region image samples and non-human-eye region image samples to train the model; the model extracts sample features and classifies them, and the parameters of the model are optimized with the goal of minimizing the mean square error of the sample classification;
according to the human eye region detection classification models after parameter optimization, calculate the weight w_i of any i-th human eye region detection classification model from its classification errors, where:
err_i denotes the number of misclassified samples of the i-th human eye region detection classification model after parameter optimization, all denotes the total number of samples, and i ∈ [1, n]; the fewer the misclassified samples, the larger the weight;
then normalize the weight w_i, wherein the normalization formula is:

    ŵ_i = (w_i − w_min) / (w_max − w_min)

wherein:
w_min denotes the minimum weight among the n human eye region detection classification models, and w_max denotes the maximum weight among the n human eye region detection classification models;
ŵ_i denotes the normalized weight of the i-th human eye region detection classification model;
perform the above parameter optimization and weight calculation on all n human eye region detection classification models to obtain the n models and the corresponding normalized weight set {ŵ_1, ŵ_2, …, ŵ_n}, where f_i(·) denotes the i-th human eye region detection classification model;
the human eye region detection classification models are cascade-combined according to the following formula:

    H(I) = sign( Σ_{i=1}^{n} ŵ_i · f_i(I) )

wherein:
H(·) denotes the cascade-combined human eye region image extraction model;
I denotes the input image data;
divide the binarized face image into a plurality of sub-images, each the size of a normal human eye region; input each sub-image into the human eye region image extraction model, and if the model outputs +1, the sub-image is a human eye region image (a binary image).
S2: detecting the facial posture in the binaryzation human face image, if an inclined face is detected, rotating the picture in the digital photo frame, detecting the size of the picture and the size of a display screen of the digital photo frame, and displaying the picture with the size smaller than the size of the display screen of the digital photo frame in a full screen mode.
In step S2, the facial pose in the binarized face image is detected; if a tilted face is detected, the picture in the digital photo frame is adaptively rotated, and pictures smaller than the display screen of the digital photo frame are displayed in full screen. This includes:
detecting the facial pose in the binarized face image, wherein the facial pose detection process comprises the following steps:
s21: selecting all pixel points of the eye region image to obtain a pixel point set of the eye region image:
{(x_j, y_j) | j ∈ [1, N]}

wherein:
(x_j, y_j) denotes the coordinates of the j-th pixel point in the human eye region image, and N denotes the total number of pixel points in the human eye region image;
S22: construct the camera shooting model (the pinhole imaging model):

    s · [x', y', 1]^T = K · [R | T] · [X, Y, Z, 1]^T,   K = [[f_X, 0, c_X], [0, f_Y, c_Y], [0, 0, 1]]

wherein:
s denotes the projective scale factor;
(f_X, f_Y) denotes the focal length of the camera, with f_X the focal length in the horizontal direction and f_Y the focal length in the vertical direction; (c_X, c_Y) denotes the coordinates of the optical center (principal point) in the horizontal and vertical directions respectively;
(x', y') denotes the pixel coordinates of the photographed object in the image, and (X, Y, Z) denotes the coordinates of the photographed object in the world coordinate system;
R denotes the rotation matrix and T denotes the translation matrix; the rotation matrix and the translation matrix in the model are the variables to be solved;
S23: substitute the collected pixel point set of the human eye region image and the corresponding coordinates in the world coordinate system into the camera shooting model and solve to obtain the rotation matrix r' of the human eye region image, which decomposes into rotations about the coordinate axes as:

    r' = R_z(γ) · R_y(β) · R_x(α)

wherein:
α denotes the pitch angle of the face pose, reflecting the face looking up or down;
β denotes the yaw angle of the face pose, reflecting the face turning left or right;
γ denotes the roll angle of the face pose, reflecting the sideways tilt of the face;
S24: solve for the roll angle γ of the face pose from the elements of the rotation matrix:

    γ = arctan( r'_{21} / r'_{11} )

where r'_{21} and r'_{11} denote the row-2, column-1 and row-1, column-1 elements of r' respectively;
if γ > 15°, the user's face is tilted to the right, and the digital photo frame automatically rotates the displayed picture counterclockwise by γ degrees;
if γ < −15°, the user's face is tilted to the left, and the digital photo frame automatically rotates the displayed picture clockwise by |γ| degrees;
the digital photo frame detects the size format of the electronic picture to be displayed in real time, and displays pictures smaller than the display screen of the digital photo frame in full screen. The full-screen display process is as follows: enlarge the electronic picture to be displayed to the size of the display screen of the digital photo frame, and fill the missing pixels of the enlarged electronic picture using the nearest-point (nearest-neighbor) interpolation algorithm.
S3: the position of the pupil in the image of the eye region is detected based on a pupil detection algorithm.
In the step S3, detecting the position of the pupil in the extracted image of the eye region by using a pupil detection algorithm, including:
When the roll angle of the user's face is within [−15°, 15°], indicating that the image shot by the camera is a face image without facial offset, the position of the pupil in the extracted human eye region image is detected with a pupil detection algorithm. The detection flow of the pupil position is as follows:
S31: calculate the gradient g_j of any pixel point (x_j, y_j) in the human eye region image:

    g_j = ( ∂g'(x_j, y_j)/∂x , ∂g'(x_j, y_j)/∂y )

wherein:
g'(x_j, y_j) denotes the pixel value of pixel point (x_j, y_j) in the preprocessed human eye region image;
S32: construct the objective function for solving the pupil position center, and take the pixel point coordinate (x_L, y_L) that maximizes it as the pupil center:

    (x_L, y_L) = argmax { (1/|Ω|) · Σ_{j∈Ω} (h_{jL}^T · g_j)² }

    h_{jL} = (x_j − x_L, y_j − y_L)

wherein:
h_{jL} is the displacement vector from the candidate center to pixel j of the pupil to be detected;
(x_L, y_L) is the position coordinate of the center of the eye pupil;
T denotes transposition, and Ω denotes the set of pixel points in the human eye region image;
take (x_L, y_L) as the circle center and set the eye pupil radius to ξ; the circular area with center (x_L, y_L) and radius ξ is taken as the eye pupil position area.
S4: determining a region watched by human eyes by using an electronic picture interest region detection algorithm based on pupil positions, and amplifying the picture in the region when the time of watching the region by a user exceeds a preset threshold value.
In the step S4, determining a region gazed by the human eye by using an electronic picture region of interest detection algorithm based on the pupil position, including:
determining a region watched by human eyes by using an electronic picture region-of-interest detection algorithm based on pupil positions, wherein the human eye watching region determination process of the electronic picture region-of-interest detection algorithm comprises the following steps:
S41: map the eye pupil position area onto the original face image; the eye pupil position area in the original face image is the original eye pupil area. The infrared light emitted by the infrared light sources at the four vertexes of the digital photo frame forms four light spots in the original eye pupil area, whose position coordinates are, respectively, the coordinates of the two upper light spots (upper-left and upper-right) and the coordinates of the two lower light spots (lower-left and lower-right) in the original eye pupil area;
S42: calculate the cross-ratio values V_12 and V_23 of the upper and lower light spot coordinate pairs together with the pupil center, and from them obtain the center coordinate (x*, y*) of the human eye gazing area in the display screen, wherein:
W denotes the length of the digital photo frame display screen, and H denotes the width of the digital photo frame display screen;
S43: in the electronic picture displayed on the display screen, construct a rectangle centered at (x*, y*) whose length and width are preset fractions of W and H, and take the constructed rectangular area as the human eye gazing area.
In the step S4, when the time that the user gazes at the area exceeds a preset threshold, the process of enlarging the picture in the area includes:
If the human eye gazing areas in the face images shot at adjacent moments are the same, the time the user has gazed at the area exceeds the preset threshold Δt, and the digital photo frame enlarges the human eye gazing area.
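A small sketch of this dwell check on the frame sequence (frames are Δt apart, so one unchanged adjacent pair already represents Δt of gazing); the names are illustrative:

```python
def should_zoom(region_history, delta_t: float, dwell_threshold: float) -> bool:
    """True once the gazing area stays unchanged across consecutive frames
    for at least dwell_threshold seconds."""
    dwell = 0.0
    for prev, cur in zip(region_history, region_history[1:]):
        dwell = dwell + delta_t if cur == prev else 0.0  # reset when the gaze moves
        if dwell >= dwell_threshold:
            return True
    return False
```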
S5: the digital photo frame automatically selects and pushes the pictures similar to the scenes of the picture amplification areas.
In the step S5, the digital photo frame selects a picture similar to the scene of the picture amplification area to be pushed, and the method includes:
uploading the amplified picture area to a cloud end by the digital picture frame, extracting SIFT characteristics of the picture area by the cloud end, and taking the SIFT characteristics as a characteristic vector representing the picture area;
Extract the feature vectors of the different electronic pictures in the cloud respectively, calculate the similarity between the feature vector of each cloud electronic picture and the feature vector of the enlarged picture area, and select the cloud electronic picture with the highest similarity for pushing; the similarity calculation method is cosine similarity.
Example 2:
as shown in fig. 2, a functional block diagram of a sensory linkage contextual digital photo frame interaction system according to an embodiment of the present invention is provided, which can implement the sensory linkage contextual digital photo frame interaction method according to embodiment 1.
The sensory linkage situational digital photo frame interactive system 100 of the present invention may be installed in an electronic device. According to the implemented functions, the sensory linkage situational digital photo frame interaction system may comprise an image acquisition module 101, a detection device 102 and a picture interaction device 103. A module of the present invention, which may also be referred to as a unit, is a series of computer program segments that can be executed by a processor of an electronic device to perform a fixed function, and that are stored in the memory of the electronic device.
The image acquisition module 101 is used for shooting to obtain a face image, and preprocessing the face image to obtain a preprocessed face image;
the detection device 102 is used for detecting the facial pose in the binarized human face image, detecting the position of a pupil in the human eye area image based on a pupil detection algorithm, and determining the area watched by the human eye by using an electronic picture interest area detection algorithm based on the pupil position;
the picture interaction device 103 is used for rotating and amplifying the pictures in the digital picture frame, detecting the size of the pictures and the size of a display screen of the digital picture frame, displaying the pictures with the size smaller than the size of the display screen of the digital picture frame in a full screen mode, and automatically selecting the pictures similar to the scenes of the amplified pictures to push.
In detail, when the modules in the sensory linkage contextual digital photo frame interaction system 100 according to the embodiment of the present invention are used, the same technical means as the sensory linkage contextual digital photo frame interaction method described in fig. 1 above is adopted, and the same technical effects can be produced, which is not described herein again.
Example 3:
fig. 3 is a schematic structural diagram of an electronic device for implementing a sensory linkage contextual digital photo frame interaction method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of the program 12, but also to temporarily store data that has been output or is to be output.
The processor 10 may in some embodiments be composed of an integrated circuit, for example a single packaged integrated circuit, or of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device: it connects the various components of the whole electronic device by using various interfaces and lines, and executes the various functions of the electronic device 1 and processes its data by running or executing the programs or modules stored in the memory 11 (e.g., the program 12 for performing digital photo frame interaction) and calling the data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable communication between the memory 11, the at least one processor 10, and other components.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power diverters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a communication interface 13, and optionally, the communication interface 13 may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, enable:
the digital photo frame obtains a face image by shooting with a camera, and preprocesses the face image to obtain a preprocessed face image;
detecting the facial posture in the binarized face image, if an inclined face is detected, rotating the picture in the digital photo frame, detecting the size of the picture and the size of a display screen of the digital photo frame, and displaying the picture with the size smaller than the size of the display screen of the digital photo frame in a full screen manner;
detecting the position of a pupil in the human eye area image based on a pupil detection algorithm;
determining a region watched by human eyes by using an electronic picture interest region detection algorithm based on pupil positions, and amplifying the picture in the region when the time of watching the region by a user exceeds a preset threshold;
the digital photo frame automatically selects and pushes the pictures similar to the pictures in the enlarged area scene.
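For illustration only, the following minimal Python sketch mirrors the five stored-program steps above as a single processing loop; every function name here is a hypothetical placeholder, not an API defined by the patent.

```python
import time

def run_frame_loop(capture, preprocess, detect_pose, detect_pupil,
                   locate_gaze, enlarge_and_push, dwell_s=1.0):
    """Illustrative pass over the five program steps, once per captured frame."""
    last_region = None
    dwell_start = time.time()
    while True:
        face = preprocess(capture())       # step 1: shoot and preprocess face image
        detect_pose(face)                  # step 2: rotate picture / full-screen display
        pupil = detect_pupil(face)         # step 3: pupil position detection
        region = locate_gaze(pupil)        # step 4: region the eyes are fixed on
        if region != last_region:          # gaze moved, so restart the dwell timer
            last_region, dwell_start = region, time.time()
        elif time.time() - dwell_start > dwell_s:
            enlarge_and_push(region)       # steps 4-5: enlarge, then push similar pictures
            dwell_start = time.time()
```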
Specifically, for the implementation method of the above instructions by the processor 10, reference may be made to the description of the relevant steps in the embodiments corresponding to fig. 1 to fig. 3, which is not repeated herein.
It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that comprises that element.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A sensory linkage situational digital photo frame interaction method is characterized by comprising the following steps:
s1: the digital photo frame obtains a face image by shooting with a camera, and preprocesses the face image to obtain a preprocessed face image, wherein the preprocessing method comprises binarization processing and extraction of an image of a human eye region;
S2: detecting the facial pose in the binarized face image; if a tilted face is detected, rotating the picture in the digital photo frame, detecting the picture size and the digital photo frame display screen size, and displaying pictures whose size is smaller than the display screen size of the digital photo frame in full screen, wherein the facial pose detection and picture rotation processing flow comprises:
detecting the facial pose in the binarized face image, wherein the facial pose detection process comprises the following steps:
S21: selecting all pixel points of the human eye region image to obtain the pixel point set of the human eye region image:

$\{(x_j, y_j) \mid j \in [1, N]\}$

wherein:

$(x_j, y_j)$ denotes the coordinates of the $j$-th pixel point in the human eye region image, and $N$ denotes the total number of pixel points in the human eye region image;
S22: constructing a camera shooting model:

$Z \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} f_X & 0 & c_X \\ 0 & f_Y & c_Y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r & t \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$

wherein:

$(f_X, f_Y)$ denotes the focal length of the camera, $f_X$ denoting the focal length in the horizontal direction and $f_Y$ the focal length in the vertical direction; $(c_X, c_Y)$ denotes the coordinates of the camera's optical center in the horizontal and vertical directions;

$(x', y')$ denotes the pixel coordinates of the photographed object in the image, and $(X, Y, Z)$ denotes the coordinates of the photographed object in the world coordinate system;

$r = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}$ denotes the rotation matrix, and $t = (t_1, t_2, t_3)^T$ denotes the translation matrix; the rotation matrix and the translation matrix in the model are the variables to be solved;
S23: substituting the collected pixel point set of the human eye region image and the corresponding coordinates in the world coordinate system into the camera shooting model to obtain the rotation matrix $r'$ of the human eye region image, which decomposes into the Euler angles of the facial pose:

$r' = R_z(\gamma)\, R_y(\beta)\, R_x(\alpha)$

wherein:

$\alpha$ denotes the pitch angle of the facial pose, reflecting the face looking up or down;

$\beta$ denotes the yaw angle of the facial pose, reflecting the left and right turning of the face;

$\gamma$ denotes the roll angle of the facial pose, reflecting the sideways tilt of the face;
S24: solving to obtain the roll angle $\gamma$ of the facial pose:

$\gamma = \arctan\!\left( \dfrac{r'_{21}}{r'_{11}} \right)$

if $\gamma > 15°$, the face of the user tilts rightwards, and the digital photo frame automatically rotates the displayed picture anticlockwise by $\gamma$ degrees;

if $\gamma < -15°$, the face of the user tilts leftwards, and the digital photo frame automatically rotates the displayed picture clockwise by $|\gamma|$ degrees;
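A hedged sketch of steps S22–S24 using OpenCV: cv2.solvePnP solves the pinhole model for the rotation, the roll angle is read off the rotation matrix as reconstructed above, and the picture is counter-rotated. The point correspondences and camera intrinsics are assumed to be given.

```python
import cv2
import numpy as np

def roll_angle(object_pts, image_pts, fx, fy, cx, cy):
    """Solve the camera model for the rotation and return the roll angle in degrees."""
    K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float64)
    ok, rvec, _ = cv2.solvePnP(object_pts, image_pts, K, None)
    R, _ = cv2.Rodrigues(rvec)                       # rotation vector -> 3x3 matrix r'
    return np.degrees(np.arctan2(R[1, 0], R[0, 0]))  # gamma = arctan(r'_21 / r'_11)

def rotate_displayed_picture(picture, gamma_deg):
    """Rotate the picture by gamma when the tilt exceeds the 15-degree band."""
    if abs(gamma_deg) <= 15:
        return picture
    h, w = picture.shape[:2]
    # in OpenCV a positive angle rotates anticlockwise, matching the claimed behaviour
    M = cv2.getRotationMatrix2D((w / 2, h / 2), gamma_deg, 1.0)
    return cv2.warpAffine(picture, M, (w, h))
```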
the digital photo frame detects the size format of the electronic picture to be displayed in real time, and displays pictures whose size is smaller than the display screen size of the digital photo frame in full screen, wherein the full-screen display process comprises the following steps: enlarging the electronic picture to be displayed to the size of the display screen of the digital photo frame, and filling the missing pixels in the enlarged electronic picture by using a nearest-point interpolation algorithm;
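The full-screen step maps directly onto nearest-neighbour resizing; a small sketch, assuming the picture is a NumPy image array:

```python
import cv2

def fullscreen_display(picture, screen_w, screen_h):
    """Enlarge a picture smaller than the display screen to the screen size,
    filling the new pixels by nearest-point interpolation."""
    h, w = picture.shape[:2]
    if w < screen_w and h < screen_h:
        return cv2.resize(picture, (screen_w, screen_h),
                          interpolation=cv2.INTER_NEAREST)
    return picture
```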
s3: detecting the position of a pupil in the human eye region image based on a pupil detection algorithm;
s4: determining a region watched by human eyes by using an electronic picture interest region detection algorithm based on pupil positions, and amplifying the picture in the region when the time of watching the region by a user exceeds a preset threshold;
s5: the digital photo frame automatically selects and pushes the pictures similar to the pictures in the enlarged area scene.
2. The sensory linkage situational digital photo frame interaction method of claim 1, wherein in step S1 the digital photo frame obtains a face image by shooting with a camera, comprising:
the digital photo frame comprises a display screen, a camera and a wireless communication module, wherein the display screen is used for displaying electronic pictures, the camera is used for shooting and capturing human eye images, and the wireless communication module is used for acquiring electronic picture data from a cloud and uploading the captured human eye images to a computer terminal; the four vertex positions of the digital photo frame are provided with infrared light sources which can emit infrared light;
the user can control whether the camera is started; when the user chooses to start the camera, the infrared light sources located at the four vertexes of the digital photo frame emit infrared light, and the digital photo frame shoots a face image with the camera; the shot face images form a time-series image set $\{I_t \mid t \in [t_0, t_e]\}$, wherein $I_t$ is the face image shot at any time $t$, $t_0$ denotes the initial moment of camera shooting, $t_e$ denotes the end moment of camera shooting, and the time interval between adjacent moments is $\Delta t$.
3. The sensory linkage situational digital photo frame interaction method of claim 2, wherein the binarizing processing is performed on the face image in the step S1 to obtain a binarized face image, comprising:
the binarization processing flow of the face image shot at any time t is as follows:
S11: performing graying processing on all pixel points of the face image $I_t$ to obtain the gray value of each pixel point, and taking the gray value of each pixel point as its pixel value to obtain the grayed face image, wherein the graying processing formula is:

$I_{t,g}(x, y) = 0.299\, R_t(x, y) + 0.587\, G_t(x, y) + 0.114\, B_t(x, y)$

wherein:

$I_{t,g}(x, y)$ denotes the gray value of the pixel point $I_t(x, y)$ in row $x$, column $y$ of the face image $I_t$, i.e. the pixel value of the pixel point $I_t(x, y)$;

$R_t(x, y), G_t(x, y), B_t(x, y)$ denote the values of the pixel point $I_t(x, y)$ on the R, G, B color channels, respectively;
S12: performing gray stretching processing on the grayed face image, wherein the gray stretching formula is:

$g_t(x, y) = \dfrac{255 \left( I'_t(x, y) - I'_{t,min} \right)}{I'_{t,max} - I'_{t,min}}$

wherein:

$g_t(x, y)$ denotes the pixel value of the pixel point $I_t(x, y)$ after gray stretching;

$I'_{t,min}$ denotes the minimum pixel value of the grayed face image $I'_t$, and $I'_{t,max}$ denotes the maximum pixel value of the grayed face image $I'_t$;
S13: initializing the binarization threshold $T_0 = \dfrac{I'_{t,min} + I'_{t,max}}{2}$;
S14: dividing the pixels of the gray-stretched face image $I_t$ into foreground pixels and background pixels, wherein pixels with pixel values smaller than the threshold are divided into foreground pixels, and pixels with pixel values greater than or equal to the threshold are divided into background pixels;

respectively calculating the average pixel value of the foreground pixels and the average pixel value of the background pixels, the average pixel value of the foreground pixels being $m_1$ and the average pixel value of the background pixels being $m_2$;
S15: updating the threshold: $T = \dfrac{m_1 + m_2}{2}$;
S16: repeating the steps S14–S15 until the updated threshold is the same as the previous threshold, obtaining the final binarization threshold $T^*$; setting the pixel values of the gray-stretched face image $I_t$ that are lower than the binarization threshold $T^*$ to 0 and the pixel values that are higher than $T^*$ to 255, obtaining the binarized face image.
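A minimal NumPy sketch of S11–S16, under the assumptions made in the reconstruction above (standard luminance weights for graying, midpoint initialization for the iterative threshold):

```python
import numpy as np

def binarize_face(img_rgb):
    """Gray (S11), stretch (S12), iteratively threshold (S13-S15), binarize (S16)."""
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    stretched = 255 * (gray - gray.min()) / (gray.max() - gray.min() + 1e-12)
    t = (stretched.min() + stretched.max()) / 2    # assumed midpoint initialization
    while True:
        fg = stretched[stretched < t]              # foreground: below the threshold
        bg = stretched[stretched >= t]             # background: at or above it
        if fg.size == 0 or bg.size == 0:
            break
        t_new = (fg.mean() + bg.mean()) / 2        # S15 threshold update
        if abs(t_new - t) < 1e-3:                  # S16 convergence test
            break
        t = t_new
    return np.where(stretched < t, 0, 255).astype(np.uint8)
```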
4. The sensory linkage situational digital photo frame interaction method of claim 3, wherein the extracting of the eye region image from the binarized face image in step S1 to obtain the eye region image comprises:
extracting a human eye region image from the face image subjected to binarization processing to obtain a human eye region image, wherein the human eye region image extraction process comprises the following steps:
constructing an image extraction model of the human eye region, wherein the human eye region image extraction model is formed by cascading n human eye region detection classification models; the input of a human eye region detection classification model is an image region, and its output is the detection classification result {-1, +1} of the image region: when the output result is -1, the input image region is not a human eye image region, and when the output result is +1, the input image region is a human eye image region; the training process of the human eye region detection classification model is as follows:
a plurality of human eye region image samples and non-human eye region image samples are collected to train the human eye region detection classification model: the model extracts sample features, detects and classifies the sample features, and its parameters are optimized with the minimum mean square error of the sample classification as the target;
calculating, according to the parameter-optimized human eye region detection classification models, the weight $w_i$ of any $i$-th human eye region detection classification model:

$w_i = \dfrac{1}{2} \ln\!\left( \dfrac{1 - err_i / all}{err_i / all} \right)$

wherein:

$err_i$ denotes the number of misclassified samples of the $i$-th parameter-optimized human eye region detection classification model, $all$ denotes the total number of samples, and $i \in [1, n]$;
and normalizing the weight $w_i$, the normalization formula being:

$\tilde{w}_i = \dfrac{w_i - w_{min}}{w_{max} - w_{min}}$

wherein:

$w_{min}$ denotes the minimum weight of the n human eye region detection classification models, and $w_{max}$ denotes the maximum weight of the n human eye region detection classification models;

$\tilde{w}_i$ denotes the normalized weight of the $i$-th human eye region detection classification model;
performing the above parameter optimization and weight calculation on the n human eye region detection classification models to obtain the n models and the corresponding normalized weight set $\{ (f_i(\cdot), \tilde{w}_i) \mid i \in [1, n] \}$, wherein $f_i(\cdot)$ denotes the $i$-th human eye region detection classification model;
performing the cascade combination of the human eye region detection classification models according to the following formula:

$H(I) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \tilde{w}_i\, f_i(I) \right)$

wherein:

$H(I)$ denotes the human eye region image extraction model after cascade combination;

$I$ denotes the input image data;
dividing the binarized face image into a plurality of sub-images, wherein the size of each sub-image is the size of a normal eye region; and inputting the divided sub-images into a human eye region image extraction model, and if the model output is +1, indicating that the sub-images are the human eye region images.
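A sketch of the cascade combination as reconstructed above: each weak classifier votes -1/+1, the votes are combined with the normalized weights, and an eye-sized window is slid over the binarized face. The window stride and the classifier internals are assumptions.

```python
def cascade_detector(models, weights):
    """Combine n weak classifiers f_i (each returning -1 or +1) into H(I)."""
    def H(region):
        score = sum(w * f(region) for f, w in zip(models, weights))
        return 1 if score >= 0 else -1   # +1: eye region, -1: not an eye region
    return H

def find_eye_regions(binary_face, eye_h, eye_w, detector):
    """Slide an eye-sized window over the binarized face and collect hits."""
    h, w = binary_face.shape
    hits = []
    for y in range(0, h - eye_h + 1, eye_h):
        for x in range(0, w - eye_w + 1, eye_w):
            if detector(binary_face[y:y + eye_h, x:x + eye_w]) == 1:
                hits.append((x, y, eye_w, eye_h))
    return hits
```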
5. The sensory linkage contextual digital photo frame interaction method of claim 1, wherein the detecting the position of the pupil in the extracted eye area image using a pupil detection algorithm in step S3 comprises:
when the roll angle of the face of the user is between $[-15°, 15°]$, indicating that the image shot by the camera is a face image without facial deviation, the pupil position in the extracted human eye region image is detected by using a pupil detection algorithm, the detection flow of the pupil position being as follows:
S31: calculating the gradient $g_j$ of any pixel point $(x_j, y_j)$ in the human eye region image:

$g_j = \left( \dfrac{\partial g'(x_j, y_j)}{\partial x},\; \dfrac{\partial g'(x_j, y_j)}{\partial y} \right)$

wherein:

$g'(x_j, y_j)$ denotes the pixel value of the pixel point $(x_j, y_j)$ in the preprocessed human eye region image;
S32: constructing the objective function for solving the pupil position center, and taking the pixel point coordinate $(x_L, y_L)$ at which the objective function reaches its maximum as the pupil center:

$(x_L, y_L) = \arg\max\limits_{(x_L, y_L)} \dfrac{1}{N} \sum\limits_{j \in \Omega} \left( h_{jL}^{T}\, g_j \right)^2$

$h_{jL} = (x_j - x_L,\; y_j - y_L)$

wherein:

$h_{jL}$ is the displacement vector from the pixel point to the pupil center to be detected;

$(x_L, y_L)$ is the position coordinate of the eye pupil center;

$T$ denotes transposition, and $\Omega$ denotes the pixel point set in the human eye region image;

taking $(x_L, y_L)$ as the circle center and the eye pupil radius as $\xi$, the circular area with center $(x_L, y_L)$ and radius $\xi$ is obtained as the eye pupil position area.
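An illustrative brute-force implementation of the reconstructed S31–S32 objective; a real implementation would restrict or vectorize the candidate grid, but the scoring follows the formula above.

```python
import numpy as np

def pupil_center(eye_gray):
    """Return (x_L, y_L) maximizing the mean squared <h_jL, g_j> over pixels."""
    gy, gx = np.gradient(eye_gray.astype(float))    # S31: per-pixel image gradient
    h, w = eye_gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    best, center = -1.0, (0, 0)
    for cy in range(0, h, 2):                       # coarse candidate-center grid
        for cx in range(0, w, 2):
            dots = (xs - cx) * gx + (ys - cy) * gy  # h_jL^T g_j at every pixel
            score = np.mean(dots ** 2)              # S32 objective value
            if score > best:
                best, center = score, (cx, cy)
    return center
```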
6. The sensory linkage situational digital photo frame interaction method of claim 5, wherein in the step S4, determining the region watched by the human eyes by using an electronic picture interest region detection algorithm based on pupil positions comprises:
determining a region watched by human eyes by using an electronic picture region-of-interest detection algorithm based on pupil positions, wherein the human eye watching region determination process of the electronic picture region-of-interest detection algorithm comprises the following steps:
S41: corresponding the eye pupil position area to the original face image, the eye pupil position area in the original face image being the original eye pupil area; the infrared light emitted by the infrared light sources at the four vertexes of the digital photo frame forms four light spots in the original eye pupil area, whose position coordinates are respectively $(x_1^{u}, y_1^{u}), (x_2^{u}, y_2^{u}), (x_1^{d}, y_1^{d}), (x_2^{d}, y_2^{d})$, wherein $(x_1^{u}, y_1^{u})$ and $(x_2^{u}, y_2^{u})$ denote the coordinates of the left and right light spots in the upper part of the original eye pupil area, and $(x_1^{d}, y_1^{d})$ and $(x_2^{d}, y_2^{d})$ denote the coordinates of the left and right light spots in the lower part of the original eye pupil area;
S42: respectively calculating the cross-ratio values $V_{12}$ and $V_{23}$ from the pupil center and the four light spot coordinates, and further obtaining the center coordinate $(x^*, y^*)$ of the region watched by the human eye on the display screen:

$x^* = W \cdot V_{12}, \quad y^* = H \cdot V_{23}$

wherein:

$W$ denotes the length of the digital photo frame display screen, and $H$ denotes the width of the digital photo frame display screen;
S43: in the electronic picture displayed on the display screen, constructing a rectangle with $(x^*, y^*)$ as the rectangle center and with a preset length and width (fixed fractions of $W$ and $H$), and taking the constructed rectangular area as the human eye gaze area.
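Since the exact cross-ratio expressions sit in equation images that did not survive extraction, the sketch below shows only the assumed final mapping of $V_{12}$, $V_{23}$ to screen coordinates and the gaze rectangle; the quarter-of-screen rectangle size is likewise an assumed value.

```python
def gaze_center(v12, v23, screen_w, screen_h):
    """Assumed linear mapping of the cross-ratio values to a screen point."""
    return screen_w * v12, screen_h * v23

def gaze_rectangle(cx, cy, screen_w, screen_h, frac=0.25):
    """Rectangle centered on the gaze point; `frac` is an assumed preset size."""
    rw, rh = screen_w * frac, screen_h * frac
    return (cx - rw / 2, cy - rh / 2, rw, rh)  # (left, top, width, height)
```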
7. The sensory linkage contextual digital photo frame interaction method of claim 6, wherein in step S4, when the time for the user to look at the area exceeds a preset threshold, the process of magnifying the picture of the area comprises:
if the human eye gaze areas in the face images shot at adjacent moments are the same, the time for which the user gazes at the area exceeds the preset threshold $\Delta t$, and the digital photo frame enlarges the picture in the human eye gaze area.
8. The sensory linkage contextual digital photo frame interaction method of claim 1, wherein the digital photo frame in step S5 selects a picture similar to the scene of the picture enlargement area to push, comprising:
uploading the enlarged picture area to the cloud by the digital photo frame, the cloud extracting the SIFT features of the picture area and taking them as the feature vector representing the picture area;

respectively extracting the feature vectors of the different electronic pictures in the cloud, calculating the similarity between the feature vector of each electronic picture in the cloud and the feature vector of the enlarged picture area, and selecting the electronic picture with the highest similarity in the cloud for pushing, wherein the similarity calculation method is the cosine similarity calculation method.
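A sketch of the push step with OpenCV's SIFT and cosine similarity; mean-pooling the SIFT descriptors into one vector is an assumption, as the claim only states that SIFT features represent the region.

```python
import cv2
import numpy as np

def region_feature(gray_img):
    """Aggregate the SIFT descriptors of a picture region into one vector."""
    _, desc = cv2.SIFT_create().detectAndCompute(gray_img, None)
    if desc is None:                       # no keypoints found in the region
        return np.zeros(128)
    return desc.mean(axis=0)               # assumed mean-pooling aggregation

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def pick_push_candidate(enlarged_region, cloud_pictures):
    """Return the cloud picture most similar to the enlarged picture region."""
    q = region_feature(enlarged_region)
    return max(cloud_pictures,
               key=lambda img: cosine_similarity(q, region_feature(img)))
```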
9. A sensory linkage situational digital photo frame interactive system, comprising:
the image acquisition module is used for shooting to obtain a face image and preprocessing the face image to obtain a preprocessed face image;
the detection device is used for detecting the facial pose in the binarized face image, detecting the position of the pupil in the human eye region image based on a pupil detection algorithm, and determining the region watched by the human eye by using an electronic picture interest region detection algorithm based on the pupil position;

the picture interaction device is used for rotating and enlarging the pictures in the digital photo frame, detecting the picture size and the digital photo frame display screen size, displaying pictures whose size is smaller than the display screen size in full screen, and automatically selecting and pushing pictures similar to the scene of the enlarged picture area, thereby implementing the sensory linkage situational digital photo frame interaction method according to any one of claims 1 to 8.
CN202211130909.XA 2022-09-16 2022-09-16 Sensory linkage situational digital photo frame interaction method and system Active CN115509351B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211130909.XA CN115509351B (en) 2022-09-16 2022-09-16 Sensory linkage situational digital photo frame interaction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211130909.XA CN115509351B (en) 2022-09-16 2022-09-16 Sensory linkage situational digital photo frame interaction method and system

Publications (2)

Publication Number Publication Date
CN115509351A CN115509351A (en) 2022-12-23
CN115509351B true CN115509351B (en) 2023-04-07

Family

ID=84503236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211130909.XA Active CN115509351B (en) 2022-09-16 2022-09-16 Sensory linkage situational digital photo frame interaction method and system

Country Status (1)

Country Link
CN (1) CN115509351B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936324A (en) * 2021-10-29 2022-01-14 Oppo广东移动通信有限公司 Gaze detection method, control method of electronic device and related device
CN114779925A (en) * 2022-03-22 2022-07-22 天津理工大学 Sight line interaction method and device based on single target

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02226372A (en) * 1989-02-27 1990-09-07 Fuji Xerox Co Ltd Inclination correcting device for picture
KR100374708B1 (en) * 2001-03-06 2003-03-04 에버미디어 주식회사 Non-contact type human iris recognition method by correction of rotated iris image
CN107506751B (en) * 2017-09-13 2019-10-08 重庆爱威视科技有限公司 Advertisement placement method based on eye movement control
CN111062328B (en) * 2019-12-18 2023-10-03 中新智擎科技有限公司 Image processing method and device and intelligent robot
CN114973384A (en) * 2022-07-12 2022-08-30 天津科技大学 Electronic face photo collection method based on key point and visual salient target detection

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936324A (en) * 2021-10-29 2022-01-14 Oppo广东移动通信有限公司 Gaze detection method, control method of electronic device and related device
CN114779925A (en) * 2022-03-22 2022-07-22 天津理工大学 Sight line interaction method and device based on single target

Also Published As

Publication number Publication date
CN115509351A (en) 2022-12-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant