CN115509351B

CN115509351B - Sensory linkage situational digital photo frame interaction method and system

Info

Publication number: CN115509351B
Application number: CN202211130909.XA
Authority: CN
Inventors: 李顺; 王晓帆
Original assignee: Shanghai Goodview Electronic Technology Co ltd
Current assignee: Shanghai Goodview Electronic Technology Co ltd
Priority date: 2022-09-16
Filing date: 2022-09-16
Publication date: 2023-04-07
Anticipated expiration: 2042-09-16
Also published as: CN115509351A

Abstract

The invention relates to the technical field of digital photo frame interaction, and discloses a sense organ linkage situational digital photo frame interaction method and system, wherein the method comprises the following steps: detecting the facial posture in the binarized face image, if an inclined face is detected, rotating the picture in the digital photo frame, detecting the size of the picture and the size of a display screen of the digital photo frame, and displaying the picture with the size smaller than the size of the display screen of the digital photo frame in a full screen manner; detecting the position of a pupil based on a pupil detection algorithm; determining a region watched by human eyes by using an electronic picture interest region detection algorithm based on pupil positions, and if a user watches for a long time, amplifying the watching region; the digital photo frame automatically selects and pushes the pictures similar to the scenes of the picture amplification areas. The method realizes the self-adaptive rotation processing of the digital photo frame picture based on the facial posture perception and the amplification processing of the viewer interest focus picture area based on the eye pupil perception.

Description

Sensory linkage situational digital photo frame interaction method and system

Technical Field

The invention relates to the technical field of digital photo frame interaction, in particular to a sense organ linkage situational digital photo frame interaction method and system.

Background

Digital photo frames have been introduced into many user homes as consumer terminals. The basic function of the digital photo frame is used for displaying and playing pictures. At present, picture playing and the like of the digital photo frame are realized through touch keys or mechanical keys and the like, and human-computer interaction operation experience is limited through manual accurate positioning operation. The existing digital photo frame only supports the automatic display of electronic photos, and lacks the capabilities of monitoring and identifying the interest focus of a viewer, intelligently adjusting the photo frame and optimizing the display. Therefore, the patent provides a sensory linkage contextual digital photo frame interaction method and system aiming at the problem.

Disclosure of Invention

In view of the above, the invention provides a sensory linkage contextual digital photo frame interaction method, which aims to (1) determine a shooting model by combining a face region image, a camera and a world coordinate system, determine a left-right head swing angle of a face posture by means of the eye region image, and when the left-right head swing angle of the face posture is larger, indicate that a user face is inclined and indicate that the user looks at a picture in a digital photo frame in an inclined manner, so that the picture is subjected to adaptive rotation processing, wherein a rotation angle is the inclination angle of the user, and the digital photo frame picture adaptive rotation processing based on face posture sensing is realized; (2) The method comprises the steps of detecting the position of a pupil in an eye region image based on a hole detection algorithm, establishing a corresponding relation between a light spot intersection point and the center point of the eye region based on the position coordinates of the light spot in the pupil, obtaining the eye region, and accordingly enlarging the viewer interest focus image region based on eye pupil perception.

The invention provides a sensory linkage situational digital photo frame interaction method, which comprises the following steps:

s1: the digital photo frame utilizes a camera to shoot to obtain a face image, and the face image is preprocessed to obtain a preprocessed face image, wherein the preprocessing method comprises binarization processing and human eye area image extraction;

s2: detecting the facial posture in the binarized face image, if an inclined face is detected, rotating the picture in the digital photo frame, detecting the size of the picture and the size of a display screen of the digital photo frame, and displaying the picture with the size smaller than the size of the display screen of the digital photo frame in a full screen manner;

s3: detecting the position of a pupil in the human eye region image based on a pupil detection algorithm;

s4: determining a region watched by human eyes by using an electronic picture interest region detection algorithm based on pupil positions, and amplifying the picture in the region when the time of watching the region by a user exceeds a preset threshold;

s5: the digital photo frame automatically selects and pushes the pictures similar to the pictures in the enlarged area scene.

As a further improvement of the method:

optionally, in the step S1, the digital photo frame is shot by a camera to obtain a face image, including:

the digital photo frame comprises a display screen, a camera and a wireless communication module, wherein the display screen is used for displaying an electronic picture, the camera is used for shooting and capturing human eye images, and the wireless communication module is used for acquiring electronic picture data from a cloud end and uploading the captured human eye images to a computer terminal; the four vertex positions of the digital photo frame are provided with infrared light sources which can emit infrared light;

in the specific embodiment of the invention, the computer terminal can send a control instruction to the digital photo frame, the computer terminal determines the facial posture and the eye watching area based on the received human face image and the electronic picture displayed by the digital photo frame, and controls the digital photo frame to amplify the eye watching area or rotate the picture in the digital photo frame based on the control instruction;

the user can control whether the camera is started or not by himself, when the user selects to start the camera, the infrared light sources positioned at the four vertexes of the digital photo frame emit infrared light, the digital photo frame utilizes the camera to shoot a face image, the shot face image is a time sequence image, and the face image set is { I } _t |t∈[t ₀ ,t _e ]In which I _t For face images taken at any time t, t ₀ Indicating the initial moment, t, of the camera shot _e The time interval between adjacent moments is delta t.

Optionally, the binarizing processing is performed on the face image in the step S1 to obtain a binarized face image, and the binarizing processing includes:

the arbitrary face image I _t The binarization processing flow is as follows:

s11: for face image I _t Performing graying processing on all pixel points to obtain the gray values of the pixel points, and taking the gray values of the pixel points as pixel values to obtain a human face image subjected to graying processing, wherein the graying processing formula is as follows:

wherein:

I _t,g (x, y) denotes a face image I _t Middle x row y column pixel point I _t Grey scale value of (x, y), i.e. pixel point I _t A pixel value of (x, y);

R _t (x,y),G _t (x,y),B _t (x, y) respectively represent pixel points I _t (x, y) values on R, G, B color channels;

s12: carrying out gray stretching processing on the face image after the gray processing, wherein the formula of the gray stretching processing is as follows:

wherein:

g _t (x, y) representation gray level stretched pixel point I _t A pixel value of (x, y);

I′ _t,min (x, y) represents a face image I 'after gradation processing' _t Of minimum pixel value, l' _t,max (x, y) represents a face image I 'after gradation processing' _t The maximum pixel value of (a);

s13: initializing threshold values

S14: stretching the gray scale of the human face image I _t The pixels of (1) are divided into foreground pixels and background pixels, wherein the pixels with pixel values smaller than a threshold value are divided into foreground pixels, and the pixels with pixel values larger than or equal to the threshold value are divided into background pixels;

respectively calculating the average pixel values of a foreground pixel and a background pixel, wherein the average pixel value of the foreground pixel is m ₁ ，

The average pixel value of the background pixel is m ₂ ；

S15: updating a threshold value

S16: repeating the steps S14-S15 until the updated threshold is the same as the original threshold to obtain the final binary threshold

Stretching the gray scale of the human face image I _t Medium low and highIn the binarization threshold->

Is set to 0 above a binarization threshold value->

The pixel value of (2) is set to 255, and a face image after binarization processing is obtained.

Optionally, the extracting an eye region image from the binarized face image in the step S1 to obtain an eye region image includes:

extracting a human eye region image from the face image subjected to binarization processing to obtain a human eye region image, wherein the human eye region image extraction process comprises the following steps:

constructing an image extraction model of a human eye region, wherein the image extraction model of the human eye region is formed by cascading n human eye region detection classification models, the input of the human eye region detection classification model is an image region, the output of the human eye region detection classification model is the detection classification result { -1, +1} of the image region, when the output result of the human eye region detection classification model is-1, the input image region is not the human eye image region, when the output result of the human eye region detection classification model is +1, the input image region is the human eye image region, and the training process of the human eye region detection classification model is as follows:

the method comprises the steps of collecting a plurality of human eye region image samples and non-human eye region image samples to train a human eye region detection classification model, extracting sample characteristics by the human eye region detection classification model, detecting and classifying the sample characteristics, and optimizing parameters of the model by taking minimum mean square error of the sample classification as a target;

according to the human eye region detection classification model after parameter optimization, calculating the weight w of any ith human eye region detection classification model _i ：

Wherein:

err _i representing the number of classification error samples of the ith personal eye region detection classification model after parameter optimization, all representing the total number of samples, i is equal to [1, n ]]；

And to the weight w _i Carrying out normalization processing, wherein the formula of the normalization processing is as follows:

wherein:

w _min minimum weight, w, representing n human eye region detection classification models _max Maximum weight representing n human eye region detection classification models;

normalized weights representing an ith personal eye region detection classification model;

performing the parameter optimization and weight calculation on the n-numbered personal eye region detection classification models to obtain n-numbered personal eye region detection classification models and corresponding normalized weight sets

Wherein f is _i () represents the ith personal eye region detection classification model;

the cascade combination of the human eye region detection classification models is carried out according to the following formula:

wherein:

h (x) represents the human eye region image extraction model after cascade combination;

i represents input image data;

dividing the binarized face image into a plurality of sub-images, wherein the size of each sub-image is the size of a normal eye region; and inputting the divided sub-images into a human eye region image extraction model, and if the model output is +1, indicating that the sub-images are the human eye region images which are binary images.

Optionally, the step S2 of detecting the facial pose in the binarized human face image, if an inclined face is detected, performing adaptive rotation processing on the picture in the digital photo frame, and displaying the picture with a size smaller than the size of the display screen of the digital photo frame in a full screen manner, includes:

detecting the facial pose in the binarized face image, wherein the facial pose detection process comprises the following steps:

s21: selecting all pixel points of the eye region image to obtain a pixel point set of the eye region image:

{(x _j ,y _j )|j∈[1,N]}

wherein:

(x _j ,y _j ) Representing the coordinates of the jth pixel point in the eye region image, and N representing the total number of the pixel points in the eye region image;

s22: constructing a camera shooting model:

wherein:

(f _X ,f _Y ) Denotes the focal length of the camera, f _X Representing the focal length of the camera in the horizontal direction, f _Y Represents the focal length of the camera in the vertical direction, (c) _X ,c _Y ) The focal points respectively correspond to the horizontal direction and the vertical direction;

(X ', Y') represents the pixel coordinates of the photographed object in the image, and (X, Y, Z) represents the coordinates of the photographed object in the world coordinate system;

represents a rotation matrix, <' > or>

Representing a translation matrix, wherein a rotation matrix and the translation matrix in the model are variables to be solved;

s23: substituting the collected pixel point set of the human eye area image and the corresponding coordinate under the world coordinate system into a camera shooting model to obtain a rotation matrix r' of the human eye area image:

wherein:

α represents a pitch angle of the face pose, which reflects the face looking up or down;

beta represents the yaw angle of the face pose, which reflects the left and right turn of the face;

gamma represents the roll angle of the face pose, reflecting the yaw of the face;

s24: solving to obtain the roll angle gamma of the facial posture:

if gamma is larger than 15 degrees, the face of the user inclines rightwards, and the digital photo frame automatically rotates the displayed picture anticlockwise by gamma degrees;

if gamma < -15 degrees, the face of the user inclines to the left, and the digital photo frame automatically rotates the displayed picture clockwise by gamma degrees;

the digital photo frame detects the size format of the electronic picture to be displayed in real time, and if the picture with the size smaller than the display screen size of the digital photo frame is displayed in a full screen mode, the full screen display process comprises the following steps: and amplifying the electronic picture to be displayed to the size of the display screen of the digital photo frame, and filling the missing pixels in the amplified electronic picture by utilizing a nearest point interpolation algorithm.

Optionally, the detecting, in the step S3, the position of the pupil in the extracted image of the eye region by using a pupil detection algorithm includes:

when the roll angle of the face of the user is between [ -15 °,15 ° ] showing that the image shot by the camera is a face image without facial deviation, the pupil position in the extracted eye region image is detected by using a pupil detection algorithm, and the detection flow of the pupil position is as follows:

s31: calculating any pixel point (x) in the human eye region image _j ,y _j ) Gradient g of _j ：

Wherein:

g′(x _j ,y _j ) Representing pixel points (x) in the pre-processed eye region image _j ,y _j ) A pixel value of (a);

s32: constructing an objective function solved by the pupil position center to obtain

The maximum pixel point coordinate (x) _L ,y _L ) As the pupil center:

h _jL ＝(x _j -x _L ,y _j -y _L )

wherein:

h _jL is the displacement vector of the pupil to be detected;

(x _L ,y _L ) Position coordinates of the center of the pupil of the eye;

t represents transposition, and omega represents a pixel point set in the human eye area image;

will (x) _L ,y _L ) The radius of the pupil of the eye is set to xi as the center of the circle, and is obtained as (x) _L ,y _L ) And a circular area with the circle center and xi as the radius is used as the eye pupil position area.

Optionally, the determining, in step S4, a region watched by the human eye by using an electronic picture region of interest detection algorithm based on the pupil position includes:

determining a region watched by human eyes by using an electronic picture region-of-interest detection algorithm based on pupil positions, wherein the human eye watching region determination process of the electronic picture region-of-interest detection algorithm comprises the following steps:

s41: the eye pupil position area is corresponded to an original face image, the eye pupil position area in the original face image is the original eye pupil area, infrared light emitted by infrared light sources at four vertexes of the digital photo frame forms four light spots in the original eye pupil position area, and the position coordinates of the four light spots are respectively

Wherein->

Represents the coordinates of two light spots on the upper part in the pupil area of the original eye, and is used for judging whether the pupil area of the eye is normal or normal>

Represents the left spot coordinate, and>

the coordinates of the right spot are indicated,

represents the coordinates of two light spots in the lower part of the pupil area of the original eye part>

Represents the left spot coordinate, and>

representing the right spot coordinates;

s42: respectively calculate

And->

Cross ratio value V of ₁₂ ,V ₂₃ And then obtaining the central coordinate (x) of the human eye gazing area in the display screen ^* ,y ^* )：

Wherein:

w represents the length of the digital photo frame display screen, and H represents the width of the digital photo frame display screen;

s43: in the electronic picture displayed on the display screen, the picture is displayed by (x) ^* ,y ^* ) Is a rectangular center, and is constructed to be long

Broad is->

The constructed rectangular area is used as a human eye fixation area.

Optionally, in the step S4, when the time that the user gazes at the area exceeds a preset threshold, the process of enlarging the picture of the area includes:

if the human eye watching areas in the face images shot at adjacent moments are the same, the time that the user watches the area exceeds a preset threshold value delta t, and the digital photo frame amplifies the human eye watching areas.

Optionally, in the step S5, the digital photo frame selects a picture similar to the scene of the picture enlargement area to push, including:

uploading the amplified picture area to a cloud end by the digital picture frame, extracting SIFT characteristics of the picture area by the cloud end, and taking the SIFT characteristics as a characteristic vector representing the picture area;

respectively extracting the characteristic vectors of different electronic pictures in the cloud, calculating the similarity between the characteristic vectors of the electronic pictures in the cloud and the characteristic vector of the amplified picture, selecting the electronic picture with the highest similarity in the cloud for pushing, wherein the similarity calculation method is a cosine similarity calculation method.

In order to solve the above problems, the present invention provides a sensory linkage situational digital photo frame interactive system, which is characterized in that the system comprises:

the image acquisition module is used for shooting to obtain a face image and preprocessing the face image to obtain a preprocessed face image;

the detection device is used for detecting the facial posture in the binaryzation human face image, detecting the position of a pupil in the human eye area image based on a pupil detection algorithm, and determining the area watched by the human eye by using an electronic picture interest area detection algorithm based on the pupil position;

and the picture interaction device is used for rotating and amplifying the pictures in the digital photo frame, detecting the size of the pictures and the size of a display screen of the digital photo frame, displaying the pictures with the size smaller than the size of the display screen of the digital photo frame in a full screen manner, and automatically selecting the pictures similar to the amplified area scenes of the pictures for pushing.

In order to solve the above problem, the present invention also provides an electronic device, including:

a memory storing at least one instruction; and

and the processor executes the instructions stored in the memory to realize the sensory linkage situational digital photo frame interaction method.

In order to solve the above problem, the present invention further provides a computer-readable storage medium, where at least one instruction is stored in the computer-readable storage medium, and the at least one instruction is executed by a processor in an electronic device to implement the sensory linkage contextual digital photo frame interaction method described above.

Compared with the prior art, the invention provides a sensory linkage situational digital photo frame interaction method, which has the following advantages:

firstly, the scheme provides a picture rotation method based on facial gesture interaction, and the facial gesture detection process comprises the following steps of: selecting all pixel points of the eye region image to obtain a pixel point set of the eye region image:

{(x _j ,y _j )|j∈[1,N]}

wherein: (x) _j ,y _j ) Representing the coordinates of the jth pixel point in the eye region image, and N representing the total number of the pixel points in the eye region image; constructing a camera shooting model:

wherein: (f) _X ,f _Y ) Denotes the focal length of the camera, f _X Representing the focal length of the camera in the horizontal direction, f _Y Represents the focal length of the camera in the vertical direction, (c) _X ,c _Y ) The focal points respectively correspond to the horizontal direction and the vertical direction; (X ', Y') represents the pixel coordinates of the photographed object in the image, and (X, Y, Z) represents the coordinates of the photographed object in the world coordinate system;

represents a rotation matrix, <' > or>

Representing a translation matrix, wherein a rotation matrix and the translation matrix in the model are variables to be solved; substituting the collected pixel point set of the human eye area image and the corresponding coordinate under the world coordinate system into a camera shooting model to obtain a rotation matrix r' of the human eye area image:

wherein: α represents the pitch angle of the face pose, which reflects the face looking up or down; beta represents the yaw angle of the face pose, which reflects the left and right turn of the face; gamma represents the roll angle of the face pose, reflecting the yaw of the face; s24: solving to obtain the roll angle gamma of the facial posture:

if gamma is larger than 15 degrees, the face of the user inclines rightwards, and the digital photo frame automatically rotates the displayed picture anticlockwise by gamma degrees; if gamma < -15 degrees, the face of the user is inclined to the left, and the digital photo frame automatically rotates the displayed picture clockwise by gamma degrees. According to the scheme, the shooting model is determined by combining the face region image, the camera and the world coordinate system, the left and right head swinging angles of the face posture are determined by means of the eye region image, when the left and right head swinging angles of the face posture are large, the fact that the face of a user inclines indicates that the user looks at the picture in the digital photo frame when the face of the user inclines is indicated, therefore, the picture is subjected to self-adaptive rotation processing, the rotation angle is the inclination angle of the user, and the digital photo frame picture self-adaptive rotation processing based on face posture sensing is achieved.

Meanwhile, the scheme provides a method for amplifying the interested focus picture area of the viewer, when the roll angle of the face of the user is [ -15 degrees, 15 degrees °]In the meantime, the image shot by the camera is a face image without facial offset, and the position of the pupil in the extracted eye area image is detected by using a pupil detection algorithm, wherein the detection process of the pupil position comprises the following steps: calculating any pixel point (x) in the human eye region image _j ,y _j ) Gradient g of _j ：

Wherein: g' (x) _j ,y _j ) Representing pixel points (x) in the preprocessed eye region image _j ,y _j ) A pixel value of (a); constructing an objective function solved by the pupil position center to obtain

The maximum pixel point coordinate (x) _L ,y _L ) As the pupil center:

h _jL ＝(x _j -x _L ,y _j -y _L )

wherein: h is _jL Is the displacement vector of the pupil to be detected; (x) _L ,y _L ) Position coordinates of the center of the pupil of the eye; t represents transposition, and omega represents a pixel point set in the human eye area image; will (x) _L ,y _L ) As the center of circle, the radius of the pupil of the eye is set to be xi and is obtained as (x) _L ,y _L ) And a circular area with the circle center and the radius xi is used as an eye pupil position area. Determining a region watched by human eyes by using an electronic picture region-of-interest detection algorithm based on pupil positions, wherein the human eye region-of-interest determination process of the electronic picture region-of-interest detection algorithm comprises the following steps: the eye pupil position area is corresponded to an original face image, the eye pupil position area in the original face image is the original eye pupil area, infrared light emitted by infrared light sources at four vertexes of the digital photo frame forms four light spots in the original eye pupil position area, and the position coordinates of the four light spots are respectively

Wherein->

Represents the left spot coordinate, and>

the coordinates of the right spot are indicated,

representing the coordinates of two light spots lower in the pupil area of the original eye part>

Represents the left spot coordinate, and>

representing the coordinates of the right light spot; respectively count>

And->

Cross ratio value V of ₁₂ ,V ₂₃ And further obtaining the central coordinate (x) of the human eye watching region in the display screen ^* ,y ^* )：

Wherein: w represents the length of the digital photo frame display screen, and H represents the width of the digital photo frame display screen; in the electronic picture displayed on the display screen, the picture is displayed by (x) ^* ,y ^* ) Is a rectangular center, and is constructed to be long

Broad is->

The constructed rectangular area is used as a human eye fixation area. If the human eye watching areas in the face images shot at adjacent moments are the same, the time that the user watches the areas exceeds a preset threshold value delta t, and the digital photo frame amplifies the human eye watching areas. According to the scheme, the position of the pupil in the image of the eye region is detected based on a hole detection algorithm, the corresponding relation between the intersection point of the facula and the central point of the eye watching region is established based on the position coordinate of the facula in the pupil, the eye watching region is obtained, and therefore the region of the viewer interest focus image is amplified based on the perception of the eye pupil.

Drawings

Fig. 1 is a schematic flow chart of a sensory linkage contextual digital photo frame interaction method according to an embodiment of the present invention;

fig. 2 is a functional block diagram of a sensory linkage situational digital photo frame interactive system according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of an electronic device for implementing a sensory linkage contextual digital photo frame interaction method according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The embodiment of the application provides a sense organ linkage situational digital photo frame interaction method. The execution main body of the sensory linkage contextual digital photo frame interaction method includes, but is not limited to, at least one of electronic devices such as a server and a terminal, which can be configured to execute the method provided by the embodiment of the present application. In other words, the sensory linkage contextual digital photo frame interaction method may be executed by software or hardware installed in a terminal device or a server device, and the software may be a block chain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.

Example 1:

s1: the digital photo frame obtains a face image by shooting with a camera, and preprocesses the face image to obtain a preprocessed face image, wherein the preprocessing method comprises binarization processing and extraction of an image of a human eye region.

The step S1 is that the digital photo frame is shot by a camera to obtain a face image, and the method comprises the following steps:

the digital photo frame comprises a display screen, a camera and a wireless communication module, wherein the display screen is used for displaying electronic pictures, the camera is used for shooting and capturing human eye images, and the wireless communication module is used for acquiring electronic picture data from a cloud and uploading the captured human eye images to a computer terminal; the four vertex positions of the digital photo frame are provided with infrared light sources which can emit infrared light;

the digital photo frame can automatically rotate a picture, amplify the human eye watching area and select a picture similar to the human eye watching area scene for pushing;

the user can control whether the camera is started or not by himself, when the user selects to start the camera, the infrared light sources positioned at the four vertexes of the digital photo frame emit infrared light, the digital photo frame utilizes the camera to shoot a face image, the shot face image is a time sequence image, and the face image set is { I } _t |t∈[t ₀ ,t _e ]In which I _t For face images taken at any time t, t ₀ Indicating the initial moment, t, of the camera shot _e And represents the cut-off time of the camera shooting.

The step S1 of performing binarization processing on the face image to obtain a binarized face image includes:

the arbitrary face image I _t The binarization processing flow comprises the following steps:

wherein:

wherein:

g _t (x, y) represents pixel point I after gray stretching _t A pixel value of (x, y);

I′ _t,min (x, y) represents a face image I 'after gradation processing' _t Of minimum pixel value, l' _t,max (x, y) represents a face image I 'after gradation processing' _t A maximum pixel value of;

s13: initializing threshold values

S14: stretching the gray scale of the human face image I _t The pixels of (2) are divided into foreground pixels and background pixels, wherein the pixels with the pixel values smaller than a threshold value are divided into the foreground pixels, and the pixels with the pixel values larger than or equal to the threshold value are divided into the background pixels;

respectively calculating the average pixel values of a foreground pixel and a background pixel, wherein the average pixel value of the foreground pixel is m ₁ The average pixel value of the background pixel is m ₂ ；

S15: updating a threshold value

Stretching gray level of human face image I' _t Is lower than the binarization threshold value->

Is set to 0 above the binarization threshold->

In the step S1, extracting a human eye region image from the binarized human face image to obtain a human eye region image, including:

constructing an image extraction model of a human eye region, wherein the image extraction model of the human eye region is formed by cascading n human eye region detection classification models, the input of the human eye region detection classification model is an image region, the output is a detection classification result { -1, +1} of the image region, when the output result of the human eye region detection classification model is-1, the input image region is not an image region of a human eye, when the output result of the human eye region detection classification model is +1, the input image region is an image region of the human eye, and the training process of the human eye region detection classification model is as follows:

Wherein:

/>

wherein:

w _min minimum weight, w, representing n eye region detection classification models _max Maximum weight representing n human eye region detection classification models;

wherein:

i represents input image data;

dividing the binarized face image into a plurality of sub-images, wherein the size of each sub-image is the size of a normal eye region; inputting the divided sub-images into a human eye region image extraction model, and if the model output is +1, indicating that the sub-images are the human eye region images which are binary images.

S2: detecting the facial posture in the binaryzation human face image, if an inclined face is detected, rotating the picture in the digital photo frame, detecting the size of the picture and the size of a display screen of the digital photo frame, and displaying the picture with the size smaller than the size of the display screen of the digital photo frame in a full screen mode.

The facial gesture in the face image of binarization is detected in S2 step, if the face that detects the slope, then carry out the self-adaptation rotation to the picture in the digital photo frame to picture that the size is less than digital photo frame display screen size carries out full screen display, includes:

{(x _j ,y _j )|j∈[1,N]}

wherein:

s22: constructing a camera shooting model:

wherein:

(f _X ,f _Y ) Indicating the focal length of the camera, f _X Representing the focal length of the camera in the horizontal direction, f _Y Shows the focal length of the camera in the vertical direction, (c) _X ,c _Y ) The focal points respectively correspond to the horizontal direction and the vertical direction;

represents a rotation matrix, <' > or>

wherein:

beta represents the yaw angle of the facial pose, which reflects the left and right turns of the face;

s24: solving to obtain the roll angle gamma of the facial posture:

the digital photo frame detects the size format of the electronic picture to be displayed in real time, and if the picture with the size smaller than the display screen size of the digital photo frame is displayed in a full screen mode, the full screen display process comprises the following steps: and amplifying the electronic picture to be displayed to the size of the display screen of the digital photo frame, and filling missing pixels in the amplified electronic picture by utilizing a nearest point interpolation algorithm.

S3: the position of the pupil in the image of the eye region is detected based on a pupil detection algorithm.

In the step S3, detecting the position of the pupil in the extracted image of the eye region by using a pupil detection algorithm, including:

Wherein:

Reaching the maximum pixel point coordinate (x) _L ,y _L ) As the pupil center:

h _jL ＝(x _j -x _L ,y _j -y _L )

wherein:

h _jL is the displacement vector of the pupil to be detected;

(x _L ,y _L ) Position coordinates of the center of the pupil of the eye;

will (x) _L ,y _L ) The radius of the pupil of the eye is set to xi as the center of the circle, and is obtained as (x) _L ,y _L ) And a circular area with the circle center and the radius xi is used as an eye pupil position area.

S4: determining a region watched by human eyes by using an electronic picture interest region detection algorithm based on pupil positions, and amplifying the picture in the region when the time of watching the region by a user exceeds a preset threshold value.

In the step S4, determining a region gazed by the human eye by using an electronic picture region of interest detection algorithm based on the pupil position, including:

Wherein->

Represents the coordinates of a left spot, and>

the coordinates of the right spot are indicated,

Represents the left spot coordinate, and>

representing the right spot coordinates;

s42: respectively calculate

And->

Wherein:

Broad is->

The constructed rectangular area is used as a human eye watching area.

In the step S4, when the time that the user gazes at the area exceeds a preset threshold, the process of enlarging the picture in the area includes:

if the human eye watching areas in the face images shot at adjacent moments are the same, the time that the user watches the areas exceeds a preset threshold value delta t, and the digital photo frame amplifies the human eye watching areas.

S5: the digital photo frame automatically selects and pushes the pictures similar to the scenes of the picture amplification areas.

In the step S5, the digital photo frame selects a picture similar to the scene of the picture amplification area to be pushed, and the method includes:

Example 2:

as shown in fig. 2, a functional block diagram of a sensory linkage contextual digital photo frame interaction system according to an embodiment of the present invention is provided, which can implement the sensory linkage contextual digital photo frame interaction method according to embodiment 1.

The sensory linkage situational digital photo frame interactive system 100 of the present invention may be installed in an electronic device. According to the realized functions, the sensory linkage contextual digital photo frame interaction system can comprise an image acquisition module 101, a detection device 102 and a picture interaction device 103. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.

The image acquisition module 101 is used for shooting to obtain a face image, and preprocessing the face image to obtain a preprocessed face image;

the detection device 102 is used for detecting the facial pose in the binarized human face image, detecting the position of a pupil in the human eye area image based on a pupil detection algorithm, and determining the area watched by the human eye by using an electronic picture interest area detection algorithm based on the pupil position;

the picture interaction device 103 is used for rotating and amplifying the pictures in the digital picture frame, detecting the size of the pictures and the size of a display screen of the digital picture frame, displaying the pictures with the size smaller than the size of the display screen of the digital picture frame in a full screen mode, and automatically selecting the pictures similar to the scenes of the amplified pictures to push.

In detail, when the modules in the sensory linkage contextual digital photo frame interaction system 100 according to the embodiment of the present invention are used, the same technical means as the sensory linkage contextual digital photo frame interaction method described in fig. 1 above is adopted, and the same technical effects can be produced, which is not described herein again.

Example 3:

The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a program 12, stored in the memory 11 and executable on the processor 10.

The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of the program 12, but also to temporarily store data that has been output or is to be output.

The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit of the electronic device, connects various components of the whole electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (programs 12 for performing digital picture frame interaction, etc.) stored in the memory 11 and calling data stored in the memory 11.

The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.

Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.

For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power diverters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.

Further, the electronic device 1 may further include a communication interface 13, and optionally, the communication interface 13 may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the electronic device 1 and other electronic devices.

Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.

It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.

The program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, enable:

the digital photo frame obtains a face image by shooting with a camera, and preprocesses the face image to obtain a preprocessed face image;

detecting the facial posture in the binarized face image, if an inclined face is detected, rotating the picture in the digital photo frame, detecting the size of the picture and the size of a display screen of the digital photo frame, and displaying the picture with the size smaller than the size of the display screen of the digital photo frame in a full screen manner;

detecting the position of a pupil in the human eye area image based on a pupil detection algorithm;

determining a region watched by human eyes by using an electronic picture interest region detection algorithm based on pupil positions, and amplifying the picture in the region when the time of watching the region by a user exceeds a preset threshold;

the digital photo frame automatically selects and pushes the pictures similar to the pictures in the enlarged area scene.

Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiments corresponding to fig. 1 to fig. 3, which is not repeated herein.

It should be noted that the above-mentioned numbers of the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one of 8230, and" comprising 8230does not exclude the presence of additional like elements in a process, apparatus, article, or method comprising the element.

Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A sensory linkage situational digital photo frame interaction method is characterized by comprising the following steps:

s1: the digital photo frame obtains a face image by shooting with a camera, and preprocesses the face image to obtain a preprocessed face image, wherein the preprocessing method comprises binarization processing and extraction of an image of a human eye region;

s2: detect the facial gesture in the face image of binaryzation, if detect the face of slope, then rotate the picture in the digital photo frame and handle to detect picture size and digital photo frame display screen size, carry out full-screen display to the picture that the size is less than digital photo frame display screen size, wherein facial gesture detection and picture rotation process flow include:

{(x _j ，y _j )|j∈[1，N]}

wherein:

(x _j ，y _j ) Representing the coordinates of the jth pixel point in the eye region image, and N representing the total number of the pixel points in the eye region image;

s22: constructing a camera shooting model:

wherein:

(f _X ，f _Y ) Indicating the focal length of the camera, f _X Representing the focal length of the camera in the horizontal direction, f _Y Represents the focal length of the camera in the vertical direction, (c) _X ，c _Y ) The focal points respectively correspond to the horizontal direction and the vertical direction;

represents a rotation matrix, <' > or>

wherein:

s24: solving to obtain the roll angle gamma of the facial posture:

/>

if gamma is less than-15 degrees, the face of the user inclines leftwards, and the digital photo frame automatically rotates the displayed picture clockwise by gamma degrees;

the digital photo frame detects the size format of the electronic picture to be displayed in real time, and if the picture with the size smaller than the display screen size of the digital photo frame is displayed in a full screen mode, the full screen display process comprises the following steps: amplifying the electronic picture to be displayed to the size of a display screen of the digital photo frame, and filling missing pixels in the amplified electronic picture by utilizing a nearest point interpolation algorithm;

2. The sensory linkage situational digital photo frame interaction method of claim 1, wherein in the step S1, the digital photo frame is shot by a camera to obtain a face image, and the method comprises:

the user can control whether the camera is started or not by himself, when the user selects to start the camera, the infrared light sources positioned at the four vertexes of the digital photo frame emit infrared light, the digital photo frame utilizes the camera to shoot a face image, the shot face image is a time sequence image, and the face image set is { I } _t |t∈[t ₀ ，t _e ]In which I _t For face images taken at any time t, t ₀ Indicating the initial moment, t, of the camera shot _e The time interval between adjacent times is Δ t.

3. The sensory linkage situational digital photo frame interaction method of claim 2, wherein the binarizing processing is performed on the face image in the step S1 to obtain a binarized face image, comprising:

the binarization processing flow of the face image shot at any time t is as follows:

wherein:

I _t，g (x, y) denotes a face image I _t Middle x row y column pixel point I _t Grey scale value of (x, y), i.e. pixel point I _t A pixel value of (x, y);

R _t (x，y)，G _t (x，y)，B _t (x, y) respectively represent pixel points I _t (x, y) values on R, G, B color channels;

wherein:

I′ _t，min (x， _y ) Represents a face image I 'after gradation processing' _t Of the minimum pixel value of l' _t，max (x， _y ) Represents a face image I 'after gradation processing' _t A maximum pixel value of;

s13: initializing threshold values

S14: stretching the gray scale to obtain a human face image I _t The pixels of (2) are divided into foreground pixels and background pixels, wherein the pixels with the pixel values smaller than a threshold value are divided into the foreground pixels, and the pixels with the pixel values larger than or equal to the threshold value are divided into the background pixels;

respectively calculating the average pixel value of a foreground pixel and the average pixel value of a background pixel, wherein the average pixel value of the foreground pixel is m ₁ The average pixel value of the background pixel is m ₂ ；

S15: updating a threshold

Stretching the gray scale to obtain a human face image I _t Is lower than the binarization threshold value->

Is set to 0 above the binarization threshold->

4. The sensory linkage situational digital photo frame interaction method of claim 3, wherein the extracting of the eye region image from the binarized face image in step S1 to obtain the eye region image comprises:

the method comprises the steps that a plurality of human eye area image samples and non-human eye area image samples are collected to train a human eye area detection classification model, sample features are extracted by the human eye area detection classification model, the sample features are detected and classified, and parameter optimization of the model is carried out with the minimum mean square error of the sample classification as a target;

Wherein:

wherein:

performing the parameter optimization and the weight calculation on the n-person eye region detection classification models to obtain n-person eye region detection classification models and corresponding normalization weight sets

wherein:

h (I) represents a human eye region image extraction model after cascade combination;

i represents input image data;

dividing the binarized face image into a plurality of sub-images, wherein the size of each sub-image is the size of a normal eye region; and inputting the divided sub-images into a human eye region image extraction model, and if the model output is +1, indicating that the sub-images are the human eye region images.

5. The sensory linkage contextual digital photo frame interaction method of claim 1, wherein the detecting the position of the pupil in the extracted eye area image using a pupil detection algorithm in step S3 comprises:

s31: calculating any pixel point (x) in the human eye region image _j ，y _j ) Gradient g of _j ：

Wherein:

g′(x _j ，y _j ) Representing pixel points (x) in the pre-processed eye region image _j ，y _j ) The pixel value of (a);

s32: constructing an objective function solved by the center of the pupil position to obtain

Reaching the maximum pixel point coordinate (x) _L ，y _L ) As the pupil center:

h _jL ＝(x _j -x _L ，y _j -y _L )

wherein:

h _jL is the displacement vector of the pupil to be detected;

(x _L ，y _L ) Position coordinates of the center of the pupil of the eye;

will (x) _L ，y _L ) The radius of the pupil of the eye is set to xi as the center of the circle, and is obtained as (x) _L ，y _L ) And a circular area with the circle center and the radius xi is used as an eye pupil position area.

6. The sensory linkage situational digital photo frame interaction method of claim 5, wherein in the step S4, determining the region watched by the human eyes by using an electronic picture interest region detection algorithm based on pupil positions comprises:

Wherein +>

Represents the coordinates of the two spots in the upper part of the pupil area of the original eye, and>

indicating left spot sittingMark or is present>

The coordinates of the right spot are indicated,

Represents the left spot coordinate, and>

representing the right spot coordinates;

s42: respectively calculate

And->

Cross ratio value V of ₁₂ ，V ₂₃ And further obtaining the central coordinate (x) of the human eye watching region in the display screen ^* ，y ^* )：

Wherein:

s43: in the electronic picture displayed on the display screen, the picture is displayed by (x) ^* ，y ^* ) Is a rectangular center, and is constructed to be long

Broad is->

The constructed rectangular area is used as a human eye watching area.

7. The sensory linkage contextual digital photo frame interaction method of claim 6, wherein in step S4, when the time for the user to look at the area exceeds a preset threshold, the process of magnifying the picture of the area comprises:

8. The sensory linkage contextual digital photo frame interaction method of claim 1, wherein the digital photo frame in step S5 selects a picture similar to the scene of the picture enlargement area to push, comprising:

respectively extracting the characteristic vectors of different electronic pictures in the cloud, calculating the similarity between the characteristic vectors of the electronic pictures in the cloud and the characteristic vectors of the amplified pictures, and selecting the electronic picture with the highest similarity in the cloud for pushing, wherein the similarity calculation method is a cosine similarity calculation method.

9. A sensory linkage situational digital photo frame interactive system, comprising:

the picture interaction device is used for rotating and amplifying pictures in the digital picture frame, detecting the size of the pictures and the size of a display screen of the digital picture frame, displaying the pictures with the size smaller than the size of the display screen of the digital picture frame in a full screen mode, automatically selecting the pictures similar to the scenes of the amplified areas of the pictures and pushing the pictures, and therefore the sense organ linkage situational digital picture frame interaction method is achieved according to any one of claims 1 to 8.