CN114895790A - Man-machine interaction method and device, electronic equipment and storage medium - Google Patents

Man-machine interaction method and device, electronic equipment and storage medium

Info

Publication number
CN114895790A
Authority
CN
China
Prior art keywords
user
eye
display screen
coordinate
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210589538.5A
Other languages
Chinese (zh)
Inventor
杨亚军
游城
魏学华
杨克庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Stereo Technology Co ltd
Original Assignee
Shenzhen Stereo Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Stereo Technology Co ltd filed Critical Shenzhen Stereo Technology Co ltd
Priority to CN202210589538.5A
Publication of CN114895790A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04815 Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object

Abstract

The embodiment of the invention discloses a man-machine interaction method and device, an electronic device and a storage medium. The man-machine interaction method is applied to a terminal provided with a display screen and comprises the following steps: controlling the display screen to display; acquiring a first eye parameter and a first light shadow of the user's eyes, where the first eye parameter represents the position of the iris of the user's eyes and the first light shadow is the light shadow of the display screen on the user's eyes; acquiring a first viewing distance from the user's eyes to the display screen; obtaining, from the first viewing distance, the first eye parameter and the first light shadow, a gaze coordinate at which the user's eyes gaze on the display screen; and obtaining the user's input on the display screen from the gaze coordinate. The embodiment of the invention thus enables interaction through the eyes alone, without touching the terminal, which improves the user's interaction experience.

Description

Man-machine interaction method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of human-computer interaction technologies, and in particular, to a human-computer interaction method, an apparatus, an electronic device, and a storage medium.
Background
At present, intelligent terminals such as mobile phones and tablet computers have gradually entered people's lives, and how to further improve the user's interaction experience has become a research direction of future human-computer interaction. In the related art, human-computer interaction between a user and an intelligent terminal such as a mobile phone or a tablet computer is generally realized by touching the screen with a finger. This interaction mode depends on the hands, offers only a single way of interacting, and gives a poor user experience.
Disclosure of Invention
The embodiment of the invention provides a man-machine interaction method and device, an electronic device and a storage medium, which enable interaction through the eyes without touching a terminal, improving the user's interaction experience.
In a first aspect, an embodiment of the present invention provides a human-computer interaction method, which is applied to a terminal, where the terminal is provided with a display screen, and the method includes: controlling the display screen to display; acquiring a first eye parameter and a first light shadow of the eyes of the user, wherein the first eye parameter is used for representing the position of the iris of the eyes of the user, and the first light shadow is the light shadow of the display screen on the eyes of the user; acquiring a first viewing distance from eyes of a user to the display screen; obtaining a watching coordinate of the eyes of the user on the display screen according to the first watching distance, the first eye parameter and the first shadow; and obtaining the input information of the user on the display screen according to the gazing coordinate.
In some embodiments, the terminal is provided with a front camera; the acquiring a first eye parameter and a first light shadow of an eye of a user comprises: acquiring a face image of a user, wherein the face image is obtained by shooting through the front-facing camera; analyzing the face image to obtain first eye socket position information, first iris position information and a first light shadow, wherein the first eye socket position information is used for representing the position of the eye socket of the user, and the first iris position information is used for representing the position of the iris of the eye of the user; and obtaining a first eye parameter according to the first eye socket position information and the first iris position information.
In some embodiments, the parsing from the facial image to obtain first orbital position information, first iris position information, and first shadows comprises: detecting a face eye region of the face image based on a preset detector to obtain first eye socket position information; converting the face image into a gray level image, carrying out binarization processing on the gray level image to obtain a first preprocessed image, and obtaining a first light shadow according to a rectangular or circular noise image in the first preprocessed image; and eroding and dilating the first preprocessed image to eliminate noise in the image and obtain a second preprocessed image, and extracting the position of a circular area which represents the iris of the user's eye in the second preprocessed image by using a circular structural element to obtain first iris position information of the user's eye.
In some embodiments, said deriving a gaze coordinate of a user's eye gaze on said display screen from said first viewing distance, said first eye parameter and said first shadow comprises: matching a coordinate mapping relation table under a corresponding distance according to the first viewing distance; calculating the shadow coordinates of the first shadow; and looking up a table in the coordinate mapping relation table according to the first eye parameter and the light and shadow coordinate to obtain a gazing coordinate of the eyes of the user gazing on the display screen.
In some embodiments, the shape of the first light shadow is a rectangle or a circle, and the light shadow coordinates are a geometric center point or an average point of the first light shadow.
In some embodiments, the method further comprises: displaying a sample cursor on the display screen; acquiring a second eye parameter and a second light shadow when the user eyes watch the sample cursor, wherein the second eye parameter is used for representing the position of the iris of the user eyes, and the second light shadow is the light shadow of the display screen or the sample cursor on the user eyes; acquiring a second viewing distance from the user's eye to the display screen when the user's eye gazes at the sample cursor; and recording the corresponding relation between the second eye parameters and the second light and shadow at the second viewing distance so as to establish a coordinate mapping relation table of the sample cursor.
In some embodiments, said displaying a sample cursor on said display screen comprises: uniformly dividing the display screen into a plurality of display areas; and respectively displaying a sample cursor on each display area.
In some embodiments, the recording the correspondence between the second eye parameter and the second shadow at the second viewing distance to establish a coordinate mapping relationship table of the sample cursor includes: sequentially displaying the sample cursor on each display area; when the watching position of the user is unchanged, respectively recording the corresponding relation between the plurality of second eye parameters and the plurality of second light shadows corresponding to the sample cursors watched by the user in sequence under the second watching distance so as to establish a coordinate mapping relation table of the sample cursors.
In some embodiments, there are a plurality of second viewing distances; the obtaining a second eye parameter and a second shadow when the user's eye gazes at the sample cursor includes: acquiring a second eye parameter and a second light shadow when the user's eyes gaze at the sample cursor at a plurality of different second viewing distances; and the recording the corresponding relation between the second eye parameter and the second shadow at the second viewing distance to establish a coordinate mapping relation table of the sample cursor includes: recording the corresponding relations between the second eye parameters and the second light shadows corresponding to the plurality of different second viewing distances, so as to establish a coordinate mapping relation table of the sample cursor at different viewing distances.
In some embodiments, the obtaining a second ocular parameter and a second shadow when the user's eye gazes at the sample cursor comprises: acquiring a second eye parameter and a second light shadow of the user eyes when the user eyes watch the sample cursor at different watching angles at the second watching distance; the viewing angle is an angle formed by a plane where the face of the user is located and the display screen.
In some embodiments, said deriving a gaze coordinate of a user's eye gaze on said display screen from said first viewing distance, said first eye parameter and said first shadow comprises: acquiring a preset fitting coordinate model or a neural network model, wherein the fitting coordinate model and the neural network model are respectively obtained by establishing a third viewing distance, a third eye parameter and a third light shadow in a sample and corresponding sample coordinates; and inputting the first viewing distance, the first eye parameter and the first shadow into the fitting coordinate model or the neural network model to obtain a watching coordinate of the eyes of the user on the display screen.
In some embodiments, the method further comprises: obtaining a coordinate mapping relation table of the third viewing distance, the third eye parameter and the third light shadow with the corresponding sample coordinate, and fitting based on the coordinate mapping table to obtain a fitting coordinate model from the third viewing distance, the third eye parameter and the third light shadow to the sample cursor; or acquiring a coordinate mapping relation table of the third viewing distance, the third eye parameter and the third light shadow with the corresponding sample coordinate, establishing a neural network model for obtaining the sample coordinate based on the third viewing distance, the third eye parameter and the third light shadow, and optimizing the neural network model through the coordinate mapping relation table.
In some embodiments, said deriving user input information on said display screen from said gaze coordinates comprises: displaying a virtual keyboard through the display screen; acquiring a key value mapping relation table of each key value of the keys on the virtual keyboard and the corresponding input position; and triggering a corresponding target key value from the virtual keyboard according to the gaze coordinate and the key value mapping relation table.
In some embodiments, said deriving user input information on said display screen from said gaze coordinates comprises: recording the watching time of the user on the watching coordinate, and obtaining the input information of the user on the display screen according to the watching coordinate when the watching time is greater than a first time threshold; or acquiring the limb action of the user after gazing the gazing coordinate, and obtaining the input information of the user on the display screen according to the gazing coordinate when the limb action is matched with the target limb action.
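For illustration, the dwell-time confirmation described in the preceding embodiment might look like the following sketch. The threshold, jitter radius and class name are assumptions introduced for the example, not values or identifiers from the patent; a check for a matching target limb action could replace the timer at the same point.

```python
import time

GAZE_DWELL_THRESHOLD_S = 1.0   # hypothetical first time threshold
GAZE_JITTER_RADIUS_PX = 40     # hypothetical tolerance for small eye jitter


class DwellSelector:
    """Confirms an input once the gaze stays near one coordinate long enough."""

    def __init__(self):
        self._anchor = None      # (x, y) where the current dwell started
        self._start = None       # timestamp when the current dwell started

    def update(self, gaze_xy, now=None):
        now = time.monotonic() if now is None else now
        if self._anchor is None or self._too_far(gaze_xy):
            # Gaze moved to a new region: restart the dwell timer.
            self._anchor, self._start = gaze_xy, now
            return None
        if now - self._start >= GAZE_DWELL_THRESHOLD_S:
            selected, self._anchor, self._start = self._anchor, None, None
            return selected   # treat this coordinate as the user's input
        return None

    def _too_far(self, gaze_xy):
        dx = gaze_xy[0] - self._anchor[0]
        dy = gaze_xy[1] - self._anchor[1]
        return dx * dx + dy * dy > GAZE_JITTER_RADIUS_PX ** 2
```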
In some embodiments, the first iris position information includes left eye iris position information and right eye iris position information; the acquiring a first viewing distance from the eyes of the user to the display screen comprises: respectively obtaining left eye pupil position information and right eye pupil position information of the eyes of the user according to the left eye iris position information and the right eye iris position information; calculating the picture pupil distance of the user in the face image according to the left eye pupil position information and the right eye pupil position information; and calculating to obtain a first viewing distance from the eyes of the user to the display screen according to the picture interpupillary distance.
In some embodiments, the calculating a first viewing distance from the user's eyes to the display screen according to the picture interpupillary distance includes: acquiring a preset standard interpupillary distance; acquiring the focal length of the face image shot by the front camera, and obtaining the initial distance of the face image to an imaging point according to the focal length; obtaining a first proportion according to the picture interpupillary distance and the standard interpupillary distance, and obtaining a first viewing distance from the eyes of the user to the display screen according to the first proportion and the initial distance; or acquiring a preset distance lookup table; looking up a table from the distance lookup table according to the picture interpupillary distance to obtain a first viewing distance from the eyes of the user to the display screen; or acquiring a reference distance, a reference object size and a picture size corresponding to the reference object shot by the front camera; acquiring a preset standard interpupillary distance; and obtaining a first viewing distance from the eyes of the user to the display screen according to the reference distance, the reference object size, the picture interpupillary distance and the standard interpupillary distance.
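As an illustration of the first alternative above (a standard interpupillary distance combined with the similar-triangle relation between real size, image size and focal length), a minimal sketch follows. The standard pupil distance and the example numbers are assumptions for the sketch, not values specified by the patent.

```python
STANDARD_PUPIL_DISTANCE_MM = 63.0    # assumed average adult interpupillary distance


def viewing_distance_mm(picture_pupil_distance_px: float,
                        focal_length_px: float,
                        standard_pupil_distance_mm: float = STANDARD_PUPIL_DISTANCE_MM) -> float:
    """Estimate the eye-to-screen distance from the pupil distance in the image.

    With a pinhole camera model, real_size / distance == image_size / focal_length,
    so distance == focal_length * real_size / image_size.
    """
    if picture_pupil_distance_px <= 0:
        raise ValueError("picture pupil distance must be positive")
    return focal_length_px * standard_pupil_distance_mm / picture_pupil_distance_px


# Example: a 1000 px focal length and a 180 px on-image pupil distance give
# roughly 350 mm, i.e. the user is about 35 cm from the front camera/screen.
print(round(viewing_distance_mm(180.0, 1000.0)))
```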
In some embodiments, the front camera is an under-screen camera, and the under-screen camera is disposed at a central position of the display screen.
In a second aspect, an embodiment of the present invention further provides a human-computer interaction device, including: the first module is used for controlling a display screen to display; the second module is used for acquiring a first eye parameter and a first light shadow of the eyes of the user, wherein the first eye parameter is used for representing the position of the iris of the eyes of the user, and the first light shadow is the light shadow of the display screen on the eyes of the user; the third module is used for acquiring a first viewing distance from the eyes of the user to the display screen; a fourth module, configured to obtain a gazing coordinate where the user's eye gazes on the display screen according to the first viewing distance, the first eye parameter, and the first shadow; and the fifth module is used for obtaining the input information of the user on the display screen according to the gazing coordinate.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor implements the human-computer interaction method according to the embodiment of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the storage medium stores a program, and the program is executed by a processor to implement the human-computer interaction method according to the embodiment of the first aspect of the present invention.
The embodiment of the invention has at least the following beneficial effects. The embodiment of the invention provides a human-computer interaction method and device, an electronic device and a storage medium, where the human-computer interaction method is applied to a terminal provided with a display screen. By executing the human-computer interaction method, the display screen is controlled to display, and a first eye parameter and a first light shadow of the user's eyes are then obtained; the first eye parameter represents the position of the iris of the user's eyes, and the first light shadow is the light shadow of the display screen on the user's eyes, which forms in the eyes after the display screen emits light owing to the characteristics of the eyes. A first viewing distance from the user's eyes to the display screen is then obtained, and a gaze coordinate at which the user's eyes gaze on the display screen is obtained from the first viewing distance, the first eye parameter and the first light shadow. Because the human eye is a spherical structure, when the eyes face the display screen at different angles and positions, the first light shadow forms at different positions in the eyes, so the gaze coordinate can be calculated by combining the first viewing distance and the first eye parameter. Finally, the user's input on the display screen is obtained from the gaze coordinate. The embodiment of the invention therefore enables interaction through the eyes without touching the terminal, which improves the user's interaction experience.
Drawings
FIG. 1 is a schematic diagram of an internal structure of a human eye according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a terminal provided by an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a human-computer interaction method according to an embodiment of the present invention;
FIG. 4a is a schematic diagram of light and shadow in an actual image provided by one embodiment of the present invention;
FIG. 4b is a schematic diagram of the position of the light shadow when looking in different directions according to one embodiment of the present invention;
FIG. 5 is a flowchart illustrating a human-computer interaction method according to another embodiment of the invention;
FIG. 6 is a flowchart illustrating a human-computer interaction method according to another embodiment of the invention;
FIG. 7 is a flowchart illustrating a human-computer interaction method according to another embodiment of the invention;
FIG. 8 is a flowchart illustrating a human-computer interaction method according to another embodiment of the invention;
FIG. 9 is a flowchart illustrating a human-computer interaction method according to another embodiment of the invention;
FIG. 10 is a flowchart illustrating a human-computer interaction method according to another embodiment of the invention;
FIG. 11 is a schematic illustration of a display screen being viewed in accordance with an embodiment of the present invention;
FIG. 12 is a schematic diagram of diagonal display screen viewing provided by one embodiment of the present invention;
FIG. 13 is a schematic view of an oblique display screen view provided by another embodiment of the present invention;
FIG. 14 is a flowchart illustrating a human-computer interaction method according to another embodiment of the invention;
FIG. 15 is a flowchart illustrating a human-computer interaction method according to another embodiment of the invention;
FIG. 16 is a flowchart illustrating a human-computer interaction method according to another embodiment of the invention;
FIG. 17 is a diagram of an application scenario for a human-computer interaction method provided by an embodiment of the invention;
FIG. 18 is a flowchart illustrating a human-computer interaction method according to another embodiment of the invention;
FIG. 19 is a flowchart illustrating a human-computer interaction method according to another embodiment of the invention;
FIG. 20a is a schematic diagram of a face image provided by one embodiment of the present invention;
FIG. 20b is a schematic diagram of a face image provided by another embodiment of the present invention;
FIG. 21 is a flowchart illustrating a human-computer interaction method according to another embodiment of the invention;
FIG. 22 is a schematic diagram of relative lens (imaging point) imaging provided by one embodiment of the present invention;
FIG. 23 is a schematic diagram of calculating a first viewing distance by the triangle principle, according to an embodiment of the present invention;
FIG. 24 is a schematic diagram of obtaining a first viewing distance according to a reference frame according to an embodiment of the present invention;
FIG. 25 is a schematic diagram of obtaining a first viewing distance according to a reference frame according to another embodiment of the present invention;
FIG. 26 is a schematic structural diagram of a human-computer interaction device according to an embodiment of the present invention;
fig. 27 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be understood that, in the description of the embodiments of the present invention, "a plurality" means two or more; "greater than", "less than", "exceeding" and the like are understood to exclude the stated number, while "above", "below", "within" and the like are understood to include it. Where "first", "second" and the like are used, they merely distinguish technical features and are not intended to indicate or imply relative importance, the number of the indicated technical features, or any precedence among them.
At present, intelligent terminals such as mobile phones and tablet computers have gradually entered people's lives, and how to further improve the user's interaction experience has become a research direction of future human-computer interaction. In the related art, human-computer interaction between a user and an intelligent terminal such as a mobile phone or a tablet computer generally has to be achieved by touching the screen with a finger. The applicant has found that this finger-touch interaction mode depends heavily on the hands and offers only a single way of interacting; once the hands are unavailable, human-computer interaction cannot be achieved at all, which further degrades the user experience.
It is understood that the eye is approximately spherical, is filled with a transparent gelatinous material, and contains a focusing lens and an iris that controls the amount of light entering the eye. Referring to fig. 1, the human eye has a shell consisting of three transparent structural layers: the outermost layer includes the cornea, and the middle layer includes the ciliary body and the iris. Within the shell are the aqueous humor, the vitreous body and the flexible lens. The aqueous humor is a clear liquid contained in two regions; in the anterior chamber of the eye, between the cornea and the iris, the lens is exposed and suspended by the zonules of the ciliary body, which consist of clear fibrils. The vitreous body, in the posterior chamber of the eye, is a clear jelly larger than the anterior chamber; it lies behind the lens and in the remaining space, wrapped around the zonules and the lens.
The cornea and the sclera are connected by a ring called the limbus. The iris lies at the center of the visible eye and is generally dark, and the center of the iris is the pupil. Because the cornea is transparent, the iris and pupil are the visible part rather than the cornea itself, and the fundus (the area behind the pupil and iris) appears as an optical mirror.
Based on this, the embodiment of the invention provides a human-computer interaction method, a human-computer interaction device, an electronic device and a storage medium.
The basic idea of the embodiment of the invention is that the reflection characteristic of the iris is utilized, the current image state information of human eyes including the iris is captured through the front-facing camera, and the current image state information is processed according to the eye parameters and the light and shadow obtained through identification, so that the fixation position of the eyes is identified, and the interaction mode based on the human eyes is realized.
The terminal in the embodiment of the invention can be mobile terminal equipment and can also be non-mobile terminal equipment. The mobile terminal equipment can be a mobile phone, a tablet computer, a notebook computer, a palm computer, vehicle-mounted terminal equipment, wearable equipment, a super mobile personal computer, a netbook, a personal digital assistant and the like; the non-mobile terminal equipment can be a personal computer, a television, a teller machine or a self-service machine and the like; the embodiments of the present invention are not particularly limited.
The terminal may include a processor, an external memory interface, an internal memory, a Universal Serial Bus (USB) interface, a charging management module, a power management module, a battery, a mobile communication module, a wireless communication module, an audio module, a speaker, a receiver, a microphone, an earphone interface, a sensor module, a button, a motor, an indicator, a front camera, a rear camera, a display screen, and a Subscriber Identity Module (SIM) card interface, etc. The terminal can realize shooting functions through a front camera, a rear camera, a video coder-decoder, a GPU, a display screen, an application processor and the like.
The front camera or the rear camera is used for capturing still images or videos. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to an ISP (image signal processor) to be converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the terminal may include 1 or N front-facing cameras, where N is a positive integer greater than 1.
The terminal realizes the display function through the GPU, the display screen, the application processor and the like. The GPU is a microprocessor for image processing and is connected with a display screen and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen is used to display images, videos, and the like. The display screen includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
In an embodiment, the display screen in the embodiment of the present invention is a 3D display screen (referred to simply as the display screen), which may be a naked-eye 3D display screen. The naked-eye 3D display screen can process multimedia data and split it into a left part and a right part; for example, a 2D video is split into two parts and the emission and refraction directions of the two parts are changed, so that the user sees a 3D picture. The formed 3D picture has negative parallax and can appear to be displayed between the user and the display screen, realizing a naked-eye 3D viewing effect. Alternatively, the display screen is a 2D display screen and a 3D grating film can be attached externally to the terminal to refract the emergent light of the 2D display screen, so that the user sees a 3D display effect when viewing the display screen through the 3D grating film.
For example, as shown in fig. 2, in the embodiment of the present invention, a terminal is taken as an example of a mobile phone, for example, a front panel 10 on the mobile phone is provided with a front camera 11 to acquire image information, and the front panel 10 is further provided with a display screen 12 to perform screen display, and it is understood that when the terminal is a 3D vision training terminal or a mobile phone capable of displaying a 3D screen, a 3D screen may be displayed through the display screen 12.
It should be noted that, in the embodiment of the present invention, the front camera 11 may be disposed in the same plane as the display screen 12, and the position of the front camera 11 is fixed. In an embodiment, the front camera 11 may or may not be perpendicular to the display screen 12, and it may be located within the display screen 12 or at its periphery, as shown in fig. 2. In addition, the front camera 11 may also be set back from the display screen 12 by a perpendicular distance, so that the front camera 11 and the display screen 12 do not lie in the same plane. The terminal can perform parameter verification according to how the front camera 11 is disposed, so as to implement the human-computer interaction method and the control method in the embodiment of the present invention.
In the following, a man-machine interaction method, an apparatus, an electronic device, and a storage medium according to embodiments of the present invention are described, and first, a man-machine interaction method according to an embodiment of the present invention is described.
Referring to fig. 3, an embodiment of the present invention provides a human-computer interaction method applied in a terminal. The human-computer interaction method may include, but is not limited to, step S101 to step S105.
And step S101, controlling a display screen to display.
Step S102, a first eye parameter and a first light shadow of the eyes of the user are obtained, wherein the first eye parameter is used for representing the position of the iris of the eyes of the user, and the first light shadow is the light shadow of the display screen on the eyes of the user.
Step S103, acquiring a first viewing distance from the eyes of the user to the display screen.
And step S104, obtaining a gazing coordinate of the eyes of the user gazing on the display screen according to the first watching distance, the first eye parameter and the first shadow.
And step S105, obtaining the input information of the user on the display screen according to the gazing coordinates.
It should be noted that, in the human-computer interaction method of the embodiment of the present invention, the display screen is first controlled to display; once lighted, it displays by emitting light. A first eye parameter and a first light shadow of the user's eyes are then obtained. The first eye parameter is parameter information representing the position of the iris of the user's eyes, from which the position of the iris within the eye can be known; the first light shadow is the light shadow of the display screen on the user's eyes, which forms on the optical mirror surface of the eye after the display screen is lighted, owing to the characteristics of the eye. A first viewing distance from the user's eyes to the display screen is then obtained, a gaze coordinate at which the user's eyes gaze on the display screen is obtained from the first viewing distance, the first eye parameter and the first light shadow, and finally the user's input on the display screen is obtained from the gaze coordinate. The embodiment of the invention thus enables interaction through the eyes without touching the terminal, which improves the user's interaction experience.
Specifically, because the human eye is a spherical structure, as shown in fig. 4a, and has the physical characteristics of an optical mirror surface, a light shadow of the light-emitting display screen appears in the eye when the eye views it. When the eye faces the display screen at different angles and positions, the light shadow forms at different positions in the eye; that is, as shown in fig. 4b, the relative position between the light shadow and the iris differs depending on the angle and position from which the display screen is viewed. Therefore, after the first light shadow in the eye is obtained, the gaze coordinate can be calculated by combining the first viewing distance and the first eye parameter. It can be understood that the light shadows in fig. 4a and 4b may be the first light shadow in the above embodiments.
It can be understood that, in the embodiment of the present invention, the gaze coordinate at which the user's eyes gaze on the display screen is obtained from the first viewing distance, the first eye parameter and the first light shadow, and the gaze coordinate identifies a specific position on the display screen, so that the corresponding target position on the display screen is selected. The position on the display screen is thus selected by recognizing the state of the eyes rather than by touching the display screen directly with a finger, and the terminal can perform further operations according to the area selected by the eyes, thereby implementing human-computer interaction.
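The overall flow of steps S101 to S105 can be summarised in a short sketch. The function names and the idea of passing the capture and estimation routines in as callables are illustrative assumptions made here; the later sections describe how each step can actually be implemented.

```python
from typing import Callable, Tuple

# (first eye parameter, first light shadow), each as an (x, y) pair
EyeState = Tuple[Tuple[float, float], Tuple[float, float]]


def interaction_step(
    show_screen: Callable[[], None],                        # S101: control the display screen to display
    capture_eye_state: Callable[[], EyeState],              # S102: first eye parameter + first light shadow
    measure_viewing_distance: Callable[[], float],          # S103: first viewing distance
    estimate_gaze: Callable[[float, EyeState], Tuple[float, float]],  # S104: gaze coordinate
    resolve_input: Callable[[Tuple[float, float]], object],  # S105: input information at that coordinate
):
    show_screen()
    eye_state = capture_eye_state()
    distance = measure_viewing_distance()
    gaze_xy = estimate_gaze(distance, eye_state)
    return resolve_input(gaze_xy)
```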
Referring to fig. 5, in an embodiment, the terminal is provided with a front camera, and the step S102 may further include, but is not limited to, step S201 to step S203.
In step S201, a face image of the user is acquired, and the face image is captured by the front camera.
Step S202, first eye socket position information, first iris position information and a first light shadow are obtained according to the face image, wherein the first eye socket position information is used for representing the position of the eye socket of the user, and the first iris position information is used for representing the position of the iris of the eye of the user.
Step S203, a first eye parameter is obtained according to the first eye socket position information and the first iris position information.
It should be noted that, in the embodiment of the present invention, an image is acquired by the front camera arranged on the terminal and the first eye parameter required in the embodiment of the present invention is finally obtained from it. Specifically, a face image of the user is captured by the front camera and analyzed to obtain the first orbit position information, the first iris position information and the first light shadow. The first orbit position information indicates the position of the user's eye socket; the eye socket is the frame formed by the eyelid edges on the user's face, a roughly quadrilateral-pyramid-shaped bony cavity that accommodates the eyeball and related tissues, the two sockets being symmetrical to each other and about 4 to 5 cm deep in adults. The first iris position information indicates the position of the iris within the user's eye, and the current position of the iris in the eye can be obtained from the positional relationship between the first iris position information and the first orbit position information. From the first light shadow and the first orbit position information, the position of the display screen's light shadow in the user's eye can be obtained, so that the positional relationship between the first light shadow and the first iris position information is known and the user's viewing direction can be recognized.
It should be noted that the human-computer interaction method in the embodiment of the present invention can acquire the face image of the user through the front camera provided on the terminal, so that no additional sensor or camera is required: the image is acquired through the terminal's own front camera, the eye gazing condition is then calculated from the recognized face image, and the corresponding target position on the display screen is finally selected.
It should be noted that, in the embodiment of the present invention, the face image may be obtained by directly recognizing the user's face through the front camera. In an embodiment, the face image is obtained by cropping the image captured by the front camera. For example, the image captured by the front camera contains the user's face and may also contain other clutter that would interfere with iris recognition, so the embodiment of the present invention crops the image to extract the user's face region as the face image, improving the accuracy of recognition.
Referring to fig. 6, in an embodiment, the step S202 may further include, but is not limited to, the steps S301 to S302.
Step S301, detecting a face eye region of the face image based on a preset detector to obtain first eye socket position information.
Step S302, converting the face image into a gray image, performing binarization processing on the gray image to obtain a first preprocessed image, and obtaining a first light shadow according to a rectangular or circular noise image in the first preprocessed image.
Step S303, carrying out erosion and dilation processing on the first preprocessed image and eliminating noise in the image to obtain a second preprocessed image, and extracting the position of a circular area representing the iris of the user's eye in the second preprocessed image by using a circular structural element to obtain first iris position information of the user's eye.
It should be noted that, in the embodiment of the present invention, the face image is recognized through image processing to obtain the required first orbit position information, first iris position information and first light shadow. Specifically, a preset detector is used to perform face-and-eye region detection on the face image to obtain the first orbit position information. The face image is converted into a gray-scale image and the gray-scale image is binarized to obtain a first preprocessed image, and the first light shadow is obtained from a rectangular or circular noise pattern in the first preprocessed image; this rectangular or circular noise pattern is the light shadow of the display screen in the eye after image processing. The first preprocessed image is further subjected to erosion and dilation to remove noise, giving a second preprocessed image. Because the iris of the human eye is circular, the position of the circular area representing the iris in the second preprocessed image is extracted with a circular structuring element to obtain the first iris position information of the user's eye.
In an embodiment, the embodiment of the present invention processes the face image based on OpenCV and performs face recognition using a cascade classifier as the detector. The cascade classifier is implemented based on local binary pattern (LBP) features and Haar-like features; based on classifier data trained on LBP and Haar features for a specific target, object recognition can be stored, loaded and performed effectively. The LBP operator obtains an LBP code for each pixel in an image, and the original LBP feature extracted from the image is still an image. In use, the relevant face cascade classifier data is loaded for detection; after the face region is intercepted, its upper half is taken and evenly divided into left and right parts, corresponding to the left and right eyes, and the eye area is intercepted according to the proportion of the upper half occupied by the eyes, completing the calibration of the eye area. Eye detection is realized through an eye cascade detector in OpenCV, and the detected sub-image of the eye object is cached as a template, so that when the detector cannot detect the eye area, matching of the eye area is completed using the prepared template image. When the iris is located, the calibrated eye area is binarized so that the outline of the eye can be obtained, giving the first orbit position information. To obtain the position of the iris, an opening operation (erosion followed by dilation) is applied to the image with a circular structuring element; at this point the central circular area contains noise which needs to be removed first, and this noise includes a rectangular or circular light shadow, from which the first light shadow can be obtained. The iris position is then extracted with the circular structuring element to obtain the first iris position information; for example, the center of the circular structure may be taken as the position of the iris, that is, as the first iris position information.
It is understood that, in the embodiment of the present invention, the first eye socket position information, the first iris position information, and the first light shadow obtained by processing the face image based on opencv are taken as an example, and are not represented as a limitation to the embodiment of the present invention.
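As an illustration only, an OpenCV-based pipeline along the lines described above might look like the following sketch. The stock Haar eye cascade, the fixed brightness threshold and the kernel size are assumptions made for the example; the embodiment does not prescribe these exact values.

```python
import cv2
import numpy as np

eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")


def extract_eye_features(face_bgr: np.ndarray):
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)

    # First orbit position information: bounding boxes of the detected eye regions.
    orbits = eye_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(orbits) == 0:
        return None

    x, y, w, h = orbits[0]
    eye = gray[y:y + h, x:x + w]

    # First light shadow: the screen reflection shows up as a small bright blob.
    _, bright = cv2.threshold(eye, 220, 255, cv2.THRESH_BINARY)
    shadow_pts = cv2.findNonZero(bright)
    shadow_xy = tuple(shadow_pts.mean(axis=0).ravel()) if shadow_pts is not None else None

    # First iris position information: the iris is the large dark, roughly circular
    # region; invert, then open (erode followed by dilate) with a circular
    # structuring element to suppress noise before locating it.
    _, dark = cv2.threshold(eye, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    opened = cv2.morphologyEx(dark, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(opened, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    iris_contour = max(contours, key=cv2.contourArea)
    (ix, iy), _radius = cv2.minEnclosingCircle(iris_contour)

    # Iris and shadow coordinates are relative to the eye region; (x, y, w, h)
    # locates that region (the orbit) inside the face image.
    return {"orbit": (x, y, w, h), "iris": (ix, iy), "shadow": shadow_xy}
```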
Referring to fig. 7, in an embodiment, the step S104 may further include, but is not limited to, steps S401 to S403.
Step S401, matching a coordinate mapping relation table under the corresponding distance according to the first viewing distance.
In step S402, the light and shadow coordinates of the first light and shadow are calculated.
And S403, looking up a table in a coordinate mapping relation table according to the first eye parameter and the light and shadow coordinate to obtain a watching coordinate of the eyes of the user watching on the display screen.
It should be noted that, in the embodiment of the present invention, different coordinate mapping relation tables are established for different viewing distances; a coordinate mapping relation table is a mapping table from a first eye parameter and a first light shadow to the corresponding gaze coordinate. The table for the corresponding distance is matched according to the value of the first viewing distance, and the light-shadow coordinate is calculated from the first light shadow. It can be understood that the light-shadow coordinate may be the geometric center point or the average point of the first light shadow and may be set according to the actual situation, and that the shape of the first light shadow may be a rectangle or a circle, representing the shape of the display screen's light shadow after image processing. The gaze coordinate at which the user's eyes gaze on the display screen is then obtained by looking up the coordinate mapping relation table with the first eye parameter and the light-shadow coordinate. It can be understood that the gaze coordinate corresponds to a specific position on the display screen. The human-computer interaction method of the embodiment of the present invention therefore obtains the gaze coordinate through a table lookup, which is simple and efficient, does not require the terminal to perform complex calculations or consume large amounts of computing resources, and improves the processing efficiency of human-computer interaction.
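A minimal sketch of the table matching and lookup of steps S401 to S403 follows. The dictionary-based table layout and the nearest-neighbour matching are assumptions introduced for the example; the embodiment only requires that the distance-specific table be matched and then looked up with the first eye parameter and the light-shadow coordinate.

```python
import math


def shadow_coordinate(shadow_points):
    """Light-shadow coordinate taken as the average point of the first light shadow."""
    xs = [p[0] for p in shadow_points]
    ys = [p[1] for p in shadow_points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def look_up_gaze(mapping_tables, viewing_distance, eye_param, shadow_xy):
    # S401: match the table recorded at the closest calibrated viewing distance.
    nearest_distance = min(mapping_tables, key=lambda d: abs(d - viewing_distance))
    table = mapping_tables[nearest_distance]

    # S403: find the recorded (eye parameter, shadow coordinate) entry closest
    # to the current measurement and return its gaze coordinate.
    def entry_error(entry):
        (e, s), _gaze = entry
        return math.dist(e, eye_param) + math.dist(s, shadow_xy)

    _key, gaze_xy = min(table.items(), key=entry_error)
    return gaze_xy
```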
Referring to fig. 8, in an embodiment, the human-computer interaction method may further include, but is not limited to, steps S501 to S504.
Step S501, a sample cursor is displayed on the display screen.
Step S502, a second eye parameter and a second light shadow of the user when the user eyes watch the sample cursor are obtained, wherein the second eye parameter is used for representing the position of the iris of the user eyes, and the second light shadow is the light shadow of the display screen or the sample cursor on the user eyes.
In step S503, a second viewing distance from the user to the display screen when the user' S eye gazes at the sample cursor is obtained.
Step S504, at the second viewing distance, recording a corresponding relationship between the second eye parameter and the second shadow to establish a coordinate mapping relationship table of the sample cursor.
It should be noted that, in the embodiment of the present invention, the coordinate mapping relation table may be established in advance from sample data, so that the gaze coordinate can be obtained directly from the pre-established table during human-computer interaction. In the process of establishing the table, a sample cursor is first displayed on the display screen; the sample cursor may be a bright dot or a bright ring on the display screen and may be displayed in a blinking manner so that the user observes and attends to it. A second eye parameter and a second light shadow are then obtained while the user's eyes gaze at the sample cursor; the second eye parameter represents the position of the iris of the user's eyes, and the second light shadow is the light shadow of the display screen or of the sample cursor on the user's eyes. It can be understood that the second eye parameter is similar to the first eye parameter in the foregoing embodiments and the second light shadow is similar to the first light shadow. A second viewing distance from the user's eyes to the display screen while the user gazes at the sample cursor is also obtained, and the correspondence between the second eye parameter and the second light shadow at that second viewing distance is recorded to establish the coordinate mapping relation table of the sample cursor.
Referring to fig. 9, in an embodiment, the step S501 may further include, but is not limited to, step S601 to step S602.
Step S601, uniformly dividing the display screen into a plurality of display areas.
Step S602, a sample cursor is displayed on each display area.
It should be noted that, in the embodiment of the present invention, coordinate mappings for a plurality of display areas on the display screen need to be established in the coordinate mapping relation table, so the display screen is uniformly divided into a plurality of display areas and a sample cursor is displayed on each display area. It can be understood that the terminal may display the sample cursor on each display area in turn, or display it randomly on any display area. In an embodiment, the terminal divides the display screen into 100 areas; the gaze point changes as the eyeballs rotate, and the human eye parameters are collected for each gaze point. Each area gazed at by the eyes is marked, for example by making the marked area blink, and when an area is marked the eyeball position characteristics of the left and right eyes are recorded, thereby acquiring the corresponding second eye parameter and second light shadow.
Referring to fig. 10, in an embodiment, the step S504 may further include, but is not limited to, the steps S701 to S702.
In step S701, sample cursors are sequentially displayed on the respective display areas.
Step S702, when the viewing position of the user is unchanged, recording a corresponding relationship between a plurality of second eye parameters and a plurality of second shadows corresponding to the user viewing each sample cursor in sequence at a second viewing distance, respectively, so as to establish a coordinate mapping relationship table of the sample cursors.
It should be noted that, in the embodiment of the present invention, with the viewing distance kept fixed, the correspondences between the plurality of second eye parameters and the plurality of second light shadows corresponding to each displayed sample cursor are collected in turn to establish the coordinate mapping relation table of the sample cursors. The established table thus also represents the second eye parameters and second light shadows observed when the user views the display screen from one unchanged position.
Specifically, assume the display screen is directly in front of the user, as shown in fig. 11, and the user gazes with both eyes at a sample cursor at the center of the display screen, so that there is a light shadow of the sample cursor near the center of each eyeball. At this moment the sample cursor is to the right of the left eye, so the left eye looks right and the cursor is projected to the left of the center point of that eyeball; the sample cursor is to the left of the right eye, so the right eye looks left and the cursor is projected to the right of the center point of that eyeball. The position of the light shadow on each eyeball (the projection of the sample cursor) is then marked, giving the second light shadow recorded when the user gazes at the center of the screen. When the user looks up, the light shadow lies below the eyeball; when the user looks to the upper left, the light shadow lies to the lower right of the eyeball (the right eye's shadow more to the right), and the positions of the cursor's light shadow on the left and right eyeballs are marked. The other directions are handled in the same way and are not repeated here. The eye conditions of the user gazing at the hundred sample cursors above, below, left and right on the display screen are marked, and the correspondences between the plurality of second eye parameters and the plurality of second light shadows corresponding to each sample cursor gazed at in turn are recorded, so as to establish the coordinate mapping relation table of the sample cursors.
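A compact sketch of this calibration loop is given below. `show_cursor_at` and `capture_eye_state` are hypothetical placeholders for the display and camera routines, and the grid size is an example; the embodiment describes one hundred display areas.

```python
def build_mapping_table(screen_w, screen_h, rows, cols,
                        show_cursor_at, capture_eye_state):
    """Record (second eye parameter, second light shadow) -> sample coordinate."""
    table = {}
    cell_w, cell_h = screen_w / cols, screen_h / rows
    for r in range(rows):
        for c in range(cols):
            # Sample coordinate: the centre of this display area.
            target = (c * cell_w + cell_w / 2, r * cell_h + cell_h / 2)
            show_cursor_at(target)                       # e.g. a blinking bright dot
            eye_param, shadow_xy = capture_eye_state()   # assumed to return hashable (x, y) tuples
            table[(eye_param, shadow_xy)] = target
    return table

# Example: a 10 x 10 grid gives the one hundred display areas mentioned above, e.g.
# mapping_tables[second_viewing_distance] = build_mapping_table(1080, 2400, 10, 10, ...)
```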
In an embodiment, there are a plurality of second viewing distances, and the step S502 further includes: acquiring second eye parameters and second light shadows when the user eyes gaze the sample cursor under a plurality of different second viewing distances; the step 504 further includes: and recording the corresponding relation between the second eye parameters and the second light and shadow corresponding to a plurality of different second viewing distances so as to establish a coordinate mapping relation table of the sample cursor at different viewing distances.
It can be understood that when the user is at different distances, the images taken by the front camera on the terminal differ, and so does the eye rotation needed to view the display screen. For example, when the user's eyes are close to the display screen, viewing its uppermost and lowermost positions requires a large eye rotation, whereas when the eyes are far from the display screen the rotation is much smaller. In the embodiment of the present invention there are therefore a plurality of second viewing distances, and the user gazes at the sample cursor at each of them, so that the correspondence between the second eye parameter and the second light shadow is recorded at a plurality of different second viewing distances to establish the coordinate mapping relation table of the sample cursor. The established table is thus based on mappings at a plurality of different viewing distances, so that after the first viewing distance from the user's eyes to the display screen is obtained, the table for the corresponding second viewing distance can be retrieved and the gazing condition of the user's eyes at that distance can be recognized more accurately.
In one embodiment, the embodiment of the invention performs marking with the eyes 30 cm, 40 cm, ..., 90 cm and 100 cm in front of the display screen, establishing, for each distance, the correspondence between the gazed sample cursor and the eye parameters and light shadow in the eyes. In actual use, the coordinate mapping relation table for the current distance is retrieved according to the distance between the screen and the eyes, so that the user's gaze coordinate can be determined and mapped to the corresponding software operation, realizing interaction between the eyes and the device.
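Continuing the previous sketch, per-distance tables could be collected as follows; the prompt routine and the reuse of the hypothetical `build_mapping_table` helper are assumptions for the example.

```python
CALIBRATED_DISTANCES_CM = range(30, 101, 10)   # 30 cm, 40 cm, ..., 100 cm


def calibrate_all_distances(screen_w, screen_h, show_cursor_at, capture_eye_state,
                            prompt_user_to_sit_at):
    tables = {}
    for d in CALIBRATED_DISTANCES_CM:
        prompt_user_to_sit_at(d)   # hypothetical prompt, e.g. "please sit about d cm away"
        tables[d] = build_mapping_table(screen_w, screen_h, 10, 10,
                                        show_cursor_at, capture_eye_state)
    return tables
```

At run time, the table whose calibrated distance is closest to the measured first viewing distance is selected, as in the lookup sketch earlier.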
In an embodiment, step S502 further includes: at a second viewing distance, acquiring the second eye parameter and second light shadow when the user's eyes gaze at the sample cursor at different viewing angles, where the viewing angle is the angle formed by the plane of the user's face and the display screen.
It should be noted that when the display screen is rotated, for example rotated backwards by 5° on its right side, the shadow projected on the eyeball becomes larger on the left and smaller on the right. In the embodiment of the present invention, when the coordinate mapping relation table is established, measurements are therefore taken according to the angle formed by the plane of the user's face and the display screen. For example, with the distance between the user and the display screen held at the second viewing distance, a table is established once with the user facing the display screen directly, and again at angles such as 5°, 10° or 30°, which can be set according to actual requirements. The resulting coordinate mapping relation table ensures that an accurate gaze coordinate can still be obtained when the user views the display screen at different angles.
It can be understood that, as shown in fig. 12 and 13, when the user views the display screen at different angles, the front-facing camera captures correspondingly different images of the user, so that the finally established coordinate mapping relation table still yields an accurate gazing coordinate when the user views the screen at different angles.
Illustratively, by combining the above manners and measuring the parameters of the user's eyes while gazing at the sample cursor at different distances and different angles, a coordinate mapping relation table covering different angles and distances is established, so that in practical applications the user's gaze coordinate can be obtained by table lookup regardless of viewing distance or angle. It can be understood that the more sample cursors whose corresponding human-eye parameters are measured, the more accurate the established table and the resulting gaze coordinate; similarly, the more distances and angles at which the human-eye parameters are measured, the more accurate the table and the gaze coordinate.
Referring to fig. 14, in an embodiment, the step S104 may further include, but is not limited to, steps S801 to S802.
Step S801, obtaining a preset fitting coordinate model or neural network model, where the fitting coordinate model and the neural network model are each established from a third viewing distance, a third eye parameter and a third light shadow in a sample together with the corresponding sample coordinates.
Step S802, inputting the first viewing distance, the first eye parameter and the first shadow into a fitting coordinate model or a neural network model to obtain a gazing coordinate of the user' S eyes gazing on the display screen.
It should be noted that, besides obtaining the gazing coordinate by directly looking up the coordinate mapping relation table for the first viewing distance according to the first eye parameter and the first light shadow, the embodiment of the present invention may obtain a preset fitting coordinate model or neural network model and input the first viewing distance, the first eye parameter and the first light shadow into it to obtain the gazing coordinate of the user's eyes on the display screen. Both models are established from the third viewing distance, the third eye parameter and the third light shadow in the sample together with the corresponding sample coordinates. Fitting here means connecting a series of points in the plane by a smooth curve; since countless curves are possible there are various fitting methods, and the fitted curve can generally be expressed as a function. The fitting coordinate model of the embodiment of the invention thus yields the corresponding gazing coordinate from the mathematical relationship among the first viewing distance, the first eye parameter and the first light shadow. The neural network model is obtained by continuous training on the third viewing distance, the third eye parameter and the third light shadow in the sample and the corresponding sample coordinates; after the terminal obtains the first eye parameter, the first viewing distance and the first light shadow of the user, it can input them into the preset neural network model to output an accurate gazing coordinate.
Referring to fig. 15, in an embodiment, the human-computer interaction method may further include, but is not limited to, steps S901 to S902.
Step S901, obtaining a coordinate mapping relation table of a third viewing distance, a third eye parameter, and a third light shadow with corresponding sample coordinates, and fitting based on the coordinate mapping table to obtain a fitting coordinate model from the third viewing distance, the third eye parameter, and the third light shadow to the sample cursor.
Step S902, or acquiring a coordinate mapping relationship table of the third viewing distance, the third eye parameter, and the third light shadow with the corresponding sample coordinate, establishing a neural network model for obtaining the sample coordinate based on the third viewing distance, the third eye parameter, and the third light shadow, and optimizing the neural network model through the coordinate mapping relationship table.
It should be noted that the fitting coordinate model in the embodiment of the present invention is established from the coordinate mapping relation table that relates the third viewing distance, the third eye parameter and the third light shadow to the corresponding sample coordinates; the fitting coordinate model from the third viewing distance, the third eye parameter and the third light shadow to the sample cursor is obtained by fitting on the basis of this table. The coordinate mapping relation table may be constructed as described in the above embodiments and is not repeated here, and it allows the fitting coordinate model to be established more accurately. In an embodiment, the fitting coordinate model is established from the coordinate mapping table by the least square method. The least square method is a mathematical optimization technique that finds the best-matching function for the data by minimizing the sum of squared errors; unknown data can easily be predicted with it, and the sum of squared errors between the predicted data and the actual data is minimized.
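A minimal sketch of such a least-squares fit, assuming each sample in the coordinate mapping relation table is flattened into a feature row [third viewing distance, eye parameter x/y, light shadow x/y] with the sample coordinate as the target; the affine model form is an assumption, not something fixed by the patent:

```python
import numpy as np

def fit_coordinate_model(samples, targets):
    """samples: (N, 5) array [distance, eye_x, eye_y, shadow_x, shadow_y];
       targets: (N, 2) array of sample-cursor screen coordinates."""
    X = np.asarray(samples, dtype=float)
    Y = np.asarray(targets, dtype=float)
    # Add a bias column so the fitted mapping is affine.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    # Least squares: minimise the sum of squared errors ||X1 @ W - Y||^2.
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return W

def predict_gaze(W, distance, eye_xy, shadow_xy):
    """Apply the fitted coordinate model to one observation."""
    x = np.array([distance, *eye_xy, *shadow_xy, 1.0])
    return x @ W   # -> (gaze_x, gaze_y)
```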
It should be noted that the neural network model in the embodiment of the present invention may likewise be established from the coordinate mapping relation table that relates the third viewing distance, the third eye parameter and the third light shadow to the corresponding sample coordinates: a neural network model that outputs the sample coordinate from the third viewing distance, the third eye parameter and the third light shadow is established, and it is then optimized through the coordinate mapping relation table. Using the table to correct the network during training yields a more accurate loss value, so that the optimized neural network model can produce an accurate gazing coordinate from the learned mapping.
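As one possible realisation (the patent names no framework, architecture or loss function), the neural network variant could be sketched with scikit-learn, reusing the coordinate mapping relation table as training data:

```python
# Hedged sketch: a small multilayer perceptron regressor standing in for the
# neural network model; the layer sizes and iteration count are assumptions.
from sklearn.neural_network import MLPRegressor

def train_gaze_network(samples, targets):
    """samples: list of [distance, eye_x, eye_y, shadow_x, shadow_y];
       targets: list of [gaze_x, gaze_y] taken from the coordinate mapping table."""
    model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000)
    model.fit(samples, targets)   # the mapping table doubles as training data
    return model

# Usage (values assumed): gaze_xy = model.predict([[60.0, 0.1, -0.05, 3.0, 2.0]])
```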
Referring to fig. 16, in an embodiment, the step S105 may further include, but is not limited to, step S1001 to step S1003.
Step S1001, displaying a virtual keyboard through the display screen.
Step S1002, a key value mapping relationship table of each key value of the virtual keyboard and the corresponding input position is obtained.
Step S1003, triggering a corresponding target key value from the virtual keyboard according to the gaze coordinate and the key value mapping relation table.
It should be noted that, as shown in fig. 17, when the virtual keyboard is displayed on the display screen, input on the virtual keyboard can be realized with the human-computer interaction method. In the process of obtaining input information from the gaze coordinate, a key value mapping relation table between each key value on the virtual keyboard and its corresponding input position is first obtained; it can be understood that this table is established in advance according to the displayed position of the virtual keyboard, so that a given screen coordinate corresponds to a particular input key. The corresponding target key value is then triggered from the virtual keyboard according to the gaze coordinate and the key value mapping relation table. For example, if in the key value mapping relation table the position corresponding to the coordinate (x, y) is the '5' key of the virtual keyboard, then when the gaze coordinate obtained in the above embodiments is (x, y), the target key is matched as the '5' key and the input information is the key value of the '5' key; the user completes the input of the target key value simply by gazing at the '5' key of the virtual keyboard on the display screen, thereby realizing the human-computer interaction operation. In the embodiment of the present invention the images are acquired through the front camera of the terminal, so no additional sensor or camera needs to be provided; human-computer interaction is achieved at low cost, and the interaction experience of the user is improved.
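A minimal sketch of such a key value mapping relation table and the triggering step; the key rectangles and coordinates below are illustrative assumptions:

```python
# Each entry maps a key value to the screen rectangle (x_min, x_max, y_min, y_max)
# where that key of the virtual keyboard is drawn. Values are assumed.
KEY_VALUE_MAP = {
    "4": (0, 100, 100, 200),
    "5": (100, 200, 100, 200),
    "6": (200, 300, 100, 200),
}

def trigger_key(gaze_x, gaze_y):
    """Return the target key value whose input region contains the gaze coordinate."""
    for key_value, (x0, x1, y0, y1) in KEY_VALUE_MAP.items():
        if x0 <= gaze_x < x1 and y0 <= gaze_y < y1:
            return key_value
    return None

# trigger_key(150, 150) -> "5"
```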
Referring to fig. 18, in an embodiment, the step S105 may further include, but is not limited to, steps S1101 to S1102.
Step S1101, recording the gaze duration of the user at the gazing coordinate, and obtaining the input information of the user on the display screen according to the gazing coordinate when the gaze duration is greater than a first time threshold.
Step S1102, or acquiring a limb action of the user after watching the gazing coordinate, and when the limb action is matched with the target limb action, obtaining input information of the user on the display screen according to the gazing coordinate.
It should be noted that, in the embodiments of the present invention, whether the user is performing an information input operation is determined from how long the user's gaze stays on the gazing coordinate. After the terminal determines a gazing coordinate on the display screen from the gaze of the user's eyes, it judges whether the user has selected that coordinate according to how long the eyes remain on it. It can be understood that when the user watches the display screen the eyes normally move as the content changes; for example, when reading an article or a chat record the eyes follow the position of the content, the terminal recognizes that the gazed position on the display screen keeps changing, and it concludes that no interaction is required, that is, the user does not need to select a gazing coordinate at that moment. When the terminal detects that the user's eyes stay and gaze at a certain position for a certain time, it judges that the user is performing a selection operation, namely the selection of the target position once the dwell time at that position exceeds the first time threshold.
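The dwell-based selection can be sketched as follows; the threshold value, drift tolerance and class name are assumptions for illustration:

```python
import time

DWELL_THRESHOLD_S = 3.0     # "first time threshold" (assumed value)
POSITION_TOLERANCE = 20     # pixels the gaze may drift and still count as the same spot

class DwellSelector:
    def __init__(self):
        self.anchor = None      # coordinate currently being watched
        self.start = None       # when the user started watching it

    def update(self, gaze_xy):
        """Call once per frame with the current gaze coordinate;
        returns the selected coordinate once the dwell exceeds the threshold."""
        now = time.monotonic()
        if (self.anchor is None or
                abs(gaze_xy[0] - self.anchor[0]) > POSITION_TOLERANCE or
                abs(gaze_xy[1] - self.anchor[1]) > POSITION_TOLERANCE):
            self.anchor, self.start = gaze_xy, now   # gaze moved: restart timing
            return None
        if now - self.start >= DWELL_THRESHOLD_S:
            self.start = now                         # avoid repeated triggers
            return self.anchor
        return None
```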
It can be understood that after the user's gaze has stayed on the gazing coordinate for a certain time, the terminal determines that the user has selected that position and sends a selection prompt to the user. For example, after the gaze stays at a position for 3 seconds, the terminal determines that the user wants to select it; to further improve the accuracy of the interaction, the terminal issues a selection prompt on the display screen or through a loudspeaker, for example a confirmation dialog box on the display screen or a prompt sound from the loudspeaker, and the user can confirm through these prompts, thereby avoiding erroneous selections.
In addition, the embodiment of the invention may acquire the limb action made by the user after the gazing coordinate has been obtained, match the acquired limb action against a target limb action, and obtain the input information of the user on the display screen according to the gazing coordinate. For example, if the user's limb action is recognized as an OK gesture and the target limb action is also an OK gesture, it is determined that the user has selected the gazing coordinate and the input information is obtained; if the user is instead waving a hand, the limb action differs from the target limb action, indicating that the user does not currently need to input information.
It is understood that the limb movement may be obtained by analyzing an image captured by a front camera disposed on the terminal, and is not limited in particular.
Referring to fig. 19, in an embodiment, the first iris position information includes left-eye iris position information and right-eye iris position information, and the step S103 may further include, but is not limited to, steps S1201 to S1203.
Step S1201, respectively obtaining left-eye pupil position information and right-eye pupil position information of the user's eyes according to the left-eye iris position information and the right-eye iris position information.
Step S1202, calculating the picture interpupillary distance of the user in the face image according to the left-eye pupil position information and the right-eye pupil position information.
Step S1203, calculating a first viewing distance from the user's eyes to the display screen according to the picture interpupillary distance.
It should be noted that the human-computer interaction method in the embodiment of the present invention can perform ranging based on the camera. A face image of the user is acquired through the front camera and pupil recognition is performed on it: the left-eye pupil position information and right-eye pupil position information are obtained from the center points of the left-eye iris position information and right-eye iris position information in the above embodiments, the picture interpupillary distance of the user in the face image is calculated from the two pupil positions (the picture interpupillary distance can be measured in pixels), and finally the viewing distance from the user's eyes to the display screen, that is, the first viewing distance, is calculated from the picture interpupillary distance. The embodiment of the present invention thus realizes ranging with the front camera provided on the terminal: the pupil positions of the user are obtained by analyzing the image captured by the front camera, the required picture interpupillary distance, which represents the interpupillary distance of the user in the captured image, is calculated, and the viewing distance from the user's eyes to the display screen is derived from it. The ranging cost is low, and no additional sensor is required.
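For illustration, the picture interpupillary distance can be computed from the two detected pupil centres as below; pupil detection itself is assumed to have been performed as described above, and the function name is illustrative:

```python
import math

def picture_pupil_distance(left_pupil_xy, right_pupil_xy):
    """left_pupil_xy / right_pupil_xy: pixel coordinates of the pupil centres
    in the face image; returns the picture interpupillary distance in pixels."""
    dx = right_pupil_xy[0] - left_pupil_xy[0]
    dy = right_pupil_xy[1] - left_pupil_xy[1]
    return math.hypot(dx, dy)
```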
It can be understood that the terminal obtains the face image through the front camera, and the interpupillary distance of the user in that face image is the picture interpupillary distance. With the front camera as the reference, the position of the user relative to the camera can change at any time, and through camera imaging the user appears at different sizes at different distances, as shown in fig. 20a and fig. 20b: in fig. 20a the user is closer to the camera, the user's face occupies a larger part of the resulting image and the picture interpupillary distance is therefore larger; in fig. 20b the user is farther from the camera, the face occupies a smaller part of the image and the picture interpupillary distance is smaller. The first viewing distance from the user's eyes to the display screen can therefore be judged from the interpupillary distance in the user image captured by the front camera.
For example, a reference object of known size may be measured at a known distance and the viewing distance calculated with a predetermined formula. Suppose a 10 cm reference object (such as a ruler) is placed 50 cm in front of the display screen; according to the parameters of the front camera, the 10 cm object occupies a certain size in the captured image (which can be determined from the number of pixels). Knowing the size that the target object, namely the two pupils separated by about 6.3 cm, occupies in the formed image, the distance from the target object to the display screen can then be calculated.
It should be noted that the interpupillary distance is the distance between the pupils of the user's two eyes, also simply called the pupil distance, namely the length between the centers of the two pupils. The normal range for adults is between 58 and 64 mm. The interpupillary distance is determined by heredity and individual development, so it differs between ages, but for a given user it is constant; therefore the distance from the user to the terminal can be determined from the size of the picture interpupillary distance in the face image, and the first viewing distance from the user's eyes to the display screen can be calculated.
It should be noted that, in the embodiment of the present invention, the image of the user is recognized by the front-facing camera, and the distance measurement can be implemented without setting an additional sensor device, so that the design cost is low, and no additional hardware setting is required, and the method can be applied to a terminal having the front-facing camera.
In an embodiment, the front camera is an under-screen camera and the display screen is an OLED screen, so that the front camera can be arranged below the display screen. Specifically, the under-screen camera is arranged at the central position of the display screen; placing it at the center makes the acquisition of the user's face image and the measurement of the user's interpupillary distance more accurate, realizes higher-precision ranging, and allows the gaze of the user's eyes to be judged more accurately.
Referring to fig. 21, in an embodiment, the step S1203 may further include, but is not limited to, step S1301 to step S1303.
Step S1301, acquiring a preset standard interpupillary distance; acquiring the focal length of a face image shot by a front camera, and obtaining the initial distance of the face image to an imaging point according to the focal length; and obtaining a first proportion according to the picture interpupillary distance and the standard interpupillary distance, and obtaining a first viewing distance from the eyes of the user to the display screen according to the first proportion and the initial distance.
Step S1302, or acquiring a preset distance lookup table; and looking up the first viewing distance from the user's eyes to the display screen in the distance lookup table according to the picture interpupillary distance.
Step S1303, or acquiring a reference distance, a reference object size, and a picture size corresponding to the reference object captured by the front-facing camera; acquiring a preset standard interpupillary distance; and obtaining a first viewing distance from the user's eyes to the display screen according to the reference distance, the reference object size, the picture interpupillary distance and the standard interpupillary distance.
It should be noted that, when the first viewing distance from the user's eyes to the display screen is calculated from the picture interpupillary distance, the embodiment of the present invention first obtains a preset standard interpupillary distance, which is the user's actual interpupillary distance. The standard interpupillary distance may be set by default, for example to 63 mm, or it may be input by the user so that an accurate value is available; alternatively, since big data and artificial intelligence analysis show that people of different ages and genders have different interpupillary distances, such a data-analysis result may replace the default adult value of 63 mm, giving a more accurate interpupillary distance and hence a more accurate first viewing distance. Next, the focal length used by the front camera to capture the face image is obtained, and the initial distance from the plane of the face image to the imaging point is derived from the focal length. Finally, a first proportion is obtained from the picture interpupillary distance and the standard interpupillary distance, and the first viewing distance from the user's eyes to the display screen is obtained from the first proportion and the initial distance.
It can be understood that every camera has a certain field of view (FOV) and focal length when shooting; the focal length and the field of view correspond to each other one-to-one and can be obtained from published specifications or by measurement. The field of view is the angle subtended at the apex of the camera's viewing cone, and the focal length is the distance from the camera lens to the internal sensor. In reality the sensor lies behind the lens, but for simplicity the image can be mirrored about the lens so that it is treated as lying in front of it, which gives the picture shown in fig. 22: the plane in which the face image lies corresponds to the sensor plane, the formed face image is equivalently located above the plane of the lens, the position of the lens is described as the imaging point in the embodiment of the present invention, the plane of the imaging point lies below the plane of the face image, and the two planes are parallel, so the position of the plane of the face image relative to the plane of the imaging point can be obtained from the focal length.
It can be understood that the plane in which the face image lies corresponds to the plane of the display screen and is determined by the wide angle and the focal length of the front camera. In an embodiment the plane of the face image is taken as the plane of the display screen, or the plane of the display screen is obtained by subtracting a small offset from the plane of the face image; this offset can be calculated in advance from the physical parameters of the front camera and applied in the subsequent processing.
It should be added that in the embodiment of the present invention the initial distance is obtained from the focal length; it could equally be obtained from the shooting field of view, but since field of view and focal length correspond one-to-one, the focal length is used as the example here. The initial distance may be calculated from the imaging characteristics of the camera or measured in advance, and it should be understood that each focal length corresponds to one initial distance; no specific limitation is made here.
It can be understood that, as shown in fig. 23, according to the imaging characteristics of the camera, the line segment of the user's actual interpupillary distance and the imaging point form a triangle, and the line segment of the picture interpupillary distance lies inside this triangle, parallel to the line segment of the actual interpupillary distance.
It should be noted that the first viewing distance from the user's eyes to the display screen is calculated using the properties of this triangle. In an embodiment, the triangle formed in fig. 23 by the line segment of the picture interpupillary distance and the imaging point is defined as a first triangle, the triangle formed by the line segment of the actual interpupillary distance and the imaging point is defined as a second triangle, and the first triangle and the second triangle are similar triangles.
As shown in fig. 23, the distance from the user's eyes to the imaging point can be obtained from a first ratio and the initial distance: the first ratio is the picture interpupillary distance Q divided by the actual interpupillary distance K, the initial distance H0 is divided by the first ratio to obtain the distance H1, and the initial distance H0 is then subtracted from H1 to give the first viewing distance H from the user's eyes to the display screen. H is calculated as follows:
H=H1-H0 (1)
H1=H0/(Q/K) (2)
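Formulas (1) and (2) translate directly into code; the numeric values in the usage comment are assumed for illustration only:

```python
def first_viewing_distance(initial_distance_h0, picture_pd_q, actual_pd_k):
    """H1 = H0 / (Q / K), formula (2); H = H1 - H0, formula (1).
    Q and K must be expressed in the same length unit."""
    h1 = initial_distance_h0 / (picture_pd_q / actual_pd_k)   # formula (2)
    return h1 - initial_distance_h0                           # formula (1)

# Assumed example: H0 = 5 mm, Q = 0.6 mm on the image plane, K = 63 mm
# first_viewing_distance(5.0, 0.6, 63.0) -> 520.0 mm
```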
In an embodiment, a more accurate first viewing distance can be obtained by correcting for the rotation angle of the user's face. Specifically, the user may view the display screen at an angle; for the front camera the acquired image is a two-dimensional planar image, from which the rotation angle of the face cannot be distinguished, and computing the interpupillary distance directly in that case introduces an error that makes the ranging inaccurate. A geometric calculation based on the specific position of the front camera can therefore be used to correct the parameters. Likewise, when the pupils are not directly in front of the front camera a geometric correction is applied, and when the display screen and the front camera do not lie in the same plane a correction based on the distance difference can be applied; the embodiment of the present invention is not limited in this respect.
In addition, it should be noted that the first viewing distance in the embodiment of the present invention may also be obtained by querying a preset distance lookup table. Specifically, a mapping table from the picture interpupillary distance to the first viewing distance may be established in advance; in the process of obtaining the first viewing distance, the preset distance lookup table is first obtained, and the first viewing distance from the user's eyes to the display screen is then read from the table according to the measured picture interpupillary distance.
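A possible shape of such a preset distance lookup table, with linear interpolation between entries; all numeric values below are illustrative assumptions, not calibrated data:

```python
# (picture interpupillary distance in pixels, first viewing distance in cm)
DISTANCE_LOOKUP = [(220, 30), (160, 40), (130, 50), (110, 60),
                   (95, 70), (85, 80), (75, 90), (68, 100)]

def lookup_viewing_distance(picture_pd_px):
    """Return the first viewing distance for a measured picture interpupillary
    distance, interpolating linearly between the sampled table entries."""
    table = sorted(DISTANCE_LOOKUP)          # ascending by pixel distance
    pds = [p for p, _ in table]
    if picture_pd_px <= pds[0]:
        return float(table[0][1])            # farther than the farthest sample
    if picture_pd_px >= pds[-1]:
        return float(table[-1][1])           # closer than the closest sample
    for (p0, d0), (p1, d1) in zip(table, table[1:]):
        if p0 <= picture_pd_px <= p1:
            t = (picture_pd_px - p0) / (p1 - p0)
            return d0 + t * (d1 - d0)
```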
It should be noted that the distance lookup table in the above embodiment may be calculated from sample data. It can be understood that, when the error caused by face rotation needs to be reduced by measuring the proportions of the user's face, the distance lookup table may also be established with the rotation angle taken into account; no specific limitation is made here.
In addition, it should be noted that in the embodiment of the present invention the first viewing distance may also be obtained by means of a reference picture. Specifically, a reference object is placed in front of the terminal, the reference distance from the reference object to the display screen and the actual size of the reference object are measured, the front camera captures an image of the reference object, and the size of the reference object in that image is calculated to obtain the picture size, from which the reference picture is established. By additionally obtaining the preset standard interpupillary distance, the first viewing distance from the user's eyes to the display screen can then be obtained from the reference distance, the reference object size, the picture size, the picture interpupillary distance and the standard interpupillary distance.
Specifically, the standard interpupillary distance is first divided by the picture interpupillary distance to obtain a first coefficient; the reference object size is divided by the picture size to obtain a second coefficient, and the reference distance is divided by the second coefficient to obtain a third coefficient; finally the first viewing distance from the user's eyes to the display screen is obtained as the product of the third coefficient and the first coefficient. Both the picture size and the picture interpupillary distance can be calculated from the pixels of the display screen. For example, as shown in fig. 24 and fig. 25, when the object size of the standard reference object is 10 cm, the reference distance is 50 cm, the picture size is AB, the standard interpupillary distance is 6.3 cm and the picture interpupillary distance is ab, the first coefficient is 6.3 ÷ ab, the second coefficient is 10 ÷ AB, and the formula for the first viewing distance h can be established as follows:
50÷(10÷AB)=h÷(6.3÷ab) (3)
Since the picture size AB and the picture interpupillary distance ab are known, the first viewing distance h can be obtained from formula (3).
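Rearranging formula (3) for h gives a direct calculation; the default reference values follow the worked example above, while the picture size AB and picture interpupillary distance ab (in pixels) are assumed inputs:

```python
def viewing_distance_from_reference(picture_size_AB, picture_pd_ab,
                                    reference_distance=50.0, reference_size=10.0,
                                    standard_pd=6.3):
    """h = [reference_distance / (reference_size / AB)] * (standard_pd / ab),
    which is formula (3) solved for h."""
    third_coefficient = reference_distance / (reference_size / picture_size_AB)
    first_coefficient = standard_pd / picture_pd_ab
    return third_coefficient * first_coefficient

# Assumed example: viewing_distance_from_reference(200, 120) -> 52.5 (cm)
```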
Referring to fig. 26, an embodiment of the present invention further provides a human-computer interaction device, where the device includes:
a first module 2601 is configured to control a display screen to display.
A second module 2602, configured to obtain a first eye parameter and a first light shadow of the user's eye, where the first eye parameter is used to represent a position of an iris of the user's eye, and the first light shadow is a light shadow of the display screen on the user's eye.
A third module 2603 is configured to obtain a first viewing distance from the user's eyes to the display screen.
A fourth module 2604, configured to obtain a gazing coordinate of the user's eye gazing on the display screen according to the first viewing distance, the first eye parameter, and the first shadow.
A fifth module 2605 is configured to obtain the input information of the user on the display screen according to the gazing coordinates.
It should be noted that the human-computer interaction device in the embodiment of the present invention can implement the human-computer interaction method of any of the above embodiments. The device may be a terminal device such as a mobile phone, a tablet computer or a 3D vision training terminal. By executing the human-computer interaction method, the device controls the display screen to display and then obtains the first eye parameter and the first light shadow of the user's eyes, where the first eye parameter represents the position of the iris of the user's eyes and the first light shadow is the light shadow cast by the display screen on the user's eyes: after the display screen emits light, a light shadow is formed in the user's eyes owing to their optical characteristics. The device then obtains the first viewing distance from the user's eyes to the display screen and derives the gazing coordinate of the user's eyes on the display screen from the first viewing distance, the first eye parameter and the first light shadow; because the human eye is a spherical structure, the first light shadow falls at different positions in the eye when the eye faces the display screen at different angles and positions, so the gazing coordinate can be calculated by combining the first viewing distance and the first eye parameter. Finally the input information of the user on the display screen is obtained from the gazing coordinate. The embodiment of the invention therefore realizes interaction through the eyes without touching the terminal, improving the interaction experience of the user.
It should be noted that the first module 2601, the second module 2602, the third module 2603, the fourth module 2604 and the fifth module 2605 may each be a functional module on the terminal; in an embodiment, each of them may be a functional module within a processor and may be executed by a processor arranged on the terminal.
Fig. 27 illustrates an electronic device 2700 provided by an embodiment of the present invention. The electronic device 2700 includes: a processor 2701, a memory 2702, and a computer program stored in the memory 2702 and executable on the processor 2701, the computer program performing the above human-computer interaction method when executed.
The processor 2701 and the memory 2702 may be connected by a bus or other means.
The memory 2702 is used as a non-transitory computer readable storage medium for storing non-transitory software programs and non-transitory computer executable programs, such as the interaction method described in the embodiments of the present invention. The processor 2701 implements the human-computer interaction method described above by running non-transitory software programs and instructions stored in the memory 2702.
The memory 2702 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data generated when the human-computer interaction method is executed. Further, the memory 2702 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 2702 may optionally include memory located remotely from the processor 2701, and such remote memory may be connected to the electronic device 2700 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Non-transitory software programs and instructions required to implement the above-described interaction method are stored in the memory 2702, and when executed by the one or more processors 2701, perform the above-described human-computer interaction method, for example, method steps S101 to S105 in fig. 3, method steps S201 to S203 in fig. 5, method steps S301 to S303 in fig. 6, method steps S401 to S403 in fig. 7, method steps S501 to S504 in fig. 8, method steps S601 to S602 in fig. 9, method steps S701 to S702 in fig. 10, method steps S801 to S802 in fig. 14, method steps S901 to S902 in fig. 15, method steps S1001 to S1003 in fig. 16, method steps S1101 to S1102 in fig. 18, method steps S1201 to S1203 in fig. 19, and method steps S1301 to S1303 in fig. 21.
The above-described embodiments of the apparatus are merely illustrative; the units described as separate components may or may not be physically separate, that is, they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
It should also be appreciated that the various implementations provided by the embodiments of the present invention can be combined arbitrarily to achieve different technical effects. While the preferred embodiments of the present invention have been described in detail, it will be understood, however, that the invention is not limited thereto, and that various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present invention.

Claims (20)

1. A human-computer interaction method is applied to a terminal, the terminal is provided with a display screen, and the method is characterized by comprising the following steps:
controlling the display screen to display;
acquiring a first eye parameter and a first light shadow of the eyes of the user, wherein the first eye parameter is used for representing the position of the iris of the eyes of the user, and the first light shadow is the light shadow of the display screen on the eyes of the user;
acquiring a first viewing distance from eyes of a user to the display screen;
obtaining a watching coordinate of the eyes of the user on the display screen according to the first watching distance, the first eye parameter and the first shadow;
and obtaining the input information of the user on the display screen according to the gazing coordinate.
2. The human-computer interaction method according to claim 1, wherein the terminal is provided with a front camera; the acquiring a first eye parameter and a first light shadow of an eye of a user comprises:
acquiring a face image of a user, wherein the face image is obtained by shooting through the front-facing camera;
analyzing the face image to obtain first eye socket position information, first iris position information and a first light shadow, wherein the first eye socket position information is used for representing the position of the eye socket of the user, and the first iris position information is used for representing the position of the iris of the eye of the user;
and obtaining a first eye parameter according to the first eye socket position information and the first iris position information.
3. The human-computer interaction method of claim 2, wherein the obtaining the first orbital position information, the first iris position information and the first light shadow from the facial image by analysis comprises:
detecting a face eye region of the face image based on a preset detector to obtain first eye socket position information;
converting the face image into a gray level image, carrying out binarization processing on the gray level image to obtain a first preprocessed image, and obtaining a first light shadow according to a rectangular or circular noise image in the first preprocessed image;
and corroding and expanding the first preprocessed image, eliminating noise in the image to obtain a second preprocessed image, and extracting the position of a circular area which represents the iris of the user eye in the second preprocessed image by using a circular structural element to obtain first iris position information of the user eye.
4. The human-computer interaction method according to claim 1, wherein the deriving a gaze coordinate of a user's eye gaze on the display screen from the first viewing distance, the first eye parameter, and the first shadow comprises:
matching a coordinate mapping relation table under a corresponding distance according to the first viewing distance;
calculating the shadow coordinates of the first shadow;
and looking up a table in the coordinate mapping relation table according to the first eye parameter and the light and shadow coordinate to obtain a gazing coordinate of the eyes of the user gazing on the display screen.
5. The human-computer interaction method according to claim 4, wherein the shape of the first light shadow is a rectangle or a circle, and the light shadow coordinates are a geometric center point or an average point of the first light shadow.
6. A human-computer interaction method according to claim 4, characterized in that the method further comprises:
displaying a sample cursor on the display screen;
acquiring a second eye parameter and a second light shadow when the user eyes watch the sample cursor, wherein the second eye parameter is used for representing the position of the iris of the user eyes, and the second light shadow is the light shadow of the display screen or the sample cursor on the user eyes;
acquiring a second viewing distance from the user's eye to the display screen when the user's eye gazes at the sample cursor;
and recording the corresponding relation between the second eye parameter and the second light shadow at the second viewing distance so as to establish a coordinate mapping relation table of the sample cursor.
7. The human-computer interaction method according to claim 6, wherein the displaying a sample cursor on the display screen comprises:
uniformly dividing the display screen into a plurality of display areas;
and respectively displaying a sample cursor on each display area.
8. The human-computer interaction method according to claim 7, wherein the recording the correspondence between the second eye parameter and the second shadow at the second viewing distance to establish a coordinate mapping relationship table of the sample cursor comprises:
sequentially displaying the sample cursor on each display area;
when the watching position of the user is unchanged, respectively recording the corresponding relation between the plurality of second eye parameters and the plurality of second light shadows corresponding to the sample cursors watched by the user in sequence under the second watching distance so as to establish a coordinate mapping relation table of the sample cursors.
9. The human-computer interaction method according to claim 6, wherein the second viewing distance is plural; the obtaining a second eye parameter and a second shadow when the user's eye gazes at the sample cursor includes:
acquiring a second eye parameter and a second light shadow when the user eyes at a plurality of different second viewing distances are gazing at the sample cursor;
recording the corresponding relation between the second eye parameter and the second shadow at the second viewing distance to establish a coordinate mapping relation table of the sample cursor, including:
recording the corresponding relation between the second eye parameters and the second light and shadow corresponding to a plurality of different second viewing distances so as to establish a coordinate mapping relation table of the sample cursor at different viewing distances.
10. The human-computer interaction method of claim 6, wherein the obtaining of the second eye parameter and the second shadow when the user's eye gazes at the sample cursor comprises:
acquiring a second eye parameter and a second light shadow of the user eyes when the user eyes watch the sample cursor at different watching angles at the second watching distance;
the viewing angle is an angle formed by a plane where the face of the user is located and the display screen.
11. The human-computer interaction method according to claim 1, wherein the deriving a gaze coordinate of a user's eye gaze on the display screen from the first viewing distance, the first eye parameter, and the first shadow comprises:
acquiring a preset fitting coordinate model or a neural network model, wherein the fitting coordinate model and the neural network model are respectively obtained by establishing a third viewing distance, a third eye parameter and a third light shadow in a sample and corresponding sample coordinates;
and inputting the first viewing distance, the first eye parameter and the first shadow into the fitting coordinate model or the neural network model to obtain a watching coordinate of the eyes of the user on the display screen.
12. A human-computer interaction method according to claim 11, characterized in that the method further comprises:
obtaining a coordinate mapping relation table of the third viewing distance, the third eye parameter and the third light shadow with the corresponding sample coordinate, and fitting based on the coordinate mapping table to obtain the fitting coordinate model from the third viewing distance, the third eye parameter and the third light shadow to the sample cursor;
or, obtaining a coordinate mapping relation table of the third viewing distance, the third eye parameter, and the third light shadow with the corresponding sample coordinate, establishing the neural network model for obtaining the sample coordinate based on the third viewing distance, the third eye parameter, and the third light shadow, and optimizing the neural network model through the coordinate mapping relation table.
13. The human-computer interaction method according to claim 1, wherein the obtaining of the input information of the user on the display screen according to the gaze coordinate comprises:
displaying a virtual keyboard through the display screen;
acquiring a key value mapping relation table of each key value of the keys on the virtual keyboard and the corresponding input position;
and triggering a corresponding target key value from the virtual keyboard according to the gaze coordinate and the key value mapping relation table.
14. The human-computer interaction method according to claim 1, wherein the obtaining of the input information of the user on the display screen according to the gaze coordinate comprises:
recording the watching time of the user on the watching coordinate, and obtaining the input information of the user on the display screen according to the watching coordinate when the watching time is greater than a first time threshold;
or acquiring the limb action of the user after watching the gazing coordinate, and obtaining the input information of the user on the display screen according to the gazing coordinate when the limb action is matched with the target limb action.
15. The human-computer interaction method according to claim 2, wherein the first iris position information includes left-eye iris position information and right-eye iris position information; the acquiring a first viewing distance from the eyes of the user to the display screen comprises:
respectively obtaining left eye pupil position information and right eye pupil position information of the eyes of the user according to the left eye iris position information and the right eye iris position information;
calculating the picture pupil distance of the user in the face image according to the left eye pupil position information and the right eye pupil position information;
and calculating to obtain a first viewing distance from the eyes of the user to the display screen according to the picture interpupillary distance.
16. The human-computer interaction method according to claim 15, wherein the calculating a first viewing distance from the user's eyes to the display screen according to the picture interpupillary distance comprises:
acquiring a preset standard interpupillary distance; acquiring the focal length of the face image shot by the front camera, and obtaining the initial distance of the face image to an imaging point according to the focal length; obtaining a first proportion according to the picture interpupillary distance and the standard interpupillary distance, and obtaining a first viewing distance from the eyes of the user to the display screen according to the first proportion and the initial distance;
or acquiring a preset distance lookup table; looking up a table from the distance lookup table according to the picture interpupillary distance to obtain a first viewing distance from the eyes of the user to the display screen;
or, acquiring a reference distance, a reference object size and a picture size corresponding to the reference object shot by the front camera; acquiring a preset standard interpupillary distance; and obtaining a first viewing distance from the eyes of the user to the display screen according to the reference distance, the reference object size, the picture interpupillary distance and the standard interpupillary distance.
17. The human-computer interaction method according to claim 2, wherein the front-facing camera is an off-screen camera, and the off-screen camera is arranged at a central position of the display screen.
18. A human-computer interaction device, comprising:
the first module is used for controlling a display screen to display;
the second module is used for acquiring a first eye parameter and a first light shadow of the eyes of the user, wherein the first eye parameter is used for representing the position of the iris of the eyes of the user, and the first light shadow is the light shadow of the display screen on the eyes of the user;
the third module is used for acquiring a first viewing distance from the eyes of the user to the display screen;
a fourth module, configured to obtain a gazing coordinate where the user's eye gazes on the display screen according to the first viewing distance, the first eye parameter, and the first shadow;
and the fifth module is used for obtaining the input information of the user on the display screen according to the gazing coordinate.
19. An electronic device, comprising a memory storing a computer program, and a processor implementing the human-computer interaction method according to any one of claims 1 to 17 when the processor executes the computer program.
20. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the human-computer interaction method according to any one of claims 1 to 17.
CN202210589538.5A 2022-05-27 2022-05-27 Man-machine interaction method and device, electronic equipment and storage medium Pending CN114895790A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210589538.5A CN114895790A (en) 2022-05-27 2022-05-27 Man-machine interaction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210589538.5A CN114895790A (en) 2022-05-27 2022-05-27 Man-machine interaction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114895790A true CN114895790A (en) 2022-08-12

Family

ID=82726859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210589538.5A Pending CN114895790A (en) 2022-05-27 2022-05-27 Man-machine interaction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114895790A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115562490A (en) * 2022-10-12 2023-01-03 西北工业大学太仓长三角研究院 Cross-screen eye movement interaction method and system for aircraft cockpit based on deep learning
CN115562490B (en) * 2022-10-12 2024-01-09 西北工业大学太仓长三角研究院 Deep learning-based aircraft cockpit cross-screen-eye movement interaction method and system

Similar Documents

Publication Publication Date Title
JP5602155B2 (en) User interface device and input method
CN109086726A (en) A kind of topography's recognition methods and system based on AR intelligent glasses
KR20180096434A (en) Method for displaying virtual image, storage medium and electronic device therefor
EP2634727A2 (en) Method and portable terminal for correcting gaze direction of user in image
CN108139806A (en) Relative to the eyes of wearable device tracking wearer
CN111831119A (en) Eyeball tracking method and device, storage medium and head-mounted display equipment
CN109002164A (en) It wears the display methods for showing equipment, device and wears display equipment
US20220309836A1 (en) Ai-based face recognition method and apparatus, device, and medium
WO2020020022A1 (en) Method for visual recognition and system thereof
CN109993115A (en) Image processing method, device and wearable device
CN105763829A (en) Image processing method and electronic device
CN107260506B (en) 3D vision training system, intelligent terminal and head-mounted device based on eye movement
CN109978996B (en) Method, device, terminal and storage medium for generating expression three-dimensional model
EP4354201A1 (en) Virtual reality display device, image presentation method, storage medium and program product
WO2022062436A1 (en) Amblyopia training method, apparatus and device, storage medium, and program product
US11335090B2 (en) Electronic device and method for providing function by using corneal image in electronic device
CN107436681A (en) Automatically adjust the mobile terminal and its method of the display size of word
CN111367405A (en) Method and device for adjusting head-mounted display equipment, computer equipment and storage medium
CN109688325A (en) A kind of image display method and terminal device
CN114895790A (en) Man-machine interaction method and device, electronic equipment and storage medium
CN105227828B (en) Filming apparatus and method
CN107291233B (en) Wear visual optimization system, intelligent terminal and head-mounted device of 3D display device
CN114895789A (en) Man-machine interaction method and device, electronic equipment and storage medium
CN113197542B (en) Online self-service vision detection system, mobile terminal and storage medium
CN114910052A (en) Camera-based distance measurement method, control method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination