WO2023040551A9 - Method for displaying an image on a display screen, electronic device and apparatus (一种在显示屏上显示图像的方法、电子设备与装置) - Google Patents

Method for displaying an image on a display screen, electronic device and apparatus

Info

Publication number
WO2023040551A9
WO2023040551A9 (PCT/CN2022/112819)
Authority
WO
WIPO (PCT)
Prior art keywords
display screen
target user
target
face
relative position
Prior art date
Application number
PCT/CN2022/112819
Other languages
English (en)
French (fr)
Other versions
WO2023040551A1 (zh)
WO2023040551A8 (zh)
Inventor
陈树德
巫军
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023040551A1
Publication of WO2023040551A9
Publication of WO2023040551A8

Classifications

    • G06F 3/01 — Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/14 — Digital output to display device; cooperation and interconnection of the display device with other functional units
    • G06T 15/20 — 3D image rendering; geometric effects; perspective computation
    • G06T 19/00 — Manipulating 3D models or images for computer graphics
    • G06V 10/774 — Image or video recognition or understanding using pattern recognition or machine learning; generating sets of training patterns, e.g. bagging or boosting
    • G06V 40/16 — Recognition of human faces, e.g. facial parts, sketches or expressions

Definitions

  • The present application relates to the field of naked-eye three-dimensional display, and in particular to a method for displaying an image on a display screen, an electronic device, and an apparatus.
  • Naked-eye three-dimensional (3-dimension, 3D) display refers to technology that allows a user to view stereoscopic visual effects on the display screen of an electronic device without external tools such as polarizing glasses. The user can perceive a stereoscopic, real-object-like effect on a two-dimensional display screen because differences in color and gray scale in the displayed images create a visual illusion for the human eyes.
  • Naked-eye 3D technology generally takes parallax as its starting point. Assuming that the user's observation point is essentially fixed, an image is obtained by rendering a three-dimensional scene, and various display methods cause the user's two eyes to observe slightly different images, producing the optical illusion of seeing a three-dimensional space.
  • Current naked-eye 3D technology is based on the assumption that the viewpoint does not change. That is, when the 3D scene is rendered, the viewpoint position used to indicate the position of the virtual camera in the 3D scene is a preset fixed value.
  • When the display screen displays the rendered image, the user can observe the 3D effect only by viewing it from a fixed position. If the user moves, the 3D effect can no longer be observed, which leads to a jarring experience. Current image display methods are therefore not flexible enough.
  • Embodiments of the present application provide a method for displaying an image on a display screen, an electronic device, and a device, and provide a method for displaying a three-dimensional image that adapts to a user's location, so as to improve user experience.
  • the embodiment of the present application provides a method for displaying an image on a display screen.
  • the method includes:
  • the method may be applied to an electronic device, and the electronic device may have a display screen, or the electronic device may output an image to a display screen bound to the electronic device, so that the display screen displays the image output by the electronic device.
  • the electronic device may have a camera or microphone array, or the electronic device may be bound to the camera or microphone array, and the positional relationship between the electronic device and the bound camera or microphone array can be obtained.
  • The electronic device can determine a first relative position between the target user and the display screen. After the first relative position of the target user is determined, the first viewpoint position required for rendering the 3D scene can be determined according to the first relative position, ensuring that the determined viewpoint position matches the target user's current position.
  • the electronic device renders the three-dimensional scene according to the first viewpoint position to obtain the first target image, and displays the first target image on the display screen.
  • The first target image rendered according to the determined first viewpoint position is better suited to the target user at the current location: the three-dimensional effect can be observed from the position where the target user already is, so the user does not need to search for a position from which the three-dimensional effect is visible, thereby improving user experience.
  • the position of the view window is acquired, and the position of the view window is used to indicate the position of the near clipping plane when rendering the 3D scene;
  • Determining the position of the first viewpoint according to the first relative position includes: determining, according to the first relative position, the relative position between the first viewpoint and the window when rendering the 3D scene; and determining the position of the first viewpoint according to the relative position between the first viewpoint and the window and the position of the window.
  • the electronic device can acquire the position of the window, where the position of the window can be a parameter preset according to the scene where the first target image is displayed.
  • The electronic device may determine the relative position between the first viewpoint and the window according to the first relative position, thereby associating the relative position between the viewpoint and the window with the relative position between the user's face and the display screen, so that the first target image rendered according to the determined first viewpoint position is better suited for the user to view the three-dimensional effect at the current position.
  • The method further includes: determining a second relative position between the target user and the display screen; determining a second viewpoint position according to the second relative position, where the second relative position is different from the first relative position and the second viewpoint position is different from the first viewpoint position; and rendering the 3D scene according to the second viewpoint position to obtain a second target image, and displaying the second target image on the display screen.
  • The electronic device can update the viewpoint position in real time as the user moves, according to the method for displaying images on the display screen provided by the embodiment of the present application, without requiring the user to watch the target image from a fixed viewpoint position, thereby providing a flexible way to display images.
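  • The real-time update described above can be pictured as a simple per-frame loop. The sketch below is illustrative only: the camera, display, locate_face, compute_viewpoint and render_scene objects are assumed placeholders for the steps detailed in the rest of this description, not interfaces defined by the application.

```python
import time

def display_loop(camera, display, scene, window_position,
                 locate_face, compute_viewpoint, render_scene,
                 frame_interval=1 / 30):
    """Per-frame loop: re-locate the target user, derive the viewpoint,
    re-render the 3D scene and show the result. All callables are supplied
    by the caller and stand in for the steps detailed later."""
    while True:
        frame = camera.capture()
        relative_position = locate_face(frame)             # first / second relative position
        if relative_position is None:
            display.show_standby()                          # no face detected: standby screen
        else:
            viewpoint = compute_viewpoint(relative_position, window_position)
            target_image = render_scene(scene, viewpoint)   # render from the updated viewpoint
            display.show(target_image)
        time.sleep(frame_interval)
```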
  • the first relative position between the target user and the display screen includes: a target azimuth of the target user's face relative to the display screen.
  • The first relative position between the target user and the display screen may include the target azimuth angle of the target user's face relative to the display screen, so as to precisely locate the position of the target user's face.
  • Determining the first relative position between the target user and the display screen includes: acquiring a scene image captured by a camera, where the scene image includes a scene within a preset range in front of the display screen; and determining the first relative position according to the position of the target user's face in the scene image.
  • Determining the first relative position according to the position of the target user's face in the scene image includes: determining, according to the position of the target user's face in the scene image, the horizontal azimuth angle and the vertical azimuth angle between the target user's face and the display screen.
  • The position of the target user's face in the scene image may be a reference point of the face; for example, the reference point may be the coordinates of the center point of the face in the scene image.
  • The electronic device can acquire a scene image, captured by the camera, that includes the face of the target user.
  • The electronic device may determine the first relative position according to the position of the target user's face in the scene image. The determined target azimuth angle of the target user's face relative to the display screen may include a horizontal azimuth angle and a vertical azimuth angle, thereby accurately locating the position of the target user's face relative to the display screen.
  • Before determining the first relative position according to the position of the target user's face in the scene image, the method further includes: determining the face of the target user in the scene image based on a face detection algorithm.
  • the electronic device can recognize the face of the target user from the scene image based on the face detection algorithm, so as to locate the target user.
  • Determining the target user's face in the scene image based on the face detection algorithm includes: determining movement information of the target user's face according to stored historical position information of the target user's face, where the movement information indicates the speed and acceleration of the target user's face as it moves; predicting, based on the Kalman algorithm, a predicted position of the target user's face in the scene image according to the most recent historical position information and the movement information; and performing face detection on the area of the scene image corresponding to the predicted position to determine the target user's face in the scene image.
  • When the electronic device performs face detection, it can first predict the position of the target user's face in the scene image, which narrows the search space during face detection and improves its efficiency.
  • Determining the first relative position includes: determining the position of the target user's face in the scene image; determining, according to that position and the conversion relationship between the camera coordinate system and the world coordinate system, the position of the target line between the target user's face and the camera; and determining the target azimuth angle according to the position of the target line.
  • When the electronic device determines the target azimuth angle of the target user's face relative to the display screen from the scene image captured by the camera, it can first determine the position of the target line between the target user's face and the camera and then determine the target azimuth angle, obtaining accurate azimuth information for the target user's face.
  • Determining the first relative position between the target user and the display screen includes: performing sound source localization on the target user based on sound information of the target user collected by the microphone array, to obtain the first relative position.
  • Sound source localization is performed for the target user to determine the horizontal azimuth angle and vertical azimuth angle of the target user's face relative to the display screen.
  • the electronic device can acquire the sound information of the target user collected by the microphone array, and perform sound source localization on the target user to determine the target azimuth angle of the target user's face relative to the display screen.
  • the embodiments of the present application provide multiple ways of determining the target azimuth angle of the target user's face relative to the display screen.
  • different methods can be selected to determine the target azimuth angle according to the specific structure of the electronic device, so as to flexibly realize the positioning of the target user.
  • The first relative position between the target user and the display screen further includes a target distance of the target user's face relative to the display screen.
  • Determining the first relative position between the target user and the display screen further includes: performing depth estimation on the face of the target user to determine the target distance.
  • the first relative position between the target user and the display screen may also include a target distance of the target user's face relative to the display screen.
  • The electronic device may perform depth estimation on the face of the target user to determine the target distance, and thereby further locate the position of the target user relative to the display screen.
  • the method further includes: acquiring the size of the window;
  • Determining, according to the first relative position, the relative position between the first viewpoint and the window when rendering the 3D scene includes: determining a proportional relationship between the 3D scene and the physical world according to the size of the window and the actual size of the display screen; and determining the relative position between the viewpoint and the window according to the proportional relationship and the first relative position.
  • the electronic device can obtain the size of the window, determine the proportional relationship between the three-dimensional scene and the physical world according to the size of the window and the actual size of the display screen, and determine the relative position between the viewpoint and the window according to the proportional relationship and the first relative position.
  • Different window sizes produce different display effects in the rendered target image. Determining the relative position between the viewpoint and the window according to the above proportional relationship and the first relative position therefore ensures that the determined relative position adapts to the scene in which the target image is currently displayed.
  • Before determining the first relative position between the target user and the display screen, the method further includes: when it is determined that there are multiple users in front of the display screen, determining the target user from among the multiple users.
  • Determining the target user from the multiple users includes: displaying the face images of the multiple users on the display screen, receiving a selection instruction, and taking the user to whom the face image corresponding to the selection instruction belongs as the target user; or taking the user closest to the display screen among the multiple users as the target user; or taking the user whose face is at the smallest angle to the display screen among the multiple users as the target user; or taking the user with the highest usage frequency among the multiple users as the target user.
  • the method further includes: displaying a reminder message that multiple users are currently in front of the display screen on the display screen.
  • When the electronic device detects multiple faces, it can determine the face of the target user from among them and remind the users that multiple faces are currently within the detection range. This ensures that the target user can observe the three-dimensional effect of the rendered target image and avoids the jarring experience that arises when several users watch at the same time and some of them cannot observe the 3D effect.
  • an embodiment of the present application provides an image display device, the device includes a plurality of functional modules; the plurality of functional modules interact to implement the method in the above first aspect and its various implementation manners.
  • the multiple functional modules can be implemented based on software, hardware or a combination of software and hardware, and the multiple functional modules can be combined or divided arbitrarily based on specific implementations.
  • an embodiment of the present application provides an electronic device, including a processor and a memory, where computer program instructions are stored in the memory, and when the electronic device is running, the processor executes the method provided in the first aspect above.
  • the embodiment of the present application further provides a computer program, which, when the computer program is run on a computer, causes the computer to execute the method provided in any one of the above aspects.
  • The embodiment of the present application also provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a computer, the computer is caused to execute the method provided in any one of the above aspects.
  • the embodiment of the present application further provides a chip, the chip is used to read a computer program stored in a memory, and execute the method provided in any one of the above aspects.
  • an embodiment of the present application further provides a chip system, where the chip system includes a processor, configured to support a computer device to implement the method provided in any one of the above aspects.
  • the chip system further includes a memory, and the memory is used to store necessary programs and data of the computer device.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of a pointing light source 3D technology
  • FIG. 2 is a schematic diagram of a scene where naked-eye 3D technology is applicable
  • FIG. 3A is a schematic diagram of a three-dimensional scene provided by an embodiment of the present application.
  • FIG. 3B is a schematic diagram of a rendered image provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a method for displaying an image on a display screen provided by an embodiment of the present application
  • FIG. 5 is a schematic diagram of a scene image captured by a camera provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a scene of a sound source localization technology provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a sound source localization technology based on a microphone array provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a display interface of an electronic device provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a scene image including multiple human faces provided by an embodiment of the present application.
  • FIG. 10 is a flow chart of the first method for displaying an image on a display screen provided by an embodiment of the present application.
  • FIG. 11 is a flowchart of a second method for displaying an image on a display screen provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an image display device provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Naked-eye three-dimensional (3-dimension, 3D) refers to the technology that users can watch stereoscopic visual effects on the display screen of electronic equipment without using external tools such as polarizing glasses.
  • Face detection is a deep learning algorithm used to detect faces in images, for example to identify whether an image contains a face and, further, to determine the position of the area corresponding to the face in the image.
  • Monocular depth estimation refers to estimating the distance of each pixel in an image relative to the camera by using a single red, green, blue (RGB) image taken from one viewing angle.
  • RGB: red, green, blue
  • AR: augmented reality
  • VR: virtual reality
  • AR technology, also known as augmented reality technology, mainly includes technologies and means such as multimedia, 3D modeling, and scene fusion.
  • AR technology can combine real world information and virtual world information to display to viewers. Specifically, when viewing the image processed by the AR technology, the user needs to wear a head-mounted display and observe the image through the head-mounted display.
  • AR technology can bind virtual objects to a certain position in the real environment through simulation processing, such as binding virtual objects to some picture feature points or surfaces with specific patterns.
  • A computer vision algorithm continuously calculates the on-screen position of the spatial point where the virtual object is located, and the virtual three-dimensional object is rendered and projected at the corresponding position, so that the virtual object and the real environment are superimposed on the display screen of the head-mounted display. The image viewed by the user then includes both the current real environment and the virtual objects superimposed on it, giving the user the experience that the virtual objects actually exist in the real environment.
  • VR technology, also known as virtual reality or a virtual environment, uses computer simulation to generate a three-dimensional virtual scene.
  • This technology integrates the latest developments in computer graphics, computer simulation, artificial intelligence, sensing, display, and network parallel processing.
  • A head-mounted display with a positioning function is required, together with auxiliary positioning posts distributed in the surrounding space.
  • The user wears the head-mounted display, and the auxiliary positioning posts continuously locate its position and posture in the real world, so that the 3D virtual world can be rendered as a parallax image pair matching the user's current position and posture.
  • The binocular images are presented to the user's two eyes separately, giving the user the experience of being inside the virtual scene.
  • Current AR and VR require the user to wear a device such as a head-mounted display to observe the rendered image, whereas naked-eye three-dimensional (3-dimension, 3D) technology allows users to view stereoscopic visual effects on the display screen of an electronic device without wearing external tools such as head-mounted displays or polarizing glasses.
  • In naked-eye 3D technology, users can observe realistic three-dimensional graphics on a two-dimensional display screen because differences in the color and gray scale displayed on the screen create a visual illusion for the human eye.
  • Naked-eye 3D technology generally takes parallax as its starting point. Assuming that the user's observation point is essentially fixed, the virtual space image is rendered and, through various means, the user's two eyes observe different images when watching it, achieving the optical illusion of seeing a three-dimensional virtual space.
  • pointing light source 3D technology is a relatively common naked-eye 3D technology.
  • Figure 1 is a schematic diagram of a pointing light source 3D technology.
  • In pointing light source 3D technology, the display screen of the electronic device is equipped with two sets of LEDs.
  • A fast-response LCD panel and its driving method alternately display odd and even frames and direct them to the user's left and right eyes respectively, so that the rendered image content enters the viewer's left and right eyes in an interleaved manner, generating parallax and allowing the viewer to observe an image with a 3D effect.
  • FIG. 2 is a schematic diagram of a scene where naked-eye 3D technology is applicable.
  • the scene includes an electronic device 20 and a user 21 , and the electronic device 20 includes a display screen.
  • the electronic device 20 can render the three-dimensional scene to obtain an image, and display the image on the display screen.
  • the user can observe the target image rendered by the electronic device in a certain virtual three-dimensional space (ie, a three-dimensional scene) and displayed on the display screen.
  • FIG. 3A is a schematic diagram of a three-dimensional scene provided by an embodiment of the present application. Referring to the viewpoint position and the window position marked in FIG. 3A, the viewpoint position indicates the position of the virtual camera in the 3D scene, and the window is the window through which the 3D scene is viewed; the window position can be used to indicate the position of the near clipping plane when rendering the 3D scene.
  • the electronic device 20 renders the three-dimensional scene to obtain an image, and displays the image on the display screen.
  • The image that the user can observe may be, for example, the one shown in FIG. 3B.
  • FIG. 3B is only an example; the naked-eye 3D effect observed by the user in a specific implementation is more three-dimensional and realistic.
  • the position of the viewpoint used to indicate the position of the virtual camera in the 3D scene is a preset fixed value.
  • When the display screen displays the rendered image, the user can observe the 3D effect only by viewing it from a fixed position. If the user moves, the 3D effect can no longer be observed, which leads to a jarring experience. Current image display methods are therefore not flexible enough.
  • an embodiment of the present application provides a method for displaying an image on a display screen, which is used to provide a three-dimensional image display method adapted to a user's location, so as to improve user experience.
  • Fig. 4 is a flowchart of a method for displaying an image on a display screen provided by an embodiment of the present application.
  • The image display method provided by the embodiment of the present application can be applied to the electronic device in the scene shown in Fig. 2. The electronic device can have a display screen, or it can output images to a display screen bound to it, so that that display screen displays the images output by the electronic device.
  • the electronic device may have a camera or microphone array, or the electronic device may be bound to the camera or microphone array, and the positional relationship between the electronic device and the bound camera or microphone array can be obtained.
  • the method for displaying an image on a display screen includes the following steps:
  • the electronic device determines a first relative position between the target user and the display screen, and the target user is located in front of the display screen.
  • the target user in this embodiment of the present application may be, for example, user 21 in the scene shown in FIG. 2 , and the target user is located in front of the display screen, so that the user can observe the three-dimensional effect of the image displayed on the display screen.
  • The first relative position between the target user and the display screen may include the target azimuth angle of the target user's face relative to the display screen and, further, the target distance of the target user's face relative to the display screen. The methods for determining the target azimuth angle and the target distance in the embodiments of the present application are introduced below.
  • the electronic device determines the target azimuth angle of the target user's face relative to the display screen.
  • The target azimuth angle of the target user's face relative to the display screen may include a horizontal azimuth angle of the target user's face relative to the display screen and a vertical azimuth angle of the target user's face relative to the display screen.
  • the horizontal azimuth angle of the target user's face relative to the display screen can be used to indicate the horizontal angle of the target user's face relative to the display screen
  • The vertical azimuth angle of the target user's face relative to the display screen can be used to indicate the vertical angle of the target user's face relative to the display screen.
  • The embodiment of the present application provides two methods for determining the target azimuth angle, which are introduced below:
  • Method 1: the electronic device determines the target azimuth angle based on the scene image collected by the camera.
  • the electronic device can acquire scene images captured by the camera.
  • the orientation of the camera is consistent with the orientation of the display screen, and the scene images captured by the camera include scenes within a preset range in front of the display screen.
  • the camera can capture a scene image including the target user.
  • FIG. 5 is a schematic diagram of a scene image, which includes a background and a human face.
  • The electronic device can detect, in the scene image collected by the camera, the face of the target user and determine the target azimuth angle between the target user's face and the display screen.
  • The electronic device may determine the face of the target user in the scene image based on a face detection algorithm. Specifically, the electronic device can use the scene image as the input of a face detection model and obtain the position of the target user's face in the scene image output by the model; this position may specifically be the coordinates of the detection frame corresponding to the target user's face in the scene image.
  • the face detection model is a model trained based on a face detection algorithm and a face dataset.
  • the face dataset includes images and positions of faces in the images.
  • During training, the images in the face dataset are used as the input of the initial face detection model to obtain the predicted face positions output by the model; the loss value between each predicted face position and the labeled face position is calculated, and the parameters of the initial face detection model are adjusted according to the loss value. This training process is repeated until the loss value converges within a preset range, at which point training is considered finished and the face detection model is obtained.
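  • As an illustration of the training procedure just described, the following is a minimal PyTorch-style sketch, assuming a detector model that regresses a face bounding box and a dataset yielding (image, box) pairs; the model, dataset, loss function and convergence tolerance are placeholders rather than the ones actually used in the application.

```python
import torch
from torch.utils.data import DataLoader

def train_face_detector(model, face_dataset, epochs=10, lr=1e-4, tol=1e-3):
    """Illustrative training loop: regress a face bounding box (x, y, w, h)
    from a scene image, iterating until the loss converges within a preset range."""
    loader = DataLoader(face_dataset, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.SmoothL1Loss()            # loss between predicted and labeled box
    prev_loss = float("inf")
    for epoch in range(epochs):
        epoch_loss = 0.0
        for images, boxes in loader:               # boxes: ground-truth face positions
            optimizer.zero_grad()
            pred_boxes = model(images)              # predicted face positions
            loss = criterion(pred_boxes, boxes)
            loss.backward()                         # adjust parameters according to the loss
            optimizer.step()
            epoch_loss += loss.item()
        epoch_loss /= len(loader)
        if abs(prev_loss - epoch_loss) < tol:       # loss has converged: stop training
            break
        prev_loss = epoch_loss
    return model
```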
  • the electronic device can also detect the human eyes of the target user in the scene image, determine the position of the target user's human eyes, and then determine the position of the target user's face according to the position of the target user's human eyes.
  • The electronic device can use the determined position of the target user's eyes as the position of the target user's face, which also locates the user's current position.
  • the face detection algorithm can also be used to determine the position of the target user's eyes in the scene image.
  • The electronic device can input the scene image into the trained face detection model and obtain, as its output, the location of the target user's eyes in the scene image.
  • the human eye position is also used as an output value of the face detection model.
  • a human eye detection model can also be trained based on the detection algorithm and the dataset marked with the human eye position, and the electronic device can use the human eye detection model to determine the human eye position.
  • The training method of the face detection model that outputs eye positions, or of the eye detection model, can be implemented with reference to the training method of the face detection model described above and is not repeated in the embodiments of the present application.
  • After determining the position of the target user's face in the scene image, the electronic device can determine the target azimuth angle between the target user's face and the display screen according to that position.
  • the electronic device may determine the azimuth angle between the face of the target user and the camera according to the position of the face of the target user in the scene image.
  • The azimuth angle between the target user and the camera may be the angle between the target line (connecting the target user's face and the camera) and the normal vector of the camera; this azimuth angle can likewise include a horizontal azimuth angle and a vertical azimuth angle.
  • Determining the conversion relationship between the camera coordinate system and the world coordinate system is also called calibrating the camera.
  • In the scene image captured by the camera, the position of an object in the camera coordinate system and its position in the world coordinate system of the real environment are related through the conversion relationship: C = M·R, where R is the position of the object in the real environment (world coordinate system), C is the position of the object in the camera coordinate system, and M is the conversion relationship between the camera coordinate system and the world coordinate system.
  • M can also be understood as the conversion matrix between the camera coordinate system and the world coordinate system; the parameters in the M matrix are the camera parameters, and the process of solving M is the process of calibrating the camera.
  • Camera parameters can be divided into intrinsic parameters and extrinsic parameters. The intrinsic parameters are inherent to the lens, such as the lens center position (C_x, C_y) and the focal lengths f_x, f_y, and can be expressed in pixel units.
  • The extrinsic parameters are the camera position parameters, i.e. the rigid transformation between the camera coordinate system and the world coordinate system; specifically, the rotation and translation of the camera coordinate system relative to the world coordinate system. Based on the above, the camera coordinate system and the world coordinate system satisfy s·[u, v, 1]^T = M_in · M_ex · [x, y, z, 1]^T, where (u, v) are the coordinates of the target point in the camera (image) coordinate system, (x, y, z) are the corresponding coordinates of the target point in the world coordinate system, M_in is the intrinsic parameter matrix, M_ex is the extrinsic parameter matrix, and s is a scale factor.
  • Accordingly, the conversion relationship M between the camera coordinate system and the world coordinate system satisfies M = M_in · M_ex.
  • The position of the target line between the target user's face and the camera can be determined according to the position of the target user's face in the scene image and the conversion relationship between the camera coordinate system and the world coordinate system.
  • The position of a point F(u_f, v_f) can be used to represent the position of the target user's face in the scene image; this point may be the midpoint between the two eyes in the target user's face or the center point of the target user's face detection frame. Substituting F into the projection relationship above, the expression X of the target line in the world coordinate system satisfies s·[u_f, v_f, 1]^T = M·[X, 1]^T, i.e. the target line is the set of world-coordinate points X that project to F.
  • Solving this formula determines the expression of the target line in the world coordinate system, from which the azimuth angle between the target line and the normal vector of the camera can be calculated. This angle is the azimuth angle between the target user's face and the camera. If the normal vector of the camera is perpendicular to the plane of the display screen and the camera is set on that plane, then the azimuth angle between the target user's face and the camera can be used directly as the azimuth angle between the target user's face and the display screen.
  • Otherwise, the electronic device can determine the target azimuth angle between the target user's face and the display screen according to the azimuth angle between the target user's face and the camera and the angle difference between the normal vector of the camera and the normal vector of the plane where the display screen is located.
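  • A compact numerical sketch of this computation is given below, assuming a calibrated pinhole camera with intrinsic matrix K (corresponding to M_in) and world-to-camera rotation R (the rotation part of M_ex), and the usual convention that the optical axis is the camera z axis; the function name and sign conventions are illustrative, and any stored angle difference between the camera normal and the screen normal can simply be added to the returned angles.

```python
import numpy as np

def face_azimuth(u_f, v_f, K, R):
    """Back-project the face reference point F(u_f, v_f) into a world-space
    direction along the target line and return (horizontal, vertical) azimuth
    angles in degrees relative to the camera's normal vector."""
    # Direction of the target line (camera centre -> face) in camera coordinates.
    d_cam = np.linalg.inv(K) @ np.array([u_f, v_f, 1.0])
    # Rotate the direction into the world coordinate system and normalise it.
    d_world = R.T @ d_cam
    d_world /= np.linalg.norm(d_world)
    # Camera optical axis (normal vector) expressed in world coordinates.
    n_world = R.T @ np.array([0.0, 0.0, 1.0])
    # Horizontal azimuth: deviation in the x-z plane; vertical azimuth: elevation.
    horizontal = np.degrees(np.arctan2(d_world[0], d_world[2])
                            - np.arctan2(n_world[0], n_world[2]))
    vertical = np.degrees(np.arcsin(d_world[1]) - np.arcsin(n_world[1]))
    return horizontal, vertical
```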
  • Method 2: the electronic device determines the target azimuth angle based on the sound information collected by the microphone array.
  • FIG. 6 is a schematic diagram of a scene of a sound source localization technology provided by an embodiment of the present application.
  • the electronic device may have a microphone array, and the target user speaks within a preset range in front of the display screen, and the microphone array may collect information of the target user's voice.
  • FIG. 7 is a schematic diagram of a sound source localization technology based on a microphone array provided in an embodiment of the present application.
  • The microphone array shown in FIG. 7 includes six microphones (MIC1, MIC2, MIC3, MIC4, MIC5 and MIC6). After the target user makes a sound, these six microphones collect the target user's voice at the same time; because each microphone is at a different distance from the sound source, the sound reaches the microphones with different time delays.
  • The electronic device can estimate the difference in distance between each microphone and the sound source from the delays with which the different microphones receive the sound. For example, in Figure 7 the path difference between MIC1 and MIC2 is d·cos θ, so given the actual installation spacing d between MIC1 and MIC2, the horizontal azimuth angle θ between the target user's face and the microphone array can be obtained. Similarly, the electronic device may determine the vertical azimuth angle between the target user's face and the microphone array by the same method.
  • If the plane where the microphone array is located is parallel to the plane where the display screen is located, the azimuth angle between the target user's face and the microphone array may be used as the target azimuth angle between the target user and the display screen; if it is not parallel, the target azimuth angle between the target user's face and the display screen can be determined according to the azimuth angle between the target user's face and the microphone array and the angle between the plane of the microphone array and the plane of the display screen.
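  • The following is a minimal sketch of the two-microphone case in Figure 7, assuming the delay between a pair of microphone signals is estimated by cross-correlation; the sampling rate, spacing and function name are illustrative. A second, vertically arranged pair yields the vertical azimuth angle in the same way, and the remaining microphones of the array can be paired to refine the estimate.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def pair_azimuth(sig_a, sig_b, mic_spacing, sample_rate):
    """Estimate the azimuth theta of the sound source relative to a microphone pair.
    The path difference is d*cos(theta) (see Figure 7), so theta = arccos(c*delay/d).
    The delay is found from the peak of the cross-correlation."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)        # lag in samples between the two mics
    delay = lag / sample_rate                       # time-difference of arrival in seconds
    path_diff = np.clip(SPEED_OF_SOUND * delay, -mic_spacing, mic_spacing)
    theta = np.arccos(path_diff / mic_spacing)      # azimuth angle of the face
    return np.degrees(theta)
```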
  • the electronic device determines the target distance of the target user's face relative to the display screen.
  • the electronic device may perform depth estimation on the face of the target user in the scene image collected by the camera based on a monocular depth estimation algorithm, and determine the target distance of the face of the target user relative to the display screen.
  • the electronic device can perform face detection on the scene image, and the specific implementation can refer to the face detection method introduced in the above-mentioned embodiments, which will not be repeated here.
  • the electronic device may use the scene image as an input of the monocular depth estimation model, and obtain the depth information of the target user's face output by the monocular depth estimation model, and the depth information may be used as the target distance.
  • the monocular depth estimation model is obtained by training based on the monocular depth estimation algorithm and the depth image data set, and can determine the deep learning model of the depth information of the image.
  • the depth image dataset includes images and depth information of objects contained in the images.
  • During training, the images in the depth image dataset are used as the input of the initial monocular depth estimation model to obtain the predicted depth information output by the model; the loss value between the predicted depth information and the actual depth information is calculated, and the parameters of the initial monocular depth estimation model are adjusted according to the loss value. This training process is repeated until the loss value converges within a preset range, at which point training is considered finished and the monocular depth estimation model is obtained.
  • The face area in depth images containing a human face can also be used as the input of the initial monocular depth estimation model during training, so that the electronic device can determine the target distance based on the trained monocular depth estimation model. In that case the target user's face region can be used as the input of the monocular depth estimation model, the depth information of the target user's face output by the model can be obtained, and that depth information can be used as the target distance.
  • Further, the target distance between the target user's face and the display screen can be determined according to the depth information and the distance between the camera and the plane where the display screen is located.
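  • As a rough sketch of how the face depth output by such a model might be turned into the target distance, assuming the model returns a per-pixel depth map in metres and that the stored camera-to-display-plane distance acts as a simple additive correction (the exact correction depends on the device geometry):

```python
import numpy as np

def target_distance(depth_map, face_box, camera_to_screen_offset=0.0):
    """Estimate the target distance between the face and the display plane.
    depth_map: per-pixel depth (metres) from a monocular depth model (placeholder);
    face_box: (x, y, w, h) from face detection;
    camera_to_screen_offset: pre-stored camera-to-display-plane distance."""
    x, y, w, h = face_box
    face_depth = float(np.median(depth_map[y:y + h, x:x + w]))  # robust face depth
    # With the camera on the display plane the offset is zero; otherwise the
    # stored offset corrects the camera depth into a distance from the screen.
    return face_depth + camera_to_screen_offset
```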
  • The angle difference between the normal vector of the camera and the normal vector of the plane where the display screen is located, the angle between the plane where the microphone array is located and the plane where the display screen is located, and the distance between the camera and the plane where the display screen is located are attribute parameters of the electronic device and can be pre-stored in the electronic device.
  • the electronic device determines a first viewpoint position according to the first relative position, where the first viewpoint position is used to indicate a position of a virtual camera when rendering a three-dimensional scene.
  • the first viewpoint position is determined based on the position of the face of the current target user and is used for rendering the three-dimensional scene.
  • The electronic device may obtain the position of the window, where the position of the window may be preset according to the scene in which the rendered target image is displayed; for example, it may be a fixed position set for that scene, or it may change according to the scene in which the target image is actually displayed.
  • The electronic device determines, according to the first relative position, the relative position between the first viewpoint and the window when rendering the 3D scene, and then determines the position of the first viewpoint according to that relative position and the position of the window.
  • the electronic device may use the first relative position as the relative position between the first viewpoint and the window, that is, the relative position between the first viewpoint and the window at this time is the face of the target user and the display screen The relative position between them, the target user can observe the three-dimensional effect of the target image through the display screen at the current position.
  • Before rendering the 3D scene, the electronic device may also acquire the size of the window.
  • the size of the window may also be a parameter set based on the scene where the target image is displayed.
  • The electronic device can determine the proportional relationship between the three-dimensional scene and the physical world according to the size of the window and the actual size of the display screen. For example, when the size of the window is the same as the actual size of the display screen, the ratio of the 3D scene to the physical world is 1:1; and when the ratio of the size of the window to the actual size of the display screen is 1:2, the ratio of the 3D scene to the physical world is 2:1.
  • The electronic device can determine the relative position between the viewpoint and the window according to the proportional relationship between the three-dimensional scene and the physical world and the first relative position. For example, when the proportional relationship is 2:1, each parameter of the relative position between the viewpoint and the window can be twice the corresponding parameter of the first relative position.
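  • A minimal sketch of this mapping, assuming the first relative position has already been converted into a metric offset (dx, dy, dz) of the face from the centre of the display, the window is axis-aligned with the screen, and a single uniform scale factor is used; all names are illustrative:

```python
def first_viewpoint_position(face_offset_m, window_center, window_size, screen_size):
    """Map the user's physical offset from the display (metres) to the virtual
    camera position, using the window-size / screen-size proportional relationship.
    face_offset_m: (dx, dy, dz) of the face relative to the display centre;
    window_center: (x, y, z) of the window (near clipping plane) in the scene;
    window_size, screen_size: (width, height) in scene units and metres."""
    scale = window_size[0] / screen_size[0]         # scene units per physical metre
    dx, dy, dz = face_offset_m
    # The viewpoint's offset from the window mirrors the face's offset from the screen.
    return (window_center[0] + dx * scale,
            window_center[1] + dy * scale,
            window_center[2] + dz * scale)
```

  • For instance, with a 1:1 proportional relationship the viewpoint offset simply equals the measured face offset, which matches the case where the window has the same size as the display screen.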
  • the electronic device renders the three-dimensional scene according to the position of the first viewpoint to obtain a first target image, and displays the first target image on a display screen.
  • The target image obtained after the electronic device renders the 3D scene according to the first viewpoint position is better suited for the user to observe the 3D effect: the viewpoint position in the 3D scene corresponding to the target image matches the user's current position, so the user does not need to search for a viewpoint position from which the three-dimensional scene can be observed, and the three-dimensional effect is visible from the user's current position.
  • the rendering processing in this embodiment of the present application may be performed by a renderer in an electronic device.
  • After the electronic device displays the target image on the display screen, it can again determine a second relative position between the target user and the display screen based on the image display method provided by the embodiment shown in FIG. 4, and determine a second viewpoint position according to the second relative position.
  • the electronic device renders the three-dimensional scene based on the position of the second viewpoint to obtain the second target image, and displays the second target image on the display screen.
  • the position of the viewpoint when the electronic device renders the three-dimensional scene can be adjusted in real time, so that the position of the viewpoint of the rendered three-dimensional scene can be adjusted following the user's moving position, without causing the problem that the 3D effect cannot be observed.
  • If the face of the target user is not detected, the electronic device may display a standby screen.
  • a countdown animation can also be displayed on the standby screen to remind the user that after the countdown ends, the electronic device will exit the naked-eye 3D mode. If the user returns to the detection range of the camera before the countdown ends, the electronic device will continue to display the target image. When the countdown ends and the face of the target user is still not detected, the electronic device exits the naked-eye 3D mode.
  • When the electronic device displays the countdown animation, it may simultaneously display a reminder message indicating that there are currently multiple faces within the detection range, as shown for example in FIG. 8.
  • FIG. 8 is a schematic diagram of a display interface of an electronic device provided in an embodiment of the present application.
  • a countdown animation is displayed on the display screen, and a reminder message "Multiple faces are currently detected, please keep a single person within the detection range" is displayed at the same time.
  • The electronic device can determine the first relative position between the target user and the display screen. After the first relative position of the target user is determined, the first viewpoint position required for rendering the 3D scene may be determined according to the first relative position, ensuring that the determined viewpoint position matches the current position of the target user.
  • The electronic device renders the 3D scene according to the first viewpoint position to obtain the first target image and displays it on the display screen; the first target image rendered according to the determined first viewpoint position is better suited for the user to observe the 3D effect, improving user experience.
  • The viewpoint position can be updated in real time as the user moves, without requiring the user to watch the 3D image from a fixed viewpoint position, providing a flexible way to display images.
  • Method 1: The electronic device determines the background of the scene image, so that after a scene image is acquired it can be compared with the determined background to determine the position of the target user's face.
  • The electronic device used to display the target image is generally placed in a relatively fixed position, so the background of the scene images captured by its camera rarely changes and can be considered essentially constant. The area in which a captured scene image changes relative to the background is likely to be the position of the target user's face, so face detection can be performed only on that changed area, improving the efficiency of face detection.
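  • A simple OpenCV sketch of Method 1, assuming a fixed camera, a stored grayscale background frame, and OpenCV 4 (where findContours returns two values); the threshold and minimum area are illustrative:

```python
import cv2

def changed_region(background_gray, frame_gray, min_area=500):
    """Return the bounding box (x, y, w, h) of the largest region that differs
    from the stored background; only this area is passed to face detection."""
    diff = cv2.absdiff(background_gray, frame_gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)             # close small gaps
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = [c for c in contours if cv2.contourArea(c) > min_area]
    if not contours:
        return None                                         # nothing changed: skip detection
    return cv2.boundingRect(max(contours, key=cv2.contourArea))
```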
  • Method 2: The electronic device determines the movement information of the target user's face according to the stored historical position information of the target user's face, where the movement information indicates the speed and acceleration of the target user's face as it moves.
  • The electronic device predicts, using the Kalman algorithm, the predicted position of the target user's face in the scene image from the most recent historical position information and the movement information, performs face detection on the area of the scene image corresponding to the predicted position, and determines the target user's face in the scene image.
  • The electronic device can acquire scene images captured by the camera multiple times and determine the target user's face in each scene image; it can then store the position information of the target user's face in these multiple scene images as the historical position information of the target user's face.
  • the electronic device may determine the movement information of the target user's face according to the historical position information of the target user's face.
  • For example, the electronic device may determine the movement information of the target user's face according to the last three pieces of historical position information of the target user's face, where the movement information may include the speed and acceleration of the target user's face as it moves.
  • The electronic device can predict, using the Kalman algorithm, the predicted position of the target user's face in the scene image from the most recent historical position information and the movement information of the target user's face.
  • the electronic device may perform face detection in an area corresponding to the predicted location in the scene image to determine the face of the target user in the scene image. In this way, the search space during face detection can be optimized, and the efficiency of face detection can be improved.
  • the Kalman algorithm is also called the Kalman filter algorithm.
  • the Kalman algorithm can estimate the state of the dynamic system according to the measurement data when the measurement variance is known.
  • A preset covariance matrix may be stored in the electronic device. After the electronic device determines the movement information of the target user's face, it may estimate the predicted position of the target user's face in the scene image, based on the Kalman algorithm, from the most recent historical position information, the movement information and the covariance matrix, and update the covariance matrix.
  • After performing face detection on the area of the scene image corresponding to the predicted position and determining the target user's face and its position in the scene image, the electronic device may update the movement information of the target user's face according to that position, for example updating the acceleration and speed of the target user's face as it moves.
  • The electronic device calculates the measurement residual (innovation) and the Kalman gain from the predicted position of the target user's face and the position determined after face detection, and uses them to correct the next predicted position of the target user's face, so as to obtain a more accurate estimate.
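  • The following is a minimal constant-velocity Kalman sketch for the face centre, assuming a state of (u, v, du, dv) in pixels; the noise covariances and initial values are illustrative placeholders, and a constant-acceleration model (matching the acceleration mentioned above) would simply extend the state and transition matrix.

```python
import numpy as np

class FacePositionPredictor:
    """Constant-velocity Kalman filter over the face centre (pixels).
    predict() gives the search region for the next frame; correct() folds in the
    position actually found by face detection (residual and Kalman gain)."""
    def __init__(self, dt=1.0):
        self.x = np.zeros(4)                                    # state: [u, v, du, dv]
        self.P = np.eye(4) * 500.0                              # preset covariance matrix
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt    # motion model
        self.H = np.zeros((2, 4)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * 1e-2                               # process noise
        self.R = np.eye(2) * 4.0                                # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                                       # predicted face position

    def correct(self, measured_uv):
        y = np.asarray(measured_uv) - self.H @ self.x           # measurement residual
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)                # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```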
  • Method 3: When the electronic device detects the target user's face in the scene image, it can perform face detection on multiple consecutive frames of scene images. If the number of frames in which the target user's face is detected is greater than a preset threshold, the target user's face can be considered detected, and its position can then be determined. This avoids false detections and ensures the accuracy of face detection.
  • If the electronic device does not detect the face of the target user while displaying the target image, it displays the standby screen. Method 3 can also be applied in this scenario: if the number of frames of scene images in which the electronic device does not detect the target user's face is greater than a preset threshold, it may be considered that the target user's face is not currently detected.
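  • A small sketch of the frame-counting rule of Method 3 and of the corresponding loss criterion used before entering standby; both thresholds are illustrative:

```python
class DetectionDebouncer:
    """Confirm a face only after `confirm_frames` consecutive detections, and
    declare it lost only after `lose_frames` consecutive misses (Method 3)."""
    def __init__(self, confirm_frames=5, lose_frames=30):
        self.confirm_frames = confirm_frames
        self.lose_frames = lose_frames
        self.hits = 0
        self.misses = 0
        self.face_present = False

    def update(self, detected: bool) -> bool:
        if detected:
            self.hits += 1
            self.misses = 0
            if self.hits >= self.confirm_frames:
                self.face_present = True            # enough frames: accept the face
        else:
            self.misses += 1
            self.hits = 0
            if self.misses >= self.lose_frames:
                self.face_present = False           # face lost: show the standby screen
        return self.face_present
```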
  • the electronic device can also determine the target user in the following ways:
  • The electronic device receives a selection instruction triggered by a user and takes the user to whom the face image corresponding to the selection instruction belongs as the target user.
  • the electronic device may display information reminding the user to select the face of the target user.
  • the user can trigger the selection instruction by touching the screen, and after receiving the selection instruction the electronic device takes the user whose face is at the corresponding position as the target user; alternatively, the electronic device can number the multiple faces in the scene image, and the user triggers the selection instruction by voice input, the instruction containing the number of the target user's face, in which case the electronic device takes the user whose face corresponds to that number as the target user.
  • FIG. 9 is a schematic diagram of a scene image including multiple faces.
  • taking a scene image that includes face A, face B, and face C as an example, the user can select one of the faces, for example face A, to trigger the selection instruction, and the electronic device then takes the user to whom the selected face A belongs as the target user.
  • the embodiment of the present application does not limit the manner in which the user triggers the selection instruction.
  • the user may also trigger the selection instruction through the control device of the electronic device.
  • Method 2: The electronic device takes, among the multiple users, the user closest to the display screen as the target user.
  • when the electronic device determines that the scene image includes multiple human faces, it may determine the distance between each face and the display screen and take the user whose face is closest as the target user.
  • the manner in which the electronic device determines the distance between each face and the display screen can follow the way the electronic device determines the target distance based on the monocular depth estimation algorithm in S401, and is not repeated here.
  • Method 3: The electronic device takes, among the multiple users, the user whose face is turned away from the display screen by the smallest angle as the target user.
  • specifically, the electronic device may determine the rotation angle of the plane of each face relative to the plane of the display screen, and take the user whose face has the smallest rotation angle as the target user.
  • Method 4: The electronic device takes, among the multiple users, the user with the highest usage frequency as the target user.
  • the electronic device may store, in local storage, the faces of frequently used users together with their usage frequencies.
  • when the scene image is determined to include multiple faces, these faces can be matched against the faces of the frequently used users; if the matching succeeds, the user who is identified as a frequent user and has the highest usage frequency is taken as the target user. A sketch of these selection strategies is given below.
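  • The selection strategies above can be summarised in a small helper such as the following sketch; the data structure, field names, and strategy labels are illustrative assumptions, and the distance, angle, and frequency values are assumed to come from the depth-estimation, pose-estimation, and face-matching steps described elsewhere in this application. Selection by an explicit user instruction (Method 1) would bypass this helper entirely.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CandidateFace:
    face_id: int
    distance_m: float          # estimated distance to the display (monocular depth)
    yaw_deg: float             # rotation of the face plane relative to the display plane
    usage_frequency: int = 0   # > 0 only if matched against a stored frequent user

def pick_target_user(faces: List[CandidateFace],
                     strategy: str = "closest") -> Optional[CandidateFace]:
    """Pick the target user from multiple detected faces.

    "closest"       -> smallest distance to the display (Method 2)
    "most_frontal"  -> smallest rotation angle to the display (Method 3)
    "most_frequent" -> highest stored usage frequency (Method 4)
    """
    if not faces:
        return None
    if strategy == "closest":
        return min(faces, key=lambda f: f.distance_m)
    if strategy == "most_frontal":
        return min(faces, key=lambda f: abs(f.yaw_deg))
    if strategy == "most_frequent":
        matched = [f for f in faces if f.usage_frequency > 0]
        return max(matched, key=lambda f: f.usage_frequency) if matched else None
    raise ValueError(f"unknown strategy: {strategy}")
```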
  • in the following two examples, the first relative position of the target user relative to the display screen includes a target azimuth of the target user's face relative to the display screen and a target distance of the target user's face relative to the display screen.
  • Fig. 10 is a flow chart of the first method for displaying an image on a display screen according to an embodiment of the present application. Referring to Fig. 10, the method includes the following steps:
  • S1001 The electronic device acquires a scene image captured by a camera for a current scene.
  • S1002 The electronic device determines the face of the target user in the scene image based on the face detection algorithm.
  • S1003 The electronic device determines a target azimuth of the target user's face relative to the display screen according to the position of the target user's face in the scene image.
  • S1004 The electronic device performs depth estimation on the face of the target user, and determines a target distance between the face of the target user and the display screen.
  • S1005 The electronic device obtains the position of the window.
  • S1006 The electronic device uses the target azimuth and the target distance as the first relative position, and determines the relative position between the first viewpoint and the window when rendering the 3D scene according to the first relative position.
  • S1007 The electronic device determines the position of the first viewpoint according to the relative position between the first viewpoint and the window and the position of the window.
  • S1008 The electronic device renders the three-dimensional scene according to the position of the first viewpoint to obtain a first target image, and displays the first target image on the display screen. A minimal sketch of this camera-based pipeline follows.
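  • The geometric core of this flow, mapping the first relative position (azimuths and distance) to a first viewpoint position relative to the view window, might look like the following sketch. The axis and sign conventions, the scale derived from the window-to-screen size ratio, and the helper names in the commented loop (detect_target_face, pixel_to_azimuth, estimate_face_depth, render_scene) are assumptions for illustration, not APIs defined by this application.

```python
import numpy as np

def face_to_viewpoint(azimuth_h_rad, azimuth_v_rad, distance_m,
                      window_center, window_width, screen_width_m):
    """Map the first relative position (target azimuths and target distance) to a
    virtual-camera (first viewpoint) position in scene coordinates.

    window_center: centre of the view window (near clipping plane) in the scene.
    window_width / screen_width_m: scene-to-physical scale, following the discussion
    of the window size versus the real screen size.
    """
    scale = window_width / screen_width_m
    # Lateral offsets of the face from the screen centre, in metres
    # (sign conventions are an assumption of this sketch).
    dx = distance_m * np.tan(azimuth_h_rad)
    dy = distance_m * np.tan(azimuth_v_rad)
    # Place the viewpoint in front of the window, mirroring the user's position.
    offset = scale * np.array([dx, dy, distance_m], dtype=float)
    return np.asarray(window_center, dtype=float) + offset

# Hypothetical per-frame loop for the Fig. 10 flow (helper names are illustrative):
# frame = camera.capture()                                           # S1001
# box = detect_target_face(frame)                                    # S1002
# az_h, az_v = pixel_to_azimuth(box.center, intrinsics)              # S1003
# dist = estimate_face_depth(frame, box)                             # S1004
# viewpoint = face_to_viewpoint(az_h, az_v, dist,
#                               window_center, window_w, screen_w)   # S1005-S1007
# show(render_scene(scene, viewpoint, window))                       # S1008
```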
  • Fig. 11 is a flowchart of a second method for displaying an image on a display screen according to an embodiment of the present application. Referring to Fig. 11, the method includes the following steps:
  • S1101 The electronic device acquires information about the target user's voice collected by the microphone array.
  • S1102 The electronic device determines a target azimuth of the target user's face relative to the display screen according to the information about the target user's voice.
  • S1103 The electronic device acquires the scene image captured by the camera for the current scene.
  • S1104 The electronic device determines the face of the target user in the scene image based on the face detection algorithm.
  • S1105 The electronic device performs depth estimation on the face of the target user, and determines a target distance between the face of the target user and the display screen.
  • S1106 The electronic device obtains the position of the window.
  • S1107 The electronic device uses the target azimuth and the target distance as the first relative position, and determines the relative position between the first viewpoint and the window when rendering the 3D scene according to the first relative position.
  • S1108 The electronic device determines the position of the first viewpoint according to the relative position between the first viewpoint and the window and the position of the window.
  • S1109 The electronic device renders the three-dimensional scene according to the position of the first viewpoint to obtain a first target image, and displays the first target image on the display screen. A sketch of the sound-source localization step used in S1101–S1102 follows.
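  • The sound-source localization in S1101–S1102 relies on the time delays between microphones of the array; a minimal far-field sketch for a single microphone pair is given below. A practical implementation would combine several pairs, and typically a more robust delay estimator such as GCC-PHAT, to obtain both the horizontal and the vertical azimuth; the constants and function names here are illustrative.

```python
import numpy as np

SPEED_OF_SOUND_M_S = 343.0  # approximate, at room temperature

def delay_by_cross_correlation(sig_a, sig_b, sample_rate_hz):
    """Estimate the time delay between two microphone signals from the peak of
    their cross-correlation (a simple stand-in for more robust estimators)."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(sig_b) - 1)
    return lag_samples / sample_rate_hz

def azimuth_from_pair(delay_s, mic_spacing_m):
    """Far-field estimate: the path difference between the two microphones is
    d * cos(theta), so theta = arccos(c * delay / d)."""
    cos_theta = np.clip(SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m, -1.0, 1.0)
    return float(np.arccos(cos_theta))  # radians, relative to the microphone axis
```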
  • FIG. 12 is a schematic structural diagram of an image display device 1200 provided by an embodiment of the present application.
  • the image display device 1200 includes a processing unit 1201 , a rendering unit 1202 and a display unit 1203 .
  • the functions of each unit in the image display device 1200 will be introduced below.
  • the processing unit 1201 is configured to determine a first relative position between the target user and the display screen, where the target user is located in front of the display screen, and to determine a first viewpoint position according to the first relative position, the first viewpoint position being used to indicate the position of the virtual camera when rendering the 3D scene;
  • a rendering unit 1202, configured to render the 3D scene according to the first viewpoint position to obtain a first target image;
  • a display unit 1203, configured to display the first target image on the display screen.
  • the processing unit 1201 is further configured to: acquire the position of the view window, where the position of the view window is used to indicate the position of the near clipping plane when rendering the 3D scene;
  • the processing unit 1201 is specifically configured to: determine, according to the first relative position, the relative position between the first viewpoint and the view window when rendering the 3D scene; and determine the position of the first viewpoint according to the relative position between the first viewpoint and the view window and the position of the view window.
  • the processing unit 1201 is further configured to: after the rendering unit obtains the first target image, determine a second relative position between the target user and the display screen; determine a second viewpoint position according to the second relative position, where the second relative position is different from the first relative position and the second viewpoint position is different from the first viewpoint position; render the 3D scene according to the second viewpoint position to obtain a second target image; and display the second target image on the display screen.
  • the first relative position between the target user and the display screen includes: a target azimuth of the target user's face relative to the display screen.
  • the processing unit 1201 is specifically configured to: acquire a scene image captured by a camera, the scene image including a scene within a preset range in front of the display screen; and determine the first relative position according to the position of the target user's face in the scene image.
  • the processing unit 1201 is specifically configured to: perform sound source localization on the target user based on the sound information of the target user collected by the microphone array, to obtain the first relative position.
  • the first relative position between the target user and the display screen further includes: a target distance of the target user's face relative to the display screen;
  • the processing unit 1201 is further configured to: perform depth estimation on the face of the target user, and determine the target distance.
  • the processing unit 1201 is further configured to: before the first relative position between the target user and the display screen is determined, when it is judged that there are multiple users in front of the display screen, determine the target user from among the multiple users.
  • the processing unit 1201 is specifically configured to: display the face images of the multiple users on the display screen, receive a selection instruction, and take the user to whom the face image corresponding to the selection instruction belongs as the target user; or take the user closest to the display screen among the multiple users as the target user; or take the user whose face is turned away from the display screen by the smallest angle among the multiple users as the target user; or take the user with the highest usage frequency among the multiple users as the target user.
  • FIG. 13 is a schematic structural diagram of an electronic device 1300 provided in an embodiment of the present application.
  • the electronic device 1300 can be used to realize the functions of the electronic device in the embodiment shown in FIG. 4.
  • the electronic device 1300 includes: a display screen 1301 , a processor 1302 , a memory 1303 and a bus 1304 .
  • the electronic device 1300 may also include a camera 1305 and a microphone array 1306, where the display screen 1301, the processor 1302, the memory 1303, the camera 1305, and the microphone array 1306 communicate through the bus 1304, and may also communicate by other means such as wireless transmission.
  • the memory 1303 stores program code, and the processor 1302 can call the program code stored in the memory 1303 to perform the following operations: determine a first relative position between the target user and the display screen 1301, where the target user is located in front of the display screen 1301; determine a first viewpoint position according to the first relative position, the first viewpoint position being used to indicate the position of the virtual camera when rendering the 3D scene; and render the 3D scene according to the first viewpoint position to obtain a first target image, and display the first target image on the display screen 1301.
  • the processor 1302 is further configured to: acquire the position of the view window, where the position of the view window is used to indicate the position of the near clipping plane when rendering the 3D scene;
  • the processor 1302 is specifically configured to: determine, according to the first relative position, the relative position between the first viewpoint and the view window when rendering the 3D scene; and determine the position of the first viewpoint according to the relative position between the first viewpoint and the view window and the position of the view window.
  • the processor 1302 is further configured to: after the first target image is obtained, determine a second relative position between the target user and the display screen 1301; determine a second viewpoint position according to the second relative position, where the second relative position is different from the first relative position and the second viewpoint position is different from the first viewpoint position; render the 3D scene according to the second viewpoint position to obtain a second target image; and display the second target image on the display screen 1301.
  • the first relative position between the target user and the display screen 1301 includes: a target azimuth of the target user's face relative to the display screen 1301 .
  • the processor 1302 is specifically configured to: acquire a scene image captured by the camera 1305, the scene image including a scene within a preset range in front of the display screen 1301; and determine the first relative position according to the position of the target user's face in the scene image.
  • the processor 1302 is specifically configured to: perform sound source localization on the target user based on the sound information of the target user collected by the microphone array 1306 to obtain the first relative position.
  • the first relative position between the target user and the display screen 1301 further includes: a target distance of the target user's face relative to the display screen 1301;
  • the processor 1302 is further configured to: perform depth estimation on the face of the target user, and determine the target distance.
  • the processor 1302 is further configured to: before the first relative position between the target user and the display screen 1301 is determined, when it is judged that there are multiple users in front of the display screen 1301, determine the target user from among the multiple users.
  • the processor 1302 is specifically configured to: display the face images of the multiple users on the display screen 1301, receive a selection instruction, and take the user to whom the face image corresponding to the selection instruction belongs as the target user; or take the user closest to the display screen 1301 among the multiple users as the target user; or take the user whose face is turned away from the display screen 1301 by the smallest angle among the multiple users as the target user; or take the user with the highest usage frequency among the multiple users as the target user.
  • the memory 1303 in FIG. 13 of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory.
  • the volatile memory can be Random Access Memory (RAM), which acts as external cache memory.
  • by way of example but not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
  • this embodiment of the present application also provides a computer program, which, when the computer program is run on a computer, causes the computer to execute the method for displaying an image on a display screen provided by the embodiment shown in FIG. 4 .
  • an embodiment of the present application also provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a computer, the computer performs the method for displaying an image on a display screen provided by the embodiment shown in FIG. 4.
  • the storage medium may be any available medium that can be accessed by a computer.
  • by way of example but not limitation, computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • an embodiment of the present application further provides a chip for reading a computer program stored in a memory to implement the method for displaying an image on a display screen provided by the embodiment shown in FIG. 4 .
  • an embodiment of the present application further provides a chip system. The chip system includes a processor configured to support a computer device in implementing the method for displaying an image on a display screen provided by the embodiment shown in FIG. 4.
  • the chip system further includes a memory, and the memory is used to store necessary programs and data of the computer device.
  • the chip system may consist of a chip, or may include a chip and other discrete devices.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

一种在显示屏上显示图像的方法、电子设备与装置。在该方法中,电子设备确定目标用户与显示屏之间的第一相对位置,目标用户位于显示屏的前方(S401);电子设备根据第一相对位置确定第一视点位置,第一视点位置用于指示对三维场景进行渲染时的虚拟相机的位置(S402);电子设备根据第一视点位置渲染三维场景得到第一目标图像,并在显示屏上显示第一目标图像(S403)。通过该方案,电子设备确定出的第一视点位置可以与当前目标用户的位置匹配,电子设备根据第一视点位置渲染得到第一目标图像,在显示屏上显示的第一目标图像更适合目标用户在当前所处的位置处观察到三维效果,无需用户去寻找能够观察到三维效果的位置,提升用户体验。

Description

一种在显示屏上显示图像的方法、电子设备与装置
相关申请的交叉引用
本申请要求在2021年09月18日提交中国专利局、申请号为202111113031.4、申请名称为“一种在显示屏上显示图像的方法、电子设备与装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及裸眼三维显示领域,尤其涉及一种在显示屏上显示图像的方法、电子设备与装置。
背景技术
裸眼三维(3-dimension,3D)是指不借助偏振光眼镜等外部工具,用户即可在电子设备的显示屏上观看到立体视觉效果的技术。而用户之所以能在二维的显示屏上观察到如实物般的立体视觉效果,是因为在显示屏上显示的图像由于色彩灰度的不同而使人眼产生视觉上的错觉。
目前裸眼3D技术一般以视差作为切入点,在假设用户观察点基本不变的情况下,渲染三维场景得到图像,并通过各种显示手段使得用户观看该图像时,用户的双眼可以观察到不同的画面,以此达到仿佛看到了立体空间的视错觉。
但目前的裸眼3D技术进行渲染是基于视点不会发生变化的假设实现的,也就是说,现有在对三维场景进行渲染时,用于指示三维场景中虚拟相机位置的视点位置是预设的固定值。当显示屏显示渲染后的图像时,用户只能在固定的位置观看渲染后的图像才能观察到3D效果,若用户移动位置,则无法观察到3D效果,而造成异样体验。因此,目前的图像显示方法不够灵活。
发明内容
本申请实施例提供一种在显示屏上显示图像的方法、电子设备与装置,提供一种适应用户所处位置的三维图像显示方法,以提升用户体验。
第一方面,本申请实施例提供一种在显示屏上显示图像的方法。该方法包括:
确定目标用户与所述显示屏之间的第一相对位置,所述目标用户位于所述显示屏的前方;根据所述第一相对位置确定第一视点位置,所述第一视点位置用于指示对三维场景进行渲染时的虚拟相机的位置;根据所述第一视点位置渲染所述三维场景得到第一目标图像,并在所述显示屏上显示所述第一目标图像。
可选的,该方法可以应用于电子设备,该电子设备可以具有显示屏,或者该电子设备可以将图像输出到与电子设备绑定的显示屏,以使该显示屏显示电子设备输出的图像。进一步的,该电子设备可以具有摄像头或麦克风阵列,或者,该电子设备可以与摄像头或麦克风阵列绑定,且电子设备与绑定的摄像头或麦克风阵列之间的位置关系是可以获取到的。
在以上方法中,电子设备可以确定目标用于与显示屏之间的第一相对位置。在确定出 目标用户的第一相对位置后,可以根据第一相对位置确定渲染三维场景时所需的第一视点位置,进而保证确定出的视点位置与当前目标用户的位置匹配。电子设备根据第一视点位置渲染三维场景得到第一目标图像,并在显示屏上显示第一目标图像,根据确定出的第一视点位置进行渲染后的第一目标图像更适合目标用户在当前所处的位置处观察到三维效果,无需用户去寻找能够观察到三维效果的位置,提升用户体验。
在一个可能的设计中,获取视窗的位置,所述视窗的位置用于指示渲染所述三维场景时近裁剪面的位置;
所述根据所述第一相对位置确定第一视点位置,包括:根据所述第一相对位置确定对所述三维场景进行渲染时的第一视点与视窗之间的相对位置;根据所述第一视点和视窗之间的相对位置以及所述视窗的位置,确定所述第一视点的位置。
通过该设计,电子设备可以获取视窗的位置,其中,视窗的位置可以是根据显示第一目标图像的场景预设的参数。电子设备可以根据第一相对位置确定第一视点与视窗之间的相对位置,从而将视点与视窗的相对位置与用户人脸和显示屏的相对位置关联起来,使得根据确定出的第一视点的位置渲染得到的第一目标图像更适合用户在当前所处的位置上观看到三维效果。
在一个可能的设计中,在得到所述第一目标图像之后,所述方法还包括:确定所述目标用户与所述显示屏之间的第二相对位置;根据所述第二相对位置确定第二视点位置;所述第二相对位置与所述第一相对位置不同,所述第二视点位置与所述第一视点位置不同;根据所述第二视点位置渲染所述三维场景得到第二目标图像,并在所述显示屏上显示所述第二目标图像。
通过该设计,在图像显示过程中,电子设备可以根据本申请实施例提供的在显示屏上显示图像的方法,随着用户移动实时更新视点位置,无需用户在固定视点位置观看目标图像,提供一种灵活的显示图像方法。
在一个可能的设计中,所述目标用户与所述显示屏之间的第一相对位置,包括:所述目标用户的人脸相对于所述显示屏的目标方位角。
通过该设计,目标用户与显示屏之间的第一相对位置可以包括目标用户的人脸相对于显示屏的目标方位角,从而准备定位目标用户的人脸的位置。
在一个可能的设计中,所述确定目标用户与所述显示屏之间的第一相对位置,包括:获取摄像头拍摄的场景图像,所述场景图像包括所述显示屏前方预设范围的场景;根据所述场景图像中所述目标用户的人脸在所述场景图像的位置,确定所述第一相对位置。
在一个可能的设计中,所述根据所述场景图像中所述目标用户的人脸在所述场景图像的位置,确定所述第一相对位置,包括:根据所述场景图像中所述目标用户的人脸在所述场景图像的位置,确定所述目标用户的人脸与所述显示屏之间的水平方位角和垂直方位角。其中,所述目标用户的人脸在场景图像的位置可以为人脸参考点,例如,人脸参考点可以为人脸中心点在参考图像中的坐标。
通过以上设计,电子设备可以采集摄像头拍摄到的包括目标用户的人脸在内的场景图像。电子设备可以根据目标用户的人脸在场景图像中的位置确定第一相对位置,此时确定出的目标用户的人脸相对于显示屏的目标方位角可以包括水平方位角和垂直方位角,从而准确定位目标用户的人脸相对于显示屏的位置。
在一个可能的设计中,在根据所述场景图像中所述目标用户的人脸在所述场景图像的 位置,确定所述第一相对位置之前,所述方法还包括:基于人脸检测算法,确定所述场景图像中的所述目标用户的人脸。
通过该设计,电子设备可以基于人脸检测算法从场景图像中识别目标用户的人脸,以便于对目标用户进行定位。
在一个可能的设计中,所述基于人脸检测算法,确定所述场景图像中的所述目标用户的人脸,包括:根据存储的所述目标用户的人脸的历史位置信息确定所述目标用户的人脸的移动信息,所述移动信息用于指示所述目标用户的人脸发生移动时的速度和加速度;根据所述目标用户的人脸的最近一次的历史位置信息以及所述移动信息,基于卡尔曼算法预测所述目标用户的人脸在所述场景图像中的预测位置;对所述场景图像中所述预测位置对应的区域进行人脸检测,确定所述场景图像中的目标用户的人脸。
通过该设计,电子设备在进行人脸检测时,可以先预测场景图像中的目标用户的人脸的位置,进而优化人脸检测时的搜索空间,提升人脸检测的效率。
在一个可能的设计中,所述确定所述第一相对位置,包括:确定所述目标用户的人脸在所述场景图像中的位置;根据所述目标用户的人脸在所述场景图像中的位置、摄像头坐标系与世界坐标系之间的转换关系,确定所述目标用户的人脸与所述摄像头之间的目标连线的位置;根据所述目标连线的位置确定所述目标方位角。
通过该设计,电子设备在通过摄像头拍摄的场景图像确定目标用户的人脸相对于显示屏的目标方位角时,可以先确定目标用户的人脸与摄像头之间的目标连线的位置,再确定目标方位角,以获取准确的目标用户的人脸的方位角信息。
在一个可能的设计中,所述确定目标用户与所述显示屏之间的第一相对位置,包括:基于麦克风阵列采集到的目标用户的声音的信息,对所述目标用户进行声源定位,得到所述第一相对位置。
可选地,对目标用户进行声源定位,确定目标用户的人脸相对于显示屏的水平方位角和垂直方位角。
通过该设计,电子设备可以获取麦克风阵列采集到的目标用户的声音的信息,并且对目标用户进行声源定位,以确定目标用户的人脸相对于显示屏的目标方位角。
也就是说,本申请实施例中提供多种确定目标用户的人脸相对于显示屏的目标方位角的方式。具体实施中,可以根据电子设备的具体构造选择不同方式确定目标方位角,灵活实现对目标用户定位。
在一个可能的设计中,所述目标用户与所述显示屏之间的第一相对位置,还包括:所述目标用户的人脸相对于所述显示屏的目标距离;所述确定所述目标用户与所述显示屏之间的第一相对位置,还包括:对所述目标用户的人脸进行深度估计,确定所述目标距离。
通过该设计,目标用户与显示屏之间的第一相对位置还可以包括目标用户的人脸相对于显示屏的目标距离。具体的,电子设备可以对目标用户的人脸进行深度估计,以确定目标距离,进一步对目标用户相对于显示屏的位置进行定位。
在一个可能的设计中,所述方法还包括:获取所述视窗的尺寸;
所述根据所述第一相对位置确定对所述三维场景进行渲染时的第一视点与视窗之间的相对位置,包括:根据所述视窗尺寸和所述显示屏的实际尺寸,确定所述三维场景与物理世界的比例关系;根据所述比例关系、所述第一相对位置确定所述视点和视窗之间的相对位置。
通过该设计,电子设备可以获取视窗的尺寸,根据视窗尺寸与显示屏的实际尺寸确定三维场景与物理世界的比例关系,根据该比例关系以及第一相对位置确定视点和视窗之间的相对位置。视窗的尺寸不同,渲染后得到的目标图像的显示效果不同,因此根据上述的比例关系以及第一相对位置确定视点和视窗之间的相对位置,能够保证确定出的视点与视窗的相对位置适应与当前显示目标图像的场景。
在一个可能的设计中,在所述确定目标用户与所述显示屏之间的第一相对位置之前,所述方法还包括:在判断所述显示屏前有多个用户时,从所述多个用户中确定所述目标用户。
在一个可能的设计中,所述从所述多个用户中确定所述目标用户,包括:在所述显示屏上显示所述多个用户的人脸图像,接收选择指令,将所述选择指令对应的人脸图像所属的用户作为所述目标用户;或者将所述多个用户中距离所述显示屏最近的用户作为所述目标用户;或者将所述多个用户中人脸侧向于所述显示屏的角度最小的用户作为所述目标用户;或者将所述多个用户中,使用频率最高的用户作为所述目标用户。
在一个可能的设计中,所述方法还包括:在所述显示屏上显示当前有多个用户处于所述显示屏的前方的提醒消息。
通过以上设计,当电子设备检测到多个人脸时,可以从多个人脸中确定目标用户的人脸,并可以提醒用户当前有多个人脸处于检测范围,从而保证目标用户可以观察到渲染后的目标图像的效果,避免多个用户同时观察时,部分用户无法观察到3D效果而造成的异样体验。
第二方面,本申请实施例提供一种图像显示装置,所述装置包括多个功能模块;所述多个功能模块相互作用,实现上述第一方面及其各实施方式中的方法。所述多个功能模块可以基于软件、硬件或软件和硬件的结合实现,且所述多个功能模块可以基于具体实现进行任意组合或分割。
第三方面,本申请实施例提供一种电子设备,包括处理器和存储器,所述存储器中存储计算机程序指令,所述电子设备运行时,所述处理器执行上述第一方面提供的方法。
第四方面,本申请实施例还提供一种计算机程序,当所述计算机程序在计算机上运行时,使得所述计算机执行上述任一方面提供的方法。
第五方面,本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当所述计算机程序被计算机执行时,使得所述计算机执行上述任一方面提供的方法。
第六方面,本申请实施例还提供一种芯片,所述芯片用于读取存储器中存储的计算机程序,执行上述任一方面提供的方法。
第七方面,本申请实施例还提供一种芯片系统,该芯片系统包括处理器,用于支持计算机装置实现上述任一方面提供的方法。在一种可能的设计中,所述芯片系统还包括存储器,所述存储器用于保存该计算机装置必要的程序和数据。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
附图说明
图1为一种指向光源3D技术的示意图;
图2为一种裸眼3D技术适用的场景示意图;
图3A为本申请实施例提供的一种三维场景示意图;
图3B为本申请实施例提供的一种渲染后的图像示意图;
图4为本申请实施例提供的一种在显示屏上显示图像的方法的流程图;
图5为本申请实施例提供的一种摄像头拍摄到的场景图像的示意图;
图6为本申请实施例提供的一种声源定位技术的场景示意图;
图7为本申请实施例提供的一种基于麦克风阵列的声源定位技术示意图;
图8为本申请实施例提供的一种电子设备的显示界面示意图;
图9为本申请实施例提供的一种场景图像中包括多个人脸的示意图;
图10为本申请实施例提供的第一种在显示屏上显示图像的方法的流程图;
图11为本申请实施例提供的第二种在显示屏上显示图像的方法的流程图;
图12为本申请实施例提供的一种图像显示装置的结构示意图;
图13为本申请实施例提供的一种电子设备的结构示意图。
具体实施方式
为了方便理解本申请实施例,下面介绍与本申请实施例相关的术语:
(1)裸眼三维(3-dimension,3D),是指不借助偏振光眼镜等外部工具,用户即可在电子设备的显示屏上观看到立体视觉效果的技术。
(2)人脸检测,为一种深度学习算法,用于对图像中的人脸进行检测,如识别图像中是否包含人脸,进一步地,还可以确定人脸对应的区域在图像中的位置。
(3)单目深度估计,是指利用一张或者唯一视角下的红绿蓝(red、green、blue,RGB)图像,估计图像中每个像素相对拍摄源的距离。
随着图像处理技术的发展和显示设备性能的提升,增强现实(augmented reality,AR)和虚拟现实(virtual reality,VR)也更多的应用于各种生活或娱乐场景中。
AR技术也被称为扩增现实技术,主要包括多媒体、三维建模以及场景融合等技术和手段。AR技术可以将真实世界信息和虚拟世界信息综合在一起显示给观看者。具体的,用户在观看AR技术处理后的图像时,需要佩戴头盔显示器,并通过头盔显示器观察图像。AR技术可以通过模拟仿真处理,将虚拟物体绑定于真实环境中的某个位置,例如将虚拟物体绑定于一些画面特征点或特定图案的表面。利用计算机视觉算法不断计算该虚拟物体所处的空间点在画面中的位置,将虚拟的三维物体渲染出来并投射在相应的位置,从而实现虚拟物体和真实环境叠加显示在头盔显示器的显示画面中,用户观看到的图像中既包括当前的真实环境,也包括叠加在真实环境中的虚拟物体,以此给用户带来虚拟物体仿佛真实地存在于真实环境中的体验。
VR技术,简称虚拟技术,也称虚拟环境,是利用计算机模拟产生一个三维的虚拟场景,该技术集成了电脑图形、电脑仿真、人工智能、感应、显示及网络并行处理等技术的最新发展成果。目前VR实现中,需要一个带有定位功能的头部显示器,并且需要设置分布于周围空间中的辅助定位桩。用户可以佩戴头部显示器,辅助定位桩可以不断地定位头部显示器在现实世界中的位置和姿态,从而将三维虚拟世界渲染成与用户当前所处的位置和姿态相匹配的、且具有视差的双目图像分别呈现给用户,从而带给用户仿佛身处于虚拟场景的体验。
通过上述对AR技术和VR技术的介绍可以看出,目前的AR或VR均需要用户佩戴 如头部显示器这样的设备,才能观察到渲染处理后的图像,而裸眼三维(3-dimension,3D)技术,无需用户佩戴头部显示器或为偏振光眼镜等外部工具,即可在电子设备的显示屏上观看到立体视觉效果的技术。
在裸眼3D技术中,用户之所以能在二维的显示屏上观察到如实物般的三维图形,是因为显示在显示屏上的色彩灰度的不同而使人眼产生视觉上的错觉。目前裸眼3D技术一般以视差作为切入点,在假设用户观察点基本不变的情况下,渲染虚拟空间图像,并通过各种手段使得用户观看该虚拟空间图像时,用户的双眼可以观察到不同的画面,以此达到仿佛看到了立体的虚拟空间的视错觉。例如,指向光源3D技术为一种较为常见的裸眼3D技术,图1为一种指向光源3D技术的示意图,参考图1,指向光源3D技术中电子设备的显示屏中搭配两组LED,通过快速反应的LCD面板和驱动方法,交替显示奇偶帧画面,并分别反射给用户的左眼和右眼,可以使得渲染后的图像内容以排序方式进入观看者的左右眼从而产生视差,使人眼观察到3D效果的图像。
图2为一种裸眼3D技术适用的场景示意图,参考图2,该场景包括电子设备20以及用户21,电子设备20包括显示屏。电子设备20可以对三维场景进行渲染得到图像,并将图像显示在显示屏上。此时用户可以观察到电子设备对某个虚拟的立体空间(即三维场景)渲染得到的、并在显示屏显示的目标图像。
电子设备20在对三维场景渲染时,需要确定渲染三维场景所需的视点位置,视点位置用于指示对三维场景进行渲染时虚拟相机的位置。例如,图3A为本申请实施例提供的一种三维场景示意图,参考图3A中标注出的视点位置和视窗位置,视点的位置可以认为是用户观察三维场景时所在的位置,视窗可以看作用户观察三维场景时的一个窗口,其中,视窗的位置可以用于指示渲染三维场景时近裁剪面的位置。当视点位置和视窗位置如图3A所示时,电子设备20对三维场景进行渲染得到图像,并在显示屏上显示该图像后,用户能够观察到的图像例如可以为图3B。可以看出,用户观察到的图像,为假设用户处于三维场景中的视点位置,通过视窗能够观察到的虚拟场景对应的图像。当然,图3B仅作为一种示例,具体实施中用户所观察到的裸眼3D效果更为立体真实。
现有的裸眼3D技术中,对三维场景进行图像渲染时,是基于视点不会发生变化的假设实现的。也就是说,现有在对三维场景进行渲染时,用于指示三维场景中虚拟相机位置的视点位置是预设的固定值。当显示屏显示渲染后的图像时,用户只能在固定的位置观看渲染后的图像才能观察到3D效果,若用户移动位置,则无法观察到3D效果,而造成异样体验。因此,目前的图像显示方法不够灵活。
基于以上问题,本申请实施例提供一种在显示屏上显示图像的方法,用以提供一种适应用户所处位置的三维图像显示方法,以提升用户体验。
图4为本申请实施例提供的一种在显示屏上显示图像的方法的流程图,本申请实施例提供的图像显示方法可以应用于图2所示场景中的电子设备,该电子设备可以具有显示屏,或者该电子设备可以将图像输出到与电子设备绑定的显示屏,以使该显示屏显示电子设备输出的图像。进一步的,该电子设备可以具有摄像头或麦克风阵列,或者,该电子设备可以与摄像头或麦克风阵列绑定,且电子设备与绑定的摄像头或麦克风阵列之间的位置关系是可以获取到的。
参考图4,本申请实施例提供的在显示屏上显示图像的方法包括以下步骤:
S401:电子设备确定目标用户与显示屏之间的第一相对位置,目标用户位于显示屏的 前方。
可选的,本申请实施例中目标用户例如可以为图2所示的场景中的用户21,目标用户位于显示屏的前方,便于用户观察到显示屏显示的图像的三维效果。
一种可选的实施方式中,目标用户与显示屏之间的第一相对位置可以包括目标用户的人脸相对于显示屏的目标方位角,进一步地,第一相对位置还可以包括目标用户的人脸相对于显示屏的目标距离。下面对本申请实施例中确定目标方位角和目标距离的方式分别进行介绍:
一、电子设备确定目标用户的人脸相对于显示屏的目标方位角。
可选的,目标用户的人脸相对于显示屏的目标方位角可以包括目标用户的人脸相对于显示屏的水平方位角和目标用户的人脸相对于显示屏的水平方位角。其中,目标用户的人脸相对于显示屏的水平方位角可以用于表示目标用户的人脸相对于显示屏在水平方向上的角度,目标用户的人脸相对于显示屏的垂直方位角可以用于表示目标用户的人脸相对于显示屏在垂直方向上的角度。
本申请实施例提供两种用于确定目标方位角的方式,下面对这两种确定目标方位角的方式进行介绍:
方式1、电子设备基于摄像头采集到的场景图像确定目标方位角。
当电子设备具有摄像头,或者电子设备与摄像头绑定时,电子设备可以获取摄像头拍摄的场景图像。摄像头的朝向与显示屏的朝向一致,摄像头拍摄到的场景图像包括显示屏前方预设范围的场景。当目标用户处于显示屏前方预设范围内时,摄像头即可拍摄到包括目标用户在内的场景图像。
例如,例如图5为一张场景图像的示意图,该场景图像中包含背景以及人脸。为保证电子设备渲染处理后的图像更适合当前处于显示屏前方预设范围内的用户进行观看,电子设备可以对摄像头采集到的场景图像进行检测,确定场景图像中的目标用户的人脸以及目标用户的人脸与显示屏之间的目标方位角。
一种可选的实施方式中,电子设备可以基于人脸检测算法,确定场景图像中的目标用户的人脸。具体来说,电子设备可以将场景图像作为人脸检测模型的输入,并获取人脸检测模型输出的目标用户的人脸在场景图像中的位置,其中,目标用户的人脸在场景图像中的位置具体可以为目标用户的人脸对应的检测框在场景图像中的位置坐标。
可选地,人脸检测模型为基于人脸检测算法以及人脸数据集训练后的模型。其中,人脸数据集中包括图像以及图像中人脸的位置。在对人脸检测模型进行训练时,可以将人脸数据集中的场景图像作为初始人脸检测模型的输入,获取初始人脸检测模块输出的预测人脸位置,根据损失函数计算预测人脸位置和实际图像中人脸位置之间的损失值,根据损失值调整初始人脸检测模型的参数,重复以上训练过程直至初始人脸检测模型对应的损失值收敛在预设范围内,则可以认为训练结束,得到人脸检测模型。
另外,电子设备还可以对场景图像中的目标用户的人眼进行检测,确定目标用户的人眼的位置,进而根据目标用户的人眼的位置确定目标用户的人脸的位置,例如电子设备可以将确定出的目标用户的人眼的位置作为目标用户的人脸的位置,同样可以实现对用户当前所处位置进行定位。可选地,人脸检测算法也可以用于确定场景图像中目标用户的人眼的位置,例如,电子设备可以将场景图像输入到训练后的人脸检测模型,并获取人脸检测模型输出的目标用户的人眼在场景图像中的位置。需要说明的是,在这种情况下,在对人 脸检测模型进行训练时,需要使用标注出人眼位置的数据集,并将人眼位置也作为人脸检测模型的一个输出值。当然,实施中也可以基于检测算法以及标注出人眼位置的数据集训练得到一个人眼检测模型,电子设备可以使用该人眼检测模型确定人眼位置。类似地,可以用于检测目标用户的人眼位置的人脸检测模型或人眼检测模型的训练方法均可以参见上述人脸检测模型的训练方法实施,本申请实施例对此不再赘述。
在确定出场景图像中的目标用户的人脸的位置之后,电子设备可以根据场景图像中目标用户的人脸在场景图像中的位置,确定目标用户的人脸与显示屏之间的目标方位角。
一种可选的实施方式中,电子设备可以根据场景图像中目标用户的人脸的位置,确定目标用户的人脸与摄像头之间的方位角。其中,目标用户与摄像头之间的方位角可以为目标用户的人脸与摄像头之间的目标连线与摄像头法向量之间的方位角,该方位角同样可以包括水平方位角和垂直方位角。
可选的,在确定目标用户的人脸与摄像头之间的目标连线与摄像头基准方位之间的方位角之前,需要确定摄像头坐标系与世界坐标系之间的转换关系,其中确定摄像头坐标系与世界坐标系之间的转换关系又称为对摄像头进行标定。具体来说,假设摄像头拍摄到的场景图像中物体在摄像头坐标系的位置与真实环境中物体在世界坐标系的位置之间的关系为:R=M*C,其中,R为真实环境中物体在世界坐标系的位置,C为物体在摄像头坐标系的位置,M为摄像头坐标系与世界坐标系之间的转换关系,M还可以理解为摄像头坐标系与世界坐标系之间的转换矩阵,M矩阵中的参数为摄像头参数,则求解M的过程为对摄像头进行标定的过程。进一步地,摄像头参数可以分为内参数和外参数,其中,内参数为镜头固有参数,如镜头中心位置(C x,C y)和焦距大小f x,f y,内参数均可以使用像素长度表示。外参数为摄像头位置参数,是摄像头坐标系与世界坐标系的刚性变换,具体可以为摄像头坐标系相对于世界坐标系的旋转量和平移量。基于上述介绍,摄像头坐标系与世界坐标系可以满足以下公式:
$$s\begin{bmatrix}u\\ v\\ 1\end{bmatrix}=\begin{bmatrix}f_x & 0 & C_x\\ 0 & f_y & C_y\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}R & T\end{bmatrix}\begin{bmatrix}x\\ y\\ z\\ 1\end{bmatrix}$$
其中,(u,v)为摄像头坐标系中目标点的坐标,(x,y,z)为目标点在世界坐标系中对应的坐标,
$$\begin{bmatrix}f_x & 0 & C_x\\ 0 & f_y & C_y\\ 0 & 0 & 1\end{bmatrix}$$
为摄像头参数中的内参数,
$$\begin{bmatrix}R & T\end{bmatrix}$$
为摄像头参数中的外参数。
使用测量得到的摄像头坐标系中目标点的坐标和该目标点在世界坐标系中的坐标代入公式进行求解后,可以得到摄像头坐标系与世界坐标系之间的转换关系M满足以下公式:
$$M=\begin{bmatrix}f_x & 0 & C_x\\ 0 & f_y & C_y\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}R & T\end{bmatrix}$$
在确定出摄像头坐标系与世界坐标系之间的转换关系后,可以根据目标用户的人脸在场景图像中的位置以及摄像头坐标系与世界坐标系之间的转换关系,确定目标用户的人脸与摄像头之间的目标连线的位置。可选地,假设目标用户的人脸在场景图像中的位置为F(u f,v f),则目标连线在世界坐标系中的表达式X满足下列公式:
F=M*X
其中,本申请实施例中可以使用一个点F(u f,v f)的位置代表目标用户的人脸在场景图像中的位置,该点可以为目标用户的人脸中的两眼的中点或者目标用户的人脸检测框的中 心点。
对该公式进行求解可以确定目标连线在世界坐标系中的表达式,进而可以确定出目标连线与摄像头的法向量之间的方位角,并将目标连线与摄像头的法向量之间的方位角作为目标用户的人脸与摄像头之间的方位角。若摄像头的法向量与显示屏所在平面垂直,如摄像头设置在显示屏所在平面上,则可以将目标用户的人脸与摄像头之间的方位角作为目标用户的人脸与显示屏之间的方位角。若摄像头的法向量与显示屏所在平面并不垂直,如摄像头的法向量与显示屏所在平面的法向量之间存在角度差,则电子设备可以根据目标用户的人脸与摄像头之间的方位角以及摄像头的法向量与显示屏所在平面的法向量之间的角度差确定目标用户的人脸与显示屏之间的目标方位角。
方式2、电子设备基于麦克风阵列采集到的声音的信息确定目标方位角。
当电子设备具有麦克风阵列,或者电子设备与麦克风阵列绑定时,电子设备可以获取麦克风阵列采集到的目标用户的声音的信息,并根据声源定位技术确定目标用户的人脸与显示屏之间的目标方位角。例如,图6为本申请实施例提供的一种声源定位技术的场景示意图。参考图6,电子设备可以具有麦克风阵列,目标用户在显示屏前方的预设范围内讲话,麦克风阵列可以采集目标用户的声音的信息。
电子设备获取到麦克风阵列采集的目标用户的声音的信息,通过麦克风阵列中多个麦克风采集声音的时延,确定目标用户的人脸相对于麦克风阵列的方位角。例如,图7为本申请实施例提供的一种基于麦克风阵列的声源定位技术示意图。参考图7,图7中示出麦克风阵列包括六个麦克风(MIC1、MIC2、MIC3、MIC4、MIC5和MIC6),目标用户发出声音后,这六个麦克风同时采集目标用户的声音,由于不同麦克风与声源的距离不同,不同麦克风采集到声音的时延也不一致。电子设备可以根据不同麦克风采集声音的时延,估计不同麦克风与声源的距离差。例如图7中MIC1和MIC2与声源的距离差为d cosθ,根据MIC1和MIC2之间实际安装距离,可以求出目标用户的人脸与麦克风阵列之间的水平方位角θ。同样的,电子设备也可以根据上述方法确定目标用户的人脸与麦克风阵列之间的垂直方位角。
一种可选的实施方式中,若麦克风阵列所在平面与显示屏所在平面平行,则可以将目标用户的人脸与麦克风阵列之间的方位角作为目标用户与显示屏之间的目标方位角;若麦克风阵列所在平面与显示屏所在平面不平行,则可以根据目标用户的人脸与麦克风阵列之间的方位角以及麦克风阵列所在平面与显示屏所在平面的夹角确定目标用户的人脸与显示屏之间的目标方位角。
二、电子设备确定目标用户的人脸相对于显示屏的目标距离。
一种可选的实施方式中,电子设备可以基于单目深度估计算法对摄像头采集到的场景图像中的目标用户的人脸进行深度估计,确定目标用户的人脸相对于显示屏的目标距离。在该方法中,电子设备可以对场景图像进行人脸检测,具体实施可以参加上述实施例中介绍的人脸检测方法,此处不再赘述。
具体实施中,电子设备可以将场景图像作为单目深度估计模型的输入,并获取单目深度估计模型输出的目标用户的人脸的深度信息,该深度信息可以作为目标距离。
其中,单目深度估计模型是基于单目深度估计算法和深度图像数据集进行训练得到的,可以确定图像的深度信息的深度学习模型。具体来说,深度图像数据集中包括图像以及图像包含的物体的深度信息。在对单目深度估计模型进行训练时,可以将深度图像数据集中 的图像作为初始单目深度估计模型的输入,获取初始单目深度估计模型输出的预测深度信息,根据损失函数计算预测深度信息和实际深度信息之间的损失值,根据损失值调整初始单目深度估计模型的参数,重复以上训练过程直至初始单目深度估计模型对应的损失值收敛在预设范围内,则可以认为训练结束,得到单目深度估计模型。
可以理解的是,对单目深度估计模型进行训练时,也可以将包含人脸的深度图像中的人脸区域作为初始单目深度估计模型的输入进行训练,从而电子设备在基于单目深度估计模型确定目标距离时,也可以将目标用户的人脸作为单目深度估计模型的输入,并获取单目深度估计模型输出的目标用户的人脸的深度信息,并将目标用户的人脸的深度信息作为目标距离。
可选的,若摄像头未设置在显示屏所在的平面,则在获取到目标用户的人脸的深度信息后,可以根据该深度信息和摄像头与显示屏所在平面之间的距离确定目标用户的人脸相对于显示屏之间的目标距离。
需要说明的是,在上述实施方式中,摄像头的法向量与显示屏所在平面的法向量之间的角度差、麦克风阵列所在平面与显示屏所在平面的夹角以及摄像头与显示屏所在平面之间的距离为电子设备的属性参数,可以预存在电子设备中。
S402:电子设备根据所述第一相对位置确定第一视点位置,所述第一视点位置用于指示对三维场景进行渲染时的虚拟相机的位置。
可选的,第一视点位置为基于当前目标用户的人脸的位置确定出的,用于对三维场景进行渲染时的视点的位置。
一种可选的实施方式中,电子设备可以获取视窗的位置,其中,视窗的位置可以为根据显示渲染后的目标图像的场景预先设定的,例如,视窗的位置可以为根据显示目标图像的场景设定的固定位置,或者视窗的位置也可以根据实际显示目标图像的场景变化。电子设备在获取到视窗的位置之后,根据第一相对位置确定对三维场景进行渲染时的第一视点与视窗之间的相对位置,再根据第一视点和视窗之间的相对位置以及视窗的位置,确定第一视点的位置。
举例来说,电子设备可以将第一相对位置作为第一视点与视窗之间的相对位置,也就是说,此时第一视点与视窗之间的相对位置即为目标用户的人脸与显示屏之间的相对位置,则目标用户在当前位置即可通过显示屏观察到目标图像的三维效果。
在本申请一些实施例中,在对三维场景进行渲染之前,电子设备还可以获取视窗的尺寸。其中,视窗的尺寸也可以是基于显示目标图像的场景进行设定的参数。电子设备在获取到视窗的尺寸后,可以根据视窗尺寸和显示屏的实际尺寸确定三维场景与物理世界的比例关系。例如,当视窗的尺寸与显示屏的实际尺寸相同时,三维场景与物理世界的比例即为1:1;又例如当视窗的尺寸与显示屏的实际尺寸的比例为1:2,则三维场景与物理世界的比例为2:1。电子设备可以根据三维场景与物理实际的比例关系、第一相对位置确定视点和视窗之间的相对位置,例如,当比例关系为2:1时,视点和视窗的相对位置中各个参数的值可以为第一相对位置中各个参数的值的2倍。
S403:电子设备根据第一视点位置渲染三维场景得到第一目标图像,并在显示屏上显示第一目标图像。
可以理解的是,电子设备根据第一视点位置对三维场景进行渲染后得到的目标图像更适合用户观察到三维效果,该目标图像对应的三维场景中视点位置用户当前所处的位置匹配,从而无需用户寻找能够观察到三维场景的视点位置,而在用户当前所处位置即可观察到三维效果。
可选地,本申请实施例中的渲染处理可以由电子设备中的渲染器执行。
在本申请一些实施例中,电子设备在显示屏上显示目标图像后,还可以基于图4所示实施例提供的显示图像方法再次确定目标用户与显示屏之间的第二相对位置,根据第二相对位置确定第二视点位置。电子设备基于第二视点位置渲染三维场景后得到第二目标图像,并在显示屏上显示第二目标图像。通过该设计,可以对电子设备渲染三维场景时的视点位置进行实时调整,从而可以跟随用户移动位置调整渲染三维场景视点位置,而不会造成无法观察到3D效果的问题。
一种可选的实施方式中,电子设备在显示目标图像过程中,若未在场景图像中检测到人脸,则可以显示待机画面。可选地,还可以在待机画面中显示倒计时动画,提醒用户在倒计时结束后,电子设备将退出裸眼3D模式,若用户在倒计时结束前回到摄像头检测范围内,则电子设备继续显示目标图像,若倒计时结束仍未检测到目标用户的人脸,则电子设备退出裸眼3D模式。可选地,电子设备显示倒计时动画时,可以同时显示提醒当前有多个人脸处于检测范围内的提醒消息,例如图8为本申请实施例提供的一种电子设备的显示界面示意图,电子设备可以在显示屏上显示倒计时动画,同时显示“当前检测到多张人脸,请保持单人处于检测范围”的提醒消息。
通过本申请实施例提供的在显示屏上显示图像的方法,电子设备可以确定目标用于与显示屏之间的第一相对位置。在确定出目标用户的第一相对位置后,可以根据第一相对位置确定渲染三维场景时所需的第一视点位置,进而保证确定出的视点位置与当前目标用户的位置匹配。电子设备根据第一视点位置渲染三维场景得到第一目标图像,并在显示屏上显示第一目标图像,根据确定出的第一视点位置进行渲染后的第一目标图像更适合用户观察到三维效果,提升用户体验。并且在图像显示过程中,可以根据本申请实施例提供的在显示屏上显示图像的方法,随着用户移动实时更新视点位置,无需用户在固定视点位置观看3D图像,提供一种灵活的显示图像方法。
一种可选的实施方式中,在图4所示的显示图像方法的S401中电子根据摄像头采集到的场景图像确定目标用户的人脸与显示屏之间的目标方位角或目标距离时,当电子设备在对场景图像中的目标用户的人脸进行检测时,还可以根据以下方式提升目标用户的人脸检测的效率:
方式1、电子设备确定场景图像中的背景,从而在获取到场景图像后,与确定出的背景进行对比,以确定目标用户的人脸的位置。
一种可能的场景中,用于显示目标图像的电子设备一般会放置在一个较为固定的位置,则电子设备的摄像头拍摄到的场景图像的背景极少变化,或者可以认为拍摄到的场景图像的背景基本不变,则此时摄像头拍摄到的场景图像相对于背景发生变化的区域有可能为目标用户的人脸的位置,则可以对场景图像相对于背景发生变化的区域进行人脸检测,以提高人脸检测的效率。
方式2、电子设备根据存储的目标用户的人脸的历史位置信息确定目标用户的人脸的移动信息,移动信息用于指示目标用户的人脸发生移动时的速度和加速度。电子设备根据目标用户的人脸的最近一次的历史位置信息以及移动信息,基于卡尔曼算法预测目标用户的人脸在场景图像中的预测位置,并对场景图像中预测位置对应的区域进行人脸检测,确定场景图像中的目标用户的人脸。
一种可选的实施方式中,在本申请实施例提供的图像显示方法中,电子设备可以多次获取摄像头拍摄的场景图像,并确定场景图像中的目标用户的人脸,电子设备可以存储目标用户的人脸在多个场景图像中的位置信息作为目标用户的人脸的历史位置信息。在电子设备存储有目标用户的人脸的历史位置信息时,电子设备可以根据目标用户的人脸的历史位置信息确定目标用户的人脸的移动信息。例如,电子设备可以根据最后三次的目标用户的人脸的历史位置信息确定目标用户的人脸的移动信息,其中目标用户的人脸的移动信息可以包括目标用户的人脸发生移动时的速度和加速度。电子设备可以根据目标用户的人脸的最近一次的历史位置信息以及移动信息,基于卡尔曼算法预测目标用户的人脸在场景图像中的预测位置,在得到目标用户的人脸在场景图像中的预测位置后,电子设备可以在场景图像中预测位置对应的区域进行人脸检测,确定场景图像中的目标用户的人脸。通过该方式,可以优化人脸检测时的搜索空间,提升人脸检测的效率。
其中,卡尔曼算法又称为卡尔曼滤波算法,卡尔曼算法可以在已知测量方差的情况下,根据测量数据对动态系统的状态进行估计。在本申请实施例中,电子设备中可以存储有预设的协方差矩阵,电子设备在确定出目标用户的人脸的移动信息后,可以根据目标用户的人脸的最后一次的历史位置信息、移动信息以及协方差矩阵,基于卡尔曼算法估计目标用户的人脸在场景图像中的预测位置并更新协方差矩阵。
可选地,电子设备对场景图像中预测位置对应的区域进行人脸检测,确定出目标用户的人脸以及目标用户的人脸在场景图像中的位置后,电子设备可以根据本次确定出的目标用户的人脸的位置更新目标用户的人脸的移动信息,如更新目标用户的人脸发送移动时的加速度和速度。电子设备根据目标用户的人脸的预测位置以及人脸检测后确定出的目标用户的人脸的位置计算测试余量和卡尔曼增益,根据测试余量和卡尔曼增益修正下一次预测目标用户的人脸时的预测位置,从而得到更加准确的估计值。
方式3、电子设备在对场景图像中的目标用户的人脸进行检测时,可以对连续多帧场景图像进行人脸检测,若检测到目标用户的人脸的场景图像的帧数大于预设阈值时,可以认为检测到的目标用户的人脸,再确定目标用户的人脸的位置。通过该方式,可以避免错误检测到人脸的情况,保证人脸检测的准确性。
通过前述介绍可知,电子设备在显示目标图像的过程中,若未检测到目标用户的人脸,则显示待机画面,在该场景中,同样可以参考方式3,具体来说,若电子设备未检测到目标用户的人脸的场景图像的帧数大于预设阈值时,可以认为当前未检测到目标用户的人脸。
另外,若显示屏前有多个用户时,摄像头采集到的场景图像中可能包括多个人脸,此时电子设备还可以通过以下方式确定目标用户:
方式1、电子设备接收用户触发的选择指令,将选择指令对应的人脸图像所述的用户作为目标用户。
可选地,电子设备在检测到多个人脸时,可以显示提醒用户选择目标用户的人脸的信 息。用户可以通过触摸屏幕触发选择指令,电子设备在接收到选择指令后,可以将选择指令对应的位置的人脸所属的用户作为目标用户;或者电子设备可以将场景图像中的多个人脸进行编号,用户通过音频输入触发选择指令,该选择指令中可以包括目标用户的人脸对应的编号,电子设备在接收到选择指令后,可以将选择指令中的编号对应的人脸所属的用户作为目标用户。
举例来说,图9为一种场景图像中包括多个人脸的示意图,图9中以场景图像中包括人脸A、人脸B和人脸C为例,用户可以选择其中一个人脸以触发选择指令,如用户选择人脸A,电子设备可以将用户选择的人脸A所属的用户作为目标用户。
当然,本申请实施例对用户触发选择指令的方式并不作限定,例如用户还可以通过电子设备的控制装置触发选择指令等。
方式2、电子设备将多个用户中距离显示屏最近的用户作为目标用户。
可选地,电子设备在确定场景图像中包括多个人脸时,可以分别确定每个人脸与显示屏之间的距离,并将距离最近的人脸所属的用户作为目标用户。其中,电子设备确定每个人脸与显示屏之间的距离的方式可以参见S401中电子设备基于单目深度估计算法确定目标距离的方式实施,此处不再赘述。
方式3、电子设备将多个用户中人脸侧向于显示屏的角度最小的用户作为目标用户。
可选地,电子设备在确定场景图像中包括多个人脸时,可以确定每个人脸所在平面相对于显示屏的平面之间的旋转角度,并将其中旋转角度最小的人脸所属的用户作为目标用户。
方式4、电子设备将多个用户中使用频率最高的用户作为目标用户。
本申请实施例中电子设备可以将使用频率较高的用户的人脸以及该用户的使用频率作为常用用户的保存到本地存储中。在确定场景图像中包括多个人脸时,可以对多个人脸分别与常用用户的人脸进行匹配,若匹配成功,则可以将其中识别为常用用户且使用频率最高的用户作为目标用户。
下面以两个具体示例对本申请实施例提供的在显示屏上显示图像的方法进行进一步介绍。在以下两个示例中,目标用户相对于显示屏的第一相对位置包括目标用户的人脸相对于显示屏的目标方位角和目标用户的人脸相对于显示屏的目标距离。
示例一
图10为本申请实施例提供的第一种在显示屏上显示图像的方法的流程图,参考图10,该方法包括以下步骤:
S1001:电子设备获取摄像头针对当前场景拍摄的场景图像。
S1002:电子设备基于人脸检测算法,确定场景图像中的目标用户的人脸。
S1003:电子设备根据目标用户的人脸在场景图像中的位置确定目标用户的人脸相对于显示屏的目标方位角。
S1004:电子设备对目标用户的人脸进行深度估计,确定目标用户的人脸相对于显示屏之间的目标距离。
S1005:电子设备获取视窗的位置。
S1006:电子设备将目标方位角和目标距离作为第一相对位置,根据第一相对位置确定对三维场景进行渲染时的第一视点与视窗之间的相对位置。
S1007:电子设备根据第一视点和视窗之间的相对位置以及视窗的位置,确定第一视点的位置。
S1008:电子设备根据第一视点位置渲染三维场景得到第一目标图像,并在显示屏上显示第一目标图像。
示例二
图11为本申请实施例提供的第二种在显示屏上显示图像的方法的流程图,参考图11,该方法包括以下步骤:
S1101:电子设备获取麦克风阵列采集的到的目标用户的声音的信息。
S1102:电子设备根据目标用户的声音的信息确定目标用户的人脸相对于显示屏的目标方位角。
S1103:电子设备获取摄像头针对当前场景拍摄的场景图像。
S1104:电子设备基于人脸检测算法,确定场景图像中的目标用户的人脸。
S1105:电子设备对目标用户的人脸进行深度估计,确定目标用户的人脸相对于显示屏之间的目标距离。
S1106:电子设备获取视窗的位置。
S1107:电子设备将目标方位角和目标距离作为第一相对位置,根据第一相对位置确定对三维场景进行渲染时的第一视点与视窗之间的相对位置。
S1108:电子设备根据第一视点和视窗之间的相对位置以及视窗的位置,确定第一视点的位置。
S1109:电子设备根据第一视点位置渲染三维场景得到第一目标图像,并在显示屏上显示第一目标图像。
基于相同的技术构思,本申请还提供了一种图像显示装置1200,该图像显示装置1200可以应用于图2所示场景中的电子设备20,以实现图4所示的在显示屏上显示图像的方法中电子设备的功能。图12为本申请实施例提供的一种图像显示装置1200的结构示意图,所述图像显示装置1200包括处理单元1201、渲染单元1202和显示单元1203。下面对图像显示装置1200中的各个单元的功能进行介绍。
处理单元1201,用于确定目标用户与显示屏之间的第一相对位置,所述目标用户位于所述显示屏的前方;根据所述第一相对位置确定第一视点位置,所述第一视点位置用于指示对三维场景进行渲染时的虚拟相机的位置;
渲染单元1202,用于根据所述第一视点位置渲染所述三维场景得到第一目标图像;
显示单元1203,用于在所述显示屏上显示所述第一目标图像。
在一种实施方式中,所述处理单元1201还用于:获取视窗的位置,所述视窗的位置用于指示渲染所述三维场景时近裁剪面的位置;
所述处理单元1201具体用于:根据所述第一相对位置确定对所述三维场景进行渲染时的第一视点与视窗之间的相对位置;根据所述第一视点和视窗之间的相对位置以及所述视窗的位置,确定所述第一视点的位置。
在一种实施方式中,所述处理单元1201还用于:在所述渲染单元得到所述第一目标图像之后,确定所述目标用户与所述显示屏之间的第二相对位置;根据所述第二相对位置 确定第二视点位置;所述第二相对位置与所述第一相对位置不同,所述第二视点位置与所述第一视点位置不同;根据所述第二视点位置渲染所述三维场景得到第二目标图像,并在所述显示屏上显示所述第二目标图像。
在一种实施方式中,所述目标用户与所述显示屏之间的第一相对位置,包括:所述目标用户的人脸相对于所述显示屏的目标方位角。
在一种实施方式中,所述处理单元1201具体用于:获取摄像头拍摄的场景图像,所述场景图像包括所述显示屏前方预设范围的场景;根据所述场景图像中所述目标用户的人脸在所述场景图像的位置,确定所述第一相对位置。
在一种实施方式中,所述处理单元1201具体用于:基于麦克风阵列采集到的目标用户的声音的信息,对所述目标用户进行声源定位,得到所述第一相对位置。
在一种实施方式中,所述目标用户与所述显示屏之间的第一相对位置,还包括:所述目标用户的人脸相对于所述显示屏的目标距离;
所述处理单元1201还用于:对所述目标用户的人脸进行深度估计,确定所述目标距离。
在一种实施方式中,所述处理单元1201还用于:在所述确定目标用户与所述显示屏之间的第一相对位置之前,在判断所述显示屏前有多个用户时,从所述多个用户中确定所述目标用户。
在一种实施方式中,所述处理单元1201具体用于:在所述显示屏上显示所述多个用户的人脸图像,接收选择指令,将所述选择指令对应的人脸图像所属的用户作为所述目标用户;或者将所述多个用户中距离所述显示屏最近的用户作为所述目标用户;或者将所述多个用户中人脸侧向于所述显示屏的角度最小的用户作为所述目标用户;或者将所述多个用户中,使用频率最高的用户作为所述目标用户。
关于图像显示装置1200所能实现的其他功能,可参考图4所示的实施例的相关介绍,不多赘述。
基于相同的技术构思,本申请还提供了一种电子设备1300,图13为本申请实施例提供的一种电子设备1300的结构示意图,所述电子设备1300可以用于实现图4所示的实施例中电子设备的功能。参阅图13所示,所述电子设备1300包括:显示屏1301、处理器1302、存储器1303和总线1304。进一步地,电子设备1300还可以包括摄像头1305和麦克风阵列1306、其中,显示屏1301、处理器1302、存储器1303、摄像头1305和麦克风阵列1306通过总线1304进行通信,也可以通过无线传输等其他手段实现通信。该存储器1303存储程序代码,且处理器1302可以调用存储器1303中存储的程序代码执行以下操作:
确定目标用户与所述显示屏1301之间的第一相对位置,所述目标用户位于所述显示屏1301的前方;根据所述第一相对位置确定第一视点位置,所述第一视点位置用于指示对三维场景进行渲染时的虚拟相机的位置;根据所述第一视点位置渲染所述三维场景得到第一目标图像,并在所述显示屏1301上显示所述第一目标图像。
在一种实施方式中,所述处理器1302还用于:获取视窗的位置,所述视窗的位置用于指示渲染所述三维场景时近裁剪面的位置;
所述处理器1302具体用于:根据所述第一相对位置确定对所述三维场景进行渲染时的第一视点与视窗之间的相对位置;根据所述第一视点和视窗之间的相对位置以及所述视 窗的位置,确定所述第一视点的位置。
在一种实施方式中,所述处理器1302还用于:在所述渲染单元得到所述第一目标图像之后,确定所述目标用户与所述显示屏1301之间的第二相对位置;根据所述第二相对位置确定第二视点位置;所述第二相对位置与所述第一相对位置不同,所述第二视点位置与所述第一视点位置不同;根据所述第二视点位置渲染所述三维场景得到第二目标图像,并在所述显示屏1301上显示所述第二目标图像。
在一种实施方式中,所述目标用户与所述显示屏1301之间的第一相对位置,包括:所述目标用户的人脸相对于所述显示屏1301的目标方位角。
在一种实施方式中,所述处理器1302具体用于:获取摄像头1305拍摄的场景图像,所述场景图像包括所述显示屏1301前方预设范围的场景;根据所述场景图像中所述目标用户的人脸在所述场景图像的位置,确定所述第一相对位置。
在一种实施方式中,所述处理器1302具体用于:基于麦克风阵列1306采集到的目标用户的声音的信息,对所述目标用户进行声源定位,得到所述第一相对位置。
在一种实施方式中,所述目标用户与所述显示屏1301之间的第一相对位置,还包括:所述目标用户的人脸相对于所述显示屏1301的目标距离;
所述处理器1302还用于:对所述目标用户的人脸进行深度估计,确定所述目标距离。
在一种实施方式中,所述处理器1302还用于:在所述确定目标用户与所述显示屏1301之间的第一相对位置之前,在判断所述显示屏1301前有多个用户时,从所述多个用户中确定所述目标用户。
在一种实施方式中,所述处理器1302具体用于:在所述显示屏1301上显示所述多个用户的人脸图像,接收选择指令,将所述选择指令对应的人脸图像所属的用户作为所述目标用户;或者将所述多个用户中距离所述显示屏1301最近的用户作为所述目标用户;或者将所述多个用户中人脸侧向于所述显示屏1301的角度最小的用户作为所述目标用户;或者将所述多个用户中,使用频率最高的用户作为所述目标用户
可以理解,本申请图13中的存储器1304可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。应注意,本文描述的系统和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
基于以上实施例,本申请实施例还提供了一种计算机程序,当所述计算机程序在计算机上运行时,使得所述计算机执行图4所示的实施例提供的在显示屏上显示图像的方法。
基于以上实施例,本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有计算机程序,所述计算机程序被计算机执行时,使得计算机执行图4所示 的实施例提供的在显示屏上显示图像的方法。其中,存储介质可以是计算机能够存取的任何可用介质。以此为例但不限于:计算机可读介质可以包括RAM、ROM、EEPROM、CD-ROM或其他光盘存储、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质。
基于以上实施例,本申请实施例还提供了一种芯片,所述芯片用于读取存储器中存储的计算机程序,实现图4所示的实施例提供的在显示屏上显示图像的方法。
基于以上实施例,本申请实施例提供了一种芯片系统,该芯片系统包括处理器,用于支持计算机装置实现图4所示的实施例提供的在显示屏上显示图像的方法。在一种可能的设计中,所述芯片系统还包括存储器,所述存储器用于保存该计算机装置必要的程序和数据。该芯片系统,可以由芯片构成,也可以包含芯片和其他分立器件。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的保护范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (21)

  1. 一种在显示屏上显示图像的方法,其特征在于,所述方法包括:
    确定目标用户与所述显示屏之间的第一相对位置,所述目标用户位于所述显示屏的前方;
    根据所述第一相对位置确定第一视点位置,所述第一视点位置用于指示对三维场景进行渲染时的虚拟相机的位置;
    根据所述第一视点位置渲染所述三维场景得到第一目标图像,并在所述显示屏上显示所述第一目标图像。
  2. 如权利要求1所述的方法,其特征在于,所述方法还包括:获取视窗的位置,所述视窗的位置用于指示渲染所述三维场景时近裁剪面的位置;
    所述根据所述第一相对位置确定第一视点位置,包括:
    根据所述第一相对位置确定对所述三维场景进行渲染时的第一视点与视窗之间的相对位置;
    根据所述第一视点和视窗之间的相对位置以及所述视窗的位置,确定所述第一视点的位置。
  3. 如权利要求1或2所述的方法,其特征在于,在得到所述第一目标图像之后,所述方法还包括:
    确定所述目标用户与所述显示屏之间的第二相对位置;
    根据所述第二相对位置确定第二视点位置;所述第二相对位置与所述第一相对位置不同,所述第二视点位置与所述第一视点位置不同;
    根据所述第二视点位置渲染所述三维场景得到第二目标图像,并在所述显示屏上显示所述第二目标图像。
  4. 如权利要求1-3任一所述的方法,其特征在于,所述目标用户与所述显示屏之间的第一相对位置,包括:所述目标用户的人脸相对于所述显示屏的目标方位角。
  5. 如权利要求4所述的方法,其特征在于,所述确定目标用户与所述显示屏之间的第一相对位置,包括:
    获取摄像头拍摄的场景图像,所述场景图像包括所述显示屏前方预设范围的场景;
    根据所述场景图像中所述目标用户的人脸在所述场景图像的位置,确定所述第一相对位置。
  6. 如权利要求4所述的方法,其特征在于,所述确定目标用户与所述显示屏之间的第一相对位置,包括:
    基于麦克风阵列采集到的目标用户的声音的信息,对所述目标用户进行声源定位,得到所述第一相对位置。
  7. 如权利要求4-6任一所述的方法,其特征在于,所述目标用户与所述显示屏之间的第一相对位置,还包括:所述目标用户的人脸相对于所述显示屏的目标距离;
    所述确定所述目标用户与所述显示屏之间的第一相对位置,还包括:对所述目标用户的人脸进行深度估计,确定所述目标距离。
  8. 如权利要求1-7任一所述的方法,其特征在于,在所述确定目标用户与所述显示屏之间的第一相对位置之前,所述方法还包括:
    在判断所述显示屏前有多个用户时,从所述多个用户中确定所述目标用户。
  9. 如权利要求8所述的方法,其特征在于,所述从所述多个用户中确定所述目标用户,包括:
    在所述显示屏上显示所述多个用户的人脸图像,接收选择指令,将所述选择指令对应的人脸图像所属的用户作为所述目标用户;或者
    将所述多个用户中距离所述显示屏最近的用户作为所述目标用户;或者
    将所述多个用户中人脸侧向于所述显示屏的角度最小的用户作为所述目标用户;或者
    将所述多个用户中,使用频率最高的用户作为所述目标用户。
  10. 一种图像显示装置,其特征在于,所述装置包括处理单元、渲染单元和显示单元;
    所述处理单元,用于确定目标用户与显示屏之间的第一相对位置,所述目标用户位于所述显示屏的前方;根据所述第一相对位置确定第一视点位置,所述第一视点位置用于指示对三维场景进行渲染时的虚拟相机的位置;
    所述渲染单元,用于根据所述第一视点位置渲染所述三维场景得到第一目标图像;
    所述显示单元,用于在所述显示屏上显示所述第一目标图像。
  11. 如权利要求10所述的装置,其特征在于,所述处理单元还用于:获取视窗的位置,所述视窗的位置用于指示渲染所述三维场景时近裁剪面的位置;
    所述处理单元具体用于:根据所述第一相对位置确定对所述三维场景进行渲染时的第一视点与视窗之间的相对位置;根据所述第一视点和视窗之间的相对位置以及所述视窗的位置,确定所述第一视点的位置。
  12. 如权利要求10或11所述的装置,其特征在于,所述处理单元还用于:
    在所述渲染单元得到所述第一目标图像之后,确定所述目标用户与所述显示屏之间的第二相对位置;
    根据所述第二相对位置确定第二视点位置;所述第二相对位置与所述第一相对位置不同,所述第二视点位置与所述第一视点位置不同;
    根据所述第二视点位置渲染所述三维场景得到第二目标图像,并在所述显示屏上显示所述第二目标图像。
  13. 如权利要求10-12任一项所述的装置,其特征在于,所述目标用户与所述显示屏之间的第一相对位置,包括:所述目标用户的人脸相对于所述显示屏的目标方位角。
  14. 如权利要求13所述的装置,其特征在于,所述处理单元具体用于:
    获取摄像头拍摄的场景图像,所述场景图像包括所述显示屏前方预设范围的场景;
    根据所述场景图像中所述目标用户的人脸在所述场景图像的位置,确定所述第一相对位置。
  15. 如权利要求13所述的装置,其特征在于,所述处理单元具体用于:
    基于麦克风阵列采集到的目标用户的声音的信息,对所述目标用户进行声源定位,得到所述第一相对位置。
  16. 如权利要求13-15任一项所述的装置,其特征在于,所述目标用户与所述显示屏之间的第一相对位置,还包括:所述目标用户的人脸相对于所述显示屏的目标距离;
    所述处理单元还用于:对所述目标用户的人脸进行深度估计,确定所述目标距离。
  17. 如权利要求10-16任一项所述的装置,其特征在于,所述处理单元还用于:
    在所述确定目标用户与所述显示屏之间的第一相对位置之前,在判断所述显示屏前有 多个用户时,从所述多个用户中确定所述目标用户。
  18. 如权利要求17所述的装置,其特征在于,所述处理单元具体用于:
    在所述显示屏上显示所述多个用户的人脸图像,接收选择指令,将所述选择指令对应的人脸图像所属的用户作为所述目标用户;或者
    将所述多个用户中距离所述显示屏最近的用户作为所述目标用户;或者
    将所述多个用户中人脸侧向于所述显示屏的角度最小的用户作为所述目标用户;或者
    将所述多个用户中,使用频率最高的用户作为所述目标用户。
  19. 一种电子设备,其特征在于,包括显示屏、处理器和存储器;所述存储器中存储计算机程序指令,所述电子设备运行时,所述处理器执行所述存储器中存储的所述计算机程序指令以实现上述权利要求1至9中任一所述的方法的操作步骤。
  20. 一种计算机可读存储介质,其特征在于,包括计算机指令,当所述计算机指令在被处理器运行时,使得电子设备执行如权利要求1至9任一项所述的方法。
  21. 一种计算机程序产品,其特征在于,当所述计算机程序产品在处理器上运行时,使得电子设备执行如权利要求1至9任一项所述的方法。
PCT/CN2022/112819 2021-09-18 2022-08-16 一种在显示屏上显示图像的方法、电子设备与装置 WO2023040551A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111113031.4A CN115840546A (zh) 2021-09-18 2021-09-18 一种在显示屏上显示图像的方法、电子设备与装置
CN202111113031.4 2021-09-18

Publications (3)

Publication Number Publication Date
WO2023040551A1 WO2023040551A1 (zh) 2023-03-23
WO2023040551A9 true WO2023040551A9 (zh) 2023-08-31
WO2023040551A8 WO2023040551A8 (zh) 2023-11-09

Family

ID=85574500

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/112819 WO2023040551A1 (zh) 2021-09-18 2022-08-16 一种在显示屏上显示图像的方法、电子设备与装置

Country Status (2)

Country Link
CN (1) CN115840546A (zh)
WO (1) WO2023040551A1 (zh)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102640502B (zh) * 2009-10-14 2015-09-23 诺基亚公司 自动立体渲染和显示装置
WO2013163287A1 (en) * 2012-04-25 2013-10-31 Visual Physics, Llc Security device for projecting a collection of synthetic images
CN106710002A (zh) * 2016-12-29 2017-05-24 深圳迪乐普数码科技有限公司 基于观察者视角定位的ar实现方法及其系统
KR102397089B1 (ko) * 2017-07-28 2022-05-12 삼성전자주식회사 이미지 처리 방법 및 이미지 처리 장치
KR102447101B1 (ko) * 2017-09-12 2022-09-26 삼성전자주식회사 무안경 3d 디스플레이를 위한 영상 처리 방법 및 장치
CN109769111A (zh) * 2018-11-22 2019-05-17 利亚德光电股份有限公司 图像显示方法、装置、系统、存储介质和处理器
US10839594B2 (en) * 2018-12-11 2020-11-17 Canon Kabushiki Kaisha Method, system and apparatus for capture of image data for free viewpoint video
CN112967390B (zh) * 2019-11-30 2022-02-25 北京城市网邻信息技术有限公司 场景切换方法及装置、存储介质

Also Published As

Publication number Publication date
WO2023040551A1 (zh) 2023-03-23
CN115840546A (zh) 2023-03-24
WO2023040551A8 (zh) 2023-11-09

Similar Documents

Publication Publication Date Title
CN110402415A (zh) 记录增强现实数据的技术
JPWO2016203792A1 (ja) 情報処理装置、情報処理方法及びプログラム
WO2013155217A1 (en) Realistic occlusion for a head mounted augmented reality display
US20190042834A1 (en) Methods and apparatus for real-time interactive anamorphosis projection via face detection and tracking
US9681122B2 (en) Modifying displayed images in the coupled zone of a stereoscopic display based on user comfort
US20120120071A1 (en) Shading graphical objects based on face images
US11720996B2 (en) Camera-based transparent display
US20190043245A1 (en) Information processing apparatus, information processing system, information processing method, and program
JP2012079291A (ja) プログラム、情報記憶媒体及び画像生成システム
US11069137B2 (en) Rendering captions for media content
US20220398705A1 (en) Neural blending for novel view synthesis
CN112655202B (zh) 用于头戴式显示器的鱼眼镜头的减小带宽立体失真校正
WO2021124920A1 (ja) 情報処理装置、情報処理方法、および記録媒体
CN110969706B (zh) 增强现实设备及其图像处理方法、系统以及存储介质
US20220036779A1 (en) Information processing apparatus, information processing method, and recording medium
CN113870213A (zh) 图像显示方法、装置、存储介质以及电子设备
WO2023040551A9 (zh) 一种在显示屏上显示图像的方法、电子设备与装置
KR102197504B1 (ko) 사전 계산된 조명으로 증강 현실 환경을 구성하는 기법
US20230396750A1 (en) Dynamic resolution of depth conflicts in telepresence
US20190089899A1 (en) Image processing device
US20230316810A1 (en) Three-dimensional (3d) facial feature tracking for autostereoscopic telepresence systems
CN114020150A (zh) 图像显示方法、装置、电子设备及介质
WO2021065607A1 (ja) 情報処理装置および方法、並びにプログラム
US20240078743A1 (en) Stereo Depth Markers
US20220232201A1 (en) Image generation system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22868927

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE