WO2023226628A1 - Image display method and apparatus, electronic device, and storage medium - Google Patents

Image display method and apparatus, electronic device, and storage medium

Info

Publication number
WO2023226628A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
background
target
coordinate system
foreground
Prior art date
Application number
PCT/CN2023/089010
Other languages
English (en)
French (fr)
Inventor
焦少慧 (Jiao Shaohui)
Original Assignee
北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202210575768.6A (granted as CN115002442B)
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Publication of WO2023226628A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/275Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • H04N13/279Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals the virtual viewpoint locations being selected by the viewers or determined by tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/10Geometric effects
    • G06T15/20Perspective computation
    • G06T15/205Image-based rendering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/282Image signal generators for generating image signals corresponding to three or more geometrical viewpoints, e.g. multi-view systems

Definitions

  • The embodiments of the present disclosure relate to the field of data processing technology, for example, to an image display method and apparatus, an electronic device, and a storage medium.
  • Free-view video is a popular video format. By offering users the ability to interactively select the viewing angle, it gives fixed two-dimensional (2D) video a "the scenery changes as you move" viewing experience, bringing users a strong sense of stereoscopic impact.
  • At present, free-view videos are mainly displayed by building a dedicated interactive player. The player can be presented to the user as a slider, so that the user can watch the video from different viewing angles by dragging the slider.
  • However, this approach limits the user's viewing freedom and results in a poor user experience.
  • Embodiments of the present disclosure provide an image display method, device, electronic device, and storage medium.
  • In a first aspect, embodiments of the present disclosure provide an image display method, which may include:
  • obtaining a converted image corresponding to each video frame in a target video, where the converted image is an image obtained by converting the pixels of a foreground image from the image coordinate system to the augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-view video or a light field video;
  • obtaining the background pose of a background shooting device at a target time, and determining, from at least one converted image corresponding to the target time, a perspective image corresponding to the background pose;
  • converting, according to the background pose, the pixels in the perspective image to the background shooting coordinate system where the background shooting device is located, to obtain a target image;
  • merging the background image captured by the background shooting device at the target time with the target image, and displaying the resulting augmented reality image.
  • In a second aspect, embodiments of the present disclosure further provide an image display apparatus, which may include:
  • a converted image acquisition module, configured to obtain a converted image corresponding to each video frame in the target video, where the converted image is an image obtained by converting the pixels of the foreground image from the image coordinate system to the augmented reality coordinate system, the foreground image is an image containing the foreground object extracted from the video frame, and the target video includes a free-view video or a light field video;
  • a perspective image determination module configured to obtain the background pose of the background shooting device at the target time, and determine the perspective image corresponding to the background pose from at least one conversion image corresponding to the target time;
  • the target image obtaining module is set to convert the pixels in the perspective image to the background shooting coordinate system where the background shooting device is located according to the background pose, and obtain the target image;
  • the augmented reality image display module is configured to merge the background image captured by the background shooting device at the target time with the target image, and display the merged augmented reality image.
  • In a third aspect, embodiments of the present disclosure further provide an electronic device, which may include:
  • one or more processors;
  • a memory configured to store one or more programs,
  • where, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the image display method provided by any embodiment of the present disclosure.
  • In a fourth aspect, embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the image display method provided by any embodiment of the present disclosure is implemented.
  • Figure 1 is a flow chart of an image display method in an embodiment of the present disclosure
  • Figure 2 is a flow chart of another image display method in an embodiment of the present disclosure.
  • Figure 3 is a flow chart of another image display method in an embodiment of the present disclosure.
  • Figure 4 is a schematic diagram of an example of another image display method in an embodiment of the present disclosure.
  • Figure 5 is a structural block diagram of an image display device in an embodiment of the present disclosure.
  • Figure 6 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
  • The term "include" and its variations are open-ended, i.e., "including but not limited to."
  • the term “based on” means “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; and the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • Figure 1 is a flow chart of an image display method provided in an embodiment of the present disclosure.
  • This embodiment can display video frames in the target video through augmented reality (Augmented Reality, AR), thereby realizing AR display of the target video.
  • the method can be executed by the image display device provided by the embodiment of the present disclosure.
  • the device can be implemented by software and/or hardware.
  • the device can be integrated on an electronic device.
  • The electronic device can be any of various terminal devices (such as a mobile phone, tablet, or head-mounted display device) or a server.
  • the method according to the embodiment of the present disclosure includes the following steps:
  • S110: Obtain a converted image corresponding to each video frame in the target video, where the converted image is an image obtained by converting the pixels of the foreground image from the image coordinate system to the augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-view video or a light field video.
  • The target video may be a video with multiple viewing angles, for example a free-view video or a light field video. A free-view video may be obtained by deploying multiple foreground shooting devices in a ring around the photographed subject (i.e., the foreground object) and capturing the foreground object synchronously; a light field video may be obtained by using multiple foreground shooting devices distributed on a plane or a sphere to simultaneously capture light field samples from different viewpoints (i.e., viewing angles) in a target space in which a foreground object is placed.
  • Note that the foreground shooting device may be a camera (such as a light field camera or an ordinary camera), a video camera, a webcam, etc.; the processes described above for obtaining free-view video and light field video are only examples, and they may also be obtained in other ways, which are not specifically limited here.
  • A video frame is one video image in the target video. For each video frame, a foreground image containing the foreground object is extracted (that is, matted out) from it.
  • The foreground object may be the main subject in the target video and/or an object held by the main subject, etc.
  • Each video frame corresponds to its own conversion image.
  • The converted image can be understood as the image obtained after converting the pixels of the foreground image corresponding to the video frame from the image coordinate system to the AR coordinate system. The image coordinate system can be understood as the spatial coordinate system in which the foreground image lies, and the AR coordinate system can be understood as the screen coordinate system of the image display device used to display the subsequently generated AR image.
  • The reason the converted image is needed is that, taking the foreground shooting device as a camera as an example, in order to achieve AR display of a video frame, the multi-camera capture positions used when shooting the video frame cannot match the virtual camera position used during AR display. A projection transformation is therefore required, generating a new-view image at the virtual camera position (i.e., the converted image) so that it matches the AR display and yields the correct view under the camera transformation (i.e., the image that should be displayed correctly).
  • In addition, the image display apparatus may directly obtain converted images prepared in advance and apply them, or it may process each directly obtained video frame to produce the converted image and then apply it, etc., which is not specifically limited here.
  • S120: Obtain the background pose of the background shooting device at the target time, and determine the perspective image corresponding to the background pose from the converted images corresponding to the target time.
  • The background shooting device may be a device, different from the foreground shooting devices, used to capture the background objects in the AR image. The background pose may be the pose of the background shooting device at the target time; for example, it may be represented with six degrees of freedom by the device position and the device orientation. The target time may be a historical moment, the current moment, or a future moment, etc., and is not specifically limited here.
  • For the video frame corresponding to the AR image displayed at the target time, the converted images corresponding to the target time can be understood as the converted images corresponding to the video frames captured synchronously with that video frame. For example, if the video frame corresponding to the AR image displayed at the current moment is the 50th frame of the target video, the converted images corresponding to the target time may be the converted images corresponding to the synchronously captured 50th frames.
  • The shooting angles of the converted images corresponding to the target time differ from one another. The background angle corresponding to the background pose is determined from these shooting angles; the background angle can be understood as the user's viewing angle at the target time. The converted image having this viewing angle is then used as the perspective image, so that the AR image generated and displayed based on the perspective image matches the viewing angle.
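  • As a hedged illustration of this view-selection step, the sketch below assumes each converted image's shooting angle has been reduced to a unit viewing-direction vector and picks the candidate closest to the background camera's direction; the vector inputs and the cosine criterion are assumptions for illustration, not the disclosure's prescribed procedure.

```python
import numpy as np

# Pick the converted image whose shooting direction best matches the background
# camera's viewing direction (largest cosine = smallest angle between views).

def pick_perspective_image(background_dir: np.ndarray, view_dirs: list) -> int:
    """Return the index of the converted image whose view direction is closest."""
    b = background_dir / np.linalg.norm(background_dir)
    scores = [np.dot(b, v / np.linalg.norm(v)) for v in view_dirs]
    return int(np.argmax(scores))

# Example: three candidate views, background camera looking roughly along +z.
views = [np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])]
print(pick_perspective_image(np.array([0.1, 0.0, 1.0]), views))  # -> 0
```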
  • S130: According to the background pose, convert the pixels in the perspective image to the background shooting coordinate system where the background shooting device is located, to obtain the target image.
  • The background shooting coordinate system may be the spatial coordinate system in which the background shooting device is located. Note that the AR coordinate system and the background shooting coordinate system are different spatial coordinate systems. For example, the AR coordinate system may be the screen coordinate system of a mobile phone while the background shooting coordinate system is the spatial coordinate system of the camera inside the phone; or the AR coordinate system may be the screen coordinate system of a head-mounted display device while the background shooting coordinate system is the spatial coordinate system of the camera inside a tablet; and so on, which is not specifically limited here.
  • According to the background pose, the perspective image located in the AR coordinate system is converted to the background shooting coordinate system, yielding the target image. In practice, to obtain a target image that better matches the background image, the background intrinsic parameters of the background shooting device, which reflect its focal length and distortion, can also be considered in addition to the background pose. On this basis, if a pixel of the target image is denoted $P_{t\text{-}cam}$, then $P_{t\text{-}cam} = K_{cam}\,[R_{cam} \mid t_{cam}]\,P_{AR}$, where $P_{AR}$ denotes a pixel of the perspective image, $K_{cam}$ the background intrinsic matrix, $R_{cam}$ the rotation matrix of the background shooting device, and $t_{cam}$ its translation matrix; $R_{cam}$ and $t_{cam}$ together represent the background pose.
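  • The following is a minimal sketch of the projection above, $P_{t\text{-}cam} = K_{cam}[R_{cam} \mid t_{cam}]P_{AR}$; the intrinsic matrix, pose, and sample point are illustrative values, not taken from the disclosure.

```python
import numpy as np

# K_cam, R_cam, t_cam and the sample point below are example values.
K_cam = np.array([[800.0, 0.0, 320.0],   # fx, skew, cx
                  [0.0, 800.0, 240.0],   # fy, cy
                  [0.0, 0.0, 1.0]])
R_cam = np.eye(3)                        # background device rotation
t_cam = np.array([0.0, 0.0, 2.0])        # background device translation

def project_to_background(P_AR: np.ndarray) -> np.ndarray:
    """Project a 3D point in the AR coordinate system into background-camera pixels."""
    P_cam = R_cam @ P_AR + t_cam         # apply [R_cam | t_cam] to the point
    p = K_cam @ P_cam                    # apply the intrinsics
    return p[:2] / p[2]                  # perspective divide -> pixel coordinates

print(project_to_background(np.array([0.1, -0.2, 1.0])))
```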
  • S140: Merge the background image captured by the background shooting device at the target time with the target image, and display the resulting augmented reality image.
  • The background image may be the image captured by the background shooting device at the target time. The background image and the target image are merged, where the merging may be fusion or superposition, etc., and the resulting AR image is then displayed, thereby achieving AR display of the video frame. When the corresponding AR images are displayed in sequence according to the order in which the video frames of the target video were captured, AR display of the target video is achieved. In this way, the user can watch the corresponding view of the target video simply by moving the background shooting device in space, which guarantees the user's freedom when watching the target video and realizes a six-degree-of-freedom viewing experience.
  • In addition, the above embodiment realizes the display of the target video by placing it in the AR domain for playback rather than by rendering a three-dimensional model. It can therefore show fine detail that a three-dimensional model cannot express, such as a clear rendering of a character's hair strands, giving a better user experience.
  • In the embodiments of the present disclosure, a converted image corresponding to each video frame of the target video is obtained, where the converted image may be the image obtained by converting the pixels of the foreground image extracted from the video frame from the image coordinate system to the AR coordinate system; the background pose of the background shooting device at the target time is obtained, and the perspective image corresponding to the background pose is determined from the converted images corresponding to the target time; according to the background pose, the pixels in the perspective image are converted to the background shooting coordinate system where the background shooting device is located, to obtain the target image; the background image captured by the background shooting device at the target time is then merged with the target image, and the resulting AR image is displayed.
  • The above embodiment can display the video frames of the target video in an AR manner, that is, play the target video in an AR manner, realizing interactive viewing of the target video through AR, thereby guaranteeing the user's freedom when watching the target video and providing a better user experience.
  • In an embodiment, determining the perspective image corresponding to the background pose from the converted images corresponding to the target time may include: taking the video frame corresponding to the augmented reality image displayed at the moment immediately before the target time as the previous frame, and determining the next frame of the previous frame from the video frames; taking the converted images corresponding to the next frames as the converted images corresponding to the target time, and obtaining the shooting angle of each converted image corresponding to the target time; determining the background angle corresponding to the background pose from the shooting angles, and taking the converted image having the background angle among the converted images corresponding to the target time as the perspective image.
  • The previous frame may be the video frame corresponding to the AR image displayed at the moment immediately before the target time, that is, the video frame corresponding to the target image involved in the merge that produced that AR image. The next frame may be a video frame that can be played after the previous frame has been played; since the target video has multiple viewing angles, there are multiple synchronously captured next frames. The converted images corresponding to these next frames are taken as the converted images corresponding to the target time, and the shooting angle of each converted image is obtained, which indicates from what angle the foreground shooting device that captured the corresponding video frame was shooting.
  • The background angle corresponding to the background pose, reflecting the user's viewing angle at the target time, can then be determined from the shooting angles, and the converted image having the background angle among the converted images corresponding to the target time is taken as the perspective image, so that the AR image generated and displayed based on the perspective image matches the background angle.
  • In another embodiment, merging the background image captured by the background shooting device at the target time with the target image and displaying the resulting augmented reality image may include: obtaining the background image captured by the background shooting device at the target time, identifying the background plane in the background image, and obtaining the plane position of the background plane in the background image; merging the background image and the target image based on the plane position, so that the foreground object in the resulting augmented reality image lies on the background plane; and displaying the augmented reality image.
  • The background plane may be a plane in the background image used to support the foreground object, that is, a plane captured by the background shooting device; the plane position may be the position of the background plane within the background image. Merging the background image and the target image based on the plane position makes the foreground object of the resulting AR image sit on the background plane, such as a dancing girl standing and dancing on a desktop, which makes the AR image more engaging.
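  • A minimal compositing sketch follows, assuming the plane position has already been reduced to an anchor pixel on the background plane (plane detection itself would typically come from an AR framework); the function name and the bottom-center placement rule are illustrative assumptions.

```python
import numpy as np

def composite_on_plane(background: np.ndarray,   # HxWx3 uint8
                       target_rgba: np.ndarray,  # hxwx4 uint8, alpha = foreground mask
                       anchor_uv: tuple) -> np.ndarray:
    """Place the target so its bottom-center rests on the anchor pixel (u, v).

    Assumes the target fits entirely inside the background frame.
    """
    out = background.copy()
    h, w = target_rgba.shape[:2]
    u, v = anchor_uv
    top, left = v - h, u - w // 2                # bottom-center of target at (u, v)
    roi = out[top:top + h, left:left + w]
    alpha = target_rgba[..., 3:4] / 255.0        # per-pixel transparency
    roi[:] = (alpha * target_rgba[..., :3] + (1 - alpha) * roi).astype(np.uint8)
    return out
```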
  • Figure 2 is a flow chart of another image display method provided in an embodiment of the present disclosure. This embodiment is adjusted based on the above embodiment.
  • In this embodiment, the above image display method may further include: for each video frame, extracting a foreground image from the video frame; obtaining the calibration result of the foreground shooting device used to capture the video frame; converting, according to the calibration result, the pixels of the foreground image from the image coordinate system to the foreground shooting coordinate system where the foreground shooting device is located, to obtain a calibration image; and converting the pixels in the calibration image to the augmented reality coordinate system to obtain the converted image.
  • the method of this embodiment may include the following steps:
  • S210: For each video frame in the target video, extract a foreground image containing the foreground object from the video frame, where the target video includes a free-view video or a light field video.
  • Suppose the target video is captured by N foreground shooting devices and each device synchronously captures M video frames, where N and M are positive integers; then each of these M*N video frames can be processed based on S210-S230.
  • For each video frame, the foreground image is extracted from it. This extraction can be understood as a matting process and can be implemented in a variety of ways, such as binary classification of the video frame, portrait matting, background-prior-based matting, or green-screen matting, to obtain the foreground image.
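  • As a hedged sketch of one of the extraction options mentioned above (green-screen matting), the code below builds an RGBA foreground image with OpenCV; the HSV thresholds are illustrative and would need tuning for a real setup.

```python
import cv2
import numpy as np

def extract_foreground(frame_bgr: np.ndarray) -> np.ndarray:
    """Return an RGBA foreground image whose alpha masks out the green screen."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, (40, 60, 60), (85, 255, 255))  # example green range
    alpha = cv2.bitwise_not(green)                           # foreground = not green
    alpha = cv2.medianBlur(alpha, 5)                         # clean up mask edges
    b, g, r = cv2.split(frame_bgr)
    return cv2.merge([b, g, r, alpha])
```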
  • S220: Obtain the calibration result of the foreground shooting device used to capture the video frame, and convert, according to the calibration result, the pixels of the foreground image from the image coordinate system to the foreground shooting coordinate system where the foreground shooting device is located, to obtain the calibration image.
  • The calibration result may be the result obtained after calibrating the foreground shooting device; in practice it can be represented by the foreground pose and the foreground intrinsic parameters. For example, to shorten calibration time and reduce its difficulty, calibration can be performed as follows: obtain the video frame sequences shot by each foreground shooting device, determine the feature-matching relationships between these sequences, and derive the calibration result of each foreground shooting device from those relationships. Since this is a self-calibration process, it can be completed from the captured video frame sequences alone, without a calibration board, thereby shortening calibration time and reducing its difficulty. This example is only one way of obtaining the calibration result; it can also be obtained in other ways, which are not specifically limited here.
  • The foreground shooting coordinate system may be the coordinate system in which the foreground shooting device is located. According to the calibration result, each pixel of the foreground image is converted to the foreground shooting coordinate system, yielding the calibration image. For example, if a pixel of the calibration image is denoted $P$, then $P = [R \mid t]^{-1} K^{-1} p_t$, where $p_t$ denotes a pixel of the foreground image, $R$ the rotation matrix of the foreground shooting device, and $t$ its translation matrix ($R$ and $t$ together represent the foreground pose); $K$ denotes the foreground intrinsic matrix.
  • S230: Convert the pixels in the calibration image to the augmented reality coordinate system to obtain the converted image.
  • If the foreground shooting devices were aligned before shooting the target video, meaning that the foreground shooting coordinate systems are the same spatial coordinate system, the pixels in the calibration image can be converted directly to the AR coordinate system to obtain the converted image; otherwise, an axis-fixing step can first be applied to the foreground shooting coordinate systems before converting the pixels in the calibration image; and so on.
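  • The back-projection $P = [R \mid t]^{-1} K^{-1} p_t$ can be sketched as below; since a single pixel only defines a ray, an assumed depth is introduced for illustration, and $[R \mid t]$ is inverted as a rigid transform. All numeric values are placeholders.

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])        # foreground intrinsics (example values)
R = np.eye(3)                          # foreground device rotation
t = np.array([0.0, 0.0, 0.5])          # foreground device translation

def lift_pixel(p_t: tuple, d: float = 1.0) -> np.ndarray:
    """Back-project pixel p_t at assumed depth d, then undo the device pose."""
    ray = np.linalg.inv(K) @ np.array([p_t[0], p_t[1], 1.0])  # K^{-1} p_t
    X_cam = ray * d                     # point on the ray at depth d
    return R.T @ (X_cam - t)            # rigid inverse of [R | t]
```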
  • S240: Obtain the background pose of the background shooting device at the target time, and determine the perspective image corresponding to the background pose from the converted images corresponding to the target time.
  • S250: According to the background pose, convert the pixels in the perspective image to the background shooting coordinate system where the background shooting device is located, to obtain the target image.
  • S260: Merge the background image captured by the background shooting device at the target time with the target image, and display the resulting augmented reality image.
  • In this embodiment, for each video frame, a foreground image is extracted from the video frame, each pixel of the foreground image is converted to the foreground shooting coordinate system according to the calibration result of the foreground shooting device used to capture the frame, and the resulting calibration image is then converted to the AR coordinate system, thereby obtaining the converted image accurately.
  • In an embodiment, converting the pixels in the calibration image to the augmented reality coordinate system to obtain the converted image includes: obtaining a fixed-axis coordinate system, where the fixed-axis coordinate system is a coordinate system determined from the foreground poses of the foreground shooting devices or from the captured video frames; converting the pixels in the calibration image to the fixed-axis coordinate system to obtain a fixed-axis image; and converting the pixels in the fixed-axis image to the augmented reality coordinate system to obtain the converted image.
  • When multiple foreground shooting devices are set up manually, it is usually desirable to place them in the same plane, but this is hard to achieve by manual alignment: it is time-consuming and labor-intensive, and the precision is hard to guarantee. A target video captured by misaligned foreground shooting devices will jitter when the viewing angle changes, which directly affects the user's viewing experience. To avoid this, a fixed-axis coordinate system implementing the axis-fixing function can be obtained, and the calibration image is converted into it, producing a fixed-axis image that does not jitter under viewing-angle changes.
  • In practice, the fixed-axis coordinate system can be obtained in a variety of ways: for example, from the foreground poses of the foreground shooting devices, such as computing the corresponding homography matrices from the foreground poses; or from the video frames shot by the foreground shooting devices, such as performing feature matching on those frames; and so on, which are not specifically limited here. The fixed-axis image is then converted to the AR coordinate system to obtain the converted image, avoiding jitter of the converted image under viewing-angle changes.
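  • As a hedged sketch of the feature-matching route to an axis-fixing homography, the code below matches one device's view against a chosen reference view with ORB features and estimates the mapping homography with RANSAC; the choice of ORB and the parameter values are assumptions, not the disclosure's specification.

```python
import cv2
import numpy as np

def homography_to_reference(view: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Estimate the homography mapping one device's view into a reference view."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(view, None)
    k2, d2 = orb.detectAndCompute(reference, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)  # robust to mismatches
    return H
```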
  • On this basis, in an embodiment, converting the pixels in the calibration image to the fixed-axis coordinate system to obtain the fixed-axis image may include: obtaining the first homography matrix from the foreground shooting coordinate system to the fixed-axis coordinate system, and converting the pixels in the calibration image to the fixed-axis coordinate system based on the first homography matrix to obtain the fixed-axis image. For example, if a pixel of the fixed-axis image is denoted $P_{fix\text{-}axis}$, then $P_{fix\text{-}axis} = H_F P$, where $P$ denotes a pixel of the calibration image and $H_F$ the first homography matrix.
  • In an embodiment, converting the pixels in the fixed-axis image to the augmented reality coordinate system to obtain the converted image may include: obtaining the second homography matrix from the fixed-axis coordinate system to the augmented reality coordinate system, and converting the pixels in the fixed-axis image to the augmented reality coordinate system based on the second homography matrix to obtain the converted image. For example, $P_{AR} = H_A P_{fix\text{-}axis}$, where $P_{fix\text{-}axis}$ denotes a pixel of the fixed-axis image and $H_A$ the second homography matrix.
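  • The two homography steps can be sketched as below; the matrices $H_F$ and $H_A$ are illustrative placeholders, and in practice they would come from the axis-fixing computation and the AR display setup.

```python
import cv2
import numpy as np

H_F = np.array([[1.0, 0.02, 5.0],
                [-0.02, 1.0, -3.0],
                [0.0, 0.0, 1.0]])       # first homography (example values)
H_A = np.array([[0.9, 0.0, 10.0],
                [0.0, 0.9, 10.0],
                [0.0, 0.0, 1.0]])       # second homography (example values)

def to_converted_image(calibration_img: np.ndarray) -> np.ndarray:
    """Warp through the fixed-axis coordinate system into the AR coordinate system."""
    h, w = calibration_img.shape[:2]
    fixed_axis = cv2.warpPerspective(calibration_img, H_F, (w, h))  # P_fix-axis = H_F P
    return cv2.warpPerspective(fixed_axis, H_A, (w, h))             # P_AR = H_A P_fix-axis

# Equivalently, a single warp with the composed matrix H_A @ H_F gives the same result.
```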
  • FIG. 3 is a flow chart of another image display method provided in an embodiment of the present disclosure. This embodiment is adjusted based on the above embodiment.
  • In this embodiment, merging the background image captured by the background shooting device at the target time with the target image and displaying the resulting augmented reality image may include: obtaining the background image captured by the background shooting device at the target time; fusing the target image and the background image based on the transparency information of each pixel in the target image to obtain an augmented reality image; and displaying the augmented reality image.
  • the explanations of terms that are the same as or corresponding to the above embodiments will not be repeated here.
  • the method of this embodiment may include the following steps:
  • S310: Obtain a converted image corresponding to each video frame in the target video, where the converted image is an image obtained by converting the pixels of the foreground image from the image coordinate system to the augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-view video or a light field video.
  • S320: Obtain the background pose of the background shooting device at the target time, and determine the perspective image corresponding to the background pose from the converted images corresponding to the target time.
  • S330: According to the background pose, convert the pixels in the perspective image to the background shooting coordinate system where the background shooting device is located, to obtain the target image.
  • S340: Obtain the background image captured by the background shooting device at the target time.
  • S350: Based on the transparency information of each pixel in the target image, fuse the target image and the background image to obtain an augmented reality image, and display the augmented reality image.
  • The embodiment of the present disclosure realizes the display of the target video by placing it in the AR domain for playback, rather than by drawing a three-dimensional model in real time under illumination. Since the target video is itself video data, it cannot be redrawn; the AR image is therefore obtained through fusion. Here, the fusion of the target image and the background image is driven by the transparency information of each pixel in the target image, which ensures an effective AR image.
  • In an embodiment, before fusing the target image and the background image based on the transparency information of each pixel in the target image, the above image display method may further include: obtaining the color temperature of the background image; adjusting the image parameters of the target image based on the color temperature, and updating the target image according to the adjustment result, where the image parameters include white balance and/or brightness.
  • The color temperature of the background image can be obtained first, so that image parameters of the target image such as white balance and/or brightness can be adjusted based on it, making the adjusted target image match the background image in tone. This ensures the overall consistency of the AR image obtained after the subsequent fusion and provides a better user experience.
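  • A hedged sketch of the tone-matching and fusion steps follows; the gray-world-style gain heuristic stands in for the color-temperature adjustment and is an illustrative assumption, not the disclosure's specified algorithm. The target and background are assumed to share the same size.

```python
import numpy as np

def match_tone(target_rgb: np.ndarray, background_rgb: np.ndarray) -> np.ndarray:
    """Scale target channels so its mean color cast approaches the background's."""
    gains = background_rgb.reshape(-1, 3).mean(0) / (target_rgb.reshape(-1, 3).mean(0) + 1e-6)
    return np.clip(target_rgb * gains, 0, 255).astype(np.uint8)

def fuse(target_rgba: np.ndarray, background_rgb: np.ndarray) -> np.ndarray:
    """Alpha-composite the tone-matched target over the background."""
    rgb = match_tone(target_rgba[..., :3].astype(np.float64), background_rgb)
    alpha = target_rgba[..., 3:4] / 255.0        # per-pixel transparency
    return (alpha * rgb + (1 - alpha) * background_rgb).astype(np.uint8)
```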
  • Figure 4 gives an example of the full pipeline: for each video frame, the camera used to shoot the frame is calibrated, and each pixel of the frame undergoes a spatial transformation according to the calibration result, yielding the calibration image; the fixed-axis coordinate system is obtained, and each pixel of the calibration image is converted into it, yielding the fixed-axis image; the AR coordinate system is obtained, and each pixel of the fixed-axis image is converted into it, yielding the target image; to widen the viewing range of the target video, a virtual image from a virtual viewpoint can be generated from target images at physical viewpoints and also used as a target image; the target image is fused with the background image captured by the camera in a mobile phone to obtain the AR image; and the AR images are displayed in sequence, achieving AR display of the target video.
  • FIG. 5 is a structural block diagram of an image display device provided in an embodiment of the present disclosure.
  • the device is used to execute the image display method provided in any of the above embodiments.
  • This device has the same concept as the image display methods in the above embodiments.
  • the device may include: a conversion image acquisition module 410 , a perspective image determination module 420 , a target image acquisition module 430 and an augmented reality image display module 440 .
  • The converted image acquisition module 410 is configured to obtain a converted image corresponding to each video frame in the target video, where the converted image is the image obtained by converting the pixels of the foreground image from the image coordinate system to the augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-view video or a light field video.
  • the perspective image determination module 420 is configured to obtain the background pose of the background shooting device at the target time, and determine the perspective image corresponding to the background pose from each conversion image corresponding to the target time;
  • the target image obtaining module 430 is configured to convert the pixels in the perspective image to the background shooting coordinate system where the background shooting device is located according to the background pose to obtain the target image;
  • the augmented reality image display module 440 is configured to merge the background image captured by the background shooting device at the target time with the target image, and display the merged augmented reality image.
  • the device may further include:
  • the foreground image extraction module is configured to extract the foreground image from the video frame for each video frame
  • a calibration result acquisition module configured to acquire the calibration results of the foreground shooting equipment used to shoot video frames
  • the calibration image obtaining module is configured to convert the pixels in the foreground image located in the image coordinate system to the foreground shooting coordinate system where the foreground shooting device is located based on the calibration results to obtain the calibration image;
  • the conversion image acquisition module is configured to convert the pixel points in the calibration image into the augmented reality coordinate system to obtain the conversion image.
  • In an embodiment, the converted image acquisition module may include:
  • a fixed-axis coordinate system acquisition unit is configured to obtain a fixed-axis coordinate system, where the fixed-axis coordinate system includes a coordinate system determined based on the foreground pose of each foreground shooting device or the captured video frame;
  • the fixed-axis image obtaining unit is configured to convert each pixel point in the calibration image into a fixed-axis coordinate system to obtain a fixed-axis image;
  • the conversion image acquisition unit is set to convert the pixel points in the fixed-axis image to the augmented reality coordinate system to obtain the conversion image.
  • In an embodiment, the fixed-axis image obtaining unit is configured to obtain the first homography matrix from the foreground shooting coordinate system to the fixed-axis coordinate system, and convert the pixels in the calibration image to the fixed-axis coordinate system based on the first homography matrix to obtain the fixed-axis image.
  • In an embodiment, the converted image obtaining unit is configured to obtain the second homography matrix from the fixed-axis coordinate system to the augmented reality coordinate system, and convert the pixels in the fixed-axis image to the augmented reality coordinate system based on the second homography matrix to obtain the converted image.
  • the augmented reality image display module 440 may include:
  • the background image acquisition unit is configured to acquire the background image captured by the background shooting device at the target time
  • the augmented reality image display unit is configured to fuse the target image and the background image based on the transparency information of each pixel in the target image, obtain an augmented reality image, and display the augmented reality image.
  • the device may further include:
  • the color temperature acquisition module is set to obtain the color temperature of the background image before fusing the target image and the background image based on the transparency information of each pixel in the target image;
  • the target image update module is configured to adjust the image parameters of the target image based on the color temperature, and update the target image according to the adjustment results, where the image parameters include white balance and/or brightness.
  • the perspective image determination module 420 may include:
  • the next frame determination unit is configured to use the video frame corresponding to the augmented reality image displayed at the previous moment of the target time as the previous frame, and determine the next frame of the previous frame from each video frame;
  • the shooting angle acquisition unit is configured to use each converted image corresponding to each next frame as each converted image corresponding to the target time, and obtain the shooting angle of each converted image corresponding to the target time;
  • the perspective image obtaining unit is configured to determine the background perspective corresponding to the background pose from each shooting perspective, and use the converted image with the background perspective in each conversion image corresponding to the target time as the perspective image.
  • the augmented reality image display module 440 may include:
  • the plane position obtaining unit is configured to obtain the background image captured by the background shooting device at the target time, identify the background plane in the background image, and obtain the plane position of the background plane in the background image;
  • An image merging unit configured to merge the background image and the target image based on the plane position, so that the foreground object in the merged augmented reality image is located on the background plane;
  • the augmented reality image display unit is configured to display the augmented reality image.
  • The image display apparatus of the embodiments of the present disclosure obtains, through the converted image acquisition module, the converted image corresponding to each video frame in the target video, where the converted image may be the image obtained by converting the pixels of the foreground image extracted from the video frame from the image coordinate system to the AR coordinate system; through the perspective image determination module, it obtains the background pose of the background shooting device at the target time and determines the perspective image corresponding to the background pose from the converted images corresponding to the target time; through the target image obtaining module, it converts the pixels in the perspective image to the background shooting coordinate system according to the background pose to obtain the target image; and through the augmented reality image display module, it merges the background image captured at the target time with the target image and displays the resulting AR image.
  • The above apparatus can display the video frames of the target video in an AR manner, that is, play the target video in an AR manner; it realizes interactive viewing of the target video through AR, thereby guaranteeing the user's freedom when watching the target video and providing a better user experience.
  • the image display device provided by the embodiments of the present disclosure can execute the image display method provided by any embodiment of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method.
  • FIG. 6 shows a schematic structural diagram of an electronic device 500 (such as the terminal device or server in FIG. 6) suitable for implementing embodiments of the present disclosure.
  • Electronic devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDA), tablet computers (PAD), portable multimedia players (PMP), and vehicle-mounted terminals (such as vehicle-mounted navigation terminals), as well as fixed terminals such as digital televisions (TV) and desktop computers.
  • the electronic device shown in FIG. 6 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
  • The electronic device 500 may include a processing device (such as a central processing unit or a graphics processor) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500.
  • the processing device 501, ROM 502 and RAM 503 are connected to each other via a bus 504.
  • An input/output (I/O) interface 505 is also connected to bus 504.
  • The following devices can be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 507 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 508 including, for example, a magnetic tape and a hard disk; and a communication device 509. The communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 6 shows the electronic device 500 with various means, it should be understood that it is not required to implement or have all of the means shown; more or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via communication device 509, or from storage device 508, or from ROM 502.
  • When the computer program is executed by the processing device 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof.
  • Examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device .
  • Program code contained on a computer-readable medium can be transmitted using any appropriate medium, including but not limited to: wires, optical cables, radio frequency (Radio Frequency, RF), etc., or any suitable combination of the above.
  • In some embodiments, the client and server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication (e.g., a communication network) in any form or medium.
  • Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; it may also exist independently without being assembled into the electronic device.
  • The above-mentioned computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is caused to: obtain a converted image corresponding to each video frame in the target video, where the converted image is an image obtained by converting the pixels of the foreground image from the image coordinate system to the augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-view video or a light field video; obtain the background pose of the background shooting device at the target time, and determine the perspective image corresponding to the background pose from at least one converted image corresponding to the target time; convert, according to the background pose, the pixels in the perspective image to the background shooting coordinate system where the background shooting device is located, to obtain the target image; and merge the background image captured by the background shooting device at the target time with the target image, and display the resulting augmented reality image.
  • the storage medium may be a non-transitory storage medium.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • In the case involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • Each block in the flowchart or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or by a combination of special-purpose hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure can be implemented in software or hardware. Among them, the name of the unit does not constitute a limitation on the unit itself under certain circumstances.
  • For example, the converted image acquisition module can also be described as "a module for obtaining a converted image corresponding to each video frame in the target video, where the converted image is an image obtained by converting the pixels of the foreground image from the image coordinate system to the augmented reality coordinate system, the foreground image is an image containing the foreground object extracted from the video frame, and the target video includes a free-view video or a light field video".
  • Exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), application specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • Examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • According to one or more embodiments of the present disclosure, Example 1 provides an image display method, which may include: obtaining a converted image corresponding to each video frame in a target video, where the converted image is an image obtained by converting the pixels of the foreground image from the image coordinate system to the augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-view video or a light field video; obtaining the background pose of a background shooting device at a target time, and determining the perspective image corresponding to the background pose from at least one converted image corresponding to the target time; converting, according to the background pose, the pixels in the perspective image to the background shooting coordinate system where the background shooting device is located, to obtain a target image; and merging the background image captured by the background shooting device at the target time with the target image, and displaying the resulting augmented reality image.
  • According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, where the above image display method may further include: for each video frame, extracting the foreground image from the video frame; obtaining the calibration result of the foreground shooting device used to capture the video frame; converting, according to the calibration result, the pixels of the foreground image from the image coordinate system to the foreground shooting coordinate system where the foreground shooting device is located, to obtain a calibration image; and converting the pixels in the calibration image to the augmented reality coordinate system to obtain the converted image.
  • According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 2, where converting the pixels in the calibration image to the augmented reality coordinate system to obtain the converted image may include: obtaining a fixed-axis coordinate system, where the fixed-axis coordinate system is a coordinate system determined from the foreground poses of the foreground shooting devices or from the captured video frames; converting the pixels in the calibration image to the fixed-axis coordinate system to obtain a fixed-axis image; and converting the pixels in the fixed-axis image to the augmented reality coordinate system to obtain the converted image.
  • According to one or more embodiments of the present disclosure, Example 4 provides the method of Example 3, where converting the pixels in the calibration image to the fixed-axis coordinate system to obtain the fixed-axis image may include: obtaining the first homography matrix from the foreground shooting coordinate system to the fixed-axis coordinate system, and converting the pixels in the calibration image to the fixed-axis coordinate system based on the first homography matrix to obtain the fixed-axis image.
  • According to one or more embodiments of the present disclosure, Example 5 provides the method of Example 3, where converting the pixels in the fixed-axis image to the augmented reality coordinate system to obtain the converted image may include: obtaining the second homography matrix from the fixed-axis coordinate system to the augmented reality coordinate system, and converting the pixels in the fixed-axis image to the augmented reality coordinate system based on the second homography matrix to obtain the converted image.
  • According to one or more embodiments of the present disclosure, Example 6 provides the method of Example 1, where merging the background image captured by the background shooting device at the target time with the target image and displaying the resulting augmented reality image may include: obtaining the background image captured by the background shooting device at the target time; and fusing the target image and the background image based on the transparency information of each pixel in the target image to obtain an augmented reality image, and displaying the augmented reality image.
  • According to one or more embodiments of the present disclosure, Example 7 provides the method of Example 6, where before fusing the target image and the background image based on the transparency information of each pixel in the target image, the above image display method may further include: obtaining the color temperature of the background image; and adjusting the image parameters of the target image based on the color temperature, and updating the target image according to the adjustment result, where the image parameters include white balance and/or brightness.
  • According to one or more embodiments of the present disclosure, Example 8 provides the method of Example 1, where determining the perspective image corresponding to the background pose from the converted images corresponding to the target time may include: taking the video frame corresponding to the augmented reality image displayed at the moment immediately before the target time as the previous frame, and determining the next frame of the previous frame from the video frames; taking the converted images corresponding to the next frames as the converted images corresponding to the target time, and obtaining the shooting angle of each converted image corresponding to the target time; and determining the background angle corresponding to the background pose from the shooting angles, and taking the converted image having the background angle among the converted images corresponding to the target time as the perspective image.
  • According to one or more embodiments of the present disclosure, Example 9 provides the method of Example 1, where merging the background image captured by the background shooting device at the target time with the target image and displaying the resulting augmented reality image may include: obtaining the background image captured by the background shooting device at the target time, identifying the background plane in the background image, and obtaining the plane position of the background plane in the background image; merging the background image and the target image based on the plane position, so that the foreground object in the resulting augmented reality image lies on the background plane; and displaying the augmented reality image.
  • Example 10 provides an image display device, which may include:
  • a converted image acquisition module, configured to obtain a converted image corresponding to each video frame in the target video, where the converted image is an image obtained by converting the pixels of the foreground image from the image coordinate system to the augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-view video or a light field video;
  • a perspective image determination module, configured to obtain the background pose of the background shooting device at the target time, and determine the perspective image corresponding to the background pose from the converted images corresponding to the target time;
  • the target image obtaining module is set to convert the pixels in the perspective image to the background shooting coordinate system where the background shooting device is located according to the background pose, and obtain the target image;
  • the augmented reality image display module is configured to merge the background image captured by the background shooting device at the target time with the target image, and display the merged augmented reality image.

Abstract

Embodiments of the present disclosure disclose an image display method and apparatus, an electronic device, and a storage medium. The method includes: obtaining a converted image corresponding to each video frame in a target video; obtaining the background pose of a background shooting device at a target time, and determining, from at least one converted image corresponding to the target time, a perspective image corresponding to the background pose; converting, according to the background pose, the pixels in the perspective image to the background shooting coordinate system where the background shooting device is located, to obtain a target image; and merging the background image captured by the background shooting device at the target time with the target image, and displaying the resulting augmented reality image.

Description

Image display method and apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202210575768.6, filed with the Chinese Patent Office on May 24, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the field of data processing technology, for example, to an image display method and apparatus, an electronic device, and a storage medium.
Background
Free-view video is a popular video format. By offering users the ability to interactively select the viewing angle, it gives fixed two-dimensional (2D) video a "the scenery changes as you move" viewing experience, bringing users a strong sense of stereoscopic impact.
At present, free-view video is mainly displayed by building a dedicated interactive player, which can be presented to the user as a slider so that the user can drag the slider to watch the video from different viewing angles. However, this approach limits the user's viewing freedom and results in a poor experience.
Summary
Embodiments of the present disclosure provide an image display method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides an image display method, which may include:
obtaining a converted image corresponding to each video frame in a target video, where the converted image is an image obtained by converting the pixels of a foreground image from the image coordinate system to the augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-view video or a light field video;
obtaining the background pose of a background shooting device at a target time, and determining, from at least one converted image corresponding to the target time, a perspective image corresponding to the background pose;
converting, according to the background pose, the pixels in the perspective image to the background shooting coordinate system where the background shooting device is located, to obtain a target image;
merging the background image captured by the background shooting device at the target time with the target image, and displaying the resulting augmented reality image.
In a second aspect, an embodiment of the present disclosure further provides an image display apparatus, which may include:
a converted image acquisition module, configured to obtain a converted image corresponding to each video frame in the target video, where the converted image is an image obtained by converting the pixels of the foreground image from the image coordinate system to the augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-view video or a light field video;
a perspective image determination module, configured to obtain the background pose of the background shooting device at the target time, and determine, from at least one converted image corresponding to the target time, the perspective image corresponding to the background pose;
a target image obtaining module, configured to convert, according to the background pose, the pixels in the perspective image to the background shooting coordinate system where the background shooting device is located, to obtain the target image;
an augmented reality image display module, configured to merge the background image captured by the background shooting device at the target time with the target image, and display the resulting augmented reality image.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, which may include:
one or more processors;
a memory configured to store one or more programs,
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the image display method provided by any embodiment of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the image display method provided by any embodiment of the present disclosure.
Brief Description of the Drawings
Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
Figure 1 is a flowchart of an image display method in an embodiment of the present disclosure;
Figure 2 is a flowchart of another image display method in an embodiment of the present disclosure;
Figure 3 is a flowchart of another image display method in an embodiment of the present disclosure;
Figure 4 is a schematic diagram of an example of another image display method in an embodiment of the present disclosure;
Figure 5 is a structural block diagram of an image display apparatus in an embodiment of the present disclosure;
Figure 6 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described below with reference to the drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit its scope of protection.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, the method embodiments may include additional steps and/or omit the steps shown. The scope of the present disclosure is not limited in this respect.
The term "include" and its variants as used herein are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
Note that the concepts "first", "second", and the like mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order or interdependence of the functions performed by these apparatuses, modules, or units.
Note that the modifiers "a"/"one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
Figure 1 is a flowchart of an image display method provided in an embodiment of the present disclosure. This embodiment can display video frames of a target video by means of augmented reality (AR), thereby realizing AR display of the target video. The method can be executed by the image display apparatus provided by an embodiment of the present disclosure; the apparatus can be implemented in software and/or hardware and can be integrated on an electronic device, which may be any of various terminal devices (such as a mobile phone, tablet, or head-mounted display device) or a server.
Referring to Figure 1, the method of the embodiment of the present disclosure includes the following steps:
S110: Obtain a converted image corresponding to each video frame in the target video, where the converted image is an image obtained by converting the pixels of the foreground image from the image coordinate system to the augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-view video or a light field video.
The target video may be a video with multiple viewing angles, for example a free-view video or a light field video. A free-view video may be obtained by deploying multiple foreground shooting devices in a ring around the photographed subject (i.e., the foreground object) and capturing the foreground object synchronously; a light field video may be obtained by using multiple foreground shooting devices distributed on a plane or a sphere to simultaneously capture light field samples from different viewpoints (i.e., viewing angles) in a target space in which a foreground object is placed. Note that the foreground shooting device may be a camera (such as a light field camera or an ordinary camera), a video camera, a webcam, etc.; the processes described above for obtaining free-view video and light field video are only examples, and they may also be obtained in other ways, which are not specifically limited here.
A video frame may be one video image in the target video. For each video frame, a foreground image containing the foreground object is extracted (i.e., matted out) from it; the foreground object may be the main subject in the target video and/or an object held by the main subject, etc. Each video frame has its own corresponding converted image, which can be understood as the image obtained after converting the pixels of the foreground image corresponding to the video frame from the image coordinate system to the AR coordinate system. The image coordinate system can be understood as the spatial coordinate system in which the foreground image lies, and the AR coordinate system can be understood as the screen coordinate system of the image display device used to display the subsequently generated AR image. Note that the reason the converted image is needed is that, taking the foreground shooting device as a camera as an example, in order to achieve AR display of a video frame, the multi-camera capture positions used when shooting the video frame cannot match the virtual camera position used during AR display; a projection transformation is therefore required, generating a new-view image at the virtual camera position (i.e., the converted image) so that it matches the AR display and yields the correct view under the camera transformation (i.e., the image that should be displayed correctly). In addition, the image display apparatus may directly obtain converted images prepared in advance and apply them, or it may process each directly obtained video frame to produce the converted image and then apply it, etc., which is not specifically limited here.
S120: Obtain the background pose of the background shooting device at the target time, and determine the perspective image corresponding to the background pose from the converted images corresponding to the target time.
The background shooting device may be a device, different from the foreground shooting devices, used to capture the background objects in the AR image. The background pose may be the pose of the background shooting device at the target time; for example, it may be represented with six degrees of freedom by the device position and the device orientation. The target time may be a historical moment, the current moment, or a future moment, etc., and is not specifically limited here. For the video frame corresponding to the AR image displayed at the target time, the converted images corresponding to the target time can be understood as the converted images corresponding to the video frames captured synchronously with that video frame. For example, if the video frame corresponding to the AR image displayed at the current moment is the 50th frame of the target video, the converted images corresponding to the target time may be the converted images corresponding to the synchronously captured 50th frames. The shooting angles of the converted images corresponding to the target time differ from one another; the background angle corresponding to the background pose is determined from these shooting angles and can be understood as the user's viewing angle at the target time, and the converted image having this viewing angle is taken as the perspective image, so that the AR image generated and displayed based on the perspective image matches the viewing angle.
S130: According to the background pose, convert the pixels in the perspective image to the background shooting coordinate system where the background shooting device is located, to obtain the target image.
The background shooting coordinate system may be the spatial coordinate system in which the background shooting device is located. Note that the AR coordinate system and the background shooting coordinate system are different spatial coordinate systems; for example, the AR coordinate system may be the screen coordinate system of a mobile phone while the background shooting coordinate system is the spatial coordinate system of the camera inside the phone, or the AR coordinate system may be the screen coordinate system of a head-mounted display device while the background shooting coordinate system is the spatial coordinate system of the camera inside a tablet, and so on, which is not specifically limited here.
According to the background pose, the perspective image located in the AR coordinate system is converted to the background shooting coordinate system to obtain the target image. In practice, for example, to obtain a target image that better matches the background image, the background intrinsic parameters of the background shooting device, which reflect its focal length and distortion, can also be considered in addition to the background pose. On this basis, for example, if a pixel of the target image is denoted $P_{t\text{-}cam}$, then $P_{t\text{-}cam} = K_{cam}\,[R_{cam} \mid t_{cam}]\,P_{AR}$, where $P_{AR}$ denotes a pixel of the perspective image, $K_{cam}$ the background intrinsic matrix, $R_{cam}$ the rotation matrix of the background shooting device, and $t_{cam}$ its translation matrix; $R_{cam}$ and $t_{cam}$ together represent the background pose.
S140: Merge the background image captured by the background capture device at the target time with the target image, and display the augmented reality image obtained by the merging.

The background image may be the image captured by the background capture device at the target time. The background image and the target image are merged, for example by fusion or overlay, and the resulting AR image is displayed, thereby achieving AR display of the video frame. Then, by displaying the corresponding AR images in the capture order of the video frames of the target video, AR display of the target video is achieved. In this way, the user can watch the target video from the corresponding viewing angle simply by moving the background capture device in space, which guarantees the user's freedom when watching the target video and realizes six-degree-of-freedom viewing. In addition, the above embodiment realizes display of the target video by playing it in the AR domain rather than by rendering a three-dimensional model, and can therefore show fine details that a three-dimensional model cannot express, such as clearly rendered strands of hair, giving a better user experience.
In this embodiment of the present disclosure, a converted image corresponding to each video frame of the target video is obtained, where the converted image may be the image obtained by converting the pixel points of the foreground image extracted from the video frame from the image coordinate system to the AR coordinate system; the background pose of the background capture device at the target time is obtained, and the view image corresponding to the background pose is determined from the converted images corresponding to the target time; the pixel points of the view image are converted, according to the background pose, to the background capture coordinate system of the background capture device to obtain the target image; and the background image captured by the background capture device at the target time is then merged with the target image, and the resulting AR image is displayed. The above embodiment can display the video frames of the target video in an AR manner, i.e., play the target video in an AR manner, realizing interactive viewing of the target video through AR, thereby guaranteeing the user's freedom when watching the target video and providing a better user experience.
In an embodiment, on the basis of the above embodiments, determining the view image corresponding to the background pose from the converted images corresponding to the target time may include: taking the video frame corresponding to the AR image displayed at the time immediately before the target time as the previous frame, and determining the next frames of the previous frame from the video frames; taking the converted images corresponding to the next frames as the converted images corresponding to the target time, and obtaining the capture angle of each of those converted images; and determining, from the capture angles, the background viewing angle corresponding to the background pose, and taking the converted image with that background viewing angle among the converted images corresponding to the target time as the view image. Here, the previous frame may be the video frame corresponding to the AR image displayed at the time immediately before the target time, i.e., the video frame corresponding to the target image involved in merging that AR image. A next frame may be a video frame that can be played after the previous frame finishes; since the target video has multiple viewing angles, there are multiple synchronously captured next frames. The converted images corresponding to these next frames are taken as the converted images corresponding to the target time, and the capture angle of each converted image is obtained, which indicates from what angle the foreground capture device shot the video frame corresponding to that converted image. The background viewing angle corresponding to the background pose, which reflects the user's viewing angle at the target time, can then be determined from the capture angles, and the converted image with that background viewing angle is taken as the view image, so that the AR image generated and displayed based on the view image matches the background viewing angle; a possible selection heuristic is sketched below.
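The embodiment above does not prescribe how the background viewing angle is matched to a capture angle. One simple possibility, given here only as a hedged sketch with illustrative names, is to pick the converted image whose capture direction has the largest cosine similarity to the background device's viewing direction:

```python
import numpy as np

def pick_view_image(converted_images, capture_dirs, background_dir):
    """Return the converted image whose capture direction is angularly
    closest to the background device's viewing direction."""
    b = background_dir / np.linalg.norm(background_dir)
    scores = [np.dot(d / np.linalg.norm(d), b) for d in capture_dirs]
    return converted_images[int(np.argmax(scores))]  # max cosine = smallest angle
```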
In another embodiment, on the basis of the above embodiments, merging the background image captured by the background capture device at the target time with the target image and displaying the resulting augmented reality image may include: obtaining the background image captured by the background capture device at the target time, recognizing a background plane in the background image, and obtaining the plane position of the background plane in the background image; merging the background image and the target image based on the plane position so that the foreground object in the resulting augmented reality image lies on the background plane; and displaying the augmented reality image. Here, the background plane may be the plane in the background image that supports the foreground object, i.e., a plane captured by the background capture device; the plane position may be the position of the background plane in the background image. The background image and the target image are merged based on the plane position so that the foreground object in the resulting AR image lies on the background plane, for example a little girl standing and dancing on a desk, which makes the AR image more engaging.
Fig. 2 is a flowchart of another image display method provided in an embodiment of the present disclosure. This embodiment is an adjustment based on the above embodiments. In this embodiment, the image display method may further include: for each video frame, extracting the foreground image from the video frame; obtaining the calibration result of the foreground capture device used to shoot the video frame; converting, according to the calibration result, the pixel points of the foreground image from the image coordinate system to the foreground capture coordinate system of the foreground capture device to obtain a calibrated image; and converting the pixel points of the calibrated image to the augmented reality coordinate system to obtain the converted image. Explanations of terms identical or corresponding to those in the above embodiments are not repeated here.

Accordingly, as shown in Fig. 2, the method of this embodiment may include the following steps:

S210: For each video frame in the target video, extract the foreground image containing the foreground object from the video frame, where the target video includes a free-viewpoint video or a light-field video.
Suppose the target video is shot by N foreground capture devices, each capturing M video frames synchronously, with N and M both positive integers; then each of the M*N video frames can be processed through S210-S230. For example, for each video frame, the foreground image is extracted from it. This extraction can be understood as a matting process and can be implemented in multiple ways, such as binary classification of the video frame, portrait matting, background-prior-based matting, or green-screen matting, thereby obtaining the foreground image; a green-screen sketch follows.
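Among the matting approaches listed, the green-screen case is the simplest to show compactly. The sketch below is a rough, hedged illustration (the function name and HSV thresholds are assumptions, not values from this disclosure); a production pipeline would estimate a soft alpha matte instead, since a hard hue threshold leaves fringing around hair and other fine structures:

```python
import cv2

def extract_foreground_greenscreen(frame_bgr):
    """Rough green-screen matting: mask out green pixels and return the
    rest as a BGRA foreground image."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, (35, 60, 60), (85, 255, 255))  # green hue band
    alpha = cv2.bitwise_not(green)                          # foreground = not green
    b, g, r = cv2.split(frame_bgr)
    return cv2.merge((b, g, r, alpha))
```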
S220: Obtain the calibration result of the foreground capture device used to shoot the video frame, and convert, according to the calibration result, the pixel points of the foreground image from the image coordinate system to the foreground capture coordinate system of the foreground capture device to obtain a calibrated image.

The calibration result may be the result of calibrating the foreground capture device; in practice it may be represented by the foreground pose and foreground intrinsics. For example, to shorten calibration time and reduce calibration difficulty, calibration may proceed as follows: obtain the video frame sequences shot by each foreground capture device and determine the feature-matching relationships among these sequences; then derive the calibration result of each foreground capture device from the feature-matching relationships. Since this is a self-calibration process that can be completed with the captured video frame sequences alone, no calibration board is involved, which shortens calibration time and reduces difficulty. The above is only one way of obtaining the calibration result; it may also be obtained in other ways, which are not specifically limited here.

The foreground capture coordinate system may be the coordinate system of the foreground capture device. Each pixel point of the foreground image is converted to the foreground capture coordinate system according to the calibration result, yielding the calibrated image. For example, if a pixel point of the calibrated image is denoted P, then P = [R | t]^-1 · K^-1 · p_t, where p_t denotes a pixel point of the foreground image, R denotes the rotation matrix of the foreground capture device, t denotes its translation vector (R and t together representing the foreground pose), and K denotes the foreground intrinsics.
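Read literally, P = [R | t]^-1 · K^-1 · p_t determines P only up to scale, since a pixel back-projects to a ray. The hedged numpy sketch below makes it concrete by assuming a known depth per pixel; all names are illustrative:

```python
import numpy as np

def image_to_capture_coords(pixels, depths, K, R, t):
    """Back-project image pixels into the foreground capture coordinate
    system: P = [R | t]^-1 . K^-1 . p_t, fixing scale with per-pixel depth."""
    pixels = np.asarray(pixels, dtype=np.float64)          # shape (N, 2)
    ones = np.ones((pixels.shape[0], 1))
    rays = np.hstack([pixels, ones]) @ np.linalg.inv(K).T  # K^-1 . p_t
    cam_points = rays * np.asarray(depths)[:, None]        # scale each ray by its depth
    return (cam_points - t) @ R                            # invert [R | t]: R^T (X - t)
```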
S230: Convert the pixel points of the calibrated image to the augmented reality coordinate system to obtain the converted image.

If the foreground capture devices were aligned before shooting the target video, meaning the foreground capture coordinate systems are one and the same spatial coordinate system, the pixel points of the calibrated image can be converted directly to the AR coordinate system to obtain the converted image; otherwise, the foreground capture coordinate systems can first be subjected to fixed-axis processing before the pixel points of the calibrated image are converted; and so on.
S240: Obtain the background pose of the background capture device at the target time, and determine, from the converted images corresponding to the target time, the view image corresponding to the background pose.

S250: Convert, according to the background pose, the pixel points of the view image to the background capture coordinate system of the background capture device to obtain the target image.

S260: Merge the background image captured by the background capture device at the target time with the target image, and display the resulting augmented reality image.

In this embodiment of the present disclosure, for each video frame, the foreground image is extracted from the video frame, each pixel point of the foreground image is converted to the foreground capture coordinate system according to the calibration result of the foreground capture device used to shoot the video frame, and the resulting calibrated image is then converted to the AR coordinate system, thereby obtaining the converted image accurately.
In an embodiment, on the basis of the above embodiments, converting the pixel points of the calibrated image to the augmented reality coordinate system to obtain the converted image includes: obtaining a fixed-axis coordinate system, where the fixed-axis coordinate system includes a coordinate system determined from the foreground poses of the foreground capture devices or from the captured video frames; converting the pixel points of the calibrated image to the fixed-axis coordinate system to obtain a fixed-axis image; and converting the pixel points of the fixed-axis image to the augmented reality coordinate system to obtain the converted image.

When setting up multiple foreground capture devices, one usually hopes to place them on the same plane, but this is hard to achieve by manual alignment: it is time- and labor-intensive, and the precision is hard to guarantee. However, a target video shot by unaligned foreground capture devices jitters when the viewing angle changes, which directly degrades the user's viewing experience. To avoid this, a fixed-axis coordinate system implementing the axis-fixing function can be obtained, and the calibrated image converted into it, yielding a fixed-axis image that does not jitter when the viewing angle changes. In practice, the fixed-axis coordinate system can be obtained in multiple ways: for example, from the foreground poses of the foreground capture devices, e.g., by computing the corresponding homography matrices from the foreground poses; or from the video frames shot by the foreground capture devices, e.g., by feature-matching those frames; and so on, which is not specifically limited here. The fixed-axis image is then converted to the AR coordinate system to obtain the converted image, avoiding jitter of the converted image under viewing-angle changes.
On this basis, in an embodiment, converting the pixel points of the calibrated image to the fixed-axis coordinate system to obtain the fixed-axis image may include: obtaining a first homography matrix from the foreground capture coordinate system to the fixed-axis coordinate system, and converting the pixel points of the calibrated image to the fixed-axis coordinate system based on the first homography matrix to obtain the fixed-axis image. For example, if a pixel point of the fixed-axis image is denoted P_fix-axis, then P_fix-axis = H_F · P, where P denotes a pixel point of the calibrated image and H_F denotes the first homography matrix.

In another embodiment, converting the pixel points of the fixed-axis image to the augmented reality coordinate system to obtain the converted image may include: obtaining a second homography matrix from the fixed-axis coordinate system to the augmented reality coordinate system, and converting the pixel points of the fixed-axis image to the augmented reality coordinate system based on the second homography matrix to obtain the converted image. For example, if a pixel point of the converted image is denoted P_AR, then P_AR = H_A · P_fix-axis, where P_fix-axis denotes a pixel point of the fixed-axis image and H_A denotes the second homography matrix.
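The two homographies compose, so a pixel of the calibrated image reaches the AR coordinate system as P_AR = H_A · H_F · P. A minimal hedged per-pixel sketch follows (names are illustrative); in practice one would warp the whole image in one pass, e.g. with cv2.warpPerspective(image, H_A @ H_F, output_size), rather than looping over pixels:

```python
import numpy as np

def calibrated_pixel_to_ar(p, H_F, H_A):
    """Map a calibrated-image pixel into the AR coordinate system via the
    fixed-axis coordinate system: P_AR = H_A . H_F . P."""
    p_h = np.array([p[0], p[1], 1.0])  # homogeneous pixel
    q = H_A @ (H_F @ p_h)              # chain the two homographies
    return q[:2] / q[2]                # back to inhomogeneous coordinates
```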
Fig. 3 is a flowchart of another image display method provided in an embodiment of the present disclosure. This embodiment is an adjustment based on the above embodiments. In this embodiment, merging the background image captured by the background capture device at the target time with the target image and displaying the resulting augmented reality image may include: obtaining the background image captured by the background capture device at the target time; fusing the target image and the background image based on the transparency information of each pixel point of the target image to obtain the augmented reality image; and displaying the augmented reality image. Explanations of terms identical or corresponding to those in the above embodiments are not repeated here.

Accordingly, as shown in Fig. 3, the method of this embodiment may include the following steps:

S310: Obtain a converted image corresponding to each video frame in a target video, where the converted image is an image obtained by converting pixel points of a foreground image from an image coordinate system to an augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-viewpoint video or a light-field video.

S320: Obtain the background pose of the background capture device at the target time, and determine, from the converted images corresponding to the target time, the view image corresponding to the background pose.

S330: Convert, according to the background pose, the pixel points of the view image to the background capture coordinate system of the background capture device to obtain the target image.

S340: Obtain the background image captured by the background capture device at the target time.

S350: Fuse the target image and the background image based on the transparency information of each pixel point of the target image to obtain the augmented reality image, and display the augmented reality image.
For each pixel point of the target image, its transparency information expresses that pixel's value in the transparency (i.e., alpha) channel; fusion of the target image and the background image can be performed based on the transparency information of each pixel point, thereby obtaining the AR image. For example, for any pixel point foreground of the target image with transparency information alpha, the pixel obtained by fusing it with the corresponding pixel point background of the background image can be expressed as Pixel_final = alpha*foreground + (1-alpha)*background, where Pixel_final denotes the fused pixel. It should be noted that, as stated above, the embodiments of the present disclosure realize display of the target video by playing it in the AR domain rather than by real-time lighting-based rendering of a three-dimensional model; in other words, the target video cannot be redrawn, as it is itself video data, so the AR image is obtained here by fusion.
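For completeness, a vectorized, hedged sketch of this per-pixel blend (the function name and the RGBA/uint8 layout are assumptions of the sketch):

```python
import numpy as np

def alpha_composite(target_rgba, background_rgb):
    """Pixel_final = alpha * foreground + (1 - alpha) * background."""
    alpha = target_rgba[..., 3:4].astype(np.float64) / 255.0
    fg = target_rgba[..., :3].astype(np.float64)
    bg = background_rgb.astype(np.float64)
    return (alpha * fg + (1.0 - alpha) * bg).astype(np.uint8)
```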
In this embodiment of the present disclosure, fusion of the target image and the background image is realized through the transparency information of each pixel point of the target image, ensuring that the AR image is obtained effectively.
In an embodiment, on the basis of the above embodiments, before fusing the target image and the background image based on the transparency information of each pixel point of the target image, the image display method may further include: obtaining the color temperature of the background image; adjusting image parameters of the target image based on the color temperature, and updating the target image according to the adjustment result, where the image parameters include white balance and/or brightness. To ensure that the foreground object and the background object in the fused AR image match, the color temperature of the background image can be obtained before fusion, and image parameters of the target image such as white balance and/or brightness can be adjusted based on it, so that the adjusted target image matches the background image in tone, ensuring the overall consistency of the subsequently fused AR image and a better user experience.
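The disclosure does not fix a particular adjustment algorithm. One crude stand-in, sketched here under that assumption with illustrative names, is to scale each channel of the target image so its mean matches the background image's, which roughly aligns the two images' white balance:

```python
import numpy as np

def match_channel_means(target_rgb, background_rgb):
    """Scale each color channel of the target so its mean matches the
    background's, a simple proxy for color-temperature matching."""
    t = target_rgb.astype(np.float64)
    bg_means = background_rgb.reshape(-1, 3).mean(axis=0)
    t_means = t.reshape(-1, 3).mean(axis=0) + 1e-6  # avoid divide-by-zero
    return np.clip(t * (bg_means / t_means), 0, 255).astype(np.uint8)
```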
To better understand the above embodiments as a whole, they are illustrated below with an example. Referring to Fig. 4: for each video frame, the camera used to shoot it is calibrated, and the spatial conversion of each pixel point of the video frame is performed according to the calibration result to obtain the calibrated image; the fixed-axis coordinate system is obtained, and each pixel point of the calibrated image is converted to it to obtain the fixed-axis image; the AR coordinate system is obtained, and each pixel point of the fixed-axis image is converted to it to obtain the target image; to extend the viewing angles of the target video, a virtual image under a virtual viewing angle can be generated from the target image under a physical viewing angle and also used as a target image; the target image is fused with the background image captured by the camera in the mobile phone, yielding the AR image; and the AR images are displayed in sequence, realizing the AR display effect of the target video.
Fig. 5 is a structural block diagram of an image display apparatus provided in an embodiment of the present disclosure. The apparatus is used to execute the image display method provided by any of the above embodiments and belongs to the same concept as the image display method of the above embodiments; for details not exhaustively described in the embodiments of the image display apparatus, refer to the embodiments of the image display method above. Referring to Fig. 5, the apparatus may include: a converted-image acquisition module 410, a view-image determination module 420, a target-image obtaining module 430, and an augmented-reality-image display module 440.

The converted-image acquisition module 410 is configured to obtain a converted image corresponding to each video frame in a target video, where the converted image is an image obtained by converting pixel points of a foreground image from an image coordinate system to an augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-viewpoint video or a light-field video.

The view-image determination module 420 is configured to obtain a background pose of a background capture device at a target time, and determine, from the converted images corresponding to the target time, a view image corresponding to the background pose.

The target-image obtaining module 430 is configured to convert, according to the background pose, the pixel points of the view image to the background capture coordinate system of the background capture device to obtain a target image.

The augmented-reality-image display module 440 is configured to merge the background image captured by the background capture device at the target time with the target image, and display the resulting augmented reality image.
In an embodiment, on the basis of the above apparatus, the apparatus may further include:

a foreground-image extraction module, configured to extract, for each video frame, the foreground image from the video frame;

a calibration-result acquisition module, configured to obtain the calibration result of the foreground capture device used to shoot the video frame;

a calibrated-image obtaining module, configured to convert, according to the calibration result, the pixel points of the foreground image from the image coordinate system to the foreground capture coordinate system of the foreground capture device to obtain a calibrated image;

a converted-image obtaining module, configured to convert the pixel points of the calibrated image to the augmented reality coordinate system to obtain the converted image.

On this basis, the converted-image obtaining module may include:

a fixed-axis-coordinate-system acquisition unit, configured to obtain a fixed-axis coordinate system, where the fixed-axis coordinate system includes a coordinate system determined from the foreground poses of the foreground capture devices or from the captured video frames;

a fixed-axis-image obtaining unit, configured to convert each pixel point of the calibrated image to the fixed-axis coordinate system to obtain a fixed-axis image;

a converted-image obtaining unit, configured to convert the pixel points of the fixed-axis image to the augmented reality coordinate system to obtain the converted image.

On this basis, in an embodiment, the fixed-axis-image obtaining unit is configured to:

obtain a first homography matrix from the foreground capture coordinate system to the fixed-axis coordinate system, and convert the pixel points of the calibrated image to the fixed-axis coordinate system based on the first homography matrix to obtain the fixed-axis image.

In an embodiment, the converted-image obtaining unit is configured to:

obtain a second homography matrix from the fixed-axis coordinate system to the augmented reality coordinate system, and convert the pixel points of the fixed-axis image to the augmented reality coordinate system based on the second homography matrix to obtain the converted image.

In an embodiment, the augmented-reality-image display module 440 may include:

a background-image acquisition unit, configured to obtain the background image captured by the background capture device at the target time;

an augmented-reality-image display unit, configured to fuse the target image and the background image based on the transparency information of each pixel point of the target image to obtain the augmented reality image, and display the augmented reality image.

In an embodiment, on the basis of the above apparatus, the apparatus may further include:

a color-temperature acquisition module, configured to obtain the color temperature of the background image before the target image and the background image are fused based on the transparency information of each pixel point of the target image;

a target-image update module, configured to adjust image parameters of the target image based on the color temperature and update the target image according to the adjustment result, where the image parameters include white balance and/or brightness.

In an embodiment, the view-image determination module 420 may include:

a next-frame determination unit, configured to take the video frame corresponding to the augmented reality image displayed at the time immediately before the target time as the previous frame, and determine the next frames of the previous frame from the video frames;

a capture-angle acquisition unit, configured to take the converted images corresponding to the next frames as the converted images corresponding to the target time, and obtain the capture angle of each converted image corresponding to the target time;

a view-image obtaining unit, configured to determine, from the capture angles, the background viewing angle corresponding to the background pose, and take the converted image with the background viewing angle among the converted images corresponding to the target time as the view image.

In an embodiment, the augmented-reality-image display module 440 may include:

a plane-position obtaining unit, configured to obtain the background image captured by the background capture device at the target time, recognize the background plane in the background image, and obtain the plane position of the background plane in the background image;

an image merging unit, configured to merge the background image and the target image based on the plane position so that the foreground object in the resulting augmented reality image lies on the background plane;

an augmented-reality-image display unit, configured to display the augmented reality image.
In the image display apparatus provided by this embodiment of the present disclosure, the converted-image acquisition module obtains a converted image corresponding to each video frame in the target video, where the converted image may be the image obtained by converting the pixel points of the foreground image extracted from the video frame from the image coordinate system to the AR coordinate system; the view-image determination module obtains the background pose of the background capture device at the target time and determines, from the converted images corresponding to the target time, the view image corresponding to the background pose; the target-image obtaining module converts, according to the background pose, the pixel points of the view image to the background capture coordinate system of the background capture device to obtain the target image; and the augmented-reality-image display module merges the background image captured by the background capture device at the target time with the target image and displays the resulting AR image. The above apparatus can display the video frames of the target video in an AR manner, i.e., play the target video in an AR manner, realizing interactive viewing of the target video through AR, thereby guaranteeing the user's freedom when watching the target video and providing a better user experience.

The image display apparatus provided by this embodiment of the present disclosure can execute the image display method provided by any embodiment of the present disclosure, and possesses the functional modules and beneficial effects corresponding to executing the method.

It is worth noting that, in the embodiments of the above image display apparatus, the units and modules included are merely divided according to functional logic, but the division is not limited to the above as long as the corresponding functions can be realized; in addition, the specific names of the functional units are merely for ease of mutual distinction and are not used to limit the scope of protection of the present disclosure.
Referring now to Fig. 6, it shows a schematic structural diagram of an electronic device (e.g., the terminal device or server in Fig. 6) 500 suitable for implementing an embodiment of the present disclosure. The electronic device in embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), and vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), as well as fixed terminals such as digital televisions (TVs) and desktop computers. The electronic device shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of embodiments of the present disclosure.

As shown in Fig. 6, the electronic device 500 may include a processing apparatus (e.g., a central processing unit, a graphics processing unit, etc.) 501, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage apparatus 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500. The processing apparatus 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

Generally, the following apparatuses may be connected to the I/O interface 505: input apparatuses 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output apparatuses 507 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage apparatuses 508 including, for example, a magnetic tape and a hard disk; and a communication apparatus 509. The communication apparatus 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 6 shows the electronic device 500 with various apparatuses, it should be understood that it is not required to implement or possess all the apparatuses shown; more or fewer apparatuses may alternatively be implemented or possessed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network through the communication apparatus 509, or installed from the storage apparatus 508, or installed from the ROM 502. When the computer program is executed by the processing apparatus 501, the above functions defined in the methods of embodiments of the present disclosure are executed.

It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. Examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take multiple forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: an electric wire, an optical cable, radio frequency (RF), etc., or any suitable combination of the above.

In some implementations, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication (e.g., a communication network) in any form or medium. Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.

The above computer-readable medium may be contained in the above electronic device; or it may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:

obtain a converted image corresponding to each video frame in a target video, where the converted image is an image obtained by converting pixel points of a foreground image from an image coordinate system to an augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-viewpoint video or a light-field video;

obtain a background pose of a background capture device at a target time, and determine, from the converted images corresponding to the target time, a view image corresponding to the background pose;

convert, according to the background pose, the pixel points of the view image to the background capture coordinate system of the background capture device to obtain a target image;

merge the background image captured by the background capture device at the target time with the target image, and display the resulting augmented reality image.
The storage medium may be a non-transitory storage medium.

Computer program code for executing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not in some cases constitute a limitation on the unit itself; for example, the converted-image acquisition module may also be described as "a module that obtains a converted image corresponding to each video frame in a target video, where the converted image is an image obtained by converting pixel points of a foreground image from an image coordinate system to an augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-viewpoint video or a light-field video".

The functions described herein above may be executed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. Examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, [Example 1] provides an image display method, which may include:

obtaining a converted image corresponding to each video frame in a target video, where the converted image is an image obtained by converting pixel points of a foreground image from an image coordinate system to an augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-viewpoint video or a light-field video;

obtaining a background pose of a background capture device at a target time, and determining, from the converted images corresponding to the target time, a view image corresponding to the background pose;

converting, according to the background pose, the pixel points of the view image to the background capture coordinate system of the background capture device to obtain a target image;

merging the background image captured by the background capture device at the target time with the target image, and displaying the resulting augmented reality image.
According to one or more embodiments of the present disclosure, [Example 2] provides the method of Example 1; the above image display method may further include:

for each video frame, extracting the foreground image from the video frame;

obtaining the calibration result of the foreground capture device used to shoot the video frame;

converting, according to the calibration result, the pixel points of the foreground image from the image coordinate system to the foreground capture coordinate system of the foreground capture device to obtain a calibrated image;

converting the pixel points of the calibrated image to the augmented reality coordinate system to obtain the converted image.
According to one or more embodiments of the present disclosure, [Example 3] provides the method of Example 2; converting the pixel points of the calibrated image to the augmented reality coordinate system to obtain the converted image may include:

obtaining a fixed-axis coordinate system, where the fixed-axis coordinate system includes a coordinate system determined from the foreground poses of the foreground capture devices or from the captured video frames;

converting the pixel points of the calibrated image to the fixed-axis coordinate system to obtain a fixed-axis image;

converting the pixel points of the fixed-axis image to the augmented reality coordinate system to obtain the converted image.

According to one or more embodiments of the present disclosure, [Example 4] provides the method of Example 3; converting the pixel points of the calibrated image to the fixed-axis coordinate system to obtain a fixed-axis image may include:

obtaining a first homography matrix from the foreground capture coordinate system to the fixed-axis coordinate system, and converting the pixel points of the calibrated image to the fixed-axis coordinate system based on the first homography matrix to obtain the fixed-axis image.

According to one or more embodiments of the present disclosure, [Example 5] provides the method of Example 3; converting the pixel points of the fixed-axis image to the augmented reality coordinate system to obtain the converted image may include:

obtaining a second homography matrix from the fixed-axis coordinate system to the augmented reality coordinate system, and converting the pixel points of the fixed-axis image to the augmented reality coordinate system based on the second homography matrix to obtain the converted image.
According to one or more embodiments of the present disclosure, [Example 6] provides the method of Example 1; merging the background image captured by the background capture device at the target time with the target image and displaying the resulting augmented reality image may include:

obtaining the background image captured by the background capture device at the target time;

fusing the target image and the background image based on the transparency information of each pixel point of the target image to obtain the augmented reality image, and displaying the augmented reality image.

According to one or more embodiments of the present disclosure, [Example 7] provides the method of Example 6; before fusing the target image and the background image based on the transparency information of each pixel point of the target image, the above image display method may further include:

obtaining the color temperature of the background image;

adjusting image parameters of the target image based on the color temperature, and updating the target image according to the adjustment result, where the image parameters include white balance and/or brightness.
According to one or more embodiments of the present disclosure, [Example 8] provides the method of Example 1; determining, from the converted images corresponding to the target time, the view image corresponding to the background pose may include:

taking the video frame corresponding to the augmented reality image displayed at the time immediately before the target time as the previous frame, and determining the next frames of the previous frame from the video frames;

taking the converted images corresponding to the next frames as the converted images corresponding to the target time, and obtaining the capture angle of each converted image corresponding to the target time;

determining, from the capture angles, the background viewing angle corresponding to the background pose, and taking the converted image with the background viewing angle among the converted images corresponding to the target time as the view image.

According to one or more embodiments of the present disclosure, [Example 9] provides the method of Example 1; merging the background image captured by the background capture device at the target time with the target image and displaying the resulting augmented reality image may include:

obtaining the background image captured by the background capture device at the target time, recognizing the background plane in the background image, and obtaining the plane position of the background plane in the background image;

merging the background image and the target image based on the plane position so that the foreground object in the resulting augmented reality image lies on the background plane;

displaying the augmented reality image.
According to one or more embodiments of the present disclosure, [Example 10] provides an image display apparatus, which may include:

a converted-image acquisition module, configured to obtain a converted image corresponding to each video frame in a target video, where the converted image is an image obtained by converting pixel points of a foreground image from an image coordinate system to an augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video includes a free-viewpoint video or a light-field video;

a view-image determination module, configured to obtain a background pose of a background capture device at a target time, and determine, from the converted images corresponding to the target time, a view image corresponding to the background pose;

a target-image obtaining module, configured to convert, according to the background pose, the pixel points of the view image to the background capture coordinate system of the background capture device to obtain a target image;

an augmented-reality-image display module, configured to merge the background image captured by the background capture device at the target time with the target image, and display the resulting augmented reality image.
Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to embodiments formed by specific combinations of the above technical features, and should also cover other embodiments formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, embodiments formed by replacing the above features with technical features with similar functions disclosed in (but not limited to) the present disclosure.

Furthermore, although the operations are depicted in a specific order, this should not be understood as requiring that these operations be executed in the specific order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be interpreted as limitations on the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.

Although the subject matter has been described in language specific to structural features and/or methodological logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims.

Claims (12)

  1. An image display method, comprising:
    obtaining a converted image corresponding to each video frame in a target video, wherein the converted image is an image obtained by converting pixel points of a foreground image from an image coordinate system to an augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video comprises a free-viewpoint video or a light-field video;
    obtaining a background pose of a background capture device at a target time, and determining, from at least one converted image corresponding to the target time, a view image corresponding to the background pose;
    converting, according to the background pose, pixel points of the view image to a background capture coordinate system of the background capture device to obtain a target image;
    merging a background image captured by the background capture device at the target time with the target image, and displaying the augmented reality image obtained by the merging.
  2. The method according to claim 1, further comprising:
    for each video frame, extracting the foreground image from the video frame;
    obtaining a calibration result of a foreground capture device used to shoot the video frame;
    converting, according to the calibration result, the pixel points of the foreground image from the image coordinate system to a foreground capture coordinate system of the foreground capture device to obtain a calibrated image;
    converting pixel points of the calibrated image to the augmented reality coordinate system to obtain the converted image.
  3. The method according to claim 2, wherein converting the pixel points of the calibrated image to the augmented reality coordinate system to obtain the converted image comprises:
    obtaining a fixed-axis coordinate system, wherein the fixed-axis coordinate system is a coordinate system determined according to a foreground pose of at least one foreground capture device or according to the captured video frames;
    converting the pixel points of the calibrated image to the fixed-axis coordinate system to obtain a fixed-axis image;
    converting pixel points of the fixed-axis image to the augmented reality coordinate system to obtain the converted image.
  4. The method according to claim 3, wherein converting the pixel points of the calibrated image to the fixed-axis coordinate system to obtain a fixed-axis image comprises:
    obtaining a first homography matrix from the foreground capture coordinate system to the fixed-axis coordinate system, and converting the pixel points of the calibrated image to the fixed-axis coordinate system based on the first homography matrix to obtain the fixed-axis image.
  5. The method according to claim 3, wherein converting the pixel points of the fixed-axis image to the augmented reality coordinate system to obtain the converted image comprises:
    obtaining a second homography matrix from the fixed-axis coordinate system to the augmented reality coordinate system, and converting the pixel points of the fixed-axis image to the augmented reality coordinate system based on the second homography matrix to obtain the converted image.
  6. The method according to claim 1, wherein merging the background image captured by the background capture device at the target time with the target image and displaying the augmented reality image obtained by the merging comprises:
    obtaining the background image captured by the background capture device at the target time;
    fusing the target image and the background image based on transparency information of pixel points of the target image to obtain an augmented reality image, and displaying the augmented reality image.
  7. The method according to claim 6, before fusing the target image and the background image based on the transparency information of the pixel points of the target image, further comprising:
    obtaining a color temperature of the background image;
    adjusting an image parameter of the target image based on the color temperature, and updating the target image according to the adjustment result, wherein the image parameter comprises at least one of white balance or brightness.
  8. The method according to claim 1, wherein determining, from the at least one converted image corresponding to the target time, the view image corresponding to the background pose comprises:
    taking the video frame corresponding to the augmented reality image displayed at the time immediately before the target time as a previous frame, and determining a next frame of the previous frame from at least one video frame;
    taking the converted image corresponding to each next frame as the converted image corresponding to the target time, and obtaining a capture angle of each converted image corresponding to the target time;
    determining, from the capture angles, a background viewing angle corresponding to the background pose, and taking, among the at least one converted image corresponding to the target time, the converted image having the background viewing angle as the view image.
  9. The method according to claim 1, wherein merging the background image captured by the background capture device at the target time with the target image and displaying the augmented reality image obtained by the merging comprises:
    obtaining the background image captured by the background capture device at the target time, recognizing a background plane in the background image, and obtaining a plane position of the background plane in the background image;
    merging the background image and the target image based on the plane position, so that the foreground object in the augmented reality image obtained by the merging lies on the background plane;
    displaying the augmented reality image.
  10. An image display apparatus, comprising:
    a converted-image acquisition module, configured to obtain a converted image corresponding to each video frame in a target video, wherein the converted image is an image obtained by converting pixel points of a foreground image from an image coordinate system to an augmented reality coordinate system, the foreground image is an image containing a foreground object extracted from the video frame, and the target video comprises a free-viewpoint video or a light-field video;
    a view-image determination module, configured to obtain a background pose of a background capture device at a target time, and determine, from at least one converted image corresponding to the target time, a view image corresponding to the background pose;
    a target-image obtaining module, configured to convert, according to the background pose, pixel points of the view image to a background capture coordinate system of the background capture device to obtain a target image;
    an augmented-reality-image display module, configured to merge a background image captured by the background capture device at the target time with the target image, and display the augmented reality image obtained by the merging.
  11. An electronic device, comprising:
    one or more processors; and
    a memory configured to store one or more programs;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the image display method according to any one of claims 1-9.
  12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the image display method according to any one of claims 1-9.
PCT/CN2023/089010 2022-05-24 2023-04-18 Image display method and apparatus, electronic device, and storage medium WO2023226628A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210575768.6A CN115002442B (zh) 2022-05-24 Image display method and apparatus, electronic device, and storage medium
CN202210575768.6 2022-05-24

Publications (1)

Publication Number Publication Date
WO2023226628A1 true WO2023226628A1 (zh) 2023-11-30

Family

ID=83028855

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/089010 WO2023226628A1 (zh) 2023-05-24 2023-04-18 Image display method and apparatus, electronic device, and storage medium

Country Status (1)

Country Link
WO (1) WO2023226628A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112653848A * 2020-12-23 2021-04-13 Beijing Sensetime Technology Development Co., Ltd. Display method and apparatus in an augmented reality scene, electronic device, and storage medium
WO2021073278A1 * 2019-10-15 2021-04-22 Beijing Sensetime Technology Development Co., Ltd. Augmented reality data presentation method and apparatus, device, and storage medium
CN113220251A * 2021-05-18 2021-08-06 Beijing Dajia Internet Information Technology Co., Ltd. Object display method and apparatus, electronic device, and storage medium
CN115002442A * 2022-05-24 2022-09-02 Beijing Bytedance Network Technology Co., Ltd. Image display method and apparatus, electronic device, and storage medium


Also Published As

Publication number Publication date
CN115002442A (zh) 2022-09-02


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23810702

Country of ref document: EP

Kind code of ref document: A1