WO2019223463A1 - Image processing method, apparatus, storage medium and computer device - Google Patents

Image processing method, apparatus, storage medium and computer device

Info

Publication number
WO2019223463A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
virtual
model
terminal
sphere
Prior art date
Application number
PCT/CN2019/083295
Other languages
English (en)
French (fr)
Inventor
刘天成
潘红
余媛
何金文
刘立强
赵彬如
钟庆华
王旅波
肖欢
方楚楠
刘伟
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority to EP19808017.8A (published as EP3798801A4)
Priority to JP2020551294A (published as JP7096902B2)
Publication of WO2019223463A1
Priority to US16/996,566 (published as US11238644B2)

Classifications

    • G06T 15/20: Perspective computation
    • G06T 15/205: Image-based rendering
    • G06F 3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/0304: Detection arrangements using opto-electronic means
    • G06F 3/0346: Pointing devices with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
    • G06F 3/04815: Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • G06T 19/003: Navigation within 3D models or images
    • G06T 19/006: Mixed reality
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06T 2207/30241: Trajectory (indexing scheme for image analysis; subject/context of image processing)

Definitions

  • The present application relates to the field of computer technology, and in particular, to an image processing method, apparatus, storage medium, and computer device.
  • With the continuous development of computer technology, image processing technology has also improved. Users can process images with professional image processing software so that the processed images look better, and can attach materials provided by the software to images so that the processed images convey more information.
  • However, current image processing methods require the user to open the material library of the image processing software, browse it, select a suitable material, adjust the material's position in the image, and confirm the modification to complete the processing. These methods therefore involve many manual operations and take a long time, making the image processing process inefficient.
  • Embodiments of the present application provide an image processing method, apparatus, storage medium, and computer device to solve the problem of low efficiency in the traditional image processing process.
  • An image processing method includes: acquiring image frames collected from a real scene; playing the acquired image frames frame by frame according to the acquisition timing; determining, when a trajectory formed by the movement of a target in the acquired image frames meets a trigger condition, a position corresponding to the target in the real scene; rendering a virtual entry in the currently played image frame according to the position; and displaying virtual content in the virtual entry.
  • An image processing device includes:
  • a playback module configured to play the acquired image frames frame by frame according to the acquisition timing;
  • a determining module configured to determine a position corresponding to the target in the real scene when the trajectory formed by the movement of the target in the acquired image frames meets a trigger condition; and
  • a rendering module configured to render a virtual entry in the currently played image frame according to the position, and to display virtual content in the virtual entry.
  • A computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the steps of the above image processing method.
  • A computer device includes a memory and a processor. The memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the above image processing method.
  • With the above image processing method, apparatus, storage medium, and computer device, on the one hand, image frames collected from a real scene are played so that the played image frames reflect the real scene; on the other hand, when the trajectory formed by the movement of the target in the image frames collected from the real scene meets the trigger condition, the position corresponding to the target in the real scene is determined automatically, the virtual entry is rendered in the currently played image frame according to the determined position, and virtual content is displayed in the virtual entry. In this way, the virtual content of the virtual world is displayed automatically inside the virtual entry while the real content of the real world is displayed outside it, which avoids tedious manual operations and greatly improves the efficiency of image processing.
  • FIG. 1 is an application environment diagram of an image processing method in an embodiment
  • FIG. 2 is a schematic flowchart of an image processing method according to an embodiment
  • FIG. 3 is a schematic diagram of an interface for playing an image frame in an embodiment
  • FIG. 4 is a schematic diagram of an interface for rendering a virtual portal in a currently played image frame according to an embodiment
  • FIG. 5 is a schematic diagram of segmenting a hand region from an acquired image frame in an embodiment
  • FIG. 6 is a schematic diagram of a trajectory change in an image frame in an embodiment
  • FIG. 7 is a schematic diagram of a relationship between coordinate spaces in an embodiment
  • FIG. 8 is a rendering schematic diagram of a current terminal position after passing through a spatial area in an embodiment
  • FIG. 9 is a schematic diagram of the interface displayed on the terminal when, in one embodiment, the position of the current terminal does not pass through the spatial area again and the moved current field of view covers the virtual entry
  • FIG. 10 is a rendering schematic diagram when the position of a current terminal moves around a space area in an embodiment
  • FIG. 11 is a schematic cross-sectional view of a model in an embodiment
  • FIG. 12 is a schematic diagram of a rendering principle in an embodiment
  • FIG. 13 is a flowchart block diagram of an image processing method according to an embodiment
  • FIG. 14 is a block diagram of an image processing apparatus according to an embodiment
  • FIG. 15 is a block configuration diagram of an image processing apparatus in another embodiment
  • FIG. 16 is an internal structural diagram of a computer device in one embodiment.
  • FIG. 1 is an application environment diagram of an image processing method in an embodiment.
  • the image processing method is applied to an image processing system.
  • the image processing system includes a terminal 110 and a server 120.
  • the terminal 110 and the server 120 are connected through a network.
  • the terminal 110 is configured to execute an image processing method.
  • the terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like.
  • the server 120 may be an independent server or a server cluster composed of multiple independent servers.
  • the terminal 110 may acquire image frames collected from a real scene, and play the acquired image frames frame by frame according to the collected timing.
  • The image frames may be acquired by the terminal 110 from the real world through a built-in image acquisition device or an externally connected image acquisition device.
  • The built-in image acquisition device may be a front camera or a rear camera of the terminal 110. The image frames may also be collected from the real scene by another device and then sent to the terminal 110.
  • The terminal 110 may locally determine whether the trajectory formed by the movement of the target in the acquired image frames meets the trigger condition and, when it does, determine the position corresponding to the target in the real scene and render a virtual entry in the currently played image frame according to that position.
  • Alternatively, the terminal 110 may send the acquired image frames to the server 120; when the server 120 determines that the trajectory formed by the movement of the target in the acquired image frames meets the trigger condition, it notifies the terminal 110 of the trigger.
  • The terminal 110 then determines the position corresponding to the target in the real scene, renders the virtual entry in the currently played image frame according to the position, and displays the virtual content in the virtual entry.
  • FIG. 2 is a schematic flowchart of an image processing method according to an embodiment.
  • In this embodiment, the image processing method is mainly illustrated as being applied to a computer device.
  • the computer device may be the terminal 110 in FIG. 1.
  • the image processing method includes the following steps:
  • the real scene is a scene existing in the natural world.
  • An image frame is a unit in an image frame sequence capable of forming a dynamic picture, and is used to record a picture in a real scene at a certain moment.
  • The terminal may collect image frames from a real scene at a fixed or dynamic frame rate and obtain the collected image frames.
  • The fixed or dynamic frame rate enables the image frames to form a continuous dynamic picture when they are played back at that frame rate.
  • the terminal may use a built-in or externally connected image acquisition device such as a camera to collect image frames of a real scene under the current field of view of the camera, and obtain the acquired image frames.
  • the field of view of the camera may change due to changes in the attitude and position of the terminal.
  • The terminal may use an AR (Augmented Reality) shooting mode provided by an application running on the terminal; after the AR shooting mode is selected, the terminal collects image frames from the real scene and obtains the collected image frames.
  • the application may be a social application, and the social application is an application capable of performing social interaction on the network based on a social network.
  • Social applications include instant messaging applications, SNS (Social Network Service) applications, live broadcast applications or photo applications, such as QQ or WeChat.
  • The terminal may receive image frames that were collected from a real scene and sent by another terminal.
  • For example, when a terminal establishes a video session through a social application running on it, it receives image frames that the terminals of the other session parties collected from their real scenes and sent to it.
  • The frame rate at which the image frames are obtained may be the same as, or lower than, the frame rate at which they were collected.
  • timing of acquisition refers to the chronological order when image frames are acquired, and can be represented by the relationship between the timestamps of image frames recorded at the time of acquisition.
  • Frame-by-frame playback refers to playing the image frames one frame at a time.
  • the terminal may play the acquired image frames one by one according to the frame rate of the acquired image frames and in ascending order of time stamps.
  • The terminal may play the acquired image frames directly, or store them in a buffer in acquisition order and take them out of the buffer in acquisition order for playback.
  • The terminal may play the image frames received from another terminal one by one, in ascending order of their timestamps, according to the frame rate at which the other terminal collected them.
  • Likewise, the terminal may play the received image frames directly, or store them in a buffer in acquisition order and take them out of the buffer in acquisition order for playback.
  • FIG. 3 is a schematic diagram of an interface for playing an image frame in an embodiment.
  • FIG. 3 (a) is a simplified schematic diagram of a terminal interface when an image frame is played
  • FIG. 3 (b) is a screenshot of the terminal interface when an image frame is played. It can be seen that what is displayed on the terminal display screen is a picture of a real scene.
  • The target is an entity in the real scene that is taken as the object of tracking.
  • the target may be a hand, a face, or a long object.
  • The trajectory formed by the movement of the target is the trajectory formed, in the acquired image frames, by the movement of a reference point of the target while the target is in motion.
  • For example, when the target is a hand, the trajectory is formed by the movement of the imaging point of the index fingertip in the acquired image frames; when the user holds and moves a long object (such as a pen or a magic wand), the trajectory is formed by the movement of the imaging point of the tip of the long object in the image frames.
  • A trigger condition is a constraint that triggers a specific event.
  • Here, the specific event is the event of rendering a virtual entry in a played image frame.
  • The trigger condition may be that the trajectory formed by the movement of the target in the acquired image frames is a regular closed shape, such as a triangle, a quadrangle, or a circle.
  • The user can choose a target and control it to move in the real scene so that the trajectory formed by the motion of the target in the image frames collected from the scene meets the specific constraint (the trigger condition) and thereby triggers the specific event (rendering a virtual entry).
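  • As a minimal illustration (not part of the original disclosure), the following Python sketch checks one possible trigger condition of this kind: whether the accumulated reference points form an approximately closed, roughly circular trajectory. The function name and thresholds are illustrative assumptions.

```python
import numpy as np

def is_closed_circle(points, close_ratio=0.15, radial_tol=0.25):
    """Check whether a trajectory of 2D reference points approximates a closed circle.

    `points` is an (N, 2) array of the motion reference point's image coordinates
    collected over consecutive frames; the thresholds are illustrative only.
    """
    pts = np.asarray(points, dtype=np.float32)
    if len(pts) < 10:                       # too few samples to form a shape
        return False
    # A closed curve ends near where it began, relative to its total path length.
    seg_lengths = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    path_len = seg_lengths.sum()
    gap = np.linalg.norm(pts[0] - pts[-1])
    if path_len == 0 or gap > close_ratio * path_len:
        return False
    # A roughly constant distance to the centroid indicates a roughly circular shape.
    center = pts.mean(axis=0)
    radii = np.linalg.norm(pts - center, axis=1)
    return radii.std() < radial_tol * radii.mean()

# Example: a circle drawn by the fingertip satisfies the trigger condition.
theta = np.linspace(0, 2 * np.pi, 60)
circle = np.stack([240 + 80 * np.cos(theta), 320 + 80 * np.sin(theta)], axis=1)
assert is_closed_circle(circle)
```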
  • The position corresponding to the target in the real scene is the position in the real scene onto which the position of the virtual entry in the image frames played by the terminal is mapped; in other words, it is where the user intends the virtual entry to appear in the real scene.
  • The visual perception is that a virtual entry is triggered to appear in the real world, just like a real entrance in the real world.
  • Specifically, the terminal determines the coordinate position of the target in the image frame and calculates the position of the target in the real scene according to a projection matrix adapted to the terminal's image acquisition device.
  • The coordinate position of the target in the image frame may specifically be the coordinate position of a reference point of the target in the image frame, such as the imaging point of the index fingertip of the hand.
  • It may also be the center coordinate position of the trajectory formed by the target's movement; for example, when the index finger of the hand draws a circle, it is the center coordinate position of the circular trajectory.
  • In an actual application scenario, a user holds a terminal and opens the terminal's built-in camera through an application that invokes the camera.
  • The terminal can then collect image frames of the real scene within the camera's current field of view.
  • The user moves the target in front of the terminal's camera, and the terminal acquires real-time image frames that include the target.
  • If the user opens the front camera, the captured image frames include the target and the background within the front camera's current field of view; if the user opens the rear camera, they include the target and the background within the rear camera's current field of view.
  • the virtual entrance is a concept relative to the real entrance.
  • a real portal is an entity used to divide a real space in a real scene. Real entrances, such as room doors, can divide real space into areas inside and outside the room; or real entrances, such as entrances to scenic areas, can divide real space into scenic and non-scenic areas.
  • the virtual portal is a virtual model used to divide the area in the virtual scene. Virtual entrances such as virtual model doors.
  • The position in the real scene is an absolute position in real space. This position does not change with the current field of view of the image acquisition device built into or externally connected to the terminal. When the current field of view of the image acquisition device changes, the rendering position and size of the virtual entry in the image frame change accordingly, following the principle of object imaging in the real world and showing near-large, far-small effects.
  • Specifically, based on the current field of view and a projection matrix adapted to the terminal's image acquisition device, the terminal may calculate the rendering position of the virtual entry in the image frame collected within the current field of view, and render the virtual entry at that rendering position.
  • FIG. 4 is a schematic diagram of an interface for rendering a virtual entry in a currently played image frame in an embodiment.
  • FIG. 4 (a) is a simplified schematic diagram of a terminal interface when a virtual entry is rendered in a currently played image frame
  • FIG. 4 (b) is a screenshot of the terminal interface when a virtual entry is rendered in the currently played image frame.
  • FIG. 4 (a) includes a rendered virtual entry 410a, and
  • FIG. 4 (b) includes a rendered virtual entry 410b.
  • S210 Display virtual content in the virtual portal.
  • the virtual content here is content that does not exist in the real scene collected from the acquired image frames.
  • For example, if the acquired image frames are collected from real scene A, the virtual content is content that does not exist in real scene A.
  • the virtual content here is virtual content relative to the current real scene, not absolute virtual content.
  • the virtual content here can be completely virtual content simulated by computer technology, or it can be content in non-current real-life scenarios.
  • the current real scene is the real scene from which the image frames acquired by the terminal are collected.
  • the virtual content may be dynamic content or static content.
  • the virtual content may be uniform content, content corresponding to a trajectory formed by the movement of an object in an image frame, or content selected by the user.
  • the terminal may set a correspondence relationship between the track and the virtual content, so that after the terminal recognizes the track, the terminal may query the virtual content corresponding to the track for display.
  • the terminal may also display a selection dialog box, display the available virtual content in the selection dialog box, and then display the virtual content selected by the user selection instruction in the virtual portal.
  • The virtual content may be a virtual video, or a video generated by collecting image frames from another real scene. For example, if a user holds the terminal in an office and captures the real scene in the office to obtain image frames for playback, the content displayed outside the virtual entry is the real scene of the office, while the virtual entry can display a game video or a video collected from a real scene other than the current one, such as a real scene on Wangfujing Street.
  • In FIG. 4, virtual content is displayed inside the virtual entry 410a (410b), and outside the virtual entry 410a (410b) is a picture of the real scene.
  • The visual perception is that inside the virtual entry is the virtual world and outside the virtual entry is the real world. Users can move through the virtual entry to view the virtual world inside it, or move back out of the virtual entry to view the real world outside it, experiencing the effect of crossing between the virtual world and the real world.
  • In the above image processing method, on the one hand, image frames that reflect a real scene are played so that the played image frames reflect the real scene; on the other hand, when the trajectory formed by the movement of the target in the image frames collected from the real scene meets the trigger condition, the position corresponding to the target in the real scene is determined automatically, the virtual entry is rendered in the currently played image frame according to the determined position, and virtual content is displayed in the virtual entry.
  • In this way, the virtual content of the virtual world can be displayed automatically inside the virtual entry while the real content of the real world is displayed outside it, which avoids tedious manual operations and greatly improves the efficiency of image processing.
  • In one embodiment, the target is a hand.
  • The image processing method further includes: segmenting a hand image from the acquired image frame; identifying the gesture type corresponding to the hand image; when the gesture type is a trigger type, determining a motion reference point in the image frame; and determining, according to the motion reference point, the trajectory formed by the hand movement.
  • the hand is a limb part of a human or an animal.
  • the hand image is an image that includes a hand and has a high ratio of the hand region to the image region.
  • Gestures are forms of motion made by the user through the hand.
  • the gesture type is the type to which the gesture belongs in the obtained image frame.
  • the trigger type is the type of the gesture that triggers a specific event.
  • The motion reference point serves as a reference for judging the motion of the target. It can be understood that if the position of the motion reference point changes across different image frames, the target is moving. For example, if the imaging point of the index fingertip is used as the motion reference point, the hand is determined to have moved when the position of that imaging point changes across multiple image frames.
  • In one embodiment, segmenting the hand image from the acquired image frame includes: encoding the acquired image frame into a semantic segmentation feature matrix through a hand recognition model; decoding the semantic segmentation feature matrix to obtain a semantic segmentation image, in which the pixels have pixel values representing the classification categories to which they belong and correspond to the pixels of the encoded image frame; and segmenting the hand image from the image frame according to the pixels belonging to the hand category.
  • the hand recognition model is a machine learning model with hand recognition ability after training.
  • Machine learning is abbreviated as ML.
  • Machine learning models can have specific capabilities through sample learning.
  • Machine learning models can use neural network models, support vector machines, or logistic regression models.
  • Neural network models such as convolutional neural networks.
  • the hand recognition model may be a Fully Convolutional Networks model.
  • the semantic segmentation feature matrix is a low-dimensional representation of the semantic features of the image content in the image frame and covers the semantic feature information of the entire image frame.
  • A semantic segmentation image is an image that is segmented into several non-overlapping regions, each with a certain semantics.
  • the pixel values of the pixels in the semantic segmentation image are used to reflect the classification category to which the corresponding pixels belong.
  • The classification of pixels can be binary or multi-class. An example of binary pixel classification is distinguishing the pixels corresponding to roads from the other pixels in a map image; an example of multi-class pixel classification is distinguishing the pixels corresponding to the sky, the ground, and people in a landscape image.
  • the image size of the semantic segmentation image is consistent with the image size of the original image frame. In this way, it can be understood that the model input image is classified pixel by pixel, and the pixel values of the pixels in the image are segmented according to semantics to obtain the category membership of each pixel in the model input image.
  • the terminal may obtain a hand recognition model by training image samples belonging to each gesture type.
  • Specifically, the terminal uses the acquired image frame as the input of the hand recognition model, encodes it into a semantic segmentation feature matrix through the encoding structure of the trained hand recognition model, and then decodes the semantic segmentation feature matrix through the decoding structure of the hand recognition model to obtain a semantic segmentation image.
  • When only one gesture type is set as the trigger type by the terminal, the hand recognition model is a binary classification model.
  • the image samples used to train the binary classification model include positive samples that belong to the target gesture type and negative samples that do not belong to the target gesture type.
  • When multiple gesture types are set as trigger types, the hand recognition model is a multi-class classification model.
  • the image samples used to train the multi-classification model include samples belonging to each target gesture type.
  • FIG. 5 shows a schematic diagram of segmenting a hand region from an acquired image frame in an embodiment.
  • FIG. 5 (a) is an acquired original image frame that includes a hand region,
  • FIG. 5 (b) is the semantic segmentation image obtained after performing semantic segmentation, and
  • FIG. 5 (c) is the hand image obtained by segmenting the hand region as a regular (rectangular) region.
  • In this embodiment, the image frame is automatically input into a trained machine learning model, encoded into a semantic segmentation feature matrix, and the semantic segmentation feature matrix is then decoded to obtain a semantic segmentation image.
  • The pixels in the semantic segmentation image have pixel values representing the classification categories to which they belong and correspond to the pixels of the original image frame. In this way, the hand region can be determined automatically according to the pixels belonging to the hand category so as to segment the hand image, which improves the accuracy of image segmentation.
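  • The following Python sketch (an assumption-laden illustration, not the patented implementation) shows this encode/decode segmentation flow using torchvision's FCN with two classes (background and hand); in practice the model weights would come from training on the gesture image samples described above, and the helper name `segment_hand` is hypothetical.

```python
import numpy as np
import torch
from torchvision.models.segmentation import fcn_resnet50

# A two-class (background / hand) fully convolutional network. Here it is built
# untrained; a real system would load weights trained on gesture image samples.
model = fcn_resnet50(weights=None, num_classes=2).eval()

def segment_hand(frame_rgb: np.ndarray):
    """Return a boolean hand mask and the cropped hand image for one frame."""
    x = torch.from_numpy(frame_rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        logits = model(x)["out"]            # encode + decode -> per-pixel class scores
    seg = logits.argmax(dim=1)[0].numpy()   # semantic segmentation image (0/1 labels)
    mask = seg == 1                         # pixels belonging to the hand category
    if not mask.any():
        return mask, None
    ys, xs = np.where(mask)                 # bound the hand region with a rectangle
    hand_image = frame_rgb[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return mask, hand_image
```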
  • The terminal may compare the segmented hand image with a hand image template of the trigger type; when the segmented hand image matches the template, the gesture type corresponding to the hand image is determined to be the trigger type, the motion reference point is then determined in the image frame, and the trajectory formed by the hand movement is determined according to the motion reference point.
  • the terminal can also input the segmented hand image into the trained gesture recognition model to obtain the gesture recognition result output by the gesture recognition model.
  • When the gesture recognition result indicates that the gesture type corresponding to the hand image is a trigger type, the terminal determines the motion reference point in the image frame and determines the trajectory formed by the hand movement according to the motion reference point.
  • In an actual scenario, the user makes a gesture toward the camera of the terminal; when the operation type corresponding to the gesture is determined to be a drawing type, the motion reference point of the gesture is determined in each of the consecutively collected image frames. Because the image collection frequency is high, the consecutive motion reference points can be connected by very short line segments to form a continuous trajectory.
  • The terminal may select image frames from the acquired image frames, in acquisition order and at a frame rate lower than the acquisition frame rate, for hand image segmentation and gesture recognition.
  • The selected image frames can be processed through multiple threads that perform hand image segmentation and gesture recognition independently, which improves recognition efficiency.
  • In this embodiment, making a specific gesture with the hand and drawing a specific trajectory in the air automatically triggers the display effect of the virtual entry.
  • The entire drawing process does not require the user to operate an input device; the user can draw with gestures made in a larger space, which improves the convenience of triggering the virtual entry.
  • In one embodiment, the image processing method further includes: when the trigger condition is not yet satisfied, replacing, in the image frame being played, the pixel values of the pixels that the trajectory passes through with a reference pixel value; and when the trigger condition is satisfied, playing a reference animation in the currently played image frame according to the position.
  • When the trigger condition is not yet satisfied, the trajectory may be displayed with pixel values different from those of the surrounding pixels in the image, in order to visually highlight the trajectory formed by the movement of the target so that the user can intuitively perceive whether the trigger condition is being met.
  • The trajectory is essentially formed by the pixel coordinates corresponding to the motion reference points in consecutive image frames. The terminal can therefore determine, from these pixel coordinates, which pixels the trajectory passes through in the image frame and update the pixel values of those pixels to the reference pixel value.
  • the reference pixel value is, for example, the pixel value corresponding to the more vivid green or red.
  • The terminal may also update, to the reference pixel value, the pixel values of the pixels within a certain range around these pixels.
  • The terminal can also render a particle animation at the pixels the trajectory passes through, covering or replacing their pixel values, to achieve the effect of a magical gesture motion.
  • The terminal may update the pixel values or render the particle animation in real time: once the pixels that the trajectory passes through in the current image frame are determined, their pixel values are updated, or particle animation is rendered at those pixels, so that the motion trajectory is displayed in real time.
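  • A minimal sketch of this real-time highlighting, assuming OpenCV is available and a vivid green as the reference pixel value (both are illustrative choices, not specified by the source):

```python
import cv2
import numpy as np

REFERENCE_COLOR = (0, 255, 0)        # one possible reference pixel value (vivid green)

def overlay_trajectory(frame_bgr: np.ndarray, reference_points, radius: int = 4):
    """Replace the pixels the trajectory passes through with the reference color.

    `reference_points` is the list of (x, y) motion reference points accumulated
    from consecutive frames; drawing is done on a copy of the current frame.
    """
    out = frame_bgr.copy()
    pts = [tuple(map(int, p)) for p in reference_points]
    # Consecutive reference points are close together, so the short line segments
    # joining them appear as one continuous stroke over the played frame.
    for p0, p1 in zip(pts, pts[1:]):
        cv2.line(out, p0, p1, REFERENCE_COLOR, thickness=2 * radius)
    return out
```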
  • FIG. 6 shows a schematic diagram of a trajectory change in an image frame in one embodiment.
  • FIGS. 6 (a), (b), and (c) are simplified diagrams of the terminal interface as the trajectory changes in the image frames, and
  • FIGS. 6 (d), (e), and (f) are screenshots of the terminal interface as the trajectory changes in the image frames.
  • It can be clearly seen from FIG. 6 (a) or (d) that the pixel values of the pixels the trajectory passes through are updated to values different from the original pixel values. It can be seen from FIG. 6 (b) or (e) that the trajectory formed by the movement of the target in the acquired image frames is circular and the trigger condition is satisfied. It can be seen from FIG. 6 (c) or (f) that the pixels whose values differ from the original pixel values gradually move toward the center of the trajectory to achieve an animation effect.
  • the terminal may play the reference animation in the currently played image frame according to the position corresponding to the target.
  • the reference animation may be to gradually restore the pixels updated to the reference pixel value to the original pixel value, or to gradually cancel rendering of the particle animation, or to cancel the rendering after the particle animation gradually approaches the center of the track.
  • In this embodiment, the motion trajectory of the target is displayed directly in the current image frame, creating a real-time drawing effect and improving the user's perception; after the trigger condition is met, the reference animation is played, which adds interest.
  • determining the position corresponding to the target in the real scene includes determining a world coordinate position of the target in the world coordinate space.
  • Rendering the virtual entry in the currently played image frame according to the position corresponding to the target includes: rendering the virtual entry in the currently played image frame according to the camera coordinate position corresponding to the world coordinate position in the camera coordinate space.
  • the world coordinate position is the coordinate position of the target in the world coordinate space.
  • the world coordinate space is the coordinate space of the real scene, and it is a fixed absolute coordinate space.
  • the camera coordinate space is a coordinate space composed of an intersection point of an optical axis and an image plane as an origin, and is a relative coordinate space that changes with the position of the image acquisition device (camera).
  • the world coordinate position in the world coordinate space can be mapped to the camera coordinate position in the camera coordinate system through a rigid body change.
  • The intention is to render the virtual entry at the position in the image frame onto which the world coordinate position of the target's motion is mapped, so that the user perceives the virtual entry as appearing at the world coordinate position where the target moved.
  • The terminal therefore records the world coordinate position and then determines, in real time, the corresponding camera coordinate position in the current camera coordinate space. For example, when a user holds the terminal and draws a circle with the index finger within the field of view of the rear camera, this means that a virtual door, that is, a virtual entry, should appear at the position in the real scene where the finger drew the circle.
  • Specifically, the terminal may obtain the image coordinate position of the target in the current image frame, obtain the camera coordinate position of the target according to the projection transformation between the image coordinate space and the camera coordinate space, and then obtain the world coordinate position of the target in the world coordinate space according to the rigid body transformation between the camera coordinate space and the world coordinate space.
  • the image coordinate space is a coordinate space formed by the center of the image as the origin and the coordinate axis parallel to the sides of the image.
  • Because the projection transformation between the camera coordinate space and the image coordinate space is determined by factory calibration, the horizontal and vertical coordinates of the target in the camera coordinate space can be determined from this projection transformation and the image coordinate position in the image coordinate space, and the depth coordinate of the target in the camera coordinate space can be obtained from the image depth of the target.
  • The coordinate spaces are distinguished and described below, taking FIG. 7 as an example.
  • Oo-XoYoZo is the coordinate system of the model coordinate space
  • Ow-XwYwZw is the coordinate system of the world coordinate space
  • Oc-XcYcZc is the coordinate system of the camera coordinate space
  • O1-xy is the coordinate system of the image coordinate space, and
  • O2-uv is the coordinate system of the pixel coordinate space.
  • Point P (Xw, Yw, Zw) is a point in the world coordinate space (ie, a real point in the real world), and point p is an image point in an image frame that matches point P (Xw, Yw, Zw).
  • the position coordinate of the point p in the image coordinate system space is (x, y), and the position coordinate in the pixel coordinate system space is (u, v).
  • From these relationships, the camera coordinate position of point p in the camera coordinate space can be determined. It can be understood that the origin of the pixel coordinate space is a vertex (corner) of the screen.
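  • A minimal numerical sketch of this coordinate chain (illustrative values only): an image point p = (u, v) with known depth is first lifted into the camera coordinate space through the intrinsic projection matrix, and then mapped into the world coordinate space through the rigid body transform given by the terminal's position and attitude.

```python
import numpy as np

def pixel_to_world(u, v, depth, K, R_wc, t_wc):
    """Map image point p = (u, v) with known depth to world point P = (Xw, Yw, Zw).

    K is the 3x3 camera intrinsic (projection) matrix from factory calibration;
    R_wc, t_wc define the world-to-camera rigid body transform obtained from the
    current terminal position and attitude: Xc = R_wc @ Xw + t_wc.
    """
    # Image coordinate space -> camera coordinate space (inverse projection).
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    Xc = ray * depth                        # camera coordinates, with Zc = depth
    # Camera coordinate space -> world coordinate space (inverse rigid transform).
    return R_wc.T @ (Xc - t_wc)

# Illustrative intrinsics: fx = fy = 500, principal point at (320, 240).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R_wc, t_wc = np.eye(3), np.zeros(3)         # camera pose coincides with world origin
print(pixel_to_world(400, 240, 2.0, K, R_wc, t_wc))    # -> [0.32 0.   2.  ]
```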
  • In one embodiment, rendering the virtual entry in the currently played image frame according to the camera coordinate position corresponding to the world coordinate position in the camera coordinate space includes: obtaining the position and attitude of the current terminal; determining the transformation matrix between the current camera coordinate space and the world coordinate space according to the position and attitude of the current terminal; transforming the world coordinate position into the camera coordinate position in the camera coordinate space according to the transformation matrix; and rendering the virtual entry in the currently played image frame according to the camera coordinate position.
  • the position of the current terminal is the position of the image acquisition device of the current terminal in a real scene.
  • The attitude of the current terminal is the spatial state (roll, pitch, and yaw) of the current terminal's image acquisition device in the real scene.
  • The terminal collects key frames and locates and records the position in the real scene at which each key frame was collected, so that when performing image processing in real time it can match the currently acquired image frame against the key frames and obtain the position recorded for the matching key frame.
  • Specifically, the terminal can build a map of the real scene based on SLAM (Simultaneous Localization and Mapping), VO (Visual Odometry), or VIO (Visual Inertial Odometry).
  • In one embodiment, acquiring the position and attitude of the current terminal includes: selecting, from the map, a map node that matches the acquired image frame; querying the stored location in the real scene corresponding to that map node; acquiring sensor data collected by the inertial sensor; and determining the current attitude of the terminal based on the sensor data.
  • the terminal may match the acquired image frame with the node image in the map.
  • When the matching succeeds, the terminal locates the map node of the matched node image and queries the location in the real scene corresponding to that map node, which is the current location of the terminal.
  • The terminal can also obtain sensor data collected by an Inertial Measurement Unit (IMU) and determine the current attitude of the terminal based on the sensor data. In this way, the rigid body transformation matrix between the current camera coordinate space and the world coordinate space can be calculated from the position and attitude of the current terminal.
  • The terminal can calculate the rigid body transformation matrix between the camera coordinate space and the world coordinate space at a reference map node when constructing the map. The rigid body transformation matrices at the positions of other map nodes can then be calculated from the changes in position and attitude between the current map node and the reference map node, yielding the rigid body transformation matrix between the current camera coordinate space and the world coordinate space.
  • the terminal may also determine the current rigid body transformation matrix between the world coordinate space and the current camera coordinate space in real time according to the conversion relationship between the world coordinate position of the object point and the camera coordinate position of the image point in the current camera coordinate space.
  • In this embodiment, the current terminal is located by combining the image features of the currently collected image frame with the sensor data collected by the inertial sensor, which improves the positioning accuracy.
  • the terminal can transform the world coordinate position into the camera coordinate position in the camera coordinate space according to the current rigid body transformation matrix between the world coordinate space and the current camera coordinate space.
  • In one embodiment, rendering the virtual entry in the currently played image frame according to the camera coordinate position includes: projecting the model vertices of the virtual entry into corresponding pixels in the image coordinate space; combining, according to the connection relationships between the model vertices, the pixels corresponding to the model vertices into primitives; and rasterizing the primitives and rendering them, according to the pixel value of each pixel in the primitives, at the image coordinate position corresponding to the camera coordinate position in the image coordinate space, thereby rendering the virtual entry.
  • The model of the virtual entry is a preset model, and its model parameters are also preset.
  • Model parameters include the connection relationship between model vertex parameters and model vertices.
  • the model vertex parameters include the model coordinate position of the model vertex in the model coordinate space, the color of the model vertex, and the model texture coordinate.
  • Primitives are basic graphics such as points, lines, or surfaces. Rasterization is the process of converting primitives into a set of two-dimensional fragments representing the pixels that can be drawn on the screen. In layman's terms, primitive assembly produces a shape composed of vertices, and rasterization interpolates the pixels of the area covered by that shape.
  • the terminal may project the model vertices of the virtual entrance into corresponding pixel points in the image coordinate space through a transformation relationship between model coordinate space, world coordinate space, camera coordinate space, and image coordinate space. Then, according to the connection relationship between the vertices of each model, the pixels corresponding to the vertices of the model are combined into primitives to realize the assembly of primitives.
  • the primitives are rasterized and colored, and the virtual entry is rendered at the image coordinate position corresponding to the camera coordinate position in the image coordinate space.
  • the terminal can draw the virtual model according to OpenGL (Open Graphics Library).
  • The terminal can render the virtual entry in the captured image frame at the image coordinate position corresponding to the camera coordinate position in the image coordinate space, and either place the resulting image frame, with the virtual entry rendered, into the frame buffer to wait for display, or display it directly on the terminal screen.
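  • The vertex-processing half of this render pass can be sketched as follows (a simplified Python illustration; in practice rasterization and shading are left to the graphics pipeline such as OpenGL, and the door geometry, matrices, and function name here are assumed values):

```python
import numpy as np

def project_vertices(model_vertices, M_model, R_wc, t_wc, K):
    """Project model vertices of the virtual entry to pixel coordinates.

    model_vertices: (N, 3) positions in the model coordinate space.
    M_model:        4x4 model-to-world matrix placing the entry at the target's
                    world coordinate position.
    R_wc, t_wc:     world-to-camera rigid body transform from the terminal pose.
    K:              3x3 camera intrinsic (projection) matrix.
    """
    v = np.asarray(model_vertices, dtype=np.float64)
    v_h = np.c_[v, np.ones(len(v))]                  # homogeneous model coordinates
    Xw = (M_model @ v_h.T).T[:, :3]                  # model -> world
    Xc = (R_wc @ Xw.T).T + t_wc                      # world -> camera
    uv = (K @ Xc.T).T                                # camera -> image plane
    return uv[:, :2] / uv[:, 2:3]                    # perspective divide -> pixels

# A unit "door" quad in model space, assembled into two triangle primitives.
door = np.array([[0, 0, 0], [1, 0, 0], [1, 2, 0], [0, 2, 0]], dtype=float)
triangles = [(0, 1, 2), (0, 2, 3)]                   # connection relationships

M_model = np.eye(4); M_model[:3, 3] = [0.0, 0.0, 3.0]   # place the door 3 m ahead
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pixels = project_vertices(door, M_model, np.eye(3), np.zeros(3), K)
# Each triangle in `triangles` would then be rasterized and shaded at `pixels`.
```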
  • the image processing method further includes: a step of constructing a map.
  • The step of constructing the map includes: selecting an image frame from the image frames collected in time sequence; when the image features of the selected image frame match the image features of a node image, taking the selected image frame as an acquired node image; determining the map node corresponding to the acquired node image in the map; and storing, in correspondence with the determined map node, the image features of the acquired node image and the location in the real scene at which the acquired node image was collected.
  • the selected image frame may be a key frame in the acquired image frame.
  • the terminal may receive a user selection instruction, and select an image frame from the collected image frames according to the user selection instruction.
  • the terminal may also select an image frame from the collected image frames according to the number of interval frames. For example, image frames are selected after every 20 image frames.
  • The image features of a node image are the image features used to decide whether an image frame is selected as a node image.
  • Matching the image features of a node image may mean that the number of feature points in the selected image frame that match feature points included in an existing node image exceeds a reference number, or that the ratio of those matching feature points to the feature points included in the existing node image is lower than a reference proportion.
  • For example, suppose the most recently added node image includes 100 feature points and the currently selected image frame includes 120 feature points,
  • the reference number is 50 and the reference proportion is 90%,
  • and 70 of the feature points in the currently selected image frame match feature points in the most recently added node image.
  • Because the number of matching feature points exceeds the reference number, it can be determined that the image features of the currently selected image frame match those of the node image.
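  • A minimal sketch of this selection criterion using ORB features and the example thresholds above (the feature type, function name, and thresholds are illustrative assumptions, not mandated by the source):

```python
import cv2

REFERENCE_NUMBER = 50                     # example threshold from the description

orb = cv2.ORB_create(nfeatures=500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def matches_node_image(candidate_gray, node_gray) -> bool:
    """Decide whether the selected frame's features match the latest node image,
    i.e. whether it should be taken as a new node image for the map."""
    _, cand_desc = orb.detectAndCompute(candidate_gray, None)
    _, node_desc = orb.detectAndCompute(node_gray, None)
    if cand_desc is None or node_desc is None:
        return False
    matched = len(matcher.match(cand_desc, node_desc))
    # Criterion used here: the number of matched feature points exceeds the
    # reference number. The description also allows an alternative criterion based
    # on the ratio of matched points to the node image's feature points.
    return matched > REFERENCE_NUMBER
```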
  • the terminal may acquire image frames at a fixed or dynamic frame rate, and select an image frame in which the number of feature points included in the acquired image frame is greater than the number threshold as the initial node image. Determine the corresponding map node of the node image in the map and the corresponding position of the feature points included in the node image in the map to construct a local map. The terminal then selects an image frame from the image frames collected in time sequence, and selects an image frame that matches the characteristics of the node image as a subsequent node image until a global map is obtained.
  • the terminal may track the feature points in the reference node image by using the initial node image as a reference node image.
  • When the number of feature points in the selected image frame that match feature points in the reference node image is lower than a first number and higher than a second number,
  • the selected image frame is used as a node image.
  • The most recently acquired node image is then used as the reference node image, and image tracking continues in order to select further node images.
  • The terminal may determine the map node onto which the location where the acquired node image was collected in natural space is projected in map space.
  • Specifically, the terminal can extract the features of a node image earlier in the sequence, calculate the transformation matrix between that earlier node image and the acquired node image, obtain from the transformation matrix the change in position between the collection of the earlier node image and the collection of the acquired node image,
  • and then determine the map node corresponding to the acquired node image in the map according to this change.
  • The transformation matrix describes the similarity transformation between the features of one two-dimensional image and the features of another two-dimensional image.
  • Specifically, the terminal may extract the image features of the acquired node image, match them against the image features of the node images corresponding to existing nodes in the map, and obtain the positions of the successfully matched image features in the acquired node image and in the existing node image respectively.
  • The acquired node image is the later-collected image frame, and the existing node image is the earlier-collected image frame.
  • From the positions of the matched image features in the two successively collected image frames, the terminal can determine the transformation matrix between them and thereby obtain the change in the terminal's position and attitude between collecting the two frames; the position and attitude of the later-collected image can then be obtained from the position and attitude of the earlier-collected image.
  • a node image corresponding to an existing node in the map may be one frame or multiple frames.
  • The terminal may also compare the features of the acquired node image with the features of the node images corresponding to multiple existing nodes, obtain the transformation matrices between the later-acquired image frame and each of the previously acquired image frames,
  • and then combine the multiple results, for example by weighted averaging of the calculated position changes and attitude changes, to obtain the position and attitude of the later-acquired image.
  • In this embodiment, the transformation relationship between the features of node images is used to obtain the conversion between the currently acquired node image and a previously existing node image, so that the position of the current image frame on the map can be inferred from the position of the earlier image frame, enabling real-time positioning.
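  • One common way to obtain such a transformation between two node images is from their matched feature points via the essential matrix, as sketched below (assumptions: the intrinsic matrix K is known, OpenCV is used, and the recovered translation is only defined up to scale; the source text does not prescribe this particular estimator).

```python
import cv2
import numpy as np

def relative_pose(prev_gray, curr_gray, K):
    """Estimate the rotation R and unit-scale translation t of the later node image
    relative to the earlier one from matched ORB feature positions."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, d1 = orb.detectAndCompute(prev_gray, None)
    kp2, d2 = orb.detectAndCompute(curr_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    # Essential matrix from the matched image positions, then decomposition into
    # the rotation and translation between the two camera poses.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t      # the later pose = the earlier pose composed with (R, t)
```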
  • The terminal may extract the image features of a node image and store them in correspondence with the node image's map node, so that when image feature comparison is needed the image features of the corresponding node image can be found directly by map node, which saves storage space and improves search efficiency.
  • The terminal may also store, in correspondence with the map node, the location in the real scene at which the acquired node image was collected, so that during localization the stored location can be found directly by map node, improving search efficiency.
  • In this embodiment, map construction is performed automatically, which avoids the need for a large number of staff with professional surveying and mapping skills to manually survey the environment, eliminates the problems of high skill requirements and heavy labor, and improves the efficiency of map construction.
  • the virtual content is a panoramic video.
  • In one embodiment, the image processing method further includes: determining the spatial area corresponding to the virtual entry in the real scene; and, after the position of the current terminal passes through the spatial area, directly displaying the video picture within the current field of view of the panoramic video.
  • panoramic video is a 360-degree video shot with a 3D camera.
  • The spatial area corresponding to the virtual entry in the real scene is the projection, in the world coordinate space, of the space occupied by the virtual entry in the camera coordinate space.
  • the space area may be a planar space area, that is, there is no thickness; or a three-dimensional space area, that is, there is a thickness.
  • For example, the virtual entry is a virtual room door.
  • When the user moves the handheld terminal toward that position, the user perceives walking toward the virtual entry (the virtual room door).
  • The terminal can establish a three-dimensional sphere model centered at the current position of the terminal and render the panoramic video onto the inside of the sphere as a texture. In this way, the terminal can directly display the video picture within the terminal's current field of view of the panoramic video.
  • the current field of view is related to the current pose of the terminal.
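  • As a simplified stand-in for texturing the inside of a 3D sphere (which a real implementation would do on the GPU, e.g. via OpenGL), the sketch below samples one equirectangular panoramic frame directly with the terminal's current yaw and pitch; the mapping, field of view, and function name are illustrative assumptions.

```python
import cv2
import numpy as np

def view_from_panorama(pano_bgr, yaw, pitch, fov_deg=60.0, out_size=(480, 640)):
    """Sample the video picture within the current field of view from one
    equirectangular panoramic frame, given the terminal's yaw/pitch in radians."""
    h_out, w_out = out_size
    f = 0.5 * w_out / np.tan(np.radians(fov_deg) / 2)      # pinhole focal length
    # Rays through every output pixel, expressed in the camera frame.
    xs, ys = np.meshgrid(np.arange(w_out) - w_out / 2, np.arange(h_out) - h_out / 2)
    dirs = np.stack([xs, ys, np.full_like(xs, f, dtype=float)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Rotate the rays by the terminal attitude (yaw about Y, then pitch about X).
    cy, sy, cp, sp = np.cos(yaw), np.sin(yaw), np.cos(pitch), np.sin(pitch)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    d = dirs @ (Ry @ Rx).T
    # Ray direction -> longitude/latitude -> panorama pixel coordinates.
    lon = np.arctan2(d[..., 0], d[..., 2])                  # [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1, 1))              # [-pi/2, pi/2]
    ph, pw = pano_bgr.shape[:2]
    map_x = ((lon / (2 * np.pi) + 0.5) * (pw - 1)).astype(np.float32)
    map_y = ((lat / np.pi + 0.5) * (ph - 1)).astype(np.float32)
    return cv2.remap(pano_bgr, map_x, map_y, cv2.INTER_LINEAR)
```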
  • FIG. 8 shows a rendering principle diagram after the position of the current terminal passes through a spatial area in an embodiment.
  • FIG. 8 (a) is a schematic diagram of the current terminal position passing through the spatial area. It can be clearly seen that the terminal position passes through the area where the virtual entry 801 is located, moving from one side of the virtual entry to the other.
  • FIG. 8 (b) is a schematic diagram of determining the video picture within the current field of view on the three-dimensional sphere model onto which the panoramic video is rendered. It can be clearly seen that the center of the sphere of the three-dimensional sphere model is the terminal position, that is, the scene is observed from the terminal camera at the sphere center.
  • FIG. 8 (c) shows a simplified schematic diagram of the video screen of the intersection area 830 displayed on the terminal interface
  • FIG. 8 (d) shows a screenshot of the video screen of the intersection area 830 when displayed on the terminal interface.
  • after the current terminal position passes through the spatial area, the pose of the terminal may be changed according to user instructions; when the pose changes, the terminal's current field of view changes immediately, and the terminal displays in real time the video picture within the current field of view of the panoramic video.
  • the image processing method further includes: after directly displaying the video picture within the current field of view of the panoramic video, when the current terminal position has not passed through the spatial area again and the current field of view covers the virtual entrance after moving, determining the portion of the current field of view that lies within the virtual entrance, and displaying within the virtual entrance the picture of that portion taken from the acquired image frame.
  • "the current terminal position does not pass through the spatial area again and the current field of view covers the virtual entrance after moving" means that the terminal has not moved through the virtual entrance; instead, its pose has been adjusted so that the virtual entrance re-enters the terminal's current field of view.
  • specifically, after the terminal directly displays the video picture within the current field of view of the panoramic video, when it detects that the terminal position has not passed through the spatial area again and the current field of view covers the virtual entrance after moving, it determines the portion of the current field of view that lies within the virtual entrance and displays, within the virtual entrance, the picture of that portion taken from the acquired image frame. In this way, the real world is displayed inside the virtual entrance and the virtual content is displayed outside it.
  • this can be understood as follows: after the user walks through the room door into the room, the door is behind the user and no longer appears in the user's field of vision.
  • the user adjusts the field of view inside the room and sees the scene around the room; that is, after the terminal position passes through the spatial area in this embodiment, the video picture within the current field of view of the panoramic video is displayed directly, so what the user sees is the panoramic video inside the virtual entrance.
  • when the user turns around, the room door reappears in the user's field of vision, and what the user sees through the door is the real picture outside the room; that is, in this embodiment, when the terminal position has not passed through the spatial area again and the current field of view covers the virtual entrance after moving, the portion of the current field of view inside the virtual entrance is determined, and the picture of that portion taken from the acquired image frame is displayed within the virtual entrance. In this way, the user sees a picture of the real scene inside the virtual entrance.
  • FIG. 9 is a schematic diagram of the interface displayed on the terminal when, in one embodiment, the current terminal position has not passed through the spatial area again and the current field of view covers the virtual entrance after moving.
  • inside the virtual entrance is a picture of a real scene, and outside the virtual entrance is virtual content.
  • the image processing method further includes: when the current terminal position moves around the spatial area, determining the portion of the current field of view that lies within the virtual entrance, and displaying within the virtual entrance the video picture of that portion taken from the panoramic video.
  • FIG. 10 shows a rendering principle diagram when the position of the current terminal moves around a spatial area in one embodiment.
  • the left diagram of FIG. 10 is a schematic diagram of the current terminal position moving around the spatial area; the terminal position clearly bypasses the area where the virtual entrance 1001 is located, moving from one side of the virtual entrance to the other.
  • when the terminal moves in the manner shown on the left of FIG. 10, the picture displayed by the terminal always shows virtual content inside the virtual entrance and the real-scene picture outside it, as shown in the right diagram of FIG. 10.
  • a user holds a terminal and opens a built-in camera on the terminal through an application program for invoking a camera on the terminal.
  • the terminal can collect image frames in a real scene under the current field of view of the camera.
  • the user holds the terminal to draw a circle in the field of vision of the rear camera with the index finger of his hand, and intends to display a virtual door, that is, a virtual entrance, at the position where the finger circles in the real scene.
  • the terminal then renders the virtual entrance at the image position on the terminal screen corresponding to that position in the real scene.
  • the real-scene picture is displayed outside the virtual entrance, and a partial video picture of the panoramic video is displayed inside the virtual entrance.
  • as the user, holding the terminal, approaches the position in the real scene onto which the virtual entrance is mapped, the virtual entrance on the terminal screen gradually grows larger until, after the terminal passes through that position, it no longer appears on the screen; at this point the user sees the partial video picture of the panoramic video presented on the terminal screen.
  • the user can adjust the field of view of the rear camera to view the panoramic video of different areas.
  • the user can step backward while holding the terminal to pass through that position again; the virtual entrance then reappears on the terminal screen and gradually becomes smaller, with the real-scene picture displayed outside the virtual entrance and a partial video picture of the panoramic video displayed inside it.
  • the user can also turn around while holding the terminal without passing through that position again; the virtual entrance then appears on the terminal screen, with a partial video picture of the panoramic video displayed outside the virtual entrance and the real-scene picture displayed inside it.
  • when the user, holding the terminal, moves around the position in the real scene onto which the virtual entrance is mapped, the virtual entrance always appears on the terminal screen, with the real-scene picture displayed outside the virtual entrance and a partial video picture of the panoramic video displayed inside it.
  • the above embodiments describe how the rendered content inside and outside the virtual entrance changes as the current terminal position moves through or around the virtual entrance. This allows the user to move from outside the entrance into it to view the virtual world within, or to move from inside the entrance back outside to view the real world, experiencing the effect of crossing between the virtual and the real.
  • the virtual content is a panoramic video.
  • the image processing method further includes: drawing the collected video frames on the inside of the sphere of a first sphere model and drawing the panoramic video picture of the panoramic video on the inside of the sphere of a second sphere model; determining the spatial area in the real scene to which the virtual entrance corresponds; and, when the current terminal position has never passed through the spatial area, or has passed through it an even number of times, rendering the picture for display from the first sphere model, the second sphere model, and a fully transparent third model within the current field of view, following the rendering order and the reverse order of model depth.
  • the sphere radius of the first sphere model is greater than the sphere radius of the second sphere model.
  • the model depth of the first sphere model is greater than the model depth of the second sphere model.
  • the model depth of the second sphere model is greater than the model depth of the third model; the third model is used, when the current field of view covers the virtual entrance, to trigger cancellation of rendering of the second sphere model in the portion of the current field of view outside the virtual entrance, or, when the field of view does not cover the virtual entrance, to trigger cancellation of rendering of the second sphere model.
  • the rendering order is the order of rendering models.
  • Model depth is the distance from the model boundary to the observation point. The deeper the model, the farther the model boundary is from the observation point.
  • the observation point is the position where the model is observed inside the model, and the picture rendered in the visual field of the observation point is the picture displayed on the terminal screen.
  • the rendering order is a manually specified order that rendering must follow.
  • in this embodiment, rendering follows the rendering order and the reverse order of model depth at the same time; when a model is about to be rendered, if another model with a smaller model depth has already been rendered, that model is no longer rendered.
  • the terminal may record the depth information of each model in a depth buffer (depthBuffer) via a writesToDepthBuffer flag and add a depth-test flag (readsToDepthBuffer); with the depth-test flag set, the terminal reads the model's depth when the model is drawn and renders according to that depth. A minimal sketch of this combined order-and-depth rule follows.
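The following is a minimal sketch of the rule just described, for a single line of sight: models are visited in the manually set rendering order, and a model is skipped when an already-rendered model has a smaller depth. The model names, depths, and transparency values are illustrative assumptions, not values from the patent:

```python
from dataclasses import dataclass

@dataclass
class Hit:
    name: str
    depth: float   # distance from the observation point along this line of sight
    alpha: float   # 0.0 = fully transparent, 1.0 = opaque

def visible_models(hits_in_rendering_order):
    """Return the models actually rendered on this line of sight: a model is
    skipped if a previously rendered model already has a smaller depth."""
    rendered, nearest = [], float("inf")
    for hit in hits_in_rendering_order:
        if hit.depth >= nearest:
            continue                      # depth test fails, skip this model
        rendered.append(hit)
        nearest = hit.depth
    return rendered

# Line of sight OA (outside the virtual entrance): rendering order is
# first sphere -> third model -> second sphere. The transparent third model is
# nearer, so the second sphere (panorama) is skipped and the real-scene frame
# drawn on the first sphere shows through.
oa = [Hit("first sphere", 10.0, 1.0), Hit("third model", 1.0, 0.0), Hit("second sphere", 5.0, 1.0)]
print([h.name for h in visible_models(oa)])   # ['first sphere', 'third model']

# Line of sight OB (through the entrance hole): the third model is absent, so
# the opaque second sphere (panorama) is rendered in front of the first sphere.
ob = [Hit("first sphere", 10.0, 1.0), Hit("second sphere", 5.0, 1.0)]
print([h.name for h in visible_models(ob)])   # ['first sphere', 'second sphere']
```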
  • the terminal may determine the model coordinate position of the virtual entrance in the model coordinate space according to the change relationship between the model coordinate space and the world coordinate space, and the world coordinate position of the virtual entrance in the world coordinate system. Then, the first sphere model and the second sphere model are established using the model coordinate position as the center of the sphere. The terminal may then draw the collected video frames into the inside of the sphere of the first sphere model in a texture manner, and draw the panorama video picture of the panoramic video into the inside of the sphere of the second sphere model in a texture manner.
  • the terminal can also create a model plane that can be projected on the terminal screen in front of the observation point, and this model plane always stays in front of the observation point as the observation point moves and turns.
  • the terminal then draws image frames collected from the real scene on this model plane, so that the acquired image frames are played frame by frame according to the collected timing.
  • when the virtual entrance is rendered for the first time after the trigger condition is met, the inside of the virtual entrance shows the virtual content and the outside shows the real scene. Accordingly, when the current terminal position has never passed through the spatial area, or has passed through it an even number of times, the inside of the virtual entrance still shows virtual content and the outside still shows the real scene.
  • to keep virtual content inside the virtual entrance and the real scene outside it, the terminal can create a third model surrounding the observation point and set the rendering order (renderingOrder) to: first sphere model → third model → second sphere model.
  • the virtual entrance lies on the surface of the third model, and the area where the virtual entrance is located is left empty.
  • Figure 11 shows a schematic cross-sectional view of a model in one embodiment.
  • FIG. 11 (a) includes a first sphere model 1101, a second sphere model 1102, a third model 1103, and a virtual entrance 1104.
  • at this point, the observation point 1105 is relatively far from the virtual entrance 1104.
  • when the third model lies on a line of sight within the observation point's field of view, because the rendering order of the third model precedes that of the second sphere model and the model depth of the third model is smaller than that of the second sphere model, the terminal renders only the first sphere model and the third model to obtain the picture for display.
  • since the terminal sets the transparency of the third model to fully transparent, the picture used for display is actually the video frame collected from the real scene and drawn on the inside of the first sphere model; in other words, the real scene is guaranteed to be shown outside the virtual entrance.
  • still referring to FIG. 11 (a), the line of sight OA starting from the observation point 1105 that does not pass through the virtual entrance 1104 passes through the third model, the second sphere model, and the first sphere model in sequence, as shown in FIG. 12 (a).
  • during rendering, the rendering order is: first sphere model → third model → second sphere model. Assume the color of the third model is C3 with transparency T3, the color of the second sphere model is C2 with transparency T2, and the color of the first sphere model is C1 with transparency T1; then the color obtained on the screen is: C3 * T3 + (1 - C3) * T1 * C1.
  • C3 can be set to 0 (fully transparent) and C1 to 1 (opaque); the color on the screen is then the color of the first sphere model, i.e., the image frame captured from the real scene and drawn on the inside of the first sphere model. A minimal numeric sketch of this blend appears below.
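The sketch below evaluates the screen colour exactly as the formula is stated in this document, treating colours and transparencies as scalars; it is illustrative only, since a production blend would normally weight on the transparency term rather than the colour term:

```python
def screen_color(c_front, t_front, c_back, t_back):
    """Blend a front layer over a back layer using the formula quoted above:
    C_front * T_front + (1 - C_front) * T_back * C_back."""
    return c_front * t_front + (1 - c_front) * t_back * c_back

# Line of sight OA: C3 = 0 (fully transparent third model in front), so the
# result reduces to T1 * C1, i.e. the first sphere's real-scene frame.
assert screen_color(c_front=0.0, t_front=0.5, c_back=0.9, t_back=1.0) == 1.0 * 0.9

# Line of sight OB: C2 = 1 (opaque second sphere in front), so the result
# reduces to C2 * T2, i.e. the second sphere's panoramic video frame.
assert screen_color(c_front=1.0, t_front=0.7, c_back=0.9, t_back=1.0) == 1.0 * 0.7
```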
  • when no third model lies on a line of sight within the observation point's field of view, the terminal renders only the first sphere model and the second sphere model to obtain the picture for display.
  • since the terminal sets the second sphere model to be opaque, the picture used for display is actually the video frame of the panoramic video drawn on the inside of the second sphere model; in other words, virtual content is guaranteed to be shown inside the virtual entrance.
  • still referring to FIG. 11 (a), the line of sight OB starting from the observation point 1105 that passes through the virtual entrance 1104 passes through the second sphere model and the first sphere model in sequence, as shown in FIG. 12 (b).
  • in this case the rendering order is: first sphere model → second sphere model. Assume the color of the second sphere model is C2 with transparency T2 and the color of the first sphere model is C1 with transparency T1; then the color obtained on the screen is: C2 * T2 + (1 - C2) * T1 * C1.
  • C2 can be set to 1 (opaque) and C1 to 1 (opaque); the color on the screen is then the color of the second sphere model, i.e., the video frame of the panoramic video drawn on the inside of the second sphere model.
  • in the above embodiment, when the user enters the virtual entrance, the third model is used to trigger, when the current field of view covers the virtual entrance, cancellation of rendering of the panoramic video on the second sphere model in the portion of the current field of view outside the virtual entrance, and to trigger cancellation of rendering of the panoramic video on the second sphere model when the field of view does not cover the virtual entrance.
  • the image processing method further includes: when the number of times the current terminal position has passed through the spatial area is odd, rendering the picture for display from the first sphere model, the second sphere model, and a fully transparent fourth model within the current field of view, following the rendering order and the reverse order of model depth; the model depth of the second sphere model is greater than the model depth of the fourth model; the fourth model is used, when the current field of view covers the virtual entrance, to trigger cancellation of rendering of the second sphere model in the portion of the current field of view outside the virtual entrance, or, when the field of view does not cover the virtual entrance, to trigger cancellation of rendering of the second sphere model.
  • as noted above, when the virtual entrance is rendered for the first time after the trigger condition is met, the inside of the virtual entrance shows the virtual content and the outside shows the real scene. When the number of times the current terminal's position has crossed the spatial area is odd, the inside of the virtual entrance changes to the real scene and the outside changes to the virtual content. A minimal sketch of tracking this crossing parity follows.
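The following sketch tracks, under simplifying assumptions, on which side of the virtual entrance's plane the terminal lies and counts plane crossings: even parity keeps the third-model configuration, odd parity switches to the fourth-model configuration. A real implementation would additionally check that each crossing happens within the entrance's own area, which is omitted here:

```python
import numpy as np

def crossing_parity(terminal_positions, entrance_point, entrance_normal):
    """Count how many times a sequence of terminal positions crosses the plane
    of the virtual entrance and return the parity (0 = even, 1 = odd)."""
    p0 = np.asarray(entrance_point, dtype=float)
    n = np.asarray(entrance_normal, dtype=float)
    sides = [np.sign(np.dot(np.asarray(p, dtype=float) - p0, n)) for p in terminal_positions]
    crossings = sum(1 for a, b in zip(sides, sides[1:]) if a != 0 and b != 0 and a != b)
    return crossings % 2

entrance_point, entrance_normal = (0, 0, 0), (0, 0, 1)
through = [(0, 0, -2), (0, 0, -0.5), (0, 0, 0.5)]                 # walked through once
print(crossing_parity(through, entrance_point, entrance_normal))                  # 1 (odd)
print(crossing_parity(through + [(0, 0, -2)], entrance_point, entrance_normal))   # 0 (even)
```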
  • to keep the real scene inside the virtual entrance and the virtual content outside it, the terminal can create a fourth model surrounding the observation point and set the rendering order to: first sphere model → fourth model → second sphere model.
  • the surface of the fourth model is the virtual entrance.
  • referring to FIG. 11 (b), it includes a first sphere model 1101, a second sphere model 1102, a fourth model 1106, and a virtual entrance 1104; at this point, the observation point 1107 is relatively far from the virtual entrance 1104.
  • when the fourth model lies on a line of sight within the observation point's field of view, because the rendering order of the fourth model precedes that of the second sphere model and the model depth of the fourth model is smaller than that of the second sphere model, the terminal renders only the first sphere model and the fourth model to obtain the picture for display.
  • since the terminal sets the transparency of the fourth model to fully transparent, the picture used for display is actually the video frame collected from the real scene and drawn on the inside of the first sphere model; in other words, the real scene is guaranteed to be shown inside the virtual entrance.
  • still referring to FIG. 11 (b), the line of sight OC starting from the observation point 1107 that passes through the virtual entrance 1104 passes through the fourth model, the second sphere model, and the first sphere model in sequence, as shown in FIG. 12 (c).
  • during rendering, the rendering order is: first sphere model → fourth model → second sphere model. Assume the color of the fourth model is C4 with transparency T4, the color of the second sphere model is C2 with transparency T2, and the color of the first sphere model is C1 with transparency T1; then the color obtained on the screen is: C4 * T4 + (1 - C4) * T1 * C1.
  • C4 can be set to 0 (fully transparent) and C1 to 1 (opaque); the color on the screen is then the color of the first sphere model, i.e., the image frame collected from the real scene and drawn on the inside of the first sphere model.
  • when no fourth model lies on a line of sight within the observation point's field of view, the terminal renders only the first sphere model and the second sphere model to obtain the picture for display.
  • since the terminal sets the second sphere model to be opaque, the picture used for display is actually the video frame of the panoramic video drawn on the inside of the second sphere model; in other words, virtual content is shown outside the virtual entrance.
  • the line of sight OB starting from the observation point 1107 passing through the virtual entrance 1104, passing through the second sphere model and the first sphere model in sequence, as shown in FIG. 12 (d).
  • in this case the rendering order is: first sphere model → second sphere model. Assume the color of the second sphere model is C2 with transparency T2 and the color of the first sphere model is C1 with transparency T1; then the color obtained on the screen is: C2 * T2 + (1 - C2) * T1 * C1.
  • C2 can be set to 1 (opaque) and C1 to 1 (opaque); the color on the screen is then the color of the second sphere model, i.e., the video frame of the panoramic video drawn on the inside of the second sphere model.
  • in the above embodiment, when the user has not entered the virtual entrance, the fourth model is used to trigger, when the current field of view covers the virtual entrance, cancellation of rendering of the panoramic video content on the second sphere model in the portion of the current field of view outside the virtual entrance, and to trigger cancellation of rendering of the panoramic video content on the second sphere model when the field of view does not cover the virtual entrance. This ensures that when the user has not entered the virtual entrance and does not see it, what is seen is real-scene content; when the virtual entrance is seen, panoramic video content is shown inside it and real-scene content is shown outside it.
  • FIG. 12 (c) is a cross-sectional view of the models when the number of times the current terminal position has passed through the spatial area is odd and the terminal is oriented toward, and close to, the virtual entrance.
  • FIG. 12 (d) is a cross-sectional view of the models when the current terminal position has never passed through the spatial area, or has passed through it an even number of times, and the terminal is oriented toward, and close to, the virtual entrance.
  • in that case the third model is flipped about the plane in which the virtual entrance lies. This ensures that virtual content is still displayed inside the virtual entrance and image frames collected from the real scene are still displayed outside it; in other words, the content displayed inside and outside the virtual entrance is exchanged only after the terminal passes through the virtual entrance.
  • in summary, the terminal renders a three-dimensional spherical space around the camera: the panoramic video texture is mapped onto and played inside that sphere, and the image frames collected from the real scene are played either on a larger sphere surrounding it or on a model plane kept in front of the observation point.
  • when the trigger condition is met, the terminal then simulates an "arbitrary door", displaying the virtual world inside the door and the real-world image outside it. The user can move from outside the door through the doorway to view the virtual world inside, or move from inside the door back outside to view the real world, experiencing the effect of crossing between the virtual and the real.
  • when a person has entered the arbitrary door and is far from the door, the door itself serves as the fourth model, and the fourth model is completely transparent; looking through the door, the inside of the door is the real world and the outside is the virtual world, that is, the person is in the virtual world.
  • when the person has entered the door and is close to it, rendering jitter could otherwise occur; a small room that just fits the door is therefore placed facing the door, with the door area itself left empty and the small room completely transparent. Looking through the door, the real world is inside the door and the virtual world is outside it, that is, the person is still in the virtual world.
  • a third model is created surrounding the observation point with the door as part of its boundary; the area of the third model where the door is located is empty, and the other areas are completely transparent. Near the door, the boundary around the door is extended. In this way, when the person looks through the door, the inside of the door is the virtual world and the outside is the real world, that is, the person is in the real world.
  • alternatively, the boundary of the third model on the side where the observation point is located is extended, and the third model is flipped about the plane of the door. In this way, when the person looks through the door, the inside of the door is still the virtual world and the outside is still the real world, that is, the person is still in the real world.
  • the third model and the boundary of the third model in the foregoing embodiment may be a plane or a curved surface.
  • FIG. 13 is a flowchart block diagram of an image processing method in an exemplary embodiment.
  • a user holds a terminal and collects image frames from a real scene through a camera of the terminal.
  • on the one hand, the terminal creates a map based on the collected image frames; on the other hand, it detects whether each image frame includes a hand region. If no hand region is detected in an image frame, the terminal continues to check subsequently acquired image frames; if a hand region is detected, it further determines whether the gesture type corresponding to the hand region is a trigger type.
  • the terminal may then continue to determine whether the trajectory formed by the movement of the motion reference point meets the trigger condition; if not, it continues to check subsequently captured image frames, and if so, it triggers rendering of the virtual entrance. A minimal sketch of one such closed-trajectory check follows.
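As an illustration of a trigger condition such as "the trajectory forms a circle", the sketch below checks whether a fingertip trajectory is a roughly closed loop; the thresholds are assumptions for illustration and are not values specified by the patent:

```python
import numpy as np

def is_closed_loop(points, closure_ratio=0.15, min_points=20):
    """Rough check that a trajectory (list of (x, y) image positions of the
    motion reference point) forms a roughly closed loop such as a circle."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < min_points:
        return False
    path_len = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
    gap = np.linalg.norm(pts[-1] - pts[0])
    # A closed trajectory ends near where it started relative to its total length.
    return path_len > 0 and gap / path_len < closure_ratio
```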
  • the terminal determines the position of the hand in the real scene, renders the virtual portal in the currently played image frame according to the position, displays a panoramic video in the virtual portal, and displays the real scene outside the virtual portal.
  • the user can hold the terminal to move, and the terminal then judges whether it passes through the virtual entrance. If passing through the virtual entrance, the video picture in the current field of view in the panoramic video is displayed directly. If the virtual entrance is not passed, the panoramic video is still displayed in the virtual entrance, and the real scene is displayed outside the virtual entrance.
  • the terminal can also perform screen recording according to user instructions, and share the video obtained by screen recording.
  • the terminal may also display the guidance information.
  • the guidance information can be text, voice, or pictures.
  • the guidance information includes information that guides the user to manipulate the movement of the target to meet the trigger conditions. For example, "stretch your finger, draw a circle in front of the rear camera, and trigger to open any door.” Refer to Figure 3 (b).
  • the guidance information may also include information to guide the user to move to the virtual portal. For example, for an arrow pointing to a virtual entrance, refer to FIG. 4 (b).
  • an image processing apparatus 1400 is provided.
  • the image processing apparatus 1400 includes an acquisition module 1401, a playback module 1402, a determination module 1403, and a rendering module 1404.
  • the obtaining module 1401 is configured to obtain an image frame collected from a real scene.
  • the playing module 1402 is configured to play the acquired image frames frame by frame according to the acquired timing.
  • a determining module 1403 is configured to determine a position corresponding to the target in a real scene when a trajectory formed by the movement of the target in the obtained multiple image frames meets a trigger condition.
  • the rendering module 1404 is configured to render a virtual entry in a currently played image frame according to a position; and display virtual content in the virtual entry.
  • the target is a hand.
  • the determining module 1403 is further configured to segment a hand image from the acquired image frame; identify the gesture type corresponding to the hand image; determine the motion reference point in the image frame when the gesture type is a trigger type; and determine, according to the motion reference point, the trajectory formed by the movement of the hand.
  • the determining module 1403 is further configured to encode the acquired image frame into a semantic segmentation feature matrix through a hand recognition model and decode the semantic segmentation feature matrix to obtain a semantic segmentation image; each pixel in the semantic segmentation image has a pixel value indicating the classification category it belongs to and corresponds to a pixel in the image frame from which it was encoded; the hand image is then segmented from the image frame according to the pixels belonging to the hand category, as sketched below.
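The following is a minimal sketch of the final step, cropping the hand image out of the frame with a class-label mask; the hand class id and array layouts are assumptions for illustration, and the hand recognition model itself is not shown:

```python
import numpy as np

HAND_CLASS = 1  # assumed label for the hand category in the segmentation image

def crop_hand(frame, seg_image):
    """Cut the hand image out of the frame using the semantic segmentation
    image, whose class labels align pixel-for-pixel with the frame."""
    mask = seg_image == HAND_CLASS
    if not mask.any():
        return None
    ys, xs = np.where(mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    hand = frame[y0:y1, x0:x1].copy()
    hand[~mask[y0:y1, x0:x1]] = 0   # keep only hand pixels inside the crop
    return hand
```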
  • the rendering module 1404 is further configured to replace, in the played image frame, the pixel values of the pixels the trajectory passes through with a reference pixel value when the trigger condition is not yet satisfied; and, when the trigger condition is satisfied, to play a reference animation at the determined position in the currently played image frame.
  • the determining module 1403 is further configured to determine a world coordinate position of the target in a world coordinate space.
  • the rendering module 1404 is further configured to render the virtual entry in the currently played image frame according to the camera coordinate position corresponding to the world coordinate position in the camera coordinate space.
  • the rendering module 1404 is further configured to obtain the position and pose of the current terminal; determine the transformation matrix between the current camera coordinate space and the world coordinate space according to that position and pose; transform the world coordinate position into the camera coordinate position in the camera coordinate space according to the transformation matrix; and render the virtual entrance in the currently played image frame according to the camera coordinate position. A minimal sketch of this transformation follows.
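The sketch below applies a rigid transform built from an assumed world-to-camera rotation `R_cw` and translation `t_cw` to the target's world coordinate position; it illustrates the coordinate change only, not how the terminal estimates its position and pose:

```python
import numpy as np

def world_to_camera(p_world, R_cw, t_cw):
    """Apply the rigid transform determined from the terminal's position and
    pose: p_camera = R_cw @ p_world + t_cw."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R_cw, dtype=float)
    T[:3, 3] = np.asarray(t_cw, dtype=float)
    return (T @ np.append(np.asarray(p_world, dtype=float), 1.0))[:3]
```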
  • the rendering module 1404 is further configured to select, from the map, a map node matching the acquired image frame; query the location in the real scene stored for that map node; acquire sensor data collected by an inertial sensor; and determine, according to the sensor data, the current pose of the terminal in the real scene. A minimal sketch of the node-matching step follows.
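As an illustration of matching the acquired frame against stored node images, the following sketch compares ORB descriptors of the current frame with descriptors stored per map node and returns the best node. The storage layout and threshold are assumptions, and OpenCV is used for convenience rather than being the patent's stated implementation:

```python
import cv2

def best_map_node(frame_gray, node_descriptors, min_matches=30):
    """Pick the stored map node whose node-image descriptors best match the
    current frame; `node_descriptors` maps node id -> ORB descriptor array."""
    orb = cv2.ORB_create(nfeatures=1000)
    _, des = orb.detectAndCompute(frame_gray, None)
    if des is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    best_node, best_count = None, 0
    for node_id, node_des in node_descriptors.items():
        count = len(matcher.match(des, node_des))
        if count > best_count:
            best_node, best_count = node_id, count
    return best_node if best_count >= min_matches else None
```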
  • the image processing apparatus 1400 further includes a map construction module 1405 configured to select image frames from the image frames collected in time sequence; when the image features of a selected image frame conform to the image features required of a node image, take the selected image frame as a node image; determine the map node in the map corresponding to the acquired node image; and store, in association with the determined map node, the image features of the acquired node image and the location in the real scene at which the node image was collected.
  • the rendering module 1404 is further configured to project the model vertices of the virtual entrance to corresponding pixel points in the image coordinate space; assemble the pixel points corresponding to the model vertices into primitives according to the connection relationships between the vertices; and render the rasterized primitives, according to the pixel values of the pixels within each primitive, at the image coordinate position corresponding to the camera coordinate position in the image coordinate space, to obtain the virtual entrance. A minimal sketch of the vertex projection step follows.
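The sketch below projects model vertices that are already expressed in camera coordinates onto image pixels with a pinhole model; the intrinsics `fx, fy, cx, cy` are assumed stand-ins for the projection matrix of the terminal's camera, and primitive assembly and rasterization are not shown:

```python
import numpy as np

def project_vertices(vertices_cam, fx, fy, cx, cy):
    """Project model vertices in camera coordinates onto the image plane and
    return integer pixel coordinates in the image coordinate space."""
    pts = np.asarray(vertices_cam, dtype=float)
    u = fx * pts[:, 0] / pts[:, 2] + cx
    v = fy * pts[:, 1] / pts[:, 2] + cy
    return np.stack([u, v], axis=1).round().astype(int)
```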
  • the virtual content is a panoramic video.
  • the rendering module 1404 is also used to determine that the virtual entrance corresponds to a spatial area in a real scene; after the current terminal position passes through the spatial area, it directly displays the video picture in the current field of view in the panoramic video.
  • the rendering module 1404 is further configured to, after directly displaying the video picture within the current field of view of the panoramic video, when the current terminal position has not passed through the spatial area again and the current field of view covers the virtual entrance after moving, determine the portion of the current field of view that lies within the virtual entrance and display, within the virtual entrance, the picture of that portion taken from the acquired image frame.
  • the rendering module 1404 is further configured to determine, when the current terminal position moves around the spatial area, the portion of the current field of view that lies within the virtual entrance, and to display, within the virtual entrance, the video picture of that portion taken from the panoramic video.
  • the virtual content is a panoramic video.
  • the rendering module 1404 is also used to draw the captured video frames on the inside of the sphere of the first sphere model and draw the panoramic video picture of the panoramic video on the inside of the sphere of the second sphere model; determine the spatial area in the real scene to which the virtual entrance corresponds; and, when the current terminal position has never passed through the spatial area, or has passed through it an even number of times, render the picture for display from the first sphere model, the second sphere model, and the fully transparent third model within the current field of view, following the rendering order and the reverse order of model depth.
  • the sphere radius of the first sphere model is greater than that of the second sphere model; the model depth of the first sphere model is greater than that of the second sphere model; and the model depth of the second sphere model is greater than that of the third model. The third model is used, when the current field of view covers the virtual entrance, to trigger cancellation of rendering of the second sphere model in the portion of the current field of view outside the virtual entrance, or, when the field of view does not cover the virtual entrance, to trigger cancellation of rendering of the second sphere model.
  • the rendering module 1404 is further configured to, when the number of times the current terminal position has passed through the spatial area is odd, render the picture for display from the first sphere model, the second sphere model, and the fully transparent fourth model within the current field of view, following the rendering order and the reverse order of model depth.
  • the model depth of the second sphere model is greater than that of the fourth model. The fourth model is used, when the current field of view covers the virtual entrance, to trigger cancellation of rendering of the second sphere model in the portion of the current field of view outside the virtual entrance, or, when the field of view does not cover the virtual entrance, to trigger cancellation of rendering of the second sphere model.
  • FIG. 16 shows an internal structure diagram of a computer device in one embodiment.
  • the computer device may be the terminal 110 in FIG. 1.
  • the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system and may also store a computer program; when that computer program is executed by the processor, the processor can implement the image processing method.
  • a computer program may also be stored in the internal memory, and when the computer program is executed by the processor, the processor may execute the image processing method.
  • the display screen of a computer device may be a liquid crystal display or an electronic ink display, etc.
  • the input device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, trackpad, or mouse.
  • FIG. 16 is only a block diagram of a part of the structure related to the scheme of the present application, and does not constitute a limitation on the computer equipment to which the scheme of the present application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • the image processing apparatus provided in this application may be implemented in the form of a computer program.
  • the computer program may run on a computer device as shown in FIG. 16, and the non-volatile storage medium of the computer device may store the program modules constituting the image processing apparatus, for example the acquisition module 1401, the playback module 1402, the determination module 1403, and the rendering module 1404 shown in FIG. 14.
  • the computer program composed of each program module causes the processor to execute the steps in the image processing method of each embodiment of the present application described in this specification.
  • the computer device shown in FIG. 16 may obtain an image frame collected from a real scene through the obtaining module 1401 in the image processing apparatus 1400 shown in FIG. 14.
  • the acquired image frame is played frame by frame by the playback module 1402 according to the captured timing.
  • the determination module 1403 determines the position corresponding to the target in the real scene when the trajectory formed by the movement of the target in the acquired multi-frame image frames meets the trigger condition.
  • the rendering module 1404 renders the virtual entry in the currently played image frame according to the position; and displays the virtual content in the virtual entry.
  • a computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform any one of the foregoing image processing methods.
  • a computer device including a memory and a processor.
  • the memory stores a computer program that, when executed by the processor, causes the processor to perform any one of the foregoing image processing methods.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), dual data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous chain (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Geometry (AREA)
  • Multimedia (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Processing Or Creating Images (AREA)
  • User Interface Of Digital Computer (AREA)
  • Position Input By Displaying (AREA)

Abstract

This application relates to an image processing method and apparatus, a storage medium, and a computer device. The method includes: acquiring image frames collected from a real scene; playing the acquired image frames frame by frame in the order of collection; when the trajectory formed by the movement of a target object in the acquired image frames meets a trigger condition, determining the position corresponding to the target object in the real scene; rendering a virtual entrance in the currently played image frame according to the position; and displaying virtual content within the virtual entrance. The solution provided by this application improves image processing efficiency.

Description

图像处理方法、装置、存储介质和计算机设备
本申请要求于2018年05月22日提交的申请号为201810494117.8、发明名称为“图像处理方法、装置、存储介质和计算机设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,特别是涉及一种图像处理方法、装置、存储介质和计算机设备。
背景技术
随着计算机技术的发展,图像处理技术也不断进步。用户可以通过专业的图像处理软件对图像进行处理,使得经过处理的图像表现更好。用户还可以通过图像处理软件,在图像中附加由图像处理软件提供的素材,让经过处理的图像能够传递更多的信息。
然而,目前的图像处理方式,需要用户展开图像处理软件的素材库,浏览素材库,从素材库中选择合适的素材,调整素材在图像中的位置,从而确认修改,完成图像处理。于是目前的图像处理方式需要大量的人工操作,耗时长,导致图像处理过程效率低。
发明内容
基于此,本申请实施例提供一种图像处理方法、装置、存储介质和计算机设备,以解决针对传统的图像处理过程效率低的问题。
一种图像处理方法,包括:
获取从现实场景中采集的图像帧;
将获取的图像帧按照采集的时序逐帧播放;
当获取的多帧图像帧中目标物运动所形成的轨迹满足触发条件时,确定现实场景中所述目标物所对应的位置;
按照所述位置在当前播放的图像帧中渲染虚拟入口;
在所述虚拟入口中显示虚拟内容。
一种图像处理装置,包括:
获取模块,用于获取从现实场景中采集的图像帧;
播放模块,用于将获取的图像帧按照采集的时序逐帧播放;
确定模块,用于当获取的多帧图像帧中目标物运动所形成的轨迹满足触发条件时,确定现实场景中所述目标物所对应的位置;
渲染模块,用于按照所述位置在当前播放的图像帧中渲染虚拟入口;在所述虚拟入口中显示虚拟内容。
一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时,使得所述处理器执行以下步骤:
获取从现实场景中采集的图像帧;
将获取的图像帧按照采集的时序逐帧播放;
当获取的多帧图像帧中目标物运动所形成的轨迹满足触发条件时,确定现实场景中所述目标物所对应的位置;
按照所述位置在当前播放的图像帧中渲染虚拟入口;
在所述虚拟入口中显示虚拟内容。
一种计算机设备,包括存储器和处理器,所述存储器中储存有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行以下步骤:
获取从现实场景中采集的图像帧;
将获取的图像帧按照采集的时序逐帧播放;
当获取的多帧图像帧中目标物运动所形成的轨迹满足触发条件时,确定现实场景中所述目标物所对应的位置;
按照所述位置在当前播放的图像帧中渲染虚拟入口;
在所述虚拟入口中显示虚拟内容。
上述图像处理方法、装置、存储介质和计算机设备,一方面将反映现实场景的图像帧播放,使得播放的图像帧能够反映现实场景;另一方面在从现实场景中采集的图像帧中目标物运动所形成的轨迹满足触发条件时,自动确定现实场景中该目标物所对应的位置,以按照确定的位置在当前播放的图像帧中渲染虚拟入口,并在虚拟模入口中显示虚拟内容,这 样就可以自动实现在虚拟入口内显示虚拟世界的虚拟内容,在虚拟入口外显示现实世界的现实内容,避免了人工操作的繁琐步骤,极大地提高了图像处理效率。
附图说明
图1为一个实施例中图像处理方法的应用环境图;
图2为一个实施例中图像处理方法的流程示意图;
图3为一个实施例中播放图像帧的界面示意图;
图4为一个实施例中在当前播放的图像帧中渲染虚拟入口的界面示意图;
图5为一个实施例中从获取的图像帧中分割出手部区域的示意图;
图6为一个实施例中在图像帧中轨迹变化的示意图;
图7为一个实施例中坐标空间的关系示意图;
图8为一个实施例中当前终端的位置经过空间区域后的渲染原理图;
图9为一个实施例中当前终端的位置未再次穿过空间区域、且当前视野区域经过移动后覆盖虚拟入口时,终端界面显示的界面示意图;
图10为一个实施例中当前终端的位置围绕空间区域移动时的渲染原理图;
图11为一个实施例中模型的剖面示意图;
图12为一个实施例中渲染原理示意图;
图13为一个实施例中图像处理方法的流程框图;
图14为一个实施例中图像处理装置的模块结构图;
图15为另一个实施例中图像处理装置的模块结构图;
图16为一个实施例中计算机设备的内部结构图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
图1为一个实施例中图像处理方法的应用环境图。参照图1,该图像处理方法应用于图像处理系统。该图像处理系统包括终端110和服务器120。其中,终端110和服务器120通过网络连接。终端110用于执行图像处理方法。示例性地,终端110可以是台式终端或移动终端,移动终端可以是手机、平板电脑、笔记本电脑等中的至少一种。服务器120可以是独立的服务器,也可以是多个独立的服务器组成的服务器集群。
终端110可以获取从现实场景中采集的图像帧,将获取的图像帧按照采集的时序逐帧播放。该图像帧可以是终端110通过内置的图像采集装置或者外部连接的图像采集装置从现实世界中采集的,内置的图像采集装置可以是终端110的前置摄像头或者后置摄像头;该图像帧也可以是其它设备从现实场景中采集后发送至终端110的。终端110可在本地判定获取的多帧图像帧中目标物运动所形成的轨迹满足触发条件时,确定现实场景中目标物所对应的位置,按照该位置在当前播放的图像帧中渲染虚拟入口,在虚拟入口中显示虚拟内容;终端110也可将获取的图像帧发送至服务器120,由服务器120在判定获取的多帧图像帧中目标物运动所形成的轨迹满足触发条件时,通知终端110触发条件被满足,终端110继而确定现实场景中目标物所对应的位置,按照该位置在当前播放的图像帧中渲染虚拟入口,在虚拟入口中显示虚拟内容。
图2为一个实施例中图像处理方法的流程示意图。本实施例主要以该图像处理方法应用于计算机设备来举例说明,该计算机设备可以是图1中的终端110。参照图2,该图像处理方法包括如下步骤:
S202,获取从现实场景中采集的图像帧。
其中,现实场景是自然世界中存在的场景。图像帧是能够形成动态画面的图像帧序列中的单元,用来记录某时刻现实场景中的画面。
在一个实施例中,终端可按照固定或动态的帧率,从现实场景中采集图像帧,获取采集得到的图像帧。其中,固定或动态的帧率能够使图像帧按照该固定或动态的帧率播放时形成连续的动态画面。
在一个实施例中,终端可通过内置或者外部连接的图像采集装置如摄像头,在摄像头当前的视野下,采集现实场景的图像帧,获取采集得到的图像帧。其中,摄像头的视野可因终端的姿态和位置的变化而变化。
在一个示例性实施例中,终端可通过在本机上运行的应用所提供的AR(Augmented Reality,增强现实)拍摄模式,并在选定该AR拍摄模式后,从现实场景中采集图像帧,获取采集得到的图像帧。其中,应用可以是社交应用,社交应用是能够基于社交网络进行网络社交互动的应用。社交应用包括即时通信应用、SNS(Social Network Service,社交网站)应用、直播应用或者拍照应用等,比如QQ或者微信等。
在一个实施例中,终端可接收另一终端发送的从现实场景中采集的图像帧,获取接收的图像帧。比如,终端通过运行在终端上的社交应用建立视频会话时,接收其他会话方所对应的终端发送的从现实场景中采集的图像帧。
在一个实施例中,获取图像帧的帧率可以与采集图像帧的帧率一致,也可小于采集图像 帧的帧率。
S204,将获取的图像帧按照采集的时序逐帧播放。
其中,采集的时序是指采集图像帧时的时间顺序,可通过图像帧在采集时记录的时间戳的大小关系来表示。逐帧播放是指逐图像帧播放。
示例性地,终端可按照获取图像帧的帧率,按照时间戳升序,逐个播放采集的图像帧。终端可以将获取的图像帧直接播放,也可以将获取的图像帧按照采集的时序存入缓存区,并按采集的时序从缓存区取出图像帧播放。
在一个实施例中,终端可将接收到的另一终端发送的从现实场景中采集的图像帧,按照另一终端采集图像帧的帧率,根据时间戳升序,逐个播放接收到的图像帧。终端可以将接收到的图像帧直接播放,也可以将接收到的图像帧按照采集的时序存入缓存区,并按采集的时序从缓存区取出图像帧播放。
图3示出了一个实施例中播放图像帧的界面示意图。参考图3,图3(a)为播放图像帧时终端界面的简化示意图,图3(b)为播放图像帧时终端界面的录屏截图。可以看出,终端显示屏上显示的即为现实场景中的画面。
S206,当获取的多帧图像帧中目标物运动所形成的轨迹满足触发条件时,确定现实场景中目标物所对应的位置。
其中,目标物是现实场景中作为目标的实体。目标物可以是手部、面部或者长条形物体等。目标物运动所形成的轨迹,可以是目标物在运动时获取的图像帧中目标物的参考点移动所形成的轨迹。比如用户控制手部运动时,获取的图像帧中手部食指指尖的成像点移动所形成的轨迹,再比如,用户手持长条形物体(比如笔或者魔法棒等)运动时,获取的图像帧中长条形物体顶部的成像点移动所形成的轨迹等。
触发条件是触发特定事件的约束条件。在本实施例中,特定事件是在播放的图像帧中渲染虚拟入口的事件。触发条件可以是获取的多帧图像帧中目标物运动所形成的轨迹为规则的闭合形状,比如三角形、四边形或者圆形等。
可以理解,用户可选择目标物,通过控制目标物在现实场景中移动,从而使得采集该目标物的图像帧中目标物运动所形成的轨迹满足特定的约束条件(如触发条件),以触发特定的事件(渲染虚拟入口)。而在现实场景中目标物所对应的位置,就是用户意图终端在播放的图像帧中渲染虚拟入口的位置在现实场景中的映射。这样,对于用户而言,从视觉上的感知即为触发一个虚拟入口在现实世界中出现,犹如现实世界的真实入口一样。
示例性地,终端在获取的多帧图像帧中目标物运动所形成的轨迹满足触发条件时,确定目标物在该图像帧中的坐标位置,根据与终端的图像采集装置适配的投影矩阵,计算该目 标物在现实场景中的位置。其中,目标物在该图像帧中的坐标位置具体可以是目标物的参考点在该图像帧中的坐标位置。比如,手部食指指尖的成像点的坐标位置。目标物在该图像帧中的坐标位置也可以是目标物运动所形成的轨迹的中心坐标位置。比如,手部食指指尖画圈时,该圆形轨迹的中心坐标位置。
在一个示例性的应用场景中,用户手持终端,通过终端上用于调用摄像头的应用程序打开终端上内置的摄像头,这样,终端可在摄像头当前的视野下,采集现实场景中的图像帧。用户对着终端的摄像头控制目标物运动就会被采集到,从而获取到实时采集的包括目标物的图像帧。可以理解,若用户打开的是前置摄像头,那么采集的图像帧中包括目标物以及前置摄像头当前视野下的背景;若用户打开的是后置摄像头,那么采集的图像中就包括目标物以及后置摄像头当前视野下的背景。
S208,按照位置在当前播放的图像帧中渲染虚拟入口。
其中,虚拟入口是相对于真实入口的概念。真实入口是用作在现实场景中划分现实空间的实体。真实入口比如房间门,可将现实空间划分为房间内区域和房间外区域;或者,真实入口比如景区入口,可将现实空间划分为景区和非景区等。虚拟入口则是用作在虚拟场景中划分区域的虚拟模型。虚拟入口比如虚拟模型门等。
可以理解,现实场景中的位置是真实空间中的绝对位置。该位置不因终端内置或者外部连接的图像采集装置当前视野的变化而变化。由此可知,在图像采集装置当前视野发生变化时,虚拟入口在图像帧中的渲染位置和尺寸均不同。那么,在图像采集装置当前视野发生变化时,虚拟入口在图像帧中的渲染位置和尺寸依据现实世界中物体成像原理,呈现出近大远小的效果。
示例性地,终端在确定目标物在现实场景中的位置后,可根据现实场景与终端的图像采集装置当前视野下适配的投影矩阵,计算该虚拟入口在当前视野下采集的图像帧中的渲染位置,在该位置渲染虚拟入口。
图4示出了一个实施例中在当前播放的图像帧中渲染虚拟入口的界面示意图。参考图4,图4(a)为在当前播放的图像帧中渲染虚拟入口时终端界面的简化示意图,图4(b)为在当前播放的图像帧中渲染虚拟入口时终端界面的录屏截图。图4(a)包括渲染虚拟入口410a,图4(b)包括渲染虚拟入口410b。
S210,在虚拟入口中显示虚拟内容。
其中,这里的虚拟内容是不存在于获取的图像帧所采集自的现实场景中的内容。例如,获取的图像帧采集自现实场景A,则虚拟内容是不存在于该现实场景A中的内容。可以理解,这里的虚拟内容是相对于当前现实场景而言虚拟的内容,而非绝对虚拟的内容。也就 是说,这里的虚拟内容可以是通过计算机技术模拟出的完全虚拟的内容,也可以是在非当前现实场景中的内容。当前现实场景即终端获取的图像帧采集自的现实场景。
在一个实施例中,虚拟内容可以是动态内容,也可以是静态内容。虚拟内容可以是统一的内容,也可以是与图像帧中目标物运动所形成的轨迹对应的内容,还可以是通过用户自主选择的内容。示例性地,终端可设置轨迹与虚拟内容的对应关系,这样终端即可在识别出轨迹后,即查询与该轨迹对应的虚拟内容进行显示。比如,当轨迹为三角形时,显示的虚拟内容为足球比赛视频;当轨迹为四边形时,显示的虚拟内容为拍摄商场得到的视频;当轨迹为圆形时,显示的虚拟内容为拍摄旅游景区得到的视频。终端还可显示选择对话框,在选择对话框中展示可供选择的虚拟内容,再在虚拟入口中显示用户选择指令选中的虚拟内容。
在一个实施例中,虚拟内容可以是虚拟视频,也可以是从现实场景中采集图像帧生成的视频。比如,用户手持终端处于办公室中,采集办公室内的现实场景得到图像帧播放,那么虚拟入口外显示的内容是办公室这一现实场景,虚拟入口内显示的可以是游戏视频,也可以是其他图像采集装置在非当前办公室,比如王府井大街的现实场景。
继续参考图4,可以看出虚拟入口410a(b)内显示的是虚拟内容,虚拟入口410a(b)外显示的是现实场景中的画面。这样,对于用户而言,从视觉上的感知即为虚拟入口内是虚拟世界,虚拟入口外则是现实世界。用户便可通过移动穿过虚拟入口查看虚拟入口内的虚拟世界,或者通过移动传出虚拟入口查看虚拟入口外的现实世界,体验在虚拟世界和现实世界中的穿越效果。
上述图像处理方法,一方面将反映现实场景的图像帧播放,使得播放的图像帧能够反映现实场景;另一方面在从现实场景中采集的图像帧中目标物运动所形成的轨迹满足触发条件时,自动确定现实场景中该目标物所对应的位置,以按照确定的位置在当前播放的图像帧中渲染虚拟入口,并在虚拟模入口中显示虚拟内容,这样就可以自动实现在虚拟入口内显示虚拟世界的虚拟内容,在虚拟入口外显示现实世界的现实内容,避免了人工操作的繁琐步骤,极大地提高了图像处理效率。
在一个实施例中,目标物为手部。该图像处理方法还包括:从获取的图像帧中分割出手部图像;识别出手部图像所对应的手势类型;当手势类型为触发类型时,在图像帧中确定运动参考点;按照运动参考点确定手部运动所形成的轨迹。
其中,手部是人或动物的肢体部分。手部图像是包括手部、且手部区域占图像区域占比高的图像。手势是由用户通过手部做出的动作形态。手势类型是获取的图像帧中手势所属的类型。触发类型为触发特定事件的手势所属的类型。
运动参考点是用作目标物的运动过程判别的参考基准。可以理解,在不同的图像帧中运动参考点的位置发生了变化,则表示目标物进行了运行。比如,以手指的食指指尖的成像点为运动参考点,当食指指尖在多帧图像帧中的成像点位置发生了变化,则判定手部进行了运动。
可以理解,相比于直接对获取的原始图像帧中手部区域进行手势类型识别,从获取的图像帧中分割出手部图像之后再对分割出的手部图像进行手势类型识别,可避免手部图像占整个图像帧的比例较小时导致的识别不准确的问题,能够减少原始图像帧中相对于手部区域的背景区域对手部区域的手势类型识别的干扰,可以提高识别的准确度。
在一个实施例中,从获取的图像帧中分割出手部图像,包括:通过手部识别模型,将获取的图像帧编码为语义分割特征矩阵;解码语义分割特征矩阵得到语义分割图像;语义分割图像中的像素点,具有表示所属分类类别的像素值,且与编码自的图像帧中的像素点对应;根据属于手部类别的像素点从图像中分割出手部图像。
其中,手部识别模型是经过训练后具有手部识别能力的机器学习模型。机器学习英文全称为Machine Learning,简称ML。机器学习模型可通过样本学习具备特定的能力。机器学习模型可采用神经网络模型、支持向量机或者逻辑回归模型等。神经网络模型比如卷积神经网络等。在本实施例中,手部识别模型可以是全卷积网络模型(Fully Convolutional Networks)。
语义分割特征矩阵是对图像帧中图像内容的语义特征的低维表达,涵盖了该整个图像帧的语义特征信息。语义分割图像是分割为若干个互不重叠的、具有一定语义的区域的图像。语义分割图像中像素点的像素值用于反映相应像素点所属的分类类别。像素点的分类可以是二分类,也可以是多分类。像素点二分类,比如地图图像中对应道路的像素点和其他像素点。像素点多分类,比如风景地图中对应天空的像素点、对应大地的像素点以及对应人物的像素点等。语义分割图像的图像尺寸与原始图像帧的图像尺寸一致。这样,可以理解为对模型输入图像进行了逐像素点分类,根据语义分割图像中的像素点的像素值,即可得到模型输入图像中的每个像素点的类别隶属。
示例性地,终端可通过属于各手势类型的图像样本,训练得到手部识别模型。这样,终端在从获取的图像帧中分割出手部图像后,将手部图像作为手部识别模型的输入,通过获取的手部识别模型的编码结构,将手部图像编码为语义分割特征矩阵。再继续通过手部识别模型的解码结构,解码语义分割特征矩阵得到语义分割图像。
其中,当终端设置的触发类型的手势类型唯一时,手部识别模型即为二分类模型。用于训练二分类模型的图像样本包括属于目标手势类型的正样本,及不属于目标手势类型的负 样本。当终端设置的触发类型的手势类型多样时,手部识别模型即为多分类模型。用于训练多分类模型的图像样本包括属于各目标手势类型的样本。
图5示出了一个实施例中从获取的图像帧中分割出手部区域的示意图。参考图5,图5(a)为获取的包括手部区域的原始图像帧;图5(b)为进行语义分割后得到的语义分割图像;图5(c)为分割出手部区域的规则的手部图像。
在本实施例中,在获取到图像帧后,即自动将该图像帧输入训练好的机器学习模型,将图像帧编码为语义分割特征矩阵,再解码该语义分割特征矩阵即可得到语义分割图像。其中,语义分割图像中的像素点,具有表示所属分类类别的像素值,且与原始图像帧中的像素点对应。这样即可自动根据属于手部类别的像素点来确定手部区域以分割出手部图像,提高了图像分割准确率。
进一步地,终端可将分割出的手部图像与触发类型的手部图像模板进行比较,在分割出的手部图像与触发类型的手部图像模板匹配时,判定手部图像所对应的手势类型为触发类型,然后在图像帧中确定运动参考点;按照运动参考点确定手部运动所形成的轨迹。
终端也可将分割出的手部图像输入训练好的手势识别模型,得到该手势识别模型输出的手势识别结果,当该手势识别结果表示手部图像所对应的手势类型为触发类型,然后在图像帧中确定运动参考点;按照运动参考点确定手部运动所形成的轨迹。
在一个示例性的场景中,用户对着终端的摄像头做出一个手势,在确定了该手势对应的操作类型为绘图类型时,就确定采集的连续的每一帧图像中手势的运动参考点,由于采集图像的频率较高,因此连续的运动参考点用很短线段连接起来后可以形成轨迹。
在一个实施例中,终端可按采集时序从获取的图像帧中,按小于获取帧率的帧率选取图像帧进行手部图像分割与手势识别。其中,从获取的图像帧中选取图像帧,可以是通过多线程异步选取图像帧分别独立进行手部图像分割与手势识别,这样可以提高识别效率。
上述实施例中,通过手部做出特定的手势,隔空绘制出特定的轨迹,即可自动触发虚拟入口的展示效果。整个绘图过程无需用户通过输入装置进行操作,用户可在较大的空间范围内通过示出的手势进行绘图,提高了触发虚拟入口的便利性。
在一个实施例中,该图像处理方法还包括:当触发条件未满足时,在播放的视频帧中,将轨迹所经过的像素点的像素值替换为参考像素值;当触发条件被满足时,按照位置在当前播放的图像帧中,播放参考动画。
示例性地,当触发条件未满足时,为了以可视化方式突显目标物运动所形成的轨迹,便于用户直观地感知是否满足触发条件,可以以异于图像中像素点的像素值展示轨迹。轨迹实质上是连续的多帧图像帧中各运动参考点所对应的像素坐标形成的轨迹,因而,终端可 以根据各个像素坐标在图像帧中确定轨迹所经过的像素点,将这些像素点的像素值更新为参考像素值。参考像素值比如较为鲜艳的绿色或红色对应的像素值。
在一个实施例中,终端也可将以这些像素点为中心,一定范围内的像素点的像素值更新为参考像素值。终端也可在轨迹所经过的像素点处渲染粒子动画,覆盖或者替换轨迹所经过的像素点的像素值,实现魔法手势运动的效果。
在一个实施例中,终端在更新像素值或者渲染粒子动画时,可以是实时进行的,一旦确定了当前图像帧中轨迹经过的像素点,就将该像素点的像素值进行更新,或者在该像素点位置渲染粒子动画,这样即可实现实时展示运动轨迹。
图6示出了一个实施例中在图像帧中轨迹变化的示意图。参考图6,图6(a)、(b)和(c)为图像帧中轨迹变化时终端界面的简化示意图,图6(d)、(e)和(f)为图像帧中轨迹变化时终端界面的录屏截图。从图6(a)或(d)可以明显看出轨迹所经过的像素点的像素值更新为异于原始像素值。从图6(b)或(e)可以明显看出获取的多帧图像帧中目标物运动所形成的轨迹为圆形,满足触发条件。从图6(c)或(f)可以明显看出取值为异于原始像素值的像素点逐渐向轨迹中心位置处靠近,实现动画效果。
进一步地,当触发条件被满足时,终端可按照目标物所对应的位置,在当前播放的图像帧中播放参考动画。其中,参考动画可以是逐渐将更新为参考像素值的像素点恢复初始像素值,或者逐渐取消渲染粒子动画,或者粒子动画逐渐向轨迹中心位置处靠近后取消渲染等。
上述实施例中,通过对图像帧中轨迹所经过的像素点的像素值进行更新,可以将目标物的运动轨迹直接展示在当前的图像帧中,形成实时绘图的效果,提高用户感知度。而且在触发条件满足后,播放参考动画,提高了趣味性。
在一个实施例中,确定现实场景中目标物所对应的位置,包括:确定目标物在世界坐标空间中的世界坐标位置。按照目标物所对应的位置,在当前播放的图像帧中渲染虚拟入口,包括:按照相机坐标空间中对应于该世界坐标位置的相机坐标位置,在当前播放的图像帧中渲染虚拟入口。其中,该世界坐标位置即目标物在世界坐标空间中的坐标位置。
其中,世界坐标空间是现实场景的坐标空间,是固定不变的绝对坐标空间。相机坐标空间是以光轴与图像平面的交点为原点所构成的坐标空间,是随图像采集装置(相机)的位置变化而变化的相对坐标空间。世界坐标空间中的世界坐标位置可通过刚体变化映射为相机坐标系中的相机坐标位置。
可以理解,对于现实场景中的目标物,通常情况下该目标物在未发生运动时,其在世界坐标空间中的世界坐标位置是固定不变的,但随着图像采集装置的位置和姿态的变化,其 在相机坐标空间中的相机坐标位置是相对不变的。
在本申请中,是意图在目标物运动的世界坐标位置映射在图像帧中的位置处渲染虚拟入口,让用户感知为目标物运动的世界坐标位置处出现了虚拟入口,故需要获取目标物运动的世界坐标位置,再实时确定在当前的相机坐标空间中的相机坐标位置。比如,用户手持终端在后置摄像头的视野区域内用手部食指指尖画出了圆圈,那就是意图在现实场景中手指画圈的那个位置出现虚拟门,即虚拟入口。
示例性地,终端在判定获取的多帧图像帧中目标物运动所形成的轨迹满足触发条件时,可获取当前图像帧中目标物的图像坐标位置,按照图像坐标空间与相机坐标空间的投影变化,得到目标物的相机坐标位置,再按照相机坐标空间与世界坐标空间的刚体变化,得到目标物在世界坐标空间中的世界坐标位置。其中,图像坐标空间是以图像的中心处为原点,坐标轴平行于图像的边所构成的坐标空间。
可以理解,由于相机坐标空间和图像坐标空间之间的投影变化关系是基于出厂设置确定的,所以根据投影变化关系,以及图像坐标空间中的图像坐标位置,可以确定目标物在相机坐标空间中横纵坐标,再根据目标物的图像深度可以得到目标物在相机坐标空间中垂直方向的坐标。
由于本申请实施例中涉及到多个坐标空间,为了使本申请方案更加清晰,现以图7为例对各坐标空间进行区分说明。参照图7,Oo-XoYoZo为模型坐标空间的坐标系、Ow-XwYwZw为世界坐标空间的坐标系、Oc-XcYcZc为相机坐标空间的坐标系、O 1-xy为图像坐标空间的坐标系以及O 2-uv为像素坐标空间的坐标系。点P(Xw,Yw,Zw)为世界坐标空间中的点(即现实世界中真实的点),点p即为与点P(Xw,Yw,Zw)匹配的图像帧中的像点。点p在图像坐标系空间中的位置坐标为(x,y),在像素坐标系空间中的位置坐标为(u,v)。通过点p在图像坐标空间中的位置坐标(x,y)和点p的深度,可以确定点p在相机坐标空间中的相机坐标位置。可以理解,像素坐标空间的原点即为屏幕顶点。
在一个实施例中,按照相机坐标空间中对应于世界坐标位置的相机坐标位置,在当前播放的图像帧中渲染虚拟入口,包括:获取当前终端的位置和姿态;根据当前终端的位置和姿态,确定当前的相机坐标空间与世界坐标空间的变换矩阵;按变换矩阵将世界坐标位置变换为相机坐标空间中的相机坐标位置;按照相机坐标位置,在当前播放的图像帧中渲染虚拟入口。
其中,当前终端的位置即为当前终端的图像采集装置在现实场景中的位置。当前终端的姿态是当前终端的图像采集装置在现实场景中横滚(roll)、俯仰(pitch)、偏转(yaw)的空间状态。
可以理解,终端在进行图像处理之前,还执行了构建地图的步骤。终端通过采集关键帧,定位并记录采集关键帧时在现实场景中的位置,这样终端在实时进行图像处理时,即可将当前获取的图像帧与关键帧进行匹配,以将对应于匹配的关键帧记录的位置。终端可基于SLAM(Simultaneous Localization And Mapping定位与地图构建)、VO(Visual Odometry视觉里程计)或者VIO(Visual Inertial Odometry视觉惯性里程计)对现实场景构建相应的地图。
在一个实施例中,获取当前终端的位置和姿态,包括:从地图中挑选与获取的图像帧匹配的地图节点;查询对应于地图节点所存储的在现实场景中的位置;获取惯性传感器采集的传感器数据;根据传感器数据,确定当前终端的姿态。
示例性地,终端可将获取的图像帧与地图中的节点图像进行匹配,在匹配成功时,定位该匹配的节点图像的地图节点,查询对应于该地图节点所存储的在现实场景中的位置,也就是终端当前的位置。终端还可获取惯性传感器(IMU,Inertial Measurement Unit)采集的传感器数据,根据传感器数据,确定当前终端的姿态。这样,即可根据当前终端的位置和姿态,计算当前相机坐标空间与世界坐标空间的刚体变换矩阵。
可以理解,终端在构建地图时,可以计算参考地图节点处,相机坐标空间与时间坐标空间的刚体变换矩阵,这样,其他地图节点位置的相机坐标空间与时间坐标空间的刚体变换矩阵,即可根据当前地图节点与参考地图节点的位置与姿态变化,得到当前相机坐标空间与世界坐标空间的刚体变换矩阵。终端也可实时根据物点的世界坐标位置和像点在当前相机坐标空间中的相机坐标位置之间的转换关系,确定世界坐标空间和当前相机坐标空间之间的当前刚体变换矩阵。
在本实施例中,结合当前采集的图像帧的图像特征,以及惯性传感器采集的传感器数据对当前终端进行定位,提高了定位的准确性。
进一步地,终端即可根据世界坐标空间和当前相机坐标空间之间的当前刚体变换矩阵,按该刚体变换矩阵将世界坐标位置变换为相机坐标空间中的相机坐标位置。
在一个实施例中,按照相机坐标位置,在当前播放的图像帧中渲染虚拟入口,包括:将虚拟入口的模型顶点投影为图像坐标空间中相应的像素点;根据各模型顶点间的连接关系,将模型顶点相应的像素点组合为图元;将光栅化后的图元按照图元中各像素点的像素值,在图像坐标空间中对应于相机坐标位置的图像坐标位置处,渲染得到虚拟入口。
其中,虚拟入口的模型是设置完成的模型。模型参数也是设置好的。模型参数包括模型顶点参数和模型顶点之间的连接关系。模型顶点参数包括模型顶点在模型坐标空间中的模型坐标位置、模型顶点的颜色以及模型纹理坐标。图元是点、线或面等基本图形。光栅化 是将图元转化为一组二维图像的过程,这些二维图像代表着可在屏幕上绘制的像素。通俗地说,图元装配得到的是顶点组成的图形,光栅化是根据图形的形状,插值出那个图形区域的像素。
示例性地,终端可通过自模型坐标空间、世界坐标空间、相机坐标空间至图像坐标空间之间的变换关系,将将虚拟入口的模型顶点投影为图像坐标空间中相应的像素点。然后根据各模型顶点间的连接关系,将模型顶点相应的像素点组合为图元,实现图元装配。再对图元进行光栅化以及着色,在图像坐标空间中对应于相机坐标位置的图像坐标位置处,渲染得到虚拟入口。终端可依据OpenGL(Open Graphics Library开放图形库)进行虚拟模型的绘制。
终端可以在采集的图像帧中,图像坐标空间中对应于相机坐标位置的图像坐标位置处渲染得到虚拟入口后,将得到的渲染虚拟入口后的图像帧放入帧缓存区,等待显示;也可直接在终端屏幕上进行显示。
在一个实施例中,该图像处理方法还包括:构建地图的步骤,该构建地图的步骤包括:从按时序采集的图像帧中选取图像帧;当选取的图像帧的图像特征符合节点图像的图像特征时,获取选取的图像帧为节点图像;确定获取的节点图像在地图中相应的地图节点;对应于确定的地图节点存储获取的节点图像的图像特征,及采集获取的节点图像时在现实场景中的位置。
其中,选取的图像帧,可以是采集的图像帧中的关键帧。
在一个实施例中,终端可接收用户选择指令,根据该用户选择指令,从采集的图像帧中选取图像帧。终端也可按照间隔帧数从采集的图像帧中选取图像帧。比如,每隔20帧图像帧后选取图像帧。
节点图像的图像特征是用于选择节点图像的图像特征。符合节点图像的图像特征可以是图像中包括的特征点与已有节点图像包括的特征点中相匹配的特征点的数量超过参考数量,也可以是包括的特征点与已有节点图像包括的特征点中相匹配特征点占已有节点图像包括的特征点的比例低于参考比例。
举例说明,假设最近添加的节点图像包括的特征点数量为100,当前选取的图像帧包括的特征点数量为120。参考数量为50,参考比例为90%。其中,若当前选取的图像帧包括的特征点与最近添加的节点图像包括的特征点中相匹配的特征点的数量为70。那么,当前图像帧中包括的特征点与已有节点图像包括的特征点匹配的数量超过参考数量,可判定当前选取的图像帧的特征符合节点图像的特征。
在一个实施例中,终端在获取构建地图的指令后,可按照固定或动态的帧率采集图像帧, 选取采集的图像帧包括的特征点的数量大于数量阈值的图像帧为初始的节点图像,确定该节点图像在地图中相应的地图节点,以及该节点图像包括的特征点在地图中相应的位置,构建局部地图。终端再从按时序采集的图像帧中选取图像帧,将选取符合节点图像的特征的图像帧作为后续的节点图像,直至得到全局地图。
示例性地,终端可以初始的节点图像为参考节点图像,追踪参考节点图像中的特征点。当选取的图像帧包括的特征点与参考节点图像包括的特征点的匹配数量低于第一数量且高于第二数量时,将选取的图像帧作为节点图像。当选取的图像帧包括的特征点与参考节点图像包括的特征点的匹配数量低于第二数量时,将最近获取的节点图像作为参考节点图像,继续进行图像追踪,以选取节点图像。
进一步地,终端可确定在自然空间中采集该获取的节点图像投影于地图空间中的地图节点。终端可提取在获取的节点图像时序靠前的节点图像的特征,计算时序靠前的节点图像的特征与获取的节点图像的变化矩阵,根据该变化矩阵得到采集时序靠前的节点图像时的位置到采集获取的节点图像时的位置的变化量,再根据该变化量确定获取的节点图像在地图中相应的地图节点。
其中,变化矩阵是二维图像的特征到二维图像的特征之间的相似变化关系。示例性地,终端可提取获取的节点图像的图像特征,与地图中已有的节点对应的节点图像的图像特征进行匹配,获取匹配成功的图像特征分别在获取的节点图像和已有的节点图像中的位置。获取的节点图像为在后采集的图像帧,已有的节点图像为在先采集的图像帧。终端即可根据得到的匹配的图像特征在先后采集的两帧图像帧上的位置确定先后采集的两帧图像帧之间的变化矩阵,从而得到终端采集这两帧图像帧时的位置变化和姿态变化,再根据在前采集的图像的位置和姿态,即可得到在后采集的图像的位置和姿态。
在一个实施例中,地图中已有的节点对应的节点图像可以是一帧或者多帧。终端也可将获取的节点图像的特征与多个已有的节点对应的节点图像的特征比较,得到在后采集的图像帧与多个在先采集的图像帧的变化矩阵,再根据多个变化矩阵综合得到在后采集的图像的位置和姿态。比如,对计算得到的多个位置变化和姿态变化加权求平均等。
在本实施例中,通过节点图像的特征之间的变化矩阵,得到当前获取的节点图像与在前已有的节点图像的转化关系,从而实现由在前的图像帧在地图中的位置推测当前图像帧的在地图中的位置,实现实时定位。
示例性地,终端可提取节点图像的图像特征,将节点图像的图像特征对应于节点图像相应的地图节点存储,可在需要进行图像特征比较时,直接根据地图节点查找对应的节点图像的图像特征,以节省存储空间提高查找效率。
终端还可将采集获取的节点图像时在现实场景中的位置,以在终端定位时,直接根据地图节点查找对应的节点图像存储的位置,以提高查找效率。
在本实施例中,通过自身采集图像帧,再对采集的图像帧进行处理即可自动进行地图构建,避免了需要大量具备专业绘图能力的工作人员人工对环境进行测绘,对工作人员能力要求高且劳动量大的问题,提高地图构建的效率。
在一个实施例中,虚拟内容为全景视频。该图像处理方法还包括:确定虚拟入口对应于现实场景中的空间区域;在当前终端的位置经过空间区域后,则直接显示全景视频中当前视野区域内的视频画面。
其中,全景视频是利用3D摄像机进行全方位360度进行拍摄的视频。用户在观看全景视频的时候,可以随意调节视角进行360度全景观看。虚拟入口对应于现实场景中的空间区域,是虚拟入口在相机坐标空间中所占空间在世界坐标空间中的投影。该空间区域可以是平面空间区域,即没有厚度;也可以是立体空间区域,即有厚度。对用户而言,即让用户感知为现实场景中某个固定的位置出现了虚拟入口(虚拟的房间门),当用户手持终端向该位置移动时,感知为走向该虚拟入口(虚拟的房间门)。
示例性地,当前终端的位置经过空间区域后,可理解为用户穿过房间门进入了另一个房间或空间。终端可以终端当前位置为球心建立三维球体模型,将全景视频以纹理方式渲染到球面内侧。这样,终端即可直接显示全景视频中终端当前视野区域内的视频画面。当前视野区域与终端当前的姿态有关。
图8示出了一个实施例中当前终端的位置经过空间区域后的渲染原理图。参考图8,图8(a)示出了当前终端的位置经过空间区域的示意图,可以明显看到终端的位置是穿过虚拟入口801所在的区域,从虚拟入口的一侧运动至虚拟入口的另一侧。图8(b)示出了在渲染全景视频的三维球体模型中确定当前视野区域内的视频画面的示意图,可以明显看出,三维球体模型的球心即为终端位置,也就是终端摄像头作为观测点的位置,当前视野区域810与球面820交集区域830的视频画面即为用于在终端屏幕上显示的画面。图8(c)示出了交集区域830的视频画面在终端界面显示的简化示意图,图8(d)示出了交集区域830的视频画面在终端界面显示时的截图。
在一个实施例中,当前终端的位置经过空间区域后,终端的姿态可根据用户指令进行变化。当终端的姿态发生变化时,终端当前视野区域随即发生变化。终端即可实时显示全景视频中当前视野区域内的视频画面。
在一个实施例中,图像处理方法还包括:在直接显示全景视频中当前视野区域内的视频画面后,则在当前终端的位置未再次穿过空间区域、且当前视野区域经过移动后覆盖虚 拟入口时,则确定当前视野区域中位于虚拟入口内的视野区域;在虚拟入口中显示获取的图像帧中确定的视野区域内的画面。
可以理解,当前终端的位置未再次穿过空间区域、且当前视野区域经过移动后覆盖虚拟入口,即为当前终端并未位移穿过虚拟入口,而是调整当前终端的姿态,使得虚拟入口重新进入终端当前视野区域内。
示例性地,终端在直接显示全景视频中当前视野区域内的视频画面后,当检测到当前终端的位置未再次穿过空间区域、且当前视野区域经过移动后覆盖虚拟入口时,则确定当前视野区域中位于虚拟入口内的视野区域,在虚拟入口中显示获取的图像帧中确定的视野区域内的画面。这样,即实现在虚拟入口内显示现实世界,在虚拟入口外显示虚拟内容。
可理解为,用户通过房间门进入房间后,房间门即在用户身后,不再出现在用户视野中。用户在房间内调整视野进行观看,看到房间各处的场景画面,也就是本实施例中当前终端的位置经过空间区域后,则直接显示的全景视频中当前视野区域内的视频画面,用户即看到的是虚拟入口内的全景视频。当用户转身后,房间门又重新出现在用户视野中,用户通过房间门看到的是房间外的现实画面,也就是本实施例中在当前终端的位置未再次穿过空间区域、且当前视野区域经过移动后覆盖虚拟入口时,则确定当前视野区域中位于虚拟入口内的视野区域;在虚拟入口中显示获取的图像帧中确定的视野区域内的画面。这样用户看到的是虚拟入口内的现实场景的画面。
图9示出了一个实施例中当前终端的位置未再次穿过空间区域、且当前视野区域经过移动后覆盖虚拟入口时,终端界面显示的界面示意图。参考图9,虚拟入口内的是现实场景的画面,虚拟入口外则为虚拟内容。
在一个实施例中,图像处理方法还包括:在当前终端的位置围绕空间区域移动时,则确定当前视野区域中位于虚拟入口内的视野区域;在虚拟入口中显示全景视频中确定的视野区域内的视频画面。
图10示出了一个实施例中当前终端的位置围绕空间区域移动时的渲染原理图。参考图10,图10左图示出了当前终端的位置围绕空间区域移动时的示意图,可以明显看到终端的位置是绕过虚拟入口1001所在的区域,从虚拟入口的一侧运动至虚拟入口的另一侧。终端在以图10左图的移动方式移动时,终端显示的画面中,虚拟入口内的始终是虚拟内容,虚拟入口外则始终为现实场景画面,如图10右图所示。
在一个示例性的应用场景中,用户手持终端,通过终端上用于调用摄像头的应用程序打开终端上内置的摄像头,这样,终端可在摄像头当前的视野下,采集现实场景中的图像帧。用户手持终端在后置摄像头的视野区域内用手部食指指尖画出了圆圈,意图在现实场 景中手指画圈的那个位置出现虚拟门,即虚拟入口。此时,终端即在该位置相应于终端屏幕的图像位置渲染虚拟入口。虚拟入口外显示现实场景画面,虚拟入口内显示全景视频的局部视频画面。
用户手持终端向虚拟入口映射在现实场景中的位置靠近,则终端屏幕上的虚拟入口逐渐变大,直至在终端穿过该位置后不再出现在终端屏幕上,此时用户看到的是呈现在终端屏幕上的全景视频的局部视频画面。用户可调整后置摄像头的视野区域以观看不同区域的全景视频画面。
用户可手持终端向后退以再次穿过该位置,此时,虚拟入口出现在终端屏幕上并逐渐变小,虚拟入口外显示现实场景画面,虚拟入口内显示全景视频的局部视频画面。用户可手持终端向转身,但未再次穿过该位置,此时,虚拟入口出现在终端屏幕上,虚拟入口外显示全景视频的局部视频画面,虚拟入口内显示现实场景画面。
用户手持终端围绕虚拟入口映射在现实场景中的位置,此时,虚拟入口始终出现在终端屏幕上,虚拟入口外显示现实场景画面,虚拟入口内显示全景视频的局部视频画面。
上述实施例中,提供了当前终端的位置在经过虚拟入口或围绕虚拟入口移动时,虚拟入口内外的渲染内容的变化。使得用户可以移动位置从入口外穿越到入口中,查看入口内的虚拟世界,也可以从入口内移动到入口外查看外面的现实世界,体验虚拟和现实的穿越效果。
在一个实施例中,虚拟内容为全景视频。该图像处理方法还包括:将采集的视频帧绘制于第一球体模型的球面内侧,将全景视频的全景视频画面绘制于第二球体模型的球面内侧;确定虚拟入口对应于现实场景中的空间区域;在当前终端的位置未曾穿过空间区域、或者当前终端的位置穿过空间区域的次数为偶数时,则按照渲染顺序和模型深度的逆序,根据当前视野区域内的第一球体模型、第二球体模型以及全透明的第三模型渲染得到用于显示的画面;其中,第一球体模型的球半径大于第二球体模型的球半径;第一球体模型的模型深度大于第二球体模型的模型深度;第二球体模型的模型深度大于第三模型的模型深度;第三模型用于在当前视野区域覆盖虚拟入口时,触发取消渲染当前视野区域中位于虚拟入口外的视野区域中的第二球体模型;或者,用于在视野区域未覆盖虚拟入口时触发取消渲染第二球体模型。
其中,渲染顺序是渲染模型的先后顺序。模型深度是模型边界离观测点的距离。模型深度越深,模型边界离观测点的距离越远。观测点是在模型内部观测模型的位置,观测点视野区域内渲染得到的画面,即为终端屏幕显示的画面。通常情况下,在渲染视野区域内的模型得到用于显示的画面时,通常是按照模型深度的逆序渲染,也就是先渲染离观测点 的距离近的模型。渲染顺序则是人为设定的在渲染时需要依据的顺序。
本实施例中,同时依据渲染顺序和模型深度的逆序进行渲染。这样,在渲染某一模型时,若已渲染了模型深度小于该模型的其他模型,则该模型不再渲染。示例性地,终端可在建模时,即在深度缓冲(depthBuffer)中记录各模型的深度信息(writesToDepthBuffer),并添加深度信息测试标记(readsToDepthBuffer)。添加深度信息测试标记,表示模型在绘制时,终端会读取该模型的模型深度,根据该模型深度进行渲染。
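"同时依据渲染顺序和模型深度的逆序进行渲染"的效果可用下面的Python草图示意:按人为设定的渲染顺序遍历各模型,并用一个简化的深度缓冲丢弃被更近模型遮挡的片元(相对真实图形管线做了大量简化,仅用于说明原理,并非本申请的具体实现):

```python
import numpy as np

def render_with_order_and_depth(models, width, height):
    """models: 按渲染顺序排列的列表,每项为 (depth_map, color_map, alpha)。
    depth_map: [H, W],该模型在各像素处离观测点的距离,np.inf 表示未覆盖该像素;
    color_map: [H, W, 3];alpha: 标量,0为全透明,1为不透明。"""
    depth_buffer = np.full((height, width), np.inf)
    frame = np.zeros((height, width, 3))
    for depth_map, color_map, alpha in models:
        # 深度测试:只有比深度缓冲中已有值更近的片元才参与绘制
        visible = depth_map < depth_buffer
        frame[visible] = alpha * color_map[visible] + (1 - alpha) * frame[visible]
        # 全透明模型(如第三模型)虽不贡献颜色,但仍写入深度,
        # 从而挡住渲染顺序在其之后、离观测点更远的第二球体模型
        depth_buffer = np.where(visible, depth_map, depth_buffer)
    return frame
```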
示例性地,终端可依据模型坐标空间与世界坐标空间的变化关系、虚拟入口在世界坐标系中的世界坐标位置,确定虚拟入口在模型坐标空间中的模型坐标位置。然后以该模型坐标位置为球心建立第一球体模型和第二球体模型。终端可再将采集的视频帧以纹理方式绘制于第一球体模型的球面内侧,并将全景视频的全景视频画面以纹理方式绘制于第二球体模型的球面内侧。
在一个实施例中,终端也可在观测点的前方创建可以投影到终端屏幕上的模型平面,这个模型平面随着观测点的移动和转向永远保持在观测点的前方。终端再在这个模型平面上绘制从现实场景中采集的图像帧,这样即实现将获取的图像帧按照采集的时序逐帧播放。
可以理解,在触发条件满足后,首次渲染虚拟入口时,虚拟入口内为虚拟内容,虚拟入口外为现实场景。那么当前终端的位置未曾穿过空间区域、或者当前终端的位置穿过空间区域的次数为偶数时,也就是虚拟入口内仍为虚拟内容,虚拟入口外仍为现实场景。
这样,为了保持在虚拟入口内仍为虚拟内容,虚拟入口外仍为现实场景,终端可创建包围观测点的第三模型,将渲染顺序(renderingOrder)设置为:第一球体模型→第三模型→第二球体模型。其中,虚拟入口位于第三模型的界面上,且虚拟入口所在的区域为空。
图11示出了一个实施例中模型的剖面示意图。参考图11(a),包括第一球体模型1101、第二球体模型1102、第三模型1103和虚拟入口1104。此时,观测点1105离虚拟入口1104较远。
这样,在观测点的视野区域的视线上存在第三模型时,由于第三模型的渲染顺序优先于第二球体模型、且第三模型的模型深度小于第二球体模型,此时终端则仅渲染第一球体模型和第三模型得到用于显示的画面。终端还可将第三模型的透明度设置为全透明,那么此时用于显示的画面实际上即为在第一球体模型的球面内侧绘制从现实场景采集的视频帧,那就是说保证了虚拟入口外显示现实场景。
继续参考图11(a),起始于观测点1105、未经过虚拟入口1104的视线OA,依次经过第三模型、第二球体模型和第一球体模型,如图12(a)所示。这样,在渲染时,由于渲染顺序为:第一球体模型→第三模型→第二球体模型,假设第三模型的颜色为C3,透明度为T3;第二球体模型的颜色为C2,透明度为T2;第一球体模型的颜色为C1,透明度为T1;那么渲染得到的屏幕上的颜色即为:C3*T3+(1-T3)*C1*T1。示例性地,可设置T3为0,即为全透明,T1为1,即为不透明,那么屏幕上的颜色即为第一球体模型的颜色,也就是绘制在第一球体模型球面内侧的从现实场景中采集的图像帧。
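上述颜色合成可写成一个简单的混合函数(示意),当T3为0、T1为1时,输出即为第一球体模型球面内侧所绘制的现实场景画面的颜色:

```python
def blend(front_color, front_alpha, back_color, back_alpha):
    """前景叠加在背景之上的颜色合成(示意,alpha 对应文中的透明度 T)。"""
    return front_color * front_alpha + (1.0 - front_alpha) * back_color * back_alpha

# 视线OA:第三模型全透明(T3=0),因此透出第一球体模型上不透明(T1=1)的现实场景画面
assert blend(front_color=0.0, front_alpha=0.0, back_color=0.7, back_alpha=1.0) == 0.7
```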
在观测点的视野区域的视线上不存在第三模型时,终端则仅渲染第一球体模型和第二球体模型得到用于显示的画面。终端可将第二球体模型设置为不透明,那么此时用于显示的画面实际上即为在第二球体模型的球面内侧绘制全景视频的视频帧,那就是说保证了虚拟入口内显示虚拟内容。
继续参考图11(a),起始于观测点1105、经过虚拟入口1104的视线OB,依次经过第二球体模型和第一球体模型,如图12(b)所示。这样,在渲染时,由于渲染顺序为:第一球体模型→第二球体模型,假设第二球体模型的颜色为C2,透明度为T2;第一球体模型的颜色为C1,透明度为T1;那么渲染得到的屏幕上的颜色即为:C2*T2+(1-T2)*C1*T1。具体可设置T2为1,即为不透明,那么屏幕上的颜色即为第二球体模型的颜色,也就是绘制在第二球体模型球面内侧的全景视频的视频帧。
上述实施例中,在用户未进入虚拟入口时,通过第三模型在当前视野区域覆盖虚拟入口时,触发取消渲染当前视野区域中位于虚拟入口外的视野区域中的第二球体模型上的全景视频;在视野区域未覆盖虚拟入口时触发取消渲染第二球体模型上的全景视频。这样保证了用户未进入虚拟入口且未看到虚拟入口时,看到的都是现实场景的内容;看到虚拟入口时,虚拟入口内是全景视频的内容,虚拟入口外则是现实场景的内容。
在一个实施例中,该图像处理方法还包括:在当前终端的位置穿过空间区域的次数为奇数时,则按照渲染顺序和模型深度的逆序,根据当前视野区域内的第一球体模型、第二球体模型以及全透明的第四模型渲染得到用于显示的画面;其中,第二球体模型的模型深度大于第四模型的模型深度;第四模型用于在当前视野区域覆盖虚拟入口时,触发取消渲染当前视野区域中位于虚拟入口外的视野区域中的第二球体模型;或者,用于在视野区域未覆盖虚拟入口时触发取消渲染第二球体模型。
可以理解,在触发条件满足后,首次渲染虚拟入口时,虚拟入口内为虚拟内容,虚拟入口外为现实场景。那么当前终端的位置穿过空间区域的次数为奇数时,也就是虚拟入口内变化为现实场景,虚拟入口外则变化为虚拟内容场景。
这样,为了保持在虚拟入口内变化为现实场景,虚拟入口外则变化为虚拟内容场景,终端可创建包围观测点的第四模型,将渲染顺序设置为:第一球体模型→第四模型→第二球体模型。其中,第四模型的界面即为虚拟入口。
参考图11(b),包括第一球体模型1101、第二球体模型1102、第四模型1106和虚拟入口1104。此时,观测点1107离虚拟入口1104较远。
这样,在观测点的视野区域的视线上存在第四模型时,由于第四模型的渲染顺序优先于第二球体模型、且第四模型的模型深度小于第二球体模型,此时终端则仅渲染第一球体模型和第四模型得到用于显示的画面。终端还可将第四模型的透明度设置为全透明,那么此时用于显示的画面实际上即为在第一球体模型的球面内侧绘制从现实场景采集的视频帧,那就是说保证了虚拟入口内显示现实场景。
继续参考图11(b),起始于观测点1107、经过虚拟入口1104的视线OC,依次经过第四模型、第二球体模型和第一球体模型,如图12(c)所示。这样,在渲染时,由于渲染顺序为:第一球体模型→第四模型→第二球体模型,假设第四模型的颜色为C4,透明度为T4;第二球体模型的颜色为C2,透明度为T2;第一球体模型的颜色为C1,透明度为T1;那么渲染得到的屏幕上的颜色即为:C4*T4+(1-T4)*C1*T1。具体可设置T4为0,即为全透明,T1为1,即为不透明,那么屏幕上的颜色即为第一球体模型的颜色,也就是绘制在第一球体模型球面内侧的从现实场景中采集的图像帧。
在观测点的视野区域的视线上不存在第四模型时,终端则仅渲染第一球体模型和第二球体模型得到用于显示的画面。终端可将第二球体模型设置为不透明,那么此时用于显示的画面实际上即为在第二球体模型的球面内侧绘制全景视频的视频帧,那就是说保证了虚拟入口外显示虚拟内容。
继续参考图11(b),起始于观测点1107、未经过虚拟入口1104的视线,依次经过第二球体模型和第一球体模型,如图12(d)所示。这样,在渲染时,由于渲染顺序为:第一球体模型→第二球体模型,假设第二球体模型的颜色为C2,透明度为T2;第一球体模型的颜色为C1,透明度为T1;那么渲染得到的屏幕上的颜色即为:C2*T2+(1-T2)*C1*T1。具体可设置T2为1,即为不透明,那么屏幕上的颜色即为第二球体模型的颜色,也就是绘制在第二球体模型球面内侧的全景视频的视频帧。
上述实施例中,在用户已进入虚拟入口时,通过第四模型在当前视野区域覆盖虚拟入口时,触发取消渲染当前视野区域中位于虚拟入口外的视野区域中的第二球体模型上的全景视频内容;在视野区域未覆盖虚拟入口时触发取消渲染第二球体模型上的全景视频内容。这样保证了用户进入虚拟入口后未看到虚拟入口时,看到的都是全景视频的内容;看到虚拟入口时,虚拟入口内是现实场景的内容,虚拟入口外则是全景视频的内容。
在另外的实施例中,当观测点离虚拟入口较近时,可在观测点所在位置增加第三模型或第四模型的边界区域,以避免观测点在经过或者围绕虚拟入口移动时渲染出现的扰动。图12(c)为当前终端的位置穿过空间区域的次数为奇数、且当前终端朝向且靠近虚拟入口时的模型剖面图。图12(d)为当前终端的位置未曾穿过空间区域、或者当前终端的位置穿过空间区域的次数为偶数、且当前终端朝向且靠近虚拟入口时的模型剖面图。图12(e)为当前终端的位置未曾穿过空间区域、或者当前终端的位置穿过空间区域的次数为偶数、且当前终端在虚拟入口一侧绕行时的模型剖面图。
在一个实施例中,在当前终端的位置未曾穿过空间区域、或者当前终端的位置穿过空间区域的次数为偶数、且当前终端从虚拟入口一侧绕行至虚拟入口的另一侧时,第三模型按虚拟入口所在平面进行翻转。这样保证了虚拟入口中仍显示虚拟内容,虚拟入口外仍显示从现实世界采集的图像帧。也就是说,只有在终端经过虚拟入口后,才会交换虚拟入口内外所显示的内容。
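"第三模型按虚拟入口所在平面进行翻转"可用关于该平面的镜像变换矩阵来示意(以下Python片段中平面位置、法向量均为假设参数,仅用于说明翻转运算本身):

```python
import numpy as np

def reflection_matrix(plane_point, plane_normal):
    """构造关于平面(过 plane_point、法向量为 plane_normal)的4x4镜像矩阵(齐次坐标)。"""
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)
    p = np.asarray(plane_point, dtype=float)
    d = -np.dot(n, p)
    M = np.eye(4)
    M[:3, :3] = np.eye(3) - 2.0 * np.outer(n, n)
    M[:3, 3] = -2.0 * d * n
    return M

# 示例:假设虚拟入口位于 z=2 平面,模型顶点 (0, 0, 1) 翻转后变为 (0, 0, 3)
M = reflection_matrix(plane_point=[0, 0, 2], plane_normal=[0, 0, 1])
print(M @ np.array([0, 0, 1, 1.0]))
```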
可以理解的是,终端在相机坐标空间中渲染一个三维球形空间,在该球形空间的球面内侧贴合播放全景视频纹理图像,在球半径大于该球的另一球体的球面内侧播放从现实场景中采集的图像帧,或者在观测点前方的模型平面上播放从现实场景中采集的图像帧。终端继而在触发条件被满足时,模拟一个任意门,实现在门内展示虚拟世界,门外显示现实世界图像。用户可以移动位置从门外穿越到门内,查看门内的虚拟世界,也可以从门内移动到门外查看外面的现实世界,体验虚拟和现实的穿越效果。
人进入任意门且离门较远时,以门为第四模型,该第四模型全透明。这样,人透过门看时,门内为现实世界,门外为虚拟世界,即人处于虚拟世界中。人进入任意门且离门很近时,为了防止人穿越门时发生渲染抖动,在门对面临时放一个刚好套住门的小房间,门本身为空,小房间为全透明,人透过门看时,门内为现实世界,门外为虚拟世界,即人处于虚拟世界中。
人未进入任意门且离门较远时,以门为局部边界创建包围观测点的第三模型,第三模型上门所在的区域为空,其他区域为全透明,这样,人透过门看时,门内为虚拟世界,门外为现实世界,即人处于现实世界中。人未进入任意门且离门较近时,为了防止人穿越门时发生渲染抖动,在第三模型上以门为边界扩展边界,这样,人透过门看时,门内为虚拟世界,门外为现实世界,即人处于现实世界中。人未进入任意门且靠近门的一侧时,为了防止人穿越门时发生渲染抖动,扩展第三模型中观测点所在一侧的边界,当人穿过门所在的平面时,第三模型按门所在平面进行翻转。这样,人透过门看时,门内仍为虚拟世界,门外仍为现实世界,即仍处于现实世界中。
可以理解,上述实施例中第三模型以及第三模型的边界可以是平面也可以是曲面。
图13为一个示例性的实施例中图像处理方法的流程框图。参考图13,用户手持终端,通过终端的摄像头从现实场景中采集图像帧,终端一方面根据采集的图像帧创建地图,另一方面检测图像帧中是否包括手部区域。若终端未检测到图像帧中包括手部区域,则继续检测后续采集的图像帧。若终端检测到图像帧中包括手部区域,则继续判断该手部区域所对应的手势类型是否为触发类型。若否,则继续检测后续采集的图像帧;若是,则识别该图像帧中手部区域的运动参考点,在运动参考点处渲染粒子动画,以突出显示手部运动形成的轨迹。终端可再继续判断该轨迹是否满足触发条件,若否,则继续检测后续采集的图像帧;若是,则触发渲染虚拟入口。
终端继而确定现实场景中手部所对应的位置,按照该位置在当前播放的图像帧中渲染虚拟入口,在虚拟入口中显示全景视频,在虚拟入口外显示现实场景。用户可手持终端移动,终端继而判断是否经过虚拟入口。若经过虚拟入口,则直接显示全景视频中当前视野区域内的视频画面。若未经过虚拟入口,则依然在虚拟入口中显示全景视频,在虚拟入口外显示现实场景。终端还可根据用户指令进行录屏,将录屏得到的视频进行分享。
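图13所示的处理流程可以概括为一个简单的状态机:未触发→已触发(渲染虚拟入口)→已穿过(直接显示全景视频)。以下为一个可运行的极简Python示意,其中轨迹闭合的判定方式、阈值均为本文假设,并非本申请限定的触发条件:

```python
import math

def trajectory_closes_loop(points, tol=0.15):
    """判断运动参考点形成的轨迹是否近似闭合成圈(示意性触发条件,阈值为假设)。"""
    if len(points) < 8:
        return False
    span = max(math.dist(points[0], p) for p in points)
    return span > 0 and math.dist(points[0], points[-1]) < tol * span

class PortalStateMachine:
    """图13主流程的极简状态机:idle -> portal_rendered -> inside_panorama。"""
    def __init__(self):
        self.state = "idle"
        self.trajectory = []

    def on_hand_point(self, point):
        # 检测到触发类型手势时,记录运动参考点并判断轨迹是否满足触发条件
        if self.state == "idle":
            self.trajectory.append(point)
            if trajectory_closes_loop(self.trajectory):
                self.state = "portal_rendered"      # 触发条件满足:渲染虚拟入口

    def on_terminal_crossed_portal(self):
        # 终端位置穿过虚拟入口对应的空间区域:直接显示全景视频画面
        if self.state == "portal_rendered":
            self.state = "inside_panorama"

# 用法示例:食指在摄像头前画一个圈
sm = PortalStateMachine()
for k in range(32):
    a = 2 * math.pi * k / 32
    sm.on_hand_point((math.cos(a), math.sin(a)))
print(sm.state)    # 轨迹闭合后输出 "portal_rendered"
```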
在一个实施例中,终端还可展示引导信息。引导信息可以是文本、语音或者图片等。引导信息包括引导用户操控目标物运动形成满足触发条件的信息。比如“伸出手指,在后置摄像头前画圈,触发开启任意门”,可参考图3(b)。引导信息还可以包括引导用户向虚拟入口移动的信息。比如,指向虚拟入口的箭头等,可参考图4(b)。
应该理解的是,虽然上述各实施例的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,上述各实施例中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
如图14所示,在一个实施例中,提供了一种图像处理装置1400。参照图14,该图像处理装置1400包括:获取模块1401、播放模块1402、确定模块1403和渲染模块1404。
获取模块1401,用于获取从现实场景中采集的图像帧。
播放模块1402,用于将获取的图像帧按照采集的时序逐帧播放。
确定模块1403,用于当获取的多帧图像帧中目标物运动所形成的轨迹满足触发条件时,确定现实场景中目标物所对应的位置。
渲染模块1404,用于按照位置在当前播放的图像帧中渲染虚拟入口;在虚拟入口中显示虚拟内容。
在一个实施例中,目标物为手部。确定模块1403还用于从获取的图像帧中分割出手部图像;识别出手部图像所对应的手势类型;当手势类型为触发类型时,在图像帧中确定运动参考点;按照运动参考点确定手部运动所形成的轨迹。
在一个实施例中,确定模块1403还用于通过手部识别模型,将获取的图像帧编码为语义分割特征矩阵;解码语义分割特征矩阵得到语义分割图像;语义分割图像中的像素点,具有表示所属分类类别的像素值,且与编码自的图像帧中的像素点对应;根据属于手部类别的像素点从图像中分割出手部图像。
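由语义分割图像按手部类别的像素点分割出手部图像这一步,可用如下numpy片段示意(类别编号为假设):

```python
import numpy as np

HAND_CLASS = 1  # 假设:语义分割图像中手部类别对应的像素值

def extract_hand_image(frame, segmentation):
    """frame: [H, W, 3] 原始图像帧;segmentation: [H, W] 与原图像素一一对应的类别图。"""
    mask = segmentation == HAND_CLASS
    hand_image = np.zeros_like(frame)
    hand_image[mask] = frame[mask]      # 仅保留属于手部类别的像素
    return hand_image, mask
```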
在一个实施例中,渲染模块1404还用于当触发条件未满足时,在播放的视频帧中,将轨迹所经过的像素点的像素值替换为参考像素值;当触发条件被满足时,按照位置在当前播放的图像帧中,播放参考动画。
在一个实施例中,确定模块1403还用于确定目标物在世界坐标空间中的世界坐标位置。渲染模块1404还用于按照相机坐标空间中对应于世界坐标位置的相机坐标位置,在当前播放的图像帧中渲染虚拟入口。
在一个实施例中,渲染模块1404还用于获取当前终端的位置和姿态;根据当前终端的位置和姿态,确定当前的相机坐标空间与世界坐标空间的变换矩阵;按变换矩阵将世界坐标位置变换为相机坐标空间中的相机坐标位置;按照相机坐标位置,在当前播放的图像帧中渲染虚拟入口。
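"按变换矩阵将世界坐标位置变换为相机坐标空间中的相机坐标位置"可用齐次坐标写成一次矩阵乘法(示意,旋转与平移由终端的位置和姿态构造,数值均为假设):

```python
import numpy as np

def world_to_camera(world_pos, rotation_cw, translation_cw):
    """world_pos: 世界坐标位置 (3,);rotation_cw/translation_cw: 世界->相机 的旋转与平移。"""
    T = np.eye(4)
    T[:3, :3] = rotation_cw
    T[:3, 3] = translation_cw
    p = np.append(np.asarray(world_pos, dtype=float), 1.0)
    return (T @ p)[:3]     # 相机坐标空间中的相机坐标位置

# 示例:终端绕y轴旋转90度、平移(0, 0, -1)时,世界点(1, 0, 0)变换到相机坐标原点
R = np.array([[0, 0, -1],
              [0, 1,  0],
              [1, 0,  0]], dtype=float)
print(world_to_camera([1, 0, 0], R, [0, 0, -1]))
```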
在一个实施例中,渲染模块1404还用于从地图中挑选与获取的图像帧匹配的地图节点;查询对应于地图节点所存储的现实场景中的位置;获取惯性传感器采集的传感器数据;根据传感器数据,确定当前终端在现实场景中的姿态。
如图15所示,在一个实施例中,图像处理装置1400还包括地图构建模块1405,用于从按时序采集的图像帧中选取图像帧;当选取的图像帧的图像特征符合节点图像的图像特征时,获取选取的图像帧为节点图像;确定获取的节点图像在地图中相应的地图节点;对应于确定的地图节点存储获取的节点图像的图像特征,及采集获取的节点图像时在现实场景中的位置。
在一个实施例中,渲染模块1404还用于将虚拟入口的模型顶点投影为图像坐标空间中相应的像素点;根据各模型顶点间的连接关系,将模型顶点相应的像素点组合为图元;将光栅化后的图元按照图元中各像素点的像素值,在图像坐标空间中对应于相机坐标位置的图像坐标位置处,渲染得到虚拟入口。
在一个实施例中,虚拟内容为全景视频。渲染模块1404还用于确定虚拟入口对应于现实场景中的空间区域;在当前终端的位置经过空间区域后,则直接显示全景视频中当前视野区域内的视频画面。
在一个实施例中,渲染模块1404还用于在直接显示全景视频中当前视野区域内的视频画面后,则在当前终端的位置未再次穿过空间区域、且当前视野区域经过移动后覆盖虚拟入口时,则确定当前视野区域中位于虚拟入口内的视野区域;在虚拟入口中显示获取的图像帧中确定的视野区域内的画面。
在一个实施例中,渲染模块1404还用于在当前终端的位置围绕空间区域移动时,则确定当前视野区域中位于虚拟入口内的视野区域;在虚拟入口中显示全景视频中确定的视野区域内的视频画面。
在一个实施例中,虚拟内容为全景视频。渲染模块1404还用于将采集的视频帧绘制于第一球体模型的球面内侧,并将全景视频的全景视频画面绘制于第二球体模型的球面内侧;确定虚拟入口对应于现实场景中的空间区域;在当前终端的位置未曾穿过空间区域、或者当前终端的位置穿过空间区域的次数为偶数时,则按照渲染顺序和模型深度的逆序,根据当前视野区域内的第一球体模型、第二球体模型以及全透明的第三模型渲染得到用于显示的画面;其中,第一球体模型的球半径大于第二球体模型的球半径;第一球体模型的模型深度大于第二球体模型的模型深度;第二球体模型的模型深度大于第三模型的模型深度;第三模型用于在当前视野区域覆盖虚拟入口时,触发取消渲染当前视野区域中位于虚拟入口外的视野区域中的第二球体模型;或者,用于在视野区域未覆盖虚拟入口时触发取消渲染第二球体模型。
在一个实施例中,渲染模块1404还用于在当前终端的位置穿过空间区域的次数为奇数时,则按照渲染顺序和模型深度的逆序,根据当前视野区域内的第一球体模型、第二球体模型以及全透明的第四模型渲染得到用于显示的画面;其中,第二球体模型的模型深度大于第四模型的模型深度;第四模型用于在当前视野区域覆盖虚拟入口时,触发取消渲染当前视野区域中位于虚拟入口外的视野区域中的第二球体模型;或者,用于在视野区域未覆盖虚拟入口时触发取消渲染第二球体模型。
图16示出了一个实施例中计算机设备的内部结构图。该计算机设备可以是图1中的终端110。如图16所示,该计算机设备包括通过系统总线连接的处理器、存储器、网络接口、输入设备和显示屏。其中,存储器包括非易失性存储介质和内存储器。该计算机设备的非易失性存储介质存储有操作系统,还可存储有计算机程序,该计算机程序被处理器执行时,可使得处理器实现图像处理方法。该内存储器中也可储存有计算机程序,该计算机程序被处理器执行时,可使得处理器执行图像处理方法。计算机设备的显示屏可以是液晶显示屏或者电子墨水显示屏等,输入装置可以是显示屏上覆盖的触摸层,也可以是计算机设备外壳上设置的按键、轨迹球或触控板,也可以是外接的键盘、触控板或鼠标等。本领域技术人员可以理解,图16中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
在一个实施例中,本申请提供的图像处理装置可以实现为一种计算机程序的形式,计算机程序可在如图16所示的计算机设备上运行,计算机设备的非易失性存储介质可存储组成该图像处理装置的各个程序模块,比如,图14所示的获取模块1401、播放模块1402、确定模块1403和渲染模块1404等。各个程序模块组成的计算机程序使得处理器执行本说明书中描述的本申请各个实施例的图像处理方法中的步骤。
例如,图16所示的计算机设备可以通过如图14所示的图像处理装置1400中的获取模块1401获取从现实场景中采集的图像帧。通过播放模块1402将获取的图像帧按照采集的时序逐帧播放。通过确定模块1403当获取的多帧图像帧中目标物运动所形成的轨迹满足触发条件时,确定现实场景中目标物所对应的位置。通过渲染模块1404按照位置在当前播放的图像帧中渲染虚拟入口;在虚拟入口中显示虚拟内容。
在一个实施例中,提供了一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器执行时,使得处理器执行上述任一种图像处理方法。
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,存储器中储存有计算机程序,计算机程序被处理器执行时,使得处理器执行上述任一种图像处理方法。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一非易失性计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (17)

  1. 一种图像处理方法,其中,所述方法应用于计算机设备,包括:
    获取从现实场景中采集的图像帧;
    将获取的图像帧按照采集的时序逐帧播放;
    当获取的多帧图像帧中目标物运动所形成的轨迹满足触发条件时,确定现实场景中所述目标物所对应的位置;
    按照所述位置在当前播放的图像帧中渲染虚拟入口;
    在所述虚拟入口中显示虚拟内容。
  2. 根据权利要求1所述的方法,其中,所述目标物为手部;所述方法还包括:
    从获取的图像帧中分割出手部图像;
    识别出所述手部图像所对应的手势类型;
    当所述手势类型为触发类型时,在所述图像帧中确定运动参考点;
    按照所述运动参考点确定所述手部运动所形成的轨迹。
  3. 根据权利要求2所述的方法,其中,所述从获取的图像帧中分割出手部图像,包括:
    通过手部识别模型,将获取的图像帧编码为语义分割特征矩阵;
    解码所述语义分割特征矩阵得到语义分割图像;所述语义分割图像中的像素点,具有表示所属分类类别的像素值,且与编码自的图像帧中的像素点对应;
    根据属于手部类别的像素点从所述图像中分割出手部图像。
  4. 根据权利要求2所述的方法,其中,所述方法还包括:
    当所述触发条件未满足时,在播放的视频帧中,将所述轨迹所经过的像素点的像素值替换为参考像素值;
    当所述触发条件被满足时,按照所述位置在当前播放的图像帧中,播放参考动画。
  5. 根据权利要求1所述的方法,其中,所述确定现实场景中所述目标物所对应的位置,包括:
    确定所述目标物在世界坐标空间中的世界坐标位置;
    所述按照所述位置在当前播放的图像帧中渲染虚拟入口,包括:
    按照相机坐标空间中对应于所述世界坐标位置的相机坐标位置,在当前播放的图像帧中渲染虚拟入口。
  6. 根据权利要求5所述的方法,其中,所述按照相机坐标空间中对应于所述世界坐标位置的相机坐标位置,在当前播放的图像帧中渲染虚拟入口,包括:
    获取当前终端的位置和姿态;
    根据所述当前终端的位置和姿态,确定当前的相机坐标空间与所述世界坐标空间的变换矩阵;
    按所述变换矩阵将世界坐标位置变换为相机坐标空间中的相机坐标位置;
    按照所述相机坐标位置,在当前播放的图像帧中渲染虚拟入口。
  7. 根据权利要求6所述的方法,其中,所述获取当前终端的位置和姿态,包括:
    从地图中挑选与获取的图像帧匹配的地图节点;
    查询对应于所述地图节点所存储的现实场景中的位置;
    获取惯性传感器采集的传感器数据;
    根据所述传感器数据,确定当前终端在现实场景中的姿态。
  8. 根据权利要求7所述的方法,其中,所述方法还包括:
    从按时序采集的图像帧中选取图像帧;
    当选取的图像帧的图像特征符合节点图像的图像特征时,获取选取的图像帧为节点图像;
    确定获取的所述节点图像在地图中相应的地图节点;
    对应于确定的所述地图节点存储获取的所述节点图像的图像特征,及采集获取的所述节点图像时在现实场景中的位置。
  9. 根据权利要求6所述的方法,其中,所述按照所述相机坐标位置,在当前播放的图像帧中渲染虚拟入口,包括:
    将虚拟入口的模型顶点投影为图像坐标空间中相应的像素点;
    根据各所述模型顶点间的连接关系,将模型顶点相应的像素点组合为图元;
    将光栅化后的图元按照图元中各像素点的像素值,在图像坐标空间中对应于所述相机坐标位置的图像坐标位置处,渲染得到虚拟入口。
  10. 根据权利要求1所述的方法,其中,所述虚拟内容为全景视频;所述方法还包括:
    确定所述虚拟入口对应于现实场景中的空间区域;
    在当前终端的位置经过所述空间区域后,则直接显示全景视频中当前视野区域内的视频画面。
  11. 根据权利要求10所述的方法,其中,所述方法还包括:
    在直接显示全景视频中当前视野区域内的视频画面后,则在当前终端的位置未再次穿过所述空间区域、且当前视野区域经过移动后覆盖所述虚拟入口时,则确定当前视野区域中位于所述虚拟入口内的视野区域;
    在所述虚拟入口中显示获取的图像帧中确定的所述视野区域内的画面。
  12. 根据权利要求10所述的方法,其中,所述方法还包括:
    在当前终端的位置围绕所述空间区域移动时,则确定当前视野区域中位于所述虚拟入口内的视野区域;
    在所述虚拟入口中显示全景视频中确定的所述视野区域内的视频画面。
  13. 根据权利要求1所述的方法,其中,所述虚拟内容为全景视频;所述方法还包括:
    将采集的视频帧绘制于第一球体模型的球面内侧,将全景视频的全景视频画面绘制于第二球体模型的球面内侧;
    确定所述虚拟入口对应于现实场景中的空间区域;
    在当前终端的位置未曾穿过所述空间区域、或者当前终端的位置穿过所述空间区域的次数为偶数时,则按照渲染顺序和模型深度的逆序,根据当前视野区域内的所述第一球体模型、所述第二球体模型以及全透明的第三模型渲染得到用于显示的画面;
    其中,所述第一球体模型的球半径大于所述第二球体模型的球半径;所述第一球体模型的模型深度大于所述第二球体模型的模型深度;所述第二球体模型的模型深度大于所述第三模型的模型深度;所述第三模型用于在当前视野区域覆盖所述虚拟入口时,触发取消渲染当前视野区域中位于所述虚拟入口外的视野区域中的第二球体模型;或者,用于在视野区域未覆盖所述虚拟入口时触发取消渲染所述第二球体模型。
  14. 根据权利要求13所述的方法,其中,所述方法还包括:
    在当前终端的位置穿过所述空间区域的次数为奇数时,则按照渲染顺序和模型深度的逆序,根据当前视野区域内的所述第一球体模型、所述第二球体模型以及全透明的第四模型渲染得到用于显示的画面;
    其中,所述第二球体模型的模型深度大于所述第四模型的模型深度;所述第四模型用于在当前视野区域覆盖所述虚拟入口时,触发取消渲染当前视野区域中位于所述虚拟入口外的视野区域中的第二球体模型;或者,用于在视野区域未覆盖所述虚拟入口时触发取消渲染所述第二球体模型。
  15. 一种图像处理装置,所述装置应用于计算机设备,包括:
    获取模块,用于获取从现实场景中采集的图像帧;
    播放模块,用于将获取的图像帧按照采集的时序逐帧播放;
    确定模块,用于当获取的多帧图像帧中目标物运动所形成的轨迹满足触发条件时,确定现实场景中所述目标物所对应的位置;
    渲染模块,用于按照所述位置在当前播放的图像帧中渲染虚拟入口;在所述虚拟入口中显示虚拟内容。
  16. 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时,使得所述处理器执行如权利要求1至14中任一项所述的图像处理方法的步骤。
  17. 一种计算机设备,包括存储器和处理器,所述存储器中储存有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行如权利要求1至14中任一项所述的图像处理方法的步骤。
PCT/CN2019/083295 2018-05-22 2019-04-18 图像处理方法、装置、存储介质和计算机设备 WO2019223463A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP19808017.8A EP3798801A4 (en) 2018-05-22 2019-04-18 IMAGE PROCESSING METHOD AND APPARATUS, RECORDING MEDIA AND COMPUTER DEVICE
JP2020551294A JP7096902B2 (ja) 2018-05-22 2019-04-18 画像処理方法、装置、コンピュータプログラム及びコンピュータデバイス
US16/996,566 US11238644B2 (en) 2018-05-22 2020-08-18 Image processing method and apparatus, storage medium, and computer device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810494117.8A CN110515452B (zh) 2018-05-22 2018-05-22 图像处理方法、装置、存储介质和计算机设备
CN201810494117.8 2018-05-22

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/996,566 Continuation US11238644B2 (en) 2018-05-22 2020-08-18 Image processing method and apparatus, storage medium, and computer device

Publications (1)

Publication Number Publication Date
WO2019223463A1 true WO2019223463A1 (zh) 2019-11-28

Family

ID=68616193

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/083295 WO2019223463A1 (zh) 2018-05-22 2019-04-18 图像处理方法、装置、存储介质和计算机设备

Country Status (5)

Country Link
US (1) US11238644B2 (zh)
EP (1) EP3798801A4 (zh)
JP (1) JP7096902B2 (zh)
CN (1) CN110515452B (zh)
WO (1) WO2019223463A1 (zh)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111724439A (zh) * 2019-11-29 2020-09-29 中国科学院上海微系统与信息技术研究所 一种动态场景下的视觉定位方法及装置
CN111862288A (zh) * 2020-07-29 2020-10-30 北京小米移动软件有限公司 一种位姿渲染方法、装置及介质
CN112562068A (zh) * 2020-12-24 2021-03-26 北京百度网讯科技有限公司 人体姿态生成方法、装置、电子设备及存储介质
CN113256715A (zh) * 2020-02-12 2021-08-13 北京京东乾石科技有限公司 机器人的定位方法和装置
CN113421343A (zh) * 2021-05-27 2021-09-21 深圳市晨北科技有限公司 基于增强现实观测设备内部结构的方法
CN113766117A (zh) * 2020-11-09 2021-12-07 北京沃东天骏信息技术有限公司 一种视频去抖动方法和装置
CN114153307A (zh) * 2020-09-04 2022-03-08 中移(成都)信息通信科技有限公司 场景区块化处理方法、装置、电子设备及计算机存储介质
CN116475905A (zh) * 2023-05-05 2023-07-25 浙江闽立电动工具有限公司 角磨机的控制系统及其方法
CN116778127A (zh) * 2023-07-05 2023-09-19 广州视景医疗软件有限公司 一种基于全景图的三维数字场景构建方法及系统
CN113421343B (zh) * 2021-05-27 2024-06-04 深圳市晨北科技有限公司 基于增强现实观测设备内部结构的方法

Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9332285B1 (en) * 2014-05-28 2016-05-03 Lucasfilm Entertainment Company Ltd. Switching modes of a media content item
US11227435B2 (en) 2018-08-13 2022-01-18 Magic Leap, Inc. Cross reality system
WO2020036898A1 (en) 2018-08-13 2020-02-20 Magic Leap, Inc. A cross reality system
JP2022512600A (ja) 2018-10-05 2022-02-07 マジック リープ, インコーポレイテッド 任意の場所における場所特有の仮想コンテンツのレンダリング
WO2021002687A1 (ko) * 2019-07-04 2021-01-07 (주) 애니펜 사용자 간의 경험 공유를 지원하기 위한 방법, 시스템 및 비일시성의 컴퓨터 판독 가능 기록 매체
US11416998B2 (en) * 2019-07-30 2022-08-16 Microsoft Technology Licensing, Llc Pixel classification to reduce depth-estimation error
EP4046139A4 (en) * 2019-10-15 2023-11-22 Magic Leap, Inc. EXTENDED REALITY SYSTEM WITH LOCATION SERVICE
EP4046401A4 (en) 2019-10-15 2023-11-01 Magic Leap, Inc. CROSS-REALLY SYSTEM WITH WIRELESS FINGERPRINTS
JP2023504775A (ja) 2019-11-12 2023-02-07 マジック リープ, インコーポレイテッド 位置特定サービスおよび共有場所ベースのコンテンツを伴うクロスリアリティシステム
WO2021118962A1 (en) 2019-12-09 2021-06-17 Magic Leap, Inc. Cross reality system with simplified programming of virtual content
CN111008260A (zh) * 2019-12-20 2020-04-14 山东省国土测绘院 轨迹的可视化方法、装置、设备和存储介质
EP4104001A4 (en) 2020-02-13 2024-03-13 Magic Leap Inc CROSS-REALLY SYSTEM WITH MAP PROCESSING USING MULTIPLE RESOLUTION FRAME DESCRIPTORS
US11830149B2 (en) 2020-02-13 2023-11-28 Magic Leap, Inc. Cross reality system with prioritization of geolocation information for localization
US20210256766A1 (en) * 2020-02-13 2021-08-19 Magic Leap, Inc. Cross reality system for large scale environments
EP4103910A4 (en) 2020-02-13 2024-03-06 Magic Leap Inc CROSS-REALLY SYSTEM WITH ACCURATE COMMON MAPS
JP2023515524A (ja) 2020-02-26 2023-04-13 マジック リープ, インコーポレイテッド 高速位置特定を伴うクロスリアリティシステム
WO2021222371A1 (en) 2020-04-29 2021-11-04 Magic Leap, Inc. Cross reality system for large scale environments
CN111310744B (zh) * 2020-05-11 2020-08-11 腾讯科技(深圳)有限公司 图像识别方法、视频播放方法、相关设备及介质
CN113709389A (zh) * 2020-05-21 2021-11-26 北京达佳互联信息技术有限公司 一种视频渲染方法、装置、电子设备及存储介质
CN114939278A (zh) * 2021-02-17 2022-08-26 武汉金运激光股份有限公司 一种基于nfc技术的轨道玩具车互动方法及系统
CN112933606B (zh) * 2021-03-16 2023-05-09 天津亚克互动科技有限公司 游戏场景转换方法及装置、存储介质、计算机设备
US11361519B1 (en) 2021-03-29 2022-06-14 Niantic, Inc. Interactable augmented and virtual reality experience
JP7467810B2 (ja) 2021-05-07 2024-04-16 Kyoto’S 3D Studio株式会社 複合現実感提供システムおよび複合現実感提供方法
CN113014824B (zh) * 2021-05-11 2021-09-24 北京远度互联科技有限公司 视频画面处理方法、装置及电子设备
CN113674435A (zh) * 2021-07-27 2021-11-19 阿里巴巴新加坡控股有限公司 图像处理方法、电子地图展示方法、装置及电子设备
WO2023049087A1 (en) * 2021-09-24 2023-03-30 Chinook Labs Llc Portal view for content items
CN113658296B (zh) * 2021-10-20 2022-01-25 腾讯科技(深圳)有限公司 一种图像渲染方法及相关装置
TWI817479B (zh) * 2022-04-29 2023-10-01 狂點軟體開發股份有限公司 結合現實世界與複數虛擬世界而為虛實互動之適地性「Metaverse」社群系統
CN114422698B (zh) * 2022-01-19 2023-09-26 北京字跳网络技术有限公司 视频生成方法、装置、设备及存储介质
CN114494328B (zh) * 2022-02-11 2024-01-30 北京字跳网络技术有限公司 图像显示方法、装置、电子设备及存储介质
CN117398680A (zh) * 2022-07-08 2024-01-16 腾讯科技(深圳)有限公司 虚拟对象的显示方法、装置、终端设备及存储介质
WO2024024357A1 (ja) * 2022-07-29 2024-02-01 ガラクーダ株式会社 画像表示装置および画像表示方法
CN116095355A (zh) * 2023-01-18 2023-05-09 百果园技术(新加坡)有限公司 视频显示控制方法及其装置、设备、介质、产品
CN117032618B (zh) * 2023-10-07 2024-02-02 启迪数字科技(深圳)有限公司 基于多屏幕的动画旋转方法、设备及介质
CN117078975B (zh) * 2023-10-10 2024-01-02 四川易利数字城市科技有限公司 一种基于进化算法的ar时空场景模式匹配方法

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107168619A (zh) * 2017-03-29 2017-09-15 腾讯科技(深圳)有限公司 用户生成内容处理方法和装置

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5424405B2 (ja) * 2010-01-14 2014-02-26 学校法人立命館 複合現実感技術による画像生成方法及び画像生成システム
JP5601045B2 (ja) * 2010-06-24 2014-10-08 ソニー株式会社 ジェスチャ認識装置、ジェスチャ認識方法およびプログラム
KR101818024B1 (ko) * 2011-03-29 2018-01-12 퀄컴 인코포레이티드 각각의 사용자의 시점에 대해 공유된 디지털 인터페이스들의 렌더링을 위한 시스템
WO2013063767A1 (en) * 2011-11-01 2013-05-10 Intel Corporation Dynamic gesture based short-range human-machine interaction
US9530232B2 (en) * 2012-09-04 2016-12-27 Qualcomm Incorporated Augmented reality surface segmentation
US9552673B2 (en) * 2012-10-17 2017-01-24 Microsoft Technology Licensing, Llc Grasping virtual objects in augmented reality
JP2016062486A (ja) * 2014-09-19 2016-04-25 株式会社ソニー・コンピュータエンタテインメント 画像生成装置および画像生成方法
CN104331929B (zh) * 2014-10-29 2018-02-02 深圳先进技术研究院 基于视频地图与增强现实的犯罪现场还原方法
CN104536579B (zh) * 2015-01-20 2018-07-27 深圳威阿科技有限公司 交互式三维实景与数字图像高速融合处理系统及处理方法
CN105988562A (zh) * 2015-02-06 2016-10-05 刘小洋 智能穿戴设备及基于智能穿戴设备实现手势输入的方法
US20170186219A1 (en) * 2015-12-28 2017-06-29 Le Holdings (Beijing) Co., Ltd. Method for 360-degree panoramic display, display module and mobile terminal
US9898256B2 (en) * 2015-12-31 2018-02-20 Microsoft Technology Licensing, Llc Translation of gesture to gesture code description using depth camera
WO2017167381A1 (en) * 2016-03-31 2017-10-05 Softkinetic Software Method for foreground and background determination in an image
DE102016212236A1 (de) * 2016-07-05 2018-01-11 Siemens Aktiengesellschaft Interaktionssystem und -verfahren
CN107707839A (zh) * 2017-09-11 2018-02-16 广东欧珀移动通信有限公司 图像处理方法及装置
CN110827376A (zh) * 2018-08-09 2020-02-21 北京微播视界科技有限公司 增强现实多平面模型动画交互方法、装置、设备及存储介质
CN110020909A (zh) * 2019-01-14 2019-07-16 启云科技股份有限公司 采用虚拟实境技术的购物系统

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107168619A (zh) * 2017-03-29 2017-09-15 腾讯科技(深圳)有限公司 用户生成内容处理方法和装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3798801A4

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111724439A (zh) * 2019-11-29 2020-09-29 中国科学院上海微系统与信息技术研究所 一种动态场景下的视觉定位方法及装置
CN111724439B (zh) * 2019-11-29 2024-05-17 中国科学院上海微系统与信息技术研究所 一种动态场景下的视觉定位方法及装置
CN113256715B (zh) * 2020-02-12 2024-04-05 北京京东乾石科技有限公司 机器人的定位方法和装置
CN113256715A (zh) * 2020-02-12 2021-08-13 北京京东乾石科技有限公司 机器人的定位方法和装置
CN111862288A (zh) * 2020-07-29 2020-10-30 北京小米移动软件有限公司 一种位姿渲染方法、装置及介质
CN114153307A (zh) * 2020-09-04 2022-03-08 中移(成都)信息通信科技有限公司 场景区块化处理方法、装置、电子设备及计算机存储介质
CN113766117A (zh) * 2020-11-09 2021-12-07 北京沃东天骏信息技术有限公司 一种视频去抖动方法和装置
CN113766117B (zh) * 2020-11-09 2023-08-08 北京沃东天骏信息技术有限公司 一种视频去抖动方法和装置
CN112562068A (zh) * 2020-12-24 2021-03-26 北京百度网讯科技有限公司 人体姿态生成方法、装置、电子设备及存储介质
CN112562068B (zh) * 2020-12-24 2023-07-14 北京百度网讯科技有限公司 人体姿态生成方法、装置、电子设备及存储介质
CN113421343A (zh) * 2021-05-27 2021-09-21 深圳市晨北科技有限公司 基于增强现实观测设备内部结构的方法
CN113421343B (zh) * 2021-05-27 2024-06-04 深圳市晨北科技有限公司 基于增强现实观测设备内部结构的方法
CN116475905A (zh) * 2023-05-05 2023-07-25 浙江闽立电动工具有限公司 角磨机的控制系统及其方法
CN116475905B (zh) * 2023-05-05 2024-01-09 浙江闽立电动工具有限公司 角磨机的控制系统及其方法
CN116778127B (zh) * 2023-07-05 2024-01-05 广州视景医疗软件有限公司 一种基于全景图的三维数字场景构建方法及系统
CN116778127A (zh) * 2023-07-05 2023-09-19 广州视景医疗软件有限公司 一种基于全景图的三维数字场景构建方法及系统

Also Published As

Publication number Publication date
JP2021517309A (ja) 2021-07-15
US11238644B2 (en) 2022-02-01
CN110515452B (zh) 2022-02-22
CN110515452A (zh) 2019-11-29
EP3798801A1 (en) 2021-03-31
JP7096902B2 (ja) 2022-07-06
EP3798801A4 (en) 2021-07-14
US20200380769A1 (en) 2020-12-03

Similar Documents

Publication Publication Date Title
WO2019223463A1 (zh) 图像处理方法、装置、存储介质和计算机设备
US11217006B2 (en) Methods and systems for performing 3D simulation based on a 2D video image
US11335379B2 (en) Video processing method, device and electronic equipment
Liu et al. Semantic-aware implicit neural audio-driven video portrait generation
CN111656407B (zh) 对动态三维模型的视图进行融合、纹理化和绘制
US11024088B2 (en) Augmented and virtual reality
KR102624635B1 (ko) 메시징 시스템에서의 3d 데이터 생성
US11941748B2 (en) Lightweight view dependent rendering system for mobile devices
US20130321396A1 (en) Multi-input free viewpoint video processing pipeline
US20140035934A1 (en) Avatar Facial Expression Techniques
US10484599B2 (en) Simulating depth of field
US10621777B2 (en) Synthesis of composite images having virtual backgrounds
US11151791B2 (en) R-snap for production of augmented realities
KR20220167323A (ko) 메시징 시스템 내의 3d 데이터를 포함하는 증강 현실 콘텐츠 생성기들
US11889222B2 (en) Multilayer three-dimensional presentation
KR20230162107A (ko) 증강 현실 콘텐츠에서의 머리 회전들에 대한 얼굴 합성
US11481960B2 (en) Systems and methods for generating stabilized images of a real environment in artificial reality
WO2023098635A1 (zh) 图像处理
WO2022022260A1 (zh) 图像风格迁移方法及其装置
US20230326095A1 (en) Overlaying displayed digital content with regional transparency and regional lossless compression transmitted over a communication network via processing circuitry
WO2024077791A1 (zh) 视频生成方法、装置、设备与计算机可读存储介质
US20240062425A1 (en) Automatic Colorization of Grayscale Stereo Images
US20230334790A1 (en) Interactive reality computing experience using optical lenticular multi-perspective simulation
US20240020901A1 (en) Method and application for animating computer generated images
Zhou et al. Implementation of the Interaction Effect Among Virtual Large Curved Screens on Multiple Buildings Based on Mixed Reality

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19808017

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020551294

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019808017

Country of ref document: EP

Effective date: 20201222