CN112017300A - Processing method, device and equipment for mixed reality image - Google Patents

Processing method, device and equipment for mixed reality image

Info

Publication number
CN112017300A
Authority
CN
China
Prior art keywords
target object
image
target
displayed
position information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010712798.8A
Other languages
Chinese (zh)
Inventor
吴涛 (Wu Tao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Xiaoniao Kankan Technology Co Ltd
Original Assignee
Qingdao Xiaoniao Kankan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Xiaoniao Kankan Technology Co Ltd filed Critical Qingdao Xiaoniao Kankan Technology Co Ltd
Priority to CN202010712798.8A priority Critical patent/CN112017300A/en
Publication of CN112017300A publication Critical patent/CN112017300A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method, an apparatus, and a device for processing a mixed reality image. The method comprises: acquiring an environment scene image; identifying a target object contained in the environment scene image and obtaining spatial position information of the target object; and superimposing, based on the spatial position information, description information corresponding to the target object onto the environment scene image to obtain a mixed reality image.

Description

Processing method, device and equipment for mixed reality image
Technical Field
Embodiments of the present disclosure relate to the technical field of image processing, and more particularly to a method and an apparatus for processing a mixed reality image, and to a head-mounted display device.
Background
Mixed Reality (MR) is a further development of virtual reality technology. By introducing real-scene information into the virtual environment, it builds an interactive feedback loop among the virtual world, the real world, and the user, thereby enhancing the realism of the user experience.
Currently, in the field of early-childhood teaching, augmented reality technology can be used to process teaching videos. When a head-mounted display device is used for such teaching, more lifelike scenes can be shown to the user. However, the user cannot interact with the objects in the scene, so the user experience is poor.
Therefore, there is a need for a new approach to processing mixed reality images.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a new technical solution for processing a mixed reality image.
According to a first aspect of the embodiments of the present disclosure, there is provided a method for processing a mixed reality image, the method including:
acquiring an environment scene image;
identifying a target object contained in the environment scene image, and obtaining spatial position information of the target object;
and based on the spatial position information, overlaying the description information corresponding to the target object onto the environment scene image to obtain a mixed reality image.
Optionally, after the step of identifying a target object included in the image of the environmental scene and obtaining spatial position information of the target object, the method further includes:
acquiring an initial data set of a virtual object to be displayed;
rendering the virtual object to be displayed in the environment scene image according to the plane parameter of the target plane in the environment scene image and the initial data set of the virtual object to be displayed;
and based on the spatial position information and the plane parameters, overlaying the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed on the environment scene image to obtain a mixed reality image.
Optionally, the environmental scene image comprises a first image and a second image captured by a binocular camera of the head mounted display device;
the step of identifying a target object contained in the image of the environmental scene and obtaining spatial position information of the target object includes:
identifying a target object contained in the first image based on a preset identification model, and obtaining first position information of a region where the target object is located;
identifying a target object contained in the second image based on a preset identification model, and obtaining second position information of the area where the target object is located;
and determining the spatial position information of the target object according to the first position information and the second position information.
Optionally, the method further comprises training a predetermined recognition model, which includes:
acquiring a historical environment scene image;
determining the area of a target object in the historical environment scene image, and labeling the position information of the area of the target object and the category information of the target object;
generating a data set according to the annotated historical environment scene images;
and training the recognition model according to the data set.
Optionally, the method further comprises:
identifying a target object contained in the environment scene image, and obtaining the category information of the target object;
and selecting the description information corresponding to the target object from a pre-established database according to the category information of the target object.
Optionally, the head mounted display device comprises a first camera and a second camera;
the step of superimposing the description information corresponding to the target object onto the environment scene image based on the spatial position information to obtain a mixed reality image includes:
determining three-dimensional coordinate information of a display position of the description information corresponding to the target object according to the spatial position information and a preset first offset;
converting the three-dimensional coordinate information to obtain a first pixel coordinate of the display position under an image coordinate system of the first camera and a second pixel coordinate of the display position under an image coordinate system of the second camera;
and according to the first pixel coordinate and the second pixel coordinate, overlaying the description information corresponding to the target object onto the environment scene image to obtain a mixed reality image.
Optionally, the method further comprises: the method comprises the following steps of determining plane parameters of a target plane in the environment scene image:
extracting feature points of the environment scene image;
constructing a feature point cloud according to the extracted feature points;
carrying out plane detection on the environment scene image based on the feature point cloud, and determining a target plane;
and acquiring plane parameters of the target plane, wherein the plane parameters comprise a central point coordinate and a normal vector.
Optionally, the step of rendering the virtual object to be displayed in the environment scene image according to the plane parameter of the target plane in the environment scene image and the initial data set of the virtual object to be displayed includes:
acquiring an initial data set of a virtual object to be displayed, wherein the initial data set comprises three-dimensional coordinate information of a plurality of characteristic points for constructing the virtual object;
determining the placement position of the virtual object to be displayed according to the coordinates of the central point of the target plane, and determining the placement direction of the virtual object to be displayed according to the normal vector of the target plane;
determining a target data set of the virtual object to be displayed according to the initial data set of the virtual object to be displayed, and the placement position and the placement direction of the virtual object to be displayed;
and rendering the virtual object to be displayed in the environment scene image according to the target data set of the virtual object to be displayed.
Optionally, the step of superimposing, based on the spatial position information and the plane parameter, the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed onto the environment scene image to obtain a mixed reality image includes:
determining first position information of a display position of the description information corresponding to the target object according to the spatial position information and a preset first offset;
determining second position information of a display position of the description information corresponding to the virtual object to be displayed according to the center point coordinate of the target plane and a preset second offset;
and according to the first position information and the second position information, overlaying the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed on the environment scene image to obtain a mixed reality image.
According to a second aspect of the embodiments of the present disclosure, there is provided a processing apparatus for mixed reality images, the apparatus including:
the acquisition module is used for acquiring an environment scene image;
the identification module is used for identifying a target object contained in the environment scene image and obtaining the spatial position information of the target object;
the image generation module is used for overlaying the description information corresponding to the target object to the environment scene image based on the spatial position information to obtain a mixed reality image;
or,
the apparatus comprises a processor and a memory, the memory storing computer instructions that, when executed by the processor, perform the method of any one of the first aspect of the embodiments of the present disclosure.
According to a third aspect of the embodiments of the present disclosure, there is provided a head mounted display device comprising a binocular camera and the processing apparatus of the mixed reality image according to the second aspect of the embodiments of the present disclosure.
According to the embodiment of the disclosure, through the processing of the mixed reality image, the description information related to the target object can be displayed to the user while the target object is displayed. According to the embodiment of the disclosure, by processing each frame of image in the video, the description information of the target object can be merged into the live-action video, so that a user can obtain the description information corresponding to the target object while watching the live-action video through the head-mounted display device, and the user experience is better.
The embodiment of the disclosure can be applied to teaching scenes, can show the target object to the user and show the related description information of the target object to the user at the same time, is convenient for the user to know the target object, can also increase the interestingness of teaching, and has better user experience.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly described below. It should be appreciated that the following drawings depict only certain embodiments of the invention and are therefore not to be considered limiting of its scope. A person skilled in the art can derive other relevant drawings from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a hardware configuration of a head mounted display device that can be used to implement embodiments of the present disclosure;
fig. 2 is a schematic flow chart of a processing method of a mixed reality image according to an embodiment of the present disclosure;
fig. 3 is a flowchart illustrating a method for processing a mixed reality image according to an embodiment of the disclosure;
fig. 4 is a scene schematic diagram of a processing method of a mixed reality image according to an embodiment of the disclosure;
fig. 5 is a block diagram illustrating a structure of a mixed reality image processing apparatus according to an embodiment of the present disclosure;
fig. 6 is a block diagram illustrating a structure of a mixed reality image processing apparatus according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a head-mounted display device according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
< hardware configuration >
Fig. 1 is a hardware configuration diagram of a head-mounted display device 100 that can be used to implement the processing method of a mixed reality image according to an embodiment of the present disclosure.
In one embodiment, the head-mounted display device 100 may be a smart device such as a Virtual Reality (VR) device, an Augmented Reality (AR) device, or a Mixed Reality (MR) device.
In one embodiment, the head mounted display device 100 includes a first camera and a second camera for simulating human eyes.
In one embodiment, as shown in Fig. 1, the head-mounted display device 100 may include a processor 110, a memory 120, an interface device 130, a communication device 140, a display device 150, an input device 160, a speaker 170, a microphone 180, a camera 190, and the like. The processor 110 may include, but is not limited to, a central processing unit (CPU), a microcontroller unit (MCU), and the like. The processor 110 may further include, for example, a graphics processing unit (GPU). The memory 120 may include, for example, but is not limited to, a ROM (read-only memory), a RAM (random access memory), and nonvolatile memory such as a hard disk. The interface device 130 may include, for example, but is not limited to, a USB interface, a serial interface, a parallel interface, an infrared interface, and the like. The communication device 140 can perform wired or wireless communication, including, for example, WiFi communication, Bluetooth communication, and 2G/3G/4G/5G communication. The display device 150 may be, for example, a liquid crystal display, an LED display, or a touch display. The input device 160 may include, for example, but is not limited to, a touch screen, a keyboard, and somatosensory input. The speaker 170 and the microphone 180 may be used to output and input voice information. The camera 190 may be used to acquire image information and may be, for example, a binocular camera. Although a plurality of components are shown for the head-mounted display device 100 in Fig. 1, the present invention may involve only some of them.
For application in an embodiment of the present disclosure, the memory 120 of the head-mounted display device 100 is configured to store instructions for controlling the processor 110 to operate so as to support implementation of a processing method for mixed reality images according to any embodiment provided by the first aspect of the present disclosure. Those skilled in the art can design such instructions according to the embodiments disclosed herein. How instructions control the operation of a processor is well known in the art and will not be described in detail here.
< method embodiment I >
Referring to fig. 2, a method for processing a mixed reality image according to an embodiment of the present disclosure is described. The method involves a head mounted display device, which may be the head mounted display device 100 as shown in fig. 1. The processing method of the mixed reality image comprises the following steps:
and step 210, acquiring an environment scene image.
In this embodiment, an image of an environmental scene is acquired through a head-mounted display device. The head mounted display device includes a binocular camera. The environmental scene image includes a first image captured by a first camera of the binocular cameras and a second image captured by a second camera of the binocular cameras. Wherein the first image and the second image are acquired at the same time. Optionally, the first camera and the second camera may be triggered using the same clock trigger source to ensure hardware synchronization of the first camera and the second camera. In this embodiment, the image sizes of the first image and the second image are the same, wherein the image sizes can be set in various ways.
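By way of illustration only, a minimal sketch of acquiring such a synchronized image pair, assuming an OpenCV-style capture API; the camera indices and resolution are hypothetical:

```python
import cv2

# Hypothetical camera indices for the two cameras of the binocular rig.
LEFT_CAM, RIGHT_CAM = 0, 1

left = cv2.VideoCapture(LEFT_CAM)
right = cv2.VideoCapture(RIGHT_CAM)

# Both images must have the same size (hypothetical resolution).
for cap in (left, right):
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# grab() latches both sensors as close together in time as the driver
# allows; with a shared clock trigger source the exposure itself is
# hardware-synchronous.
left.grab()
right.grab()
ok1, first_image = left.retrieve()
ok2, second_image = right.retrieve()
```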
In one embodiment, in a teaching scene, different objects can be displayed for a user through the environment scene image, and corresponding teaching contents are displayed for the displayed objects. The target object is included in the environmental scene image. The target object may be an object in the image of the environmental scene for presentation to the user. The environmental scene image may be, for example, an indoor scene image or an outdoor scene image. The target object included in the indoor scene image may be household goods, food, and the like. The household article may be, for example, a sofa, a table, a seat, etc. The food may be vegetables, fruits, snacks, etc. Such as apples, bananas, dragon fruits, tomatoes, green vegetables, etc. The target object contained in the outdoor scene image may be, for example, a shop, a bus, a traffic light, and the like.
After the environmental scene image is acquired, step 220 is entered.
Step 220, identifying a target object contained in the environment scene image, and obtaining spatial position information of the target object.
In one embodiment, the step of identifying a target object included in the image of the environmental scene and obtaining spatial position information of the target object may further include: and identifying a target object contained in the environment scene image based on a preset identification model, and obtaining the spatial position information of the target object. According to the embodiment of the disclosure, the target object in the environmental scene image is positioned based on the preset identification model, and the identification accuracy can be improved.
In this embodiment, the method for processing a mixed reality image further includes a step of training a predetermined recognition model. The step of training the predetermined recognition model may further comprise steps 410 to 440.
Step 410, acquiring historical environment scene images.
In this example, the historical environmental scene image may be an image containing a target object. The environmental scene may be an indoor environment or an outdoor environment. The target object may be an object for presentation to a user.
Step 420, determining the area of the target object in the historical environment scene image, and labeling the position information of the area of the target object and the category information of the target object.
In this example, the target object in the historical scene image may be selected through a selection window. That is, the region in which the target object is located may be a region selected by the selection window. The position information of the region where the target object is located may include coordinates of corner points of the selection window and size information of the selection window. The size information of the selection window may include length information and width information of the selection window. Alternatively, the size information of the selection window may be a predetermined fixed value. Optionally, corresponding selection windows are set for different types of target objects, and the size information of the selection windows corresponding to the different types of target objects may be the same or different. The shape of the selected window can also be set according to actual needs. For example, the selection window is rectangular. The disclosed embodiments are not so limited.
Step 430, generating a data set according to the annotated historical environment scene images.
Step 440, training the recognition model according to the data set.
In this embodiment, the labeled historical environment scene images are used as training samples, and a data set is generated from a plurality of such labeled images. The greater the number of training samples in the data set, the higher the accuracy of the training result. As the number of training samples increases, the improvement in accuracy gradually diminishes until the accuracy stabilizes. The number of training samples, i.e., the number of historical environment scene images, can therefore be chosen by weighing the required accuracy of the training result against the data processing cost.
In a more specific example, the step of generating the data set according to the annotated historical environment scene images may further include steps 431 to 434.
Step 431, obtaining a first predetermined number of labeled historical environment scene images as a first data set.
In this step, a small number of historical environment scene images may be selected and manually annotated with the position information of the region where the target object is located and the category information of the target object.
Step 432, obtaining a second predetermined number of unlabeled historical environment scene images as a second data set.
In this step, the second predetermined number is much larger than the first predetermined number.
Step 433, clustering the historical environment scene images in the second data set according to the first data set to obtain, for each historical environment scene image in the second data set, the position information of the region where the target object is located and the category information of the target object.
Step 434, taking the first data set and the clustered second data set together as the data set.
According to the embodiment of the disclosure, a large unlabeled second data set can be clustered according to a small labeled first data set to determine, for each historical environment scene image in the second data set, the position information of the region where the target object is located and the category information of the target object; the first data set and the clustered second data set are then used together as the data set for training the recognition model. This improves the efficiency of acquiring the data set, reduces labor cost, and can further improve the recognition accuracy of the recognition model.
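The patent does not fix a particular clustering algorithm. As a rough sketch of the idea, assuming each image is represented by a fixed-length feature vector, a nearest-centroid scheme could propagate the category labels (propagating the window positions would proceed analogously):

```python
import numpy as np

def propagate_labels(labeled_feats, labels, unlabeled_feats):
    """Assign each unlabeled sample the category of its nearest labeled
    class centroid - one simple realisation of the clustering step."""
    names = sorted(set(labels))
    centroids = np.stack([labeled_feats[[i for i, l in enumerate(labels)
                                         if l == c]].mean(axis=0)
                          for c in names])
    # Distance of every unlabeled feature vector to every centroid.
    d = np.linalg.norm(unlabeled_feats[:, None, :] - centroids[None, :, :],
                       axis=2)
    return [names[j] for j in d.argmin(axis=1)]

# Example: two labeled images per class, two unlabeled images.
feats = np.array([[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]])
cats = ["apple", "apple", "banana", "banana"]
print(propagate_labels(feats, cats, np.array([[0.8, 0.2], [0.1, 0.8]])))
```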
In a more specific example, the step of identifying a target object included in the image of the environmental scene and obtaining spatial position information of the target object may further include steps 510 to 530.
Step 510, identifying a target object contained in the first image based on a predetermined identification model, and obtaining first position information of a region where the target object is located.
Step 520, identifying a target object contained in the second image based on a predetermined identification model, and obtaining second position information of the area where the target object is located.
Step 530, determining spatial position information of the target object according to the first position information and the second position information.
In this example, the head mounted display device includes binocular cameras, i.e., a first camera and a second camera. The first image and the second image are acquired by a first camera and a second camera, respectively, and the first image and the second image are acquired at the same time. That is, the first image and the second image contain the same target object. The first position information may be position information of an area where the target object is located in an image coordinate system of the first camera. The second position information may be position information of an area where the target object is located in an image coordinate system of the second camera. For example, the position information of the region where the target object is located may be coordinates of one corner point of the selection window and size information of the selection window. The coordinates of the four corner points of the selection window can be calculated according to the coordinates of one corner point of the selection window and the size information of the selection window. Further, by using the principle of stereo triangulation, the spatial position information of the area where the target object is located in the camera coordinate system of the first camera or the camera coordinate system of the second camera can be calculated according to the first position information and the second position information. The spatial position information of the region where the target object is located may be three-dimensional coordinates of four corner points of the selected window in the camera coordinate system. According to the embodiment of the disclosure, the target object is positioned by positioning the area where the target object is located, and a plurality of feature points forming the target object do not need to be positioned, so that the positioning efficiency can be improved.
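As an illustrative sketch of the stereo triangulation step, assuming a rectified rig with shared intrinsics (the function name and the calibration values fx, cx, cy and baseline are hypothetical), the 3D position of one window corner can be recovered from its pixel coordinates in the two images:

```python
import numpy as np

def triangulate_corner(u_left, v_left, u_right, fx, cx, cy, baseline):
    """Recover the 3D position of one selection-window corner in the
    first-camera frame from its pixel coordinates in both rectified
    images. Assumes fx == fy for brevity."""
    disparity = u_left - u_right        # horizontal shift between views
    z = fx * baseline / disparity       # depth from disparity
    x = (u_left - cx) * z / fx
    y = (v_left - cy) * z / fx
    return np.array([x, y, z])

# Applying this to the four corners of the selection window yields the
# spatial position information of the region where the target is located.
corner_3d = triangulate_corner(350.0, 200.0, 320.0,
                               fx=450.0, cx=320.0, cy=240.0, baseline=0.06)
```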
After identifying the target object included in the image of the environmental scene and obtaining the spatial position information of the target object, step 230 is entered.
Step 230, superimposing, based on the spatial position information, the description information corresponding to the target object onto the environment scene image to obtain a mixed reality image.
In this embodiment, the description information at least includes teaching information of the target object. The description information may be subtitles or illustrations. And overlaying the description information of the target object on the environment scene image, and simultaneously showing the target object to the user and showing the description information of the target object to the user. In this embodiment, according to the description information of the target object, the relevant content of the target object can be explained to the user, so that the display and teaching of the target object are realized. The description information may be different for different classes of target objects. For example, the description information of the home product may include a name and a usage scene of the home product, and the like. The description information corresponding to the fruit may include the name of the fruit, the producing place of the fruit, the growing environment of the fruit, and the like. The traffic light description information may include the type, usage scenario, etc. of the traffic light. For example, as shown in fig. 4, descriptive information may be added to the fruit in the scene.
In one embodiment, the method for processing a mixed reality image may further include: and identifying a target object contained in the environment scene image, and obtaining the category information of the target object. And selecting the description information corresponding to the target object from a pre-established database according to the category information of the target object.
In an embodiment, the step of superimposing the description information corresponding to the target object onto the environment scene image based on the spatial position information to obtain a mixed reality image may further include steps 610 to 630.
Step 610, determining three-dimensional coordinate information of the display position of the description information corresponding to the target object according to the spatial position information and a predetermined first offset.
In one example, three-dimensional coordinates of a corner point at the upper left corner of a selection window of a target object are obtained; and according to the three-dimensional coordinates of the corner point and the preset first offset, the three-dimensional coordinate information of the display position of the description information corresponding to the target object can be determined. The three-dimensional coordinate information of the display position of the description information is determined according to the preset first offset, so that the description information can be ensured to be displayed near the target object, and meanwhile, the target object can be prevented from being shielded by the description information.
In one example, the determination of the display position of the description information of the target object may be a determination of a position of an information display window for displaying the description information. Specifically, three-dimensional coordinates of a corner point at the upper left corner of a selected window of a target object are obtained; according to the three-dimensional coordinates of the corner point and a preset first offset, the three-dimensional coordinate information of the corner point at the upper left corner of the information display window can be determined; and adding description information for the target object according to the three-dimensional coordinate information of the corner point at the upper left corner of the information display window and the size of the information display window. The description information may be subtitles or illustrations.
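A minimal sketch of step 610, with a hypothetical offset value (the patent leaves the first offset unspecified):

```python
import numpy as np

# Hypothetical first offset (camera frame, metres): shift the caption
# above and to the left of the selection window so that the description
# information does not occlude the target object.
FIRST_OFFSET = np.array([-0.05, 0.10, 0.0])

def caption_position(window_top_left_3d):
    # Step 610: display position = upper-left corner + first offset.
    return np.asarray(window_top_left_3d) + FIRST_OFFSET
```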
In one example, based on a SLAM (Simultaneous Localization And Mapping) algorithm, pose information of the head-mounted display device with respect to the external environment may be obtained. The attitude information comprises a rotation matrix R_HMD and a translation vector T_HMD of the head-mounted display device in the world coordinate system. The three-dimensional coordinate information of the display position of the description information is converted between coordinate systems according to the attitude information of the head-mounted display device, determining the three-dimensional coordinate information of the display position in the world coordinate system. Specifically, the conversion is performed based on the following formula (1).
P_w = R_HMD * P_c + T_HMD (1)
where P_w is the three-dimensional coordinate information of the display position of the description information in the world coordinate system, P_c is the three-dimensional coordinate information of the display position in the camera coordinate system, R_HMD is the rotation matrix of the head-mounted display device in the world coordinate system, and T_HMD is the translation vector of the head-mounted display device in the world coordinate system.
Step 620, converting the three-dimensional coordinate information to obtain a first pixel coordinate of the display position in the image coordinate system of the first camera and a second pixel coordinate of the display position in the image coordinate system of the second camera.
In one example, the first pixel coordinates and the second pixel coordinates are calculated based on the following equations (2) - (4).
P_uv1 = k1 * E * P_w (2)
P_uv2 = k2 * E * P_w (3)
E = [R_HMD T_HMD; 0 1]^(-1) (4)
where P_uv1 is the first pixel coordinate of the display position in the image coordinate system of the first camera, P_uv2 is the second pixel coordinate of the display position in the image coordinate system of the second camera, k1 is the intrinsic parameter matrix of the first camera, k2 is the intrinsic parameter matrix of the second camera, P_w is the three-dimensional coordinate information of the display position of the description information in the world coordinate system, and E is the transformation matrix of the head-mounted display device, composed of the rotation matrix R_HMD and the translation vector T_HMD of the head-mounted display device in the world coordinate system.
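A compact sketch of equations (1)-(4), assuming for brevity that both cameras sit at the device origin (a real rig would additionally apply each camera's extrinsic relative to the head-mounted display):

```python
import numpy as np

def project_to_both_cameras(p_w, k1, k2, r_hmd, t_hmd):
    """Equations (1)-(4): map a world-frame display position P_w to
    pixel coordinates in the first and second cameras. E acts as the
    world-to-camera transform, the inverse of the pose used in (1)."""
    p_c = r_hmd.T @ (p_w - t_hmd)   # invert P_w = R * P_c + T
    uv1 = k1 @ p_c                  # homogeneous pixel coordinates
    uv2 = k2 @ p_c
    return uv1[:2] / uv1[2], uv2[:2] / uv2[2]  # perspective division
```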
Step 630, according to the first pixel coordinate and the second pixel coordinate, superimposing the description information corresponding to the target object onto the environment scene image to obtain a mixed reality image.
In this embodiment, the head-mounted display device may display the description information superimposed on the environment scene image according to the obtained first pixel coordinates and second pixel coordinates. Further, by processing each environment scene image of the live-action video, the description information of the target object can be displayed in the live-action video in a superimposed manner.
According to the embodiment of the disclosure, through the processing of the mixed reality image, the description information related to the target object can be displayed to the user while the target object is displayed. According to the embodiment of the disclosure, by processing each frame of image in the video, the description information of the target object can be merged into the live-action video, so that a user can obtain the description information corresponding to the target object while watching the live-action video through the head-mounted display device, and the user experience is better.
The embodiment of the disclosure can be applied to teaching scenes, can show the target object to the user and show the related description information of the target object to the user at the same time, is convenient for the user to know the target object, can also increase the interestingness of teaching, and has better user experience.
< method example two >
Referring to fig. 3, a method for processing a mixed reality image according to an embodiment of the present disclosure is described. The method involves a head mounted display device, which may be the head mounted display device 100 as shown in fig. 1. The processing method of the mixed reality image comprises the following steps.
Step 310, acquiring an environment scene image.
Step 320, identifying a target object contained in the environment scene image, and obtaining spatial position information of the target object.
In this embodiment, the specific processes of obtaining the environment scene image in step 310 and identifying the spatial position information of the target object in step 320 are as described in the foregoing embodiments, and are not described herein again.
Step 330, obtaining an initial data set of the virtual object to be displayed.
In one embodiment, an initial data set of a virtual object to be displayed includes three-dimensional coordinate information of a plurality of feature points that construct the virtual object. From the initial data set, the virtual object can be constructed. According to the actual use scene of the user, initial data sets of a plurality of virtual objects to be displayed can be generated in advance. Optionally, the initial data sets of the virtual objects to be displayed may be stored in the head-mounted display device or in a server. After the initial data set of the virtual object to be displayed is acquired, step 340 is entered.
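A minimal sketch of how such an initial data set might be represented (all names hypothetical; a unit cube stands in for a real model):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class VirtualObject:
    name: str
    # Three-dimensional coordinates of the feature points that
    # construct the virtual object, in the model's own frame (N x 3).
    points: np.ndarray

# Hypothetical initial data set: a unit cube standing in for a model.
cube = VirtualObject("cube", np.array(
    [[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]))
```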
Step 340, rendering the virtual object to be displayed in the environment scene image according to the plane parameter of the target plane in the environment scene image and the initial data set of the virtual object to be displayed.
In this embodiment, the target plane in the environment scene image may be the ground, a desktop, a platform, etc. A virtual object may be generated based on the target plane. According to the embodiment of the disclosure, in the process of teaching through the head-mounted display device, the virtual object can be merged into the live-action video, so that the user can interact with the virtual object while watching the live-action video through the head-mounted display device and can conveniently observe the virtual object from different angles, which improves the effect of teaching based on the head-mounted display device and provides a better user experience.
In one embodiment, the step of rendering the virtual object to be displayed in the environment scene image according to the plane parameter of the target plane in the environment scene image and the initial data set of the virtual object to be displayed may further include steps 710 to 740.
Step 710, obtaining an initial data set of a virtual object to be displayed, wherein the initial data set comprises three-dimensional coordinate information of a plurality of characteristic points for constructing the virtual object.
Step 720, determining the placement position of the virtual object to be displayed according to the coordinates of the center point of the target plane, and determining the placement direction of the virtual object to be displayed according to the normal vector of the target plane.
Optionally, when generating the virtual object, the center of gravity of the virtual object may be made to coincide with the center point of the target plane, and the virtual object may be oriented along the normal vector direction of the target plane. Determining the placement position and placement direction of the virtual object to be displayed according to the plane parameters of the target plane allows the virtual object to be displayed in the middle of the mixed reality image and prevents the displayed virtual object from appearing tilted.
Step 730, determining a target data set of the virtual object to be displayed according to the initial data set of the virtual object to be displayed, and the placement position and the placement direction of the virtual object to be displayed.
Step 740, rendering the virtual object to be displayed in the environment scene image according to the target data set of the virtual object to be displayed.
In one example, the target data set of the virtual object includes three-dimensional coordinate information of the plurality of feature points constructing the virtual object, where the three-dimensional coordinate information of the feature points is coordinate information transformed according to the plane parameters of the target plane. Rendering the virtual object to be displayed in the environment scene image specifically comprises: converting the three-dimensional coordinate information of each feature point in the target data set to obtain a third pixel coordinate of the feature point in the image coordinate system of the first camera and a fourth pixel coordinate of the feature point in the image coordinate system of the second camera; and rendering, by the binocular camera of the head-mounted display device, the virtual object to be displayed in the environment scene image according to the third pixel coordinate and the fourth pixel coordinate of each feature point in the target data set.
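An illustrative sketch of steps 720-730, assuming the model's own up-axis should be rotated onto the plane normal (one possible reading of "placement direction"):

```python
import numpy as np

def place_virtual_object(initial_points, plane_center, plane_normal):
    """Steps 720-730: centre the model's feature points on the plane's
    centre point and rotate the model's up-axis onto the plane normal,
    yielding the target data set."""
    up = np.array([0.0, 1.0, 0.0])              # model's own up-axis
    n = plane_normal / np.linalg.norm(plane_normal)
    v = np.cross(up, n)
    c = float(np.dot(up, n))
    if np.allclose(v, 0.0):
        # Parallel or anti-parallel: identity, or a 180-degree flip.
        rot = np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    else:
        vx = np.array([[0.0, -v[2], v[1]],
                       [v[2], 0.0, -v[0]],
                       [-v[1], v[0], 0.0]])
        rot = np.eye(3) + vx + vx @ vx / (1.0 + c)   # Rodrigues form
    centered = initial_points - initial_points.mean(axis=0)
    return centered @ rot.T + plane_center
```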
In one embodiment, the method for processing the mixed reality image may further include the step of determining plane parameters of a target plane in the environment scene image. Specifically, steps 710 to 740 below may be included.
Step 710, extracting feature points of the environment scene image.
In this step, the environmental scene image includes a first image captured by a first camera of the binocular camera and a second image captured by a second camera of the binocular camera. Feature point extraction is performed on the first image to obtain a plurality of first feature points. The feature point extraction may be performed using a Features from Accelerated Segment Test (FAST) corner detection algorithm, a Scale-Invariant Feature Transform (SIFT) algorithm, or an Oriented FAST and Rotated BRIEF (ORB) algorithm, which is not limited herein.
Step 720, constructing a feature point cloud according to the extracted feature points.
In one example, the feature point cloud may include three-dimensional coordinate information of the plurality of first feature points in the world coordinate system. Specifically, feature point matching is performed in the second image using an epipolar matching method to obtain a plurality of second feature points matching the plurality of first feature points. Three-dimensional coordinate information of the first feature points in the first camera coordinate system is then calculated from the first and second feature points by triangulation. Finally, a coordinate system conversion is performed according to the pose of the head-mounted display device at the time the current frame of the environment scene image was captured, yielding the three-dimensional coordinate information of the first feature points in the world coordinate system, i.e., the feature point cloud.
Step 730, performing plane detection on the environment scene image based on the feature point cloud, and determining a target plane.
In one example, a current plane is determined from three feature points randomly extracted from the feature point cloud; the normal vector of the current plane and its number of first interior points are acquired, and the current plane is determined to be a valid plane when the normal vector and the number of first interior points meet predetermined conditions. A first interior point is a feature point whose distance from the current plane is less than a first predetermined distance. After a plurality of valid planes have been determined, the plane with the largest number of first interior points is selected as the target plane.
Step 740, acquiring plane parameters of the target plane, wherein the plane parameters comprise center point coordinates and a normal vector.
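The following sketch walks through steps 710-740 under illustrative assumptions: hypothetical calibration values, brute-force Hamming matching standing in for the epipolar search, the point cloud left in the first-camera frame, and the patent's predetermined validity condition on the normal vector omitted. It reuses first_image and second_image from the capture sketch earlier.

```python
import cv2
import numpy as np

# Example calibration (hypothetical values): shared intrinsics K and a
# horizontal baseline between the first and second cameras.
fx = fy = 450.0
cx, cy = 320.0, 240.0
baseline = 0.06  # metres
K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-baseline], [0.0], [0.0]])])

# Step 710: ORB feature points on the stereo pair (FAST or SIFT would
# serve equally, as noted above).
gray1 = cv2.cvtColor(first_image, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(second_image, cv2.COLOR_BGR2GRAY)
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(gray1, None)
kp2, des2 = orb.detectAndCompute(gray2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches]).T  # 2 x N
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches]).T

# Step 720: triangulate the matches into a feature point cloud.
pts_h = cv2.triangulatePoints(P1, P2, pts1, pts2)
point_cloud = (pts_h[:3] / pts_h[3]).T  # N x 3, first-camera frame

# Steps 730-740: RANSAC-style plane search - fit a plane to three
# random points, count "first interior points", keep the best plane.
def detect_target_plane(cloud, n_iters=200, inlier_dist=0.02):
    rng = np.random.default_rng(0)
    best = (None, None)  # (inlier mask, normal vector)
    for _ in range(n_iters):
        a, b, c = cloud[rng.choice(len(cloud), 3, replace=False)]
        n = np.cross(b - a, c - a)
        if np.linalg.norm(n) < 1e-9:      # degenerate (collinear) sample
            continue
        n = n / np.linalg.norm(n)
        inliers = np.abs((cloud - a) @ n) < inlier_dist
        if best[0] is None or inliers.sum() > best[0].sum():
            best = (inliers, n)
    plane_pts = cloud[best[0]]
    return plane_pts.mean(axis=0), best[1]  # centre point, normal vector

plane_center, plane_normal = detect_target_plane(point_cloud)
```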
The following describes a specific example of rendering a virtual object to be displayed in an environment scene image, where the virtual object to be displayed is a dinosaur model.
Step 901, acquiring an environment scene image.
Step 902, determining plane parameters of a target plane in the environment scene image. The plane parameters comprise the three-dimensional coordinates P_cplane of the center point of the target plane and the normal vector V_cplane.
Step 903, based on the following formulas (5) and (6), converting the three-dimensional coordinates P_cplane of the center point of the target plane and the normal vector V_cplane according to the attitude information of the head-mounted display device.
P_wplane = R_HMD * P_cplane + T_HMD (5)
V_wplane = R_HMD * V_cplane + T_HMD (6)
where P_wplane is the three-dimensional coordinate information of the center point of the target plane in the world coordinate system, P_cplane is the three-dimensional coordinate information of the center point of the target plane in the camera coordinate system, V_wplane is the normal vector of the target plane in the world coordinate system, V_cplane is the normal vector of the target plane in the camera coordinate system, R_HMD is the rotation matrix of the head-mounted display device in the world coordinate system, and T_HMD is the translation vector of the head-mounted display device in the world coordinate system.
Step 904, obtaining an initial data set of the dinosaur model to be displayed;
Step 905, determining a target data set of the virtual object to be displayed according to the initial data set of the dinosaur model to be displayed, the three-dimensional coordinates P_wplane of the center point of the target plane in the world coordinate system, and the normal vector V_wplane.
Step 906, converting the three-dimensional coordinate information of each feature point in the target data set of the dinosaur model based on the following formulas (7) to (9); and obtaining a third pixel coordinate of the feature point in the image coordinate system of the first camera and a fourth pixel coordinate of the feature point in the image coordinate system of the second camera.
P_uv3 = k1 * E * P_wdinosaur (7)
P_uv4 = k2 * E * P_wdinosaur (8)
E = [R_HMD T_HMD; 0 1]^(-1) (9)
where P_wdinosaur is the three-dimensional coordinate information of any feature point in the target data set in the world coordinate system, P_uv3 is the third pixel coordinate of that feature point in the image coordinate system of the first camera, P_uv4 is the fourth pixel coordinate of that feature point in the image coordinate system of the second camera, k1 is the intrinsic parameter matrix of the first camera, k2 is the intrinsic parameter matrix of the second camera, E is the transformation matrix of the head-mounted display device, R_HMD is the rotation matrix of the head-mounted display device in the world coordinate system, and T_HMD is the translation vector of the head-mounted display device in the world coordinate system.
Step 907, rendering the dinosaur model to be displayed in the environment scene image according to the third pixel coordinate and the fourth pixel coordinate of each feature point in the target data set of the dinosaur model.
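Pulling the pieces together, a condensed sketch of steps 901-907 that reuses the helpers from the earlier sketches (R_HMD and T_HMD come from the SLAM tracker; dinosaur_points stands in for the model's initial data set). Note that a direction vector is transformed by the rotation alone here, although equation (6) as printed also adds the translation:

```python
# Steps 902-903: plane parameters, converted into the world frame.
center_c, normal_c = detect_target_plane(point_cloud)
center_w = R_HMD @ center_c + T_HMD     # equation (5)
normal_w = R_HMD @ normal_c             # direction: rotation only

# Steps 904-905: pose the dinosaur model's initial data set.
target_points = place_virtual_object(dinosaur_points, center_w, normal_w)

# Steps 906-907: project every feature point into both image planes
# (k1, k2: intrinsic matrices of the first and second cameras).
k1 = k2 = K
pixels = [project_to_both_cameras(p, k1, k2, R_HMD, T_HMD)
          for p in target_points]
```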
After the virtual object to be displayed has been rendered in the environment scene image, step 350 is entered.
Step 350, superimposing, based on the spatial position information and the plane parameters, the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed onto the environment scene image to obtain a mixed reality image.
In an embodiment, the step of superimposing, based on the spatial position information and the plane parameters, the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed onto the environment scene image to obtain a mixed reality image may further include steps 810 to 830.
Step 810, determining first position information of a display position of the description information corresponding to the target object according to the spatial position information and a predetermined first offset.
The description information corresponding to the target object at least comprises teaching information of the target object. And overlaying the description information of the target object on the environment scene image, and displaying the target object to the user while displaying the related teaching information of the target object to the user. In one embodiment, a target object included in the environmental scene image is identified, and category information of the target object is obtained. And selecting the description information corresponding to the target object from a pre-established database according to the category information of the target object. The process of determining the first location information is as described in the foregoing embodiments, and is not described herein again.
Step 820, determining second position information of the display position of the description information corresponding to the virtual object to be displayed according to the center point coordinates of the target plane and a predetermined second offset.
The description information corresponding to the virtual object to be displayed at least comprises teaching information of the virtual object. The description information may be subtitles or illustrations. After the virtual object to be displayed is rendered in the environment scene image, the description information of the virtual object may be superimposed on the environment scene image, so that the virtual object and its related teaching information can be displayed to the user. In this embodiment, according to the description information of the virtual object, the relevant content of the virtual object can be explained to the user, thereby realizing the display and teaching of the virtual object.
Step 830, according to the first position information and the second position information, superimposing the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed on the environment scene image to obtain a mixed reality image.
Determining the first position information of the display position of the target object's description information according to the predetermined first offset, and the second position information of the display position of the virtual object's description information according to the predetermined second offset, ensures that the corresponding description information is displayed near the target object and the virtual object while preventing the description information from occluding them.
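A minimal sketch of steps 810-820, with a hypothetical second offset alongside the FIRST_OFFSET from the earlier sketch:

```python
import numpy as np

# Hypothetical second offset (world frame, metres), alongside the
# FIRST_OFFSET used for the target object's caption earlier.
SECOND_OFFSET = np.array([0.10, 0.15, 0.0])

def display_positions(target_corner_w, plane_center_w, first_offset):
    """Steps 810-820: display positions of the two description windows."""
    first_pos = np.asarray(target_corner_w) + first_offset
    second_pos = np.asarray(plane_center_w) + SECOND_OFFSET
    return first_pos, second_pos
```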
According to the embodiment of the disclosure, in the process of teaching through the head-mounted display device, the virtual object can be merged into the live-action video, so that the user can interact with the virtual object while watching the live-action video through the head-mounted display device and can conveniently observe the virtual object from different angles, which improves the effect of teaching based on the head-mounted display device and provides a better user experience. In addition, the description information corresponding to the target object and the virtual object can be merged into the live-action video, so that the user obtains the teaching information corresponding to the target object and the virtual object while watching the live-action video through the head-mounted display device, for a better user experience.
< first embodiment of the apparatus >
Referring to fig. 5, the embodiment of the present disclosure provides a processing apparatus 50 for a mixed reality image, where the processing apparatus 50 for a mixed reality image includes an acquisition module 51, a recognition module 52, and an image generation module 53.
The acquisition module 51 may be used to acquire an image of an environmental scene.
The identification module 52 may be configured to identify a target object included in the environmental scene image, and obtain spatial position information of the target object.
In a specific example, the identifying module 52 is specifically configured to identify a target object included in the first image based on a predetermined identification model, and obtain first position information of a region where the target object is located; identifying a target object contained in the second image based on a preset identification model, and obtaining second position information of the area where the target object is located; and determining spatial position information of the target object according to the first position information and the second position information.
In a specific example, the identifying module 52 is specifically configured to identify a target object included in the environmental scene image, and obtain category information of the target object.
The image generating module 53 may be configured to superimpose the description information corresponding to the target object onto the environment scene image based on the spatial position information, so as to obtain a mixed reality image.
In one embodiment, the description information includes at least teaching information of the target object.
In a specific example, the image generating module 53 is specifically configured to determine, according to the spatial position information and a predetermined first offset, three-dimensional coordinate information of a display position of the description information corresponding to the target object;
converting the three-dimensional coordinate information to obtain a first pixel coordinate of the display position under an image coordinate system of the first camera and a second pixel coordinate of the display position under an image coordinate system of the second camera;
and according to the first pixel coordinate and the second pixel coordinate, overlaying the description information corresponding to the target object onto the environment scene image to obtain a mixed reality image.
In an embodiment, the processing apparatus 50 of the mixed reality image may further include a description information obtaining module, and the description information obtaining module may be configured to select, according to the category information of the target object, description information corresponding to the target object from a pre-established database.
In one embodiment, the processing device 50 for mixed reality images may further include an initial dataset acquisition module, a virtual object generation module.
In this embodiment, the initial data set acquisition module may be configured to acquire an initial data set of a virtual object to be displayed.
The virtual object generation module may be configured to render the virtual object to be displayed in the environment scene image according to the plane parameters of a target plane in the environment scene image and the initial data set of the virtual object to be displayed.
In a specific example, the virtual object generation module is specifically configured to acquire an initial data set of the virtual object to be displayed, where the initial data set includes three-dimensional coordinate information of a plurality of feature points for constructing the virtual object; determine the placement position of the virtual object to be displayed according to the center point coordinates of the target plane, and determine the placement direction of the virtual object to be displayed according to the normal vector of the target plane; determine a target data set of the virtual object to be displayed according to the initial data set, the placement position, and the placement direction; and render the virtual object to be displayed in the environment scene image according to the target data set.
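One plausible realization of this placement step, offered only as a sketch: rotate the initial feature points so that the object's modeled up axis (assumed here to be the z axis) aligns with the target plane's normal vector, then translate them to the plane's center point. The Rodrigues rotation construction below is standard; the function name is hypothetical:

    import numpy as np

    def place_virtual_object(initial_points, plane_center, plane_normal,
                             up=(0.0, 0.0, 1.0)):
        # initial_points: (N, 3) three-dimensional coordinates of the feature
        # points that construct the virtual object, modeled around `up`.
        n = np.asarray(plane_normal, dtype=np.float64)
        n /= np.linalg.norm(n)
        u = np.asarray(up, dtype=np.float64)
        axis = np.cross(u, n)                        # rotation axis (unnormalized)
        s, c = np.linalg.norm(axis), float(np.dot(u, n))
        if s < 1e-9:
            # Already aligned, or exactly opposite (rotate pi about the x axis).
            R = np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
        else:
            k = axis / s
            K = np.array([[0, -k[2], k[1]],
                          [k[2], 0, -k[0]],
                          [-k[1], k[0], 0]])
            R = np.eye(3) + s * K + (1 - c) * (K @ K)  # Rodrigues formula
        # Target data set: rotated feature points translated to the center point.
        return np.asarray(initial_points, dtype=np.float64) @ R.T + np.asarray(plane_center)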
The image generation module 53 may be further configured to superimpose, based on the spatial position information and the plane parameters, the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed onto the environment scene image to obtain a mixed reality image.
In a specific example, the image generation module 53 is specifically configured to determine, according to the spatial position information and the predetermined first offset, first position information of the display position of the description information corresponding to the target object; determine, according to the center point coordinates of the target plane and a predetermined second offset, second position information of the display position of the description information corresponding to the virtual object to be displayed; and superimpose, according to the first position information and the second position information, the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed onto the environment scene image to obtain the mixed reality image.
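Both display positions then reduce to simple vector offsets; a sketch under the assumption that the first and second offsets are pre-tuned 3D vectors (for example, "slightly above" the anchor point):

    import numpy as np

    def label_positions(target_pos, plane_center, first_offset, second_offset):
        # target_pos: spatial position of the target object; plane_center:
        # center point of the target plane; offsets: pre-tuned 3D vectors.
        first_pos = np.asarray(target_pos, dtype=np.float64) + np.asarray(first_offset)
        second_pos = np.asarray(plane_center, dtype=np.float64) + np.asarray(second_offset)
        return first_pos, second_pos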
Referring to fig. 6, an embodiment of the present disclosure provides a processing apparatus 60 for a mixed reality image. The processing apparatus 60 includes a processor 61 and a memory 62. The memory 62 is used for storing a computer program, and the computer program, when executed by the processor 61, implements the processing method of a mixed reality image disclosed in any one of the foregoing embodiments.
<Apparatus embodiment two>
Referring to fig. 7, an embodiment of the present disclosure provides a head-mounted display device 70, which may be the head-mounted display device 100 shown in fig. 1. The head-mounted display device 70 includes a binocular camera 71 and an image processing apparatus 72.
In one embodiment, the image processing apparatus 72 may be, for example, the processing apparatus 50 for a mixed reality image shown in fig. 5, or the processing apparatus 60 shown in fig. 6.
In one embodiment, the head-mounted display device 100 may be a smart device such as a Virtual Reality (VR) device, an Augmented Reality (AR) device, or a Mixed Reality (MR) device.
The embodiments in the present disclosure are described in a progressive manner: for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. It should be clear to those skilled in the art that the embodiments described above can be used alone or combined with one another as needed. In addition, because the apparatus embodiments correspond to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding parts of the method embodiments. The apparatus embodiments described above are merely illustrative; modules illustrated as separate components may or may not be physically separate.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punched cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), with state information of computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, by software, and by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (11)

1. A processing method of a mixed reality image, applied to a head-mounted display device, characterized by comprising the following steps:
acquiring an environment scene image;
identifying a target object contained in the environment scene image, and obtaining spatial position information of the target object;
and based on the spatial position information, overlaying the description information corresponding to the target object onto the environment scene image to obtain a mixed reality image.
2. The method of claim 1, wherein after the step of identifying a target object contained in the environment scene image and obtaining the spatial position information of the target object, the method further comprises:
acquiring an initial data set of a virtual object to be displayed;
rendering the virtual object to be displayed in the environment scene image according to the plane parameter of the target plane in the environment scene image and the initial data set of the virtual object to be displayed;
and based on the spatial position information and the plane parameters, overlaying the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed on the environment scene image to obtain a mixed reality image.
3. The method of claim 1, wherein the environment scene image comprises a first image and a second image captured by a binocular camera of the head-mounted display device;
the step of identifying a target object contained in the environment scene image and obtaining the spatial position information of the target object comprises:
identifying a target object contained in the first image based on a preset recognition model, and obtaining first position information of the region where the target object is located;
identifying a target object contained in the second image based on the preset recognition model, and obtaining second position information of the region where the target object is located;
and determining the spatial position information of the target object according to the first position information and the second position information.
4. The method of claim 3, wherein the method further comprises training the preset recognition model, the training comprising:
acquiring a historical environment scene image;
determining the region where a target object is located in the historical environment scene image, and labeling the position information of the region and the category information of the target object;
generating a data set according to the labeled historical environment scene images;
and training the recognition model according to the data set.
5. The method of claim 1, wherein the method further comprises:
identifying a target object contained in the environment scene image, and obtaining the category information of the target object;
and selecting the description information corresponding to the target object from a pre-established database according to the category information of the target object.
6. The method of claim 1, wherein the head mounted display device comprises a first camera and a second camera;
the step of superimposing the description information corresponding to the target object onto the environment scene image based on the spatial position information to obtain a mixed reality image includes:
determining three-dimensional coordinate information of a display position of the description information corresponding to the target object according to the spatial position information and a preset first offset;
converting the three-dimensional coordinate information to obtain a first pixel coordinate of the display position under an image coordinate system of the first camera and a second pixel coordinate of the display position under an image coordinate system of the second camera;
and according to the first pixel coordinate and the second pixel coordinate, overlaying the description information corresponding to the target object onto the environment scene image to obtain a mixed reality image.
7. The method of claim 2, wherein the method further comprises determining the plane parameters of the target plane in the environment scene image through the following steps:
extracting feature points of the environment scene image;
constructing a feature point cloud according to the extracted feature points;
carrying out plane detection on the environment scene image based on the feature point cloud, and determining a target plane;
and acquiring the plane parameters of the target plane, wherein the plane parameters comprise center point coordinates and a normal vector.
8. The method of claim 7, wherein the step of rendering the virtual object to be displayed in the environment scene image according to the plane parameters of the target plane in the environment scene image and the initial data set of the virtual object to be displayed comprises:
acquiring an initial data set of a virtual object to be displayed, wherein the initial data set comprises three-dimensional coordinate information of a plurality of characteristic points for constructing the virtual object;
determining the placement position of the virtual object to be displayed according to the center point coordinates of the target plane, and determining the placement direction of the virtual object to be displayed according to the normal vector of the target plane;
determining a target data set of the virtual object to be displayed according to the initial data set of the virtual object to be displayed, and the placement position and the placement direction of the virtual object to be displayed;
and rendering the virtual object to be displayed in the environment scene image according to the target data set of the virtual object to be displayed.
9. The method of claim 8, wherein the step of superimposing, based on the spatial position information and the plane parameters, the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed onto the environment scene image to obtain the mixed reality image comprises:
determining first position information of a display position of the description information corresponding to the target object according to the spatial position information and a preset first offset;
determining second position information of a display position of the description information corresponding to the virtual object to be displayed according to the center point coordinates of the target plane and a preset second offset;
and according to the first position information and the second position information, overlaying the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed on the environment scene image to obtain a mixed reality image.
10. An apparatus for processing mixed reality images, the apparatus comprising:
the acquisition module is used for acquiring an environment scene image;
the recognition module is used for identifying a target object contained in the environment scene image and obtaining the spatial position information of the target object;
the image generation module is used for superimposing, based on the spatial position information, the description information corresponding to the target object onto the environment scene image to obtain a mixed reality image;
or,
the apparatus comprises a processor and a memory, the memory storing computer instructions which, when executed by the processor, perform the method of any one of claims 1 to 9.
11. A head-mounted display device comprising a binocular camera and the processing means of the mixed reality image of claim 10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010712798.8A CN112017300A (en) 2020-07-22 2020-07-22 Processing method, device and equipment for mixed reality image

Publications (1)

Publication Number Publication Date
CN112017300A (en) 2020-12-01


Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102141885A (en) * 2010-02-02 2011-08-03 索尼公司 Image processing device, image processing method, and program
US20160012643A1 (en) * 2014-07-10 2016-01-14 Seiko Epson Corporation HMD Calibration with Direct Geometric Modeling
CN105872527A (en) * 2015-01-21 2016-08-17 成都理想境界科技有限公司 Binocular AR (Augmented Reality) head-mounted display device and information display method thereof
CN107665505A (en) * 2016-07-29 2018-02-06 成都理想境界科技有限公司 The method and device of augmented reality is realized based on plane monitoring-network
CN108875460A (en) * 2017-05-15 2018-11-23 腾讯科技(深圳)有限公司 Augmented reality processing method and processing device, display terminal and computer storage medium
CN107481327A (en) * 2017-09-08 2017-12-15 腾讯科技(深圳)有限公司 On the processing method of augmented reality scene, device, terminal device and system
CN109598796A (en) * 2017-09-30 2019-04-09 深圳超多维科技有限公司 Real scene is subjected to the method and apparatus that 3D merges display with dummy object
US20190192967A1 (en) * 2017-12-25 2019-06-27 GungHo Online Entertainment, Inc. Terminal device, system, program, and method
US20190222776A1 (en) * 2018-01-18 2019-07-18 GumGum, Inc. Augmenting detected regions in image or video data
CN108171748A (en) * 2018-01-23 2018-06-15 哈工大机器人(合肥)国际创新研究院 A kind of visual identity of object manipulator intelligent grabbing application and localization method
CN108492363A (en) * 2018-03-26 2018-09-04 广东欧珀移动通信有限公司 Combined method, device, storage medium based on augmented reality and electronic equipment
WO2019242262A1 (en) * 2018-06-19 2019-12-26 亮风台(上海)信息科技有限公司 Augmented reality-based remote guidance method and device, terminal, and storage medium
CN108830943A (en) * 2018-06-29 2018-11-16 歌尔科技有限公司 A kind of image processing method and virtual reality device
CN110858134A (en) * 2018-08-22 2020-03-03 阿里巴巴集团控股有限公司 Data, display processing method and device, electronic equipment and storage medium
CN110136266A (en) * 2018-12-20 2019-08-16 初速度(苏州)科技有限公司 The method and simulation result batch validation method of augmented reality fusion scene
CN109727314A (en) * 2018-12-20 2019-05-07 初速度(苏州)科技有限公司 A kind of fusion of augmented reality scene and its methods of exhibiting
CN109685891A (en) * 2018-12-28 2019-04-26 鸿视线科技(北京)有限公司 3 d modeling of building and virtual scene based on depth image generate system
CN109683699A (en) * 2019-01-07 2019-04-26 深圳增强现实技术有限公司 The method, device and mobile terminal of augmented reality are realized based on deep learning
CN109725733A (en) * 2019-01-25 2019-05-07 中国人民解放军国防科技大学 Human-computer interaction method and human-computer interaction equipment based on augmented reality
CN110169821A (en) * 2019-04-29 2019-08-27 博瑞生物医疗科技(深圳)有限公司 A kind of image processing method, apparatus and system
CN110136202A (en) * 2019-05-21 2019-08-16 杭州电子科技大学 A kind of multi-targets recognition and localization method based on SSD and dual camera
CN110807431A (en) * 2019-11-06 2020-02-18 上海眼控科技股份有限公司 Object positioning method and device, electronic equipment and storage medium
CN111242908A (en) * 2020-01-07 2020-06-05 青岛小鸟看看科技有限公司 Plane detection method and device and plane tracking method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YINING LANG et al.: "Virtual Agent Positioning Driven by Scene Semantics in Mixed Reality", 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), 15 August 2019 *
LI Chang et al.: "Research on a Mixed Reality Scene Understanding Method for Astronaut Training", Manned Spaceflight, no. 1, 29 February 2020, pages 26-33 *
BI Jinqiang et al.: "An Augmented Reality Method Based on SURF and a Geographic Grid Model", Computer and Modernization, no. 6, 30 June 2018 *
WEI Zhiyong: "Research on Mixed Reality System Design and Key Technologies", China Master's Theses Full-text Database (Information Science and Technology), no. 9, 15 September 2019, pages 138-1215 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022134962A1 (en) * 2020-12-22 2022-06-30 腾讯科技(深圳)有限公司 Method and apparatus for presenting point cloud window, computer-readable medium, and electronic device
CN113240789A (en) * 2021-04-13 2021-08-10 青岛小鸟看看科技有限公司 Virtual object construction method and device
CN113240789B (en) * 2021-04-13 2023-05-23 青岛小鸟看看科技有限公司 Virtual object construction method and device
US11741678B2 (en) 2021-04-13 2023-08-29 Qingdao Pico Technology Co., Ltd. Virtual object construction method, apparatus and storage medium
CN113269781A (en) * 2021-04-21 2021-08-17 青岛小鸟看看科技有限公司 Data generation method and device and electronic equipment
CN113269782A (en) * 2021-04-21 2021-08-17 青岛小鸟看看科技有限公司 Data generation method and device and electronic equipment
WO2022222689A1 (en) * 2021-04-21 2022-10-27 青岛小鸟看看科技有限公司 Data generation method and apparatus, and electronic device
US11995741B2 (en) 2021-04-21 2024-05-28 Qingdao Pico Technology Co., Ltd. Data generation method and apparatus, and electronic device
CN113127126A (en) * 2021-04-30 2021-07-16 上海哔哩哔哩科技有限公司 Object display method and device
CN113127126B (en) * 2021-04-30 2023-06-27 上海哔哩哔哩科技有限公司 Object display method and device
CN114972692A (en) * 2022-05-12 2022-08-30 北京领为军融科技有限公司 Target positioning method based on AI identification and mixed reality
CN115361576A (en) * 2022-07-20 2022-11-18 中国电信股份有限公司 Video data processing method and device, and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination