CN111966213A - Image processing method, device, equipment and storage medium

Image processing method, device, equipment and storage medium

Info

Publication number
CN111966213A
CN111966213A (Application CN202010607638.7A)
Authority
CN
China
Prior art keywords
coordinate system
head
world coordinate
local
wearer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010607638.7A
Other languages
Chinese (zh)
Inventor
吴涛
贾维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Xiaoniao Kankan Technology Co Ltd
Original Assignee
Qingdao Xiaoniao Kankan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Xiaoniao Kankan Technology Co Ltd
Priority to CN202010607638.7A
Publication of CN111966213A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 — Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 — Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 — Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/01 — Indexing scheme relating to G06F3/01
    • G06F2203/012 — Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses an image processing method, apparatus, device and storage medium. The method comprises the following steps: acquiring the positions of the head-mounted device, a handle matched with the head-mounted device, and the two feet of the wearer of the head-mounted device, each in the local world coordinate system of the head-mounted device; determining, according to the acquired positions, the position of the virtual character matched with the wearer in the local world coordinate system; converting the position of the virtual character in the local world coordinate system into a position in the reference world coordinate system, and sending it to the other head-mounted devices in the interactive system; receiving, from the other head-mounted devices in the interactive system, the positions of their corresponding virtual characters in the reference world coordinate system; and rendering the virtual scene according to the positions of the corresponding virtual characters in the reference world coordinate system.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present invention relates to the field of image processing, and more particularly, to an image processing method, an image processing apparatus, a head-mounted device, and a computer-readable storage medium.
Background
At present, realizing multi-person interaction in a virtual scene (hereinafter referred to as multi-person interaction) through VR (Virtual Reality) devices has become one of the important technologies in virtual reality.
When multiple people interact, faithfully restoring the body and hand motions of a user wearing a VR device in the virtual scene at a 1:1 scale is critical. Currently, when performing this 1:1 restoration, as shown in fig. 1, each user is usually required to wear a VR head-mounted display device and to hold a handle controller in each hand (the handle controller may act as an interactive tool in the virtual scene, such as a gun, a sword, or a bow and arrow). Optical tracking Mark points are placed at the joints of each moving part of the user, such as the legs and arms, and on the handle controllers, or tracking sensors (IMU inertial navigation units, electromagnetic sensors, etc.) are worn. At the same time, each user must carry a small PC computer, and several tracking cameras and a camera server must be erected in the interaction site. When the user moves freely in the site, the tracking cameras track, in real time, each part of the user's body, the optical Mark points on the handle controllers, and the user's VR head-mounted display device, and transmit the resulting tracking data to the camera server in real time. The camera server aggregates and processes the tracking data of all users and transmits it in real time, through a data line, to the PC computer carried by each user. The PC computer renders the scene using the received user tracking data and displays it in real time on the corresponding user's VR head-mounted display device. In this way, through the VR head-mounted display devices, users can see the various actions of each other's virtual counterparts in the virtual scene, including how a user walks, the various movements of the handle controllers, whether the user swings their arms, and so on.
However, the existing multi-person interaction scheme has the following problems: many optical tracking Mark points or tracking sensors are placed on the user's body, which increases the complexity of wearing; each user must carry a PC computer, and the added weight and the heat it dissipates during operation degrade the user experience; each user must be connected by cable to the tracking camera server, which limits the user's range of motion; and besides the VR head-mounted display device and the many Mark points, additional equipment such as tracking cameras, a tracking camera server, and PC computers must be set up, which increases the cost of interaction.
Disclosure of Invention
It is an object of the invention to provide a new image processing scheme.
According to a first aspect of the present invention, there is provided an image processing method applied to any head-mounted device in an interactive system, including:
acquiring positions of the head-mounted equipment, a handle matched with the head-mounted equipment and feet of a wearer wearing the head-mounted equipment under a local world coordinate system of the head-mounted equipment respectively;
determining the position of a virtual character matched with the wearer in the local world coordinate system according to the acquired position;
converting the position of the virtual character in the local world coordinate system to a position in a reference world coordinate system, and sending the position to other head-mounted equipment in the interactive system;
receiving the positions of the corresponding virtual characters under the reference world coordinate system from other head-mounted devices in the interactive system;
and rendering a virtual scene according to the position of the corresponding virtual character under the reference coordinate system.
Optionally, the acquiring the positions of the feet of the wearer in a world coordinate system local to the head-mounted device includes:
and estimating the positions of the feet of the wearer wearing the head-mounted equipment in the local world coordinate system according to the positions of the head-mounted equipment in the local world coordinate system.
Optionally, the acquiring a position of a handle matched with the head-mounted device in a world coordinate system local to the head-mounted device includes:
determining a relative position between a handle matched with the head-mounted equipment and the head-mounted equipment according to an electromagnetic signal emitted by the handle and inertial data of the handle;
and determining the position of the handle in the local world coordinate system according to the relative position and the position of the head-mounted device in the local world coordinate system.
Optionally, the converting the position of the virtual character in the local world coordinate system to the position in the reference world coordinate system includes:
converting the position of the virtual character in the local world coordinate system to the position in the reference world coordinate system by using a first conversion relation, a second conversion relation and a third conversion relation;
the first conversion relation is a conversion relation between a world coordinate system local to the head-mounted device and a camera coordinate system local to the head-mounted device, the second conversion relation is a conversion relation between the camera coordinate system local to the head-mounted device and the reference camera coordinate system, and the third conversion relation is a conversion relation between the reference camera coordinate system and the reference world coordinate system.
Optionally, the method further includes a step of obtaining the second conversion relationship, including:
acquiring a feature vector of each first feature point in a historical frame image acquired by the head-mounted device and a feature vector of each second feature point in a set space image acquired by the reference device, wherein the historical frame image belongs to the set space image;
determining a first preset number of feature point pairs according to the feature vector of the first feature point and the feature vector of the second feature point, wherein one pair of feature point pairs consists of one first feature point and the matched second feature point;
and determining the second conversion relation according to the position of the first feature point in each feature point pair in the camera coordinate system local to the head-mounted equipment and the position of the corresponding second feature point in the reference camera coordinate system.
Optionally, the method further includes:
acquiring an image of the wearer's hand;
determining the positions of the skeleton points of the two hands of the wearer under the local world coordinate system according to the images of the hands;
and updating the virtual character according to the position of the skeleton point in the local world coordinate system.
Optionally, the method further includes:
acquiring images of the eyes and the mouth of the wearer;
determining the positions of the eyes and the mouth under the local world coordinate system according to the images of the eyes and the mouth of the wearer;
and updating the virtual character according to the positions of the eyes and the mouth in the local world coordinate system.
According to a second aspect of the present invention, there is provided an image processing apparatus comprising:
the acquisition module is used for acquiring the positions of the head-mounted equipment, a handle matched with the head-mounted equipment and the positions of the feet of a wearer wearing the head-mounted equipment under a local world coordinate system of the head-mounted equipment respectively;
the determining module is used for determining the position of the virtual character matched with the wearer in the local world coordinate system according to the acquired position;
the conversion module is used for converting the position of the virtual character in the local world coordinate system to the position in the reference world coordinate system and sending the position to other head-mounted equipment in the interactive system;
the receiving module is used for receiving the positions of the corresponding virtual characters under the reference world coordinate system from other head-mounted equipment in the interactive system;
and the rendering module is used for rendering the virtual scene according to the position of the corresponding virtual character under the reference coordinate system.
According to a third aspect of the present invention there is provided a head-mounted device comprising an apparatus as described in the second aspect;
or, a memory and a processor, wherein:
the memory is for storing executable instructions for controlling the processor to perform the method according to any one of the first aspects.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium storing computer instructions which, when executed by a processor, implement the method of any one of the first aspects.
In the embodiment, the positions of the head-mounted device, the handle matched with the head-mounted device and the feet of the wearer wearing the head-mounted device are obtained respectively in the local world coordinate system of the head-mounted device. And determining the position of the virtual character matched with the wearer in the local world coordinate system according to the acquired position. Based on this, the head-mounted device can obtain a virtual character 1:1 with the wearer. The virtual character is represented by the positions of a plurality of characteristic points capable of outlining the virtual character in a local world coordinate system. The head-mounted equipment converts the position of the virtual character in the local world coordinate system to the position in the reference world coordinate system and sends the position to other head-mounted equipment in the interactive system. Meanwhile, the head-mounted equipment receives the positions of the corresponding virtual characters in the reference world coordinate system from other head-mounted equipment in the interactive system, namely the positions of the virtual characters matched with the wearers of the other head-mounted equipment in the interactive system in the same coordinate system can be obtained; and rendering the virtual scene according to the position of the corresponding virtual character under the reference coordinate system. I.e. the behaviour of the other wearer can be observed in the virtual scene in real time. Thus, multi-person interaction can be realized. In this embodiment, the wearer need only wear the headset and hold the mating handle. Thus, the situation that a plurality of optical tracking Mark points or tracking sensors are placed on the body of a wearer, a PC computer is carried on the back, and cable connection is carried out with a tracking camera server is avoided. This improves the use experience for the wearer and increases the range of motion for the wearer. Meanwhile, the places where multiple persons interact do not need to be provided with equipment such as a tracking camera, a tracking camera server, a PC (personal computer) and the like, so that the interaction cost is reduced.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a diagram of a conventional scenario for implementing multi-person interaction;
fig. 2 is a block diagram of a hardware configuration of a head-mounted device that implements an image processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating an image processing method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a head-mounted device provided by an embodiment of the invention;
FIG. 5 is a diagram illustrating a multi-user interactive scene implemented by the image processing method according to the embodiment of the present invention;
FIG. 6 is a schematic diagram of a mark point corresponding to a human eye according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a corresponding mark point of a mouth according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of another head-mounted device according to an embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
< hardware configuration embodiment >
Fig. 2 is a block diagram of a hardware configuration of a head-mounted device that implements an image processing method according to an embodiment of the present invention.
The head-mounted device 2000 may be a VR head-mounted display device, and may also be an MR head-mounted display device.
The headset 2000 may include a processor 2100, a memory 2200, an interface device 2300, a communication device 2400, a display device 2500, an input device 2600, a speaker 2700, a microphone 2800, and so on. The processor 2100 may be a central processing unit CPU, a microprocessor MCU, or the like. The memory 2200 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 2300 includes, for example, a USB interface, a headphone interface, and the like. Communication device 2400 is capable of wired or wireless communication, for example. The display device 2500 is, for example, a liquid crystal display panel, a touch panel, or the like. The input device 2600 may include, for example, a touch screen, a keyboard, and the like. A user can input/output voice information through the speaker 2700 and the microphone 2800.
Although a plurality of apparatuses are shown for each of the head-mounted device 2000 in fig. 2, the present invention may relate to only some of the apparatuses, for example, the head-mounted device 2000 only relates to the memory 2200 and the processor 2100.
In an embodiment of the present invention, the memory 2200 of the head-mounted device 2000 is configured to store instructions for controlling the processor 2100 to execute the image processing method provided by the embodiment of the present invention.
In the above description, the skilled person will be able to design instructions in accordance with the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
< method examples >
The embodiment of the invention provides an image processing method which is applied to any one head-mounted device in an interactive system. Any one of the headsets may be a headset 2000 as shown in fig. 2. And the interactive system at least comprises two head-mounted devices, and the head-mounted devices in the interactive system are in communication connection.
As shown in fig. 3, the image processing method provided by the embodiment of the present invention includes the following steps S3100 to S3500:
s3100, acquiring positions of the head-mounted device, a handle matched with the head-mounted device and feet of a wearer wearing the head-mounted device in a local world coordinate system of the head-mounted device respectively.
In this embodiment, the position of the head-mounted device in its local world coordinate system is obtained as follows. As shown in fig. 4, 4 imaging devices C1, C2, C3, and C4 that track the external environment are provided on the head-mounted device (imaging devices C1, C2, C3, and C4 constitute a 6DOF module); the field of view (FOV) of each imaging device is at least 150°, and the frame rate is 30 Hz or higher. An inertial measurement unit (IMU) is also provided on the head-mounted device, and IMU information is acquired at 1000 Hz. The head-mounted device obtains its 6DOF information in the local world coordinate system from the images of the external environment captured at the same time by the 4 imaging devices. The head-mounted device then fuses the IMU information with this 6DOF information to obtain 6DOF information that matches the refresh rate of the head-mounted device. Further, from the fused 6DOF information, the head-mounted device obtains its position in the local world coordinate system.
It will be appreciated that the head-mounted device is relatively stationary with respect to the head of the wearer wearing the head-mounted device, and therefore the position of the head-mounted device in the local world coordinates may be considered to be the position of the head of the wearer wearing the head-mounted device in the local world coordinates.
It should be noted that, the IMU information may also compensate for the 6DOF information of the head-mounted device in the local world coordinate system, so that the 6DOF information of the head-mounted device in the local world coordinate system is more accurate. Other ways of obtaining the position of the head-mounted device in the local world coordinate system may also be adopted in the embodiments of the present invention.
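The patent does not specify the fusion algorithm used to combine the 1000 Hz IMU stream with the 30 Hz camera-derived 6DOF pose. Purely as a minimal illustrative sketch (the dead-reckoning scheme, the world-frame acceleration assumption, and all names below are assumptions, not the patented method), the last camera pose can be propagated with IMU samples until the next camera update:

import numpy as np

def fuse_pose(camera_pose_30hz, imu_samples_1khz, dt=0.001):
    """Illustrative fusion: propagate the last camera 6DOF pose with IMU data
    between camera updates so the pose stream matches the display refresh rate.
    camera_pose_30hz: dict with 'position' (3,) and 'velocity' (3,) at the last camera frame.
    imu_samples_1khz: iterable of dicts with 'accel' (3,) gravity-compensated world-frame acceleration (assumed)."""
    position = np.asarray(camera_pose_30hz["position"], dtype=float)
    velocity = np.asarray(camera_pose_30hz["velocity"], dtype=float)
    poses = []
    for sample in imu_samples_1khz:
        accel = np.asarray(sample["accel"], dtype=float)
        velocity = velocity + accel * dt      # integrate acceleration
        position = position + velocity * dt   # integrate velocity
        poses.append(position.copy())
    return poses                               # high-rate positions until the next camera fix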
In one embodiment, the position of the handle matched with the head-mounted device in the local world coordinate system can be obtained by: such as the manner in which the position of the head-mounted device in the local world coordinate system is obtained.
In another embodiment, the position of the handle matched with the head-mounted device in the local world coordinate system can be obtained in other ways. Based on this, the manner of acquiring the position of the handle matched with the head-mounted device in the world coordinate system local to the head-mounted device includes the following S3110 and S3111:
s3110, determining the relative position between the handle and the head-mounted device according to the electromagnetic signal emitted by the handle matched with the head-mounted device and the inertia data of the handle.
In this embodiment, for each handle that mates with the head-mounted device, an electromagnetic transmitter and an IMU are provided in the handle. The electromagnetic transmitter comprises three mutually perpendicular electromagnetic coils covering the three spatial directions. An electromagnetic receiver is arranged in the head-mounted device. The electromagnetic transmitter in the handle emits an electromagnetic signal of a certain frequency, which is received by the electromagnetic receiver in the head-mounted device. The head-mounted device processes the received electromagnetic signal to obtain 6DOF information of the handle relative to the electromagnetic receiver in the head-mounted device. In addition, the head-mounted device compensates this 6DOF information using the handle's IMU information, i.e. inertial data, received through a wireless transmission module built into the head-mounted device, so that more accurate 6DOF information of the handle relative to the electromagnetic receiver can be obtained.
Further, based on the transformation relationship of the electromagnetic receiver in the head-mounted device and the local world coordinate system of the head-mounted device, 6DOF information of the handle relative to the head-mounted device can be obtained. Still further, the headset may derive the relative position between the handle and the headset based on the 6DOF information of the handle relative to the headset.
S3111, determining the position of the handle in the local world coordinate system according to the relative position and the position of the head-mounted device in the local world coordinate system.
In an embodiment, the position of the handle in the local world coordinate system can be obtained by superimposing the relative position obtained in S3110 on the position of the head-mounted device in the local world coordinate system.
It will be appreciated that the handle is relatively stationary with respect to the hand holding it, and therefore the position of the handle in the local world coordinate system may be regarded as the position of the corresponding hand of the wearer in the local world coordinate system.
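The "superimposing" described in S3110–S3111 amounts to composing two rigid transforms. A minimal sketch, assuming both poses are available as 4x4 homogeneous matrices (the variable names are illustrative):

import numpy as np

def handle_in_local_world(T_world_from_head, T_head_from_handle):
    """Compose 4x4 homogeneous transforms:
    T_world_from_head  : headset pose in the local world coordinate system (from the 6DOF module).
    T_head_from_handle : handle pose relative to the headset (from electromagnetic + IMU tracking).
    Returns the handle pose in the local world coordinate system."""
    return T_world_from_head @ T_head_from_handle

# usage sketch: the handle position is the translation column of the composed transform
# T = handle_in_local_world(T_world_from_head, T_head_from_handle)
# handle_position = T[:3, 3]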
In this embodiment, the manner of obtaining the positions of the feet of the wearer wearing the head-mounted device in the local world coordinate system of the head-mounted device may be as follows S3220:
s3220, estimating positions of the feet of the wearer wearing the head-mounted equipment in the local world coordinate system according to the positions of the head-mounted equipment in the local world coordinate system.
In this embodiment, movement of the feet drives the head to move, so there is a relationship between the motion of the feet and the motion of the head. Based on this, the motion of both feet can be estimated from the motion of the head. The estimation may use a motion estimation engine, such as the estimation methods used in Unity3D or Unreal.
It should be noted that other ways may also be adopted in the embodiments of the present invention to obtain the positions of both feet of the wearer wearing the head-mounted device in the local world coordinate system.
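The patent delegates this estimation to a motion estimation engine such as those in Unity3D or Unreal. Purely to illustrate the idea that foot positions can be inferred from the head pose, the following is a naive placeholder heuristic, not the engine's algorithm; the y-up coordinate convention, the hip width, and all names are assumptions:

import numpy as np

def estimate_feet_from_head(head_pos, head_heading, hip_width=0.30):
    """Naive placeholder: put both feet on the ground plane directly below the head,
    offset laterally by half the hip width along the head's left/right axis.
    head_pos: (x, y, z) head position in the local world coordinate system (y up, assumed).
    head_heading: yaw angle in radians (assumed convention)."""
    right = np.array([np.cos(head_heading), 0.0, -np.sin(head_heading)])
    ground = np.array([head_pos[0], 0.0, head_pos[2]])
    left_foot = ground - right * (hip_width / 2.0)
    right_foot = ground + right * (hip_width / 2.0)
    return left_foot, right_foot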
And S3200, determining the position of the virtual character matched with the wearer in the local world coordinate system according to the acquired position.
In this embodiment, the head mount device fuses the positions obtained in S3100 to obtain a virtual character 1:1 with the wearer. It will be appreciated that the virtual character is represented by the position of a plurality of feature points capable of outlining the virtual character in the local world coordinate system.
And S3300, converting the position of the virtual character in the local world coordinate system to a position in a reference world coordinate system, and sending the position to other head-mounted equipment in the interactive system.
In the embodiment, the reference world coordinate system is the world coordinate system of the reference device in the interactive system. The reference device in the interactive system may be specified by the user and may be a default one of the head-mounted devices.
In this embodiment, when the head-mounted device executing the image processing method provided by the embodiment of the present invention is a reference device, the position of the virtual character in the local world coordinate system is the position of the virtual character in the reference world coordinate system. The position of the virtual character in the local world coordinate system refers to a coordinate position of a feature point constituting the virtual character in the local world coordinate system.
In one embodiment, the user inputs an instruction to the head-mounted device, and the head-mounted device determines whether itself is the reference device according to the instruction. In this embodiment, specific contents of the input instruction are not limited.
In another embodiment, the identification of whether the headset is the reference device is stored in the headset, and the headset can determine whether the headset is the reference device according to the identification.
In the case where the head-mounted device performing the image processing method provided by the embodiment of the present invention is not the reference device, the conversion in S3300 described above may be implemented by the following 3310:
s3310, convert the position of the virtual character in the local world coordinate system to the position in the reference world coordinate system using the first conversion relationship, the second conversion relationship, and the third conversion relationship.
The first conversion relation is the conversion relation between the local world coordinate system of the head-mounted equipment and the local camera coordinate system of the head-mounted equipment, the second conversion relation is the conversion relation between the local camera coordinate system of the head-mounted equipment and the reference camera coordinate system, and the third conversion relation is the conversion relation between the reference camera coordinate system and the reference world coordinate system.
In the present embodiment, the reference camera coordinate system is the camera coordinate system of the reference device. Through the first conversion relation, the virtual character can be converted to the position of the head-mounted device under the local world coordinate system and the local camera coordinate system. Further, the position of the virtual character in the camera coordinate system local to the head-mounted device can be converted to the position in the reference camera coordinate system by the second conversion relationship. Still further, the position of the virtual character in the reference camera coordinate system can be converted to the position in the reference world coordinate system through the third conversion relationship. In this way, the local world coordinate system of the head-mounted device can be converted into the reference world coordinate system, and the virtual character can be displayed in the world coordinate system of the unified reference device.
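The chaining of the three conversion relations can be written compactly as a product of transforms. A minimal sketch, assuming each conversion relation is represented as a 4x4 homogeneous matrix (names are illustrative):

import numpy as np

def local_world_to_reference_world(p_local_world, T1, T2, T3):
    """Chain the three conversions described above (all as 4x4 homogeneous matrices):
    T1: local world  -> local camera       (first conversion relation)
    T2: local camera -> reference camera   (second conversion relation)
    T3: reference camera -> reference world (third conversion relation)"""
    p = np.append(np.asarray(p_local_world, dtype=float), 1.0)  # homogeneous point
    return (T3 @ T2 @ T1 @ p)[:3]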
And S3400, receiving the position of the corresponding virtual character in the reference world coordinate system from other head-mounted equipment in the interactive system.
In this embodiment, the other head-mounted devices in the interactive system simultaneously execute the steps of S3100 to S3300, so that the head-mounted device executing the image processing method according to the embodiment of the present invention can obtain the positions of the virtual characters matched with the wearers of the other head-mounted devices in the interactive system in the reference coordinate system. The positions of the virtual characters matched with the wearers of other head-mounted devices in the interactive system under the same coordinate system can be obtained.
And S3500, rendering the virtual scene according to the position of the corresponding virtual character in the reference coordinate system.
In the embodiment of the invention, the head-mounted device renders the virtual scene of the received virtual character at the position under the reference coordinate system, and the virtual characters of a plurality of wearers 1:1 can be displayed. I.e. the behaviour of the other wearer can be observed in the virtual scene in real time. Thus, multi-person interaction can be realized.
In one example, corresponding to the game scene shown in fig. 1, with the image processing method provided by this embodiment, as shown in fig. 5, a user need only wear a head-mounted device and hold a simulated gun, and multi-user interaction can be achieved.
In the embodiment, the positions of the head-mounted device, the handle matched with the head-mounted device and the feet of the wearer wearing the head-mounted device are obtained respectively in the local world coordinate system of the head-mounted device. And determining the position of the virtual character matched with the wearer in the local world coordinate system according to the acquired position. Based on this, the head-mounted device can obtain a virtual character 1:1 with the wearer. The virtual character is represented by the positions of a plurality of characteristic points capable of outlining the virtual character in a local world coordinate system. The head-mounted equipment converts the position of the virtual character in the local world coordinate system to the position in the reference world coordinate system and sends the position to other head-mounted equipment in the interactive system. Meanwhile, the head-mounted equipment receives the positions of the corresponding virtual characters in the reference world coordinate system from other head-mounted equipment in the interactive system, namely the positions of the virtual characters matched with the wearers of the other head-mounted equipment in the interactive system in the same coordinate system can be obtained; and rendering the virtual scene according to the position of the corresponding virtual character under the reference coordinate system. I.e. the behaviour of the other wearer can be observed in the virtual scene in real time. Thus, multi-person interaction can be realized. In this embodiment, the wearer need only wear the headset and hold the mating handle. Thus, the situation that a plurality of optical tracking Mark points or tracking sensors are placed on the body of a wearer, a PC computer is carried on the back, and cable connection is carried out with a tracking camera server is avoided. This improves the use experience for the wearer and increases the range of motion for the wearer. Meanwhile, the places where multiple persons interact do not need to be provided with equipment such as a tracking camera, a tracking camera server, a PC (personal computer) and the like, so that the interaction cost is reduced.
On the basis of the above S3310, the image processing method provided by the embodiment of the present invention further includes a step of acquiring the first conversion relationship. This step includes the following S3311 and S3312:
s3311, for the historical frame image collected by the head-mounted device, acquiring positions of a second preset number of first feature points in an image coordinate system of the historical frame image and positions in a local world coordinate system of the head-mounted device.
S3312, determining a first conversion relation according to the positions of the first feature points of the second preset number in the image coordinate system of the historical frame image and the positions in the world coordinate system of the local head-mounted device.
In the embodiment of the invention, the head-mounted device collects images in advance and records the images as historical frame images. And then the head-mounted equipment extracts the first feature points in the historical frame images, and randomly selects a second preset number of first feature points from the extracted first feature points. Meanwhile, the head-mounted device obtains the position of each first feature point in a second preset number of first feature points in a local world coordinate system of the head-mounted device based on the 6DOF module arranged in the head-mounted device.
Meanwhile, for each extracted first feature point and the corresponding historical frame image, the position of each first feature point in the image coordinate system of the corresponding historical frame image can be obtained.
Based on the obtained positions of the second preset number of first feature points in the image coordinate system of the historical frame image and the positions of the first feature points in the local world coordinate system of the head-mounted device, a conversion relation between the local camera coordinate system of the head-mounted device and the local world coordinate system of the head-mounted device, namely the first conversion relation, can be obtained by using a PNP algorithm.
It should be noted that the second predetermined number can be set empirically.
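The PNP (Perspective-n-Point) solve in S3312 can be illustrated with OpenCV; the patent only names the PnP algorithm, so the use of OpenCV and all variable names below are assumptions:

import cv2
import numpy as np

def first_conversion_from_pnp(world_pts, image_pts, K, dist=None):
    """Estimate the local-world -> local-camera transform (first conversion relation) with a PnP solve.
    world_pts: (N, 3) first-feature-point positions in the headset's local world coordinate system.
    image_pts: (N, 2) the same points in the historical frame's image coordinate system.
    K: (3, 3) intrinsic matrix of the tracking camera."""
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(world_pts, dtype=np.float32),
        np.asarray(image_pts, dtype=np.float32),
        np.asarray(K, dtype=np.float32),
        dist)
    R, _ = cv2.Rodrigues(rvec)             # rotation vector -> 3x3 rotation matrix
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = tvec.ravel()
    return T                                # maps local-world points into the local camera frame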
On the basis of the above S3310, the image processing method provided by the embodiment of the present invention further includes a step of acquiring a second conversion relationship. The steps include the following S3313-S3315:
s3313, a feature vector of each first feature point in the history frame image acquired by the head-mounted device and a feature vector of each second feature point in the setting space image acquired by the reference device are acquired, and the history frame image belongs to the setting space image.
Wherein, the historical frame image belongs to the set space image.
In the present embodiment, the setting space image is a multi-frame image obtained by scanning the entire setting space by the reference device. Wherein, the setting space refers to a space in which the interactive system operates. For example, a user plays a game in one room through an interactive system. The set space is the room. The historical frame image collected by the head-mounted device is an image collected by the head-mounted device at a certain position of the set space.
Based on the above S3313, the image processing method provided by the embodiment of the present invention further includes a step of determining a feature vector of each first feature point in the historical frame image acquired by the head-mounted device. This step can be implemented by S3313-1 as follows:
s3313-1, in the historical frame image, determining a feature vector corresponding to the first feature point according to the gray value of the pixel of the first feature point and the gray value of the pixel in the neighborhood of the first feature point.
In one embodiment, the neighborhood of the first feature point may be a 5 x 5 window region centered on the first feature point. Of course, other embodiments are also possible, and this embodiment is not limited thereto.
Calculating the difference between the gray value of each pixel in the neighborhood and the gray value of the first characteristic point for each first characteristic point in the historical image frame; extracting the maximum difference value of all the obtained difference values; and normalizing each obtained difference value by using the maximum difference value to obtain a feature vector corresponding to the first feature point.
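A minimal sketch of this descriptor computation (boundary handling is omitted; the 5x5 window follows the embodiment above, and the function name is illustrative):

import numpy as np

def gray_difference_descriptor(gray_img, x, y, half=2):
    """Descriptor described above: differences between each neighborhood pixel's gray value
    and the feature point's gray value, normalized by the maximum absolute difference.
    gray_img: 2D gray-scale array; (x, y): feature point column/row; half=2 gives a 5x5 window."""
    patch = gray_img[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    diffs = (patch - float(gray_img[y, x])).ravel()
    max_abs = np.max(np.abs(diffs))
    return diffs / max_abs if max_abs > 0 else diffs   # 25-dimensional feature vector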
Meanwhile, based on the above S3313, the image processing method according to the embodiment of the present invention further includes a step of obtaining a feature vector of each second feature point in the set space image collected by the reference device. In this embodiment, the feature vector of each second feature point in the set spatial image acquired by the reference device is calculated by the reference device and sent to the head-mounted device. The sending comprises the step that the reference equipment directly sends the feature vectors to the head-mounted equipment, or the reference equipment uploads the feature vectors of each second feature point in the set space image acquired by the reference equipment to the cloud; based on this, the head-mounted device acquires from the cloud again.
S3314, determining a first preset number of pairs of feature points according to the feature vector of the first feature point and the feature vector of the second feature point, where a pair of pairs of feature points consists of one first feature point and a matched second feature point.
Wherein, a pair of characteristic point pairs is composed of a first characteristic point and a matched second characteristic point.
In this embodiment, feature point matching is performed on the feature vector of each first feature point in the history frame image acquired by the head-mounted device and the feature vector of each second feature point in the setting space image acquired by the reference device to obtain a first preset number of best-matching feature point pairs. It is understood that the first feature point and the second feature point in a pair of feature points are the same point in space.
Wherein the first predetermined number may be set empirically. In one embodiment, the first preset number may be set to 50. For the feature point matching, any of matching methods such as hamming distance matching, KNN matching, RANSAC matching, and the like can be used.
S3315, determine a second transformation relationship according to the position of the first feature point in each feature point pair in the local camera coordinate system of the head-mounted device and the position of the corresponding second feature point in the reference camera coordinate system.
In this embodiment, the position of the first feature point in each feature point pair in the camera coordinate system local to the head-mounted device is determined in the following manner: and determining the position of the corresponding first characteristic point in the local camera coordinate system of the head-mounted equipment by utilizing the perspective projection relation according to the position of the corresponding first characteristic point in the image coordinate system of the historical frame image.
In this embodiment, for the position of the first feature point in each feature point pair in the local camera coordinate system of the head-mounted device and the position of the corresponding second feature point in the reference camera coordinate system, the second transformation relationship can be obtained according to the relationship between the stereoscopic geometric imaging and the perspective projection of the camera.
According to the relationship between the camera's stereoscopic geometric imaging and perspective projection, the position of the first feature point of each feature point pair in the camera coordinate system local to the head-mounted device and the position of the corresponding second feature point in the reference camera coordinate system satisfy:
P_i^head = ΔT · P_i^ref
wherein ΔT is the conversion relation from the reference camera coordinate system to the camera coordinate system local to the head-mounted device; P_i^head is, for the i-th feature point pair, the position of the first feature point in the camera coordinate system local to the head-mounted device; and P_i^ref is the position of the corresponding second feature point in the reference camera coordinate system.
Based on the above formula, ΔT can be obtained. Further, its inverse ΔT⁻¹ can be obtained, i.e. the second conversion relation.
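The patent does not state which solver is used for ΔT. One standard way to estimate a rigid transform from corresponding 3D point sets is the Kabsch/SVD method, sketched below as an assumption about the solver (names are illustrative):

import numpy as np

def rigid_transform_3d(P_ref, P_head):
    """Estimate ΔT (4x4) such that P_head ≈ ΔT · P_ref, given corresponding 3D points
    P_ref (N, 3) in the reference camera frame and P_head (N, 3) in the headset camera frame.
    Kabsch/SVD solution; the second conversion relation is then the inverse of ΔT."""
    P_ref = np.asarray(P_ref, dtype=float)
    P_head = np.asarray(P_head, dtype=float)
    c_ref, c_head = P_ref.mean(axis=0), P_head.mean(axis=0)
    H = (P_ref - c_ref).T @ (P_head - c_head)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                # fix an improper (reflected) rotation
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_head - R @ c_ref
    dT = np.eye(4)
    dT[:3, :3], dT[:3, 3] = R, t
    return dT, np.linalg.inv(dT)             # (ΔT, second conversion relation ΔT⁻¹)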
On the basis of the above S3310, the image processing method provided by the embodiment of the present invention further includes a step of acquiring a third conversion relationship. This step includes the following S3316 and S3317:
s3316, for the set spatial image acquired by the reference device, obtain the positions of the second feature points of the third preset number in the image coordinate system and the positions in the reference world coordinate system.
S3317, determining a third conversion relation according to the positions of the second feature points of the third preset number in the image coordinate system and the positions in the reference world coordinate system.
In this embodiment, the third preset number may be set empirically. And the position in the image coordinate system of the second feature point in S3316 and the position in the reference world coordinate system are calculated by the reference device and transmitted to the head-mounted device. Wherein the transmitting comprises transmitting directly by the reference device to the head-mounted device; or the reference device uploads the position of the second feature point in the image coordinate system of the S3316 and the position of the second feature point in the reference world coordinate system to the cloud; based on this, the head-mounted device acquires from the cloud again.
In the present embodiment, the specific implementation of S3317 is similar to the specific implementation of S3312. Therefore, the description of the above S3317 is omitted in this embodiment.
On the basis of any of the above embodiments, the image processing method provided by the embodiment of the present invention further includes the following steps S3510-S3512:
s3510, images of the eyes and the mouth of the wearer are acquired.
In the embodiment, as shown in fig. 4, the bottom of the left-eye display screen and the right-eye display screen of the head-mounted device is provided with the cameras C5 and C6. The camera C5 is used to capture images of the left eye of the wearer and the camera C6 is used to capture images of the right eye of the wearer.
A camera C7 for capturing the wearer's mouth image is arranged near the position of the head-mounted display device that contacts the tip of the nose. The camera C7 is used to acquire images of the wearer's mouth.
S3511, determining the positions of the eyes and the mouth under a local world coordinate system according to the images of the eyes and the mouth of the wearer.
In this embodiment, the implementation manner of S3511 may be:
s3511-1, respectively obtaining internal references of the corresponding imaging devices from the above-mentioned C5, C6, and C7 by a conventional zhangying friend calibration method, and recording as: k1[ 3X 3], K2[ 3X 3], K3[ 3X 3 ].
S3511-2, collect human eye samples. The human eye samples contain eye images of different ages and different genders, with and without glasses. One human eye image comprises a left-eye image and a right-eye image. Each of the left-eye and right-eye images in every sample is marked with 16 annotation points, which outline the shape of the eye. An eye tracking model is trained using these human eye samples. As shown in fig. 6, the 16 annotation points for the left eye include 10 annotation points that outline the eyeball of the left eye and 6 annotation points that outline the contour of the left eye. Correspondingly, the 16 annotation points for the right eye include 10 annotation points that outline the eyeball of the right eye and 6 annotation points that outline the contour of the right eye.
S3511-3, the images of the two eyes obtained based on the S3510 are input into the eye tracking model obtained based on the S3511-2 training, and 32 marking points of the left eye and the right eye corresponding to the images of the two eyes obtained based on the S3510 are obtained. Further, the image coordinates of the 32 labeling points in the image of the both eyes obtained in S3510 are obtained.
S3511-4, normalization processing of the image coordinates of the obtained 32 marking points in the image of the eyes obtained in the S3510.
Specifically, for the 16 annotation points of the left eye, determine the average of the image coordinates of the 10 annotation points corresponding to the left eyeball in the image of the two eyes obtained in S3510. Subtract this average from the image coordinates of each of the 16 left-eye annotation points to obtain the normalized positions of the 16 left-eye annotation points.
Similarly, for the 16 annotation points for the right eye, the normalized mode of the 16 annotation points for the left eye is also adopted to obtain the positions of the 16 annotation points for the right eye after normalization.
S3511-5, collecting a mouth sample. Mouth samples include mouth images of different morphologies (e.g., mouth open, mouth closed, mouth beeping, etc.), different genders, presence or absence of a mustache, different mustache shapes. 10 marked points of the mouth shape are marked in each mouth sample. And training to obtain a mouth tracking model by using the mouth sample. As shown in fig. 7, the 10 marked points in each mouth sample can outline the shape of the mouth.
S3511-6, the mouth image obtained based on the S3510 is input into the mouth tracking model obtained based on the S3511-5, and 10 marking points corresponding to the mouth shape in the mouth image are obtained. Further, the image coordinates of the 10 marking points in the mouth image obtained in S3510 are obtained.
S3511-7, normalization processing is performed on the image coordinates of the obtained 10 marking points in the mouth image obtained in the S3510.
Specifically, the average value of the image coordinates of the obtained 10 labeling points in the mouth image obtained in S3510 is determined. Subtracting the average value from the image coordinates of each of the obtained 10 labeling points in the mouth image obtained in the step S3510 to obtain the positions of the normalized 10 labeling points of the mouth.
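The normalization in S3511-4 and S3511-7 is a simple mean subtraction; a minimal sketch (the function name and array layout are assumptions):

import numpy as np

def normalize_landmarks(points, reference_points=None):
    """Subtract the mean of reference_points (e.g. the 10 eyeball points for an eye,
    or the 10 mouth points themselves) from every landmark, as described above.
    points: (N, 2) image coordinates of the landmarks to normalize."""
    ref = points if reference_points is None else reference_points
    center = np.asarray(ref, dtype=float).mean(axis=0)
    return np.asarray(points, dtype=float) - center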
S3511-8, obtaining three-dimensional positions corresponding to the world coordinate system of the head-mounted device based on image coordinates, camera imaging perspective principle and perspective relation of the normalized 42 annotation points of the left eye, the right eye and the mouth on the corresponding images.
Specifically, the above-mentioned S3511-8 is implemented by the following formula:
[X, Y, Z]^T = z · K^(-1) · [x, y, 1]^T
wherein K is the intrinsic parameter matrix of the camera corresponding to the annotation point; (x, y) is the normalized image coordinate of the corresponding annotation point; (X, Y, Z) is the three-dimensional position of the corresponding annotation point in the world coordinate system local to the head-mounted device; and z is the average distance from the camera to the face, which can be set according to practical experience, for example z = 15 cm.
It should be noted that the normalized image coordinates of the 42 annotation points of the left eye, the right eye and the mouth in the corresponding images may also be filtered using the image coordinates of the corresponding annotation points in historical frame images. In this way, jitter of the annotation points can be suppressed.
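A minimal sketch of the pinhole back-projection above (this yields the point in the corresponding camera's frame; since cameras C5/C6/C7 are rigidly fixed to the headset, the patent treats it as the position in the headset-local coordinate system — the function name and the 15 cm default are assumptions taken from the text):

import numpy as np

def backproject_landmark(x, y, K, z=0.15):
    """Back-project a normalized 2D landmark (x, y) to a 3D point using
    [X, Y, Z]^T = z · K^(-1) · [x, y, 1]^T.
    K: 3x3 intrinsic matrix of the corresponding camera (C5/C6/C7);
    z: assumed camera-to-face distance, about 15 cm in the embodiment."""
    ray = np.linalg.inv(K) @ np.array([x, y, 1.0])
    return z * ray        # (X, Y, Z)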
S3512, updating the virtual character according to the positions of the eyes and the mouth in the local world coordinate system.
In this embodiment, the virtual character is updated based on the positions of the eyes and mouth in the local world coordinate system so that the eyes and mouth of the virtual character are consistent with the eyes and mouth of the wearer. Since the expression of the human face is mainly expressed by the mouth and both eyes, the virtual character has a facial expression in accordance with the wearer on the basis of the above S3512.
In the embodiment, the virtual character has the facial expression consistent with the wearer, so that in multi-person interaction, each wearer can know the facial expressions of other wearers, and the immersion of the wearer in the virtual scene is improved.
On the basis of any of the above embodiments, the image processing method provided by the embodiment of the present invention further includes the following steps S3610 to S3612:
s3610, the image of the hand of the wearer is acquired.
In the present embodiment, since the wearer performs a series of interactions by holding the handle in the hand in the multi-person interaction, both hands of the wearer may appear in the visual field range of any of the 4 imaging devices provided on the head-mounted device that track the external environment. Based on this, the image of the external environment acquired by any of the 4 imaging devices tracking the external environment provided on the head mounted apparatus is taken as the image of the hand of the wearer.
S3611, determining the positions of the bone points of the two hands of the wearer in the local world coordinate system according to the hand images.
In the present embodiment, first, training of a hand detection model is performed. Based on this, the image of the hand is input to the hand detection model obtained by training, and then the area where the hand is located is output.
Then, training of a detection model of the two-dimensional bone points is carried out. Based on this, the image of the region where the hand is located is input to the trained detection model of the two-dimensional skeleton points, and then the position of the skeleton point of the hand in the image of the hand is obtained.
And finally, obtaining the position of the skeleton point of the hand under the local world coordinate system based on the position of the skeleton point of the hand in the image of the hand.
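A pipeline sketch of S3611 with hypothetical model callables (detect_hand and detect_skeleton_2d stand in for the trained models described above and are not real APIs; the fixed hand depth used to lift 2D points to 3D is an assumption, as the patent does not state how depth is obtained here):

import numpy as np

def hand_skeleton_world_positions(hand_image, detect_hand, detect_skeleton_2d, K, depth=0.5):
    """detect_hand(image) -> (x0, y0, x1, y1) bounding box of the hand region;
    detect_skeleton_2d(crop) -> (N, 2) 2D skeleton points inside the crop.
    The 2D points are lifted to 3D with the same pinhole back-projection used for the face."""
    x0, y0, x1, y1 = detect_hand(hand_image)
    crop = hand_image[y0:y1, x0:x1]
    pts_2d = detect_skeleton_2d(crop) + np.array([x0, y0])      # back to full-image coordinates
    K_inv = np.linalg.inv(K)
    pts_3d = [depth * (K_inv @ np.array([u, v, 1.0])) for u, v in pts_2d]
    return np.array(pts_3d)                                      # skeleton points in the camera frame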
S3612, updating the virtual character according to the position of the skeleton point in the local world coordinate system.
In the embodiment, the virtual character is updated according to the positions of the skeletal points in the local world coordinate system, so that the gestures of the two hands of the virtual character are consistent with the gestures of the wearer. Therefore, on the basis of S3612 described above, the virtual character has a gesture in accordance with the wearer.
In the embodiment, the virtual character has the gesture consistent with the gesture of the wearer, so that in multi-person interaction, each wearer can know the gestures of other wearers, accordingly, a scene of bare-hand interaction can be realized, and meanwhile, the immersion of the wearer in the virtual scene is improved.
< apparatus embodiment >
As shown in fig. 8, an embodiment of the present invention further provides an image processing apparatus 80. The apparatus 80 comprises an acquisition module 81, a determination module 82, a conversion module 83, a receiving module 84 and a rendering module 85. Wherein:
an obtaining module 81, configured to obtain positions of the head-mounted device, a handle matched with the head-mounted device, and two feet of a wearer wearing the head-mounted device in a local world coordinate system of the head-mounted device, respectively;
a determining module 82, configured to determine, according to the obtained location, a location of a virtual character matched with the wearer in the local world coordinate system;
the conversion module 83 is configured to convert the position of the virtual character in the local world coordinate system to a position in a reference world coordinate system, and send the position to other head-mounted devices in the interactive system;
a receiving module 84, configured to receive, from other head-mounted devices in the interactive system, positions of corresponding virtual characters in the reference world coordinate system;
and the rendering module 85 is configured to render a virtual scene according to the position of the corresponding virtual character in the reference coordinate system.
In one embodiment, the obtaining module 81 is specifically configured to: and estimating the positions of the feet of the wearer wearing the head-mounted equipment in the local world coordinate system according to the positions of the head-mounted equipment in the local world coordinate system.
In one embodiment, the obtaining module 81 is specifically configured to: determining a relative position between a handle matched with the head-mounted equipment and the head-mounted equipment according to an electromagnetic signal emitted by the handle and inertial data of the handle;
and determining the position of the handle in the local world coordinate system according to the relative position and the position of the head-mounted device in the local world coordinate system.
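A minimal sketch of the second half of this computation is given below, assuming the electromagnetic and inertial fusion has already produced the handle position in the head-mounted device's own frame; the rotation and translation of the head-mounted device in the local world coordinate system are taken from its tracking pose.

```python
# A sketch of composing the handle's relative position with the headset pose.
# `p_handle_in_hmd` is assumed to come from the electromagnetic/inertial fusion.
import numpy as np

def handle_position_in_local_world(p_handle_in_hmd, R_world_from_hmd, t_world_from_hmd):
    """Transform the handle position from the headset frame to the local world frame."""
    return R_world_from_hmd @ p_handle_in_hmd + t_world_from_hmd
```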
In one embodiment, the conversion module 83 is specifically configured to: converting the position of the virtual character in the local world coordinate system to the position in the reference world coordinate system by using a first conversion relation, a second conversion relation and a third conversion relation;
the first conversion relation is a conversion relation between a world coordinate system local to the head-mounted device and a camera coordinate system local to the head-mounted device, the second conversion relation is a conversion relation between the camera coordinate system local to the head-mounted device and the reference camera coordinate system, and the third conversion relation is a conversion relation between the reference camera coordinate system and the reference world coordinate system.
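Treating the three conversion relations as 4x4 homogeneous transforms, the chain can be written as in the sketch below. The matrix names are illustrative; the patent only specifies which pair of coordinate systems each relation connects.

```python
# A minimal sketch of chaining the first, second and third conversion relations.
import numpy as np

def to_reference_world(p_local_world,
                       T_localcam_from_localworld,   # first conversion relation
                       T_refcam_from_localcam,       # second conversion relation
                       T_refworld_from_refcam):      # third conversion relation
    p = np.append(p_local_world, 1.0)                # homogeneous coordinates
    p = T_localcam_from_localworld @ p
    p = T_refcam_from_localcam @ p
    p = T_refworld_from_refcam @ p
    return p[:3]
```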
In one embodiment, the conversion module 83 further includes an obtaining unit, and the obtaining unit is configured to:
acquiring a feature vector of each first feature point in a historical frame image acquired by the head-mounted device and a feature vector of each second feature point in a set space image acquired by the reference device, wherein the historical frame image belongs to the set space image;
determining a first preset number of feature point pairs according to the feature vectors of the first feature points and the feature vectors of the second feature points, wherein each feature point pair consists of one first feature point and the second feature point matched with it;
and determining the second conversion relation according to the position of the first feature point in each feature point pair in the camera coordinate system local to the head-mounted equipment and the position of the corresponding second feature point in the reference camera coordinate system.
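One possible realization of this matching-and-solving step is sketched below: feature descriptors are matched by nearest-neighbour distance, and a rigid transform between the matched 3D points is fitted with a Kabsch/Umeyama step. Both choices are illustrative; the patent does not prescribe a particular matcher or estimator.

```python
# Illustrative sketch of estimating the second conversion relation from feature pairs.
import numpy as np

def match_features(desc_hmd, desc_ref, top_k):
    """Greedy nearest-neighbour matching on descriptor distance, keeping the top_k pairs."""
    dists = np.linalg.norm(desc_hmd[:, None, :] - desc_ref[None, :, :], axis=2)
    pairs = [(i, int(np.argmin(dists[i])), dists[i].min()) for i in range(len(desc_hmd))]
    pairs.sort(key=lambda p: p[2])
    return [(i, j) for i, j, _ in pairs[:top_k]]

def fit_rigid_transform(p_hmd, p_ref):
    """Kabsch/Umeyama fit of R, t such that p_ref ≈ R @ p_hmd + t,
    where p_hmd are points in the headset camera frame and p_ref in the reference camera frame."""
    c_hmd, c_ref = p_hmd.mean(axis=0), p_ref.mean(axis=0)
    H = (p_hmd - c_hmd).T @ (p_ref - c_ref)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_ref - R @ c_hmd
    return R, t
```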
In one embodiment, the image processing apparatus 80 further comprises a first updating module, the first updating module is further configured to:
acquiring an image of the wearer's hand;
determining the positions of the skeleton points of the two hands of the wearer under the local world coordinate system according to the images of the hands;
and updating the virtual character according to the position of the skeleton point in the local world coordinate system.
In one embodiment, the image processing apparatus 80 further comprises a second updating module, the second updating module further configured to:
acquiring images of the eyes and the mouth of the wearer;
determining the positions of the eyes and the mouth under the local world coordinate system according to the images of the eyes and the mouth of the wearer;
and updating the virtual character according to the positions of the eyes and the mouth in the local world coordinate system.
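The patent only states that the eye and mouth information drives the virtual character; the sketch below shows one illustrative way to do so, mapping landmark-based openness ratios onto avatar blendshapes. The landmark detector, the ratios and the blendshape names are assumptions, not part of the disclosure.

```python
# An illustrative sketch of driving the avatar's face from eye and mouth images.
# `landmark_detector` and `avatar.set_blendshape` are hypothetical interfaces.
import numpy as np

def update_face(avatar, eye_image, mouth_image, landmark_detector):
    eye_pts = landmark_detector(eye_image)      # (K, 2) eyelid landmarks
    mouth_pts = landmark_detector(mouth_image)  # (M, 2) lip landmarks

    # Openness ratios: vertical extent normalised by horizontal extent.
    eye_open = np.ptp(eye_pts[:, 1]) / max(np.ptp(eye_pts[:, 0]), 1e-6)
    mouth_open = np.ptp(mouth_pts[:, 1]) / max(np.ptp(mouth_pts[:, 0]), 1e-6)

    # Map the ratios onto the avatar so its expression follows the wearer.
    avatar.set_blendshape("eye_blink", 1.0 - np.clip(eye_open / 0.3, 0.0, 1.0))
    avatar.set_blendshape("mouth_open", np.clip(mouth_open / 0.6, 0.0, 1.0))
```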
< device embodiment >
As shown in fig. 9, an embodiment of the present invention further provides a head-mounted device 90, which includes the image processing apparatus 80 shown in fig. 8.
Alternatively, the head-mounted device 90 includes a memory 91 and a processor 92, wherein:
the memory 91 is adapted to store executable instructions for controlling the processor 92 to perform the image processing method according to any of the above method embodiments.
In this embodiment, the head-mounted device 90 may be a VR head-mounted display device, or an MR head-mounted display device.
< storage medium embodiment >
An embodiment of the present invention further provides a computer-readable storage medium, where computer instructions are stored, and when the computer instructions in the storage medium are executed by a processor, the image processing method according to any one of the above method embodiments is implemented.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, which can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, by software, and by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the techniques in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (10)

1. An image processing method applied to any head-mounted device in an interactive system comprises the following steps:
acquiring positions of the head-mounted equipment, a handle matched with the head-mounted equipment and feet of a wearer wearing the head-mounted equipment under a local world coordinate system of the head-mounted equipment respectively;
determining the position of a virtual character matched with the wearer in the local world coordinate system according to the acquired position;
converting the position of the virtual character in the local world coordinate system to a position in a reference world coordinate system, and sending the position to other head-mounted equipment in the interactive system;
receiving the positions of the corresponding virtual characters under the reference world coordinate system from other head-mounted devices in the interactive system;
and rendering a virtual scene according to the position of the corresponding virtual character under the reference coordinate system.
2. The method of claim 1, wherein obtaining the positions of the wearer's feet in a world coordinate system local to the head-mounted device comprises:
and estimating the positions of the feet of the wearer wearing the head-mounted equipment in the local world coordinate system according to the positions of the head-mounted equipment in the local world coordinate system.
3. The method of claim 1, wherein obtaining a position of a handle matched with the head-mounted device in a world coordinate system local to the head-mounted device comprises:
determining a relative position between a handle matched with the head-mounted equipment and the head-mounted equipment according to an electromagnetic signal emitted by the handle and inertial data of the handle;
and determining the position of the handle in the local world coordinate system according to the relative position and the position of the head-mounted device in the local world coordinate system.
4. The method of claim 1, wherein converting the position of the virtual character in the local world coordinate system to a position in a reference world coordinate system comprises:
converting the position of the virtual character in the local world coordinate system to the position in the reference world coordinate system by using a first conversion relation, a second conversion relation and a third conversion relation;
the first conversion relation is a conversion relation between a world coordinate system local to the head-mounted device and a camera coordinate system local to the head-mounted device, the second conversion relation is a conversion relation between the camera coordinate system local to the head-mounted device and the reference camera coordinate system, and the third conversion relation is a conversion relation between the reference camera coordinate system and the reference world coordinate system.
5. The method of claim 4, further comprising the step of obtaining the second translation relationship, comprising:
acquiring a feature vector of each first feature point in a historical frame image acquired by the head-mounted device and a feature vector of each second feature point in a set space image acquired by the reference device, wherein the historical frame image belongs to the set space image;
determining a first preset number of feature point pairs according to the feature vectors of the first feature points and the feature vectors of the second feature points, wherein each feature point pair consists of one first feature point and the second feature point matched with it;
and determining the second conversion relation according to the position of the first feature point in each feature point pair in the camera coordinate system local to the head-mounted equipment and the position of the corresponding second feature point in the reference camera coordinate system.
6. The method of claim 1, further comprising:
acquiring an image of the wearer's hand;
determining the positions of the skeleton points of the two hands of the wearer under the local world coordinate system according to the images of the hands;
and updating the virtual character according to the position of the skeleton point in the local world coordinate system.
7. The method of claim 1, further comprising:
acquiring images of the eyes and the mouth of the wearer;
determining the positions of the eyes and the mouth under the local world coordinate system according to the images of the eyes and the mouth of the wearer;
and updating the virtual character according to the positions of the eyes and the mouth in the local world coordinate system.
8. An image processing apparatus characterized by comprising:
the acquisition module is used for acquiring the positions of the head-mounted equipment, a handle matched with the head-mounted equipment and the positions of the feet of a wearer wearing the head-mounted equipment under a local world coordinate system of the head-mounted equipment respectively;
the determining module is used for determining the position of the virtual character matched with the wearer in the local world coordinate system according to the acquired position;
the conversion module is used for converting the position of the virtual character in the local world coordinate system to the position in the reference world coordinate system and sending the position to other head-mounted equipment in the interactive system;
the receiving module is used for receiving the positions of the corresponding virtual characters under the reference world coordinate system from other head-mounted equipment in the interactive system;
and the rendering module is used for rendering the virtual scene according to the position of the corresponding virtual character under the reference coordinate system.
9. A headset comprising the apparatus of claim 8;
or, a memory and a processor, wherein:
the memory is to store executable instructions to control the processor to perform the method of any one of claims 1-7.
10. A computer-readable storage medium, wherein the storage medium stores computer instructions, which when executed by a processor, implement the method of any one of claims 1-7.
CN202010607638.7A 2020-06-29 2020-06-29 Image processing method, device, equipment and storage medium Pending CN111966213A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010607638.7A CN111966213A (en) 2020-06-29 2020-06-29 Image processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010607638.7A CN111966213A (en) 2020-06-29 2020-06-29 Image processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111966213A true CN111966213A (en) 2020-11-20

Family

ID=73361038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010607638.7A Pending CN111966213A (en) 2020-06-29 2020-06-29 Image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111966213A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100825859B1 (en) * 2007-02-15 2008-04-28 한국과학기술연구원 Indirect object pose estimation method with respect to user-wearable camera using multi-view camera system
CN107533233A (en) * 2015-03-05 2018-01-02 奇跃公司 System and method for augmented reality
US20180028861A1 (en) * 2015-03-11 2018-02-01 Sony Corporation Information processing device and information processing method
US20200126284A1 (en) * 2015-09-21 2020-04-23 TuringSense Inc. Motion control based on artificial intelligence
CN107209959A (en) * 2016-01-15 2017-09-26 株式会社meleap Image display system, the control method of image display system, image distribution system and head mounted display
US20190025595A1 (en) * 2016-01-15 2019-01-24 Meleap Inc. Image display system, method for controlling image display system, image distribution system and head-mounted display
CN106648116A (en) * 2017-01-22 2017-05-10 隋文涛 Virtual reality integrated system based on action capture
CN110119194A (en) * 2018-02-06 2019-08-13 广东虚拟现实科技有限公司 Virtual scene processing method, device, interactive system, head-wearing display device, visual interactive device and computer-readable medium
CN109636916A (en) * 2018-07-17 2019-04-16 北京理工大学 A kind of a wide range of virtual reality roaming system and method for dynamic calibration
CN109613983A (en) * 2018-12-26 2019-04-12 青岛小鸟看看科技有限公司 It wears the localization method of handle in display system, device and wears display system
CN109655789A (en) * 2018-12-26 2019-04-19 青岛小鸟看看科技有限公司 One kind wearing display system and its space orientation follow-up mechanism, method
CN109782911A (en) * 2018-12-30 2019-05-21 广州嘉影软件有限公司 Double method for catching and system based on virtual reality
CN110579739A (en) * 2019-08-15 2019-12-17 青岛小鸟看看科技有限公司 head-mounted display device, positioning method and positioning system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936121A (en) * 2021-10-15 2022-01-14 杭州灵伴科技有限公司 AR (augmented reality) label setting method and remote collaboration system
CN113936121B (en) * 2021-10-15 2023-10-13 杭州灵伴科技有限公司 AR label setting method and remote collaboration system

Similar Documents

Publication Publication Date Title
US11796309B2 (en) Information processing apparatus, information processing method, and recording medium
CN111694429A (en) Virtual object driving method and device, electronic equipment and readable storage
JP5936155B2 (en) 3D user interface device and 3D operation method
US11217024B2 (en) Artificial reality system with varifocal display of artificial reality content
JP5871345B2 (en) 3D user interface device and 3D operation method
JP5843340B2 (en) 3D environment sharing system and 3D environment sharing method
CN109671141B (en) Image rendering method and device, storage medium and electronic device
US20200341284A1 (en) Information processing apparatus, information processing method, and recording medium
US11514604B2 (en) Information processing device and information processing method
KR101638550B1 (en) Virtual Reality System using of Mixed reality, and thereof implementation method
WO2017061890A1 (en) Wireless full body motion control sensor
JP5731462B2 (en) Video communication system and video communication method
CN111966213A (en) Image processing method, device, equipment and storage medium
JP6534972B2 (en) Image display apparatus, image display method and image display program
US20210400234A1 (en) Information processing apparatus, information processing method, and program
JP5759439B2 (en) Video communication system and video communication method
WO2016185634A1 (en) Information processing device
KR101802308B1 (en) Device for 3D augmented reality display based on user's vision and system for providing plural users with 3D augmented reality at same time
JP2021131490A (en) Information processing device, information processing method, and program
JP7488210B2 (en) Terminal and program
KR20230081693A (en) Device for Providing Augmented Reality and Method for operating the same
JP2021131741A (en) Ar display control device, program therefor, and ar display system
Tsai Geometry-aware augmented reality for remote collaboration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination