CN110119194A - Virtual scene processing method, device, interactive system, head-mounted display device, visual interaction device and computer-readable medium - Google Patents

Virtual scene processing method, device, interactive system, head-mounted display device, visual interaction device and computer-readable medium

Info

Publication number
CN110119194A
Authority
CN
China
Prior art keywords
marker
visual interaction
image
visual
interaction device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810119323.0A
Other languages
Chinese (zh)
Inventor
胡永涛
戴景文
贺杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Virtual Reality Technology Co Ltd
Original Assignee
Guangdong Virtual Reality Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Virtual Reality Technology Co Ltd filed Critical Guangdong Virtual Reality Technology Co Ltd
Priority to CN201810119323.0A priority Critical patent/CN110119194A/en
Priority to PCT/CN2019/073578 priority patent/WO2019154169A1/en
Publication of CN110119194A publication Critical patent/CN110119194A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/187Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30204Marker
    • G06T2207/30208Marker matrix

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the invention provide a virtual scene processing method, an apparatus, an interactive system, a head-mounted display device, a visual interaction device and a computer-readable medium, belonging to the technical field of image processing. The method comprises: the processor acquires a target image of the visual interaction device captured by the image acquisition device; determines the position and rotation information between the visual interaction device and the image acquisition device according to the target image; determines a virtual scene corresponding to the visual interaction device according to the position and rotation information; and sends the display content to the display device, instructing the display device to display it. The display content is superimposed on the external real scene as seen by the human eye, realizing an augmented reality effect.

Description

Virtual scene processing method, device, interactive system, head-mounted display device, visual interactive device and computer readable medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a virtual scene processing method, a virtual scene processing apparatus, an interactive system, a head-mounted display apparatus, a visual interaction apparatus, and a computer-readable medium.
Background
In recent years, with the progress of science and technology, technologies such as Augmented Reality (AR) and Virtual Reality (VR) have become hot spots of research at home and abroad. Taking augmented reality as an example, augmented reality is a technique for increasing the user's perception of the real world through information provided by a computer system, which overlays computer-generated virtual objects, scenes, or system cues into a real scene to enhance or modify the perception of the real world environment or data representing the real world environment.
In interactive systems such as virtual reality systems and augmented reality systems, a target object needs to be identified and tracked. Existing identification and tracking methods are usually implemented with magnetic sensors, optical sensors, ultrasonic waves, inertial sensors, or image processing of the target object, but their tracking performance is often not ideal: magnetic sensors, optical sensors and ultrasonic waves are generally strongly affected by the environment, while inertial sensors impose extremely high precision requirements. The market therefore urgently needs a brand-new identification method that achieves low-cost, high-precision interaction, and image processing of the target object, as an important identification and tracking technique, needs a complete and effective solution.
Disclosure of Invention
The invention provides a virtual scene processing method, a virtual scene processing apparatus, an interactive system, a head-mounted display device, a visual interaction device and a computer-readable medium, in order to overcome the above drawbacks.
In a first aspect, an embodiment of the present invention provides a virtual scene processing method, which is applied to an interactive system, where the system includes a visual interaction device and an image acquisition device. The method comprises the following steps: acquiring a target image with a marker acquired by the image acquisition device, wherein the visual interaction device is positioned in a real scene; determining attitude information of the visual interaction device in the real scene according to the target image; and determining a virtual scene corresponding to the visual interaction device according to the attitude information.
In a second aspect, an embodiment of the present invention further provides a virtual scene processing apparatus, including: the device comprises an acquisition unit, a first calculation unit and a second calculation unit. The acquisition unit is used for acquiring a target image with a marker acquired by the image acquisition device, and the visual interaction device is positioned in a real scene; a first computing unit, configured to determine, according to the target image, pose information of the visual interaction apparatus within the real scene; and the second computing unit is used for determining a virtual scene corresponding to the visual interaction device according to the posture information.
In a third aspect, an embodiment of the present invention further provides an interactive system, including: the system comprises a head-mounted display device and a visual interaction device with a marker, wherein the head-mounted display device is used for acquiring a target image with the marker, and the visual interaction device is positioned in a real scene; determining attitude information of the visual interaction device in the real scene according to the target image; and determining a virtual scene corresponding to the visual interaction device according to the attitude information.
In a fourth aspect, an embodiment of the present invention further provides a head-mounted display apparatus, applied to an interactive system that includes a visual interaction apparatus. The head-mounted display apparatus includes a display device, an image acquisition device and an optical display device. The image acquisition device is used for acquiring a scene image of the real scene and sending it to a processor, where the scene image includes a target image of the visual interaction apparatus located in the real scene. The processor is used for determining the posture information of the visual interaction apparatus in the real scene according to the target image and determining a virtual scene corresponding to the visual interaction apparatus according to the posture information; the display device is used for displaying the virtual scene.
In a fifth aspect, an embodiment of the present invention further provides a visual interaction apparatus, which is applied to an interaction system, where the interaction system further includes a head-mounted display apparatus; the visual interaction device is provided with markers: the marker is used for determining the posture information of the visual interaction device in the real scene for the head-mounted display device, so that the head-mounted display device determines the virtual scene according to the posture information.
In a sixth aspect, the present invention also provides a computer readable medium having a program code executable by a processor, where the program code causes the processor to execute the above method.
According to the virtual scene processing method, the virtual scene processing apparatus, the interactive system, the head-mounted display device, the visual interaction device and the computer-readable medium above, after the target image of the visual interaction device's marker captured by the image acquisition device is obtained, the posture information of the visual interaction device in the real scene is determined from the target image. A virtual scene corresponding to the posture information is then acquired, so that the virtual scene is seen by the human eye, realizing the visual effect of augmented reality or virtual reality.
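To make the flow concrete, the following is a minimal sketch of the claimed three-step method; estimate_pose and select_scene are hypothetical placeholders for steps S402 and S403 detailed below, not APIs defined by the patent:

```python
from typing import Any, Callable

import numpy as np


def process_virtual_scene(target_image: np.ndarray,
                          estimate_pose: Callable[[np.ndarray], Any],  # S402
                          select_scene: Callable[[Any], Any]) -> Any:  # S403
    """Sketch of the first-aspect method: the target image containing the
    marker has already been acquired by the image acquisition device (S401)."""
    posture = estimate_pose(target_image)  # position/rotation in the real scene
    return select_scene(posture)           # virtual scene matching the posture
```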
Additional features and advantages of embodiments of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of embodiments of the invention. The objectives and other advantages of the embodiments of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
FIG. 1 is a schematic diagram illustrating an identification tracking system according to an embodiment of the present invention;
FIG. 2 shows a schematic view of a marker provided by an embodiment of the present invention;
FIG. 3 shows a schematic view of a marker provided by another embodiment of the present invention;
FIG. 4 is a flowchart illustrating a virtual scene processing method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a visual interaction device provided by a first embodiment of the invention;
FIG. 6 is a schematic diagram of another perspective of the visual interaction device shown in FIG. 5;
FIG. 7 is a schematic structural diagram of a visual interaction device provided by a second embodiment of the invention;
FIG. 8 shows a schematic view of a marker provided in accordance with another embodiment of the present invention;
FIG. 9 is a schematic diagram illustrating a user's field of view provided by an embodiment of the present invention;
FIG. 10 is a diagram illustrating a virtual scene provided by an embodiment of the invention;
FIG. 11 is a schematic diagram illustrating a change in a virtual scene provided by an embodiment of the invention;
FIG. 12 is a diagram illustrating a change in a virtual scene according to another embodiment of the present invention;
FIG. 13 shows a block diagram of a virtual scene processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Referring to FIG. 1, a recognition and tracking system according to an embodiment of the present invention is shown. The recognition and tracking system 10 includes a head-mounted display device 100 and a visual interaction device.
The visual interaction device comprises a first background and at least one marker distributed over the first background according to a certain rule. The marker comprises a second background and a plurality of sub-markers distributed on the second background according to a specific rule, wherein each sub-marker has one or more characteristic points. The first background and the second background have a certain degree of distinction, for example, the first background may be black, and the second background may be white. In the present embodiment, since the distribution rule of the sub-markers in each marker is different, the images corresponding to each marker are different from each other.
The sub-marker is a pattern with a certain shape, and the color of the sub-marker has a certain degree of distinction from the second background in the marker, for example, the second background is white, and the color of the sub-marker is black. The sub-markers may be formed by one or more feature points, and the shape of the feature points is not limited, and may be dots, circles, triangles or other shapes.
In one embodiment, as shown in FIG. 2, the marker 210 includes a plurality of sub-markers 220, and each sub-marker 220 is composed of one or more feature points 221; each white circular pattern in FIG. 2 is a feature point 221. The outline of the marker 210 is rectangular, but the marker may take other shapes, which are not limited here; in FIG. 2, a white rectangular area and the plurality of sub-markers within it constitute one marker.
As another embodiment, as shown in FIG. 3, a plurality of sub-markers 340 are included in the marker 310, and each sub-marker 340 is composed of one or more feature points 341; here, a plurality of black dots form one sub-marker 340. Specifically, in FIG. 3, each white circular pattern and each black circular pattern is a feature point 341.
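The hierarchy described above (marker → sub-markers → feature points) maps onto a small data model; a sketch with illustrative field names, not taken from the patent:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubMarker:
    """A pattern (dot, ring, triangle, ...) made of one or more feature
    points; illustrative (x, y) positions on the marker surface."""
    feature_points: List[Tuple[float, float]]


@dataclass
class Marker:
    """A second background carrying sub-markers; markers are told apart
    by the distribution rule of their sub-markers."""
    marker_id: int
    sub_markers: List[SubMarker]
```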
Specifically, the visual interaction devices include planar marker objects and multi-face marker structures. The planar marker objects include a first marker plate 200 and a second marker plate 500, and the multi-face marker structures include a six-sided marker structure 400 and a twenty-six-sided marker structure 300; marker structures with other numbers of faces are also possible and are not listed here.
The first marker plate 200 is provided with a plurality of markers whose contents differ from one another, and the markers on the first marker plate 200 are arranged on the same plane; that is, the first marker plate 200 has a marker surface, all markers are arranged on this marker surface, and all feature points of the first marker plate 200 lie on it. The second marker plate 500 is provided with one marker, and all feature points on the second marker plate 500 lie on its marker surface. There may be a plurality of second marker plates 500, with the marker content of each differing from the others, and multiple second marker plates 500 may be used in combination, for example in application fields such as augmented reality and virtual reality corresponding to the recognition and tracking system 10.
A multi-face marker structure includes a plurality of marker faces, and markers are disposed on at least two non-coplanar marker faces. As shown in FIG. 1, the multi-face marker structures include the six-sided marker structure 400 and the twenty-six-sided marker structure 300; the six-sided marker structure 400 includes 6 marker faces, each provided with a marker, and the marker patterns on the faces differ from one another.
The twenty-six-sided marker structure 300 has twenty-six faces, of which 17 are marker faces; each marker face is provided with a marker, and the marker patterns on the faces differ from one another. Of course, the total number of faces of a multi-face marker structure, which of its faces are marker faces, and the arrangement of the markers may be set according to actual use and are not limited herein.
It should be noted that the visual interaction device is not limited to the above planar marker objects and multi-face marker structures; the visual interaction device may be any carrier having a marker, and the carrier may be chosen according to the actual scene, such as a model gun like a toy gun or a game gun. With the corresponding marker set on a visual interaction device such as a model gun, the position and rotation information of the model gun can be obtained by identifying and tracking its marker, and a user can perform game operations in a virtual scene by holding the model gun, thereby achieving an augmented reality effect.
Head mounted display device 100 includes a housing (not identified), an image capture device 110, a processor 140, a display device 120, an optical assembly 130, and an illumination device 150.
The visual odometry camera 160, the display device 120 and the image acquisition device 110 are all electrically connected to the processor. In some embodiments, the illumination device 150 and the image capturing device 110 are disposed in the housing and covered by a filter (not labeled) that can filter out ambient light and other interfering light; for example, if the illumination device 150 emits infrared light, the filter may be an element that filters out all light other than infrared light.
The image capturing device 110 is used for capturing an image of the object to be photographed and sending it to the processor. Specifically, an image including at least one of the marker plates or multi-face marker structures is acquired and sent to the processor. In one embodiment, the image capturing device 110 is a monocular near-infrared imaging camera. In the present embodiment, a monocular camera using infrared reception has low cost, does not need the extrinsic calibration required between binocular cameras, has low power consumption, and achieves a higher frame rate under the same bandwidth.
The processor 140 is configured to output corresponding display content to the display device 120 according to the image, and is further configured to perform operations of identifying and tracking the visual interaction device.
Processor 140 may include any suitable type of general or special purpose microprocessor, digital signal processor, or microcontroller. The processor 140 may be configured to receive data and/or signals from various components of the system via, for example, a network. The processor 140 may also process the data and/or signals to determine one or more operating conditions in the system. For example, when the processor 140 is applied to a head-mounted display device, the processor generates image data of a virtual world from pre-stored image data, transmits it to the display device, and displays it through the optical components; the image data sent by the intelligent terminal or the computer can be received through a wired or wireless network, the image of the virtual world is generated according to the received image data, and the image is displayed through the optical assembly; and the corresponding display content in the virtual world can be determined by carrying out identification tracking operation according to the image acquired by the image acquisition device, and the display content is sent to the display device and displayed through the optical assembly. It is understood that the processor 140 is not limited to being disposed within the head-mounted display device.
In some embodiments, the head-mounted display device 100 further includes a visual odometry camera 160 disposed on the housing. The visual odometry camera 160 is electrically connected to the processor and is configured to capture images of the external real scene and transmit them to the processor. When the user wears the head-mounted display device 100, the processor obtains the position and rotation relationship between the user's head and the real scene from the scene images acquired by the visual odometry camera 160 using visual odometry: the system processes the image sequence acquired by the camera through feature extraction, feature matching and tracking, and motion estimation to obtain the specific position and orientation changes, completing navigation and positioning, and thereby obtains the relative position and rotation relationship between the head-mounted display device and the real scene. Then, from the position and rotation information of the visual interaction device relative to the head-mounted display device, the relative position and rotation relationship between the visual interaction device and the real scene can be calculated, enabling more complex interaction forms and experiences.
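The last step is a composition of rigid transforms. A minimal numpy-only sketch of chaining the headset-in-world pose from visual odometry with the device-in-headset pose from marker tracking (all variable names here are illustrative, not from the patent):

```python
import numpy as np


def make_transform(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t.ravel()
    return T


def device_pose_in_world(T_world_headset: np.ndarray,
                         T_headset_device: np.ndarray) -> np.ndarray:
    # T_world_headset: headset pose in the real scene (visual odometry).
    # T_headset_device: interaction device pose relative to the headset
    #                   (marker tracking). Composition gives the device's
    #                   position and rotation in the real scene.
    return T_world_headset @ T_headset_device
```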
The display device 120 is used for displaying the display content. In some embodiments, the display device may be part of a smart terminal, i.e., the display screen of a smart terminal such as a mobile phone or tablet computer. In other embodiments, the display device may also be a stand-alone display (e.g., LED, OLED or LCD) fixedly mounted on the housing.
When the display device 120 is the display screen of a smart terminal, a mounting structure for the smart terminal is provided on the housing, and in use the smart terminal is mounted on the housing through that structure. The processor 140 may be the processor of the smart terminal, or a processor separately disposed in the housing and electrically connected to the smart terminal through a data line or a communication interface.
The optical assembly 130 is used for emitting the incident light emitted from the light emitting surface of the display device 120 to a predetermined position. Wherein the preset positions are observation positions of two eyes of the user.
The lighting device 150 is used for providing light for the image acquisition device 110 to acquire an image of an object to be photographed. Specifically, the illumination angle of the illumination device 150 and the number of the illumination devices 150 may be set according to actual use so that the emitted illumination light can cover the object to be photographed. The illuminating device 150 is an infrared illuminating device capable of emitting infrared light, and the image capturing device is a near-infrared camera capable of receiving infrared light. By means of active illumination, the image quality of the target image captured by the image capturing device 110 is improved, and specifically, the number of the illumination devices 150 is not limited, and may be one or multiple. In some embodiments, the illumination device 150 is disposed adjacent to the image capture device 110, wherein a plurality of illumination devices 150 may be circumferentially disposed adjacent to a camera of the image capture device 110.
When a user wears the head-mounted display device 100 and enters a preset virtual scene, and the visual interaction device is within the field of view of the image acquisition device 110, the image acquisition device 110 acquires a target image containing the visual interaction device. The processor 140 obtains the target image and the related information, performs computation to identify the visual interaction device, obtains the position and rotation relationship between the marker in the target image and the image acquisition device, and from this obtains the position and rotation of the visual interaction device relative to the head-mounted display device, so that the virtual scene viewed by the user appears at the corresponding position and rotation angle. The user may further generate new virtual images in the virtual scene by combining several visual interaction devices, bringing a better experience, and may also interact with the virtual scene through the visual interaction device. In addition, the recognition and tracking system can obtain the position and rotation relationship between the head-mounted display device and the real scene through the visual odometry camera, and from it the position and rotation relationship between the visual interaction device and the real scene; when the virtual scene corresponding to the visual interaction device has a certain correspondence with the real scene, a virtual scene similar to the real scene can be constructed, giving a more realistic augmented reality experience.
Given the above recognition and tracking system applicable to virtual reality systems and augmented reality systems, an embodiment of the present invention provides an image processing method for tracking and positioning a visual interaction device and realizing an augmented reality visual effect. Specifically, referring to FIG. 4, a virtual scene processing method is shown. The method is applied to an interactive system, which may be a virtual reality system or an augmented reality system; in this embodiment the interactive system may be the recognition and tracking system 10 shown in FIG. 1, with the processor as the execution subject. The method includes steps S401 to S403.
S401: and acquiring a target image with a marker acquired by the image acquisition device, wherein the visual interaction device is positioned in a real scene.
Wherein the visual interaction device is located in a real scene. The target image is an image which is acquired by the image acquisition device and is provided with the visual interaction device, and the target image comprises the marker of the visual interaction device.
As an embodiment, the visual interaction device may be the multi-face marker structure described above. Referring to FIGS. 5 and 6, the visual interaction device has a marker 101 that an external image capturing device, such as the image capturing device 110 described above, can identify and track.
In the present embodiment, the visual interaction device includes a device body 10 and a handle 30 connected to the device body 10. In some embodiments, the handle 30 is provided with a connection portion (not shown) to which the device body 10 is connected.
The device main body 10 is provided with a marker 101, and an external image acquisition device acquires information carried by the visual interaction device by acquiring an image with the marker 101, so as to acquire identity information and position and posture information of the visual interaction device, thereby realizing identification or/and tracking of the visual interaction device. The marker is used for determining the posture information of the visual interaction device in the real scene for the head-mounted display device, so that the head-mounted display device determines the virtual scene according to the posture information.
The specific configuration of the device body 10 is not limited. Specifically, in the embodiment shown in FIGS. 5 and 6, the device body 10 is a twenty-six-faced polyhedron including eighteen square faces and eight triangular faces.
Further, the device body 10 includes a first surface 12 and a second surface 14, the second surface 14 being non-coplanar with the first surface 12; specifically, the normal direction of the first surface 12 differs from the normal direction of the second surface 14. The first surface 12 is provided with a first marker 121, and the second surface 14 is provided with a second marker 141 different from the first marker 121. With the first marker 121 and the second marker 141 so arranged, the image capturing device recognizes either or both of them, confirms that the target object carrying them is the visual interaction device, and acquires the position and posture information of the visual interaction device, so as to identify and track it.
It is understood that the positional relationship between the first surface 12 and the second surface 14 is not limited, for example, the first surface 12 and the second surface 14 may be disposed adjacently, or the first surface 12 and the second surface 14 may be disposed at an interval, or the first surface 12 and the second surface 14 may be any two of eighteen square surfaces and eight triangular surfaces, and is not limited to the description in this specification.
It will be appreciated that in other embodiments, the device body 10 may further include any one or more of a third surface, a fourth surface, a fifth surface … …, and a twenty-sixth surface (none shown), and accordingly, corresponding markers 101 may be disposed on these surfaces, and the markers 101 on the various surfaces may be different.
As another embodiment, the visual interaction device may also be the first marker plate described above. Referring to FIG. 7, FIG. 7 is a schematic structural diagram of a first marker plate 200 according to an embodiment of the present invention. The first marker plate 200 includes a base layer 240 and one or more markers 210 disposed on the base layer 240; when a plurality of markers 210 are provided, they are distributed over the base layer 240.
The base layer 240 may be made of a soft or a hard material. When made of a soft material, the base layer 240 may be cloth, plastic, or the like; when made of a hard material, it may be a metal material, an alloy material, or the like. Further, the base layer 240 may be provided with folding portions so that the first marker plate 200 can be folded for storage. As one embodiment, the first marker plate 200 is provided with two mutually perpendicular folding portions that divide it equally into four regions; folding along the two folding portions stacks the first marker plate 200 down to the size of one region.
The shape of the base layer 240 is not limited, and may be, for example, a circle, a triangle, a square, a rectangle, an irregular polygon, or the like. In one embodiment, the base layer 240 is square, and the size of the base layer 240 may be set differently according to actual needs, which is not limited herein.
S402: and determining the posture information of the visual interaction device in the real scene according to the target image.
The posture information of the visual interaction device in the real scene includes information such as its position and rotation angle in the real scene; specifically, the posture information is the position and rotation information between the visual interaction device and the image acquisition device. The number of visual interaction devices in the acquired target image may be one or more; when there are several visual interaction devices in the acquired target image, the posture information between each of them and the image acquisition device is acquired.
Specifically, the visual interaction device is provided with one or more markers, each marker comprises a plurality of mutually separated sub-markers, and each sub-marker comprises one or more feature points. The posture information between the visual interaction device and the image acquisition device is determined from the target image as follows: identify the marker in the target image according to the enclosing relationships among the connected domains in the binarized target image and the enclosing relationships among the connected domains in a plurality of pre-stored preset marker models, so as to obtain the pre-stored information of the marker; judge from the pre-stored information whether the marker in the target image is a planar marker (a marker arranged on a planar marker object) or a three-dimensional marker (a marker arranged on a multi-face marker structure); if the marker is a planar marker, or is a three-dimensional marker lying entirely in one plane, acquire the posture information between the visual interaction device and the image acquisition device from the feature points in the target image using a planar positioning and tracking method; if the marker is a three-dimensional marker that does not lie in one plane, acquire the posture information from the feature points in the target image using a three-dimensional positioning and tracking method.
The image is binarized as follows: for each current frame (other than the first frame) in a sequence of consecutive target images, a first threshold image is acquired, which is a grayscale image obtained by processing the previous frame and having the same resolution as the current frame; then, for each pixel of the current frame, the pixel at the corresponding position in the first threshold image is used as the binarization threshold to binarize the current frame.
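A minimal numpy sketch of this per-pixel thresholding, assuming the previous frame has already been processed into a threshold image (how that image is produced is not detailed here):

```python
import numpy as np


def binarize_with_threshold_image(gray_frame: np.ndarray,
                                  threshold_image: np.ndarray) -> np.ndarray:
    """Binarize the current frame pixel-by-pixel against the first threshold
    image derived from the previous frame (same resolution, grayscale)."""
    assert gray_frame.shape == threshold_image.shape
    return (gray_frame > threshold_image).astype(np.uint8) * 255
```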
The pre-stored information of the marker includes various information for identifying and tracking the visual interaction device, such as: the physical coordinates of the marker; whether the visual interaction device carrying the marker is a planar marker object or a multi-face marker structure (a marker on a planar marker object is a planar marker, and a marker on a multi-face marker structure is a three-dimensional marker); whether a given three-dimensional marker is arranged on one plane of the multi-face marker structure or spans different planes (a three-dimensional marker spanning different planes is usually arranged on two adjacent, non-coplanar faces of the structure); whether different three-dimensional markers lie on the same plane; and the like.
Therefore, whether each marker in the visual interaction device is a planar marker or a three-dimensional marker can be judged according to the pre-stored information.
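A compact sketch of the planar/three-dimensional dispatch that follows from this judgment; the Marker fields and tracker callables are illustrative stand-ins for the patent's steps, not a real API:

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class IdentifiedMarker:
    is_planar: bool            # marker on a planar marker object
    features_coplanar: bool    # three-dimensional marker lying in one plane
    feature_points: Sequence   # feature points detected in the target image


def estimate_device_pose(marker: IdentifiedMarker,
                         planar_track: Callable[[Sequence], Any],
                         stereo_track: Callable[[Sequence], Any]) -> Any:
    """Dispatch described in S402: use the planar method when all feature
    points are coplanar, otherwise the three-dimensional method."""
    if marker.is_planar or marker.features_coplanar:
        return planar_track(marker.feature_points)   # homography-based, below
    return stereo_track(marker.feature_points)
```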
Specifically, connected domains in the pre-stored preset marker models respectively include a first connected domain, a second connected domain, a third connected domain and a fourth connected domain, where an enclosing relationship between the connected domains in the pre-stored preset marker models includes: the first connected domain surrounds one or more second connected domains, the number of third connected domains surrounded by each second connected domain, and the number of fourth connected domains surrounded by each third connected domain.
The manner of identifying the marker in the target image is as follows:
determine a first connected domain, whose color is the first color, which encloses a connected domain of the second color, and which is not itself enclosed by any connected domain of the second color; determine the connected domains of the second color enclosed by the first connected domain as second connected domains; determine the connected domains of the first color enclosed by a second connected domain as third connected domains; and determine the connected domains of the second color enclosed by a third connected domain as fourth connected domains.
Then acquire the number of third connected domains enclosed in each second connected domain and the number of fourth connected domains enclosed in each third connected domain.
For each second connected domain in the target image, determine the corresponding second connected domain in the enclosing relationships of the preset marker models, and take the pre-stored information of that corresponding domain as the pre-stored information of the second connected domain in the target image, thereby obtaining the pre-stored information of the marker it corresponds to. Two second connected domains correspond to each other when they enclose the same number of third connected domains and the numbers of fourth connected domains enclosed by their respective third connected domains correspond one to one.
As shown in fig. 8, in the target image, a surrounding relationship is formed among the first background, the second background, and the mark pattern, that is, the above-described feature point. The first background surrounds the second background, the second background surrounds the mark figure, and if the mark figure is a hollow figure, the mark figure also surrounds a hollow part in the mark figure. That is, the connected domains corresponding to the first background, the second background and the mark pattern have a surrounding relationship, and the connected domain corresponding to the mark pattern and the connected domain corresponding to the hollow portion also have a surrounding relationship. Therefore, the enclosing relationship between connected domains in the target image can be acquired. The connected domain refers to an image region which is formed by pixel points with the same pixel value and adjacent positions in an image.
Specifically, the target image may first be processed so that the connected domains corresponding to the first background, the second background and the marker pattern can be distinguished in it. As a specific embodiment, the target image may be processed into a binarized image, where the binarization threshold can be set flexibly according to the brightness characteristics of the markers, or an adaptive-threshold binarization method may be used, so that the regions between the markers and the marker patterns become a first color while the parts of the marker other than the sub-markers 220 become a second color. That is, the part of the target image corresponding to the first background is processed into the first color, the part corresponding to the marker pattern into the first color, the second background into the second color, and, if the marker pattern is a hollow pattern such as a ring, the hollow part into the second color. The first color and the second color may be two colors whose pixel values differ widely, for example black as the first color and white as the second color.
Accordingly, the connected domains corresponding to the respective portions can be determined from the binarized image. The first background portion may be defined as the first connected domain; that is, a first connected domain is determined which satisfies the following conditions: its color is the first color, it encloses connected domains of the second color, and it is not enclosed by any connected domain of the second color. The connected domains of the second color enclosed by the first connected domain are determined as second connected domains, i.e., the second background portion of a marker is defined as a second connected domain. The connected domains of the first color enclosed by a second connected domain are determined as third connected domains, i.e., each set of mutually connected marker patterns is defined as a third connected domain. The connected domains of the second color enclosed by a third connected domain are determined as fourth connected domains; that is, if the marker pattern is a hollow pattern, such as the rings shown in FIG. 2, the hollow part is defined as a fourth connected domain. The identity information of the marker can be recognized from these enclosing relationships of the connected domains, and from the identity information the pre-stored information corresponding to the marker can be obtained.
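One way to recover such enclosing relationships in practice (a sketch, not the patent's implementation) is OpenCV's contour hierarchy, which exposes exactly the parent/child nesting described above:

```python
import cv2
import numpy as np


def nesting_depths(binary_image: np.ndarray):
    """Return every contour in a binarized image with its nesting depth.
    Depth 0 ~ first connected domain (outer background), depth 1 ~ second
    (marker background), depth 2 ~ third (sub-marker pattern), depth 3 ~
    fourth (hollow part of a pattern)."""
    contours, hierarchy = cv2.findContours(binary_image, cv2.RETR_TREE,
                                           cv2.CHAIN_APPROX_SIMPLE)
    if hierarchy is None:
        return [], []
    depths = []
    for i in range(len(contours)):
        depth, parent = 0, hierarchy[0][i][3]  # index 3 = parent contour
        while parent != -1:
            depth += 1
            parent = hierarchy[0][parent][3]
        depths.append(depth)
    return contours, depths
```

Counting the depth-2 and depth-3 children of each depth-1 contour then gives the per-marker counts that are matched against the pre-stored preset marker models.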
If the marker is a planar marker, or is a three-dimensional marker lying entirely in one plane, then all feature points in the target image are coplanar, i.e., all feature points lie in the same plane. Specifically, the target image may be an image, acquired by the image acquisition device, of the marker surface of the above planar marker object; when the visual interaction device in the acquired image is a multi-face marker structure, the target image may also be an image containing only one marker face of that structure. The planar positioning and tracking method acquires the posture information between the visual interaction device and the image acquisition device from the feature points in the target image as follows: obtain the pixel coordinates of the feature points in the image coordinate system of the target image, and acquire the posture information between the image acquisition device and the visual interaction device from those pixel coordinates and the pre-acquired physical coordinates of the feature points. The physical coordinates are the coordinates of the feature points, acquired in advance, in the physical coordinate system of the visual interaction device, i.e., the real positions of the feature points on the device. The physical coordinates of each feature point can be acquired in advance; specifically, a plurality of feature points and markers are arranged on a marker surface of the visual interaction device, and a point on the marker surface is selected as the origin of a physical coordinate system. The marker surface is taken as the XOY plane of the physical coordinate system, so the origin of the XOY coordinate system lies in the marker surface.
The way of acquiring the physical coordinates is:
in an embodiment of the present invention, in order to correspond feature points in a target image to feature points in a physical coordinate system, a preset marker model is required, and specifically, before acquiring pose information between the image acquisition device and the visual interaction device according to pixel coordinates and physical coordinates of the feature points, the method further includes: determining model characteristic points corresponding to each characteristic point in a preset marker model; searching physical coordinates of each model characteristic point in the preset marker model in a physical coordinate system corresponding to the visual interaction device; and taking the physical coordinates of the model feature points corresponding to each feature point as the physical coordinates of the feature point in the physical coordinate system corresponding to the visual interaction device.
The preset marker model may be a virtual visual interaction device established according to distribution of each feature point on the visual interaction device, and the preset marker model includes a plurality of model feature points, and each model feature point corresponds to a physical coordinate in a physical coordinate system corresponding to the visual interaction device. In addition, the position of each model feature point corresponds to the position of one feature point on the visual interaction device.
And after the preset marker model is obtained, determining the model characteristic point corresponding to each characteristic point in the preset marker model. Specifically, each feature point is mapped into a coordinate system corresponding to the preset marker model, so as to obtain the coordinate of each feature point in the coordinate system corresponding to the preset marker model.
The pixel coordinates of the feature points in the target image and the coordinates of the coordinate system corresponding to the preset marker model have a mapping relation, and the coordinate values of the feature points in the coordinate system corresponding to the preset marker model can be acquired according to the mapping relation.
After the pixel coordinates and the physical coordinates of all the feature points in the target image are acquired, the position information between the image acquisition device and the marker is acquired according to the pixel coordinates and the physical coordinates of all the feature points in each marker, and specifically, the mapping parameters between the image coordinate system and the physical coordinate system are acquired according to the pixel coordinates and the physical coordinates of each feature point and the internal parameters of the image acquisition device acquired in advance.
Specifically, the relationship between the image coordinate system and the physical coordinate system is:

$$ s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R \mid t \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \tag{1} $$

where $(u, v)$ are the pixel coordinates of a feature point in the image coordinate system of the target image, $(X, Y, Z)$ are the physical coordinates of the feature point in the physical coordinate system, and $s$ is a scale factor.

$$ K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} $$

is the camera matrix, or matrix of intrinsic parameters: $(c_x, c_y)$ is the center point of the image and $(f_x, f_y)$ are the focal lengths in pixel units. This matrix can be obtained by a calibration operation of the image acquisition device and is a known quantity.

$[R \mid t]$ is the extrinsic matrix, whose first three columns are rotation parameters and whose fourth column is translation parameters. Since all feature points lie on the marker surface, which is taken as the XOY plane of the physical coordinate system, $Z = 0$ for every feature point; defining the resulting $3 \times 3$ mapping $H = K\,[\,r_1 \;\; r_2 \;\; t\,]$ (with $r_1$, $r_2$ the first two columns of $R$) as the homography matrix, the above equation (1) becomes:

$$ s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = H \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} \tag{2} $$

Therefore, by substituting the obtained pixel coordinates and physical coordinates of the plurality of feature points, together with the internal parameters of the image capturing device, into equation (2), $H$, the mapping parameter between the image coordinate system and the physical coordinate system, can be obtained.

The rotation parameter and the translation parameter between the camera coordinate system of the image acquisition device and the physical coordinate system are then obtained from this mapping parameter. Specifically, according to an SVD algorithm, singular value decomposition is performed on the homography matrix $H$:

$$ H = U \Lambda V^{T} \tag{3} $$

yielding two orthogonal matrices $U$ and $V$ and one diagonal matrix $\Lambda$, where $\Lambda$ contains the singular values of $H$. This diagonal matrix can itself be regarded as a homography matrix, so equation (3) can be rewritten as the decomposition:

$$ \Lambda = R_{\Lambda} + t_{\Lambda}\, n_{\Lambda}^{T} \tag{4} $$

Once the matrix $\Lambda$ is decomposed into this form, the rotation matrix and the translation vector can be calculated. In particular, $t_{\Lambda}$ can be eliminated from the three vector equations separated out of equation (4); since $R_{\Lambda}$ is an orthogonal matrix, the parameters of the normal vector $n_{\Lambda}$ can be solved linearly from a new set of equations relating them to the singular values of the homography matrix $H$.

This decomposition algorithm yields 8 different solutions for the three unknowns $\{R_{\Lambda}, t_{\Lambda}, n_{\Lambda}\}$. Then, with the decomposition of the matrix $\Lambda$ complete, the final decomposed elements are obtained by:

$$ R = U R_{\Lambda} V^{T}, \qquad t = U t_{\Lambda}, \qquad n = V n_{\Lambda} \tag{6} $$

Thus $R$ and $t$ can be solved, where $R$ is the rotation parameter between the camera coordinate system of the image capturing device and the physical coordinate system, and $t$ is the translation parameter between them.
The rotation parameter and the translation parameter are then used as the position information between the image acquisition device and the marker plate. The rotation parameter represents the rotation state between the camera coordinate system and the physical coordinate system, i.e., the rotational degrees of freedom of the image acquisition device with respect to each coordinate axis of the physical coordinate system. The translation parameter represents the movement state between the camera coordinate system and the physical coordinate system, i.e., the translational degrees of freedom of the image acquisition device along each coordinate axis of the physical coordinate system. Together the rotation parameter and the translation parameter are the six degrees of freedom of the image acquisition device in the physical coordinate system and represent its rotation and movement states; that is, the angle, distance, and so on between the field of view of the image acquisition device and each coordinate axis of the physical coordinate system can be obtained.
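A short OpenCV sketch of this planar pose step; it uses cv2.findHomography and cv2.decomposeHomographyMat in place of the hand-rolled SVD derivation above (OpenCV's routine returns up to four candidate solutions rather than the eight pre-pruning solutions mentioned), so it illustrates the same idea rather than reproducing the patent's exact algorithm:

```python
import cv2
import numpy as np


def planar_pose(pixel_pts: np.ndarray,     # Nx2 feature points in the image
                physical_pts: np.ndarray,  # Nx2 (X, Y) on the marker surface, Z = 0
                K: np.ndarray):            # 3x3 intrinsic matrix from calibration
    """Estimate candidate (R, t, n) between camera and marker plane from at
    least four point correspondences, via homography + decomposition."""
    H, _mask = cv2.findHomography(physical_pts.astype(np.float64),
                                  pixel_pts.astype(np.float64))
    num, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)
    # Candidate solutions; the physically valid one is chosen by checking
    # that all points end up in front of the camera (not shown here).
    return list(zip(rotations, translations, normals))
```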
S403: specifically, the processor determines display content corresponding to the visual interaction device according to the posture information, and superimposes the display content on the real scene so that the user observes the virtual scene. Specifically, a corresponding relationship of display content corresponding to each gesture information is stored in advance, and after the gesture information between the visual interaction device and the image acquisition device is acquired, the display content corresponding to the gesture information between the current visual interaction device and the image acquisition device is searched according to the corresponding relationship. The virtual scene may be an augmented reality scene or a virtual scene. In this embodiment, a virtual scene is taken as an augmented reality scene as an example to illustrate the method.
The processor sends the display content to the display device and instructs it to display the content; the display device displays it, and the light emitted from the light-emitting surface of the display device is processed by the optical element and directed to a preset position. The preset position is the position of the user's two eyes; the eyes can observe both the display content and the ambient light guided to the preset position, so that the display content is superimposed on the external real environment and the user observes an augmented reality visual effect. In addition, the user may generate new virtual images in the virtual scene by combining several visual interaction devices, bringing a better experience, and may interact with the virtual scene through the visual interaction device. The following description takes, as an example, visual interaction devices comprising a planar marker object and a multi-face marker structure.
After the user puts on the head-mounted display device, the user's field of view contains a planar marker object and a multi-face marker structure; illustratively, the planar marker object is the first marker plate and the multi-face marker structure is the twenty-six-sided marker structure. FIG. 9 is a schematic diagram of the positions of the first marker plate and the twenty-six-sided marker structure as observed by the user. The image acquisition device acquires a target image of the visual interaction devices in the user's field of view; the acquired image may not be identical to the picture shown in FIG. 9, but does not differ greatly from it. The processor analyzes the image, determines the posture information between the head-mounted display device and the first marker plate, finds the display content corresponding to that posture information, and displays it through the display device. The optical display device, for example a mirror, reflects the display content to the user's eyes, while the real scene containing the planar marker object and the multi-face marker structure in the user's field of view is seen through the mirror, so the user observes a visual-effect image in which the display content is superimposed on the external real scene.
As shown in FIG. 10, the object w1 represents a water cup, the object w2 represents a soup ladle with food in it, and the object w3 represents a table; correspondingly, the position of the object w1 corresponds to the marker 210A in FIG. 9, the object w2 corresponds to the first marker plate in FIG. 9, and the object w3 corresponds to the twenty-six-sided marker structure in FIG. 9. That is, when the user holds the twenty-six-sided marker structure and moves it to the position of the marker 210A, reaching the position shown for the twenty-six-sided marker structure in FIG. 9, the picture seen is the one shown in FIG. 10. By reasonably setting the display content displayed by the display device, the virtual image can exactly occlude the first marker plate and the twenty-six-sided marker structure when the user observes FIG. 10, giving a better visual effect.
Further, as the pose information between the visual interaction device and the image acquisition device changes, the augmented reality scene changes correspondingly. Specifically, the variation of the pose information between the visual interaction device and the image acquisition device is acquired, and the displayed content is adjusted according to that variation, so that the augmented reality scene follows the change of the pose information.
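One way to realize "adjust the displayed content according to the variation" is to apply the relative transform between successive pose estimates to the virtual camera. A minimal sketch with 4x4 homogeneous transforms; all names are illustrative assumptions:

```python
# Sketch of adjusting displayed content by the pose variation: the relative
# transform between two pose estimates moves the virtual camera.
import numpy as np

def make_pose(rotation, translation):
    """Build a 4x4 camera-from-marker transform."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def pose_variation(T_prev, T_curr):
    """Relative motion between two frames: T_delta @ T_prev == T_curr."""
    return T_curr @ np.linalg.inv(T_prev)

# Example: the device moved 5 cm along x between frames.
T_prev = make_pose(np.eye(3), [0.0, 0.0, 0.5])
T_curr = make_pose(np.eye(3), [0.05, 0.0, 0.5])
T_delta = pose_variation(T_prev, T_curr)

# Applying the same delta to the virtual camera keeps the rendered content
# registered with the marker as the user moves.
virtual_camera = T_delta @ T_prev
print(np.allclose(virtual_camera, T_curr))  # True
```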
Fig. 11(a) shows a virtual reality scene observed by the user; the rectangular frame in the figure is only used to indicate the extent of the image and is not visible to the user. Suppose that in fig. 11(a) the pose information between the visual interaction device and the image acquisition device is S1. When the user moves or rotates so that the pose information becomes S2, the virtual scene changes to that shown in fig. 11(b). Comparing the two figures, fig. 11(a) shows the front of the object w1 while fig. 11(b) shows its back; the change of the virtual scene thus simulates the user walking from in front of the object w1 to behind it.
In addition, when the pose information among a plurality of visual interaction devices changes, the virtual scene also changes correspondingly. Specifically, the pose information among the visual interaction devices is determined from the pose information of each device, and the virtual scene corresponding to the visual interaction devices is determined from the pose information of each device together with the pose information among them: a virtual image is determined for each visual interaction device, and the virtual images together form the virtual scene. As an embodiment, it is judged whether the pose information between at least two visual interaction devices meets a preset criterion; if so, the virtual image corresponding to the visual interaction device is modified so as to change the virtual scene, as illustrated by the example and the sketch below. The preset criterion is set as needed and may be, for example, a preset angle or a preset distance value.
As shown in fig. 12(a), the virtual scene is a candle standing on a table; it may be the scene presented to the user when the image acquisition device captures the first marker plate, with the candle placed at the position of a certain marker of the first marker plate, for example the marker 210c. When the user holds the twenty-six-faced marker structure 300 and brings it close to the marker 210c, as shown in fig. 12(b), the virtual image corresponding to the structure 300 is a burning match. As the structure 300 gradually approaches the marker 210c and reaches a predetermined position, the user observes the burning match lighting the candle; when the user then takes the structure 300 out of the field of view, the current virtual scene becomes a lit candle on the table, as shown in fig. 12(c).
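A hedged sketch of the preset-criterion check driving this example; the distance threshold, function names, and positions are illustrative assumptions, not the patent's implementation:

```python
# Sketch of the preset-criterion check behind the match-and-candle example:
# when the relative distance between two tracked devices falls below a
# threshold, the corresponding virtual image is modified.
import numpy as np

IGNITE_DISTANCE = 0.05  # preset distance criterion (metres), assumed value

def relative_distance(pos_a, pos_b):
    """Distance between two devices, both expressed in the camera frame."""
    return float(np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b)))

def update_candle_scene(match_pos, candle_pos, candle_lit):
    """Modify the virtual image once the pose criterion is met."""
    if not candle_lit and relative_distance(match_pos, candle_pos) < IGNITE_DISTANCE:
        return True  # candle is now rendered as lit
    return candle_lit

lit = False
for match_pos in ([0.30, 0.0, 0.5], [0.10, 0.0, 0.5], [0.04, 0.0, 0.5]):
    lit = update_candle_scene(match_pos, [0.0, 0.0, 0.5], lit)
print(lit)  # True: the match came within the preset distance of the candle
```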
Referring to fig. 13, a virtual scene processing apparatus 1300 according to an embodiment of the present invention is shown, applied to the processor of the recognition and tracking system 10 shown in fig. 1. Specifically, the virtual scene processing apparatus 1300 includes: an acquisition unit 1301, a first computing unit 1302, a second computing unit 1303, and a display unit 1304.
The acquisition unit 1301 is configured to acquire the target image with the marker collected by the image acquisition device.
The first computing unit 1302 is configured to determine the pose information between the visual interaction device and the image acquisition device.
The second computing unit 1303 is configured to determine the virtual scene corresponding to the visual interaction device according to the pose information. The display unit 1304 is configured to display the virtual scene.
It is clear to those skilled in the art that, for convenience and brevity of description, for the specific working processes of the apparatus and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here. In summary, according to the virtual scene processing method, apparatus, interactive system, head-mounted display device, visual interaction device, and computer-readable medium provided in the embodiments of the present invention, after the target image with the marker of the visual interaction device collected by the image acquisition device is acquired, the pose information of the visual interaction device in the real scene is determined from the target image, and the virtual scene corresponding to that pose information is then acquired, so that the virtual scene is presented to the human eye and an augmented reality or virtual reality visual effect is achieved.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch and execute the instructions. For the purposes of this description, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection having one or more wires (an electronic device), a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program can be captured electronically, for instance by optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the above method embodiments may be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof. In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. The integrated module may also be stored in a computer-readable storage medium if it is implemented as a software functional module and sold or used as a stand-alone product.

Claims (12)

1. A virtual scene processing method, applied to an interactive system, the system comprising a visual interaction device provided with a marker and an image acquisition device, wherein the method comprises the following steps:
acquiring a target image with a marker acquired by the image acquisition device, wherein the visual interaction device is positioned in a real scene;
determining pose information of the visual interaction device in the real scene according to the target image;
and determining a virtual scene corresponding to the visual interaction device according to the pose information.
2. The method of claim 1, further comprising:
acquiring a change of the pose information;
and updating the virtual scene according to the change of the pose information, so that the virtual scene changes correspondingly with the change of the pose information.
3. The method according to claim 1 or 2, wherein the pose information is pose information between the visual interaction device and the image acquisition device, and the determining pose information of the visual interaction device in the real scene according to the target image comprises:
identifying and confirming identity information of a marker in the target image;
determining a tracking method adopted by a visual interaction device corresponding to a marker according to the marker in the target image and the identity information of the corresponding marker;
and acquiring the pose information between the visual interaction device and the image acquisition device according to the corresponding tracking method.
4. The method according to claim 3, wherein the determining, according to the marker in the target image and the identity information of the corresponding marker, the tracking method adopted by the visual interaction device corresponding to the marker comprises:
judging whether the marker in the target image is a planar marker or a three-dimensional marker according to the identity information of the marker, wherein the planar marker is a marker arranged on a planar marker object, and the three-dimensional marker is a marker arranged on a multi-surface marker structure;
if the marker is a planar marker, adopting a corresponding planar positioning and tracking method;
if the marker is a three-dimensional marker, judging whether the markers belong to the same plane; if the three-dimensional markers belong to the same plane, adopting a corresponding planar positioning and tracking method;
and if the three-dimensional markers do not belong to the same plane, adopting a corresponding three-dimensional positioning and tracking method (a code sketch of this dispatch follows the claims).
5. The method of claim 3, wherein there are a plurality of visual interaction devices, and the determining a virtual scene corresponding to the visual interaction device according to the pose information comprises:
determining pose information among the visual interaction devices according to the pose information of each visual interaction device;
and determining the virtual scene corresponding to the visual interaction devices according to the pose information of each visual interaction device and the pose information among the visual interaction devices.
6. The method of claim 3, wherein feature points are disposed within the marker, and the acquiring the pose information between the visual interaction device and the image acquisition device according to the corresponding tracking method comprises:
acquiring pixel coordinates of the feature points in the target image in an image coordinate system corresponding to the target image;
and acquiring the pose information between the image acquisition device and the visual interaction device according to the pixel coordinates of the feature points in the target image and pre-acquired physical coordinates corresponding to the feature points.
7. A virtual scene processing apparatus, applied to an interactive system, the system comprising a visual interaction device provided with a marker and an image acquisition device, wherein the virtual scene processing apparatus comprises:
an acquisition unit, configured to acquire a target image with the marker collected by the image acquisition device, wherein the visual interaction device is located in a real scene;
a first computing unit, configured to determine, according to the target image, pose information of the visual interaction apparatus within the real scene;
and a second computing unit, configured to determine a virtual scene corresponding to the visual interaction device according to the pose information.
8. An interactive system, comprising: a head mounted display device and a visual interaction device having a marker;
the head-mounted display device is used for acquiring a target image with a marker, and the visual interaction device is positioned in a real scene;
determining pose information of the visual interaction device in the real scene according to the target image;
and determining a virtual scene corresponding to the visual interaction device according to the pose information.
9. A head-mounted display device applied to an interactive system, the interactive system comprising a visual interactive device, the head-mounted display device comprising: the device comprises a display device, an image acquisition device and an optical display device;
the image acquisition device is used for collecting a scene image of a real scene and sending the scene image to a processor, wherein the scene image comprises a target image of a visual interaction device located in the real scene;
the processor is used for determining pose information of the visual interaction device in the real scene according to the target image, and determining a virtual scene corresponding to the visual interaction device according to the pose information;
the display device is used for displaying the virtual scene.
10. The head-mounted display device according to claim 9, further comprising: a visual odometry camera electrically connected to the processor;
the visual odometry camera is used for collecting a scene image of a real scene and sending the scene image to the processor;
the processor is further configured to construct a virtual scene corresponding to the real scene according to the scene image.
11. A visual interaction device, applied to an interaction system, the interaction system further comprising a head-mounted display device, wherein the visual interaction device is provided with a marker;
the marker is used by the head-mounted display device to determine pose information of the visual interaction device in a real scene, so that the head-mounted display device determines a virtual scene according to the pose information.
12. A computer-readable medium having program code executable by a processor, wherein the program code causes the processor to perform the method of any of claims 1-6.
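Read as an algorithm, claim 4 above is a small dispatch routine: marker identity selects planar versus three-dimensional handling, and a coplanarity test decides whether a three-dimensional marker can still be tracked with the planar method. A hedged sketch, in which the identity registry, the coplanarity tolerance, and the tracker names are all assumptions:

```python
# Hedged sketch of the tracking-method dispatch in claim 4. The identity
# registry, coplanarity tolerance and tracker names are illustrative only.
import numpy as np

PLANAR_MARKER_IDS = {"plate_210a", "plate_210c"}  # markers on the marker plate

def coplanar(points, tol=1e-3):
    """True if 3D feature points lie (within tol) in one plane."""
    pts = np.asarray(points, dtype=np.float64)
    if len(pts) <= 3:
        return True
    centered = pts - pts.mean(axis=0)
    # Smallest singular value ~ out-of-plane spread of the point cloud.
    return np.linalg.svd(centered, compute_uv=False)[-1] < tol

def select_tracking_method(marker_id, feature_points_3d):
    if marker_id in PLANAR_MARKER_IDS:
        return "planar_tracking"
    # Three-dimensional marker on a multi-surface structure: the visible
    # features may still happen to lie in one plane (one face in view).
    if coplanar(feature_points_3d):
        return "planar_tracking"
    return "three_dimensional_tracking"

face = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
corner = face + [[0, 0, 1]]
print(select_tracking_method("struct_300", face))    # planar_tracking
print(select_tracking_method("struct_300", corner))  # three_dimensional_tracking
```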
CN201810119323.0A 2018-02-06 2018-02-06 Virtual scene processing method, device, interactive system, head-wearing display device, visual interactive device and computer-readable medium Pending CN110119194A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810119323.0A CN110119194A (en) 2018-02-06 2018-02-06 Virtual scene processing method, device, interactive system, head-wearing display device, visual interactive device and computer-readable medium
PCT/CN2019/073578 WO2019154169A1 (en) 2018-02-06 2019-01-29 Method for tracking interactive apparatus, and storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810119323.0A CN110119194A (en) 2018-02-06 2018-02-06 Virtual scene processing method, device, interactive system, head-wearing display device, visual interactive device and computer-readable medium

Publications (1)

Publication Number Publication Date
CN110119194A true CN110119194A (en) 2019-08-13

Family

ID=67519980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810119323.0A Pending CN110119194A (en) 2018-02-06 2018-02-06 Virtual scene processing method, device, interactive system, head-wearing display device, visual interactive device and computer-readable medium

Country Status (1)

Country Link
CN (1) CN110119194A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101158883A (en) * 2007-10-09 2008-04-09 深圳先进技术研究院 Virtual gym system based on computer visual sense and realize method thereof
CN102047199A (en) * 2008-04-16 2011-05-04 虚拟蛋白质有限责任公司 Interactive virtual reality image generating system
US8854356B2 (en) * 2010-09-28 2014-10-07 Nintendo Co., Ltd. Storage medium having stored therein image processing program, image processing apparatus, image processing system, and image processing method
CN103443743A (en) * 2011-01-31 2013-12-11 高通股份有限公司 Context aware augmentation interactions
CN102411854A (en) * 2011-09-01 2012-04-11 苏州梦想人软件科技有限公司 Classroom teaching mixing technology application system based on enhanced reality and method thereof
CN106020620A (en) * 2012-09-27 2016-10-12 京瓷株式会社 Display device, control method, and control program
CN106569591A (en) * 2015-10-26 2017-04-19 苏州梦想人软件科技有限公司 Tracking method and system based on computer vision tracking and sensor tracking
CN107341829A (en) * 2017-06-27 2017-11-10 歌尔科技有限公司 The localization method and device of virtual reality interactive component

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘经伟 (Liu Jingwei): "Research on Tracking Technology of Three-Dimensional Markers in Augmented Reality", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110726534A (en) * 2019-09-27 2020-01-24 西安大医集团有限公司 Visual field range testing method and device for visual device
CN110866940A (en) * 2019-11-05 2020-03-06 广东虚拟现实科技有限公司 Virtual picture control method and device, terminal equipment and storage medium
CN110866940B (en) * 2019-11-05 2023-03-10 广东虚拟现实科技有限公司 Virtual picture control method and device, terminal equipment and storage medium
CN111177439A (en) * 2019-12-12 2020-05-19 闪维(北京)文化有限公司 AR interactive table and identification method thereof
CN111179679A (en) * 2019-12-31 2020-05-19 广东虚拟现实科技有限公司 Shooting training method and device, terminal equipment and storage medium
CN111966213A (en) * 2020-06-29 2020-11-20 青岛小鸟看看科技有限公司 Image processing method, device, equipment and storage medium
CN112068704A (en) * 2020-09-10 2020-12-11 上海幻维数码创意科技有限公司 Method for displaying augmented reality effect on target object
CN112068704B (en) * 2020-09-10 2023-12-08 上海幻维数码创意科技股份有限公司 Method for displaying augmented reality effect on target object

Similar Documents

Publication Publication Date Title
CN110119194A (en) Virtual scene processing method, device, interactive system, head-wearing display device, visual interactive device and computer-readable medium
CN110120099A (en) Localization method, device, recognition and tracking system and computer-readable medium
CN110119190A (en) Localization method, device, recognition and tracking system and computer-readable medium
US9986228B2 (en) Trackable glasses system that provides multiple views of a shared display
US20140307100A1 (en) Orthographic image capture system
JP2009020614A (en) Marker unit to be used for augmented reality system, augmented reality system, marker unit creation support system, and marker unit creation support program
CN207780718U (en) Visual interactive device
EP3161725B1 (en) Color identification using infrared imaging
US20120219228A1 (en) Computer-readable storage medium, image recognition apparatus, image recognition system, and image recognition method
CN110737414B (en) Interactive display method, device, terminal equipment and storage medium
CN108989794B (en) Virtual image information measuring method and system based on head-up display system
CN210225419U (en) Optical communication device
CN107509043B (en) Image processing method, image processing apparatus, electronic apparatus, and computer-readable storage medium
CN110120100B (en) Image processing method, device and identification tracking system
CN110737326A (en) Virtual object display method and device, terminal equipment and storage medium
CN111813214A (en) Virtual content processing method and device, terminal equipment and storage medium
CN111563966B (en) Virtual content display method, device, terminal equipment and storage medium
CN107622496A (en) Image processing method and device
WO2019153970A1 (en) Head-mounted display apparatus
CN108664118B (en) Eyeball tracking method and device, contact lenses and virtual reality system
CN111913564B (en) Virtual content control method, device, system, terminal equipment and storage medium
CN209821887U (en) Marker substance
CN110120060B (en) Identification method and device for marker and identification tracking system
CN110120062B (en) Image processing method and device
CN110119192B (en) Visual interaction device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190813