CN111510701A - Virtual content display method and device, electronic equipment and computer readable medium


Info

Publication number: CN111510701A
Application number: CN202010322732.8A
Authority: CN (China)
Prior art keywords: target object, information, image, virtual content, virtual
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 彭冬炜
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010322732.8A
Publication of CN111510701A

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 - Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/10 - Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N 13/106 - Processing image signals
    • H04N 13/30 - Image reproducers
    • H04N 13/332 - Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
    • H04N 13/363 - Image reproducers using image projection screens
    • H04N 7/00 - Television systems
    • H04N 7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 2203/00 - Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F 2203/01 - Indexing scheme relating to G06F3/01
    • G06F 2203/012 - Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Abstract

The application discloses a virtual content display method and device, an electronic device, and a computer readable medium, relating to the field of display technologies. The method includes: acquiring an image of a real environment acquired by an image acquisition device; if a target object exists in the image, acquiring product description information corresponding to the target object; acquiring visual presentation information of the target object in the real environment; generating virtual content according to the product description information and the visual presentation information, where the virtual content is used to represent the product description information; and adding the virtual content at the position of the target object in the real environment. In this way, because virtual content is added at the position of the target object in the real environment, the product description information corresponding to the target object can be displayed through the virtual content. Compared with reading a product instruction manual, presenting the information as virtual content is more vivid and interesting, better attracts users, and makes it easier for them to learn the product description information.

Description

Virtual content display method and device, electronic equipment and computer readable medium
Technical Field
The present application relates to the field of display technologies, and in particular to a virtual content display method and apparatus, an electronic device, and a computer-readable medium.
Background
At present, when a user needs to learn about a product, the user often reads the related introduction of the product. For example, when installing or using a product, the user needs to read the product's instruction manual to install or use it successfully. For another example, when shopping online, the user needs to watch a video introduction of a product or read its detailed description. However, product introductions are often obscure, lengthy, and hard to understand, and users seldom have the patience to read them through.
Disclosure of Invention
The application provides a virtual content display method and device, an electronic device, and a computer readable medium, so as to overcome the above drawbacks.
In a first aspect, an embodiment of the present application provides a method for displaying virtual content, including: acquiring an image of a real environment acquired by an image acquisition device; if a target object exists in the image, acquiring product description information corresponding to the target object; acquiring visual presentation information of the target object in the real environment, wherein the visual presentation information comprises pose information of the target object relative to the image acquisition device; generating virtual content according to the product description information and the visual presentation information, wherein the virtual content is used for representing the product description information; adding the virtual content at the location of the target object in the real environment.
In a second aspect, an embodiment of the present application further provides a virtual content display apparatus, including a first acquisition unit, a second acquisition unit, a determining unit, and a processing unit. The first acquisition unit is configured to acquire the image of the real environment acquired by the image acquisition device; the second acquisition unit is configured to acquire, if a target object exists in the image, product description information corresponding to the target object and visual presentation information of the target object in the real environment, the visual presentation information including pose information of the target object relative to the image acquisition device; the determining unit is configured to generate virtual content according to the product description information; and the processing unit is configured to add the virtual content at the position of the target object in the real environment.
In a third aspect, an embodiment of the present application further provides an electronic device, including: one or more processors; a memory; one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the above-described method.
In a fourth aspect, the present application also provides a computer-readable storage medium, where a program code executable by a processor is stored, and when executed by the processor, the program code causes the processor to execute the above method.
According to the virtual content display method and device, the electronic device, and the computer readable medium provided by the application, an image of a real environment acquired by an image acquisition device is obtained, and it is determined whether a target object exists in the image. If the target object exists, product description information corresponding to the target object is acquired, the product description information being information describing the target object. Visual presentation information of the target object in the real environment is then acquired, the visual presentation information including pose information of the target object relative to the image acquisition device; virtual content is generated according to the product description information, and the virtual content is added at the position of the target object in the real environment. In this way, because virtual content is added at the position of the target object in the real environment, the product description information corresponding to the target object can be displayed through the virtual content. Compared with reading text related to the product, presenting the information as virtual content is more vivid and interesting, better attracts users, and makes it easier for them to learn the product description information.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic diagram of an AR device according to an embodiment of the present application;
fig. 2 is a schematic diagram of an AR device according to another embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for displaying virtual content according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating virtual content provided by an embodiment of the present application;
FIG. 5 is a flow chart of a method for displaying virtual content according to another embodiment of the present application;
FIG. 6 is a schematic diagram of virtual content provided by another embodiment of the present application;
FIG. 7 is a schematic diagram of virtual content provided by yet another embodiment of the present application;
FIG. 8 illustrates a method flow diagram of a method of displaying virtual content provided by yet another embodiment of the present application;
FIG. 9 shows a flowchart of the method of S840 of FIG. 8;
FIG. 10 is a schematic diagram of an analytical model provided by an embodiment of the present application;
FIG. 11 is a schematic diagram illustrating a process of training and applying an analytical model provided by an embodiment of the present application;
FIG. 12 is a schematic diagram illustrating virtual content provided by yet another embodiment of the present application;
FIG. 13 is a block diagram illustrating a display device for virtual content provided by an embodiment of the present application;
FIG. 14 is a block diagram illustrating a display device of virtual content provided in another embodiment of the present application;
FIG. 15 is a block diagram of an electronic device according to an embodiment of the present application;
fig. 16 illustrates a storage unit for storing or carrying program code for implementing a method according to an embodiment of the present application.
Detailed Description
To enable those skilled in the art to better understand the technical solutions, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. At present, when a user needs to learn about a product, the user often reads the related introduction of the product. For example, when installing or using a product, the user needs to read its instruction manual to install or use it successfully; for another example, when shopping online, the user needs to watch a video introduction of the product or read its detailed description. However, product introductions are often obscure, lengthy, and hard to understand, and users seldom have the patience to read them through. For example, in real life, many commercial products, such as household appliances, require reading an instruction manual, which is often a tedious process.
In order to make reading a product specification more convenient and less tedious, to attract users to read it, and to make its content easier to learn, the content of the product specification can be displayed through Augmented Reality (AR) technology.
Augmented reality is a technology that enhances a user's perception of the real world through information provided by a computer system. It overlays computer-generated content objects, such as virtual content, scenes, or system prompt information, onto a real scene to enhance or modify the perception of the real-world environment or of data representing it. By wearing a head-mounted display device such as AR glasses, the user can observe the effect of augmented reality or mixed reality after such virtual content is superimposed on the real-world environment.
For example, some AR glasses may collect information in a real environment through a camera and a sensor of the AR glasses, and after running through a processor and a specific algorithm, render a corresponding image to be displayed on the glasses, so that a user feels that a virtual image coincides with a real world.
Some AR glasses have a split structure, i.e. the processing module of the glasses is not on the glasses themselves but in a separate computing unit connected to the AR glasses by wires. Data collected by the camera and sensors of the AR glasses is preprocessed and then sent to the computing unit; the computing unit renders a corresponding virtual image according to the collected information and transmits it to the AR glasses for display.
As shown in fig. 1, fig. 1 shows an AR device, which may be a head-mounted display device, in particular AR glasses. As shown in fig. 1, the head-mounted display device includes a display screen 110, a frame 120, and an imaging device 130.
The frame 120 includes a front surface 121 on which the display screen 110 is mounted, a side surface 122, and a rear surface 123, and the imaging device 130 is capable of displaying an image of virtual content on the display screen 110. For example, the imaging device 130 may be a diffractive light guide capable of projecting an image onto a display screen.
As an embodiment, the display screen 110 may be a lens of the AR glasses that also transmits light, that is, a transflective (semi-transparent, semi-reflective) lens. When the user wears the head-mounted display device and an image is displayed on the display screen 110, the user can see both the displayed image and, through the display screen 110, real-world objects in the surrounding environment. Through the transflective lens, the image displayed on the lens is thus superimposed on the surrounding environment, realizing the visual effect of augmented reality.
When the user wears the head-mounted display device, the display screen 110 is located in front of the eyes of the user, that is, the front surface 121 is located in front of the eyes of the user, the rear surface 123 is located behind the eyes of the user, and the side surface 122 is located at the side of the eyes of the user.
In addition, a front camera is disposed on the front surface 121, and environmental information in front is sensed through the front camera to realize simultaneous localization and mapping (SLAM), thereby achieving the visual effect of augmented reality or mixed reality.
In other AR technologies, a front camera may be used to fuse the real scene with virtual content. Specifically, the field of view of the front camera on the front surface of the head-mounted display device may coincide with the user's field of view when wearing the device. The front camera captures an image of the real scene, which, after processing, is displayed on the display screen in front of the user's eyes; in particular, an image of virtual content may be superimposed on the image of the real scene and viewed by the user, so that the user observes the visual effect of augmented reality.
As another embodiment, the AR effect may also be implemented by a mobile terminal or another terminal with a screen, such as a tablet computer or a computer device. Specifically, as shown in fig. 2, fig. 2 shows another AR device, which may be a user terminal including a camera and a screen. The image of the real scene displayed on the screen (an indoor scene with a table lamp, a sofa, etc., as shown in fig. 2) may be an image captured by the camera of the terminal. Display content corresponding to virtual content A, which may be a picture, is shown on the screen and added to the image of the real scene. Through the content displayed on the screen, the user sees virtual content A (such as the sphere in fig. 2) placed in the real scene, obtaining the augmented reality display effect of virtual content A arranged in the real scene.
Specifically, implementing the augmented reality effect on the terminal shown in fig. 2 involves tracking, scene understanding, and rendering. Tracking provides the relative position of the terminal in the real environment. In particular, an accurate estimate of where the terminal is located and how the device is oriented can be provided by a visual odometer, which uses camera images and motion data of the device.
Scene understanding refers to determining properties or characteristics of the environment surrounding the device. For example, a surface or plane in the real environment, such as a floor or a table, can be found through a plane detection function. In order to place the virtual content, the terminal also needs to provide a hit test function. This function obtains intersections with the real-world topology, i.e. the specific location of each object in the real environment, such as the height of a table from the ground or its distance from the terminal, so that the virtual content can be placed in the real environment. Finally, light estimation can be performed for scene understanding. Light estimation is used to illuminate the virtual content correctly so that it matches the real world; for example, when virtual content is placed in the real world, its shadow under real-world illumination is displayed, increasing the realism of the virtual content and its consistency with the real world.
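By way of illustration only (this code is not part of the original disclosure), the hit test described above can be sketched as a ray-plane intersection: a ray is cast from the camera through a screen pixel and intersected with a detected horizontal plane to obtain the world-coordinate point where virtual content may be anchored. The pinhole model and function names here are illustrative assumptions.

```python
import numpy as np

def hit_test(pixel, K, cam_pose, plane_y=0.0):
    """Cast a ray from the camera through `pixel` and intersect it with a
    detected horizontal plane y = plane_y in world coordinates (e.g. a
    floor found by plane detection). Illustrative sketch only."""
    # Back-project the pixel into a viewing direction in camera coordinates.
    direction_cam = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])

    # Rotate the direction into world coordinates; the ray origin is the
    # camera position reported by tracking (visual odometry).
    R, t = cam_pose[:3, :3], cam_pose[:3, 3]
    direction_world = R @ direction_cam

    if abs(direction_world[1]) < 1e-8:
        return None  # ray parallel to the plane: no intersection
    s = (plane_y - t[1]) / direction_world[1]
    return t + s * direction_world if s > 0 else None  # anchor point for virtual content
```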
The real environment displayed on the screen may be an image of the current environment acquired by the camera of the terminal, or an image of a real scene sent by another terminal and received in real time. For example, during video interaction between the terminal and another terminal, the screen in fig. 2 may display an image, acquired in real time by the camera of the other terminal, of the real scene where that terminal is located.
To achieve the above effect for a user reading a product specification, an embodiment of the present application provides a virtual content display method, which may be applied to the devices shown in fig. 1 and fig. 2. In the embodiment of the present application, the method may be applied to the electronic device shown in fig. 2, that is, the execution subject of the method may be a processor or a client in the electronic device. The execution subject may also be the head-mounted display device of fig. 1; specifically, if a processor is disposed in the head-mounted display device, the execution subject may be that processor. As an implementation, and to better illustrate the effect of this embodiment, the execution subject of the method in the embodiment of the present application is taken to be a mobile terminal, such as the electronic device shown in fig. 2; this does not limit the field of application of the method, i.e. the devices and application environments to which the present application is applicable are not limited.
Referring to fig. 3, fig. 3 illustrates a virtual content display method provided by an embodiment of the present application for displaying virtual content at the position of a target object, so as to make reading a product specification more convenient and less tedious, better attract the user to read it, and make its content easier to learn. Specifically, the method includes the following steps: S301 to S305.
S301: and acquiring an image of the real environment acquired by the image acquisition device.
The image acquisition device can be a camera, a digital camera and other equipment capable of acquiring images.
As an implementation, the image acquisition device may be installed on the head-mounted display device or in a mobile phone. Specifically, if the execution subject of the embodiment is the head-mounted display device, i.e. the embodiment is applied to the head-mounted display device, the image acquisition device may be the imaging apparatus installed on it; if the execution subject is the user terminal, i.e. the embodiment is applied to the user terminal, the image acquisition device may be the camera installed on the user terminal.
As another embodiment, the image acquisition device may be provided independently, that is, not on the execution subject of the method; in this case the image acquisition device is connected to the execution subject and sends the captured images of the real environment to it.
S302: and if the target object exists in the image, acquiring product description information corresponding to the target object.
The product description information may be product-related introduction information, and the information may include product-related information such as a use instruction, an installation instruction, a feature introduction, and a function introduction of the product. The user can know or use the product by reading the product description information.
As an embodiment, whether a target object is present in the image may be determined by the execution subject of the method of the present application. Specifically, taking the case where the execution subject is a user terminal, the user terminal obtains the image of the real environment acquired by the image acquisition device and determines whether a target object exists in it. Specifically, the identity information of all objects in the image is analyzed to determine whether target identity information is present among them; if so, it is determined that the target object exists in the image. The data set is then searched for the product description information corresponding to the preset identity information that matches the identity information of the target object, which is used as the product description information of the target object. The identity information may be the contour of an object's image, feature points of an object's image, or other information that can distinguish the image of one object from images of other objects, which is not limited here.
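As a minimal sketch of the data-set lookup just described (an illustration, not the patent's implementation), preset identity information can be mapped to product description information and the detected identities checked against it; plain strings stand in for real identity information such as contours or feature-point descriptors.

```python
# Toy data set: preset identity information -> product description information.
# In practice the keys would be contour or feature-point descriptors.
DATA_SET = {
    "range_hood_x100": "Part overview and operation steps ...",
    "smart_lamp_a2": "Installation and pairing instructions ...",
}

def lookup_description(detected_identities):
    """Return the product description information of the first detected
    object whose identity matches a preset entry; a match means a target
    object exists in the image."""
    for identity in detected_identities:
        if identity in DATA_SET:
            return identity, DATA_SET[identity]
    return None, None  # no target object in the image

# lookup_description(["coffee_cup", "range_hood_x100"])
# -> ("range_hood_x100", "Part overview and operation steps ...")
```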
The target identity information may be identity information of a target object and may be preset. In one embodiment, a data set is preset that contains multiple pieces of preset identity information and the product description information corresponding to each piece. Each piece of preset identity information corresponds to a product for which product description information has been configured, or to a product whose product description information needs to be displayed to the user.
As another embodiment, the data processing server may determine that the target object exists in the image. Specifically, the image acquisition device may send the acquired image to the data processing server, and the data processing server determines whether the target object exists in the image according to the method and sends the result to the execution subject of the method.
In addition, the product description information corresponding to the target object may include information such as installation guidance, a use instruction, and product introduction of the target object, which is not described herein again.
As an embodiment, the product description information of the target object may be obtained by sending an acquisition request to a server corresponding to the target object. The target object may be a controlled device within an Internet of Things. The Internet of Things is a network concept that, on the basis of the Internet, extends the user side to arbitrary articles for information exchange and communication. With the development of Internet of Things technology, scenes can be configured in an Internet of Things system. A configured scene can involve multiple controlled devices that have a certain linkage relationship and can work cooperatively.
The controlled device may be a projector, a projection screen, a smart lamp, a smart socket, a human-body sensor, a door and window sensor, a wireless switch, an air-conditioner companion, a smoke alarm, a smart curtain motor, an air purifier, a smart speaker, or another user terminal. In one embodiment, in the Internet of Things system, the electronic device used for control (such as the user terminal) can exchange data with the controlled device by directly establishing a wireless connection with the router; alternatively, after the electronic device is connected to the cloud, data interaction with the controlled device is achieved through the data link between the cloud and the router. The controlled device may also establish a wireless connection with the router through a gateway. The data interaction may include the user terminal sending a control instruction to the controlled device and the controlled device returning status information or an instruction execution result to the user terminal. The data interaction between the user terminal and the controlled device can be triggered by a client installed in the user terminal.
As an implementation, the user terminal obtains an identifier of the target object. Specifically, the data set may include a product identifier corresponding to each piece of preset identity information, where the product identifier may be the name and model of a product. After the identity information of the target object is acquired, the product identifier of the target object can be determined, and an acquisition request containing the product identifier is sent through the Internet of Things to the product server corresponding to that identifier. After receiving the acquisition request, the product server parses it to obtain the product identifier, retrieves the product description information corresponding to the identifier, and sends it to the user terminal as the product description information of the target object.
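An illustrative sketch of this acquisition request follows; the endpoint, payload schema, and transport are assumptions (plain HTTP stands in for the Internet-of-Things link), not part of the original disclosure.

```python
import json
import urllib.request

# Hypothetical product-server endpoint; not part of the original disclosure.
PRODUCT_SERVER = "https://product-server.example.com/description"

def request_product_description(product_name: str, model: str) -> dict:
    """Send an acquisition request carrying the product identifier (name and
    model) and return the product description information the server serves."""
    payload = json.dumps({"name": product_name, "model": model}).encode()
    req = urllib.request.Request(
        PRODUCT_SERVER, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        # e.g. {"usage": "...", "installation": "...", "introduction": "..."}
        return json.load(resp)
```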
S303: obtaining visual presentation information of the target object within the real environment.
Wherein the visual presentation information comprises pose information of the target object with respect to the image acquisition device.
S304: and generating virtual content according to the product description information and the visual presentation information, wherein the virtual content is used for representing the product description information.
As an embodiment, the virtual content can reflect the product description information, that is, the user can know the product description information by observing the virtual content. As an implementation manner, the virtual content may be an animation display, a text display, or a picture display of the product description information, which is not described herein again. For a specific implementation of obtaining the virtual content, reference may be made to the following embodiments.
Therefore, the product description information is used to determine the information characterized by the virtual content, i.e. the information characterized by the virtual content is related to the product description information, so that the product description information can be known through the information characterized by the virtual content. The visual presentation information is used to reflect information that can be observed by a user when the virtual content is placed in the real world, the information may include placement information or lighting information, the placement information may include information such as a position and a placement angle of the virtual content, and in particular, embodiments of determining the virtual content according to the visual presentation information may refer to the following embodiments.
S305: adding the virtual content at the location of the target object in the real environment.
Specifically, position information of the target object within the real environment is acquired in advance.
As an embodiment, the position information of the target object within the real environment may be determined from the image of the target object captured by the image capturing device.
In particular, the real environment may be described by a world coordinate system corresponding to the real world, and the position information of the target object in the real environment may be the physical coordinates of the target object in that world coordinate system. The world coordinate system may be established with the image acquisition device as its center, that is, the position of the image acquisition device in the real-world scene serves as the origin of the world coordinate system. Specifically, when the image acquisition device is installed in the execution subject of the method of the present application, for example a user terminal, the center of the world coordinate system is the position of the user terminal.
The user terminal can scan the surrounding environment according to a preset positioning algorithm to establish a world coordinate system centered on the terminal, and determine the coordinate position of each real object in the real environment in that coordinate system; this coordinate position serves as the position information of the real object in the real environment. As an implementation, an image acquisition device and an inertial measurement unit are arranged in the user terminal; a world coordinate system corresponding to the real environment, i.e. centered on the terminal, can be established from the images of the surroundings acquired by the image acquisition device and the pose information of the user terminal obtained by the inertial measurement unit, and the coordinate positions of the real objects in that coordinate system obtained from it.
Specifically, the above-mentioned SLAM technique may be used to understand the surrounding real environment and track real objects in it. Based on images of the surrounding environment collected by the camera and the pose information of the user terminal obtained by the inertial measurement unit, SLAM can construct a world coordinate system with the terminal as the starting point. A time-of-flight (TOF) depth camera can then be used to obtain a dense 3D point cloud, from which the 3D coordinates of each point on the surface of a real object in the world coordinate system can be obtained; these 3D coordinates can be used as the position information of each real object in the real environment.
After the position of the target object in the real environment is determined, the virtual content is added at that position. For an implementation of adding virtual content in the real environment so that the user observes the AR effect, reference may be made to the above embodiments; details are not repeated here.
Specifically, the added virtual content can be located, within the world coordinate system, at the position of the target object according to the target object's position in the real environment. The position information of the virtual content in the real environment has a mapping relationship with the display information of the corresponding display content on the screen of the user terminal, where the display information includes the display size, shape, and position of that display content.
In one embodiment, in the image of the real environment captured by the camera, a real object is located in the camera coordinate system corresponding to the camera. The Z axis of the camera coordinate system matches the optical axis of the camera, that is, the optical axis direction is the Z axis direction, and the XOY plane formed by the X and Y axes is perpendicular to the Z axis. The coordinates of a real object in the camera coordinate system can then be determined. For example, from the mapping relationship between the pixel coordinate system of the captured image and the camera coordinate system, the coordinates in the camera coordinate system of the pixels of each real object's image can be determined; these coordinates include the depth information of the real object. For example, the projection of a real object's coordinates on the Z axis of the camera coordinate system is its depth of field. The change in an object's depth of field can be determined from the change in its coordinates in the camera coordinate system, and the distance from the object to the camera can be determined accordingly.
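The pixel-to-camera and camera-to-world mapping just described can be sketched as follows; this is a minimal illustration assuming a pinhole camera with intrinsic matrix K and a known camera-to-world pose, not the patent's own implementation.

```python
import numpy as np

def pixel_to_world(u, v, depth, K, cam_pose):
    """Back-project an image pixel with known depth of field into the world
    coordinate system. `depth` is the point's Z value in the camera
    coordinate system, whose Z axis is the camera's optical axis."""
    # Pixel -> camera coordinates via the intrinsic matrix K.
    p_cam = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))

    # Camera -> world coordinates via the camera pose (rotation R, translation t).
    R, t = cam_pose[:3, :3], cam_pose[:3, 3]
    p_world = R @ p_cam + t

    distance_to_camera = float(np.linalg.norm(p_cam))  # range from the camera
    return p_world, distance_to_camera
```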
In an embodiment in which the virtual content is displayed on a plane or at the position of an object in the real environment, the distance between that object and the camera may be determined, and from it the depth-of-field information corresponding to the virtual content. The contour information of the display content corresponding to the depth of field of the target object can then be determined according to a preset correspondence between depth-of-field information and the contour information of the display content corresponding to the virtual content, where the contour information includes the shape and size of that display content. For example, following the rule that the farther the distance (i.e. the larger the depth of field), the smaller the contour, correspondences between different positions in the real environment and the contour information of the display content can be set in advance, and the shape and size of the display content corresponding to the virtual content determined from them.
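The "farther means smaller contour" rule follows from the pinhole model, under which on-screen size falls off as 1/Z; a one-line sketch (the reference depth is an illustrative assumption):

```python
def display_scale(depth, reference_depth=1.0):
    """Scale factor for the display content's contour: content authored at
    `reference_depth` shrinks in proportion as the depth of field grows."""
    return reference_depth / max(depth, 1e-6)

# Content authored for 1 m appears at half size when anchored at 2 m:
# display_scale(2.0) -> 0.5
```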
As another embodiment, the coordinate relationship between each pixel point in the screen's pixel coordinate system and each position point in the real environment may also be determined, specifically by means of the camera's intrinsic and extrinsic parameters, for example via the Zhang Zhengyou calibration method. For example, as shown in fig. 2, to display virtual content A on the ground near the desk lamp in the real environment, the position within the screen's pixel coordinates of the ground near the desk lamp in the image of the real environment can be determined, so that the display content corresponding to the virtual content is drawn at that position in the image of the real environment shown on the screen.
Therefore, the virtual content can be added to a specific position of the real environment by a predetermined mapping relationship between the pixel coordinates of each pixel point of the image displayed on the screen and the world coordinates of each position point in the world coordinate system.
For example, suppose the virtual content needs to be displayed at position A in the specified coordinate system, that is, when the user uses the user terminal, the user should see the virtual object at position A in real space. The display position on the display screen corresponding to position A is determined to be position B, and when the virtual object is drawn at position B on the display screen of the user terminal, the user sees, through the display screen, a virtual object displayed at position A.
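This world-position-A to display-position-B mapping is the standard perspective projection; a minimal sketch under the same pinhole assumptions as above, not the patent's own implementation:

```python
import numpy as np

def world_to_screen(p_world, K, cam_pose):
    """Map position A in the world coordinate system to display position B
    on the screen, so content drawn at B appears anchored at A."""
    # World -> camera coordinates (invert the camera-to-world pose).
    R, t = cam_pose[:3, :3], cam_pose[:3, 3]
    p_cam = R.T @ (p_world - t)
    if p_cam[2] <= 0:
        return None  # position A is behind the camera; nothing to draw

    # Camera -> pixel coordinates via the intrinsics K (perspective divide).
    uvw = K @ p_cam
    return uvw[0] / uvw[2], uvw[1] / uvw[2]  # display position B in pixels
```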
As an embodiment, after determining the position of the target object, and determining the virtual content according to the product description information of the target object, the virtual content may be added to the real world at the position of the target object. As shown in fig. 4, the content displayed on the interface includes a target object 401 and the current real environment and virtual content 402 of the target object 401.
As an embodiment, the content of the interface may be content observed by the user through the head mounted display device. As another embodiment, the content of the interface may be content displayed on a screen of the user terminal, specifically, the target object 401 displayed on the screen and the current real environment of the target object 401 may be images captured by an image capturing device of the user terminal, and the virtual content 402 may be display content displayed on the screen of the user terminal, where the display content corresponds to the virtual content 402.
As shown in FIG. 4, the virtual content 402 may be a textual representation of the product description information of the target object. As shown in fig. 4, the virtual contents added at the position of the target object 401 (i.e., the range hood in fig. 4) are two labeling boxes in which the introduction and the operation steps of the respective parts of the product are described, respectively. Through the virtual content, the user can know the characteristics of each part of the target object and the whole operation steps.
Taking fig. 4 as an example, where the execution subject of the method is a user terminal: when the user points the user terminal at the range hood during use, the user terminal can determine the position of the range hood in the real world and add, at that position, virtual content corresponding to the range hood's product description information, such as its product introduction and operation-step information, so that the user can quickly understand and operate the range hood.
Referring to fig. 5, fig. 5 illustrates a virtual content display method provided by an embodiment of the present application for displaying virtual content at the position of a target object, so as to make reading a product specification more convenient and less tedious, better attract the user to read it, and make its content easier to learn. Specifically, the method includes the following steps: S510 to S560.
S510: and acquiring an image of the real environment acquired by the image acquisition device.
S520: and if the target object exists in the image, acquiring product description information corresponding to the target object.
S530: and obtaining model data corresponding to the target object.
The model data may refer to product design data of the target object, specifically, when a developer designs the target object or designs a display effect of the target object, the developer may set a model of the target object, the data corresponding to the model may be the model data, and the model data includes data of a contour shape, a size ratio, an appearance color, and the like of the target object. The target object may be produced based on the model data at the time of producing the target object.
As an implementation manner, the model data of the target object may be stored in a product server corresponding to the target object, and the user terminal may send a data obtaining request to the product server, where the data obtaining request includes a product identifier corresponding to the target object, and the product server obtains the model data of the target object based on the product identifier corresponding to the target object in response to the data obtaining request.
As another embodiment, the model data of the target object may be stored in the data processing server; the user terminal may send a data acquisition request to the data processing server, which returns the model data of the target object accordingly. The data processing server may synchronize with at least part of the data in the product server, so that the model data of the target object it stores stays consistent with the model data in the product server.
S540: and generating a virtual model corresponding to the target object based on the model data.
The virtual model may be a virtual object that can be added to a real environment, and specifically, taking the example that the execution subject of the present application is a user terminal, the virtual model may be a target image corresponding to the target object that can be displayed on a screen of the user terminal. In one embodiment, the virtual model may be a 2D plan view or a 3D perspective view.
Taking the target object 401 in fig. 4 as an example, for example, if the target object 401 is a range hood, after model data of the range hood, that is, data such as shape profile data and size of the range hood is acquired, a virtual model of the range hood is generated from the model data, that is, one virtual range hood is generated, and the graph of the virtual range hood is the same as the shape profile of the actual range hood, and the size may be an equal scaling of the size of the actual range hood.
As an embodiment, the virtual model can be added to the real environment in which the user can observe the virtual model, i.e. can observe the virtual target object, through the AR device.
S550: generating the virtual content based on the product description information and the virtual model.
As an embodiment, the virtual model may be modified according to the product description information; for example, specified content, set according to the product description information, is added to the virtual model, and the virtual model with that content added serves as the virtual content placed in the real environment.
Specifically, the specified content may be added to the virtual model. The specified content may be a text label used to indicate the functions and descriptions of each part of the target object, as well as a description of the overall operation procedure or of the product as a whole. The text label corresponds to the product description information and, specifically, may be set according to it. Note that the text label is not limited to text; it may also include information in formats such as pictures or voice.
As shown in fig. 6, the content displayed on the interface includes a target object 401, the current real environment of the target object 401, and virtual content; specifically, the virtual content includes a virtual model 4021 and a text label 4022. As shown in fig. 6, the target object 401 is the object represented by a dotted line, for example a range hood, and the virtual model 4021 is represented by a solid line and wraps around the target object 401.
Compared with fig. 4, adding the text label 4022 to the virtual model 4021 rather than directly to the target object 401 can reduce the amount of computation, because attaching the text label 4022 directly to the target object 401 requires precisely locating each part of the target object 401 in the real environment, which increases the overall amount of computation. In the embodiment of the present application, the virtual model 4021 is built from the model data of the target object, so its size and proportions can closely match the original target object, and operating on the virtual model 4021, for example adding text information to it, offers greater flexibility.
As shown in fig. 7, the content displayed on the interface includes a target object 401, the current real environment of the target object 401, and virtual content, where the virtual content includes a virtual model 4021 and a text label 4022. The text label 4022 may include first content added on the surface of the virtual model 4021 and second content arranged outside that surface, with the first and second content associated with each other. As shown in fig. 7, the first content is a number that serves as a mark, and the second content is the operation content corresponding to that number, such as "removal 1" in fig. 7. By combining the first content and the second content, the user can operate the target object clearly without consulting its usage specification.
As an embodiment, the product description may be presented to the user by an AR effect through animation, and an embodiment of generating the virtual content based on the product description information and the virtual model may be that a dynamic presentation screen is generated based on the product description information and the virtual model, and the dynamic presentation screen is used as the virtual content.
In particular, the dynamic presentation screen may be a dynamic picture built on the virtual model. As an embodiment, the product description information may be introduction information for the product's parts, and the dynamic display may rotate the virtual model in a predetermined order, bringing its various parts into the user's field of view and showing the description information of each part at its location; at the same time, the housing of the virtual model can be opened so that its internal structure is displayed.
As another embodiment, the product description information may be product operation information, including information that requires the target object to be operated in a designated order, for installation, removal, cleaning, etc., in order to accomplish a specific operation or function. The dynamic display may then be a dynamic presentation of the operation process corresponding to the operation information. For example, if the operation information describes removal of the target object, the dynamic display may be a process animation in which the parts of the virtual model of the target object are removed one after another in a predetermined order, as sketched below.
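A minimal sketch of such an ordered removal demonstration; the part names, actions, and the `animate` callback are illustrative assumptions, not part of the original disclosure.

```python
from dataclasses import dataclass

@dataclass
class Step:
    part: str        # part of the virtual model to animate
    action: str      # e.g. "detach", "rotate", "highlight"
    duration_s: float

# Operation information expressed as an ordered removal sequence (hypothetical parts).
REMOVAL_STEPS = [
    Step("filter_panel", "detach", 2.0),
    Step("grease_cup", "detach", 1.5),
    Step("fan_cover", "rotate", 2.5),
]

def play_demonstration(steps, animate):
    """Play each step on the virtual model in the predetermined order,
    producing the dynamic display; `animate` stands for the AR engine's
    rendering callback."""
    for i, step in enumerate(steps, start=1):
        animate(step.part, step.action, step.duration_s)
        print(f"Step {i}: {step.action} {step.part}")
```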
As an implementation manner, information such as a pose and illumination of the target object in the real environment may also be obtained, and the virtual model is determined based on the information, so that the virtual model can be closer to the placement position of the target object in the real environment, specifically, please refer to the following embodiments.
S560: adding the virtual content at the location of the target object in the real environment.
Referring to fig. 8, fig. 8 illustrates a virtual content display method provided by an embodiment of the present application for displaying virtual content at the position of a target object, so as to make reading a product specification more convenient and less tedious, better attract the user to read it, and make its content easier to learn. Specifically, the method includes the following steps: S810 to S870.
S810: and acquiring an image of the real environment acquired by the image acquisition device.
S820: and if the target object exists in the image, acquiring product description information corresponding to the target object.
S830: and obtaining model data corresponding to the target object.
S840: obtaining visual presentation information of the target object within the real environment.
The visual presentation information represents the visual experience of a user observing the target object in the real environment, and may specifically include the observed placement angle of the target object, its illumination, and the like.
As an embodiment, the visual presentation information includes at least one of illumination information of a target object and pose information of the target object with respect to the image capture device. As an embodiment, the method is applied to an electronic device, and the image capture device is installed in the electronic device, so that the pose information of the target object relative to the image capture device is also equivalent to the pose information of the target object relative to the electronic device.
The pose information of the target object in the real environment may include information such as the position and rotation angle of the target object in the real environment. Specifically, the pose information is the position and rotation information between the target object and the image acquisition device. There may be one or more target objects in the acquired target image; when there are several, the pose information between each target object in the target image and the image acquisition device is acquired.
The pose information may be acquired by a marker set in advance on the target object, or may be determined from an image of the target object based on a previously trained analysis model.
As an embodiment, the target object is provided with one or more markers. Each marker includes a background area and a plurality of sub-markers distributed within it according to a specific rule, the sub-markers being separated from each other, and each sub-marker having one or more feature points.
The sub-marker is a pattern with a certain shape, and the color of the sub-marker is differentiated from the color of the background area in the marker, for example, the background area is white, and the color of the sub-marker is black. The sub-markers may be formed by one or more feature points, and the shape of the feature points is not limited, and may be dots, circles, triangles or other shapes.
Specifically, the pixel coordinates of the feature points in the image coordinate system corresponding to the image are acquired, and the pose information between the image acquisition device and the target object is obtained from those pixel coordinates and the pre-acquired physical coordinates corresponding to the feature points. A feature point here may be one of the sub-markers mentioned above.
The physical coordinates are the pre-acquired coordinates of the feature points in a physical coordinate system corresponding to the target object, i.e. the real positions of the feature points on the target object. The physical coordinates of each feature point can be obtained in advance: specifically, a number of feature points and markers are arranged on a marking surface of the target object, and a point on that surface is selected as the origin of a physical coordinate system. The marking surface serves as the XOY plane of the physical coordinate system, with the origin of the XOY coordinate system lying in the marking surface. The physical coordinates of each feature point may be determined in advance from its position on the marker and its distance from a reference point on the marker; the center of the physical coordinate system may then be that reference point.
When the pose of the target object differs, the distances of the sub-markers in the markers from the reference point, and the positions of the sub-markers, differ in the captured image of the target object. The positions of the sub-markers in the real environment can therefore be determined through the mapping relationship between pixel coordinates and physical coordinates, the position of the marker determined as a whole, and the pose of the target object derived from the installation position of the marker on the target object.
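Recovering pose from such pixel/physical coordinate correspondences is, in practice, a perspective-n-point (PnP) problem; a minimal sketch using OpenCV's solver follows, as an illustration under standard calibrated-camera assumptions rather than the patent's own algorithm.

```python
import cv2
import numpy as np

def marker_pose(pixel_pts, physical_pts, K, dist_coeffs=None):
    """Estimate the pose between the image acquisition device and the target
    object from at least four marker feature points. `physical_pts` are the
    pre-acquired physical coordinates of the feature points on the marking
    surface (the XOY plane, so Z = 0); `pixel_pts` are their pixel
    coordinates in the captured image."""
    physical_pts = np.asarray(physical_pts, dtype=np.float64)  # N x 3
    pixel_pts = np.asarray(pixel_pts, dtype=np.float64)        # N x 2
    if dist_coeffs is None:
        dist_coeffs = np.zeros(5)  # assume negligible lens distortion

    ok, rvec, tvec = cv2.solvePnP(physical_pts, pixel_pts, K, dist_coeffs)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix of the marker's pose
    return R, tvec              # pose of the target object relative to the camera
```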
In other embodiments, the visual presentation information may include illumination information of the target object and pose information of the target object relative to the image capture device. The electronic device may be an execution subject of the embodiment of the method, and may be a user terminal, for example. The lighting information may be a lighting effect of the target object within the real environment, for example, a lighting effect of a surface of the target object and a shadow of the target object in the real environment.
As another embodiment, the pose information may also be determined according to a pre-trained analysis model, and specifically, referring to fig. 9, the S840 includes S841 to S843.
S841: acquiring a plurality of sample data, wherein the sample data is image data obtained in advance according to the model data of the target object under a plurality of different pieces of visual presentation information.
In the embodiment of the application, an analysis model can be trained in advance; the analysis model obtains the visual presentation information of the target object by analyzing the image of the target object captured by the image acquisition device.
As an embodiment, the analysis model may be a machine-learning model based on a neural network, in particular a Convolutional Neural Network (CNN). The analysis model in the embodiment of the present application is built on a CNN: as shown in fig. 10, an image acquired by the image acquisition device is input to the convolutional neural network, which feeds the feature vector of the image into a first fully-connected layer and a second fully-connected layer. The first fully-connected layer outputs a detection result, used to determine whether a target object exists in the image of the real environment acquired by the image acquisition device and to determine the product identifier of the target object; the second fully-connected layer outputs the visual presentation information.
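As a hedged illustration of this two-branch structure — a convolutional backbone whose feature vector feeds a detection head and a visual-presentation head — a minimal PyTorch sketch might look as follows; all layer sizes, output dimensions and names are assumptions for illustration only.

import torch
import torch.nn as nn

class AnalysisModel(nn.Module):
    def __init__(self, num_products: int):
        super().__init__()
        # Convolutional backbone that produces the image feature vector.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # First fully-connected layer: detection result, i.e. whether a
        # target object is present and its product identifier.
        self.detect_head = nn.Linear(32, num_products + 1)  # +1 = "no object"
        # Second fully-connected layer: visual presentation information,
        # here assumed to be a 6-DoF pose plus one illumination value.
        self.present_head = nn.Linear(32, 7)

    def forward(self, image):
        feat = self.backbone(image)
        return self.detect_head(feat), self.present_head(feat)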
Pose regression directly through an end-to-end CNN, such as PoseNet, is usually an order of magnitude less accurate than conventional feature-point matching. This is because a CNN used in this way behaves more like image retrieval and is strongly affected by its training data.
Therefore, to improve the accuracy of the neural network, the embodiment of the application may render richer sample data from the model data of the target object. Moreover, the analysis model of the application can complete the detection task at the same time, without an external detection module.
Specifically, applying the analysis model involves two parts: offline training and online running. As shown in fig. 11, in the offline training part a rendering model is used to generate sample data: it takes the 3D model, the pose and the illumination parameters as input, and outputs the image of the model projected to 2D. The image projected to 2D may be a virtual model corresponding to the target object generated based on the model data.
Specifically, model data corresponding to a target object is acquired, a pose and an illumination parameter are set for the model data, and the model data, the pose and the illumination parameter are input into the rendering model. The rendering model then outputs a two-dimensional image of the target object, which can be regarded as the image the image acquisition device would acquire of the target object standing in the real environment at that pose and under that illumination; in other words, the two-dimensional image represents the target object under a designated pose and illumination.
Specifically, the rendering model may be graphics-design software, or any application with graphics-processing capability that can convert 3D graphics into 2D graphics, e.g., a three-dimensional rendering tool that, given the model data, a pose and illumination parameters, renders the three-dimensional object at that pose and illumination and converts it into a two-dimensional image of the object.
Then, from the model data of the target object and a number of different input poses and illuminations, a number of different image data, i.e., two-dimensional images, under different poses and illuminations can be obtained.
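A minimal sketch of this offline sample-generation loop might be as follows; render_to_2d is a hypothetical stand-in for the rendering model, and the pose and illumination sampling ranges are assumptions.

import random

def render_to_2d(model_data, pose, illumination):
    # Stand-in for the rendering model: a real implementation would
    # project the 3D model to a 2D image at the given pose and lighting.
    width, height = 64, 64
    return [[illumination] * width for _ in range(height)]  # placeholder image

def generate_samples(model_data, n_samples=1000):
    samples = []
    for _ in range(n_samples):
        pose = [random.uniform(-1.0, 1.0) for _ in range(6)]  # 3 position + 3 rotation
        light = random.uniform(0.2, 1.0)                      # illumination parameter
        image = render_to_2d(model_data, pose, light)
        samples.append((image, (pose, light)))  # the pose/illumination is the label
    return samples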
S842: training an analysis model based on the plurality of sample data and the visual presentation information corresponding to each sample.
The plurality of sample data is then input into the analysis model, and the analysis model is trained based on the sample data and the visual presentation information corresponding to each sample. The pose and illumination parameter corresponding to each sample are used as the sample's ground truth, i.e., its label, so that the model, trained on the samples and their labels, gains the ability to analyze an image of the target object and obtain the visual presentation information of the target object in the real environment.
Specifically, the detection result is evaluated with a cross-entropy error, and the visual presentation information is evaluated with a minimum mean error; the error is then back-propagated, and the parameters of the model are optimized by gradient descent.
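Assuming the AnalysisModel sketched earlier, this training step — cross-entropy for the detection result, a mean error for the visual presentation information, back-propagation and gradient descent — might be written as the following sketch; the learning rate and label formats are assumptions.

import torch
import torch.nn as nn

model = AnalysisModel(num_products=10)   # from the earlier sketch
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # gradient descent
ce_loss = nn.CrossEntropyLoss()          # evaluates the detection result
mse_loss = nn.MSELoss()                  # mean error for presentation info

def train_step(images, product_labels, presentation_labels):
    # images: (N, 3, H, W); product_labels: (N,) long; presentation: (N, 7)
    detect_out, present_out = model(images)
    loss = ce_loss(detect_out, product_labels) + \
           mse_loss(present_out, presentation_labels)
    optimizer.zero_grad()
    loss.backward()    # back-propagate the error
    optimizer.step()   # gradient-descent parameter update
    return loss.item()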
S843: analyzing the image including the target object based on the trained analysis model to obtain the visual presentation information of the target object in the real environment.
Referring to fig. 11, in the online running part, the image acquired by the image acquisition device, i.e., the camera image in fig. 11, is input to the analysis model, and the analysis model outputs its result for the image: following the process of fig. 10, a detection result and visual presentation information are obtained. Whether a target object exists in the image is determined from the detection result, and if a target object exists, the visual presentation information is input to the rendering module, so that the virtual model corresponding to the target object is obtained.
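Under the same assumptions as the earlier sketches, the online running part might look like the following, where render_virtual_model is a hypothetical rendering module.

import torch

def process_frame(frame, model, render_virtual_model, no_object_index=0):
    # frame: a (3, H, W) image tensor from the image acquisition device.
    model.eval()
    with torch.no_grad():
        detect_out, present_out = model(frame.unsqueeze(0))
    product_id = detect_out.argmax(dim=1).item()
    if product_id == no_object_index:
        return None  # no target object in the image
    # Target object detected: hand the visual presentation information to
    # the rendering module to obtain the corresponding virtual model.
    return render_virtual_model(product_id, present_out.squeeze(0))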
S850: and generating a virtual model corresponding to the target object based on the model data and the visual presentation information.
In some embodiments, the visual presentation information includes pose information, and generating a virtual model corresponding to the target object based on the model data and the visual presentation information includes: generating the virtual model based on the model data and the pose information, such that the virtual model has the same placement pose as the target object.
As an embodiment, model data corresponding to different position information is predetermined, where the pose information may include position and rotation information between the target object and the image capturing device. Specifically, the rotation information can serve as the pose of the target object in the real environment, and the distance between the target object and the image capturing device in the real environment can be obtained from the position between them.
In some embodiments, the model data includes size information and a pose, where the pose may be the orientation of the virtual model, e.g., a front or rear view. The size information is determined from the position between the target object and the image acquisition device: the closer the distance, the larger the size; the farther the distance, the smaller the size. The placement pose is determined from the rotation information between the target object and the image acquisition device. A virtual model is then generated from the model data using the determined size information and placement pose, so that the size of the virtual model is not smaller than the size of the target object in the image of the real environment acquired by the image acquisition device, and the placement pose of the virtual model is the same as that of the target object in the real environment.
Specifically, the model data corresponding to the target object includes multiple pieces of size information and multiple placement poses, each piece of size information corresponding to a position and each placement pose corresponding to rotation information, so that after the position and rotation information between the target object and the image acquisition device is acquired, the size information corresponding to the position and the placement pose corresponding to the rotation information can be looked up.
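For illustration, this lookup could be sketched as follows; the distance thresholds and scale values are assumptions, not values from this application.

def select_model_params(distance_m, rotation):
    # Pre-stored size information keyed by position (distance): the
    # closer the distance, the larger the size.
    if distance_m < 0.5:
        scale = 1.5
    elif distance_m < 1.5:
        scale = 1.0
    else:
        scale = 0.6
    # The placement pose corresponds to the measured rotation information.
    placement_pose = rotation
    return scale, placement_pose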
S860: generating the virtual content based on the product description information and the virtual model.
S870: adding the virtual content at the location of the target object in the real environment.
With the pose information applied, in the virtual content observed by the user, the virtual model of the target object moves as the user's observation direction, position and viewing angle move; that is, the virtual content changes with the user's observation angle in the same way as the target object the user sees.
As shown in fig. 12, and comparing figs. 6 and 7, it can be seen that after the observation angle of the image capturing apparatus changes, the virtual model follows the change of the pose of the target object in the image. In the AR effect the user sees through the screen of the image capturing apparatus or user terminal, the virtual model therefore changes with the pose of the target object in the image, which gives a more realistic effect.
As an embodiment, the virtual model is a virtual model of the target object that matches the size of the target object in the image. Specifically, the virtual model and the target object are located at the same position in the real environment and have the same size, so the two overlap completely and the user observes the target object as being entirely replaced by the virtual model.
In this way, the user can photograph the target object through the user terminal and see the virtual content displayed at the position of the target object, with the virtual content following the changes in the viewing angle and position of the user terminal and of the user.
Referring to fig. 13, a display device 900 for virtual content according to an embodiment of the present application is shown. The device may include: a first obtaining unit 1301, a second obtaining unit 1302, a determining unit 1303 and a processing unit 1304.
The first obtaining unit 1301 is configured to obtain the image of the real environment collected by the image acquisition device.
A second obtaining unit 1302, configured to, if a target object exists in the image, obtain product description information corresponding to the target object and visual presentation information of the target object in the real environment, where the visual presentation information includes pose information of the target object relative to the image capture device.
The determining unit 1303 is configured to generate virtual content according to the product description information and the visual presentation information, where the virtual content is used to represent the product description information.
A processing unit 1304 for adding the virtual content at the position of the target object in the real environment.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Referring to fig. 14, a display device 1400 for virtual content according to an embodiment of the present application is shown. The device may include: a first obtaining unit 1410, a second obtaining unit 1420, a determining unit 1430, and a processing unit 1440.
A first obtaining unit 1410, configured to obtain the image of the real environment collected by the image acquisition device.
A second obtaining unit 1420, configured to obtain product description information corresponding to a target object if the target object exists in the image.
A determining unit 1430, configured to generate virtual content according to the product description information and the visual presentation information, where the virtual content is used to represent the product description information.
The determining unit 1430 includes an obtaining subunit 1431, a model subunit 1432, and a content subunit 1433.
The obtaining subunit 1431 is configured to obtain model data corresponding to the target object.
The model subunit 1432 is configured to generate a virtual model corresponding to the target object based on the model data and the visual presentation information.
In particular, the model subunit 1432 is configured to obtain visual presentation information of the target object within the real environment, the visual presentation information including at least one of illumination information of the target object and pose information of the target object with respect to the image acquisition device; and generating a virtual model corresponding to the target object based on the model data and the visual presentation information.
Further, the visual presentation information comprises the pose information, and the model subunit 1432 is configured to generate a virtual model corresponding to the target object based on the model data and the pose information, the virtual model being the same as the pose of the target object.
Further, the virtual model is a virtual model of the target object, which matches a size of the target object within the image.
Further, the visual presentation information includes the pose information, and the model subunit 1432 is configured to acquire a plurality of sample data, where the sample data is image data obtained in advance according to model data corresponding to a plurality of different visual presentation information of the target object; training an analytical model based on a plurality of said sample data and each said corresponding visual presentation information; and analyzing the image comprising the target object based on the trained analysis model to obtain the visual presentation information of the target object in the real environment.
A content subunit 1433 is configured to generate the virtual content based on the product description information and the virtual model.
Specifically, the content subunit 1433 is configured to generate a dynamic display screen based on the product description information and the virtual model, where the dynamic display screen is the virtual content.
A processing unit 1440 for adding the virtual content at the position of the target object in the real environment.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, the coupling between the modules may be electrical, mechanical or other type of coupling.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
Referring to fig. 15, a block diagram of an electronic device according to an embodiment of the present application is shown. The electronic device 1500 may be a smart phone, a tablet computer, an electronic book, or other electronic devices capable of running an application. In particular, the electronic device may be the head display or the user terminal described above.
The electronic device 1500 in the present application may include one or more of the following components: a processor 1510, a memory 1520, and one or more applications, wherein the one or more applications may be stored in the memory 1520 and configured to be executed by the one or more processors 1510, the one or more programs configured to perform a method as described in the aforementioned method embodiments.
The processor 1510 may be implemented in the hardware form of at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 1510 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like, where the CPU primarily handles the operating system, user interface, application programs, and so on; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. The modem may also be implemented as a separate chip instead of being integrated into the processor 1510.
The memory 1520 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 1520 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 1520 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the foregoing method embodiments, and the like. The data storage area may store data created by the electronic device 1500 in use, such as a phonebook, audio and video data, and chat log data.
Referring to fig. 16, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer readable medium 1600 has stored therein program code that can be invoked by a processor to perform the methods described in the method embodiments above.
The computer-readable storage medium 1600 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read-Only Memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 1600 includes a non-volatile computer-readable medium. The computer-readable storage medium 1600 has storage space for program code 1610 for performing any of the method steps described above. The program code can be read from or written into one or more computer program products. The program code 1610 may, for example, be compressed in a suitable form.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not necessarily depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A method for displaying virtual content, comprising:
acquiring an image of a real environment acquired by an image acquisition device;
if a target object exists in the image, acquiring product description information corresponding to the target object;
acquiring visual presentation information of the target object in the real environment, wherein the visual presentation information comprises pose information of the target object relative to the image acquisition device;
generating virtual content according to the product description information and the visual presentation information, wherein the virtual content is used for representing the product description information;
adding the virtual content at the location of the target object in the real environment.
2. The method of claim 1, wherein generating virtual content from the product description information and the visual presentation information comprises:
obtaining model data corresponding to the target object;
generating a virtual model corresponding to the target object based on the model data and the visual presentation information;
generating the virtual content based on the product description information and the virtual model.
3. The method of claim 2, wherein generating the virtual content based on the product description information and the virtual model comprises:
and generating a dynamic display picture based on the product description information and the virtual model, wherein the dynamic display picture is used as the virtual content.
4. The method of claim 2, wherein the model data includes size information and pose information, the pose information including position and rotation information between a target object and the image capture device, and wherein generating a virtual model corresponding to the target object based on the model data and the visual presentation information comprises:
determining size information according to the position between the target object and the image acquisition device, and determining a placing pose according to the rotation information between the target object and the image acquisition device;
and acquiring the virtual model according to the determined size information and the determined placing pose.
5. The method of claim 2, wherein generating the virtual content based on the product description information and the virtual model comprises:
and adding specified content to the virtual model, wherein the specified content is set according to the product description information, and the virtual model with the specified content added serves as the virtual content.
6. The method of any of claims 2-5, wherein the virtual model is a virtual model of the target object, the virtual model matching a size of the target object within the image.
7. The method of claim 1, wherein said obtaining visual presentation information of said target object within said real environment comprises:
acquiring a plurality of sample data, wherein the sample data is image data obtained in advance according to model data corresponding to a plurality of different visual presentation information of the target object;
training an analytical model based on a plurality of said sample data and each of said visual presentation information;
and analyzing the image comprising the target object based on the trained analysis model to obtain the visual presentation information of the target object in the real environment.
8. A display device for virtual content, comprising:
the first acquisition unit is used for acquiring the image of the real environment acquired by the image acquisition device;
a second obtaining unit, configured to obtain, if a target object exists in the image, product description information corresponding to the target object and visual presentation information of the target object in the real environment, where the visual presentation information includes pose information of the target object with respect to the image acquisition device;
a determining unit, configured to generate virtual content according to the product description information and the visual presentation information, where the virtual content is used to represent the product description information;
a processing unit for adding the virtual content at the position of the target object in the real environment.
9. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-7.
10. A computer-readable medium having stored program code executable by a processor, the program code causing the processor to perform the method of any one of claims 1-7 when executed by the processor.