WO2021228200A1 - Method for realizing interaction in three-dimensional space scene, apparatus and device - Google Patents


Info

Publication number
WO2021228200A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
dimensional model
user terminal
pixel
information
Prior art date
Application number
PCT/CN2021/093628
Other languages
French (fr)
Chinese (zh)
Inventor
白杰
姚锟
贾松林
郑深圳
张蕾
Original Assignee
贝壳技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202010401813.7A external-priority patent/CN111562845B/en
Priority claimed from CN202010698810.4A external-priority patent/CN111885398B/en
Application filed by 贝壳技术有限公司
Publication of WO2021228200A1 publication Critical patent/WO2021228200A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/239Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests

Definitions

  • the present disclosure relates to virtual reality panoramic technology and streaming media technology, and in particular to a method for realizing three-dimensional space scene interaction, a device for realizing three-dimensional space scene interaction, a storage medium, and electronic equipment.
  • VR panoramic technology is an emerging rich media technology. Because VR panoramic technology can present users with 720-degree, blind-spot-free three-dimensional space scenes and bring an immersive visual experience, it is widely used in various fields such as online shopping malls, travel services, and real estate services. How to enable VR panoramic technology to bring users a richer experience is a technical issue worthy of attention.
  • three-dimensional models can give people a stronger visual perception.
  • any view of the object can be presented to the user, and the correct projection relationship can be maintained between the views.
  • while the user terminal is presenting a three-dimensional model, real-time voice interaction between user terminals can be supported; that is, during the user terminal's presentation of the three-dimensional model, the voice of the user at the opposite terminal can be transmitted to the user terminal in real time, and the voice acquired by the user terminal can likewise be transmitted to the opposite terminal in real time.
  • a method for realizing interaction in a three-dimensional space scene, including: in response to detecting a user operation of setting footprint information in the three-dimensional space scene, determining a first pixel in the current view corresponding to the user's current perspective in the three-dimensional space scene; determining the three-dimensional model corresponding to the first pixel; determining the position of the user's footprint information in the three-dimensional model, where the footprint information is used for display when the three-dimensional space scene is browsed; and setting the user's footprint information at the position.
  • an interaction method based on a three-dimensional model, including, at a first user terminal presenting a user interface: in response to detecting a user's target interaction operation on the user interface, sending, to the server that provides page data for the user interface, an interaction request for the target interaction operation, where the user interface is used to present a three-dimensional model, and the three-dimensional model has an association relationship established in advance with the user account logged in at a second user terminal; receiving, from the server, the streaming video obtained by the second user terminal; and presenting the streaming video and the three-dimensional model on the user interface.
  • an interaction method based on a three-dimensional model, including, at a second user terminal: in response to receiving an interaction request sent by a server, acquiring a streaming video, wherein the interaction request indicates that a first user terminal has detected a user's target interaction operation on the user interface presented by the first user terminal, the user interface is used to present a three-dimensional model, and the three-dimensional model has an association relationship established in advance with the user account logged in at the second user terminal; and sending the streaming video to the server, where the server is used to send the streaming video to the first user terminal, so that the first user terminal presents the streaming video and the three-dimensional model on the user interface.
  • a device for realizing interaction of a three-dimensional space scene including: a device for executing the method described in any one of the above methods.
  • an interaction device based on a three-dimensional model which is provided in a first user terminal, and the device includes: a device for executing the method described in any one of the above methods.
  • an interaction device based on a three-dimensional model which is provided in a second user terminal, and the device includes: a device for executing the method described in any one of the above methods.
  • an interactive system based on a three-dimensional model, including: a first user terminal for presenting a user interface; a second user terminal; and a server.
  • the server is in communication connection with the first user terminal and the second user terminal.
  • the first user terminal is configured to: in response to detecting the user's target interaction operation on the user interface, send an interaction request for the target interaction operation to the server, where the user interface is used to present the three-dimensional model, and the three-dimensional model has an association relationship established in advance with the user account logged in at the second user terminal; the second user terminal is configured to: obtain the streaming video and send the streaming video to the server; the server is configured to: send the streaming video to the first user terminal; and the first user terminal is further configured to: present the streaming video and the three-dimensional model on the user interface.
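The configured behavior of the three parties can be illustrated with a minimal, runnable sketch. All class and method names here (`InteractionServer`, `FirstUserTerminal`, and so on) are hypothetical stand-ins, not part of the disclosure, and the streaming video is reduced to a placeholder string.

```python
# Hypothetical sketch of the three-party interaction flow described above.

class FirstUserTerminal:
    """Presents the user interface with the three-dimensional model."""

    def __init__(self):
        self.presented_video = None

    def present(self, video):
        # Would render the streaming video alongside the 3D model in the UI.
        self.presented_video = video


class SecondUserTerminal:
    """Terminal of the account pre-associated with the model."""

    def acquire_streaming_video(self):
        # Stand-in for a real camera/microphone capture pipeline.
        return "stream://second-terminal"


class InteractionServer:
    """Relays interaction requests and streaming video between terminals."""

    def __init__(self):
        self.model_to_account = {}  # model id -> pre-associated account
        self.terminals = {}         # account -> terminal object

    def associate(self, model_id, account, terminal):
        self.model_to_account[model_id] = account
        self.terminals[account] = terminal

    def handle_interaction_request(self, model_id, first_terminal):
        # 1. The first terminal's interaction request arrives here.
        account = self.model_to_account[model_id]
        second = self.terminals[account]
        # 2. The second terminal acquires and sends the streaming video.
        video = second.acquire_streaming_video()
        # 3. The server forwards the video to the first terminal.
        first_terminal.present(video)


server = InteractionServer()
first = FirstUserTerminal()
server.associate("house_model_1", "agent_account", SecondUserTerminal())
# The user's target interaction operation on the UI triggers the request.
server.handle_interaction_request("house_model_1", first)
print(first.presented_video)  # stream://second-terminal
```

In a real deployment the association between the model and the second terminal's account would be established in advance server-side, and the video would travel over a streaming protocol rather than a return value.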
  • a non-transitory computer-readable storage medium storing a computer program which, when executed, implements the method described in any one of the above items.
  • an electronic device including: a processor; and a memory for storing processor-executable instructions.
  • the processor-executable instructions, when executed by the processor, implement the method described in any one of the above items.
  • a computer program product including a computer program, which when executed by a computer causes the computer to implement the method described in any one of the above methods.
  • FIG. 1 is a schematic diagram of an embodiment of an applicable scenario of the present disclosure.
  • FIG. 2 is a flowchart of an embodiment of a method for realizing interaction in a three-dimensional space scene of the present disclosure.
  • FIG. 3 is a flowchart of an embodiment of determining the three-dimensional model corresponding to a first pixel in the present disclosure.
  • FIG. 4 is a flowchart of another embodiment of determining the three-dimensional model corresponding to a first pixel in the present disclosure.
  • FIG. 5 is a flowchart of an embodiment of presenting footprint information to browsing users in the present disclosure.
  • FIG. 6 is a schematic structural diagram of an embodiment of an apparatus for realizing interaction in a three-dimensional space scene of the present disclosure.
  • FIG. 7 is a flowchart of an embodiment of the first interaction method based on a three-dimensional model of the present disclosure.
  • FIGS. 8A-8C are schematic diagrams of application scenarios of the embodiment of FIG. 7.
  • FIG. 9 is a flowchart of another embodiment of the first interaction method based on a three-dimensional model of the present disclosure.
  • FIG. 10 is a flowchart of yet another embodiment of the first interaction method based on a three-dimensional model of the present disclosure.
  • FIG. 11 is a flowchart of an embodiment of the second interaction method based on a three-dimensional model of the present disclosure.
  • FIG. 12 is a flowchart of another embodiment of the second interaction method based on a three-dimensional model of the present disclosure.
  • FIG. 13 is a schematic structural diagram of an embodiment of the first interaction device based on a three-dimensional model of the present disclosure.
  • FIG. 14 is a schematic structural diagram of an embodiment of the second interaction device based on a three-dimensional model of the present disclosure.
  • FIG. 15 is a schematic diagram of interaction of an embodiment of the interactive system based on a three-dimensional model of the present disclosure.
  • FIG. 16 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
  • "plural" may refer to two or more than two, and "at least one" may refer to one, two, or more than two.
  • the term "and/or" in the present disclosure merely describes an association relationship between associated objects, indicating that three relationships can exist; for example, "A and/or B" can mean: A alone, both A and B, or B alone.
  • the character "/" in the present disclosure generally indicates that the associated objects before and after are in an "or" relationship.
  • the embodiments of the present disclosure can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with many other general-purpose or special-purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, or servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, large computer systems, and distributed cloud computing technology environments including any of the above systems.
  • Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer system executable instructions (such as program modules) executed by the computer system.
  • program modules may include routines, programs, object programs, components, logic, data structures, etc., which perform specific tasks or implement specific abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment.
  • tasks can be performed by remote processing devices linked through a communication network.
  • program modules may be located on a storage medium of a local or remote computing system including a storage device.
  • the inventor found that in the process of experiencing a three-dimensional space scene by adjusting the current perspective, a user often generates feelings such as emotions and thoughts. If the user can set footprint information characterizing these feelings in the three-dimensional space scene, this not only helps improve the user's sense of participation, but the footprint information left by the user can also provide other users viewing the three-dimensional space scene with a richer VR panoramic experience.
  • VR panoramic technology can be used to set a three-dimensional space scene for a house to be rented or a house to be sold. Any user can access through the network and watch the three-dimensional space scene of the corresponding house anytime and anywhere.
  • the present disclosure allows the user to leave his own footprint information for the house he is browsing, and can present to the user both the user's own footprint information and the footprint information left by other users for the house.
  • the footprint information 120 left by other users for the three-dimensional space scene of the two-bedroom, one-living-room house includes: "I like this group of sofas, it's great", "This decorative partition is good", "This sofa is good, high-end", "The combination and matching are very thoughtful, praise", and "The design of the tea table is very unique" (the longest message being 20 characters), as well as the three-dimensional model 110 shown in the upper right corner of FIG. 1.
  • the user browsing the three-dimensional space scene of the house is presented with the footprint information 120 left by other users for the three-dimensional space scene of the house, which helps the user understand other users' feelings about the house, thereby deepening the user's cognition of the house and in turn improving the user's browsing experience of the house.
  • the user can also express his own feelings about the house, that is, leave his own footprint information in the three-dimensional space scene.
  • the user can set footprint information such as "this pillar makes the house look more distinctive" at the position of the pillar shown in FIG. 1.
  • the footprint information set by the user can be instantly displayed in the three-dimensional space scene shown in FIG. 1; that is, the user can see the footprint information left by himself while viewing the three-dimensional space scene of the house, which is conducive to enhancing the user's sense of participation.
  • all other footprint information set by users for the house that does not belong to the view shown in FIG. 1 can be presented to the user in the form of a bullet screen 130, which helps increase the user's interest in browsing the three-dimensional space scene at other locations of the house.
  • the technology for realizing interaction in three-dimensional space scenes provided by the present disclosure can also be applied to various other scenes. For example, when a user browses a three-dimensional space scene of a library, corresponding footprint information can be set for a book, a chair, or a coffee in the library.
  • the footprint information set by the user for the book may be the user's impression of the book or the number of pages currently read by the user.
  • the scenarios where the technology for realizing the interaction of three-dimensional space scenes provided by the present disclosure can be applied will not be illustrated one by one.
  • FIG. 2 is a flowchart of an embodiment of a method for realizing interaction of a three-dimensional space scene of the present disclosure.
  • the method 200 of the embodiment shown in FIG. 2 includes steps 210 to 240. Each step is described separately below.
  • step 210 in response to detecting the user operation of setting the footprint information in the three-dimensional space scene, determine the first pixel in the current view corresponding to the user's current perspective in the three-dimensional space scene.
  • a three-dimensional space scene may refer to a space scene with a three-dimensional sense that is presented to the user by using a preset panoramic image and a three-dimensional model.
  • the three-dimensional space scene may be a three-dimensional space scene set for a library, a three-dimensional space scene set for a house, a three-dimensional space scene set for a cafe, or a three-dimensional space scene set for a shopping mall.
  • in the embodiment of the present disclosure, when the user triggers the function of setting footprint information in the three-dimensional space scene, it can be detected that the user needs to set footprint information in the three-dimensional space scene. For example, when the user clicks a button for setting footprint information or a corresponding option on a menu, the embodiment of the present disclosure can detect that the user needs to set footprint information in the three-dimensional space scene. For another example, the user can use a preset shortcut to trigger the function of setting footprint information in the three-dimensional space scene.
  • the user's footprint information may be information that can indicate that the user has visited the three-dimensional space scene. The footprint information can be considered as the visit trace information of the user.
  • the current perspective of the user in the three-dimensional space scene may refer to the position and angle at which the user currently views the three-dimensional space scene.
  • the user's current perspective in the three-dimensional space scene usually changes with the user's operation. For example, the user can control his current perspective in the three-dimensional scene by performing operations such as dragging on the touch screen.
  • the user's current perspective in the three-dimensional space scene determines the content/area of the panorama that the user can currently see, that is, the user's current perspective in the three-dimensional space scene determines the current view.
  • the first pixel point is one pixel point in the current view.
  • the first pixel point can be obtained according to a preset default rule.
  • the first pixel may be a specific pixel in the current view, or it may be any pixel in the current view.
  • step 220 the three-dimensional model corresponding to the first pixel is determined.
  • a three-dimensional space scene is generally formed by a plurality of three-dimensional models.
  • the three-dimensional space scene may also be formed by a three-dimensional model.
  • a pixel point in the current view seen by the user may be a representation of a point in the three-dimensional model.
  • a pixel in the current view that the user sees may not be a representation of any point in a three-dimensional model. That is to say, under normal circumstances, any point in any three-dimensional model in the three-dimensional space scene can be presented in the panorama, but not every point in the panorama is necessarily a point in a three-dimensional model of the three-dimensional space scene.
  • the present disclosure does not exclude the possibility that some points in the three-dimensional model in the three-dimensional space scene are not presented in the panoramic image.
  • the three-dimensional model where the point is located is the three-dimensional model corresponding to the first pixel.
  • the three-dimensional model corresponding to the first pixel may be the three-dimensional model corresponding to another pixel in the current view that is close to the first pixel and is used to present a point in a three-dimensional model.
  • That is to say, when the first pixel is used to present a point not belonging to any three-dimensional model, and the first pixel is not updated, the three-dimensional model corresponding to another pixel in the current view can be used as the three-dimensional model corresponding to the first pixel.
  • step 230 the position of the user's footprint information in the three-dimensional model is determined, where the footprint information is used to display when the three-dimensional space scene is browsed.
  • the position of the first pixel, or of the other pixel, in the three-dimensional model can be obtained; this position is the position of the user's footprint information.
  • all three-dimensional models in a three-dimensional space scene may be respectively provided with their own three-dimensional coordinate systems, or may have the same three-dimensional coordinate system.
  • the position of the user's footprint information in the three-dimensional model can be represented by (x, y, z). That is, the user's footprint information can have depth.
  • step 240 the user's footprint information is set at the location.
  • setting the user's footprint information at the location may include: setting a three-dimensional model identifier and three-dimensional coordinates for the user's footprint information, and storing the correspondence among the three-dimensional model identifier, the three-dimensional coordinates, and the user's footprint information.
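Setting a footprint thus amounts to persisting a (model identifier, coordinates, content) correspondence record. The following minimal sketch stores and retrieves such records; the function names and record layout are hypothetical illustrations, and a real system would use a server-side database rather than an in-memory list.

```python
# Hypothetical in-memory store for footprint records.

footprint_store = []

def set_footprint(model_id, position, footprint):
    """Store the correspondence among model id, (x, y, z), and footprint."""
    record = {"model_id": model_id, "position": position, "footprint": footprint}
    footprint_store.append(record)
    return record

def footprints_for_model(model_id):
    """Retrieve every footprint left in a given three-dimensional model."""
    return [r for r in footprint_store if r["model_id"] == model_id]

# A footprint placed at depth: (x, y, z) in the model's coordinate system.
set_footprint("sofa_model", (1.2, 0.5, 0.8), "I like this group of sofas")
print(len(footprints_for_model("sofa_model")))  # 1
```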
  • the user's footprint information may be used to display to browsing users (such as all browsing users or some browsing users) of the three-dimensional space scene.
  • the browsing user of the three-dimensional space scene may include the user who sets the footprint information.
  • the three-dimensional model corresponding to the first pixel and the position of the footprint information in the three-dimensional model are obtained, so that the footprint information set by the user can be associated with the corresponding position of the corresponding three-dimensional model.
  • the footprint information includes: at least one of text, picture, audio, video, and a three-dimensional model.
  • the text can be considered as a message in the form of characters (such as text, letters, numbers, or symbols, etc.).
  • a picture can be considered as a message in the form of an image (such as a photo or emoticon, etc.).
  • Audio can be thought of as a voice message (also called memo, etc.).
  • Video can be thought of as a message in video form.
  • the three-dimensional model can be considered as a three-dimensional message.
  • the user's footprint information may be referred to as the user's message.
  • a piece of footprint information set by the user may include one or more of text, picture, audio, video, and three-dimensional model at the same time.
  • By enabling the user's footprint information to include at least one of text, pictures, audio, video, and a three-dimensional model, the expression forms of the user's footprint information are enriched, thereby helping to enrich the ways in which the user interacts with the three-dimensional space scene.
  • obtaining the first pixel point in the current view corresponding to the current perspective of the user in the three-dimensional space scene may be: obtaining the central pixel point of the current view corresponding to the current perspective of the user in the three-dimensional space scene, and using the central pixel as the first pixel.
  • for example, the user, at his current perspective in the 3D space scene, triggers the function of setting footprint information in the 3D space scene by clicking a button or an option on the menu.
  • the center pixel can be considered as the default pixel set for the user's footprint information, and the user can change the default pixel by dragging and other methods.
  • the central pixel can be considered as a pixel in the central area of the current view.
  • the central area of the current view may include one pixel or multiple pixels.
  • obtaining the first pixel in the current view corresponding to the current perspective of the user in the three-dimensional space scene may be: in response to the user's operation of setting the target position of the footprint information in the current view corresponding to the user's current perspective, obtaining the pixel point in the current view corresponding to the target position of the footprint information, and regarding that pixel point as the first pixel point. That is, when the user performs the operation of setting the target position of the footprint information, the pixel point at the target position formed by the operation in the current view may be used as the first pixel point.
  • the operation of setting the target position of the footprint information may be an operation used to determine the starting target position of the footprint information, an operation used to determine the ending target position of the footprint information, or an operation used to determine the footprint information. Operation of the center target position.
  • the operation of setting the target location of the footprint information may specifically be a click operation or a scroll operation or drag operation based on a tool such as a mouse or a keyboard, and may also be a click operation or a drag operation based on a touch screen.
  • the present disclosure does not limit the specific operation of setting the target position of the footprint information.
  • By determining the first pixel point according to the user's operation of setting the target position of the footprint information, the footprint information set by the user can be located at the position the user desires, which improves the flexibility of setting footprint information and makes the location of the footprint information more appropriate.
  • the user triggers the function of setting footprint information in the 3D space scene by clicking a button or an option on the menu.
  • the user can click with the left mouse button, move the cursor with the keyboard's up, down, left, and right keys, or tap the corresponding position on the touch screen to set the desired position of the footprint information in the current view.
  • the pixel at this position can be used as the first pixel.
  • the user triggers the function of setting footprint information in the 3D space scene by clicking a button or an option on the menu.
  • the pixel point at the set position is regarded as the first pixel point.
  • An implementation of determining the three-dimensional model corresponding to the first pixel (step 220) may be as shown in FIG. 3. As shown in FIG. 3, step 220 further includes steps 310 to 340.
  • in step 310, the central pixel point of the current view is determined as the first pixel point.
  • the central pixel may be considered as the default pixel set for the user's footprint information.
  • if the current view is an image of (2n+1)×(2m+1) pixels (where n and m are both integers greater than 1), then the pixel (n+1, m+1) in the current view can be used as the central pixel.
  • Alternatively, the pixels (n, m), (n+1, m), (n, m+1), and (n+1, m+1) in the current view can be used as the central area of the current view, so that any pixel in the central area can be used as the central pixel.
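The central-pixel rule above can be sketched as follows; `central_pixel` and `central_area` are hypothetical helper names, using the 1-indexed convention of the text.

```python
def central_pixel(width, height):
    """Center pixel of a (2n+1) x (2m+1) view: (n+1, m+1), 1-indexed."""
    assert width % 2 == 1 and height % 2 == 1
    return (width // 2 + 1, height // 2 + 1)

def central_area(n, m):
    """Four pixels treated as the central area, any of which may serve
    as the central (default first) pixel."""
    return [(n, m), (n + 1, m), (n, m + 1), (n + 1, m + 1)]

# A 5 x 5 view (n = m = 2) has its central pixel at (3, 3).
print(central_pixel(5, 5))  # (3, 3)
```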
  • step 320 it is determined whether a three-dimensional model is set for the first pixel. If a three-dimensional model is set for the first pixel point, go to step 330. If the three-dimensional model is not set for the first pixel point, go to step 340.
  • step 330 in response to the determination that the three-dimensional model is set for the first pixel, the three-dimensional model set for the first pixel is used as the three-dimensional model corresponding to the first pixel.
  • step 340 in response to the determination that the three-dimensional model is not set for the first pixel, the three-dimensional model set for other pixels in the current view is used as the three-dimensional model corresponding to the first pixel.
  • the other pixels in the current view are pixels in the current view for which a three-dimensional model is set.
  • the pixels where the three-dimensional model is set can be found according to preset rules.
  • the other pixels found may be the pixels closest to the first pixel in a certain direction (such as the left direction, the right direction, the upper direction, or the lower direction).
  • the first pixel point can be used as a starting point, and the pixel points in the current view corresponding to the current perspective in the three-dimensional space scene can be checked according to a preset inspection rule; if a pixel point provided with a three-dimensional model is found, the three-dimensional model corresponding to the first pixel is obtained and the inspection process is stopped. For example, the pixels in the current view can be checked toward the left starting from the first pixel, determining whether a three-dimensional model is set for the currently checked pixel.
  • Once such a pixel is found, the inspection process is stopped, and the three-dimensional model obtained by the current inspection is used as the three-dimensional model corresponding to the first pixel.
  • the first pixel point may be updated by using the detected pixel point provided with the three-dimensional model.
  • the first pixel may not be updated.
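The leftward inspection described above can be sketched as follows, assuming a hypothetical `model_at` lookup that returns the three-dimensional model set for a pixel, or None when no model is set; the function name and toy mapping are illustrative only.

```python
# Sketch of the leftward inspection rule: starting from the first pixel,
# walk left until a pixel provided with a 3D model is found.

def find_model_leftward(first_pixel, model_at):
    x, y = first_pixel
    while x >= 0:
        model = model_at((x, y))
        if model is not None:
            # Stop the inspection as soon as a pixel with a model is found.
            return (x, y), model
        x -= 1  # keep checking to the left
    return None, None  # no pixel with a model in this direction

# Toy mapping: only pixel (2, 4) presents a point of some model.
models = {(2, 4): "wall_model"}
pixel, model = find_model_leftward((5, 4), models.get)
print(pixel, model)  # (2, 4) wall_model
```

Depending on the embodiment, the found pixel may replace the first pixel or merely supply its corresponding model.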
  • step 220 may be as shown in FIG. 4.
  • step 220 may include step 410 to step 450.
  • step 410 in response to the user's operation of setting the target position of the footprint information in the current view, a pixel point in the current view corresponding to the target position of the footprint information is determined as the first pixel point.
  • the user may be allowed to set the specific location of the footprint information (that is, the target location of the footprint information) in the current view.
  • the user can set the target position of the footprint information in the current view by tapping, sliding, dragging and other operations on the touch screen.
  • the target position of the footprint information may be the upper left vertex, the lower left vertex, the upper right vertex, or the lower right vertex of the text box.
  • the target location of the footprint information may be the upper left vertex, the lower left vertex, the upper right vertex, or the lower right vertex of the picture.
  • the target location of the footprint information may be a pixel point in the current view, and the pixel point is the first pixel point.
  • step 420 it is determined whether a three-dimensional model is set for the first pixel. If a three-dimensional model is set for the first pixel point, go to step 430. If a three-dimensional model is not set for the first pixel point, go to step 440.
  • step 430 in response to the determination that the three-dimensional model is set for the first pixel, the three-dimensional model set for the first pixel is used as the three-dimensional model corresponding to the first pixel.
  • step 440 in response to the determination that the three-dimensional model is not set for the first pixel point, prompt information for updating the target position of the footprint information is output.
  • the prompt information is used to prompt the user to update the target location of the footprint information currently set. That is, the prompt information is used to prompt the user that the current target location of the footprint information cannot be set to the footprint information, and the user should reset the target location of the footprint information.
  • the prompt information can be output in the form of text, audio, or graphics. After the prompt information is output, the user's subsequent operation is awaited. If the user triggers the function of canceling the setting of footprint information at this point, the process shown in FIG. 4 ends.
  • step 450 in response to the determination that the pixel in the current view corresponding to the target position of the updated footprint information is set with a three-dimensional model, the pixel set with the three-dimensional model is taken as the first pixel. Then, the flow returns to step 420.
  • the target position of the footprint information obtained again may also be a pixel in the current view, and this pixel becomes the new first pixel. That is, the previously obtained first pixel is replaced by the pixel corresponding to the newly obtained target position of the footprint information.
  • when the first pixel is provided with a three-dimensional model, since the first pixel in the current view has a mapping relationship with a point in the three-dimensional model, the point in the three-dimensional model corresponding to the first pixel can be obtained based on that mapping relationship; the position of that point is the position of the first pixel in the three-dimensional model.
  • the position of the first pixel in the three-dimensional model can be directly used as the position of the user's footprint information in the three-dimensional model, which is beneficial to quickly and accurately obtain the position of the user's footprint information in the three-dimensional model.
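The view-to-model lookup described above can be sketched as follows. This is an illustrative sketch, not part of the disclosed embodiments: the mapping relationship is modeled as a plain dictionary from pixel coordinates to a `(model_id, point)` pair, and the names `PIXEL_TO_MODEL` and `footprint_position` are hypothetical.

```python
# Hypothetical view-to-model mapping: pixels that correspond to a point in a
# three-dimensional model map to (model_id, 3D point); pixels without a model
# map to None, signaling that the user must be prompted to re-select.
PIXEL_TO_MODEL = {
    (320, 240): ("room_model", (1.2, 0.0, 3.4)),  # first pixel with a 3D model
    (10, 10): None,                               # pixel without a 3D model
}

def footprint_position(pixel):
    """Return (model_id, position) for the first pixel, or None when the
    pixel has no three-dimensional model set."""
    entry = PIXEL_TO_MODEL.get(pixel)
    # When present, the position in the model is used directly as the
    # position of the user's footprint information.
    return entry
```

A hit yields the footprint position directly, which is why the lookup is fast: no geometric computation is needed once the mapping exists.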
  • in the process of viewing the three-dimensional space scene, the browsing user may be presented with the footprint information left in the three-dimensional space scene by at least one user.
  • An example is shown in Figure 5.
  • step 510 for any browsing user browsing the three-dimensional space scene, the footprint area corresponding to the current perspective of the browsing user in the three-dimensional space scene is determined.
  • the browsing user includes a user who sets his footprint information in the three-dimensional space scene.
  • the footprint area can be considered as an area set for the footprint information that needs to be displayed.
  • the footprint area can be a footprint area based on the current view, or a footprint area based on a three-dimensional model.
  • the size of the footprint area can be preset.
  • the shape of the footprint area can be rectangle, circle, triangle, etc.
  • an implementation manner of determining the footprint area may be: first, obtain the center pixel of the current view corresponding to the current viewing angle of the browsing user in the three-dimensional space scene; then, taking the center pixel as the center of a circle and a predetermined length as the radius (such as 1.5 meters in the three-dimensional space scene, where the 1.5 meters can be converted to a length in the current view), determine the footprint area in the current view. Since at least some of the pixels in the footprint area in the current view have a mapping relationship with points in the three-dimensional model, the footprint information that currently needs to be displayed can be easily obtained by using the footprint area in the current view. In addition, the footprint area in the current view can be regarded as a circle, that is, the footprint area in the current view does not carry depth information.
  • another implementation manner of determining the footprint area may be: first, obtain the center pixel of the current view corresponding to the current perspective of the browsing user in the three-dimensional space scene, and determine whether the center pixel is set with a three-dimensional model. If the center pixel is set with a three-dimensional model, determine the position of the center pixel in the three-dimensional model, and then, taking that position as the center of a circle and a predetermined length (such as 1.5 meters in the three-dimensional space scene) as the radius, determine the footprint area in the three-dimensional model.
  • the footprint area may be completely in one 3D model, or it may span multiple 3D models.
  • the footprint area in the three-dimensional model can be considered as a cylinder, that is, the footprint area in the three-dimensional model has depth information.
  • step 520 the footprint information belonging to the footprint area in the three-dimensional model is determined.
  • the embodiment of the present disclosure can check whether each pixel point in the footprint area has a mapping relationship with a point in the three-dimensional model. If there is a mapping relationship, then it is determined whether the points in the three-dimensional model that have a mapping relationship with the pixel points are provided with footprint information. If the footprint information is set, the footprint information can be regarded as the footprint information belonging to the footprint area.
  • the embodiments of the present disclosure can check whether each point in the footprint area is provided with footprint information. If the footprint information is set, the footprint information can be regarded as the footprint information belonging to the footprint area.
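The area determination and membership check of steps 510–520 can be sketched as below. This is a minimal sketch under illustrative assumptions: the footprint area in the three-dimensional model is treated as a cylinder (a circle in the horizontal plane, unbounded in depth, matching the "has depth information" description), footprint records are plain dictionaries, and all names are hypothetical.

```python
import math

def in_footprint_area(point, center, radius):
    """True if a 3D model point lies inside the cylindrical footprint area:
    horizontal distance from the center within `radius`, depth unconstrained."""
    dx, dz = point[0] - center[0], point[2] - center[2]
    return math.hypot(dx, dz) <= radius

def footprints_in_area(footprints, center, radius=1.5):
    """Step 520 sketch: keep the footprint records whose position in the
    three-dimensional model falls inside the footprint area."""
    return [f for f in footprints if in_footprint_area(f["position"], center, radius)]
```

The 1.5-meter default radius mirrors the example length given above; any predetermined length would work the same way.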
  • step 530 in the current view corresponding to the current perspective of the browsing user in the three-dimensional space scene, the footprint information belonging to the footprint area is displayed.
  • the location of each footprint information belonging to the footprint area in the current view can be determined according to the location of each footprint information, so that each footprint information can be displayed according to the location of each footprint information in the current view.
  • in the process of displaying footprint information, it is possible to avoid overlapping display of different footprint information in the current view.
  • the obtained multiple footprint information may have different positions, or may have the same position (that is, the position of the footprint information conflicts).
  • each footprint information may be displayed in the current view directly according to the image positions of the multiple footprint information in the current view.
  • the displayed footprint information can be allowed to partially overlap, or position control can be used so that different footprint information does not overlap.
  • different image positions may be assigned to different footprint information in the current view, and each footprint information may be displayed in the current view according to its assigned image position. Assigning different image positions to footprint information that shares the same position helps to avoid overlapping display of different footprint information in the current view.
  • all the footprint information belonging to the footprint area can be displayed, or part of the footprint information belonging to the footprint area can be displayed.
  • part of the footprint information can be selected from it according to a predetermined rule, and the selected part of the footprint information can be displayed in the current view.
  • a predetermined number of footprint information can be randomly selected from all the footprint information belonging to the footprint area, and part of the randomly selected footprint information can be displayed in the current view.
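The "randomly select a predetermined number of footprint information" rule can be sketched as follows; a seeded `random.Random` is used here only to keep the example reproducible, and the function name is illustrative.

```python
import random

def select_footprints(footprints, limit, seed=None):
    """Display-selection sketch: show all footprint information when it fits
    within `limit`, otherwise randomly sample `limit` items to display."""
    if len(footprints) <= limit:
        return list(footprints)
    rng = random.Random(seed)  # seeded only for reproducibility of the sketch
    return rng.sample(footprints, limit)
```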
  • the form of a bullet screen may be used to display, for the browsing user, footprint information outside the current view. For example, all footprint information in the three-dimensional model that does not belong to the current view may first be determined, and then all or part of that footprint information may be displayed, in the form of a bullet screen, in the current view corresponding to the current perspective of the browsing user in the three-dimensional space scene.
  • the form of a bullet screen may be used to display, for the browsing user, footprint information outside the footprint area. For example, all footprint information in the three-dimensional model that does not belong to the footprint area may first be determined, and then all or part of that footprint information may be displayed, in the form of a bullet screen, in the current view corresponding to the current perspective of the browsing user in the three-dimensional space scene.
  • FIG. 6 is a schematic structural diagram of an embodiment of an apparatus for realizing interaction in a three-dimensional space scene of the present disclosure.
  • the device of this embodiment can be used to implement the foregoing method embodiments of the present disclosure.
  • the device of this embodiment includes: a pixel point acquiring module 600, a three-dimensional model determining module 601, a position determining module 602, and a footprint information setting module 603.
  • the device may further include: a footprint area determination module 604, a footprint information determination module 605, a footprint information display module 606, and a bullet screen display module 607.
  • the pixel obtaining module 600 is configured to determine the first pixel in the current view corresponding to the current perspective of the user in the three-dimensional space scene in response to detecting the user operation of setting the footprint information in the three-dimensional space scene.
  • the footprint information may include: at least one of text, picture, audio, video, and a three-dimensional model.
  • the pixel point acquiring module 600 may include: a first sub-module 6001.
  • the first sub-module 6001 is used to determine the center pixel of the current view as the first pixel.
  • the pixel point obtaining module 600 may include: a fifth sub-module 6002.
  • the fifth sub-module 6002 is configured to determine the pixel points in the current view corresponding to the target position of the footprint information in response to the user's operation of setting the target position of the footprint information in the current view corresponding to the current perspective in the three-dimensional space scene.
  • the fifth sub-module 6002 can use the pixel as the first pixel.
  • the three-dimensional model determining module 601 is used to determine the three-dimensional model corresponding to the first pixel obtained by the pixel obtaining module 600.
  • the determining three-dimensional model module 601 may include: the second sub-module 6011, the third sub-module 6012, and the fourth sub-module 6013.
  • the second sub-module 6011 is used to determine whether a three-dimensional model is set for the first pixel.
  • the third sub-module 6012 is configured to, if the determination result of the second sub-module 6011 is that a three-dimensional model is set for the first pixel, use the three-dimensional model set for the first pixel as the three-dimensional model corresponding to the first pixel.
  • the fourth sub-module 6013 is configured to, if the judgment result of the second sub-module 6011 is that no three-dimensional model is set for the first pixel, use a three-dimensional model set for another pixel in the current view as the three-dimensional model corresponding to the first pixel. For example, if the judgment result of the second sub-module 6011 is that no three-dimensional model is set for the first pixel, the fourth sub-module 6013 can take the first pixel as a starting point and, according to preset inspection rules, check other pixels in the current view corresponding to the current angle of view in the three-dimensional space scene. If a pixel with a three-dimensional model is detected, the first pixel is updated to that pixel, the three-dimensional model corresponding to the first pixel is obtained, and the inspection stops.
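One possible "preset inspection rule" is to search outward from the first pixel and stop at the nearest pixel that has a three-dimensional model. The patent does not specify the rule, so the breadth-first traversal below is an assumption, and `has_model` is a hypothetical predicate standing in for the pixel-to-model check.

```python
from collections import deque

def find_pixel_with_model(start, has_model, width, height):
    """Search outward (breadth-first) from `start` for the nearest pixel for
    which has_model(pixel) is True; return that pixel, or None if no pixel
    in the view qualifies, in which case this inspection simply stops."""
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if has_model((x, y)):
            return (x, y)  # update the first pixel to this pixel
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return None
```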
  • the determining three-dimensional model module 601 may include: a sixth sub-module 6014, a seventh sub-module 6015, and an eighth sub-module 6016.
  • the sixth sub-module 6014 is used to determine whether a three-dimensional model is set for the first pixel. If the determination result of the sixth sub-module 6014 is that a three-dimensional model is set for the first pixel, the seventh sub-module 6015 uses the three-dimensional model set for the first pixel as the three-dimensional model corresponding to the first pixel.
  • if the determination result of the sixth sub-module 6014 is that no three-dimensional model is set for the first pixel, the eighth sub-module 6016 may output prompt information for updating the target position of the footprint information; when the sixth sub-module 6014 determines that the pixel in the current view corresponding to the updated target location of the footprint information is set with a three-dimensional model, that pixel is used as the first pixel.
  • the eighth sub-module 6016 obtains the three-dimensional model corresponding to the first pixel.
  • the position determining module 602 is used to determine the position of the user's footprint information in the three-dimensional model determined by the three-dimensional model determining module 601. For example, the position determining module 602 may obtain the position of the first pixel in the three-dimensional model, and the position determining module 602 may use the position of the first pixel in the three-dimensional model as the position of the user's footprint information in the three-dimensional model.
  • the setting footprint information module 603 is used for setting the user's footprint information at the location determined by the location determining module 602.
  • the user's footprint information set by the setting footprint information module 603 is used to be displayed to users who browse the three-dimensional space scene.
  • the footprint area determination module 604 is used for determining, for any browsing user who browses the three-dimensional space scene, the footprint area corresponding to the current perspective of the browsing user in the three-dimensional space scene. For example, the footprint area determination module 604 may first determine the center pixel of the current view corresponding to the current perspective of the browsing user in the three-dimensional space scene, and then, taking the center pixel as the center of a circle and a predetermined length as the radius, determine the footprint area in the current view.
  • the footprint information determining module 605 is used to determine the footprint information belonging to the footprint area determined by the footprint area determining module 604 in the three-dimensional model.
  • the footprint information display module 606 is configured to display the footprint information that belongs to the footprint area determined by the determination footprint information module 605 in the current view corresponding to the current perspective of the browsing user in the three-dimensional space scene.
  • the footprint information display module 606 may display the multiple footprint information in the current view according to the respective image positions of the multiple footprint information in the current view.
  • the footprint information display module 606 may assign different image positions to different footprint information in the current view, and display the different footprint information in the current view according to the assigned image positions.
  • the bullet screen display module 607 is used to determine at least one piece of footprint information in the three-dimensional model that does not belong to the footprint area or the current view, and to display that footprint information, in the form of a bullet screen, in the current view corresponding to the current perspective of the browsing user in the three-dimensional space scene.
  • FIG. 7 shows a process 700 of an embodiment of the first three-dimensional model-based interaction method according to the present disclosure.
  • the three-dimensional model-based interaction method is applied to a first user terminal, and the first user terminal is presented with a user interface, and the three-dimensional model-based interaction method includes:
  • Step 710 In response to detecting the user's target interaction operation on the user interface, send an interaction request for the target interaction operation to the server that provides page data for the user interface, where the user interface is used to present the three-dimensional model, the three-dimensional model and the second user The user account logged in by the terminal establishes an association relationship.
  • the user can use the first user terminal to interact with the server through the network.
  • the first user terminal may be various electronic devices, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and so on.
  • the first user terminal may be installed with various client applications, such as real estate transaction software.
  • the aforementioned user interface may be a page in an application installed by the first user terminal.
  • the user can interact with the server through the user interface, thereby realizing interaction with other user terminals (for example, the second user terminal).
  • the first user terminal may send an interaction request for the target interaction operation to a server that provides page data for the user interface.
  • the aforementioned user interface is used to present a three-dimensional model.
  • the three-dimensional model establishes an association relationship with the user account logged in by the second user terminal in advance.
  • the aforementioned target interaction operation may be various operations for instructing the first user terminal to request interaction (information interaction) with the second user terminal.
  • the target interaction operation may indicate video communication with the second user terminal.
  • the foregoing interaction request may be used to indicate a user request of the first user terminal to interact with the second user terminal.
  • the foregoing interaction request may be used to instruct the user of the first user terminal to request video communication with the second user terminal.
  • the user interface of the first user terminal may present the above-mentioned three-dimensional model, or may not present the three-dimensional model.
  • each three-dimensional model can be associated with a user account in advance. Therefore, for a specific three-dimensional model, the user account associated with that model can be determined, the user terminal logged in to that account can then be determined, and thereby the user terminal used to interact with the first user terminal (i.e., the second user terminal) is determined.
  • the above-mentioned three-dimensional model may be a three-dimensional model of any object.
  • the three-dimensional model may be a three-dimensional model inside a cell, or a three-dimensional model of a house interior.
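The model-to-account-to-terminal resolution described above can be sketched as two lookups. The table names, identifiers, and function below are all hypothetical; the patent only specifies that the association is established in advance, not how it is stored.

```python
# Hypothetical server-side association tables (established in advance).
MODEL_TO_ACCOUNT = {"house_3d_001": "agent_account_42"}
ACCOUNT_TO_TERMINAL = {"agent_account_42": "second_user_terminal_7"}

def resolve_second_terminal(model_id):
    """From the 3D model presented on the first user terminal, find the
    second user terminal to which the interaction request should be routed."""
    account = MODEL_TO_ACCOUNT.get(model_id)
    if account is None:
        return None  # no account is associated with this model
    return ACCOUNT_TO_TERMINAL.get(account)
```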
  • Step 720 Receive the streaming video obtained by the server from the second user terminal.
  • the above-mentioned first user terminal may receive the streaming video obtained by the server from the second user terminal.
  • the aforementioned interaction confirmation information may be used to instruct the user of the second user terminal to confirm (agree) to perform the interaction indicated by the aforementioned interaction request with the first user terminal.
  • the foregoing interactive confirmation information may be used to instruct the user of the second user terminal to confirm (agree) to conduct video communication with the first user terminal.
  • the aforementioned streaming video may include images and/or voice.
  • the image acquisition device and/or the voice acquisition device of the second user terminal can be used to acquire the aforementioned streaming video.
  • the server may use streaming media technology to continuously send the images and/or voice (ie streaming media video) collected by the second user terminal to the first user terminal.
  • streaming media refers to a media format that is played continuously and in real time over the network using streaming technology; streaming media technology is also referred to as streamed media technology.
  • the second user terminal may send the continuous image and sound information collected by it to the server after compression processing.
  • the server transmits each compressed package to the first user terminal sequentially or in real time, so that users who use the first user terminal can watch and listen while downloading.
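The compress-relay-decompress pipeline described above can be sketched as follows. This is a simplification under stated assumptions: `zlib` stands in for an actual audio/video codec, a list of byte chunks stands in for the network, and the function names are illustrative.

```python
import zlib

def compress_chunks(frames):
    """Second user terminal side: compress each captured chunk of continuous
    image/sound data before sending it to the server."""
    return [zlib.compress(frame) for frame in frames]

def relay_and_play(packets):
    """Server relays packets sequentially; the first user terminal
    decompresses and 'plays' each chunk while later packets are still
    being transmitted (watch-and-listen-while-downloading)."""
    played = []
    for packet in packets:  # sequential / real-time forwarding order
        played.append(zlib.decompress(packet))
    return b"".join(played)
```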
  • the server may send the streaming video collected by the second user terminal directly to the first user terminal, or it may first perform operations on that streaming video such as image processing (such as beautification), voice processing (such as denoising), transcoding, recording, and pornographic-content detection, and then send the processed streaming video to the first user terminal.
  • the first user terminal may perform step 720 again.
  • in the case where the second user terminal sends interaction confirmation information in response to the interaction request, the first user terminal can present the streaming video through the subsequent steps; if the second user terminal does not send the interaction confirmation information, the first user terminal does not present the streaming video. Therefore, the streaming video and the three-dimensional model can be presented on the user interface of the first user terminal only after the permission of the user of the second user terminal is obtained (for example, by answering the video call initiated by the first user terminal). This helps to improve the privacy protection of the user of the second user terminal, and gives that user preparation time before the streaming media video is presented to the user of the first user terminal.
  • the first user terminal may also directly execute the foregoing step 720 (without the interaction confirmation information sent by the second user terminal in response to the interaction request).
  • the user of the second user terminal may be in a state of shooting a streaming video (for example, a live broadcast) to users of other user terminals.
  • the first user terminal can receive the streaming video obtained by the server from the second user terminal at any time, thereby improving the real-time performance of the streaming video presentation.
  • the first user terminal may adopt the following steps to receive the streaming video obtained by the server from the second user terminal:
  • the current network speed value of the first user terminal is sent to the server.
  • the streaming media video obtained and sent by the server from the second user terminal is received, and the streaming media video has a resolution matching the current network speed value.
  • the resolution can be positively correlated with the network speed value.
  • by receiving from the server a streaming video, obtained from the second user terminal, whose resolution matches the current network speed value, the resolution of the streaming video received by the first user terminal can be reduced when the network is poor, thereby improving the real-time performance of streaming video transmission.
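The "resolution positively correlated with the network speed value" rule can be sketched as a tier table. The thresholds and resolution labels below are illustrative assumptions; the patent does not specify concrete values.

```python
# Hypothetical tiers (Mbps threshold -> stream resolution), highest first.
RESOLUTION_TIERS = [(10.0, "1080p"), (4.0, "720p"), (1.0, "480p"), (0.0, "240p")]

def pick_resolution(speed_mbps):
    """Server side: choose the resolution matching the current network speed
    value reported by the first user terminal."""
    for threshold, resolution in RESOLUTION_TIERS:
        if speed_mbps >= threshold:
            return resolution
    return "240p"  # defensive fallback for negative/invalid speed values
```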
  • Step 730 Present the streaming video and the three-dimensional model on the user interface.
  • the first user terminal may present the streaming video and the three-dimensional model on the same screen on the user interface.
  • the above-mentioned user interface of the first user terminal may be divided into two parts, and the above-mentioned two parts may respectively present a streaming video and a three-dimensional model.
  • the three-dimensional model can also be used as the background of the aforementioned user interface, and the streaming video is presented in a part of the page area of the user interface.
  • FIGS. 8A-8C are schematic diagrams of application scenarios for the embodiment of FIG. 7.
  • when the user's target interaction operation 810 on the user interface is detected, the first user terminal may send an interaction request for the target interaction operation 810 to the server that provides page data for the user interface.
  • the user interface presents a three-dimensional model of the house of XX home.
  • the three-dimensional model has a pre-established association relationship with the user account logged in by the second user terminal.
  • the first user terminal presents a streaming video 830 and a three-dimensional model on the user interface.
  • the interaction method based on the three-dimensional model provided by the above-mentioned embodiments of the present disclosure can send an interaction request for the target interaction operation to a server that provides page data for the user interface when the user's target interaction operation for the user interface is detected.
  • the user interface is used for presenting a three-dimensional model, and the three-dimensional model establishes an association relationship with the user account logged in by the second user terminal in advance.
  • the streaming video obtained by the server from the second user terminal is received.
  • the streaming video and 3D model are presented on the user interface.
  • the first user terminal may also perform the following steps:
  • the model adjustment information sent by the server is received, where the model adjustment information indicates an adjustment operation of the user who uses the second user terminal on the three-dimensional model presented on the second user terminal.
  • the adjustment operation includes at least one of the following: zoom, rotate, move, and switch viewpoints.
  • the user can perform at least one operation of zooming, rotating, moving, and switching viewpoints on the three-dimensional model.
  • the same adjustment operation is performed on the three-dimensional model presented on the user interface.
  • the operations performed by the user of the second user terminal on the three-dimensional model can be synchronized to the first user terminal. Therefore, when the streaming video collected by the second user terminal is related to the three-dimensional model (for example, the user of the second user terminal explains or introduces the three-dimensional model), it is convenient for the user of the first user terminal to refer to the second user The same three-dimensional model presented by the terminal acquires the information in the streaming video, thereby improving the pertinence of information acquisition.
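The synchronization of adjustment operations described above can be sketched as replaying each model-adjustment message against a local copy of the model state. The state fields and message format below are illustrative assumptions, not the patent's data format.

```python
def apply_adjustment(state, adjustment):
    """Replay one model-adjustment message (as received from the server) on
    the first user terminal's copy of the presented 3D model state."""
    op = adjustment["op"]
    if op == "zoom":
        state["scale"] *= adjustment["factor"]
    elif op == "rotate":
        state["yaw"] = (state["yaw"] + adjustment["degrees"]) % 360
    elif op == "move":
        state["position"] = tuple(
            p + d for p, d in zip(state["position"], adjustment["delta"])
        )
    elif op == "switch_viewpoint":
        state["viewpoint"] = adjustment["viewpoint"]
    return state
```

Because both terminals start from the same model and apply the same ordered operations, their presented views stay in step.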
  • the first user terminal may also perform the following steps:
  • the feedback information may include but is not limited to at least one of the following: likes, ratings, comments, and so on.
  • the feedback information may be used to characterize the evaluation of the user of the first user terminal on the streaming video of the user of the second user terminal.
  • the feedback information is sent to the server, where the server is used to establish an association relationship between the feedback information and the user account.
  • the server is used to establish an association relationship between the feedback information and the user account.
  • an associative storage method can be used to establish an association relationship between the feedback information and the user account.
  • establishing an association relationship between the feedback information and the user account can reflect the user's satisfaction with the object indicated by the three-dimensional model and the user of the second user terminal by the user of the first user terminal, and thus can be more targeted for the first user The terminal pushes information.
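The associative storage of feedback against a user account can be sketched as below; the in-memory dictionary and function names are illustrative stand-ins for whatever persistent store the server actually uses.

```python
from collections import defaultdict

# Hypothetical associative store: feedback records keyed by the user account
# associated with the three-dimensional model being streamed.
feedback_store = defaultdict(list)

def record_feedback(account, feedback):
    """Server side: establish the association between feedback and account."""
    feedback_store[account].append(feedback)

def feedback_for(account):
    """Retrieve all feedback associated with an account, e.g. to target
    information pushed to the first user terminal."""
    return feedback_store[account]
```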
  • FIG. 9 is a flow 900 of another embodiment of the first three-dimensional model-based interaction method of the present disclosure.
  • the three-dimensional model-based interaction method is applied to a first user terminal, and the first user terminal is presented with a user interface, and the method includes:
  • Step 910 In response to detecting the user's target interaction operation on the user interface, send an interaction request for the target interaction operation to a server that provides page data for the user interface.
  • Step 920 Receive the streaming video obtained by the server from the second user terminal.
  • Step 930 Present the streaming video and the three-dimensional model on the user interface.
  • step 910 to step 930 are basically the same as step 710 to step 730 in the embodiment corresponding to FIG. 7, and will not be repeated here.
  • Step 940 In response to the current network speed value of the first user terminal being less than or equal to the preset network speed threshold, adjust the target user image based on each frame of voice in the streaming video to generate a new video different from the streaming video.
  • the first user terminal may adjust the target user image based on each frame of voice in the streaming video to generate a new video.
  • the new video characterizes the actions of the user indicated by the target user's image to perform each frame of voice instructions.
  • the user indicated by the target user image may be a user using the second user terminal.
  • the new video may be a streaming video that is sent in segments and instantly transmitted based on the network, or it may be a video that is generated locally without being based on the network.
  • the first user terminal may generate a new video in the following manner: for each frame of voice in the streaming video, input the frame of voice into a predetermined image frame generation model to obtain an image of the user indicated by the target user image that matches that frame of voice. The obtained image frames, each matching a frame of voice in the streaming video, are then merged with the voice frames to obtain a new video. The user's action in the image matching a voice frame matches that voice frame.
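The per-frame generation and merging described above can be sketched as follows. The image frame generation model is passed in as a callable; `dummy_model` below is a stand-in that merely tags the target image with the voice content, since the real model (a recurrent or convolutional network per the text below) is out of scope for a sketch.

```python
def generate_new_video(voice_frames, target_user_image, image_frame_model):
    """For each voice frame, generate a matching image of the user indicated
    by the target user image, then pair image and voice frames to form the
    new video (represented here as a list of (image, voice) pairs)."""
    video = []
    for voice in voice_frames:
        image = image_frame_model(voice, target_user_image)
        video.append((image, voice))
    return video

def dummy_model(voice, target_image):
    """Stand-in for the image frame generation model: produces a label
    describing the generated frame instead of actual pixels."""
    return f"{target_image}|mouth:{voice}"
```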
  • for example, when the voice is "ah", the mouth shape of the user in the image matching that voice may be the lip shape of saying "ah"; when the voice is a scream, the action may be an action in a startled state.
  • the aforementioned image frame generation model may be a recurrent neural network model or a convolutional neural network model obtained by training using a machine learning algorithm based on training samples including voice frames, target user images, and image frames matching the voice frames.
  • An image frame generation model can be trained for each user, and the target user image in each training sample used to train the user’s image frame generation model can be the same.
  • for each voice frame of the user, the image frame matching that voice frame is determined, and a training sample set used to train that user's image frame generation model is thereby obtained.
  • the image frame generation model may also be a two-dimensional table or database that stores the voice frame, the target user image, and the image frame matching the voice frame in association with each other.
  • each record of the database may include a voice frame, a target user image, and an image frame matching the voice frame.
  • the target user image in each record can be the same.
  • such a table or database of matched image frames can itself serve as the image frame generation model.
  • the first user terminal may also determine the target user image by any of the following methods:
  • the target user image may be generated from the images in the streaming video; for example, an image whose proportion of occurrence is greater than a preset threshold may be regarded as the target user image.
  • the user can upload an image through the user account he uses as the target user image; or after logging in the account he uses, select an image from a predetermined image set as the target user image.
• With the above-mentioned optional implementations, the target user image can be generated automatically from the images in the streaming video or set manually by the user, so that, based on multiple ways of determining the target user image, the new video can be generated in more diversified ways.
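• The automatic selection method can be sketched as follows. This is an illustration only, assuming a hypothetical `user_ratio` callable that stands in for a real person-detection step returning the proportion of the image occupied by the user:

```python
def select_target_user_image(frames, user_ratio, threshold=0.5):
    """Return the first frame whose user-area proportion exceeds the
    preset threshold, or None if no frame qualifies.

    frames     -- iterable of video frames
    user_ratio -- callable mapping a frame to the proportion of the
                  image occupied by the user (hypothetical detector)
    threshold  -- preset proportion threshold (illustrative value)
    """
    for frame in frames:
        if user_ratio(frame) > threshold:
            return frame
    return None


frames = ["f1", "f2", "f3"]
ratios = {"f1": 0.2, "f2": 0.7, "f3": 0.9}

assert select_target_user_image(frames, ratios.get) == "f2"
assert select_target_user_image(["f1"], {"f1": 0.1}.get) is None
```

In practice the detector would operate on decoded image data rather than strings, but the threshold test is the same.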
• In step 950, the new video is used to replace the streaming video for presentation.
  • the first user terminal may use a new video to replace the streaming video for presentation.
  • the streaming video can be hidden (that is, no longer presented).
• The first user terminal can locally generate a new video to replace the presentation of the streaming video. Therefore, the first user terminal only needs to continuously obtain voice from the server, and does not need to continuously obtain video, thereby reducing the occupation of network resources. When the current network speed value of the first user terminal is relatively low, this can improve the real-time performance of video presentation on the first user terminal.
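• The decision described above, falling back to voice-only streaming with locally generated video when the network is slow, reduces to a threshold comparison. A minimal sketch (threshold value and mode names are illustrative, not from the source):

```python
def choose_presentation_mode(current_speed_kbps, threshold_kbps=500):
    """If the current network speed value is at or below the preset
    network speed threshold, fetch only voice from the server and
    generate the video locally; otherwise present the full
    streaming video."""
    if current_speed_kbps <= threshold_kbps:
        return "voice_only_generate_locally"
    return "full_streaming_video"


assert choose_presentation_mode(200) == "voice_only_generate_locally"
assert choose_presentation_mode(2000) == "full_streaming_video"
```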
• The first user terminal may also send camera shutdown confirmation information to the server.
• The camera shutdown confirmation information is used to determine whether the second user terminal closes the camera.
• That is, the server may send the second user terminal information for determining whether it should turn off its camera. The user of the second user terminal can thus reduce the occupation of network resources by the second user terminal by turning off the camera.
  • FIG. 10 is a flowchart of another embodiment of the first three-dimensional model-based interaction method of the present disclosure.
  • the interaction method based on the three-dimensional model is applied to a first user terminal, and the first user terminal presents a user interface.
  • the process 1000 of the interaction method based on the three-dimensional model includes:
  • Step 1010 In response to detecting the user's target interaction operation on the user interface, send an interaction request for the target interaction operation to a server that provides page data for the user interface.
  • the user interface is used for presenting a three-dimensional model, and the three-dimensional model establishes an association relationship with the user account logged in by the second user terminal in advance.
  • Step 1020 Receive the streaming video obtained by the server from the second user terminal.
  • Step 1030 Present the streaming video and the three-dimensional model on the user interface.
  • step 1010 to step 1030 are basically the same as step 710 to step 730 in the embodiment corresponding to FIG. 7, and will not be repeated here.
  • the three-dimensional model includes three-dimensional sub-models of multiple sub-space scenes, and the sub-space scenes in the multiple sub-space scenes correspond to keywords in a predetermined keyword set.
  • Step 1040 Perform voice recognition on the voice in the streaming video to obtain a voice recognition result.
  • the first user terminal may perform voice recognition on the voice in the streaming video to obtain the voice recognition result.
  • the voice recognition result can represent the text corresponding to the voice in the streaming video.
  • Step 1050 In response to the determination that the voice recognition result includes keywords in the keyword set, present on the user interface a three-dimensional sub-model of the corresponding sub-space scene among the multiple sub-space scenes corresponding to the keywords included in the voice recognition result .
• The first user terminal may present, on the aforementioned user interface, the three-dimensional sub-model of the sub-space scene corresponding to the keywords contained in the voice recognition result.
  • the above-mentioned three-dimensional model is a three-dimensional model inside a house.
  • the house includes a bedroom, a living room, a kitchen, and a bathroom, with a total of four sub-space scenes. That is, the above-mentioned three-dimensional model includes a three-dimensional sub-model of a bedroom, a three-dimensional sub-model of a living room, a three-dimensional sub-model of a kitchen, and a three-dimensional sub-model of a bathroom.
  • the keyword set includes bedroom, living room, kitchen, bathroom.
• The keyword corresponding to the sub-space scene bedroom can be "bedroom"; the keyword corresponding to the sub-space scene kitchen can be "kitchen"; the keyword corresponding to the sub-space scene living room can be "living room"; the keyword corresponding to the sub-space scene bathroom can be "toilet". Further, as an example, if the voice recognition result includes the keyword "bedroom", the first user terminal may present a three-dimensional sub-model of the bedroom on the aforementioned user interface.
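• The keyword-to-sub-model step can be sketched as a simple containment check over the recognized text. The mapping below mirrors the house example; sub-model names are illustrative placeholders:

```python
# Mapping from keywords in the predetermined keyword set to the
# three-dimensional sub-models of the corresponding sub-space scenes.
KEYWORD_TO_SUBMODEL = {
    "bedroom": "bedroom_3d_submodel",
    "kitchen": "kitchen_3d_submodel",
    "living room": "living_room_3d_submodel",
    "toilet": "bathroom_3d_submodel",
}

def submodel_for_recognition_result(text, keyword_map=KEYWORD_TO_SUBMODEL):
    """Return the sub-model for the first keyword found in the voice
    recognition result, or None if no keyword is present."""
    for keyword, submodel in keyword_map.items():
        if keyword in text:
            return submodel
    return None


assert submodel_for_recognition_result("let's look at the bedroom") == "bedroom_3d_submodel"
assert submodel_for_recognition_result("hello there") is None
```

A production system would match against the normalized recognition text (case, synonyms, word boundaries), but the flow is the same: recognize voice, find a keyword, present the matching sub-model.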
  • the embodiment of the present application may also include the same or similar features and effects as the embodiment corresponding to FIG. 7 and/or FIG. 9, and details are not described herein again.
• In this way, viewpoint switching of the three-dimensional model can be realized by voice, so that the three-dimensional sub-model of the sub-space scene corresponding to the keywords contained in the voice recognition result is presented.
• As a result, the convenience of browsing the three-dimensional model is improved, and the matching between the presented three-dimensional model and the voice acquired by the second user terminal is improved.
  • FIG. 11 shows a process 1100 of an embodiment of the second three-dimensional model-based interaction method according to the present disclosure.
  • the three-dimensional model-based interaction method is applied to a second user terminal, and the user account logged in by the second user terminal establishes an association relationship with the three-dimensional model in advance.
  • the interactive method based on the 3D model includes:
  • Step 1110 In response to receiving the interactive request sent by the server, obtain the streaming video.
  • the user can use the second user terminal to interact with the server and the first user terminal through the network.
  • the second user terminal may be various electronic devices, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and so on.
  • the second user terminal may be installed with various client applications, such as real estate transaction software.
• Upon receiving the interaction request sent by the server, the second user terminal acquires the streaming video.
  • the interaction request indicates that the first user terminal detects the user's target interaction operation on the user interface presented by the first user terminal.
  • the aforementioned interaction request may be used to instruct the user of the first user terminal to request video communication with the second user terminal.
  • the user interface is used to present the three-dimensional model.
  • Streaming videos can contain images and/or voice.
  • the image acquisition device and/or the voice acquisition device of the second user terminal can be used to acquire the aforementioned streaming video.
  • the first user terminal may send an interaction request for the target interaction operation to a server that provides page data for the user interface.
  • the user interface is used to present the three-dimensional model.
  • the three-dimensional model establishes an association relationship with the user account logged in by the second user terminal in advance.
  • the aforementioned target interaction operation may be various operations for instructing the first user terminal to request interaction (information interaction) with the second user terminal.
  • the target interaction operation may indicate video communication with the second user terminal.
  • the user interface of the first user terminal may present the above-mentioned three-dimensional model, or may not present the three-dimensional model.
  • Step 1120 Send the streaming video to the server.
  • the second user terminal may send the streaming video to the server.
  • the server is used to send the streaming video to the first user terminal, so that the first user terminal presents the streaming video and the three-dimensional model on the user interface.
  • the server can use streaming media technology to continuously send the images and/or voice (that is, streaming video) collected by the second user terminal to the first user terminal.
• Streaming media technology refers to technology in which media is transmitted as a stream and played continuously on the network in real time.
  • the second user terminal may send the continuous image and sound information collected by it to the server after compression processing.
  • the server transmits each compressed package to the first user terminal sequentially or in real time, so that users who use the first user terminal can watch and listen while downloading.
• The server may send the streaming video collected by the second user terminal to the first user terminal directly, or may perform operations such as image processing (such as beautification), voice processing (such as denoising), transcoding, recording, and pornography filtering on the streaming video collected by the second user terminal, and then send the processed streaming video to the first user terminal.
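• The server-side relay described above, receiving compressed chunks from the second user terminal, optionally processing them, and forwarding them so the first user terminal can play while downloading, can be sketched with a generator. All processing steps here are trivial stand-ins for real beautification, denoising, or transcoding:

```python
def relay_stream(chunks, processors=()):
    """Sketch of the server-side relay: each compressed chunk collected
    from the second user terminal is passed through optional processing
    steps (stand-ins for beautification, denoising, transcoding,
    content filtering, etc.) and yielded to the first user terminal,
    so the receiver can watch and listen while downloading."""
    for chunk in chunks:
        for process in processors:
            chunk = process(chunk)
        yield chunk


denoise = lambda c: c.replace("~", "")   # stand-in voice processing
transcode = lambda c: c.upper()          # stand-in transcoding

out = list(relay_stream(["a~b", "c~d"], processors=(denoise, transcode)))
assert out == ["AB", "CD"]
```

Because the relay is a generator, each chunk is forwarded as soon as it is processed rather than after the whole stream is buffered, matching the "sequentially or in real time" behavior described above.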
  • the second three-dimensional model-based interaction method provided by the foregoing embodiment of the present disclosure is applied to a second user terminal, and the user account logged in by the second user terminal establishes an association relationship with the three-dimensional model in advance.
  • the second user terminal may determine whether the user's confirmation operation for the interaction request is detected in the case of receiving the interaction request sent by the server.
  • the interaction request indicates that the first user terminal detects the user's target interaction operation on the user interface presented by the first user terminal, and the user interface is used to present a three-dimensional model. Afterwards, if the confirmation operation is detected, the streaming video is obtained.
  • the streaming video is sent to the server, where the server is used to send the streaming video to the first user terminal, so that the first user terminal presents the streaming video and the three-dimensional model on the user interface.
• By presenting the streaming media video and the three-dimensional model on the same page of the terminal device, it is helpful to use the streaming media video to present information related to the three-dimensional model to the user, thereby increasing the diversity of interaction modes.
• In addition, users can browse the three-dimensional model at their own pace, which increases the user's browsing time and helps meet users' more diversified interactive needs.
  • the foregoing step 1110 may include the following steps:
  • the confirmation operation indicates that the user of the second user terminal confirms (agrees) to interact with the first user terminal (for example, video communication).
• If the second user terminal sends the interaction confirmation information, the first user terminal may present the streaming video; if the second user terminal does not send the interaction confirmation information, the first user terminal does not present the streaming video.
• Therefore, the streaming video and the three-dimensional model can be presented on the user interface of the first user terminal only after the permission of the user of the second user terminal is obtained (for example, the video call initiated by the first user terminal is answered). This helps improve the privacy protection of the user of the second user terminal, and provides preparation time for the user of the second user terminal before the streaming media video is presented to the user of the first user terminal.
• Optionally, the second user terminal may also directly obtain the streaming video and send it to the first user terminal through the server, without requiring the user of the second user terminal to send interaction confirmation information in response to the interaction request.
  • the user of the second user terminal may be in a state of shooting a streaming video (for example, a live broadcast) to users of other user terminals.
  • the first user terminal can receive the streaming video obtained by the server from the second user terminal at any time, thereby improving the real-time performance of the streaming video presentation.
• The second user terminal may receive the camera shutdown confirmation information from the server and present the camera shutdown confirmation information.
• The camera shutdown confirmation information is used to determine whether the second user terminal closes the camera.
• That is, the server may send the second user terminal information for determining whether it should turn off its camera. The user of the second user terminal can thus reduce the occupation of network resources by the second user terminal by turning off the camera.
• In response to detecting the user's adjustment operation on the three-dimensional model presented on the second user terminal, the second user terminal may send model adjustment information indicating the adjustment operation to the server, so that the server controls the first user terminal to perform the same adjustment operation on the three-dimensional model presented on the user interface according to the adjustment operation indicated by the model adjustment information.
  • the adjustment operation includes at least one of the following: zoom, rotate, move, and switch viewpoints.
  • the user can perform at least one operation of zooming, rotating, moving, and switching viewpoints on the three-dimensional model.
  • the operations performed by the user of the second user terminal on the three-dimensional model can be synchronized to the first user terminal. Therefore, when the streaming video collected by the second user terminal is related to the three-dimensional model (for example, the user of the second user terminal explains or introduces the three-dimensional model), it is convenient for the user of the first user terminal to refer to the second user The same three-dimensional model presented by the terminal acquires the information in the streaming video, thereby improving the pertinence of information acquisition.
• Upon receiving model adjustment information from the server, the second user terminal may perform, according to the adjustment operation indicated by the model adjustment information, the same adjustment operation on the three-dimensional model presented by the second user terminal.
  • the adjustment operation includes at least one of the following: zoom, rotate, move, and switch viewpoints.
  • the user can perform at least one operation of zooming, rotating, moving, and switching viewpoints on the three-dimensional model.
  • the operations performed by the user of the first user terminal on the three-dimensional model can be synchronized to the second user terminal.
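• The synchronization described above amounts to encoding the adjustment operation as model adjustment information, relaying it through the server, and replaying the same operation on the other terminal's model. A minimal sketch with a simplified model state; field names and the state representation are illustrative:

```python
import json

def encode_adjustment(op, **params):
    """Encode a model adjustment operation (zoom, rotate, move, or
    viewpoint switch) as model adjustment information to send to the
    server. Message field names are illustrative."""
    assert op in {"zoom", "rotate", "move", "switch_viewpoint"}
    return json.dumps({"op": op, "params": params})

def apply_adjustment(model_state, adjustment_info):
    """Perform the same adjustment operation, decoded from the model
    adjustment information, on a (simplified) local model state."""
    msg = json.loads(adjustment_info)
    state = dict(model_state)  # leave the caller's state untouched
    if msg["op"] == "zoom":
        state["scale"] = state.get("scale", 1.0) * msg["params"]["factor"]
    elif msg["op"] == "rotate":
        state["angle"] = state.get("angle", 0) + msg["params"]["degrees"]
    elif msg["op"] == "move":
        state["x"] = state.get("x", 0) + msg["params"]["dx"]
    elif msg["op"] == "switch_viewpoint":
        state["viewpoint"] = msg["params"]["viewpoint"]
    return state


info = encode_adjustment("rotate", degrees=90)
assert apply_adjustment({"angle": 0}, info) == {"angle": 90}
```

Because both terminals apply the identical decoded operation, their presented three-dimensional models stay in the same pose without transferring the model itself.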
• Upon receiving feedback information, the second user terminal may perform an operation matching the feedback information.
  • the feedback information may include but is not limited to at least one of the following: likes, ratings, comments, and so on.
  • the feedback information may be used to characterize the evaluation of the user of the first user terminal on the streaming video of the user of the second user terminal.
• For example, the second user terminal may perform an operation matching the feedback information by displaying the message "XX gave you a like!".
  • FIG. 12 is a flow 1200 of another embodiment of the second three-dimensional model-based interaction method of the present disclosure.
• The three-dimensional model-based interaction method is applied to a second user terminal, the user account logged in by the second user terminal establishes an association relationship with the three-dimensional model in advance, and the method includes:
  • Step 1210 In response to receiving the interactive request sent by the server, obtain the streaming video.
  • Step 1220 Send the streaming video to the server.
  • step 1210 to step 1220 are basically the same as step 1110 to step 1120 in the embodiment corresponding to FIG. 11, and will not be repeated here.
  • the three-dimensional model includes three-dimensional sub-models of multiple sub-space scenes, and the sub-space scenes in the multiple sub-space scenes correspond to keywords in a predetermined keyword set.
  • Step 1230 Perform voice recognition on the voice acquired by the first user terminal to obtain a voice recognition result.
  • the second user terminal may perform voice recognition on the voice acquired by the first user terminal to obtain a voice recognition result.
  • the voice recognition result can represent the text corresponding to the voice in the streaming video.
• Step 1240 In response to determining that the voice recognition result contains keywords in the keyword set, present on the user interface a three-dimensional sub-model of the corresponding sub-space scene among the multiple sub-space scenes corresponding to the keywords contained in the voice recognition result.
• That is, the second user terminal may present, on the user interface, the three-dimensional sub-model of the sub-space scene corresponding to the keywords contained in the voice recognition result.
  • the above-mentioned three-dimensional model is a three-dimensional model inside a house.
  • the house includes a bedroom, a living room, a kitchen, and a bathroom, with a total of four sub-space scenes, that is, the above-mentioned three-dimensional model includes a three-dimensional sub-model of the bedroom, a three-dimensional sub-model of the living room, a three-dimensional sub-model of the kitchen, and a three-dimensional sub-model of the bathroom.
  • the keyword set includes bedroom, living room, kitchen, bathroom.
• The keyword corresponding to the sub-space scene bedroom can be "bedroom"; the keyword corresponding to the sub-space scene kitchen can be "kitchen"; the keyword corresponding to the sub-space scene living room can be "living room"; the keyword corresponding to the sub-space scene bathroom can be "toilet". Further, as an example, if the voice recognition result includes the keyword "bedroom", the second user terminal may present a three-dimensional sub-model of the bedroom on the aforementioned user interface.
• In this way, viewpoint switching of the three-dimensional model can be realized by voice, so that the three-dimensional sub-model of the sub-space scene corresponding to the keywords contained in the voice recognition result is presented.
• As a result, the convenience of browsing the three-dimensional model is improved, and the matching between the presented three-dimensional model and the voice acquired by the second user terminal is improved.
  • the present disclosure provides an embodiment of an interaction device based on a three-dimensional model.
• The device embodiment may also include the same or corresponding features as the method embodiments shown in FIGS. 7, 9, and 10, and produce the same or corresponding effects as those method embodiments.
  • the interaction apparatus 1300 based on the three-dimensional model of this embodiment is set in a first user terminal, and the first user terminal presents a user interface.
• The device 1300 includes: a first sending unit 1310, configured to send an interaction request for the target interaction operation to a server that provides page data for the user interface in response to detecting a user's target interaction operation on the user interface, where the user interface is used for presenting the three-dimensional model, and the three-dimensional model is pre-associated with the user account logged in on the second user terminal; a first receiving unit 1320, configured to receive the streaming video obtained by the server from the second user terminal; and a first presenting unit 1330, configured to present the streaming video and the three-dimensional model on the user interface.
• In this embodiment, in response to detecting the user's target interaction operation on the user interface, the first sending unit 1310 of the three-dimensional-model-based interaction device 1300 may send an interaction request for the target interaction operation to the server that provides page data for the user interface.
  • the user interface is used for presenting a three-dimensional model, and the three-dimensional model establishes an association relationship with the user account logged in by the second user terminal in advance.
  • the first receiving unit 1320 may receive the streaming video obtained by the server from the second user terminal.
  • the first presentation unit 1330 may present the streaming video and the three-dimensional model on the user interface.
  • the first receiving unit is further configured to: in response to the server receiving the interaction confirmation information sent by the second user terminal in response to the interaction request, receive the stream obtained by the server from the second user terminal. Media video.
• In some cases, the device 1300 further includes: a first adjustment unit (not shown in the figure), configured to, in response to the current network speed value of the first user terminal being less than or equal to a preset network speed threshold, adjust the target user image based on each voice frame in the streaming video to generate a new video, where the new video depicts the user indicated by the target user image performing the actions indicated by each voice frame; and a second presentation unit (not shown in the figure), configured to use the new video to replace the streaming video for presentation.
• In some cases, the device 1300 further includes: a first generating unit (not shown in the figure), configured to generate a target user image based on an image in the streaming video; or a first determining unit (not shown in the figure), configured to determine the user image associated with the user account as the target user image.
• In some cases, the device 1300 further includes: a second sending unit (not shown in the figure), configured to send camera shutdown confirmation information to the server in response to the new video being presented on the user interface, where the camera shutdown confirmation information is used to determine whether the second user terminal closes the camera.
• In some cases, the first receiving unit is further configured to: send the current network speed value of the first user terminal to the server; and receive the streaming video that the server obtains from the second user terminal and sends, where the streaming video has a resolution that matches the current network speed value.
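• The resolution-matching behavior described above can be sketched as a simple tier selection. The tiers and threshold values below are illustrative assumptions, not from the source:

```python
def resolution_for_speed(speed_kbps):
    """Pick a streaming-video resolution that matches the current
    network speed value reported by the first user terminal.
    Tiers and thresholds are illustrative."""
    tiers = [(4000, "1080p"), (1500, "720p"), (600, "480p")]
    for min_speed, resolution in tiers:
        if speed_kbps >= min_speed:
            return resolution
    return "240p"  # fallback for very slow connections


assert resolution_for_speed(5000) == "1080p"
assert resolution_for_speed(800) == "480p"
assert resolution_for_speed(100) == "240p"
```

On the server side this selection would pick among pre-transcoded variants of the stream collected from the second user terminal.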
• In some cases, the device 1300 further includes: a second receiving unit (not shown in the figure), configured to receive model adjustment information sent by the server, where the model adjustment information indicates the adjustment operation performed by the user of the second user terminal on the three-dimensional model presented on the second user terminal, and the adjustment operation includes at least one of the following: zoom, rotate, move, and viewpoint switch; and a second adjustment unit (not shown in the figure), configured to perform the same adjustment operation on the three-dimensional model presented on the user interface according to the adjustment operation indicated by the model adjustment information.
  • the three-dimensional model includes three-dimensional sub-models of multiple sub-space scenes, and the sub-space scenes in the multiple sub-space scenes correspond to keywords in a predetermined keyword set; and,
  • the device 1300 also includes: a first recognition unit (not shown in the figure), configured to perform voice recognition on the voice in the streaming video to obtain a voice recognition result; and a third presentation unit (not shown in the figure), configured In response to determining that the voice recognition result contains the keywords in the keyword set, a three-dimensional sub-model of the subspace scene corresponding to the keywords contained in the voice recognition result is presented on the user interface.
• In some cases, the device 1300 further includes: a first acquiring unit (not shown in the figure), configured to acquire the user's feedback information for the streaming media video; and a third sending unit (not shown in the figure), configured to send the feedback information to the server, where the server is used to establish an association relationship between the feedback information and the user account.
  • the interaction device based on the three-dimensional model provided by the above-mentioned embodiments of the present disclosure is set in a first user terminal, and the first user terminal presents a user interface.
• In this embodiment, the first sending unit 1310 may send an interaction request for the target interaction operation to a server that provides page data for the user interface, where the user interface is used to present the three-dimensional model, and the three-dimensional model is pre-associated with the user account logged in on the second user terminal.
• The first receiving unit 1320 receives the streaming video obtained by the server from the second user terminal, and the first presenting unit 1330 presents the streaming video and the three-dimensional model on the user interface.
  • streaming media videos and 3D models can be presented on the same page of the terminal device, which helps to use streaming media videos to present information related to the 3D model to users, which improves the diversity of interaction methods.
• Users can browse the 3D model at their own pace, which increases the user's browsing time and helps meet users' more diversified interactive needs.
  • the present disclosure provides an embodiment of a second interaction device based on a three-dimensional model.
• The device embodiment may also include the same or corresponding features as the method embodiments shown in FIGS. 11 and 12, and produce the same or corresponding effects as those method embodiments.
  • the interaction device 1400 based on the three-dimensional model of this embodiment is set in the second user terminal.
• The device 1400 includes: a second determining unit 1410, configured to obtain a streaming video in response to receiving an interaction request sent by a server, where the interaction request indicates that the first user terminal has detected the user's target interaction operation on the user interface presented by the first user terminal, the user interface is used to present a three-dimensional model, and the three-dimensional model has a pre-established association relationship with the user account logged in on the second user terminal; and a fourth sending unit 1420, configured to send the streaming video to the server, where the server is used to send the streaming video to the first user terminal, so that the first user terminal presents the streaming video and the three-dimensional model on the user interface.
  • the second determining unit 1410 may obtain the streaming video.
  • the interaction request indicates that the first user terminal detects a user's target interaction operation on the user interface presented by the first user terminal, and the user interface is used to present a three-dimensional model.
  • the fourth sending unit 1420 may be configured to send the streaming video to the server, where the server is used to send the streaming video to the first user terminal, so that the first user terminal displays the streaming video on the user interface. And three-dimensional models.
  • the second determining unit 1410 is further configured to: in response to receiving the interaction request sent by the server, determine whether a confirmation operation of the user for the interaction request is detected; in response to detecting the confirmation Operation to obtain streaming video.
• In some cases, the device 1400 further includes: a third receiving unit (not shown in the figure), configured to, in response to the current network speed value of the first user terminal being less than or equal to a preset network speed threshold, receive camera shutdown confirmation information from the server and present the camera shutdown confirmation information, where the camera shutdown confirmation information is used to determine whether the second user terminal closes the camera.
• In some cases, the device 1400 further includes an adjustment unit (not shown in the figure), configured to, in response to receiving from the server model adjustment information indicating the user's adjustment operation on the three-dimensional model, perform the same adjustment operation on the three-dimensional model presented by the second user terminal according to the adjustment operation indicated by the model adjustment information, where the adjustment operation includes at least one of the following: zoom, rotate, move, and viewpoint switch.
• In some cases, the device 1400 further includes: a fifth sending unit (not shown in the figure), configured to, in response to detecting the user's adjustment operation on the three-dimensional model presented on the second user terminal, send model adjustment information indicating the adjustment operation to the server, so that the server controls the first user terminal to perform the same adjustment operation on the three-dimensional model presented on the user interface according to the adjustment operation indicated by the model adjustment information, where the adjustment operation includes at least one of the following: zoom, rotate, move, and viewpoint switch.
  • the three-dimensional model includes three-dimensional sub-models of multiple sub-space scenes, and the sub-space scenes in the multiple sub-space scenes correspond to keywords in a predetermined keyword set; and,
  • the device 1400 further includes: a second recognition unit (not shown in the figure), configured to perform voice recognition on the voice acquired by the first user terminal to obtain a voice recognition result; and a fourth presentation unit (not shown in the figure), It is configured to, in response to determining that the voice recognition result contains the keywords in the keyword set, present on the user interface a three-dimensional sub-model of the subspace scene corresponding to the keywords contained in the voice recognition result.
• the apparatus 1400 further includes: an execution unit (not shown in the figure), configured to, in response to receiving feedback information on the streaming video sent by the server, where the feedback information is sent by the user using the first user terminal, perform an operation that matches the feedback information.
  • the interaction device based on the three-dimensional model provided by the above-mentioned embodiment of the present disclosure is set in the second user terminal, and the user account logged in by the second user terminal establishes an association relationship with the three-dimensional model in advance.
• in response to receiving the interaction request sent by the server, the second determining unit 1410 may obtain the streaming video, where the interaction request indicates that the first user terminal has detected the user's target interaction operation on the user interface presented by the first user terminal, and the user interface is used to present the three-dimensional model.
  • the fourth sending unit 1420 may send the streaming video to the server, where the server is used to send the streaming video to the first user terminal, so that the first user terminal can present the streaming video and the three-dimensional model on the user interface.
• In this way, the streaming video and the 3D model can be presented on the same page of the terminal device, which helps use the streaming video to present information related to the 3D model to users, improves the diversity of interaction methods, allows users to browse the 3D model at greater ease, increases the user's browsing time, and helps meet users' more diversified interaction needs.
  • FIG. 15 is a schematic diagram of interaction of an embodiment 1500 of the interactive system based on a three-dimensional model of the present disclosure.
  • the interactive system based on the three-dimensional model includes a first user terminal, a second user terminal, and a server.
  • the first user terminal presents a user interface
  • the server is respectively communicatively connected with the first user terminal and the second user terminal.
  • the first user terminal, the second user terminal, and the server in the interactive system based on the three-dimensional model can perform the following steps:
  • Step 1501 The first user terminal detects the user's target interaction operation on the user interface.
  • the first user terminal detects the user's target interaction operation on the user interface.
  • the user interface is used for presenting a three-dimensional model, and the three-dimensional model establishes an association relationship with the user account logged in by the second user terminal in advance.
  • Step 1502 The first user terminal sends an interaction request for the target interaction operation to the server.
  • the first user terminal may send an interaction request for a target interaction operation to the server.
  • Step 1503 The second user terminal obtains the streaming video.
  • the second user terminal can obtain the streaming video.
  • Step 1504 The second user terminal sends the streaming video to the server.
  • the second user terminal may send the streaming video to the server.
  • Step 1505 The server sends the streaming video to the first user terminal.
  • the server may send the streaming video to the first user terminal.
  • Step 1506 The first user terminal presents the streaming video and the three-dimensional model on the user interface.
  • the first user terminal may present the streaming video and the three-dimensional model on the user interface.
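The message flow of steps 1501 to 1506 can be sketched end to end as below. This is an illustrative in-process sketch under assumed names, not the disclosed implementation; real terminals and the server would communicate over a network.

```python
# Hypothetical end-to-end sketch of steps 1501-1506: the first terminal reports a
# target interaction, the server obtains a streaming video from the second
# terminal, and the first terminal presents the video alongside the 3D model.

class FirstTerminal:
    def __init__(self, server):
        self.server = server
        self.presented = []

    def on_target_interaction(self):                 # steps 1501-1502
        self.server.handle_interaction_request(self)

    def present(self, streaming_video):              # step 1506
        self.presented = [streaming_video, "three_dimensional_model"]

class SecondTerminal:
    def get_streaming_video(self):                   # step 1503
        return "streaming_video"

class Server:
    def __init__(self, second_terminal):
        self.second = second_terminal

    def handle_interaction_request(self, first):     # steps 1504-1505
        video = self.second.get_streaming_video()
        first.present(video)

second = SecondTerminal()
server = Server(second)
first = FirstTerminal(server)
first.on_target_interaction()
print(first.presented)   # ['streaming_video', 'three_dimensional_model']
```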
• For the implementation of step 1501 to step 1506, reference may also be made to the first three-dimensional model-based interaction method described above, as well as to the technical features in the embodiments of the second and third three-dimensional model-based interaction methods.
• This embodiment may also include the same or corresponding features as the above-mentioned embodiments of the three-dimensional model-based interaction method, and produce the same or corresponding effects, which will not be repeated here.
  • the interactive system based on the three-dimensional model provided by the above-mentioned embodiments of the present disclosure includes a first user terminal, a second user terminal, and a server.
  • the first user terminal presents a user interface
• the server is in communication connection with the first user terminal and the second user terminal.
• the first user terminal is configured to: in response to detecting the user's target interaction operation on the user interface, send an interaction request for the target interaction operation to the server, where the user interface is used to present the three-dimensional model, and the three-dimensional model establishes an association relationship with the user account logged in by the second user terminal in advance;
  • the second user terminal is configured to: obtain the streaming video; send the streaming video to the server;
• the server is also configured to: send the streaming video to the first user terminal;
• the first user terminal is further configured to: present the streaming video and the three-dimensional model on the user interface.
• In this way, the streaming media video and the 3D model can be presented on the same page of the terminal device, which helps use the streaming media video to present information related to the 3D model to users and improves the diversity of interaction methods.
• It also allows users to browse the 3D model at greater ease, increases the user's browsing time, and helps meet users' more diversified interaction needs.
  • FIG. 16 shows a block diagram of an electronic device 1600 according to an embodiment of the present disclosure.
  • the electronic device 1600 includes one or more processors 1611 and a memory 1612.
• the processor 1611 may be a central processing unit (CPU) or another form of processing unit having the capability of implementing three-dimensional scene interaction and/or executing instructions, and may control other components in the electronic device 1600 to perform desired functions.
  • the memory 1612 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
• the volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache).
• the non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like.
  • One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 1611 may run the program instructions to implement the various methods described above and/or other desired functions.
  • Various contents such as input signals, signal components, noise components, etc. can also be stored in the computer-readable storage medium.
  • the electronic device 1600 may further include: an input device 1613, an output device 1614, etc., and these components are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
• the input device 1613 may include, for example, a keyboard, a mouse, and so on.
• the output device 1614 can output various information to the outside, and may include, for example, a display, a speaker, a printer, a communication network and remote output devices connected thereto, and so on.
  • the electronic device 1600 may also include any other appropriate components.
• the embodiments of the present disclosure may also be computer program products, which include computer program instructions that, when run by a processor, cause the processor to perform the steps in the methods according to the various embodiments of the present disclosure.
• the computer program product may use any combination of one or more programming languages to write program code for performing the operations of the embodiments of the present disclosure.
• the programming languages include object-oriented programming languages, such as Java and C++, and also include conventional procedural programming languages, such as the "C" language or similar programming languages.
• the program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
• embodiments of the present disclosure may also be a computer-readable storage medium on which computer program instructions are stored.
• when the computer program instructions are run by a processor, the processor performs the steps in the methods according to the various embodiments of the present disclosure.
  • the computer-readable storage medium may adopt any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
• the readable storage medium may include, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the method and apparatus of the present disclosure may be implemented in many ways.
  • the method and apparatus of the present disclosure can be implemented by software, hardware, firmware or any combination of software, hardware, and firmware.
  • the above-mentioned order of the steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above, unless specifically stated otherwise.
  • the present disclosure can also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing methods according to embodiments of the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing a method according to an embodiment of the present disclosure.
• each component or each step can be decomposed and/or recombined. These decompositions and/or recombinations should be regarded as equivalent solutions of the present disclosure.

Abstract

A method for realizing interaction in a three-dimensional space scene comprises: in response to detecting a user operation of setting footprint information in a three-dimensional space scene, determining a first pixel in a current view corresponding to the user's current viewing angle in the three-dimensional space scene; determining a three-dimensional model corresponding to the first pixel; determining a position of the user's footprint information in the three-dimensional model, wherein the footprint information is displayed when the three-dimensional space scene is viewed; and setting the user's footprint information at the position.

Description

Method, device and equipment for realizing three-dimensional space scene interaction

Technical field

The present disclosure relates to virtual reality panoramic technology and streaming media technology, and in particular to a method for realizing three-dimensional space scene interaction, a device for realizing three-dimensional space scene interaction, a storage medium, and an electronic device.

Background technique
VR (Virtual Reality) panoramic technology is an emerging rich media technology. Because VR panoramic technology can present three-dimensional space scenes to users at 720 degrees without blind angles and bring users an immersive visual experience, it is widely used in fields such as online shopping malls, travel services, and real estate services. How to enable VR panoramic technology to bring users a richer experience is a technical issue worthy of attention.
Compared with two-dimensional images, three-dimensional models can give people a stronger visual impression. With the three-dimensional data of an object, any view of the object can be presented to the user, and the correct projection relationship can be maintained between the views.
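The projection consistency referred to above can be illustrated with a basic pinhole-camera projection. This is an illustrative sketch with toy values, not part of the disclosure.

```python
# Illustrative pinhole projection: with the 3D data of an object, any view can be
# rendered by projecting its points through a chosen camera model.

def project(point, focal_length=1.0):
    """Perspective-project a 3D point (x, y, z), z > 0, onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (focal_length * x / z, focal_length * y / z)

# The same 3D point seen from twice the distance projects to half the image
# offset, which is the kind of cross-view consistency the text refers to.
print(project((1.0, 2.0, 2.0)))   # (0.5, 1.0)
print(project((1.0, 2.0, 4.0)))   # (0.25, 0.5)
```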
In the prior art, while a user terminal presents a three-dimensional model, real-time voice same-screen interaction between user terminals can be supported; that is, in the process of the user terminal presenting the three-dimensional model, the voice of the peer user of the user terminal can be transmitted to the user terminal in real time, and the voice acquired by the user terminal can also be transmitted to the peer in real time.

However, the foregoing interaction method in the prior art is relatively limited, and voice interaction usually has its own limitations.
Summary of the invention

According to one aspect of the embodiments of the present disclosure, there is provided a method for realizing interaction in a three-dimensional space scene, including: in response to detecting a user operation of setting footprint information in the three-dimensional space scene, determining a first pixel in a current view corresponding to the user's current viewing angle in the three-dimensional space scene; determining a three-dimensional model corresponding to the first pixel; determining a position of the user's footprint information in the three-dimensional model, where the footprint information is used for display when the three-dimensional space scene is browsed; and setting the user's footprint information at the position.
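The four steps of this method can be sketched as a single handler. This is a minimal illustrative sketch; the data structures, the pixel-to-model lookup, and the position computation are all assumptions standing in for the disclosed details.

```python
# Hypothetical sketch of the claimed method: on a "set footprint" operation,
# find the first pixel in the current view, map it to a 3D model, compute the
# footprint's position in that model, and store the footprint there.

def handle_set_footprint(view, models, footprint_text):
    # 1) Determine the first pixel in the current view for the user's current
    #    viewing angle (here simply the pixel at the view's center).
    px = (view["width"] // 2, view["height"] // 2)
    # 2) Determine the 3D model the pixel corresponds to (toy lookup function).
    model = models[view["pixel_to_model"](px)]
    # 3) Determine the footprint's position in that model (toy: the model's anchor).
    position = model["anchor"]
    # 4) Set the user's footprint at that position so later viewers can see it.
    model.setdefault("footprints", []).append((position, footprint_text))
    return model["footprints"]

models = {"sofa": {"anchor": (1.0, 0.0, 2.0)}}
view = {"width": 800, "height": 600, "pixel_to_model": lambda px: "sofa"}
print(handle_set_footprint(view, models, "love this sofa"))
# [((1.0, 0.0, 2.0), 'love this sofa')]
```

In a real viewer, step 2 would typically be a ray cast from the camera through the pixel into the scene, and step 3 the intersection point of that ray with the hit model.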
According to another aspect of the embodiments of the present disclosure, there is provided an interaction method based on a three-dimensional model, including: at a first user terminal presenting a user interface: in response to detecting the user's target interaction operation on the user interface, sending an interaction request for the target interaction operation to a server that provides page data for the user interface, where the user interface is used to present a three-dimensional model, and the three-dimensional model establishes an association relationship with the user account logged in by a second user terminal in advance; receiving a streaming video obtained by the server from the second user terminal; and presenting the streaming video and the three-dimensional model on the user interface.
According to another aspect of the embodiments of the present disclosure, there is provided an interaction method based on a three-dimensional model, including: at a second user terminal: in response to receiving an interaction request sent by a server, acquiring a streaming video, where the interaction request indicates that a first user terminal has detected the user's target interaction operation on a user interface presented by the first user terminal, the user interface is used to present a three-dimensional model, and the three-dimensional model establishes an association relationship with the user account logged in by the second user terminal in advance; and sending the streaming video to the server, where the server is used to send the streaming video to the first user terminal, so that the first user terminal presents the streaming video and the three-dimensional model on the user interface.
According to another aspect of the embodiments of the present disclosure, there is provided a device for realizing interaction in a three-dimensional space scene, including: a device for executing the method described in any one of the above methods.

According to another aspect of the embodiments of the present disclosure, there is provided an interaction device based on a three-dimensional model, which is provided in a first user terminal, and the device includes: a device for executing the method described in any one of the above methods.

According to another aspect of the embodiments of the present disclosure, there is provided an interaction device based on a three-dimensional model, which is provided in a second user terminal, and the device includes: a device for executing the method described in any one of the above methods.
According to another aspect of the embodiments of the present disclosure, there is provided an interactive system based on a three-dimensional model, including: a first user terminal for presenting a user interface; a second user terminal; and a server, the server being in communication connection with the first user terminal and the second user terminal. The first user terminal is configured to: in response to detecting the user's target interaction operation on the user interface, send an interaction request for the target interaction operation to the server, where the user interface is used to present a three-dimensional model, and the three-dimensional model establishes an association relationship with the user account logged in by the second user terminal in advance; the second user terminal is configured to: obtain a streaming video and send the streaming video to the server; the server is configured to: send the streaming video to the first user terminal; and the first user terminal is further configured to: present the streaming video and the three-dimensional model on the user interface.
According to another aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium storing a computer program, which, when executed by a computer, causes the computer to implement the method described in any one of the above methods.

According to another aspect of the embodiments of the present disclosure, there is provided an electronic device, including: a processor; and a memory for storing processor-executable instructions, the processor-executable instructions implementing the method described in any one of the above methods when executed by the processor.

According to another aspect of the embodiments of the present disclosure, there is provided a computer program product, including a computer program, which, when executed by a computer, causes the computer to implement the method described in any one of the above methods.
The technical solutions of the present disclosure will be further described in detail below through the accompanying drawings and embodiments.

Description of the drawings
The drawings, which constitute a part of the specification, describe the embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

With reference to the accompanying drawings, the present disclosure can be understood more clearly from the following detailed description, in which:
FIG. 1 is a schematic diagram of an embodiment of an applicable scenario of the present disclosure;

FIG. 2 is a flowchart of an embodiment of the method for realizing three-dimensional space scene interaction of the present disclosure;

FIG. 3 is a flowchart of an embodiment of determining the three-dimensional model corresponding to the first pixel in the present disclosure;

FIG. 4 is a flowchart of another embodiment of determining the three-dimensional model corresponding to the first pixel in the present disclosure;

FIG. 5 is a flowchart of an embodiment of presenting footprint information to a browsing user in the present disclosure;

FIG. 6 is a schematic structural diagram of an embodiment of the device for realizing three-dimensional space scene interaction of the present disclosure;

FIG. 7 is a flowchart of an embodiment of the first three-dimensional model-based interaction method of the present disclosure;

FIGS. 8A-8C are schematic diagrams of application scenarios of the embodiment of FIG. 7;

FIG. 9 is a flowchart of another embodiment of the first three-dimensional model-based interaction method of the present disclosure;

FIG. 10 is a flowchart of yet another embodiment of the first three-dimensional model-based interaction method of the present disclosure;

FIG. 11 is a flowchart of an embodiment of the second three-dimensional model-based interaction method of the present disclosure;

FIG. 12 is a flowchart of another embodiment of the second three-dimensional model-based interaction method of the present disclosure;

FIG. 13 is a flowchart of an embodiment of the first three-dimensional model-based interaction device of the present disclosure;

FIG. 14 is a flowchart of an embodiment of the second three-dimensional model-based interaction device of the present disclosure;

FIG. 15 is a schematic diagram of interaction of an embodiment of the three-dimensional model-based interactive system of the present disclosure;

FIG. 16 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
Detailed description
Exemplary embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, rather than all of them, and it should be understood that the present disclosure is not limited by the exemplary embodiments described herein.

It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure.

Those skilled in the art can understand that terms such as "first" and "second" in the embodiments of the present disclosure are only used to distinguish different steps, devices, or modules, and represent neither any specific technical meaning nor a necessary logical order between them.

It should also be understood that in the embodiments of the present disclosure, "multiple" may refer to two or more, and "at least one" may refer to one, two, or more.

It should also be understood that any component, data, or structure mentioned in the embodiments of the present disclosure can generally be understood as one or more, unless it is clearly limited or the context suggests otherwise.

In addition, the term "and/or" in the present disclosure only describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the associated objects before and after it are in an "or" relationship.

It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the embodiments; for their same or similar parts, reference may be made to one another, and for the sake of brevity, they will not be repeated.

At the same time, it should be understood that, for ease of description, the sizes of the various parts shown in the drawings are not drawn according to actual proportional relationships.

The following description of at least one exemplary embodiment is actually only illustrative and in no way serves as any limitation on the present disclosure or its application or use.

Technologies, methods, and equipment known to those of ordinary skill in the relevant fields may not be discussed in detail, but where appropriate, such technologies, methods, and equipment should be regarded as part of the specification.

It should be noted that similar reference numerals and letters indicate similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further discussed in subsequent drawings.

The embodiments of the present disclosure can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate with many other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, or servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, large computer systems, distributed cloud computing technology environments including any of the above systems, and the like.

Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and so on, which perform specific tasks or implement specific abstract data types. The computer system/server can be implemented in a distributed cloud computing environment, where tasks are performed by remote processing devices linked through a communication network and program modules may be located on storage media of local or remote computing systems including storage devices.
In the process of realizing the present disclosure, the inventors found that while a user adjusts his current viewing angle to experience a three-dimensional space scene, feelings such as emotions and thoughts often arise. If the user can set footprint information that characterizes these feelings into the three-dimensional space scene, it will not only help improve the user's own sense of participation, but the footprint information left by the user can also bring a richer VR panoramic experience to other users viewing that three-dimensional space scene.

An example of an application scenario of the technology for realizing three-dimensional space scene interaction provided by the present disclosure will be described below with reference to FIG. 1.

In the real estate field, VR panoramic technology can be used to set up a three-dimensional space scene for a house to be rented or sold. Any user can access it through the network and view the three-dimensional space scene of the corresponding house anytime and anywhere. While a user views the three-dimensional space scene of the corresponding house, the present disclosure allows the user to leave his own footprint information for the house he is browsing; moreover, the present disclosure can present to the user both the footprint information the user himself has left for the house and the footprint information other users have left for it.
In a specific example, suppose that a user is browsing the three-dimensional space scene of a house, and the current view seen from the user's current viewing angle is shown in FIG. 1.

The footprint information 120 left by other users for the three-dimensional space scene of this two-bedroom, one-living-room house includes: "I like this set of sofas, great", "This decorative partition is nice", "This sofa is nice, high-end and classy", "The combination is very thoughtful, praise", "The tea table design is very unique ~ the copy is at most twenty characters", and the three-dimensional model 110 shown in the upper right corner of FIG. 1. Presenting the footprint information 120 left by other users to a user browsing the three-dimensional space scene of the house helps the user understand other users' feelings about the house, thereby deepening the user's understanding of the house and improving the user's browsing experience.

In addition, while watching the three-dimensional space scene of the house, the user can also express his own feelings about the house, that is, leave his own footprint information in the three-dimensional space scene. For example, the user can set footprint information such as "This pillar makes the house look more distinctive" at the position of the pillar shown in FIG. 1. The footprint information set by the user can be displayed instantly in the three-dimensional space scene shown in FIG. 1; that is, the user can see the footprint information he has left while viewing the three-dimensional space scene of the house, which helps enhance the user's sense of participation.

Furthermore, all the footprint information set by users for the house that does not belong to the view shown in FIG. 1 can be presented to the user in the form of a bullet screen 130, which helps stimulate the user's interest in browsing the three-dimensional space scenes of other locations of the house.

The technology for realizing three-dimensional space scene interaction provided by the present disclosure can also be applied in various other scenarios. For example, when a user browses the three-dimensional space scene of a library, the user can set corresponding footprint information for a book, a seat, or a coffee machine in the library. The footprint information set by the user for a book may be the user's impression of the book or the number of pages the user has currently read. The scenarios to which the technology for realizing three-dimensional space scene interaction provided by the present disclosure can be applied will not be illustrated here one by one.
FIG. 2 is a flowchart of an embodiment of the method for realizing interaction in a three-dimensional space scene of the present disclosure. The method 200 of the embodiment shown in FIG. 2 includes steps 210 to 240. Each step is described below.
In step 210, in response to detecting a user operation of setting footprint information in a three-dimensional space scene, a first pixel in the current view corresponding to the user's current perspective in the three-dimensional space scene is determined.
According to exemplary embodiments of the present disclosure, a three-dimensional space scene may refer to a space scene with a three-dimensional stereoscopic effect that is presented to the user by using a preset panorama and a three-dimensional model. For example, the three-dimensional space scene may be one set for a library, a house, a cafe, a shopping mall, or the like.
According to exemplary embodiments of the present disclosure, when the user triggers the function of setting footprint information in the three-dimensional space scene, it can be detected that the user needs to set footprint information in the scene. For example, when the user clicks a button for setting footprint information or a corresponding option on a menu, embodiments of the present disclosure can detect that the user needs to set footprint information in the three-dimensional space scene. As another example, the user may trigger the function of setting footprint information in the three-dimensional space scene using a preset shortcut. In embodiments of the present disclosure, the user's footprint information may be information indicating that the user has visited the three-dimensional space scene; the footprint information can be regarded as the user's visit-trace information.
According to exemplary embodiments of the present disclosure, the user's current perspective in the three-dimensional space scene may refer to the position and angle from which the user currently views the scene. The current perspective usually changes with the user's operations; for example, by performing operations such as dragging on a touch screen, the user can control his or her current perspective in the scene. The user's current perspective determines the content/region of the panorama that the user can currently see; that is, the user's current perspective in the three-dimensional space scene determines the current view.
According to exemplary embodiments of the present disclosure, the first pixel is a pixel in the current view. The first pixel may be obtained according to a preset default rule. For example, the first pixel may be a specific pixel in the current view, or any pixel in the current view.
In step 220, the three-dimensional model corresponding to the first pixel is determined.
According to exemplary embodiments of the present disclosure, a three-dimensional space scene is usually formed by a plurality of three-dimensional models; in some embodiments, the scene may also be formed by a single three-dimensional model. A pixel in the current view seen by the user may be the rendering of a point in a three-dimensional model, or it may not be the rendering of any point in a three-dimensional model. That is, in general, any point of any three-dimensional model in the scene can be presented in the panorama, whereas a point in the panorama does not necessarily render a point of any three-dimensional model in the scene. Of course, the present disclosure does not exclude the possibility that some points of the three-dimensional models in the scene are not presented in the panorama.
In some exemplary embodiments, when the first pixel renders a point in a three-dimensional model, the three-dimensional model in which that point is located is the three-dimensional model corresponding to the first pixel.
In some exemplary embodiments, when the first pixel renders a point that does not belong to any three-dimensional model, another pixel in the current view may be used to update the first pixel. In some embodiments, the first pixel may also be left un-updated; in that case, the three-dimensional model corresponding to the first pixel may be the three-dimensional model corresponding to another pixel, near the first pixel in the current view, that renders a point in a three-dimensional model. That is, when the first pixel renders a point not belonging to any three-dimensional model and the first pixel is not updated, the three-dimensional model corresponding to another pixel in the current view can be taken as the three-dimensional model corresponding to the first pixel.
In step 230, the position of the user's footprint information in the three-dimensional model is determined, where the footprint information is to be displayed when the three-dimensional space scene is browsed.
Since at least some of the pixels in the panorama have a mapping relationship with points in the three-dimensional model, the position in the three-dimensional model of the above first pixel (or of the above other pixel) can be obtained. That position is the position of the user's footprint information.
According to exemplary embodiments of the present disclosure, the three-dimensional models in the scene may each be provided with their own three-dimensional coordinate system, or may share a single three-dimensional coordinate system. The position of the user's footprint information in the three-dimensional model can be represented by (x, y, z); that is, the user's footprint information can have depth.
In step 240, the user's footprint information is set at the position.
According to exemplary embodiments of the present disclosure, setting the user's footprint information at the position may include: setting a three-dimensional model identifier and three-dimensional coordinates for the user's footprint information, and storing the correspondence among the three-dimensional model identifier, the three-dimensional coordinates, and the user's footprint information.
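As data, the stored correspondence described above might look like the following sketch. All names and fields here are hypothetical, inferred from the description rather than taken from the disclosure:

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    # Hypothetical record tying together the 3D-model identifier,
    # the 3D coordinates within that model, and the footprint content.
    user_id: str
    model_id: str       # identifier of the 3D model the footprint is attached to
    position: tuple     # (x, y, z) in the model's coordinate system
    content: str        # here text; could also be picture/audio/video/model


class FootprintStore:
    """Stores footprint records and retrieves them per 3D model."""

    def __init__(self):
        self._records = []

    def set_footprint(self, user_id, model_id, position, content):
        record = Footprint(user_id, model_id, position, content)
        self._records.append(record)
        return record

    def footprints_for_model(self, model_id):
        # All footprints attached to one 3D model, e.g. for display
        # when a browsing user's view covers that model.
        return [r for r in self._records if r.model_id == model_id]
```

A record such as `set_footprint("user-1", "sofa-01", (1.2, 0.4, 0.5), "I like this set of sofas")` could then be looked up by model identifier whenever any browsing user's view covers that sofa.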
According to exemplary embodiments of the present disclosure, the user's footprint information may be displayed to browsing users of the three-dimensional space scene (e.g., all browsing users or some browsing users). In embodiments of the present disclosure, the browsing users of the scene may include the user who set the footprint information.
According to exemplary embodiments of the present disclosure, by using the first pixel in the current view of the user who needs to set footprint information to obtain the three-dimensional model corresponding to the first pixel and the position of the footprint information in that model, the footprint information set by the user can be associated with the corresponding position of the corresponding three-dimensional model. In this way, when the panorama is used to render the three-dimensional model based on the user's current perspective to form the three-dimensional space scene, the user's footprint information can be presented at the appropriate position in the scene, so that the user's feelings about a specific part of the scene are presented precisely at the corresponding position. Interaction between the user and the three-dimensional space scene is thereby realized. This not only helps improve the user's own sense of participation and immersion and increases the time the user stays in the scene, but the footprint information left by the user can also bring a richer VR panorama experience to at least one browsing user of the scene.
In an optional example, the footprint information includes at least one of text, a picture, audio, video, and a three-dimensional model. Text can be regarded as a message in the form of characters (such as words, letters, numbers, or symbols). A picture can be regarded as a message in the form of an image (such as a photo or an emoticon). Audio can be regarded as a message in the form of sound (which may also be called a voice note, etc.). Video can be regarded as a message in the form of moving images. A three-dimensional model can be regarded as a message in stereoscopic form. In embodiments of the present disclosure, the user's footprint information may be called the user's message. A single piece of footprint information set by the user may simultaneously include one or more of text, a picture, audio, video, and a three-dimensional model. Allowing the footprint information to include at least one of these forms enriches the forms of expression of the footprint information, and thus enriches the ways in which the user interacts with the three-dimensional space scene.
In an optional example, obtaining the first pixel in the current view corresponding to the user's current perspective in the three-dimensional space scene may be: obtaining the center pixel of the current view corresponding to the user's current perspective, and taking the center pixel as the first pixel. For example, suppose that, at the user's current perspective in the scene, the user triggers the function of setting footprint information by clicking a button or an option on a menu; in this case, the center pixel of the current view can be taken directly as the first pixel. The center pixel can be regarded as the default pixel set for the user's footprint information, and the user can change the default pixel by dragging or other means. In one example, the center pixel can be regarded as a pixel in the central region of the current view; the central region may include one pixel or multiple pixels. Directly taking the center pixel of the current view as the first pixel not only helps obtain the first pixel quickly, but also helps place the footprint information set by the user at a fairly prominent position in the current view.
In an optional example, obtaining the first pixel in the current view corresponding to the user's current perspective in the three-dimensional space scene may be: according to the user's operation of setting a footprint-information target position in the current view corresponding to the user's current perspective, obtaining the pixel in the current view corresponding to the footprint-information target position, this pixel being taken as the first pixel. That is, when the user performs an operation of setting the footprint-information target position, the pixel at the footprint-information target position formed by that operation in the current view can be taken as the first pixel.
Optionally, the operation of setting the footprint-information target position may be an operation for determining the starting target position of the footprint information, an operation for determining the ending target position of the footprint information, or an operation for determining the center target position of the footprint information.
Optionally, the operation of setting the footprint-information target position may specifically be a click, scroll, or drag operation based on a tool such as a mouse or a keyboard, or a tap or drag operation based on a touch screen. The present disclosure does not limit the specific operation of setting the footprint-information target position.
Determining the first pixel according to the user's operation of setting the footprint-information target position helps place the footprint information set by the user at the position the user desires, which improves the flexibility of setting footprint information and helps make the position of the footprint information more appropriate.
Optionally, suppose that while the user is viewing the current view based on his or her current perspective in the three-dimensional space scene, the user triggers the function of setting footprint information by clicking a button or an option on a menu. At this point, the user can set the desired position of the footprint information in the current view by left-clicking with the mouse, moving the cursor with the up/down/left/right keys of the keyboard, or tapping the corresponding position on the touch screen. The pixel at that position can be taken as the first pixel.
Optionally, suppose that while the user is viewing the current view based on his or her current perspective in the three-dimensional space scene, the user triggers the function of setting footprint information by clicking a button or an option on a menu. At this point, the center pixel of the current view can first be taken as the first pixel. If the user does not change this first pixel, the center pixel is taken as the final first pixel. If the user changes the first pixel by dragging with the left mouse button, moving the cursor with the up/down/left/right keys of the keyboard, or dragging with a finger on the touch screen, the pixel at the specific position resulting from the operation is taken as the first pixel.
In an optional example, the determination of the three-dimensional model corresponding to the first pixel (step 220) may be implemented as shown in FIG. 3. As shown in FIG. 3, step 220 includes steps 310 to 340.
In step 310, the center pixel of the current view is determined as the first pixel.
Optionally, the center pixel can be regarded as the default pixel set for the user's footprint information. In one example, assuming the current view is a (2n+1)×(2m+1) image (where n and m are both integers greater than 1), the pixel (n+1, m+1) in the current view can be taken directly as the center pixel. In another example, assuming the current view is a 2n×2m image (where n and m are both integers greater than 1), the pixels (n, m), (n+1, m), (n, m+1), and (n+1, m+1) in the current view can be taken as the central region of the current view, so that any one of the pixels in the central region can be taken as the center pixel.
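The two cases above (odd and even view dimensions) can be folded into one small helper as a sketch; the 1-indexed `(col, row)` convention is an assumption made to match the `(n+1, m+1)` notation in the text:

```python
def center_pixel(width, height):
    # For an odd dimension 2n+1, the exact center has index n+1 (1-indexed).
    # For an even dimension 2n, any of the four central pixels may serve;
    # here we pick the lower index n of the 2x2 central region.
    col = (width + 1) // 2 if width % 2 == 1 else width // 2
    row = (height + 1) // 2 if height % 2 == 1 else height // 2
    return (col, row)
```

For a 5×5 view (n = m = 2) this yields pixel (3, 3), i.e. (n+1, m+1), exactly as described.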
In step 320, it is determined whether a three-dimensional model is set for the first pixel. If a three-dimensional model is set for the first pixel, the flow proceeds to step 330; otherwise, the flow proceeds to step 340.
In the example of FIG. 3, since not every pixel in the current view is the rendering of a corresponding point in a three-dimensional model, it is necessary to determine whether a three-dimensional model is set for the first pixel, that is, whether the first pixel is a pixel that renders a corresponding point in a three-dimensional model, so that the user's footprint information can be set at the corresponding position in the three-dimensional model.
In step 330, in response to determining that a three-dimensional model is set for the first pixel, the three-dimensional model set for the first pixel is taken as the three-dimensional model corresponding to the first pixel.
In step 340, in response to determining that no three-dimensional model is set for the first pixel, a three-dimensional model set for another pixel in the current view is taken as the three-dimensional model corresponding to the first pixel.
Optionally, the other pixel in the current view is a pixel in the current view for which a three-dimensional model is set. A pixel for which a three-dimensional model is set can be sought according to a preset rule. In one example, the other pixel found may be the pixel closest to the first pixel in a certain direction (such as the left, right, up, or down direction).
Optionally, starting from the first pixel, the pixels in the current view corresponding to the current perspective in the three-dimensional space scene can be checked according to a preset checking rule; once a pixel for which a three-dimensional model is set is found, the three-dimensional model corresponding to the first pixel is obtained and the checking process stops. For example, starting from the first pixel, the pixels in the current view can be checked toward the left, and it is determined whether a three-dimensional model is set for the currently checked pixel. If so, the checking process stops, and the three-dimensional model obtained by the current check is taken as the three-dimensional model corresponding to the first pixel. In addition, the first pixel may be updated with the found pixel for which the three-dimensional model is set; of course, the first pixel may also be left un-updated.
By determining whether a three-dimensional model is set for the first pixel and performing different operations according to the result, the situation in which the user's footprint information cannot be set at a corresponding position in a three-dimensional model when no three-dimensional model is set for the first pixel is avoided. Further, by using the preset checking rule to obtain another pixel for which a three-dimensional model is set, and taking the three-dimensional model set for that other pixel as the three-dimensional model corresponding to the first pixel, the three-dimensional model corresponding to the first pixel can be obtained quickly.
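The leftward check described above can be sketched as follows; `has_model` is a hypothetical predicate standing in for the pixel-to-model mapping, which the disclosure does not specify concretely:

```python
def find_model_pixel(first_pixel, has_model):
    # Starting from the first pixel, scan toward the left edge of the
    # current view for the nearest pixel that renders a 3D-model point.
    col, row = first_pixel          # 1-indexed (col, row)
    while col >= 1:
        if has_model(col, row):
            return (col, row)       # its model becomes the model for the first pixel
        col -= 1
    return None                     # no model-mapped pixel in this direction
```

The returned pixel may then either replace the first pixel or merely lend it its three-dimensional model, matching the two options in the text.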
In an optional example, the determination of the three-dimensional model corresponding to the first pixel (step 220) may be implemented as shown in FIG. 4. As shown in FIG. 4, step 220 may include steps 410 to 450.
In step 410, in response to the user's operation of setting the target position of the footprint information in the current view, the pixel in the current view corresponding to the target position of the footprint information is determined as the first pixel.
Optionally, the user may be allowed to set the specific position of the footprint information (i.e., the target position of the footprint information) in the current view. For example, after the user triggers the function of setting footprint information in the three-dimensional space scene, the user can set the target position of the footprint information in the current view by tapping, sliding, or dragging on the touch screen. The target position of the footprint information may be the upper-left, lower-left, upper-right, or lower-right vertex of a text box, or the upper-left, lower-left, upper-right, or lower-right vertex of a picture, etc. According to exemplary embodiments of the present disclosure, the target position of the footprint information may be a pixel in the current view, and this pixel is the first pixel.
In step 420, it is determined whether a three-dimensional model is set for the first pixel. If a three-dimensional model is set for the first pixel, the flow proceeds to step 430; otherwise, the flow proceeds to step 440.
In step 430, in response to determining that a three-dimensional model is set for the first pixel, the three-dimensional model set for the first pixel is taken as the three-dimensional model corresponding to the first pixel.
In step 440, in response to determining that no three-dimensional model is set for the first pixel, prompt information for updating the target position of the footprint information is output.
Optionally, the prompt information is used to prompt the user to update the currently set footprint-information target position; that is, it prompts the user that footprint information cannot be set at the currently set target position and that the user should reset the target position. The prompt information can be output in the form of text, audio, graphics, or the like. After the prompt information is output, the user's subsequent operation is awaited; if the user triggers the function of canceling the setting of footprint information at this point, the flow shown in FIG. 4 ends.
In step 450, in response to determining that a three-dimensional model is set for the pixel in the current view corresponding to the updated target position of the footprint information, the pixel for which the three-dimensional model is set is taken as the first pixel. The flow then returns to step 420.
If the user then performs an operation of updating the footprint-information target position, the target position is obtained again. The newly obtained target position may likewise be a pixel in the current view, and this pixel is the first pixel. That is, the previously obtained first pixel is updated with the pixel at the newly obtained target position of the footprint information.
By determining whether a three-dimensional model is set at the footprint-information target position set by the user, and performing different operations according to the result, the situation in which the user's footprint information cannot be set at a corresponding position in a three-dimensional model when no three-dimensional model is set at the target position is avoided. The loop from step 420 to step 450 helps prompt the user to eventually set his or her footprint information at a corresponding position of a three-dimensional model, thereby helping make the position of the footprint information more appropriate.
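The step-420-to-450 loop can be sketched as follows; `positions` is a hypothetical sequence of target positions the user picks in turn, and `has_model` again stands in for the pixel-to-model check:

```python
def pick_footprint_pixel(positions, has_model):
    for pixel in positions:
        if has_model(*pixel):
            return pixel    # step 430: a 3D model is set for this pixel
        # step 440: here a prompt would be output asking the user to
        # update the target position; we simply advance to the next pick
    return None             # user cancelled without choosing a valid position
```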
In an optional example, when a three-dimensional model is set for the first pixel, since there is a mapping relationship between the first pixel in the current view and a point in the three-dimensional model, the point in the three-dimensional model corresponding to the first pixel can be obtained based on that mapping relationship; the position of that point is the position of the first pixel in the three-dimensional model. The position of the first pixel in the three-dimensional model can be taken directly as the position of the user's footprint information in the three-dimensional model, which helps obtain that position quickly and accurately.
In an optional example, while a browsing user is viewing the three-dimensional space scene, the footprint information left in the scene by at least one user can be presented to the browsing user. An example is shown in FIG. 5.
In FIG. 5, in step 510, for any browsing user browsing the three-dimensional space scene, the footprint area corresponding to that browsing user's current perspective in the scene is determined.
Optionally, the browsing users include users who have set their footprint information in the three-dimensional space scene. The footprint area can be regarded as an area set for the footprint information that needs to be displayed. The footprint area may be based on the current view or based on the three-dimensional model. The size of the footprint area may be preset, and its shape may be a rectangle, a circle, a triangle, or the like.
在足迹区域为基于当前视图的足迹区域时，根据本公开的一些示范性实施例，确定足迹区域的一种实现方式可以为：首先，获取浏览用户在三维空间场景中的当前视角所对应的当前视图的中心像素点，之后，以该中心像素点为圆心，以预定长度(如三维空间场景中的1.5米等，且1.5米可以被换算为当前视图中的长度)为半径，确定当前视图中的足迹区域。由于当前视图中的足迹区域中的至少部分像素点与三维模型中的点存在映射关系，因此，利用当前视图中的足迹区域，可以便捷的获得当前需要显示的足迹信息。另外，当前视图中的足迹区域可以认为是一个圆形，即当前视图中的足迹区域不具有深度信息。When the footprint area is based on the current view, according to some exemplary embodiments of the present disclosure, one way to determine the footprint area is as follows: first, obtain the center pixel of the current view corresponding to the browsing user's current viewing angle in the three-dimensional space scene; then, taking that center pixel as the center and a predetermined length as the radius (e.g., 1.5 meters in the three-dimensional space scene, where 1.5 meters can be converted into a length in the current view), determine the footprint area in the current view. Since at least some pixels in this footprint area have a mapping relationship with points in the three-dimensional model, the footprint information that currently needs to be displayed can be obtained conveniently by using the footprint area in the current view. In addition, the footprint area in the current view can be regarded as a circle; that is, it carries no depth information.
在足迹区域为基于三维模型的足迹区域时，根据本公开的一些示范性实施例，确定足迹区域的一种实现方式可以为：首先，获取浏览用户在三维空间场景中的当前视角所对应的当前视图的中心像素点，并确定该中心像素点是否设置有三维模型，如果该中心像素点设置有三维模型，则确定该中心像素点在该三维模型中的位置，之后，以该位置为圆心，以预定长度(如三维空间场景中的1.5米等)为半径，确定三维模型中的足迹区域。该足迹区域可能会完全位于一个三维模型中，也可能会跨多个三维模型。另外，三维模型中的足迹区域可以认为是一个圆柱体，即三维模型中的足迹区域具有深度信息。When the footprint area is based on the three-dimensional model, according to some exemplary embodiments of the present disclosure, one way to determine the footprint area is as follows: first, obtain the center pixel of the current view corresponding to the browsing user's current viewing angle in the three-dimensional space scene, and determine whether a three-dimensional model is set at that center pixel; if so, determine the position of the center pixel in the three-dimensional model, and then, taking that position as the center and a predetermined length (e.g., 1.5 meters in the three-dimensional space scene) as the radius, determine the footprint area in the three-dimensional model. The footprint area may lie entirely within one three-dimensional model or span multiple three-dimensional models. In addition, the footprint area in the three-dimensional model can be regarded as a cylinder; that is, it carries depth information.
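The two membership tests above can be sketched side by side. This is an illustrative reading of the geometry, not the disclosure's implementation: the view-based area is a flat circle (no depth), while the model-based area is a vertical cylinder (radius constrains the horizontal plane only); all names are hypothetical.

```python
import math

def in_view_footprint(pixel, center, radius_px):
    """View-based footprint area: a flat circle around the view's center
    pixel (the predetermined 1.5 m converted into view pixels)."""
    dx, dy = pixel[0] - center[0], pixel[1] - center[1]
    return math.hypot(dx, dy) <= radius_px

def in_model_footprint(point, center, radius_m):
    """Model-based footprint area: a vertical cylinder around the center
    pixel's model-space position, so it carries depth information.
    Coordinates are (x, y, z) with y as height, which the cylinder
    leaves unconstrained."""
    dx, dz = point[0] - center[0], point[2] - center[2]
    return math.hypot(dx, dz) <= radius_m
```

The cylinder test is what lets the model-based area "span multiple three-dimensional models": any model point whose horizontal distance to the center is within the radius belongs to the area, regardless of which model it falls in.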
在步骤520,确定三维模型中的属于足迹区域的足迹信息。In step 520, the footprint information belonging to the footprint area in the three-dimensional model is determined.
在足迹区域为基于当前视图的足迹区域时,本公开的实施例可以检查足迹区域中的每一个像素点是否与三维模型中的点存在映射关系。如果存在映射关系,再判断与像素点存在映射关系的三维模型中的点是否设置有足迹信息。如果设置有足迹信息,则可以将该足迹信息认为是属于该足迹区域的足迹信息。When the footprint area is a footprint area based on the current view, the embodiment of the present disclosure can check whether each pixel point in the footprint area has a mapping relationship with a point in the three-dimensional model. If there is a mapping relationship, then it is determined whether the points in the three-dimensional model that have a mapping relationship with the pixel points are provided with footprint information. If the footprint information is set, the footprint information can be regarded as the footprint information belonging to the footprint area.
在足迹区域为基于三维模型的足迹区域时,本公开的实施例可以检查足迹区域中的每一个点是否设置有足迹信息。如果设置有足迹信息,则可以将该足迹信息认为是属于该足迹区域的足迹信息。When the footprint area is a footprint area based on a three-dimensional model, the embodiments of the present disclosure can check whether each point in the footprint area is provided with footprint information. If the footprint information is set, the footprint information can be regarded as the footprint information belonging to the footprint area.
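For the view-based case, the two checks described above (pixel→model mapping exists, then footprint set at the mapped point) can be sketched as one loop. A minimal illustration under assumed data structures (`pixel_to_model`, `model_footprints` are hypothetical names, not from the disclosure):

```python
def footprints_in_area(area_pixels, pixel_to_model, model_footprints):
    """Collect the footprint information belonging to a view-based
    footprint area.

    For each pixel in the area: follow its mapping into the 3D model,
    if any, then pick up any footprint information set at that model
    point."""
    found = []
    for pixel in area_pixels:
        point = pixel_to_model.get(pixel)   # mapping may not exist
        if point is None:
            continue
        info = model_footprints.get(point)  # footprint may not be set
        if info is not None:
            found.append(info)
    return found
```

The model-based case is the same loop with the first lookup removed: the area already consists of model points, so only the second check (is footprint information set at this point?) applies.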
在步骤530,在该浏览用户在三维空间场景中的当前视角所对应的当前视图中,显示属于该足迹区域的足迹信息。In step 530, in the current view corresponding to the current perspective of the browsing user in the three-dimensional space scene, the footprint information belonging to the footprint area is displayed.
可选地,可以根据各足迹信息的位置,确定出属于该足迹区域的各足迹信息分别在当前视图中的位置,从而可以根据各足迹信息分别在当前视图中的位置显示各足迹信息。在显示足迹信息的过程中,可以尽量避免不同足迹信息在当前视图中的重叠显示现象。Optionally, the location of each footprint information belonging to the footprint area in the current view can be determined according to the location of each footprint information, so that each footprint information can be displayed according to the location of each footprint information in the current view. In the process of displaying footprint information, it is possible to avoid overlapping display of different footprint information in the current view.
可选地，获得的多个足迹信息可能具有不同的位置，也可能具有相同的位置(即足迹信息的位置冲突)。响应于确定所述属于所述足迹区域的足迹信息具有不同位置的多个足迹信息，可以直接根据多个足迹信息分别在当前视图中的图像位置，在当前视图中显示各足迹信息。而且，可以允许显示的各足迹信息部分重叠，也可以通过位置控制使各足迹信息互不重叠。响应于确定所述属于所述足迹区域的足迹信息包括相同位置的不同足迹信息，可以在当前视图中为不同足迹信息分别分配不同的图像位置，并根据分配的图像位置，在当前视图中显示上述具有相同位置的不同足迹信息，从而有利于避免不同足迹信息在当前视图中的重叠显示现象。Optionally, the obtained multiple pieces of footprint information may have different positions, or may share the same position (i.e., a position conflict). In response to determining that the footprint information belonging to the footprint area comprises multiple pieces with different positions, each piece can be displayed in the current view directly at its respective image position in the current view; the displayed pieces may be allowed to partially overlap, or position control may be used to keep them from overlapping. In response to determining that the footprint information belonging to the footprint area includes different pieces at the same position, different image positions can be assigned to the different pieces in the current view, and the pieces sharing the same position are then displayed at their assigned image positions, which helps avoid overlapping display of different footprint information in the current view.
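One simple position-control scheme for the conflict case is to fan colliding footprints out with a fixed offset. This is a sketch of one possible assignment rule, not the disclosure's method; the 20-pixel step and all names are assumptions.

```python
def assign_display_positions(footprints):
    """Assign an image position to each footprint.

    `footprints` is a list of (info, (x, y)) pairs, where (x, y) is the
    nominal image position. Footprints sharing the same nominal position
    are shifted down by 20 px each so they do not overlap in the view.
    """
    seen = {}      # nominal position -> count of footprints placed there
    placed = []
    for info, pos in footprints:
        n = seen.get(pos, 0)
        seen[pos] = n + 1
        placed.append((info, (pos[0], pos[1] + 20 * n)))
    return placed
```

Footprints with distinct nominal positions pass through unchanged, matching the first branch above; only same-position footprints receive distinct assigned positions.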
可选地，可以显示属于该足迹区域的所有足迹信息，也可以显示属于该足迹区域的部分足迹信息。例如，在属于该足迹区域的所有足迹信息的数量过于庞大(例如，数量超过预定数量)时，可以按照预定规则，从中选取部分足迹信息，并在当前视图中显示选取出的部分足迹信息。Optionally, all footprint information belonging to the footprint area can be displayed, or only part of it. For example, when the total amount of footprint information belonging to the footprint area is too large (e.g., exceeds a predetermined number), part of the footprint information can be selected according to a predetermined rule, and the selected part can be displayed in the current view.
可选地,可以从属于该足迹区域的所有足迹信息中随机选取出预定数量的足迹信息,并在当前视图中显示随机选取出的部分足迹信息。Optionally, a predetermined number of footprint information can be randomly selected from all the footprint information belonging to the footprint area, and part of the randomly selected footprint information can be displayed in the current view.
可选地,可以从属于该足迹区域的所有足迹信息中优先选取浏览用户自己设置的足迹信息,还可以优先选取质量好的足迹信息等,并在当前视图中显示选取出的部分足迹信息。Optionally, it is possible to preferentially select and browse the footprint information set by the user from all the footprint information belonging to the footprint area, or to preferentially select good-quality footprint information, etc., and display part of the selected footprint information in the current view.
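Combining the two selection preferences above (take the browsing user's own footprints first, fill the rest randomly) can be sketched as follows. The field name `"user"` and the function names are assumptions for illustration only.

```python
import random

def select_footprints(footprints, own_user, limit, rng=random):
    """Keep at most `limit` footprints out of a too-large set.

    The browsing user's own footprints are taken first; any remaining
    slots are filled by random selection from the other footprints.
    `rng` may be a seeded random.Random for reproducibility.
    """
    own = [f for f in footprints if f["user"] == own_user]
    others = [f for f in footprints if f["user"] != own_user]
    selected = own[:limit]
    remaining = limit - len(selected)
    if remaining > 0:
        selected += rng.sample(others, min(remaining, len(others)))
    return selected
```

A quality score could replace or supplement the `"user"` criterion by sorting `others` before sampling, matching the "good-quality footprint information" preference mentioned above.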
在一个可选示例中，可以采用弹幕的形式，为浏览用户显示当前视图之外的足迹信息。例如，可以先确定三维模型中的不属于当前视图的所有足迹信息，并以弹幕的形式，在该浏览用户在三维空间场景中的当前视角所对应的当前视图中，显示上述所有足迹信息或者部分足迹信息。In an optional example, footprint information outside the current view may be displayed to the browsing user in the form of a bullet screen. For example, all footprint information in the three-dimensional model that does not belong to the current view can first be determined, and then all or part of that footprint information can be displayed, in the form of a bullet screen, in the current view corresponding to the browsing user's current viewing angle in the three-dimensional space scene.
在一个可选示例中，可以采用弹幕的形式，为浏览用户显示足迹区域之外的足迹信息。例如，可以先确定三维模型中的不属于足迹区域的所有足迹信息，并以弹幕的形式，在该浏览用户在三维空间场景中的当前视角所对应的当前视图中，显示上述所有足迹信息或者部分足迹信息。In an optional example, footprint information outside the footprint area may be displayed to the browsing user in the form of a bullet screen. For example, all footprint information in the three-dimensional model that does not belong to the footprint area can first be determined, and then all or part of that footprint information can be displayed, in the form of a bullet screen, in the current view corresponding to the browsing user's current viewing angle in the three-dimensional space scene.
通过采用弹幕的形式，显示不属于足迹区域/当前视图的足迹信息，不仅有利于促使浏览用户探查该三维空间场景的其他部分，提高浏览用户的沉浸感，而且有利于进一步提升浏览用户的VR全景体验。Displaying footprint information that does not belong to the footprint area/current view in the form of a bullet screen not only helps prompt the browsing user to explore other parts of the three-dimensional space scene and improves the browsing user's sense of immersion, but also helps further enhance the browsing user's VR panoramic experience.
图6为本公开的用于实现三维空间场景互动的装置一个实施例的结构示意图。该实施例的装置可用于实现本公开上述各方法实施例。FIG. 6 is a schematic structural diagram of an embodiment of an apparatus for realizing interaction in a three-dimensional space scene of the present disclosure. The device of this embodiment can be used to implement the foregoing method embodiments of the present disclosure.
如图6所示,本实施例的装置包括:获取像素点模块600、确定三维模型模块601、确定位置模块602以及设置足迹信息模块603。另外,装置还可以包括:确定足迹区域模块604、确定足迹信息模块605、显示足迹信息模块606以及弹幕显示模块607。As shown in FIG. 6, the device of this embodiment includes: a pixel point acquiring module 600, a three-dimensional model determining module 601, a position determining module 602, and a footprint information setting module 603. In addition, the device may further include: a footprint area determination module 604, a footprint information determination module 605, a footprint information display module 606, and a bullet screen display module 607.
获取像素点模块600用于响应于检测到在三维空间场景中设置足迹信息的用户操作,确定用户在三维空间场景中的当前视角所对应的当前视图中的第一像素点。The pixel obtaining module 600 is configured to determine the first pixel in the current view corresponding to the current perspective of the user in the three-dimensional space scene in response to detecting the user operation of setting the footprint information in the three-dimensional space scene.
可选地,足迹信息可以包括:文本、图片、音频、视频以及三维模型中的至少一个。Optionally, the footprint information may include: at least one of text, picture, audio, video, and a three-dimensional model.
可选地,获取像素点模块600可以包括:第一子模块6001。该第一子模块6001用于确定所述当前视图的中心像素点作为第一像素点。Optionally, the pixel point acquiring module 600 may include: a first sub-module 6001. The first sub-module 6001 is used to determine the center pixel of the current view as the first pixel.
可选地,获取像素点模块600可以包括:第五子模块6002。该第五子模块6002用于响应于用户在三维空间场景中的当前视角所对应的当前视图中设置足迹信息的目标位置的操作,确定足迹信息的目标位置对应的当前视图中的像素点。第五子模块6002可以将该像素点做为第一像素点。Optionally, the pixel point obtaining module 600 may include: a fifth sub-module 6002. The fifth sub-module 6002 is configured to determine the pixel points in the current view corresponding to the target position of the footprint information in response to the user's operation of setting the target position of the footprint information in the current view corresponding to the current perspective in the three-dimensional space scene. The fifth sub-module 6002 can use the pixel as the first pixel.
确定三维模型模块601用于确定获取像素点模块600获取到的第一像素点对应的三维模型。The three-dimensional model determining module 601 is used to determine the three-dimensional model corresponding to the first pixel obtained by the pixel obtaining module 600.
可选地，在获取像素点模块600包括第一子模块6001的情况下，确定三维模型模块601可以包括：第二子模块6011、第三子模块6012以及第四子模块6013。第二子模块6011用于确定针对第一像素点是否设置有三维模型。第三子模块6012用于如果第二子模块6011的确定结果为针对第一像素点设置有三维模型，则将针对第一像素点设置的三维模型作为第一像素点对应的三维模型。第四子模块6013用于如果第二子模块6011的判断结果为针对第一像素点未设置有三维模型，则将针对当前视图中的其他像素点设置的三维模型作为第一像素点对应的三维模型。例如，如果第二子模块6011的判断结果为针对第一像素点未设置有三维模型，则第四子模块6013可以以第一像素点为起点，根据预设检查规则，对三维空间场景中的当前视角所对应的当前视图中的其他像素点进行检查。如果检查到设置有三维模型的像素点，则将第一像素点更新为设置有三维模型的像素点，获得第一像素点对应的三维模型，并停止本次检查。Optionally, in the case where the pixel point acquiring module 600 includes the first sub-module 6001, the three-dimensional model determining module 601 may include: a second sub-module 6011, a third sub-module 6012, and a fourth sub-module 6013. The second sub-module 6011 is used to determine whether a three-dimensional model is set for the first pixel. The third sub-module 6012 is configured to, if the second sub-module 6011 determines that a three-dimensional model is set for the first pixel, use the three-dimensional model set for the first pixel as the three-dimensional model corresponding to the first pixel. The fourth sub-module 6013 is configured to, if the second sub-module 6011 determines that no three-dimensional model is set for the first pixel, use a three-dimensional model set for another pixel in the current view as the three-dimensional model corresponding to the first pixel. For example, if the second sub-module 6011 determines that no three-dimensional model is set for the first pixel, the fourth sub-module 6013 may, taking the first pixel as a starting point and following a preset check rule, check the other pixels in the current view corresponding to the current viewing angle in the three-dimensional space scene. If a pixel for which a three-dimensional model is set is found, the first pixel is updated to that pixel, the three-dimensional model corresponding to the first pixel is obtained, and the check is stopped.
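The "preset check rule" is not specified; one plausible reading is to scan outward ring by ring (Chebyshev distance) from the first pixel until a model-backed pixel is found. The sketch below is that assumption, not the disclosure's rule, and all names are illustrative.

```python
def find_model_pixel(start, has_model, width, height):
    """Starting from `start`, scan outward ring by ring until a pixel
    backed by a 3D model is found (one possible 'preset check rule').

    `has_model` is a predicate over (x, y) pixels. Returns the first
    model-backed pixel found, or None if the whole view is exhausted.
    """
    sx, sy = start
    for r in range(max(width, height) + 1):
        for x in range(sx - r, sx + r + 1):
            for y in range(sy - r, sy + r + 1):
                if max(abs(x - sx), abs(y - sy)) != r:
                    continue  # visit only the ring at distance r
                if 0 <= x < width and 0 <= y < height and has_model((x, y)):
                    return (x, y)  # first pixel is updated to this one
    return None
```

Stopping at the first hit mirrors the "stop the check" behavior above, and scanning rings in increasing order keeps the updated first pixel as close as possible to the original one.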
在获取像素点模块600包括第五子模块6002的情况下，确定三维模型模块601可以包括：第六子模块6014、第七子模块6015以及第八子模块6016。第六子模块6014用于确定针对第一像素点是否设置有三维模型。如果第六子模块6014的确定结果为针对第一像素点设置有三维模型，则第七子模块6015将针对第一像素点设置的三维模型作为第一像素点对应的三维模型。如果第六子模块6014的确定结果为针对第一像素点未设置有三维模型，则第八子模块6016可以输出更新足迹信息目标位置的提示信息，并在第六子模块6014判断出针对更新后的足迹信息目标位置对应的当前视图中的像素点设置有三维模型时，将该设置有三维模型的像素点作为第一像素点。第八子模块6016获得第一像素点对应的三维模型。In the case where the pixel point acquiring module 600 includes the fifth sub-module 6002, the three-dimensional model determining module 601 may include: a sixth sub-module 6014, a seventh sub-module 6015, and an eighth sub-module 6016. The sixth sub-module 6014 is used to determine whether a three-dimensional model is set for the first pixel. If the sixth sub-module 6014 determines that a three-dimensional model is set for the first pixel, the seventh sub-module 6015 uses the three-dimensional model set for the first pixel as the three-dimensional model corresponding to the first pixel. If the sixth sub-module 6014 determines that no three-dimensional model is set for the first pixel, the eighth sub-module 6016 may output prompt information for updating the target position of the footprint information and, when the sixth sub-module 6014 determines that a three-dimensional model is set for the pixel in the current view corresponding to the updated target position, use that pixel as the first pixel. The eighth sub-module 6016 then obtains the three-dimensional model corresponding to the first pixel.
确定位置模块602用于确定用户的足迹信息在确定三维模型模块601确定出的三维模型中的位置。例如,确定位置模块602可以获取第一像素点在三维模型中的位置,且确定位置模块602可以将第一像素点在三维模型中的位置作为用户的足迹信息在三维模型中的位置。The position determining module 602 is used to determine the position of the user's footprint information in the three-dimensional model determined by the three-dimensional model determining module 601. For example, the position determining module 602 may obtain the position of the first pixel in the three-dimensional model, and the position determining module 602 may use the position of the first pixel in the three-dimensional model as the position of the user's footprint information in the three-dimensional model.
设置足迹信息模块603用于在确定位置模块602确定出的位置处设置用户的足迹信息。设置足迹信息模块603设置的用户的足迹信息用于显示给三维空间场景的浏览用户。The footprint information setting module 603 is used to set the user's footprint information at the position determined by the position determining module 602. The user's footprint information set by the footprint information setting module 603 is displayed to browsing users of the three-dimensional space scene.
确定足迹区域模块604用于对于浏览三维空间场景的任一浏览用户，确定该浏览用户在三维空间场景中的当前视角所对应的足迹区域。例如，确定足迹区域模块604可以先确定该浏览用户在三维空间场景中的当前视角所对应的当前视图的中心像素点，然后，确定足迹区域模块604以该中心像素点为圆心，以预定长度为半径，确定当前视图中的足迹区域。The footprint area determining module 604 is used to determine, for any browsing user browsing the three-dimensional space scene, the footprint area corresponding to that browsing user's current viewing angle in the three-dimensional space scene. For example, the footprint area determining module 604 may first determine the center pixel of the current view corresponding to the browsing user's current viewing angle in the three-dimensional space scene, and then determine the footprint area in the current view by taking that center pixel as the center and a predetermined length as the radius.
确定足迹信息模块605用于确定三维模型中的属于确定足迹区域模块604确定出的足迹区域的足迹信息。The footprint information determining module 605 is used to determine the footprint information in the three-dimensional model that belongs to the footprint area determined by the footprint area determining module 604.
显示足迹信息模块606用于在该浏览用户在三维空间场景中的当前视角所对应的当前视图中，显示确定足迹信息模块605确定出的属于足迹区域的足迹信息。The footprint information displaying module 606 is used to display, in the current view corresponding to the browsing user's current viewing angle in the three-dimensional space scene, the footprint information determined by the footprint information determining module 605 as belonging to the footprint area.
可选地，响应于确定所述属于所述足迹区域的足迹信息具有不同位置的多个足迹信息，显示足迹信息模块606可以根据多个足迹信息分别在当前视图中的图像位置，在当前视图中显示所述多个足迹信息。Optionally, in response to determining that the footprint information belonging to the footprint area comprises multiple pieces with different positions, the footprint information displaying module 606 may display the multiple pieces of footprint information in the current view according to their respective image positions in the current view.
可选地，响应于确定所述属于所述足迹区域的足迹信息具有相同位置的不同足迹信息，显示足迹信息模块606可以在当前视图中为不同足迹信息分配不同的图像位置，并根据分配的图像位置，在当前视图中显示不同足迹信息。Optionally, in response to determining that the footprint information belonging to the footprint area comprises different pieces at the same position, the footprint information displaying module 606 may assign different image positions to the different pieces in the current view and display them in the current view according to the assigned image positions.
弹幕显示模块607用于确定三维模型中的不属于足迹区域/当前视图的至少一个足迹信息。弹幕显示模块607以弹幕的形式，在该浏览用户在三维空间场景中的当前视角所对应的当前视图中，显示该至少一个足迹信息。The bullet screen display module 607 is used to determine at least one piece of footprint information in the three-dimensional model that does not belong to the footprint area/current view, and to display, in the form of a bullet screen, the at least one piece of footprint information in the current view corresponding to the browsing user's current viewing angle in the three-dimensional space scene.
上述各模块及其包括的子模块具体执行的操作可以参见上述方法实施例中针对图2-图5的描述,在此不再详细说明。For the specific operations performed by the foregoing modules and the sub-modules included therein, reference may be made to the description of FIGS. 2 to 5 in the foregoing method embodiment, and detailed descriptions are omitted here.
请参考图7,示出了根据本公开的第一个基于三维模型的交互方法的一个实施例的流程700。该基于三维模型的交互方法应用于第一用户终端,第一用户终端呈现有用户界面,该基于三维模型的交互方法包括:Please refer to FIG. 7, which shows a process 700 of an embodiment of the first three-dimensional model-based interaction method according to the present disclosure. The three-dimensional model-based interaction method is applied to a first user terminal, and the first user terminal is presented with a user interface, and the three-dimensional model-based interaction method includes:
步骤710，响应于检测到用户针对用户界面的目标交互操作，向为用户界面提供页面数据的服务器发送针对目标交互操作的交互请求，其中，用户界面用于呈现三维模型，三维模型与第二用户终端登录的用户账号建立关联关系。Step 710: In response to detecting the user's target interaction operation on the user interface, send an interaction request for the target interaction operation to the server that provides page data for the user interface, where the user interface is used to present a three-dimensional model, and the three-dimensional model has an association relationship established with the user account logged in on the second user terminal.
在本实施例中,用户可以使用第一用户终端通过网络与服务器进行交互。第一用户终端可以是各种电子设备,包括但不限于智能手机、平板电脑、膝上型便携计算机和台式计算机等等。第一用户终端可以安装有各种客户端应用,例如房产交易软件等。上述用户界面可以是第一用户终端所安装的应用中的页面。实践中,用户可以通过该用户界面与服务器进行交互,进而实现与其他用户终端(例如第二用户终端)之间的交互。In this embodiment, the user can use the first user terminal to interact with the server through the network. The first user terminal may be various electronic devices, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and so on. The first user terminal may be installed with various client applications, such as real estate transaction software. The aforementioned user interface may be a page in an application installed by the first user terminal. In practice, the user can interact with the server through the user interface, thereby realizing interaction with other user terminals (for example, the second user terminal).
在本实施例中,在检测到用户针对用户界面的目标交互操作的情况下,第一用户终端可以向为用户界面提供页面数据的服务器发送针对目标交互操作的交互请求。In this embodiment, in the case of detecting a user's target interaction operation on the user interface, the first user terminal may send an interaction request for the target interaction operation to a server that provides page data for the user interface.
上述用户界面用于呈现三维模型。三维模型与第二用户终端登录的用户账号预先建立关联关系。上述目标交互操作可以是各种用于指示第一用户终端请求与第二用户终端进行交互(信息交互)的操作。作为示例，该目标交互操作可以指示与第二用户终端进行视频通信。上述交互请求可以用于指示第一用户终端的用户请求与第二用户终端进行交互。示例性的，上述交互请求可以用于指示第一用户终端的用户请求与第二用户终端进行视频通信。The aforementioned user interface is used to present the three-dimensional model, and the three-dimensional model has an association relationship established in advance with the user account logged in on the second user terminal. The aforementioned target interaction operation may be any of various operations used to instruct the first user terminal to request interaction (information exchange) with the second user terminal. As an example, the target interaction operation may indicate video communication with the second user terminal. The aforementioned interaction request may be used to indicate that the user of the first user terminal requests to interact with the second user terminal; for example, it may indicate that the user of the first user terminal requests video communication with the second user terminal.
在这里,在执行上述步骤710时,第一用户终端的用户界面可以呈现有上述三维模型,也可以未呈现三维模型。Here, when performing step 710, the user interface of the first user terminal may present the above-mentioned three-dimensional model, or may not present the three-dimensional model.
实践中，对于每个三维模型，其可以与一个用户账号预先建立有关联关系。由此，对于一个特定的三维模型，可以通过确定与该三维模型建立有关联关系的用户账号，从而确定登录该用户账号的用户终端，进而确定出用于与第一用户终端进行交互的用户终端(即第二用户终端)。In practice, each three-dimensional model can be associated with a user account in advance. Thus, for a specific three-dimensional model, the user account associated with that three-dimensional model can be determined, the user terminal logged in with that user account can then be determined, and thereby the user terminal used to interact with the first user terminal (i.e., the second user terminal) is determined.
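The two-step resolution above (model → associated account → logged-in terminal) can be sketched with two lookup tables. This is a minimal illustration under assumed data structures; the names are hypothetical, not from the disclosure.

```python
def resolve_second_terminal(model_id, model_to_account, logged_in_terminals):
    """Resolve which user terminal should receive the interaction request.

    `model_to_account` maps a 3D-model identifier to its associated user
    account; `logged_in_terminals` maps a user account to the terminal
    currently logged in with it (the second user terminal).
    """
    account = model_to_account.get(model_id)
    if account is None:
        return None  # the model has no associated account
    return logged_in_terminals.get(account)  # None if nobody is logged in
```

Either lookup may fail, in which case the server has no second user terminal to forward the interaction request to.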
上述三维模型可以是任意物体的三维模型。示例性的,该三维模型可以是细胞内部的三维模型,也可以是房屋室内三维模型。The above-mentioned three-dimensional model may be a three-dimensional model of any object. Exemplarily, the three-dimensional model may be a three-dimensional model inside a cell, or a three-dimensional model of a house interior.
步骤720,接收服务器从第二用户终端获取的流媒体视频。Step 720: Receive the streaming video obtained by the server from the second user terminal.
在本实施例中,上述第一用户终端可以接收服务器从第二用户终端获取的流媒体视频。In this embodiment, the above-mentioned first user terminal may receive the streaming video obtained by the server from the second user terminal.
其中,上述交互确认信息可以用于指示所述第二用户终端的用户确认(同意)与第一用户终端进行上述交互请求指示的交互。例如,上述交互确认信息可以用于指示所述第二用户终端的用户确认(同意)与第一用户终端进行视频通信。The aforementioned interaction confirmation information may be used to instruct the user of the second user terminal to confirm (agree) to perform the interaction indicated by the aforementioned interaction request with the first user terminal. For example, the foregoing interactive confirmation information may be used to instruct the user of the second user terminal to confirm (agree) to conduct video communication with the first user terminal.
上述流媒体视频可以包含图像和/或语音。实践中,第二用户终端的图像获取装置和/或语音获取装置,可以用于获取上述流媒体视频。The aforementioned streaming video may include images and/or voice. In practice, the image acquisition device and/or the voice acquisition device of the second user terminal can be used to acquire the aforementioned streaming video.
实践中，服务器可以采用流媒体技术，将第二用户终端采集的图像和/或语音(即流媒体视频)，持续发送至第一用户终端。其中，流媒体技术是指采用流式传输技术在网络上连续实时播放的媒体格式。流媒体技术也称流式媒体技术。这里，第二用户终端可以将其所采集的连续的影像和声音信息经过压缩处理后发送至服务器。由服务器向第一用户终端顺序或实时地传送各个压缩包，让使用第一用户终端的用户一边下载一边观看、收听。In practice, the server may use streaming media technology to continuously send the images and/or voice collected by the second user terminal (i.e., the streaming video) to the first user terminal. Streaming media technology, also known as streamed media technology, refers to media played continuously and in real time over a network using streaming transmission. Here, the second user terminal may compress the continuous video and audio information it collects and send it to the server; the server then transmits each compressed packet to the first user terminal sequentially or in real time, so that the user of the first user terminal can watch and listen while downloading.
可选地，服务器可以将第二用户终端采集的流媒体视频发送至第一用户终端，也可以对第二用户终端采集的流媒体视频进行图像处理(例如美颜)、语音处理(例如去噪)、转码、录制、鉴黄等操作后，将处理后的流媒体视频发送至第一用户终端。Optionally, the server may send the streaming video collected by the second user terminal to the first user terminal as-is, or may perform operations such as image processing (e.g., beautification), voice processing (e.g., denoising), transcoding, recording, and content moderation on the collected streaming video and then send the processed streaming video to the first user terminal.
在本实施例的一些可选的实现方式中,可以在服务器接收到第二用户终端针对交互请求发送的交互确认信息的情况下,第一用户终端再执行上述步骤720。In some optional implementation manners of this embodiment, when the server receives the interaction confirmation information sent by the second user terminal in response to the interaction request, the first user terminal may perform step 720 again.
可以理解，上述可选的实现方式中，在第二用户终端针对交互请求发送的交互确认信息的情况下，通过后续步骤第一用户终端可以呈现流媒体视频；而在第二用户终端未发送上述交互确认信息的情况下，第一用户终端则不呈现流媒体视频。由此，可以在获得第二用户终端的用户的允许(例如接通第一用户终端发起的视频通话)后，才在第一用户终端的用户界面上呈现流媒体视频和三维模型。这有助于提高对第二用户终端的用户的隐私保护性，为第二用户终端的用户向第一用户终端的用户呈现流媒体视频提供准备时间。It can be understood that, in the foregoing optional implementation, when the second user terminal sends the interaction confirmation information in response to the interaction request, the first user terminal can present the streaming video through the subsequent steps; when the second user terminal does not send the interaction confirmation information, the first user terminal does not present the streaming video. Therefore, the streaming video and the three-dimensional model are presented on the user interface of the first user terminal only after the permission of the user of the second user terminal is obtained (e.g., after the video call initiated by the first user terminal is answered). This helps protect the privacy of the user of the second user terminal and provides that user with time to prepare before presenting the streaming video to the user of the first user terminal.
在本实施例的一些可选的实现方式中,在服务器接收到交互请求之后,第一用户终端也可以直接执行上述步骤720(而无需第二用户终端针对交互请求发送的交互确认信息)。In some optional implementation manners of this embodiment, after the server receives the interaction request, the first user terminal may also directly execute the foregoing step 720 (without the interaction confirmation information sent by the second user terminal in response to the interaction request).
可以理解,上述可选的实现方式中,第二用户终端的用户可以处于向其他用户终端的用户拍摄流媒体视频(例如直播)的状态。由此,在服务器接收到交互请求之后,第一用户终端可以随时接收服务器从第二用户终端获取的流媒体视频,从而提高了流媒体视频呈现的实时性。It can be understood that, in the foregoing optional implementation manner, the user of the second user terminal may be in a state of shooting a streaming video (for example, a live broadcast) to users of other user terminals. Thus, after the server receives the interaction request, the first user terminal can receive the streaming video obtained by the server from the second user terminal at any time, thereby improving the real-time performance of the streaming video presentation.
在本实施例的一些可选的实现方式中,第一用户终端可以采用如下步骤接收服务器从第二用户终端获取的流媒体视频:In some optional implementation manners of this embodiment, the first user terminal may adopt the following steps to receive the streaming video obtained by the server from the second user terminal:
首先,将第一用户终端的当前网速值发送至服务器。First, the current network speed value of the first user terminal is sent to the server.
然后，接收服务器从所述第二用户终端获取并发送的流媒体视频，该流媒体视频具有与当前网速值相匹配的分辨率。Then, the first user terminal receives the streaming video that the server obtains from the second user terminal and sends to it, the streaming video having a resolution matching the current network speed value.
在这里,分辨率可以与网速值成正相关。Here, the resolution can be positively correlated with the network speed value.
可以理解，通过接收服务器发送的分辨率与当前网速值相匹配的、第二用户终端获取的流媒体视频，可以在网络较差的情况下，降低第一用户终端接收的流媒体视频的分辨率，以提高流媒体视频传输的实时性。It can be understood that, by receiving from the server the streaming video obtained by the second user terminal with a resolution matching the current network speed value, the resolution of the streaming video received by the first user terminal can be lowered when the network is poor, thereby improving the real-time performance of streaming video transmission.
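A resolution positively correlated with the reported network speed can be chosen with a simple tier table. The thresholds and tier labels below are illustrative assumptions, not values from the disclosure.

```python
def pick_resolution(speed_mbps):
    """Choose a streaming resolution positively correlated with the
    network speed value reported by the first user terminal.

    Tiers are ordered from fastest to slowest; the first threshold the
    speed meets wins, with a low-resolution fallback for slow links.
    """
    tiers = [(8.0, "1080p"), (4.0, "720p"), (2.0, "480p")]
    for threshold, label in tiers:
        if speed_mbps >= threshold:
            return label
    return "360p"  # fallback for very slow links
```

On the server side, the selected tier would drive which transcoded variant of the second user terminal's stream is forwarded to the first user terminal.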
步骤730,在用户界面上呈现流媒体视频和三维模型。Step 730: Present the streaming video and the three-dimensional model on the user interface.
在本实施例中,第一用户终端可以在用户界面上,同屏呈现流媒体视频和三维模型。In this embodiment, the first user terminal may present the streaming video and the three-dimensional model on the same screen on the user interface.
在这里,第一用户终端的上述用户界面可以被划分为两个部分,上述两个部分可以分别呈现流媒体视频和三维模型。可选地,也可以将三维模型作为上述用户界面的背景,在用户界面的一部分页面区域呈现流媒体视频。Here, the above-mentioned user interface of the first user terminal may be divided into two parts, and the above-mentioned two parts may respectively present a streaming video and a three-dimensional model. Optionally, the three-dimensional model can also be used as the background of the aforementioned user interface, and the streaming video is presented in a part of the page area of the user interface.
请参考图8A-8C,图8A-8C是针对图7的实施例的应用场景示意图。如图8A所示,在第一用户终端检测到用户针对用户界面的目标交互操作810(图示中目标交互操作810指示开启视频实时交互)的情况下,第一用户终端可以向为用户界面提供页面数据的服务器发送针对目标交互操作810的交互请求。其中,图8A中,用户界面呈现有XX家园的房屋室内的三维模型。该三维模型与第二用户终端登录的用户账号预先建立关联关系。在图8B中,第二用户终端接收到上述交互请求之后,第二用户终端的用户执行了开始交互的操作820。之后,第二用户终端向服务器发送针对交互请求的交互确认信息,以及第二用户终端采集的流媒体视频。最后,如图8C所示,第一用户终端在用户界面上呈现了流媒体视频830和三维模型。Please refer to FIGS. 8A-8C, which are schematic diagrams of an application scenario of the embodiment of FIG. 7. As shown in FIG. 8A, when the first user terminal detects the user's target interaction operation 810 on the user interface (in the figure, the target interaction operation 810 indicates enabling real-time video interaction), the first user terminal may send an interaction request for the target interaction operation 810 to the server that provides page data for the user interface. In FIG. 8A, the user interface presents a three-dimensional model of the interior of a house in XX Home. The three-dimensional model has a pre-established association with the user account logged in on the second user terminal. In FIG. 8B, after the second user terminal receives the above interaction request, the user of the second user terminal performs an operation 820 of starting the interaction. After that, the second user terminal sends, to the server, interaction confirmation information for the interaction request, as well as the streaming video collected by the second user terminal. Finally, as shown in FIG. 8C, the first user terminal presents the streaming video 830 and the three-dimensional model on the user interface.
本公开的上述实施例提供的基于三维模型的交互方法,可以在检测到用户针对用户界面的目标交互操作的情况下,向为用户界面提供页面数据的服务器发送针对目标交互操作的交互请求。用户界面用于呈现三维模型,三维模型与第二用户终端登录的用户账号预先建立关联关系。之后,接收服务器从第二用户终端获取的流媒体视频。最后,在用户界面上呈现流媒体视频和三维模型。通过在终端设备的同一页面呈现流媒体视频和三维模型,有助于采用流媒体视频向用户呈现三维模型相关的信息,提高了交互方式的多样性。通过多维度信息交互,让用户更加沉浸地浏览三维模型,提升用户的浏览时长,有助于满足用户更多元化的交互需求。The three-dimensional model-based interaction method provided by the above embodiments of the present disclosure can, when the user's target interaction operation on the user interface is detected, send an interaction request for the target interaction operation to the server that provides page data for the user interface. The user interface is used for presenting a three-dimensional model, and the three-dimensional model has a pre-established association with the user account logged in on the second user terminal. After that, the streaming video obtained by the server from the second user terminal is received. Finally, the streaming video and the three-dimensional model are presented on the user interface. Presenting the streaming video and the three-dimensional model on the same page of the terminal device helps use the streaming video to present information related to the three-dimensional model to the user, and increases the diversity of interaction modes. Through multi-dimensional information interaction, users can browse the three-dimensional model more immersively, which increases the user's browsing duration and helps meet users' more diversified interaction needs.
在本实施例的一些可选的实现方式中,第一用户终端还可以执行如下步骤:In some optional implementation manners of this embodiment, the first user terminal may also perform the following steps:
首先,接收服务器发送的模型调整信息,其中,模型调整信息指示使用第二用户终端的用户对呈现于第二用户终端的三维模型的调整操作。调整操作包括以下至少一项:缩放、旋转、移动、视点切换。First, the model adjustment information sent by the server is received, where the model adjustment information indicates an adjustment operation of the user who uses the second user terminal on the three-dimensional model presented on the second user terminal. The adjustment operation includes at least one of the following: zoom, rotate, move, and switch viewpoints.
这里,通常情况下,用户可以对三维模型进行缩放、旋转、移动、视点切换中的至少一项操作。Here, under normal circumstances, the user can perform at least one operation of zooming, rotating, moving, and switching viewpoints on the three-dimensional model.
然后,按照模型调整信息指示的调整操作,对用户界面上呈现的三维模型进行相同调整操作。Then, according to the adjustment operation indicated by the model adjustment information, the same adjustment operation is performed on the three-dimensional model presented on the user interface.
可以理解,上述可选的实现方式中,第二用户终端的用户对三维模型所执行的操作,可以同步到第一用户终端。由此,在第二用户终端采集的流媒体视频与三维模型相关(例如,第二用户终端的用户对三维模型进行讲解、介绍等)时,方便第一用户终端的用户参考与第二用户终端呈现的相同三维模型,对流媒体视频中的信息进行获取,从而提高了信息获取的针对性。It can be understood that, in the foregoing optional implementation manner, the operations performed by the user of the second user terminal on the three-dimensional model can be synchronized to the first user terminal. Therefore, when the streaming video collected by the second user terminal is related to the three-dimensional model (for example, the user of the second user terminal explains or introduces the three-dimensional model), it is convenient for the user of the first user terminal to obtain the information in the streaming video with reference to the same three-dimensional model as the one presented on the second user terminal, thereby improving the pertinence of information acquisition.
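The synchronization of adjustment operations described above can be sketched as follows. This is not part of the disclosure; the message fields (`op`, `factor`, `degrees`, `delta`, `viewpoint`) are hypothetical names for the model adjustment information.

```python
# Illustrative sketch (assumed message format): the first terminal replays an
# adjustment operation received via the server, keeping its 3D model view in
# sync with the second terminal's. Operation/field names are assumptions.

class ModelView:
    """Minimal state of a rendered 3D model on a user terminal."""
    def __init__(self):
        self.scale = 1.0
        self.rotation_deg = 0.0
        self.position = (0.0, 0.0)
        self.viewpoint = "default"

    def apply_adjustment(self, info: dict) -> None:
        """Perform the same adjustment indicated by model adjustment info."""
        op = info["op"]
        if op == "zoom":
            self.scale *= info["factor"]
        elif op == "rotate":
            self.rotation_deg = (self.rotation_deg + info["degrees"]) % 360
        elif op == "move":
            dx, dy = info["delta"]
            x, y = self.position
            self.position = (x + dx, y + dy)
        elif op == "switch_viewpoint":
            self.viewpoint = info["viewpoint"]
        else:
            raise ValueError(f"unknown adjustment operation: {op}")
```

Each of the four listed operations (zoom, rotate, move, viewpoint switching) maps to one branch, so the first terminal's view ends up identical to the second terminal's after replay.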
在本实施例的一些可选的实现方式中,第一用户终端还可以执行如下步骤:In some optional implementation manners of this embodiment, the first user terminal may also perform the following steps:
首先,获取用户针对流媒体视频的反馈信息。该反馈信息可以包括但不限于以下至少一项:点赞、评分、评论等等。该反馈信息可以用于表征第一用户终端的用户对第二用户终端的用户的流媒体视频的评价。First, the user's feedback information on the streaming video is obtained. The feedback information may include, but is not limited to, at least one of the following: likes, ratings, comments, and so on. The feedback information may be used to characterize the evaluation, by the user of the first user terminal, of the streaming video of the user of the second user terminal.
然后,将反馈信息发送至服务器,其中,服务器用于将反馈信息与用户账号建立关联关系。例如,可以采用关联存储的方式,将反馈信息与用户账号建立关联关系。Then, the feedback information is sent to the server, where the server is used to establish an association relationship between the feedback information and the user account. For example, an associative storage method can be used to establish an association relationship between the feedback information and the user account.
可以理解,将反馈信息与用户账号建立关联关系,可以反映第一用户终端的用户对三维模型指示的物体、对第二用户终端的用户的满意程度,进而可以更具针对性地为第一用户终端推送信息。It can be understood that establishing an association between the feedback information and the user account can reflect the degree of satisfaction of the user of the first user terminal with the object indicated by the three-dimensional model and with the user of the second user terminal, so that information can be pushed to the first user terminal in a more targeted manner.
进一步参考图9,图9是本公开的第一个基于三维模型的交互方法的又一个实施例的流程900。该基于三维模型的交互方法应用于第一用户终端,第一用户终端呈现有用户界面,该方法包括:With further reference to FIG. 9, FIG. 9 is a flow 900 of another embodiment of the first three-dimensional model-based interaction method of the present disclosure. The three-dimensional model-based interaction method is applied to a first user terminal that presents a user interface, and the method includes:
步骤910,响应于检测到用户针对用户界面的目标交互操作,向为用户界面提供页面数据的服务器发送针对目标交互操作的交互请求。Step 910: In response to detecting the user's target interaction operation on the user interface, send an interaction request for the target interaction operation to a server that provides page data for the user interface.
步骤920,接收服务器从第二用户终端获取的流媒体视频。Step 920: Receive the streaming video obtained by the server from the second user terminal.
步骤930,在用户界面上呈现流媒体视频和三维模型。Step 930: Present the streaming video and the three-dimensional model on the user interface.
在本实施例中,步骤910至步骤930分别与图7对应实施例中的步骤710至步骤730基本一致,这里不再赘述。In this embodiment, step 910 to step 930 are basically the same as step 710 to step 730 in the embodiment corresponding to FIG. 7, and will not be repeated here.
步骤940,响应于第一用户终端的当前网速值小于或等于预设网速阈值,基于流媒体视频中的各帧语音,对目标用户图像进行调整,生成不同于流媒体视频的新视频。Step 940: In response to the current network speed value of the first user terminal being less than or equal to the preset network speed threshold, adjust the target user image based on each frame of voice in the streaming video to generate a new video different from the streaming video.
在本实施例中,在第一用户终端的当前网速值小于或等于预设网速阈值的情况下,第一用户终端可以基于流媒体视频中的各帧语音,对目标用户图像进行调整,生成新视频。新视频表征目标用户图像指示的用户执行各帧语音指示的动作。目标用户图像指示的用户可以是使用第二用户终端的用户。可选地,新视频可以是基于网络分段发送、即时传输的流媒体视频,也可以是无需基于网络、在本地生成的视频。In this embodiment, when the current network speed value of the first user terminal is less than or equal to the preset network speed threshold, the first user terminal may adjust the target user image based on each frame of voice in the streaming video to generate a new video. The new video represents the user indicated by the target user image performing the actions indicated by each frame of voice. The user indicated by the target user image may be the user using the second user terminal. Optionally, the new video may be a streaming video sent in segments and transmitted in real time over the network, or it may be a video generated locally without relying on the network.
具体地,第一用户终端可以采用如下方式生成新视频:对于流媒体视频中的每帧语音,将该帧语音输入至预先确定的图像帧生成模型,得到与该帧语音相匹配的、目标用户图像指示的用户的图像。从而将所得到的与流媒体视频中的各帧语音相匹配的各帧图像,以及该各帧语音进行融合,从而得到新视频。与语音相匹配的、目标用户图像指示的用户的图像中用户的动作与该语音相吻合。例如,如果语音为“啊”的音频,并且该音频表征用户处于惊吓状态,那么,与该音频相匹配的目标用户图像指示的用户的图像中的用户的口型可以是发出语音“啊”的口型,动作可以是处于惊吓状态下的动作。Specifically, the first user terminal may generate the new video in the following manner: for each frame of voice in the streaming video, the frame of voice is input into a predetermined image frame generation model to obtain an image, matching the frame of voice, of the user indicated by the target user image. The obtained image frames matching the respective voice frames in the streaming video are then fused with those voice frames to obtain the new video. In the matching image of the user indicated by the target user image, the user's action is consistent with that voice. For example, if the voice is the audio of "ah" and the audio indicates that the user is startled, then in the matching image of the user indicated by the target user image, the user's mouth shape may be that of uttering "ah", and the action may be one made in a startled state.
在这里,上述图像帧生成模型可以是采用机器学习算法,基于包括语音帧、目标用户图像和与语音帧相匹配的图像帧的训练样本,训练得到的循环神经网络模型或卷积神经网络模型。针对每个用户可以训练得到一个图像帧生成模型,用以训练该用户的图像帧生成模型的各个训练样本中的目标用户图像可以相同,针对该用户的每个语音帧确定出与该语音帧相匹配的图像帧,进而得到用以训练该用户的图像帧生成模型的训练样本集合。Here, the above image frame generation model may be a recurrent neural network model or a convolutional neural network model trained with a machine learning algorithm on training samples each including a voice frame, a target user image, and an image frame matching the voice frame. One image frame generation model may be trained for each user; the target user image in the training samples used to train that user's model may be the same, and for each voice frame of the user, an image frame matching that voice frame is determined, so as to obtain the training sample set for that user's image frame generation model.
可选地,图像帧生成模型还可以是关联存储有语音帧、目标用户图像和与语音帧相匹配的图像帧的二维表或数据库。在图像帧生成模型是关联存储有语音帧、目标用户图像和与语音帧相匹配的图像帧的数据库的情况下,该数据库的每条记录可以包括语音帧、目标用户图像和与语音帧相匹配的图像帧。各条记录中的目标用户图像可以相同,针对该用户的每个语音帧确定出与该语音帧相匹配的图像帧,进而得到关联存储有语音帧、目标用户图像和与语音帧相匹配的图像帧的数据库,即图像帧生成模型。Optionally, the image frame generation model may also be a two-dimensional table or database in which voice frames, target user images, and image frames matching the voice frames are stored in association. When the image frame generation model is such a database, each record of the database may include a voice frame, a target user image, and an image frame matching the voice frame. The target user image in each record may be the same; for each voice frame of the user, an image frame matching that voice frame is determined, so as to obtain the database in which the voice frames, the target user image, and the matching image frames are stored in association, i.e., the image frame generation model.
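The lookup-table variant of the image frame generation model described above can be sketched as follows. This is not from the disclosure; the key scheme (a voice-frame fingerprint plus a user image identifier) is a hypothetical illustration of "associative storage".

```python
# Illustrative sketch (assumption): an image frame "generation model"
# implemented as an associative lookup table, per the optional implementation.
# The keys (voice fingerprint, user image id) are hypothetical.

class LookupImageFrameModel:
    """Stores (voice frame key, target user image id) -> matching image frame."""
    def __init__(self):
        self._records = {}

    def add_record(self, voice_key: str, user_image_id: str,
                   image_frame: bytes) -> None:
        """Associatively store one record of the database."""
        self._records[(voice_key, user_image_id)] = image_frame

    def generate(self, voice_key: str, user_image_id: str) -> bytes:
        """Return the stored image frame matching the given voice frame."""
        return self._records[(voice_key, user_image_id)]
```

Compared with the trained neural network variant, this table needs no inference at run time but can only return frames it has previously stored.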
在本实施例的一些可选的实现方式中,第一用户终端还可以通过以下任一方式,确定出目标用户图像:In some optional implementation manners of this embodiment, the first user terminal may also determine the target user image by any of the following methods:
(1)基于流媒体视频中的图像,生成目标用户图像。(1) Based on the image in the streaming video, the target user image is generated.
这里,可以从流媒体视频中的各帧图像中,随机选取一张图像作为目标用户图像,也可以从流媒体视频中的各帧图像中,选取一张面部图像区域与整张图像帧的面积之比大于预设阈值的图像,作为目标用户图像。Here, an image may be randomly selected from the image frames of the streaming video as the target user image; alternatively, from the image frames of the streaming video, an image in which the ratio of the facial image area to the area of the whole frame is greater than a preset threshold may be selected as the target user image.
(2)将与用户账号相关联的用户图像确定为目标用户图像。(2) Determine the user image associated with the user account as the target user image.
这里,用户可以通过其所使用的用户账号上传一张图像,作为目标用户图像;也可以在登录其所使用的账号之后,从预先确定的图像集合中选取一张图像,作为目标用户图像。Here, the user can upload an image through the user account he uses as the target user image; or after logging in the account he uses, select an image from a predetermined image set as the target user image.
可以理解,上述可选的实现方式可以实现从流媒体视频中的图像中,自动生成目标用户图像,或者,由用户手动设置目标用户图像,从而基于多种目标用户图像的确定方式,使得新视频的生成方式更为多样化。It can be understood that the above optional implementations allow the target user image to be generated automatically from the images in the streaming video, or to be set manually by the user; with multiple ways of determining the target user image, the ways of generating the new video become more diversified.
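The face-area selection criterion in option (1) can be sketched as follows. This is not from the disclosure; face detection itself is stubbed out as a caller-supplied function, and the threshold value is an assumption.

```python
# Illustrative sketch (assumption): pick the target user image as the frame
# whose face region occupies the largest share of the frame, provided that
# share exceeds a preset threshold. Face detection is abstracted away.

def select_target_user_image(frames, face_area_of, min_ratio=0.1):
    """frames: iterable of (width, height, frame_id);
    face_area_of: maps frame_id -> face area in pixels."""
    best_id, best_ratio = None, min_ratio
    for width, height, frame_id in frames:
        ratio = face_area_of(frame_id) / float(width * height)
        if ratio > best_ratio:
            best_id, best_ratio = frame_id, ratio
    return best_id  # None if no frame exceeds the threshold
```

Random selection (the other branch of option (1)) would simply be `random.choice` over the frame list, so only the thresholded variant is sketched here.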
步骤950,采用新视频替代流媒体视频进行呈现。In step 950, the new video is used to replace the streaming video for presentation.
在本实施例中,第一用户终端可以采用新视频替代流媒体视频进行呈现。换言之,在第一用户终端呈现新视频时,可以对流媒体视频进行隐藏(即不再呈现)。In this embodiment, the first user terminal may use a new video to replace the streaming video for presentation. In other words, when the first user terminal presents a new video, the streaming video can be hidden (that is, no longer presented).
需要说明的是,除上面所记载的内容外,本申请实施例还可以包括与图7对应的实施例相同或类似的特征、效果,在此不再赘述。It should be noted that, in addition to the content described above, the embodiment of the present application may also include the same or similar features and effects as the embodiment corresponding to FIG. 7, which will not be repeated here.
从图9中可以看出,本实施例中的基于三维模型的交互方法的流程900在第一用户终端的当前网速值较小(小于或等于预设网速阈值)的情况下,第一用户终端可以在本地生成用以替代流媒体视频呈现的新视频。由此,第一用户终端只需从服务器持续获取语音,而无需持续获取视频,从而降低了对网络资源的占用。在第一用户终端的当前网速值较小的情况下,可以提高第一用户终端的视频呈现的实时性。It can be seen from FIG. 9 that, in the process 900 of the three-dimensional model-based interaction method in this embodiment, when the current network speed value of the first user terminal is small (less than or equal to the preset network speed threshold), the first user terminal can locally generate a new video to replace the presentation of the streaming video. Therefore, the first user terminal only needs to continuously obtain voice from the server rather than continuously obtain video, thereby reducing the occupation of network resources. When the current network speed value of the first user terminal is small, the real-time performance of video presentation on the first user terminal can thus be improved.
在本实施例的一些可选的实现方式中,在用户界面呈现有新视频(未呈现第二用户终端获取的流媒体视频)的情况下,第一用户终端还可以向上述服务器发送摄像头关闭确认信息。其中,摄像头关闭确认信息用于确定第二用户终端是否关闭摄像头。In some optional implementations of this embodiment, when the user interface presents the new video (and does not present the streaming video obtained by the second user terminal), the first user terminal may also send camera-off confirmation information to the above server, where the camera-off confirmation information is used to determine whether the second user terminal turns off its camera.
可以理解,在服务器接收到摄像头关闭确认信息之后,服务器可以向第二用户终端发送用于确定第二用户终端是否关闭摄像头的信息。由此,第二用户终端的用户可以通过关闭摄像头,来降低第二用户终端对网络资源的占用。It can be understood that after the server receives the camera shutdown confirmation information, the server may send to the second user terminal information for determining whether the second user terminal turns off the camera. Therefore, the user of the second user terminal can reduce the occupation of network resources by the second user terminal by turning off the camera.
请继续参考图10,图10是本公开的第一个基于三维模型的交互方法的又一个实施例的流程图。该基于三维模型的交互方法应用于第一用户终端,第一用户终端呈现有用户界面。该基于三维模型的交互方法的流程1000,包括:Please continue to refer to FIG. 10, which is a flowchart of another embodiment of the first three-dimensional model-based interaction method of the present disclosure. The interaction method based on the three-dimensional model is applied to a first user terminal, and the first user terminal presents a user interface. The process 1000 of the interaction method based on the three-dimensional model includes:
步骤1010,响应于检测到用户针对用户界面的目标交互操作,向为用户界面提供页面数据的服务器发送针对目标交互操作的交互请求。用户界面用于呈现三维模型,三维模型与第二用户终端登录的用户账号预先建立关联关系。Step 1010: In response to detecting the user's target interaction operation on the user interface, send an interaction request for the target interaction operation to a server that provides page data for the user interface. The user interface is used for presenting a three-dimensional model, and the three-dimensional model establishes an association relationship with the user account logged in by the second user terminal in advance.
步骤1020,接收服务器从第二用户终端获取的流媒体视频。Step 1020: Receive the streaming video obtained by the server from the second user terminal.
步骤1030,在用户界面上呈现流媒体视频和三维模型。Step 1030: Present the streaming video and the three-dimensional model on the user interface.
在本实施例中,步骤1010至步骤1030分别与图7对应实施例中的步骤710至步骤730基本一致,这里不再赘述。In this embodiment, step 1010 to step 1030 are basically the same as step 710 to step 730 in the embodiment corresponding to FIG. 7, and will not be repeated here.
需要说明的是,在本实施例中,三维模型包括多个子空间场景的三维子模型,多个子空间场景中的子空间场景与预先确定的关键词集合中的关键词相对应。It should be noted that in this embodiment, the three-dimensional model includes three-dimensional sub-models of multiple sub-space scenes, and the sub-space scenes in the multiple sub-space scenes correspond to keywords in a predetermined keyword set.
步骤1040,对流媒体视频中的语音进行语音识别,得到语音识别结果。Step 1040: Perform voice recognition on the voice in the streaming video to obtain a voice recognition result.
在本实施例中,第一用户终端可以对流媒体视频中的语音进行语音识别,得到语音识别结果。In this embodiment, the first user terminal may perform voice recognition on the voice in the streaming video to obtain the voice recognition result.
这里,语音识别结果可以表征流媒体视频中的语音对应的文字。Here, the voice recognition result can represent the text corresponding to the voice in the streaming video.
步骤1050,响应于确定语音识别结果包含关键词集合中的关键词,在用户界面上呈现与语音识别结果包含的关键词相对应的多个子空间场景中的对应子空间场景的三维子模型。Step 1050: In response to determining that the voice recognition result includes a keyword in the keyword set, present on the user interface the three-dimensional sub-model of the corresponding sub-space scene, among the multiple sub-space scenes, that corresponds to the keyword included in the voice recognition result.
在本实施例中,在确定语音识别结果包含关键词集合中的关键词的情况下,第一用户终端可以在上述用户界面上,呈现与语音识别结果包含的关键词相对应的子空间场景的三维子模型。In this embodiment, when it is determined that the voice recognition result includes a keyword in the keyword set, the first user terminal may present, on the above user interface, the three-dimensional sub-model of the sub-space scene corresponding to the keyword included in the voice recognition result.
作为示例,假设上述三维模型为房屋室内的三维模型。该房屋包括卧室、客厅、厨房、卫生间,共四个子空间场景。也即,上述三维模型包括卧室的三维子模型、客厅的三维子模型、厨房的三维子模型、卫生间的三维子模型。关键词集合包括卧室、客厅、厨房、卫生间。由此,与子空间场景卧室相对应的关键词可以是“卧室”;与子空间场景厨房相对应的关键词可以是“厨房”;与子空间场景客厅相对应的关键词可以是“客厅”;与子空间场景卫生间相对应的关键词可以是“卫生间”。进一步地,作为示例,如果语音识别结果包含关键词“卧室”,那么,第一用户终端可以在上述用户界面上,呈现卧室的三维子模型。As an example, suppose the above three-dimensional model is a three-dimensional model of the interior of a house. The house includes four sub-space scenes: a bedroom, a living room, a kitchen, and a bathroom. That is, the above three-dimensional model includes a three-dimensional sub-model of the bedroom, a three-dimensional sub-model of the living room, a three-dimensional sub-model of the kitchen, and a three-dimensional sub-model of the bathroom. The keyword set includes bedroom, living room, kitchen, and bathroom. Accordingly, the keyword corresponding to the bedroom sub-space scene may be "bedroom"; the keyword corresponding to the kitchen sub-space scene may be "kitchen"; the keyword corresponding to the living room sub-space scene may be "living room"; and the keyword corresponding to the bathroom sub-space scene may be "bathroom". Further, as an example, if the voice recognition result includes the keyword "bedroom", the first user terminal may present the three-dimensional sub-model of the bedroom on the above user interface.
在这里,可以通过切换三维模型的视点,实现呈现与语音识别结果包含的关键词相对应的子空间场景的三维子模型。Here, presenting the three-dimensional sub-model of the sub-space scene corresponding to the keyword included in the voice recognition result can be achieved by switching the viewpoint of the three-dimensional model.
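The keyword-driven viewpoint switching described above can be sketched as follows. This is not from the disclosure; the keyword-to-sub-model mapping mirrors the house example in the text, and the sub-model names are hypothetical.

```python
# Illustrative sketch (assumption): after speech recognition, scan the
# transcript for keywords from the predetermined set and return the
# corresponding sub-space scene's sub-model to switch the viewpoint to.

KEYWORD_TO_SUBMODEL = {
    "bedroom": "bedroom_submodel",
    "living room": "living_room_submodel",
    "kitchen": "kitchen_submodel",
    "bathroom": "bathroom_submodel",
}

def submodel_for_transcript(transcript):
    """Return the sub-model to present, or None if no keyword matches."""
    text = transcript.lower()
    for keyword, submodel in KEYWORD_TO_SUBMODEL.items():
        if keyword in text:
            return submodel
    return None
```

In the flow of FIG. 10, the returned sub-model name would drive the viewpoint switch that presents the matching sub-space scene.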
需要说明的是,除上面所记载的内容外,本申请实施例还可以包括与图7和/或图9对应的实施例相同或类似的特征、效果,在此不再赘述。It should be noted that, in addition to the content described above, the embodiment of the present application may also include the same or similar features and effects as the embodiment corresponding to FIG. 7 and/or FIG. 9, and details are not described herein again.
从图10中可以看出,本实施例中的基于三维模型的交互方法的流程1000中,可以通过语音实现三维模型的视点切换,从而呈现与语音识别结果包含的关键词相对应的子空间场景的三维子模型。由此,提高了三维模型浏览的便利性,提高了所呈现的三维模型与第二用户终端获取的语音之间的匹配性。It can be seen from FIG. 10 that, in the process 1000 of the three-dimensional model-based interaction method in this embodiment, viewpoint switching of the three-dimensional model can be achieved by voice, so that the three-dimensional sub-model of the sub-space scene corresponding to the keyword included in the voice recognition result is presented. This improves the convenience of browsing the three-dimensional model, and improves the matching between the presented three-dimensional model and the voice acquired by the second user terminal.
请继续参考图11,示出了根据本公开的第二个基于三维模型的交互方法的一个实施例的流程1100。该基于三维模型的交互方法应用于第二用户终端,第二用户终端登录的用户账号与三维模型预先建立关联关系。该基于三维模型的交互方法包括:Please continue to refer to FIG. 11, which shows a process 1100 of an embodiment of the second three-dimensional model-based interaction method according to the present disclosure. The three-dimensional model-based interaction method is applied to a second user terminal, and the user account logged in by the second user terminal establishes an association relationship with the three-dimensional model in advance. The interactive method based on the 3D model includes:
步骤1110,响应于接收到服务器发送的交互请求,获取流媒体视频。Step 1110: In response to receiving the interactive request sent by the server, obtain the streaming video.
在本实施例中,用户可以使用第二用户终端通过网络与服务器、第一用户终端进行交互。第二用户终端可以是各种电子设备,包括但不限于智能手机、平板电脑、膝上型便携计算机和台式计算机等等。第二用户终端可以安装有各种客户端应用,例如房产交易软件等。In this embodiment, the user can use the second user terminal to interact with the server and the first user terminal through the network. The second user terminal may be various electronic devices, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and so on. The second user terminal may be installed with various client applications, such as real estate transaction software.
在本实施例中,在接收到服务器发送的交互请求的情况下,获取流媒体视频。In this embodiment, upon receiving the interaction request sent by the server, the streaming video is acquired.
交互请求指示第一用户终端检测到用户针对第一用户终端呈现的用户界面的目标交互操作。示例性地,上述交互请求可以用于指示第一用户终端的用户请求与第二用户终端进行视频通信。用户界面用于呈现三维模型。流媒体视频可以包含图像和/或语音。实践中,第二用户终端的图像获取装置和/或语音获取装置,可以用于获取上述流媒体视频。The interaction request indicates that the first user terminal detects the user's target interaction operation on the user interface presented by the first user terminal. Exemplarily, the aforementioned interaction request may be used to instruct the user of the first user terminal to request video communication with the second user terminal. The user interface is used to present the three-dimensional model. Streaming videos can contain images and/or voice. In practice, the image acquisition device and/or the voice acquisition device of the second user terminal can be used to acquire the aforementioned streaming video.
实践中,在检测到第一用户终端的用户针对用户界面的目标交互操作的情况下,第一用户终端可以向为用户界面提供页面数据的服务器发送针对目标交互操作的交互请求。用户界面用于呈现三维模型。三维模型与第二用户终端登录的用户账号预先建立关联关系。上述目标交互操作可以是各种用于指示第一用户终端请求与第二用户终端进行交互(信息交互)的操作。作为示例,该目标交互操作可以指示与第二用户终端进行视频通信。In practice, when the target interaction operation of the user of the first user terminal on the user interface is detected, the first user terminal may send an interaction request for the target interaction operation to the server that provides page data for the user interface. The user interface is used to present the three-dimensional model. The three-dimensional model has a pre-established association with the user account logged in on the second user terminal. The above target interaction operation may be any operation used to indicate that the first user terminal requests to interact (exchange information) with the second user terminal. As an example, the target interaction operation may indicate video communication with the second user terminal.
在这里,在执行步骤1110时,第一用户终端的用户界面可以呈现有上述三维模型,也可以未呈现三维模型。Here, when step 1110 is performed, the user interface of the first user terminal may present the above-mentioned three-dimensional model, or may not present the three-dimensional model.
步骤1120,向服务器发送流媒体视频。Step 1120: Send the streaming video to the server.
在本实施例中,第二用户终端可以向服务器发送流媒体视频。服务器用于将流媒体视频发送至第一用户终端,以使第一用户终端在用户界面上呈现流媒体视频和三维模型。In this embodiment, the second user terminal may send the streaming video to the server. The server is used to send the streaming video to the first user terminal, so that the first user terminal presents the streaming video and the three-dimensional model on the user interface.
实践中,服务器可以采用流媒体技术,将第二用户终端采集的图像和/或语音(即流媒体视频),持续发送至第一用户终端。流媒体技术是指采用流式传输技术在网络上连续实时播放的媒体格式。这里,第二用户终端可以将其所采集的连续的影像和声音信息经过压缩处理后发送至服务器。由服务器向第一用户终端顺序或实时地传送各个压缩包,让使用第一用户终端的用户一边下载一边观看、收听。In practice, the server can use streaming media technology to continuously send the images and/or voice collected by the second user terminal (i.e., the streaming video) to the first user terminal. Streaming media technology refers to a media format that is played continuously and in real time over a network using streaming transmission. Here, the second user terminal may compress the continuous image and sound information it collects and send it to the server. The server transmits each compressed packet to the first user terminal sequentially or in real time, so that the user of the first user terminal can watch and listen while downloading.
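The compress-relay-play pipeline described above can be sketched as follows. This is not from the disclosure; `zlib` stands in for whatever media codec an implementation would actually use, and the chunking is simplified to an in-memory list.

```python
# Illustrative sketch (assumption): the second terminal compresses captured
# media into sequential chunks; the server relays each chunk as it arrives,
# so the first terminal can play while still downloading. zlib is only a
# stand-in for a real audio/video codec.

import zlib

def compress_chunks(raw_chunks):
    """Second terminal side: compress each captured chunk before upload."""
    return [zlib.compress(chunk) for chunk in raw_chunks]

def relay_and_play(compressed_chunks):
    """First terminal side: decompress and 'play' packets in arrival order."""
    played = []
    for packet in compressed_chunks:   # delivered sequentially / in real time
        played.append(zlib.decompress(packet))
    return b"".join(played)
```

The key property is ordering: because packets are decompressed and consumed in arrival order, playback can begin as soon as the first packet lands rather than after the whole stream is transferred.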
可选地,服务器可以将第二用户终端采集的流媒体视频发送至第一用户终端,也可以对第二用户终端采集的流媒体视频进行图像处理(例如美颜)、语音处理(例如去噪)、转码、录制、鉴黄等操作后,将处理后的流媒体视频发送至第一用户终端。Optionally, the server may send the streaming video collected by the second user terminal to the first user terminal as-is, or it may first perform operations such as image processing (e.g., beautification), voice processing (e.g., denoising), transcoding, recording, and pornography detection on that streaming video, and then send the processed streaming video to the first user terminal.
本公开的上述实施例提供的第二种基于三维模型的交互方法,应用于第二用户终端,第二用户终端登录的用户账号与三维模型预先建立关联关系。第二用户终端可以在接收到服务器发送的交互请求的情况下,确定是否检测到用户针对交互请求的确认操作。交互请求指示第一用户终端检测到用户针对第一用户终端呈现的用户界面的目标交互操作,用户界面用于呈现三维模型。之后,在检测到确认操作的情况下,获取流媒体视频。最后向服务器发送流媒体视频,其中,服务器用于将流媒体视频发送至第一用户终端,以使第一用户终端在用户界面上呈现流媒体视频和三维模型。本公开实施例通过在终端设备的同一页面呈现流媒体视频和三维模型,有助于采用流媒体视频向用户呈现三维模型相关的信息,提高了交互方式的多样性。通过多维度信息交互,让用户更加沉浸地浏览三维模型,提升用户的浏览时长,有助于满足用户更多元化的交互需求。The second three-dimensional model-based interaction method provided by the above embodiments of the present disclosure is applied to a second user terminal, and the user account logged in on the second user terminal has a pre-established association with the three-dimensional model. Upon receiving the interaction request sent by the server, the second user terminal may determine whether the user's confirmation operation for the interaction request is detected. The interaction request indicates that the first user terminal has detected the user's target interaction operation on the user interface presented by the first user terminal, and the user interface is used to present the three-dimensional model. After that, if the confirmation operation is detected, the streaming video is obtained. Finally, the streaming video is sent to the server, where the server sends the streaming video to the first user terminal so that the first user terminal presents the streaming video and the three-dimensional model on the user interface. By presenting the streaming video and the three-dimensional model on the same page of the terminal device, the embodiments of the present disclosure help use the streaming video to present information related to the three-dimensional model to the user, increasing the diversity of interaction modes. Through multi-dimensional information interaction, users can browse the three-dimensional model more immersively, which increases the user's browsing duration and helps meet users' more diversified interaction needs.
在本实施例的一些可选的实现方式中,上述步骤1110可以包括以下步骤:In some optional implementation manners of this embodiment, the foregoing step 1110 may include the following steps:
首先,在接收到服务器发送的交互请求的情况下,确定是否检测到用户针对交互请求的确认操作。确认操作表征第二用户终端的用户确认(同意)与第一用户终端进行交互(例如视频通信)。First, upon receiving the interaction request sent by the server, it is determined whether the user's confirmation operation for the interaction request is detected. The confirmation operation indicates that the user of the second user terminal confirms (agrees to) the interaction (for example, video communication) with the first user terminal.
然后,在检测到确认操作的情况下,获取流媒体视频。Then, when the confirmation operation is detected, the streaming video is obtained.
可以理解,上述可选的实现方式中,在第二用户终端针对交互请求发送了交互确认信息的情况下,第一用户终端可以呈现流媒体视频;而在第二用户终端未发送上述交互确认信息的情况下,第一用户终端则不呈现流媒体视频。由此,可以在获得第二用户终端的用户的允许(例如接通第一用户终端发起的视频通话)后,才在第一用户终端的用户界面上呈现流媒体视频和三维模型。这有助于提高对第二用户终端的用户的隐私保护性,为第二用户终端的用户向第一用户终端的用户呈现流媒体视频提供准备时间。It can be understood that, in the foregoing optional implementation manner, when the second user terminal has sent interaction confirmation information for the interaction request, the first user terminal may present the streaming video; when the second user terminal has not sent the interaction confirmation information, the first user terminal does not present the streaming video. Therefore, the streaming video and the three-dimensional model are presented on the user interface of the first user terminal only after the permission of the user of the second user terminal is obtained (for example, the video call initiated by the first user terminal is answered). This helps improve privacy protection for the user of the second user terminal, and gives that user time to prepare before presenting the streaming video to the user of the first user terminal.
在本实施例的一些可选的实现方式中,在服务器接收到交互请求之后,第二用户终端也可以直接获取流媒体视频,并将流媒体视频通过服务器发送至第一用户终端,而无需第二用户终端的用户针对交互请求发送交互确认信息。In some optional implementations of this embodiment, after the server receives the interaction request, the second user terminal may also directly obtain the streaming video and send it to the first user terminal through the server, without the user of the second user terminal sending interaction confirmation information for the interaction request.
可以理解,上述可选的实现方式中,第二用户终端的用户可以处于向其他用户终端的用户拍摄流媒体视频(例如直播)的状态。由此,在服务器接收到交互请求之后,第一用户终端可以随时接收服务器从第二用户终端获取的流媒体视频,从而提高了流媒体视频呈现的实时性。It can be understood that, in the foregoing optional implementation manner, the user of the second user terminal may be in a state of capturing streaming video for users of other user terminals (for example, live streaming). Thus, after the server receives the interaction request, the first user terminal can receive, at any time, the streaming video obtained by the server from the second user terminal, thereby improving the real-time performance of streaming video presentation.
In some optional implementations of this embodiment, when the current network speed value of the first user terminal is less than or equal to the preset network speed threshold, the second user terminal may receive camera shutdown confirmation information from the server and present it. The camera shutdown confirmation information is used to determine whether the second user terminal turns off its camera.
It can be understood that, after the server learns that the current network speed value of the first user terminal is less than or equal to the preset network speed threshold, the server may send the second user terminal information for determining whether to turn off its camera. The user of the second user terminal can then reduce the terminal's consumption of network resources by turning the camera off.
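A minimal sketch of the server-side check follows. The concrete threshold value is an assumption; the patent only specifies "a preset network speed threshold".

```python
PRESET_SPEED_THRESHOLD_KBPS = 512  # illustrative value, not taken from the patent


def should_send_camera_off_confirmation(
        current_speed_kbps: float,
        threshold_kbps: float = PRESET_SPEED_THRESHOLD_KBPS) -> bool:
    """The server sends camera shutdown confirmation information to the
    second user terminal when the first terminal's reported network speed
    is at or below the preset threshold."""
    return current_speed_kbps <= threshold_kbps
```

The comparison is inclusive ("less than or equal to"), matching the wording of the embodiment.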
In some optional implementations of this embodiment, when a user's adjustment operation on the three-dimensional model presented on the second user terminal is detected, the second user terminal may send model adjustment information indicating the adjustment operation to the server, so that the server controls the first user terminal to perform the same adjustment operation, as indicated by the model adjustment information, on the three-dimensional model presented on its user interface. The adjustment operation includes at least one of the following: zooming, rotating, moving, and viewpoint switching.
Here, in general, the user can perform at least one of zooming, rotating, moving, and viewpoint switching on the three-dimensional model.
It can be understood that, in the foregoing optional implementation, the operations performed by the user of the second user terminal on the three-dimensional model can be synchronized to the first user terminal. Thus, when the streaming video captured by the second user terminal relates to the three-dimensional model (for example, when the user of the second user terminal is explaining or introducing the model), the user of the first user terminal can refer to the same three-dimensional model as the one presented on the second user terminal while taking in the information in the streaming video, improving the pertinence of information acquisition.
In some optional implementations of this embodiment, when model adjustment information indicating a user's adjustment operation on the three-dimensional model presented on the first user terminal is received from the server, the second user terminal may perform the same adjustment operation, as indicated by the model adjustment information, on the three-dimensional model presented on the second user terminal. The adjustment operation includes at least one of the following: zooming, rotating, moving, and viewpoint switching.
Here, in general, the user can perform at least one of zooming, rotating, moving, and viewpoint switching on the three-dimensional model.
It can be understood that, in the foregoing optional implementation, the operations performed by the user of the first user terminal on the three-dimensional model can be synchronized to the second user terminal. This makes it convenient for the user of the first user terminal to refer to the same three-dimensional model as the one presented on the second user terminal while obtaining information from the streaming video, improving the pertinence of information acquisition.
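The two-way synchronization described in the last two implementations can be sketched as a server-side relay. The message format and names below are assumptions for illustration only.

```python
ALLOWED_ADJUSTMENTS = {"zoom", "rotate", "move", "switch_viewpoint"}


def relay_model_adjustment(adjustment: dict, peer_queues: list) -> int:
    """Forward one terminal's model adjustment information to every peer
    terminal so that each applies the same operation to its own copy of
    the 3D model. Returns the number of peers notified."""
    if adjustment.get("op") not in ALLOWED_ADJUSTMENTS:
        raise ValueError(f"unsupported adjustment: {adjustment.get('op')}")
    for queue in peer_queues:
        queue.append(adjustment)  # stand-in for sending over the network
    return len(peer_queues)
```

The same relay serves both directions: whichever terminal originates the adjustment, every other terminal receives identical model adjustment information and replays it locally.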
In some optional implementations of this embodiment, upon receiving feedback information on the streaming video from the user of the first user terminal, sent by the server, the second user terminal may perform an operation that matches the feedback information. The feedback information may include, but is not limited to, at least one of the following: likes, ratings, comments, and so on. The feedback information may characterize the evaluation, by the user of the first user terminal, of the streaming video of the user of the second user terminal.
As an example, if the feedback information of the user of the first user terminal on the streaming video is a like, the second user terminal may perform an operation that matches that feedback, for example presenting the message "XX gave you a like!".
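A sketch of how the second user terminal might map feedback information to a matching presentation; the message templates are illustrative assumptions, not text from the patent.

```python
FEEDBACK_TEMPLATES = {
    "like": "{user} gave you a like!",
    "rating": "{user} rated your stream {value}.",
    "comment": "{user} commented: {value}",
}


def render_feedback(kind: str, user: str, value: str = "") -> str:
    """Return the text the second user terminal presents for a piece of
    feedback; unknown kinds fall back to a generic notice."""
    template = FEEDBACK_TEMPLATES.get(kind)
    if template is None:
        return f"{user} sent feedback."
    return template.format(user=user, value=value)
```
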
It can be understood that the foregoing optional implementations can improve the authenticity and diversity of the interaction.
With further reference to FIG. 12, FIG. 12 shows a flow 1200 of another embodiment of the second three-dimensional-model-based interaction method of the present disclosure. The method is applied to a second user terminal, while the first user terminal presents a user interface. The method includes the following steps:
Step 1210: In response to receiving the interaction request sent by the server, obtain the streaming video.

Step 1220: Send the streaming video to the server.

In this embodiment, steps 1210 and 1220 are substantially the same as steps 1110 and 1120 in the embodiment corresponding to FIG. 11, respectively, and are not repeated here.
It should be noted that, in this embodiment, the three-dimensional model includes three-dimensional sub-models of multiple sub-space scenes, and each sub-space scene among the multiple sub-space scenes corresponds to a keyword in a predetermined keyword set.
Step 1230: Perform speech recognition on the speech acquired by the first user terminal to obtain a speech recognition result.

In this embodiment, the second user terminal may perform speech recognition on the speech acquired by the first user terminal to obtain a speech recognition result.

Here, the speech recognition result may represent the text corresponding to the speech in the streaming video.
Step 1240: In response to determining that the speech recognition result contains a keyword in the keyword set, present on the user interface the three-dimensional sub-model of the sub-space scene, among the multiple sub-space scenes, that corresponds to the keyword contained in the speech recognition result.

In this embodiment, when it is determined that the speech recognition result contains a keyword in the keyword set, the second user terminal may present on the user interface the three-dimensional sub-model of the sub-space scene corresponding to that keyword.
As an example, suppose the above three-dimensional model is a three-dimensional model of the interior of a house. The house includes four sub-space scenes: a bedroom, a living room, a kitchen, and a bathroom; that is, the three-dimensional model includes a three-dimensional sub-model of the bedroom, a three-dimensional sub-model of the living room, a three-dimensional sub-model of the kitchen, and a three-dimensional sub-model of the bathroom. The keyword set includes "bedroom", "living room", "kitchen", and "bathroom". Accordingly, the keyword corresponding to the bedroom sub-space scene may be "bedroom"; the keyword corresponding to the kitchen may be "kitchen"; the keyword corresponding to the living room may be "living room"; and the keyword corresponding to the bathroom may be "bathroom". Further, as an example, if the speech recognition result contains the keyword "bedroom", the second user terminal may present the three-dimensional sub-model of the bedroom on the above user interface.
Here, presenting the three-dimensional sub-model of the sub-space scene corresponding to the keyword contained in the speech recognition result can be implemented by switching the viewpoint of the three-dimensional model.
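Steps 1230 and 1240 can be sketched with the house example above. The keyword-to-sub-model mapping and the plain substring matching are simplifying assumptions (a real system would match against the recognizer's actual transcript language).

```python
KEYWORD_TO_SUBMODEL = {
    "bedroom": "bedroom_submodel",
    "living room": "living_room_submodel",
    "kitchen": "kitchen_submodel",
    "bathroom": "bathroom_submodel",
}


def submodel_for_transcript(transcript: str):
    """Return the 3D sub-model to present for the first keyword found in
    the speech recognition result, or None when no keyword matches; the
    terminal would then switch the model's viewpoint to that sub-scene."""
    text = transcript.lower()
    for keyword, submodel in KEYWORD_TO_SUBMODEL.items():
        if keyword in text:
            return submodel
    return None
```

When no keyword is found, the currently presented view is simply left unchanged.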
It should be noted that, in addition to the content described above, this embodiment of the present application may also include features and effects that are the same as or similar to those of the embodiment corresponding to FIG. 11, which are not repeated here.
As can be seen from FIG. 12, in the flow 1200 of the three-dimensional-model-based interaction method of this embodiment, viewpoint switching of the three-dimensional model can be achieved by voice, so that the three-dimensional sub-model of the sub-space scene corresponding to the keyword contained in the speech recognition result is presented. This improves the convenience of browsing the three-dimensional model and the match between the presented three-dimensional model and the acquired speech.
With further reference to FIG. 13, as an implementation of the first three-dimensional-model-based interaction method described above, the present disclosure provides an embodiment of a three-dimensional-model-based interaction apparatus. This apparatus embodiment corresponds to the method embodiments shown in FIGS. 7, 9, and 10. In addition to the features described below, the apparatus embodiment may also include features that are the same as or correspond to those of the method embodiments shown in FIGS. 7, 9, and 10, and produce the same or corresponding effects.
As shown in FIG. 13, the three-dimensional-model-based interaction apparatus 1300 of this embodiment is provided in a first user terminal, and the first user terminal presents a user interface. The apparatus 1300 includes: a first sending unit 1310, configured to, in response to detecting a user's target interaction operation on the user interface, send an interaction request for the target interaction operation to a server that provides page data for the user interface, wherein the user interface is used to present a three-dimensional model, and the three-dimensional model has a pre-established association with the user account logged in on a second user terminal; a first receiving unit 1320, configured to receive the streaming video that the server obtains from the second user terminal; and a first presenting unit 1330, configured to present the streaming video and the three-dimensional model on the user interface.
In this embodiment, when a user's target interaction operation on the user interface is detected, the first sending unit 1310 of the three-dimensional-model-based interaction apparatus 1300 may send an interaction request for the target interaction operation to the server that provides page data for the user interface. The user interface is used to present a three-dimensional model, and the three-dimensional model has a pre-established association with the user account logged in on the second user terminal.

In this embodiment, the first receiving unit 1320 may receive the streaming video that the server obtains from the second user terminal.

In this embodiment, the first presenting unit 1330 may present the streaming video and the three-dimensional model on the user interface.
In some optional implementations of this embodiment, the first receiving unit is further configured to: in response to the server receiving interaction confirmation information sent by the second user terminal for the interaction request, receive the streaming video that the server obtains from the second user terminal.
In some optional implementations of this embodiment, the apparatus 1300 further includes: a first adjustment unit (not shown), configured to, in response to the current network speed value of the first user terminal being less than or equal to the preset network speed threshold, adjust a target user image based on each frame of speech in the streaming video to generate a new video, wherein the new video depicts the user indicated by the target user image performing the actions indicated by each frame of speech; and a second presenting unit (not shown), configured to present the new video in place of the streaming video.
In some optional implementations of this embodiment, the apparatus 1300 further includes: a first generating unit (not shown), configured to generate the target user image based on an image in the streaming video; or a first determining unit (not shown), configured to determine a user image associated with the user account as the target user image.
In some optional implementations of this embodiment, the apparatus 1300 further includes: a second sending unit (not shown), configured to, in response to the new video being presented on the user interface, send camera shutdown confirmation information to the server, wherein the camera shutdown confirmation information is used to determine whether the second user terminal turns off its camera.
In some optional implementations of this embodiment, the first receiving unit is further configured to: send the current network speed value of the first user terminal to the server; and receive the streaming video that the server obtains from the second user terminal and sends, the streaming video having a resolution that matches the current network speed value.
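The resolution matching in this implementation might look like the ladder below. The speed cutoffs and resolution rungs are assumptions; the patent only requires that the resolution match the current network speed value.

```python
RESOLUTION_LADDER = [  # (minimum speed in kbps, resolution), highest first
    (5000, "1080p"),
    (2500, "720p"),
    (1000, "480p"),
]


def pick_resolution(current_speed_kbps: float) -> str:
    """Choose the highest resolution whose minimum speed the first
    terminal's current network speed satisfies, falling back to 240p."""
    for min_speed, resolution in RESOLUTION_LADDER:
        if current_speed_kbps >= min_speed:
            return resolution
    return "240p"
```
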
In some optional implementations of this embodiment, the apparatus 1300 further includes: a second receiving unit (not shown), configured to receive model adjustment information sent by the server, wherein the model adjustment information indicates an adjustment operation, performed by the user of the second user terminal, on the three-dimensional model presented on the second user terminal, the adjustment operation including at least one of the following: zooming, rotating, moving, and viewpoint switching; and a second adjustment unit (not shown), configured to perform the same adjustment operation, as indicated by the model adjustment information, on the three-dimensional model presented on the user interface.
In some optional implementations of this embodiment, the three-dimensional model includes three-dimensional sub-models of multiple sub-space scenes, and each sub-space scene among the multiple sub-space scenes corresponds to a keyword in a predetermined keyword set. The apparatus 1300 further includes: a first recognition unit (not shown), configured to perform speech recognition on the speech in the streaming video to obtain a speech recognition result; and a third presenting unit (not shown), configured to, in response to determining that the speech recognition result contains a keyword in the keyword set, present on the user interface the three-dimensional sub-model of the sub-space scene corresponding to the keyword contained in the speech recognition result.
In some optional implementations of this embodiment, the apparatus 1300 further includes: a first acquiring unit (not shown), configured to acquire the user's feedback information on the streaming video; and a third sending unit (not shown), configured to send the feedback information to the server, wherein the server is used to establish an association between the feedback information and the user account.
The three-dimensional-model-based interaction apparatus provided by the above embodiment of the present disclosure is provided in a first user terminal, and the first user terminal presents a user interface. In the apparatus 1300, when a user's target interaction operation on the user interface is detected, the first sending unit 1310 may send an interaction request for the target interaction operation to the server that provides page data for the user interface, wherein the user interface is used to present a three-dimensional model, and the three-dimensional model has a pre-established association with the user account logged in on a second user terminal; the first receiving unit 1320 then receives the streaming video that the server obtains from the second user terminal; finally, the first presenting unit 1330 presents the streaming video and the three-dimensional model on the user interface. In this way, the streaming video and the three-dimensional model can be presented on the same page of the terminal device, which helps use the streaming video to present information related to the three-dimensional model to the user, increases the diversity of interaction modes, allows the user to browse the three-dimensional model more attentively through multi-dimensional information interaction, increases the user's browsing time, and helps meet more diversified interaction needs of users.
With further reference to FIG. 14, as an implementation of the second three-dimensional-model-based interaction method described above, the present disclosure provides an embodiment of a second three-dimensional-model-based interaction apparatus. This apparatus embodiment corresponds to the method embodiments shown in FIGS. 11 and 12. In addition to the features described below, the apparatus embodiment may also include features that are the same as or correspond to those of the method embodiments shown in FIGS. 11 and 12, and produce the same or corresponding effects.
As shown in FIG. 14, the three-dimensional-model-based interaction apparatus 1400 of this embodiment is provided in a second user terminal. The apparatus 1400 includes: a second determining unit 1410, configured to obtain a streaming video in response to receiving an interaction request sent by the server, wherein the interaction request indicates that the first user terminal has detected a user's target interaction operation on the user interface presented by the first user terminal, the user interface is used to present a three-dimensional model, and the three-dimensional model has a pre-established association with the user account logged in on the second user terminal; and a fourth sending unit 1420, configured to send the streaming video to the server, wherein the server is used to send the streaming video to the first user terminal so that the first user terminal presents the streaming video and the three-dimensional model on the user interface.
In this embodiment, upon receiving the interaction request sent by the server, the second determining unit 1410 may obtain the streaming video. The interaction request indicates that the first user terminal has detected a user's target interaction operation on the user interface presented by the first user terminal, and the user interface is used to present a three-dimensional model.

In this embodiment, the fourth sending unit 1420 may send the streaming video to the server, wherein the server is used to send the streaming video to the first user terminal so that the first user terminal presents the streaming video and the three-dimensional model on the user interface.
In some optional implementations of this embodiment, the second determining unit 1410 is further configured to: in response to receiving the interaction request sent by the server, determine whether the user's confirmation operation for the interaction request is detected; and in response to detecting the confirmation operation, obtain the streaming video.
In some optional implementations of this embodiment, the apparatus 1400 further includes: a third receiving unit (not shown), configured to, in response to the current network speed value of the first user terminal being less than or equal to the preset network speed threshold, receive camera shutdown confirmation information from the server and present it, wherein the camera shutdown confirmation information is used to determine whether the second user terminal turns off its camera.
In some optional implementations of this embodiment, the apparatus 1400 further includes: a fifth sending unit (not shown), configured to, in response to receiving from the server model adjustment information indicating a user's adjustment operation on the three-dimensional model presented on the first user terminal, perform the same adjustment operation, as indicated by the model adjustment information, on the three-dimensional model presented on the second user terminal, wherein the adjustment operation includes at least one of the following: zooming, rotating, moving, and viewpoint switching.
In some optional implementations of this embodiment, the apparatus 1400 further includes: a fifth sending unit (not shown), configured to, in response to detecting a user's adjustment operation on the three-dimensional model presented on the second user terminal, send model adjustment information indicating the adjustment operation to the server, so that the server controls the first user terminal to perform the same adjustment operation, as indicated by the model adjustment information, on the three-dimensional model presented on the user interface, wherein the adjustment operation includes at least one of the following: zooming, rotating, moving, and viewpoint switching.
In some optional implementations of this embodiment, the three-dimensional model includes three-dimensional sub-models of multiple sub-space scenes, and each sub-space scene among the multiple sub-space scenes corresponds to a keyword in a predetermined keyword set. The apparatus 1400 further includes: a second recognition unit (not shown), configured to perform speech recognition on the speech acquired by the first user terminal to obtain a speech recognition result; and a fourth presenting unit (not shown), configured to, in response to determining that the speech recognition result contains a keyword in the keyword set, present on the user interface the three-dimensional sub-model of the sub-space scene corresponding to the keyword contained in the speech recognition result.
In some optional implementations of this embodiment, the apparatus 1400 further includes: an execution unit (not shown), configured to, in response to receiving feedback information on the streaming video from the user using the first user terminal, sent by the server, perform an operation that matches the feedback information.
The three-dimensional-model-based interaction apparatus provided by the above embodiment of the present disclosure is provided in a second user terminal, and the user account logged in on the second user terminal has a pre-established association with the three-dimensional model. In the apparatus 1400, upon receiving the interaction request sent by the server, the second determining unit 1410 may obtain the streaming video, wherein the interaction request indicates that the first user terminal has detected a user's target interaction operation on the user interface presented by the first user terminal, and the user interface is used to present a three-dimensional model; the fourth sending unit 1420 may then send the streaming video to the server, wherein the server is used to send the streaming video to the first user terminal so that the first user terminal presents the streaming video and the three-dimensional model on the user interface. In this way, the streaming video and the three-dimensional model can be presented on the same page of the terminal device, which helps use the streaming video to present information related to the three-dimensional model to the user, increases the diversity of interaction modes, allows the user to browse the three-dimensional model more attentively through multi-dimensional information interaction, increases the user's browsing time, and helps meet more diversified interaction needs of users.
Continuing to refer to FIG. 15, FIG. 15 is an interaction diagram of an embodiment 1500 of the three-dimensional-model-based interaction system of the present disclosure. The interaction system includes a first user terminal, a second user terminal, and a server; the first user terminal presents a user interface, and the server is communicatively connected with each of the first user terminal and the second user terminal.

As shown in FIG. 15, the first user terminal, the second user terminal, and the server in the three-dimensional-model-based interaction system may perform the following steps:
Step 1501: The first user terminal detects the user's target interaction operation on the user interface.

In this embodiment, the first user terminal detects the user's target interaction operation on the user interface. The user interface is used to present a three-dimensional model, and the three-dimensional model has a pre-established association with the user account logged in on the second user terminal.

Step 1502: The first user terminal sends an interaction request for the target interaction operation to the server.

In this embodiment, the first user terminal may send an interaction request for the target interaction operation to the server.

Step 1503: The second user terminal obtains a streaming video.

In this embodiment, the second user terminal may obtain a streaming video.

Step 1504: The second user terminal sends the streaming video to the server.

In this embodiment, the second user terminal may send the streaming video to the server.

Step 1505: The server sends the streaming video to the first user terminal.

In this embodiment, the server may send the streaming video to the first user terminal.

Step 1506: The first user terminal presents the streaming video and the three-dimensional model on the user interface.

In this embodiment, the first user terminal may present the streaming video and the three-dimensional model on the user interface.
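Steps 1501 to 1506 can be traced end to end with a toy relay. All names are illustrative; the real system exchanges network messages among the two terminals and the server rather than local function calls.

```python
def run_interaction_flow() -> list:
    """Simulate the six-step exchange among the first terminal, the
    server, and the second terminal, returning the ordered message trace."""
    trace = []
    trace.append("1501 first_terminal: target interaction operation detected")
    trace.append("1502 first_terminal -> server: interaction request")
    trace.append("1503 second_terminal: streaming video captured")
    trace.append("1504 second_terminal -> server: streaming video")
    trace.append("1505 server -> first_terminal: streaming video")
    trace.append("1506 first_terminal: present video + 3D model on UI")
    return trace
```
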
In this embodiment, provided that no conflict arises, the technical features in steps 1501 to 1506 may, in addition to what is described above, be interpreted with reference to the technical features in the embodiments of the first three-dimensional-model-based interaction method, the embodiments of the second three-dimensional-model-based interaction method, and the embodiments of the third three-dimensional-model-based interaction method described above. Moreover, this embodiment may include features identical or corresponding to those of the above embodiments of the three-dimensional-model-based interaction method, and produce identical or corresponding effects, which will not be repeated here.
The three-dimensional-model-based interactive system provided by the above embodiments of the present disclosure includes a first user terminal, a second user terminal, and a server. The first user terminal presents a user interface, and the server is communicatively connected to the first user terminal and the second user terminal. The first user terminal is configured to: in response to detecting a user's target interaction operation on the user interface, send an interaction request for the target interaction operation to the server, wherein the user interface is used to present a three-dimensional model, and the three-dimensional model has a pre-established association with the user account logged in on the second user terminal. The second user terminal is configured to: obtain a streaming video and send the streaming video to the server. The server is further configured to: send the streaming video to the first user terminal. The first user terminal is further configured to: present the streaming video and the three-dimensional model on the user interface. As a result, the streaming video and the three-dimensional model can be presented on the same page of a terminal device, which helps present model-related information to the user through streaming video, increases the diversity of interaction modes, allows the user to browse the three-dimensional model with greater immersion through multi-dimensional information interaction, increases the user's browsing time, and helps satisfy the user's more diversified interaction needs.
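The message flow between the two terminals and the server (steps 1501-1506) can be sketched as a minimal in-memory simulation. All class and method names below are illustrative, not part of the disclosure; real deployments would use network transports and a streaming protocol instead of direct method calls.

```python
from dataclasses import dataclass, field

@dataclass
class SecondTerminal:
    """Owns the user account associated with the 3D model; supplies the stream (step 1503)."""
    def on_interaction_request(self, request):
        # A real terminal would capture camera/streaming media here.
        return f"stream-for-{request}"

@dataclass
class FirstTerminal:
    """Presents the UI hosting the 3D model; shows the received stream (steps 1501, 1506)."""
    ui: list = field(default_factory=list)
    def on_video(self, video):
        # The stream and the model are rendered on the same page.
        self.ui.append(("video", video))
        self.ui.append(("model", "3d-model"))

@dataclass
class Server:
    """Relays the interaction request and the streaming video (steps 1502, 1504, 1505)."""
    relayed: list = field(default_factory=list)
    def forward_request(self, second_terminal, request):
        return second_terminal.on_interaction_request(request)
    def relay_video(self, first_terminal, video):
        self.relayed.append(video)
        first_terminal.on_video(video)

server, t1, t2 = Server(), FirstTerminal(), SecondTerminal()
video = server.forward_request(t2, "view-model")  # steps 1501-1503
server.relay_video(t1, video)                     # steps 1504-1506
print(t1.ui)  # both the stream and the 3D model end up on one page
```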
An electronic device according to an embodiment of the present disclosure is described below with reference to FIG. 16, which shows a block diagram of an electronic device 1600 according to an embodiment of the present disclosure. As shown in FIG. 16, the electronic device 1600 includes one or more processors 1611 and a memory 1612.
The processor 1611 may be a central processing unit (CPU) or another form of processing unit having the capability to implement three-dimensional-space-scene interaction and/or to execute instructions, and may control other components in the electronic device 1600 to perform desired functions.
The memory 1612 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 1611 may run the program instructions to implement the various methods described above and/or other desired functions. Various contents such as input signals, signal components, and noise components may also be stored in the computer-readable storage medium.
In one example, the electronic device 1600 may further include an input device 1613 and an output device 1614, these components being interconnected by a bus system and/or another form of connection mechanism (not shown). The input device 1613 may include, for example, a keyboard and a mouse. The output device 1614 may output various information to the outside and may include, for example, a display, a speaker, a printer, and a communication network together with the remote output devices connected to it.
For simplicity, FIG. 16 shows only some of the components of the electronic device 1600 that are relevant to the embodiments of the present disclosure; components such as buses and input/output interfaces are omitted. In addition, depending on the specific application, the electronic device 1600 may include any other appropriate components. Besides the above methods and devices, an embodiment of the present disclosure may also be a computer program product comprising computer program instructions that, when run by a processor, cause the processor to perform the steps of the various methods according to the embodiments of the present disclosure.
The computer program product may carry program code, written in any combination of one or more programming languages, for performing the operations of the embodiments of the present disclosure; the programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
In addition, an embodiment of the present disclosure may also be a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are run by a processor, the processor performs the steps of the various methods according to the embodiments of the present disclosure.
The computer-readable storage medium may be any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The basic principles of the present disclosure have been described above in conjunction with specific embodiments. However, it should be pointed out that the merits, advantages, effects, and the like mentioned in the present disclosure are merely examples rather than limitations, and cannot be considered necessary for every embodiment of the present disclosure. In addition, the specific details disclosed above are provided only for the purposes of illustration and ease of understanding, rather than limitation; they do not restrict the present disclosure to being implemented with those specific details.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments may be understood with reference to one another. Since the system embodiments basically correspond to the method embodiments, their description is relatively brief, and the relevant parts may be understood with reference to the description of the method embodiments.
The block diagrams of the components, apparatuses, devices, and systems involved in the present disclosure are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems may be connected, arranged, and configured in any manner. Words such as "include", "comprise", and "have" are open-ended and mean "including but not limited to"; they may be used interchangeably. The words "or" and "and" as used herein mean "and/or" and may be used interchangeably with it, unless the context clearly indicates otherwise. The phrase "such as" as used herein means "such as but not limited to" and may be used interchangeably with it.
The method and apparatus of the present disclosure may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the method is for illustration only; the steps of the method of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present disclosure may also be implemented as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the methods according to the embodiments of the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing a method according to an embodiment of the present disclosure.
It should also be pointed out that, in the apparatus, device, and method of the present disclosure, each component or step may be decomposed and/or recombined; such decompositions and/or recombinations should be regarded as equivalent solutions of the present disclosure.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for the purposes of illustration and description. Furthermore, it is not intended to limit the embodiments of the present disclosure to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (35)

  1. A method for realizing interaction in a three-dimensional space scene, comprising:
    in response to detecting a user operation of setting footprint information in the three-dimensional space scene, determining a first pixel in a current view corresponding to the user's current perspective in the three-dimensional space scene;
    determining a three-dimensional model corresponding to the first pixel;
    determining a position of the user's footprint information in the three-dimensional model, wherein the footprint information is to be displayed when the three-dimensional space scene is browsed; and
    setting the user's footprint information at the position.
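The workflow recited in claim 1 (with the center-pixel choice of claim 3) can be sketched as follows. This is an illustrative sketch only: the grid-of-model-ids view, the function name, and the storage layout are assumptions for demonstration, not part of the claimed method.

```python
def set_footprint(view, models, footprint):
    """Claim 1 sketch: pick the first pixel, find its 3D model, store the footprint there.
    `view` is a 2D grid where each cell holds a model id or None (illustrative stand-in
    for a rendered view with per-pixel model lookups)."""
    h, w = len(view), len(view[0])
    cy, cx = h // 2, w // 2           # center pixel as the "first pixel" (claim 3)
    model_id = view[cy][cx]
    if model_id is None:
        return False                  # no model at this pixel; claims 4-5 cover fallbacks
    models[model_id].setdefault("footprints", []).append(
        {"pos": (cy, cx), "info": footprint}
    )
    return True

view = [[None, "wall", "wall"],
        [None, "wall", "wall"],
        [None, None,  "floor"]]
models = {"wall": {}, "floor": {}}
print(set_footprint(view, models, "nice room!"))  # True: stored on the "wall" model
```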
  2. The method according to claim 1, wherein the footprint information comprises at least one item of the group consisting of:
    text, a picture, audio, a video, and a three-dimensional model.
  3. The method according to claim 1 or 2, wherein the determining the first pixel in the current view corresponding to the user's current perspective in the three-dimensional space scene comprises:
    determining a center pixel of the current view as the first pixel.
  4. The method according to claim 3, wherein the determining the three-dimensional model corresponding to the first pixel comprises:
    determining whether a three-dimensional model is set for the first pixel;
    in response to a determination that a three-dimensional model is set for the first pixel, using the three-dimensional model set for the first pixel as the three-dimensional model corresponding to the first pixel; and
    in response to a determination that no three-dimensional model is set for the first pixel, using a three-dimensional model set for another pixel in the current view as the three-dimensional model corresponding to the first pixel.
  5. The method according to claim 4, wherein the using a three-dimensional model set for another pixel in the current view as the three-dimensional model corresponding to the first pixel comprises:
    starting from the first pixel, inspecting, according to a preset inspection rule, other pixels in the current view corresponding to the current perspective in the three-dimensional space scene;
    in response to determining that a pixel for which a three-dimensional model is set has been found, updating the first pixel to the pixel for which the three-dimensional model is set;
    obtaining the three-dimensional model corresponding to the first pixel; and
    stopping the inspection.
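One possible "preset inspection rule" for claim 5 is to expand outward from the first pixel in growing square rings until a pixel with a model is found. The ring-based rule below is an assumption chosen for illustration; the claim does not prescribe any particular search order.

```python
def find_model_pixel(view, start):
    """Claim 5 sketch: inspect pixels in expanding square rings around `start`
    until one with a 3D model is found; then stop and return it."""
    h, w = len(view), len(view[0])
    y0, x0 = start
    if view[y0][x0] is not None:
        return start                        # the first pixel already has a model
    for r in range(1, max(h, w)):
        for y in range(y0 - r, y0 + r + 1):
            for x in range(x0 - r, x0 + r + 1):
                if max(abs(y - y0), abs(x - x0)) != r:
                    continue                # only inspect the ring at distance r
                if 0 <= y < h and 0 <= x < w and view[y][x] is not None:
                    return (y, x)           # update the first pixel and stop
    return None                             # no pixel in the view has a model

view = [[None, None, None],
        [None, None, "sofa"],
        [None, None, None]]
print(find_model_pixel(view, (1, 1)))  # (1, 2): nearest pixel with a model
```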
  6. The method according to claim 1 or 2, wherein the determining the first pixel in the current view corresponding to the user's current perspective in the three-dimensional space scene comprises:
    in response to the user's operation of setting a target position of the footprint information in the current view, determining a pixel in the current view corresponding to the target position of the footprint information as the first pixel.
  7. The method according to claim 6, wherein the determining the three-dimensional model corresponding to the first pixel comprises:
    determining whether a three-dimensional model is set for the first pixel;
    in response to a determination that a three-dimensional model is set for the first pixel, using the three-dimensional model set for the first pixel as the three-dimensional model corresponding to the first pixel;
    in response to a determination that no three-dimensional model is set for the first pixel, outputting prompt information for updating the target position of the footprint information;
    in response to a determination that a three-dimensional model is set for a pixel in the current view corresponding to the updated target position of the footprint information, using the pixel for which the three-dimensional model is set as the first pixel; and
    obtaining the three-dimensional model corresponding to the first pixel.
  8. The method according to claim 5 or 7, wherein the determining the position of the user's footprint information in the three-dimensional model comprises:
    acquiring the position of the first pixel in the three-dimensional model as the position of the user's footprint information in the three-dimensional model.
  9. The method according to any one of claims 1 to 8, further comprising:
    for any browsing user browsing the three-dimensional space scene, determining a footprint area corresponding to the browsing user's current perspective in the three-dimensional space scene;
    determining footprint information in the three-dimensional model that belongs to the footprint area; and
    displaying, in the current view corresponding to the browsing user's current perspective in the three-dimensional space scene, the footprint information belonging to the footprint area.
  10. The method according to claim 9, wherein the determining the footprint area corresponding to the browsing user's current perspective in the three-dimensional space scene comprises:
    determining a center pixel of the current view corresponding to the browsing user's current perspective in the three-dimensional space scene; and
    determining the footprint area in the current view by taking the center pixel as a circle center and a predetermined length as a radius.
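The circular footprint area of claim 10 amounts to a Euclidean-distance filter around the view's center pixel. A minimal sketch, assuming footprints are dicts with pixel positions and that the radius is given in pixels:

```python
import math

def footprints_in_area(footprints, center, radius):
    """Claim 10 sketch: the footprint area is a circle of `radius` pixels around
    the current view's center pixel; keep only the footprints inside it."""
    return [f for f in footprints if math.dist(center, f["pos"]) <= radius]

footprints = [{"pos": (10, 10), "info": "hi"},
              {"pos": (80, 80), "info": "far away"}]
print(footprints_in_area(footprints, center=(12, 12), radius=20))
# keeps only the footprint near the view center
```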
  11. The method according to claim 9 or 10, wherein the displaying the footprint information belonging to the footprint area comprises:
    in response to determining that the footprint information belonging to the footprint area comprises multiple pieces of footprint information at different positions, displaying the multiple pieces of footprint information in the current view according to their respective image positions in the current view; and
    in response to determining that the footprint information belonging to the footprint area comprises different pieces of footprint information at a same position, assigning different image positions to the different pieces of footprint information in the current view, and displaying the different pieces of footprint information in the current view according to the assigned image positions.
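The second branch of claim 11, assigning distinct image positions to footprints that share one model position, could be implemented with any de-overlapping rule; the fixed horizontal offset below is an illustrative assumption only.

```python
def assign_positions(footprints):
    """Claim 11 sketch: footprints sharing one position get distinct image
    positions via a small fixed offset per duplicate (illustrative rule)."""
    seen = {}
    placed = []
    for f in footprints:
        n = seen.get(f["pos"], 0)       # how many footprints already sit here
        seen[f["pos"]] = n + 1
        y, x = f["pos"]
        placed.append({**f, "image_pos": (y, x + 12 * n)})  # shift duplicates right
    return placed

fs = [{"pos": (5, 5), "info": "a"}, {"pos": (5, 5), "info": "b"}]
print(assign_positions(fs))
# the second footprint is offset to (5, 17) so both remain visible
```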
  12. The method according to any one of claims 9 to 11, further comprising:
    determining at least one piece of footprint information in the three-dimensional model that does not belong to the footprint area or the current view; and
    displaying, in the form of bullet comments, the at least one piece of footprint information in the current view corresponding to the browsing user's current perspective in the three-dimensional space scene.
  13. An interaction method based on a three-dimensional model, comprising:
    at a first user terminal presenting a user interface:
    in response to detecting a user's target interaction operation on the user interface, sending an interaction request for the target interaction operation to a server that provides page data for the user interface, wherein the user interface is used to present a three-dimensional model, and the three-dimensional model has a pre-established association with a user account logged in on a second user terminal;
    receiving a streaming video obtained by the server from the second user terminal; and
    presenting the streaming video and the three-dimensional model on the user interface.
  14. The method according to claim 13, wherein the receiving the streaming video obtained by the server from the second user terminal comprises:
    in response to the server receiving interaction confirmation information sent by the second user terminal for the interaction request, receiving the streaming video obtained by the server from the second user terminal.
  15. The method according to claim 13 or 14, further comprising:
    in response to a current network speed value of the first user terminal being less than or equal to a preset network speed threshold, adjusting a target user image based on each frame of voice in the streaming video to generate a new video different from the streaming video, wherein the new video represents the user indicated by the target user image performing the actions indicated by each frame of voice; and
    presenting the new video in place of the streaming video.
  16. The method according to claim 15, further comprising:
    generating the target user image based on an image in the streaming video; or
    determining a user image associated with the user account as the target user image.
  17. The method according to claim 15 or 16, further comprising:
    in response to the new video being presented on the user interface, sending camera shutdown confirmation information to the server, wherein the camera shutdown confirmation information is used to determine whether the second user terminal turns off its camera.
  18. The method according to claim 13, wherein the receiving the streaming video obtained by the server from the second user terminal comprises:
    sending a current network speed value of the first user terminal to the server; and
    receiving a streaming video obtained from the second user terminal and sent by the server, the streaming video having a resolution matching the current network speed value.
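Matching stream resolution to the measured network speed (claim 18) is, at its simplest, a threshold table lookup. The tier boundaries and labels below are illustrative assumptions; the claim does not specify them.

```python
def pick_resolution(mbps):
    """Claim 18 sketch: choose a stream resolution matching the first
    terminal's measured network speed. Thresholds are illustrative."""
    tiers = [(8.0, "1080p"), (4.0, "720p"), (1.5, "480p")]
    for threshold, resolution in tiers:
        if mbps >= threshold:
            return resolution
    return "240p"  # fallback for very slow connections

print(pick_resolution(5.2))  # "720p"
print(pick_resolution(0.8))  # "240p"
```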
  19. The method according to any one of claims 13-18, further comprising:
    receiving model adjustment information sent by the server, wherein the model adjustment information indicates an adjustment operation performed, by a user using the second user terminal, on the three-dimensional model presented on the second user terminal, the adjustment operation comprising at least one item of the group consisting of: zooming, rotating, moving, and viewpoint switching; and
    performing, on the three-dimensional model presented on the user interface, the same adjustment operation as the adjustment operation indicated by the model adjustment information.
  20. The method according to any one of claims 13-19, wherein the three-dimensional model comprises three-dimensional sub-models of a plurality of subspace scenes, and each of the plurality of subspace scenes corresponds to a respective keyword in a predetermined keyword set, the method further comprising:
    performing speech recognition on the voice in the streaming video to obtain a speech recognition result; and
    in response to a determination that the speech recognition result contains a keyword in the keyword set, presenting, on the user interface, the three-dimensional sub-model of the subspace scene, among the plurality of subspace scenes, that corresponds to the keyword contained in the speech recognition result.
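The keyword-to-subspace mapping of claim 20 reduces to a dictionary lookup over the recognition transcript once speech recognition has run. The function and the keyword-to-scene table below are illustrative assumptions; the speech-recognition step itself is out of scope here.

```python
def submodel_for_speech(transcript, keyword_to_scene):
    """Claim 20 sketch: if the speech-recognition result contains a keyword
    from the predetermined set, return the matching subspace sub-model."""
    for keyword, submodel in keyword_to_scene.items():
        if keyword in transcript:
            return submodel  # this sub-model is presented on the user interface
    return None              # no keyword matched; keep the current presentation

keyword_to_scene = {"kitchen": "submodel-kitchen", "bedroom": "submodel-bedroom"}
print(submodel_for_speech("let's look at the kitchen next", keyword_to_scene))
# "submodel-kitchen"
```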
  21. The method according to any one of claims 13-20, further comprising:
    obtaining the user's feedback information on the streaming video; and
    sending the feedback information to the server, wherein the server is configured to establish an association between the feedback information and the user account.
  22. An interaction method based on a three-dimensional model, comprising:
    at a second user terminal:
    in response to receiving an interaction request sent by a server, obtaining a streaming video, wherein the interaction request indicates that a first user terminal has detected a user's target interaction operation on a user interface presented by the first user terminal, the user interface is used to present a three-dimensional model, and the three-dimensional model has a pre-established association with a user account logged in on the second user terminal; and
    sending the streaming video to the server, wherein the server is configured to send the streaming video to the first user terminal, so that the first user terminal presents the streaming video and the three-dimensional model on the user interface.
  23. The method according to claim 22, wherein the obtaining a streaming video in response to receiving an interaction request sent by the server comprises:
    in response to receiving the interaction request sent by the server, determining whether a user's confirmation operation for the interaction request is detected; and
    in response to detecting the confirmation operation, obtaining the streaming video.
  24. The method according to claim 22 or 23, further comprising:
    in response to a current network speed value of the first user terminal being less than or equal to a preset network speed threshold, receiving camera shutdown confirmation information from the server; and
    presenting the camera shutdown confirmation information, wherein the camera shutdown confirmation information is used to determine whether the second user terminal turns off its camera.
  25. The method according to any one of claims 22-24, further comprising:
    in response to receiving, from the server, model adjustment information indicating a user's adjustment operation on the three-dimensional model presented on the first user terminal, performing, on the three-dimensional model presented on the second user terminal, the same adjustment operation as the adjustment operation indicated by the model adjustment information, wherein the adjustment operation comprises at least one item of the group consisting of: zooming, rotating, moving, and viewpoint switching.
  26. The method according to any one of claims 22-25, further comprising:
    in response to detecting a user's adjustment operation on the three-dimensional model presented on the second user terminal, sending model adjustment information indicating the adjustment operation to the server, so that the server controls the first user terminal to perform, on the three-dimensional model presented on the user interface, the same adjustment operation as the adjustment operation indicated by the model adjustment information, wherein the adjustment operation comprises at least one item of the group consisting of: zooming, rotating, moving, and viewpoint switching.
  27. The method according to any one of claims 22-26, wherein the three-dimensional model comprises three-dimensional sub-models of a plurality of subspace scenes, each subspace scene among the plurality of subspace scenes corresponding to a corresponding keyword in a predetermined keyword set,
    the method further comprising:
    performing voice recognition on voice acquired by the first user terminal to obtain a voice recognition result; and
    in response to determining that the voice recognition result contains a keyword in the keyword set, presenting, on the user interface, the three-dimensional sub-model of the subspace scene, among the plurality of subspace scenes, that corresponds to the keyword contained in the voice recognition result.
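The keyword-driven navigation of claim 27 reduces to scanning the recognition result against the predetermined keyword set and selecting the matching sub-model. A minimal sketch, in which the keyword set, scene names, and function name are invented for illustration (the application does not specify them):

```python
# Hypothetical mapping from keywords to subspace-scene sub-models.
KEYWORD_TO_SUBSCENE = {
    "kitchen": "submodel_kitchen",
    "bedroom": "submodel_bedroom",
    "balcony": "submodel_balcony",
}

def subscenes_to_present(recognition_result: str) -> list:
    """Return the sub-models whose keyword appears in the recognized text."""
    text = recognition_result.lower()
    return [scene for kw, scene in KEYWORD_TO_SUBSCENE.items() if kw in text]

print(subscenes_to_present("Please show me the kitchen"))  # ['submodel_kitchen']
print(subscenes_to_present("What about the hallway?"))     # []
```

A real implementation would feed the speech-recognition output of the first terminal into such a lookup and then switch the user interface to the returned sub-model; an empty result simply leaves the current view unchanged.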
  28. The method according to any one of claims 22-27, further comprising:
    in response to receiving feedback information on the streaming video, sent by the server from the user using the first user terminal, performing an operation matching the feedback information.
  29. An apparatus for realizing interaction in a three-dimensional space scene, comprising: means for performing the method according to any one of claims 1 to 12.
  30. An interaction apparatus based on a three-dimensional model, provided in a first user terminal, wherein the apparatus comprises: means for performing the method according to any one of claims 13 to 21.
  31. An interaction apparatus based on a three-dimensional model, provided in a second user terminal, wherein the apparatus comprises: means for performing the method according to any one of claims 22 to 28.
  32. An interaction system based on a three-dimensional model, comprising:
    a first user terminal, configured to present a user interface;
    a second user terminal; and
    a server, the server being communicatively connected with the first user terminal and the second user terminal,
    wherein the first user terminal is configured to: in response to detecting a user's target interaction operation on the user interface, send an interaction request for the target interaction operation to the server, wherein the user interface is used for presenting a three-dimensional model, and the three-dimensional model has a pre-established association relationship with a user account logged in on the second user terminal;
    wherein the second user terminal is configured to: acquire a streaming video, and send the streaming video to the server;
    wherein the server is configured to: send the streaming video to the first user terminal; and
    wherein the first user terminal is configured to: present the streaming video and the three-dimensional model on the user interface.
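The message flow of system claim 32 can be simulated end to end in a few classes: the second terminal pushes video frames through the server, the first terminal sends interaction requests the other way, and the first terminal's interface presents each frame together with the model tied to the host account. Every class and field name here is an invented illustration of the claimed roles, not the applicant's implementation.

```python
class Server:
    """Forwards the host's stream to viewers and collects interaction requests."""
    def __init__(self):
        self.viewers = []
        self.requests = []

    def attach_viewer(self, viewer):
        self.viewers.append(viewer)

    def receive_interaction_request(self, request: dict):
        self.requests.append(request)      # sent by the first user terminal

    def receive_stream(self, frame: str):
        for v in self.viewers:             # relayed to the first user terminal
            v.present(frame)

class FirstTerminal:
    def __init__(self, server: Server, model_id: str):
        self.server = server
        self.model_id = model_id           # model pre-associated with the host account
        self.screen = []
        server.attach_viewer(self)

    def interact(self, operation: str):
        self.server.receive_interaction_request({"op": operation})

    def present(self, frame: str):
        # The user interface shows the streaming video and the 3D model together.
        self.screen.append((frame, self.model_id))

server = Server()
viewer = FirstTerminal(server, model_id="model_of_host_account")
viewer.interact("tap_on_model")
server.receive_stream("frame_0")           # stands in for the second terminal's push
print(viewer.screen)    # [('frame_0', 'model_of_host_account')]
print(server.requests)  # [{'op': 'tap_on_model'}]
```

The structural point is that the two data paths are asymmetric: video flows host-to-viewer via the server, while interaction requests flow viewer-to-server, which is why the claim configures each terminal separately.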
  33. A non-transitory computer-readable storage medium storing a computer program which, when executed by a computer, causes the computer to implement the method according to any one of claims 1-28.
  34. An electronic device, comprising:
    a processor; and
    a memory for storing processor-executable instructions which, when executed by the processor, implement the method according to any one of claims 1-28.
  35. A computer program product, comprising a computer program which, when executed by a computer, causes the computer to implement the method according to any one of claims 1-28.
PCT/CN2021/093628 2020-05-13 2021-05-13 Method for realizing interaction in three-dimensional space scene, apparatus and device WO2021228200A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010401813.7 2020-05-13
CN202010401813.7A CN111562845B (en) 2020-05-13 2020-05-13 Method, device and equipment for realizing three-dimensional space scene interaction
CN202010698810.4 2020-07-20
CN202010698810.4A CN111885398B (en) 2020-07-20 2020-07-20 Interaction method, device and system based on three-dimensional model, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021228200A1 true WO2021228200A1 (en) 2021-11-18

Family

ID=78525899

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/093628 WO2021228200A1 (en) 2020-05-13 2021-05-13 Method for realizing interaction in three-dimensional space scene, apparatus and device

Country Status (1)

Country Link
WO (1) WO2021228200A1 (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107710284A (en) * 2015-06-30 2018-02-16 奇跃公司 For more effectively showing the technology of text in virtual image generation system
CN108874471A (en) * 2018-05-30 2018-11-23 链家网(北京)科技有限公司 Additional elements adding method and system between a kind of function of the source of houses
CN110531847A (en) * 2019-07-26 2019-12-03 中国人民解放军军事科学院国防科技创新研究院 A kind of novel social contact method and system based on augmented reality
CN110891167A (en) * 2019-11-30 2020-03-17 北京城市网邻信息技术有限公司 Information interaction method, first terminal and computer readable storage medium
CN110944140A (en) * 2019-11-30 2020-03-31 北京城市网邻信息技术有限公司 Remote display method, remote display system, electronic device and storage medium
CN111047717A (en) * 2019-12-24 2020-04-21 北京法之运科技有限公司 Method for carrying out character labeling on three-dimensional model
CN111562845A (en) * 2020-05-13 2020-08-21 贝壳技术有限公司 Method, device and equipment for realizing three-dimensional space scene interaction
CN111885398A (en) * 2020-07-20 2020-11-03 贝壳技术有限公司 Interaction method, device and system based on three-dimensional model


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241132A (en) * 2021-12-16 2022-03-25 北京字跳网络技术有限公司 Scene content display control method and device, computer equipment and storage medium
CN114241132B (en) * 2021-12-16 2023-07-21 北京字跳网络技术有限公司 Scene content display control method and device, computer equipment and storage medium
CN115499641A (en) * 2022-09-20 2022-12-20 北京三月雨文化传播有限责任公司 Method for quickly constructing digital exhibition file and intelligent terminal
CN115499641B (en) * 2022-09-20 2023-09-12 广东鸿威国际会展集团有限公司 Method for quickly constructing digital exhibition file and intelligent terminal

Similar Documents

Publication Publication Date Title
WO2021109652A1 (en) Method and apparatus for giving character virtual gift, device, and storage medium
US11899900B2 (en) Augmented reality computing environments—immersive media browser
US10134364B2 (en) Prioritized display of visual content in computer presentations
CN111178191B (en) Information playing method and device, computer readable storage medium and electronic equipment
JP5901151B2 (en) How to select objects in a virtual environment
KR20220115824A (en) Matching content to a spatial 3d environment
WO2018152455A1 (en) System and method for creating a collaborative virtual session
WO2022116751A1 (en) Interaction method and apparatus, and terminal, server and storage medium
US20180160194A1 (en) Methods, systems, and media for enhancing two-dimensional video content items with spherical video content
WO2021228200A1 (en) Method for realizing interaction in three-dimensional space scene, apparatus and device
US11908056B2 (en) Sentiment-based interactive avatar system for sign language
CN112596694B (en) Method and device for processing house source information
WO2022095757A1 (en) Image rendering method and apparatus
KR20190047144A (en) Interactive video generation
US20230409632A1 (en) Systems and methods for using conjunctions in a voice input to cause a search application to wait for additional inputs
US20230018502A1 (en) Display apparatus and method for person recognition and presentation
US20220319063A1 (en) Method and apparatus for video conferencing
CN112051956A (en) House source interaction method and device
TWI570639B (en) Systems and methods for building virtual communities
CN111562845B (en) Method, device and equipment for realizing three-dimensional space scene interaction
CN111885398B (en) Interaction method, device and system based on three-dimensional model, electronic equipment and storage medium
CN116017082A (en) Information processing method and electronic equipment
WO2022205001A1 (en) Information exchange method, computer-readable storage medium, and communication terminal
CN116762333A (en) Superimposing images of conference call participants and shared documents

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21805221

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21805221

Country of ref document: EP

Kind code of ref document: A1