CN108874360B - Panoramic content positioning method and device - Google Patents


Info

Publication number
CN108874360B
CN108874360B (application CN201810679316.6A)
Authority
CN
China
Prior art keywords
current page
entity
image recognition
panoramic
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810679316.6A
Other languages
Chinese (zh)
Other versions
CN108874360A (en)
Inventor
杨茗名
王群
王宇亮
张苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810679316.6A
Publication of CN108874360A
Application granted
Publication of CN108874360B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481: Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F 3/04815: Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/20: Scenes; Scene-specific elements in augmented reality scenes
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the invention provide a panoramic content positioning method and device. The method comprises the following steps: performing semantic analysis on an input control voice to determine a user requirement, wherein the user requirement comprises at least one of a page to be operated, an operation object, and an operation type; if the user requirement is to operate on the current page of the panoramic content, performing image recognition on the current page to find whether an entity matching the operation object exists in the current page; and if such an entity exists, operating on the matched entity in the current page according to an interaction behavior rule and the operation type. Embodiments of the invention provide the user with a more natural and intelligent interactive experience, fill the gap left by the absence of voice control in panoramic browsing, reduce the number of user operation steps, and satisfy user requirements more accurately.

Description

Panoramic content positioning method and device
Technical Field
The invention relates to the technical field of virtual reality, in particular to a panoramic content positioning method and device.
Background
With the continuous development of VR (Virtual Reality) technology, panoramic content can be presented on more and more devices. In particular, displaying VR panoramas on the web both enriches the originally two-dimensional page content and lets users enjoy a more three-dimensional, immersive experience that is closer to real-life scenes.
On the web, the existing ways of browsing VR panoramic content, and their limitations, are as follows:
a. Finger swipe or tap: the user slides a finger across the panoramic content to view it, or taps an entry link to other panoramic material to open new panoramic content.
Limitation: the user must be in direct contact with the device, which is neither convenient nor intelligent. Browsing is also limited to what is currently visible: if the desired content is not in the current visible area, the user must drag the page repeatedly to find it and cannot position the content accurately, which increases the number of operation steps and degrades the user experience.
b. Gyroscope gravity sensing: the device's gravity-sensing function is enabled, and specific panoramic content is located by changing the position of the device.
Limitation: the user must rotate the device to different angles to see the full panoramic content. In the extreme case, a user who wants to see what is behind the panoramic viewpoint must physically turn around while holding the device, which greatly degrades the user experience.
Disclosure of Invention
The embodiment of the invention provides a panoramic content positioning method and a panoramic content positioning device, which are used for solving one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a method for positioning panoramic content, including:
performing semantic analysis on input control voice to determine user requirements, wherein the user requirements comprise at least one of an operation page, an operation object and an operation type which are required to be operated by a user;
if the user requirement is to operate the current page of the panoramic content, performing image recognition on the current page to find whether an entity matched with the operation object exists in the current page;
and if the entity matched with the operation object exists in the current page, operating the matched entity in the current page according to an interactive behavior rule and the operation type.
With reference to the first aspect, in a first implementation manner of the first aspect, an embodiment of the present invention further includes:
if the user requirement is to operate the scenes except the current page, searching whether matched scenes exist according to the panoramic relation data;
and if the matched scene is found, operating the matched scene according to the interactive behavior rule and the operation type.
With reference to the first aspect or the first implementation manner of the first aspect, in a second implementation manner of the first aspect, an embodiment of the present invention further includes:
learning the features of different entities by machine learning according to preset object attribute rules, to obtain an image recognition model;
wherein the image recognition model is used for recognizing each entity included in the panoramic content and recording the coordinates of each entity in the panoramic content.
With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the searching whether an entity matching the operation object exists in the current page includes:
inputting a two-dimensional image corresponding to the three-dimensional current panoramic content into the image recognition model;
searching whether the attribute of the operation object exists in the attributes of each entity of the current page through the image recognition model;
and if so, acquiring the coordinates of the entity corresponding to the existing attribute on the current page.
With reference to the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the searching, by the image recognition model, whether an attribute of the operation object exists in attributes of entities of the current page includes:
reconstructing a three-dimensional environment for the two-dimensional image using WebGL (web-based graphics language) technology;
and searching whether the attribute of the operation object exists in the attributes of the entities of the current page in the three-dimensional environment through the image recognition model.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, and the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the interaction behavior rule includes JSON character strings corresponding to various operation types.
In a second aspect, an embodiment of the present invention provides a panoramic content positioning apparatus, including:
the voice analysis module is used for performing semantic analysis on the input control voice to determine user requirements, wherein the user requirements comprise at least one of an operation page, an operation object and an operation type which are required to be operated by a user;
the image recognition module is used for carrying out image recognition on the current page if the user requirement obtained by the voice analysis module is to operate the current page of the panoramic content so as to search whether an entity matched with the operation object exists in the current page;
and the page interaction module is used for operating the matched entity in the current page according to the interaction behavior rule and the operation type if the entity matched with the operation object exists in the current page.
With reference to the second aspect, in a first implementation manner of the second aspect, the embodiment of the present invention further includes:
the voice analysis module is further used for searching whether a matched scene exists according to the panoramic relation data if the user needs to operate the scene except the current page;
the page interaction module is further configured to operate the matched scene according to the interaction behavior rule and the operation type if the voice analysis module finds the matched scene.
With reference to the second aspect or the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the embodiment of the present invention further includes:
the machine learning module is used for learning the features of different entities by machine learning according to preset object attribute rules, to obtain an image recognition model;
wherein the image recognition model is used for recognizing each entity included in the panoramic content and recording the coordinates of each entity in the panoramic content.
With reference to the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the image recognition module is further configured to:
inputting a two-dimensional image corresponding to the three-dimensional current panoramic content into the image recognition model; searching whether the attribute of the operation object exists in the attributes of each entity of the current page through the image recognition model; and if so, acquiring the coordinates of the entity corresponding to the existing attribute on the current page.
With reference to the third implementation manner of the second aspect, in a fourth implementation manner of the second aspect, the searching, by using the image recognition model, whether an attribute of the operation object exists in attributes of entities of the current page includes:
reconstructing a three-dimensional environment for the two-dimensional image using WebGL (web-based graphics language) technology;
and searching whether the attribute of the operation object exists in the attributes of the entities of the current page in the three-dimensional environment through the image recognition model.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, and the fourth implementation manner of the second aspect, in a fifth implementation manner of the second aspect, in the embodiment of the present invention, the interactive behavior rule includes JSON character strings corresponding to various operation types.
In a third aspect, an embodiment of the present invention provides a device for positioning panoramic content, where the function of the device may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the panoramic content locating apparatus includes a processor and a memory, the memory is used for storing a program supporting the panoramic content locating apparatus to execute the panoramic content locating method, and the processor is configured to execute the program stored in the memory. The panoramic content locating apparatus may further include a communication interface for the panoramic content locating apparatus to communicate with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer software instructions for a panoramic content positioning apparatus, which includes a program for executing the panoramic content positioning method.
One of the above technical solutions has the following advantage or beneficial effect: the method provides the user with a more natural and intelligent interactive experience, fills the gap left by the absence of voice control in panoramic browsing, reduces the number of user operation steps, and satisfies user requirements more accurately.
Another of the above technical solutions has the following advantage or beneficial effect: by training the voice and image models with AI (Artificial Intelligence) technology, voice interaction tasks can be processed in batches, and entities in the 3D scene do not need to be labeled manually.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
Fig. 1 is a flowchart of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 2 is a flowchart of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 3 is a flowchart of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 4 is a block diagram of a panoramic content locating apparatus according to an embodiment of the present invention.
Fig. 5 is an exemplary diagram of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 6 is an exemplary diagram of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 7 is an exemplary diagram of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 8 is a flowchart of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 9 is a flowchart of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 10 is a block diagram illustrating a panoramic content locating apparatus according to an embodiment of the present invention.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
Fig. 1 is a flowchart of a panoramic content positioning method according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step S110, performing semantic analysis on the input control voice to determine user requirements, wherein the user requirements comprise at least one of an operation page, an operation object and an operation type which are required to be operated by a user;
step S120, if the user requirement is to operate the current page of the panoramic content, performing image recognition on the current page to find whether an entity matched with the operation object exists in the current page;
step S130, if the entity matched with the operation object exists in the current page, operating the matched entity in the current page according to the interactive behavior rule and the operation type.
In this embodiment, the user's voice input is semantically analyzed to determine the user requirement. For example, if the currently displayed panoramic content is an office image including a desk and the control voice input by the user includes "zoom in on the desk in front", it can be determined that the user requirement is to operate on the current page of the panoramic content. As another example, if the currently displayed panoramic content is a teaching building of XX University and the control voice input by the user includes "switch to the school gate of XX University", it can be determined that the user requirement is to operate on a scene other than the current page. The user requirement may also include, for example, an operation object and an operation type, which may be set according to the actual application scenario and are not limited here.
If the user requirement is to operate on the current page of the panoramic content, image recognition is performed on the current page to identify the types, positions, and so on of the entities it contains. Based on the operation object included in the user requirement, the current page is searched for a matching entity. The operation object may be any real-world object displayed in the page, such as an animal, a plant, an article of daily use, or a place. Once a matching entity is found, it can be operated on according to the operation type in the user requirement and the preset interaction behavior rule. Operation types may include: zoom in, zoom out, switch scene, view an object in the content, and so on.
For example, if the user inputs the voice "enlarge the table", the user requirement is to operate on the current page of the panoramic content, the operation object is the table, and the operation type is enlargement. The current page is therefore matched to see whether a table entity exists; if one is matched, the table is enlarged according to the interaction behavior rule corresponding to enlargement.
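The mapping from a recognized utterance to a user-requirement triple can be sketched as follows. This is a minimal illustration only: the patent's semantic analysis would use a trained model, and the keyword table and field names here are assumptions.

```javascript
// Hypothetical keyword table standing in for the semantic-analysis model.
const OPERATION_TYPES = {
  'enlarge': 'zoomIn',
  'zoom in': 'zoomIn',
  'shrink': 'zoomOut',
  'switch to': 'switchScene',
  'view': 'view',
};

// Map a voice transcript to { page, object, type }, or null if no
// user requirement can be determined.
function parseControlVoice(transcript) {
  const text = transcript.toLowerCase();
  for (const [keyword, type] of Object.entries(OPERATION_TYPES)) {
    if (text.includes(keyword)) {
      // Treat everything after the keyword as the operation object.
      const object = text.split(keyword)[1].trim();
      // Scene switches target pages other than the current one.
      const page = type === 'switchScene' ? 'other' : 'current';
      return { page, object, type };
    }
  }
  return null;
}
```

So "enlarge the table" yields an operation on the current page with object "the table" and type `zoomIn`, matching the example above.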
In one possible implementation, as shown in fig. 2, the method further includes:
step S140, if the user requirement is to operate on a scene other than the current page, searching whether a matched scene exists according to the panoramic relation data;
and S150, if the matched scene is found, operating the matched scene according to the interactive behavior rule and the operation type.
The panoramic relation data can be recorded in text form, for example as scene entries such as the gate of XX school, the teaching building of XX school, and the dining room of XX school. The text description of the panoramic relation data is stored in the interaction behavior rules; if the user requirement matches an entry, an operation instruction is sent to the page interaction module.
For example, if the currently displayed panoramic content is a teaching building of the university XX, the control voice input by the user includes "switching to the school gate of the university XX", whether a scene matching the school gate of the university XX exists or not can be searched in the panoramic relationship data, and if so, the panoramic content corresponding to the scene is opened.
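The scene lookup can be sketched as a match against the text-form relation data. The entry names and the `panorama` identifiers below are illustrative, not taken from the patent.

```javascript
// Hypothetical panoramic relation data, recorded as text entries that
// pair a scene name with the panorama to open for it.
const panoramicRelations = [
  { scene: 'XX University school gate', panorama: 'pano_gate' },
  { scene: 'XX University teaching building', panorama: 'pano_teaching' },
  { scene: 'XX University dining room', panorama: 'pano_dining' },
];

// Return the panorama identifier for a spoken scene name, or null
// when no matched scene exists.
function findMatchedScene(operationObject) {
  const wanted = operationObject.toLowerCase();
  const hit = panoramicRelations.find(
    (r) => r.scene.toLowerCase().includes(wanted)
  );
  return hit ? hit.panorama : null;
}
```

On a hit, the corresponding panoramic content is opened, as in the "school gate of XX University" example above.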
In one possible implementation manner, the method further includes:
learning the features of different entities by machine learning according to preset object attribute rules, to obtain an image recognition model;
wherein the image recognition model is used for recognizing each entity included in the panoramic content and recording the coordinates of each entity in the panoramic content.
For example, according to preset object attribute rules and in combination with AI (Artificial Intelligence) technology, the features of different entities are learned by machine learning, so that entities in the 3D (three-dimensional) scene, such as the sky, the ground, rivers, plants, animals, and houses, can be identified and their coordinates in the panoramic scene recorded.
In one possible implementation, as shown in fig. 3, step S120 includes:
step S121, inputting a two-dimensional image corresponding to the three-dimensional current panoramic content into the image recognition model;
step S122, searching whether the attribute of the operation object exists in the attributes of each entity of the current page through the image recognition model;
and S123, if the current page exists, acquiring the coordinates of the entity corresponding to the existing attribute on the current page.
In a possible implementation manner, finding whether the attribute of the operation object exists in the attributes of the entities of the current page through the image recognition model includes:
reconstructing a three-dimensional environment for the two-dimensional image using WebGL (web-based graphics language) technology;
and searching whether the attribute of the operation object exists in the attributes of the entities of the current page in the three-dimensional environment through the image recognition model.
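Steps S121 to S123 can be sketched as a lookup over the recognition model's output. The data shape here (attribute labels plus yaw/pitch coordinates) is an assumption about what the model records; the patent only says that entity attributes and coordinates are stored.

```javascript
// Hypothetical output of the image recognition model for the current
// page: each recognized entity carries an attribute label and its
// coordinates in the panorama (yaw/pitch in degrees, assumed units).
const recognizedEntities = [
  { attribute: 'desk', coords: { yaw: 12.5, pitch: -8.0 } },
  { attribute: 'window', coords: { yaw: 95.0, pitch: 3.5 } },
];

// If the operation object's attribute exists among the entities of the
// current page, return that entity's coordinates; otherwise null.
function locateEntity(entities, operationObject) {
  const hit = entities.find((e) => e.attribute === operationObject);
  return hit ? hit.coords : null;
}
```

The returned coordinates are what the page interaction module would then use to operate on the matched entity.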
In a possible implementation manner, the interactive behavior rule may include JSON (JavaScript Object Notation) strings corresponding to various operation types.
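A rule set of this kind could look like the following. The field names and values inside the JSON strings are illustrative assumptions; the patent specifies only that each operation type has a corresponding JSON character string.

```javascript
// Hypothetical interaction behavior rules: one JSON string per
// operation type, parsed on demand.
const interactionRules = {
  zoomIn:      '{"action":"scale","factor":1.5,"animate":true}',
  zoomOut:     '{"action":"scale","factor":0.67,"animate":true}',
  switchScene: '{"action":"loadPanorama","transition":"fade"}',
  view:        '{"action":"rotateTo","animate":true}',
};

// Return the parsed rule for an operation type, or null if the type
// is outside the pre-edited scope of voice interaction.
function ruleFor(operationType) {
  const raw = interactionRules[operationType];
  return raw ? JSON.parse(raw) : null;
}
```

Editing these strings in advance is what bounds the range of voice interaction, as the detailed description notes later.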
Fig. 4 is a block diagram of a panoramic content locating apparatus according to an embodiment of the present invention, the apparatus including:
a voice analysis module 41, configured to perform semantic analysis on an input control voice to determine a user requirement, where the user requirement includes at least one of an operation page, an operation object, and an operation type that a user needs to operate;
an image recognition module 43, configured to perform image recognition on a current page of the panoramic content if the user requirement obtained by the voice analysis module is to perform an operation on the current page, so as to find whether an entity matching the operation object exists in the current page;
and a page interaction module 45, configured to, if there is an entity matching the operation object in the current page, operate the matching entity in the current page according to an interaction behavior rule and the operation type.
In one possible implementation manner, the method further includes:
the voice analysis module is further used for searching whether a matched scene exists according to the panoramic relation data if the user needs to operate the scene except the current page;
the page interaction module is further configured to operate the matched scene according to the interaction behavior rule and the operation type if the voice analysis module finds the matched scene.
In one possible implementation manner, the method further includes:
the machine learning module is used for learning the features of different entities by machine learning according to preset object attribute rules, to obtain an image recognition model;
wherein the image recognition model is used for recognizing each entity included in the panoramic content and recording the coordinates of each entity in the panoramic content.
In one possible implementation, the image recognition module is further configured to:
inputting a two-dimensional image corresponding to the three-dimensional current panoramic content into the image recognition model; searching whether the attribute of the operation object exists in the attributes of each entity of the current page through the image recognition model; and if so, acquiring the coordinates of the entity corresponding to the existing attribute on the current page.
In a possible implementation manner, searching whether the attribute of the operation object exists in the attributes of the entities of the current page through the image recognition model includes:
reconstructing a three-dimensional environment for the two-dimensional image using WebGL (web-based graphics language) technology;
and searching whether the attribute of the operation object exists in the attributes of the entities of the current page in the three-dimensional environment through the image recognition model.
In a possible implementation manner, the interactive behavior rule includes JSON character strings corresponding to various operation types.
The functions of each module in each apparatus in the embodiments of the present invention may refer to the corresponding description in the above method, and are not described herein again.
In an application example, the application scenario of a voice-interaction-based web panoramic content positioning method is as follows: when browsing a panoramic page, the user clicks the VR mode icon and is prompted to enable the device's voice permission, as shown in fig. 5. After the user grants the voice permission, a voice prompt is displayed to guide the user through voice interaction, as shown in fig. 6. While the user speaks, the page displays the recognized voice input in real time, as shown in fig. 7. After the input is finished, the corresponding operation is executed, for example jumping to the next panoramic page or viewing a part of the panoramic content not currently visible on the page.
Taking the above application scenario as an example, as shown in fig. 8 and 9, the principle of implementing the panoramic content positioning method according to the embodiment of the present invention by a plurality of modules includes:
1. Panoramic two-dimensional image data is input to the image recognition module, and panoramic relation data and interaction behavior rules are input to the voice analysis module.
2. The image recognition module reconstructs the 3D environment using WebGL (Web-based Graphics Language) technology. According to preset object attribute rules and in combination with AI image recognition technology, the features of different entities are learned by machine learning. The image recognition module can then recognize entities in the 3D scene, such as the sky, the ground, rivers, plants, animals, and houses, and record the coordinates of each entity in the panoramic scene.
3. The voice analysis module analyzes the user's voice input and performs semantic analysis, determining the user requirement according to the preset interaction behavior rules. Requirements fall into two main categories: first, operations on the current panoramic content; second, operations outside the current panoramic content.
In the first case, the current page of the panoramic content is matched through the image recognition module. If the entity the user wants to operate on is matched, an operation instruction for that entity is returned to the page interaction module.
In the second case, the other panoramic content the user wants to operate on is searched for in the panoramic relation data through the voice analysis module; on a hit, an operation instruction for that panoramic content is sent to the page interaction module.
4. The page interaction module operates on the current page according to the instruction returned by the voice analysis module, relying on the interaction behavior rules.
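The four-step flow above can be wired together as follows. All module internals are stand-ins supplied by the caller; only the control flow between them is meant to mirror the description.

```javascript
// End-to-end sketch: voice analysis, then either image recognition on
// the current page or a relation-data lookup, then page interaction.
function handleVoice(transcript, modules) {
  const requirement = modules.voiceAnalysis(transcript);
  if (!requirement) return { status: 'unrecognized' };

  if (requirement.page === 'current') {
    // Category 1: operate on the current panoramic content.
    const coords = modules.imageRecognition(requirement.object);
    if (!coords) return { status: 'entity-not-found' };
    return modules.pageInteraction({ type: requirement.type, coords });
  }

  // Category 2: operate outside the current panoramic content.
  const panorama = modules.sceneLookup(requirement.object);
  if (!panorama) return { status: 'scene-not-found' };
  return modules.pageInteraction({ type: requirement.type, panorama });
}
```

Each `modules.*` function corresponds to one of the modules in figs. 8 and 9; in a real system they would be backed by the trained voice and image models.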
Through this modular flow, voice interaction behaviors can be processed in batches with the strong capabilities of machine learning and AI, providing the user with a more intelligent, convenient, and accurate way of positioning panoramic content.
In the embodiment of the invention, the interaction behavior rules can be edited in advance to define the scope of voice interaction, such as the JSON character strings corresponding to operation types like enlarging, shrinking, scene switching, and viewing. In addition, the voice and image models are continuously trained with AI technology to recognize user requirements and entity information in the 3D scene, improving accuracy.
Fig. 10 is a block diagram of a panoramic content positioning apparatus according to an embodiment of the present invention. As shown in Fig. 10, the apparatus includes a memory 910 and a processor 920, the memory 910 storing a computer program executable on the processor 920. When executing the computer program, the processor 920 implements the panoramic content positioning method of the above embodiments. There may be one or more of each of the memory 910 and the processor 920.
The device also includes:
a communication interface 930, configured to communicate with external devices for interactive data transmission.
The memory 910 may include high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 910, the processor 920, and the communication interface 930 are implemented independently, they may be connected to one another and communicate through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in Fig. 10, but this does not indicate that there is only one bus or one type of bus.
Optionally, in an implementation, if the memory 910, the processor 920 and the communication interface 930 are integrated on a chip, the memory 910, the processor 920 and the communication interface 930 may complete communication with each other through an internal interface.
An embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of the above embodiments.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without being mutually inconsistent.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. Alternate implementations, in which functions may be executed out of the order shown or discussed (including substantially concurrently or in reverse order, depending on the functionality involved), are also included within the scope of the preferred embodiments of the present invention, as would be understood by those reasonably skilled in the art.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable read-only memory (CDROM). Further, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following technologies, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method for locating panoramic content, comprising:
the voice analysis module carries out semantic analysis on input control voice to determine user requirements, wherein the user requirements comprise an operation page, an operation object and an operation type which are required to be operated by a user;
if the user requirement is to operate the current page of the panoramic content, an image recognition module performs image recognition on the current page so as to search whether an entity matched with the operation object exists in the current page; searching whether an entity matched with the operation object exists in the current page or not, wherein the searching comprises the following steps: inputting a two-dimensional image corresponding to the three-dimensional current panoramic content into an image recognition model, searching whether the attribute of the operation object exists in the attributes of each entity of the current page or not through the image recognition model, and if so, acquiring the coordinate of the entity corresponding to the existing attribute on the current page; the image recognition model is obtained through machine learning;
if the entity matched with the operation object exists in the current page, a page interaction module operates the matched entity in the current page according to an interaction behavior rule and the operation type;
if the user requirement is to operate scenes except the current page, the voice analysis module searches whether matched scenes exist in the panoramic relation data; wherein the panoramic relationship data comprises a plurality of scenes in textual form;
if the matched scene is found, the voice analysis module opens the panoramic content corresponding to the matched scene and sends an operation instruction for operating the panoramic content to the page interaction module, so that the page interaction module operates the matched scene according to the interaction behavior rule and the operation type.
2. The method of claim 1, further comprising:
according to preset object attribute rules, learning the characteristics of different entities through a machine to obtain an image recognition model;
the image recognition model is used for identifying each entity included in the panoramic content and recording the coordinates of each entity in the panoramic content.
3. The method of claim 1, wherein finding whether the attribute of the operation object exists in the attributes of the entities of the current page through the image recognition model comprises:
reconstructing a three-dimensional environment for the two-dimensional image by adopting a network-based graphic language technology;
and searching whether the attribute of the operation object exists in the attributes of the entities of the current page in the three-dimensional environment through the image recognition model.
4. The method according to any one of claims 1 to 3, wherein the interaction behavior rules include JSON character strings corresponding to various operation types.
5. A panoramic content positioning apparatus, comprising:
the voice analysis module is used for performing semantic analysis on the input control voice to determine user requirements, wherein the user requirements comprise an operation page, an operation object and an operation type which are required to be operated by a user;
the image recognition module is used for carrying out image recognition on the current page if the user requirement obtained by the voice analysis module is to operate the current page of the panoramic content so as to search whether an entity matched with the operation object exists in the current page; and for: inputting a two-dimensional image corresponding to the three-dimensional current panoramic content into an image recognition model, searching whether the attribute of the operation object exists in the attributes of each entity of the current page through the image recognition model, and if so, acquiring the coordinate of the entity corresponding to the existing attribute on the current page; the image recognition model is obtained through machine learning;
the page interaction module is used for operating the matched entity in the current page according to the interaction behavior rule and the operation type if the entity matched with the operation object exists in the current page;
the voice analysis module is further configured to search whether a matching scene exists in the panoramic relation data if the user requirement is to operate a scene other than the current page; wherein the panoramic relationship data comprises a plurality of scenes in textual form;
the voice analysis module is further configured to, if the matched scene is found, open the panoramic content corresponding to the matched scene and send an operation instruction for operating the panoramic content to the page interaction module; and the page interaction module is further configured to operate the matched scene according to the interaction behavior rule and the operation type.
6. The apparatus of claim 5, further comprising:
the machine learning module is used for learning the characteristics of different entities through a machine according to preset object attribute rules to obtain an image recognition model;
the image recognition model is used for identifying each entity included in the panoramic content and recording the coordinates of each entity in the panoramic content.
7. The apparatus of claim 5, wherein the finding, through the image recognition model, whether the attribute of the operation object exists in the attributes of the entities of the current page comprises:
reconstructing a three-dimensional environment for the two-dimensional image by adopting a network-based graphic language technology;
and searching whether the attribute of the operation object exists in the attributes of the entities of the current page in the three-dimensional environment through the image recognition model.
8. The apparatus according to any one of claims 5 to 7, wherein JSON character strings corresponding to various operation types are included in the interactive behavior rule.
9. An apparatus for panoramic content positioning, the apparatus comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-4.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 4.
CN201810679316.6A 2018-06-27 2018-06-27 Panoramic content positioning method and device Active CN108874360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810679316.6A CN108874360B (en) 2018-06-27 2018-06-27 Panoramic content positioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810679316.6A CN108874360B (en) 2018-06-27 2018-06-27 Panoramic content positioning method and device

Publications (2)

Publication Number Publication Date
CN108874360A CN108874360A (en) 2018-11-23
CN108874360B true CN108874360B (en) 2023-04-07

Family

ID=64295221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810679316.6A Active CN108874360B (en) 2018-06-27 2018-06-27 Panoramic content positioning method and device

Country Status (1)

Country Link
CN (1) CN108874360B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614613B (en) * 2018-11-30 2020-07-31 北京市商汤科技开发有限公司 Image description statement positioning method and device, electronic equipment and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN105979035A (en) * 2016-06-28 2016-09-28 广东欧珀移动通信有限公司 AR image processing method and device as well as intelligent terminal
CN106033435A (en) * 2015-03-13 2016-10-19 北京贝虎机器人技术有限公司 Article identification method and apparatus, and indoor map generation method and apparatus

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN103136545A (en) * 2011-11-22 2013-06-05 中国科学院电子学研究所 High resolution remote sensing image analysis tree automatic extraction method based on space consistency
US9311525B2 (en) * 2014-03-19 2016-04-12 Qualcomm Incorporated Method and apparatus for establishing connection between electronic devices
KR102298457B1 (en) * 2014-11-12 2021-09-07 삼성전자주식회사 Image Displaying Apparatus, Driving Method of Image Displaying Apparatus, and Computer Readable Recording Medium
TWI552892B (en) * 2015-04-14 2016-10-11 鴻海精密工業股份有限公司 Control system and control method for vehicle
CN107608652B (en) * 2017-08-28 2020-05-22 三星电子(中国)研发中心 Method and device for controlling graphical interface through voice
CN107632814A (en) * 2017-09-25 2018-01-26 珠海格力电器股份有限公司 Player method, device and system, storage medium, the processor of audio-frequency information
CN107977183A (en) * 2017-11-16 2018-05-01 百度在线网络技术(北京)有限公司 voice interactive method, device and equipment

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN106033435A (en) * 2015-03-13 2016-10-19 北京贝虎机器人技术有限公司 Article identification method and apparatus, and indoor map generation method and apparatus
CN105979035A (en) * 2016-06-28 2016-09-28 广东欧珀移动通信有限公司 AR image processing method and device as well as intelligent terminal

Also Published As

Publication number Publication date
CN108874360A (en) 2018-11-23

Similar Documents

Publication Publication Date Title
JP6893233B2 (en) Image-based data processing methods, devices, electronics, computer-readable storage media and computer programs
CN108073555B (en) Method and system for generating virtual reality environment from electronic document
US10198439B2 (en) Presenting translations of text depicted in images
US10140261B2 (en) Visualizing font similarities for browsing and navigation using a font graph
CN109189879B (en) Electronic book display method and device
CN114375435A (en) Enhancing tangible content on a physical activity surface
US20170351371A1 (en) Touch interaction based search method and apparatus
CN110090444B (en) Game behavior record creating method and device, storage medium and electronic equipment
KR20160061349A (en) Actionable content displayed on a touch screen
CN113656582B (en) Training method of neural network model, image retrieval method, device and medium
CN112214271A (en) Page guiding method and device and electronic equipment
CN115658523A (en) Automatic control and test method for human-computer interaction interface and computer equipment
WO2022237117A1 (en) Touch control method and system for interactive electronic whiteboard, and readable medium
CN109858402B (en) Image detection method, device, terminal and storage medium
CN113837257B (en) Target detection method and device
CN108874360B (en) Panoramic content positioning method and device
CN112817447B (en) AR content display method and system
CN113867875A (en) Method, device, equipment and storage medium for editing and displaying marked object
TWI506569B (en) A method for image tagging that identifies regions and behavior relationship between different objects
CN115620095A (en) Hand information labeling method, device, equipment and storage medium
CN112001380B (en) Recognition method and system for Chinese meaning phrase based on artificial intelligence reality scene
KR20150097250A (en) Sketch retrieval system using tag information, user equipment, service equipment, service method and computer readable medium having computer program recorded therefor
CN114092608A (en) Expression processing method and device, computer readable storage medium and electronic equipment
KR102026475B1 (en) Processing visual input
CN109033346B (en) Method, device, storage medium and terminal equipment for three-dimensional content presentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant