CN108874360B - Panoramic content positioning method and device - Google Patents


Info

Publication number
CN108874360B
CN108874360B (application CN201810679316.6A)
Authority
CN
China
Prior art keywords
current page
entity
image recognition
panoramic
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810679316.6A
Other languages
Chinese (zh)
Other versions
CN108874360A (en)
Inventor
杨茗名
王群
王宇亮
张苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810679316.6A
Publication of CN108874360A
Application granted
Publication of CN108874360B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481: Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F 3/04815: Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/20: Scenes; Scene-specific elements in augmented reality scenes
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the invention provide a panoramic content positioning method and device. The method comprises the following steps: performing semantic analysis on an input control voice to determine a user requirement, wherein the user requirement comprises at least one of a page to be operated, an operation object, and an operation type; if the user requirement is to operate on the current page of the panoramic content, performing image recognition on the current page to find whether an entity matching the operation object exists in the current page; and if such an entity exists, operating on the matched entity in the current page according to an interaction behavior rule and the operation type. Embodiments of the invention provide the user with a more natural and intelligent interactive experience, fill the gap left by the absence of voice control in panoramic browsing, reduce the number of user operation steps, and satisfy user requirements more accurately.

Description

Panoramic content positioning method and device
Technical Field
The invention relates to the technical field of virtual reality, in particular to a panoramic content positioning method and device.
Background
With the continuous development of VR (Virtual Reality) technology, panoramic content can be presented on more and more devices. In particular, displaying VR panoramas on the web both enriches the originally two-dimensional page content and lets users enjoy a more three-dimensional, immersive experience that is closer to real-life scenes.
On the web, the existing ways of browsing VR panoramic content, and their limitations, are as follows:
a. Finger swipe or tap: the user slides a finger across the panoramic content to view it, or taps an entry link to other panoramic material to open new panoramic content.
Limitation: the user must be in direct contact with the device, which is neither convenient nor intelligent. Browsing is also limited to what is currently visible: if the desired content is not in the current visible area, the user must drag the page repeatedly to find it and cannot position the content accurately, which increases the number of operation steps and degrades the user experience.
b. Gyroscope gravity sensing: the device's gravity-sensing function is enabled, and specific panoramic content is located by changing the position of the device.
Limitation: the user must rotate the device to different angles to see the full panoramic content. In the extreme case, a user who wants to see what is behind the panoramic viewpoint must physically turn around while holding the device, which greatly degrades the user experience.
Disclosure of Invention
The embodiment of the invention provides a panoramic content positioning method and a panoramic content positioning device, which are used for solving one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a method for positioning panoramic content, including:
performing semantic analysis on input control voice to determine user requirements, wherein the user requirements comprise at least one of an operation page, an operation object and an operation type which are required to be operated by a user;
if the user requirement is to operate the current page of the panoramic content, performing image recognition on the current page to find whether an entity matched with the operation object exists in the current page;
and if the entity matched with the operation object exists in the current page, operating the matched entity in the current page according to an interactive behavior rule and the operation type.
With reference to the first aspect, in a first implementation manner of the first aspect, an embodiment of the present invention further includes:
if the user requirement is to operate the scenes except the current page, searching whether matched scenes exist according to the panoramic relation data;
and if the matched scene is found, operating the matched scene according to the interactive behavior rule and the operation type.
With reference to the first aspect or the first implementation manner of the first aspect, in a second implementation manner of the first aspect, an embodiment of the present invention further includes:
learning the features of different entities by machine learning according to preset object attribute rules, to obtain an image recognition model;
wherein the image recognition model is used for recognizing each entity included in the panoramic content and recording the coordinates of each entity in the panoramic content.
With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the searching whether an entity matching the operation object exists in the current page includes:
inputting a two-dimensional image corresponding to the three-dimensional current panoramic content into the image recognition model;
searching whether the attribute of the operation object exists in the attributes of each entity of the current page through the image recognition model;
and if so, acquiring the coordinates of the entity corresponding to the existing attribute on the current page.
With reference to the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the searching, by the image recognition model, whether an attribute of the operation object exists in attributes of entities of the current page includes:
reconstructing a three-dimensional environment for the two-dimensional image using WebGL (web-based graphics language) technology;
and searching whether the attribute of the operation object exists in the attributes of the entities of the current page in the three-dimensional environment through the image recognition model.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, and the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the interaction behavior rule includes JSON character strings corresponding to various operation types.
In a second aspect, an embodiment of the present invention provides a panoramic content positioning apparatus, including:
the voice analysis module is used for performing semantic analysis on the input control voice to determine user requirements, wherein the user requirements comprise at least one of an operation page, an operation object and an operation type which are required to be operated by a user;
the image recognition module is used for carrying out image recognition on the current page if the user requirement obtained by the voice analysis module is to operate the current page of the panoramic content so as to search whether an entity matched with the operation object exists in the current page;
and the page interaction module is used for operating the matched entity in the current page according to the interaction behavior rule and the operation type if the entity matched with the operation object exists in the current page.
With reference to the second aspect, in a first implementation manner of the second aspect, the embodiment of the present invention further includes:
the voice analysis module is further used for searching whether a matched scene exists according to the panoramic relation data if the user needs to operate the scene except the current page;
the page interaction module is further configured to operate the matched scene according to the interaction behavior rule and the operation type if the voice analysis module finds the matched scene.
With reference to the second aspect or the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the embodiment of the present invention further includes:
the machine learning module is used for learning the features of different entities by machine learning according to preset object attribute rules, to obtain an image recognition model;
wherein the image recognition model is used for recognizing each entity included in the panoramic content and recording the coordinates of each entity in the panoramic content.
With reference to the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the image recognition module is further configured to:
inputting a two-dimensional image corresponding to the three-dimensional current panoramic content into the image recognition model; searching whether the attribute of the operation object exists in the attributes of each entity of the current page through the image recognition model; and if so, acquiring the coordinates of the entity corresponding to the existing attribute on the current page.
With reference to the third implementation manner of the second aspect, in a fourth implementation manner of the second aspect, the searching, by using the image recognition model, whether an attribute of the operation object exists in attributes of entities of the current page includes:
reconstructing a three-dimensional environment for the two-dimensional image using WebGL (web-based graphics language) technology;
and searching whether the attribute of the operation object exists in the attributes of the entities of the current page in the three-dimensional environment through the image recognition model.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, and the fourth implementation manner of the second aspect, in a fifth implementation manner of the second aspect, in the embodiment of the present invention, the interactive behavior rule includes JSON character strings corresponding to various operation types.
In a third aspect, an embodiment of the present invention provides a device for positioning panoramic content, where the function of the device may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the panoramic content locating apparatus includes a processor and a memory, the memory is used for storing a program supporting the panoramic content locating apparatus to execute the panoramic content locating method, and the processor is configured to execute the program stored in the memory. The panoramic content locating apparatus may further include a communication interface for the panoramic content locating apparatus to communicate with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer software instructions for a panoramic content positioning apparatus, which includes a program for executing the panoramic content positioning method.
One of the above technical solutions has the following advantage or beneficial effect: the method provides the user with a more natural and intelligent interactive experience, fills the gap left by the absence of voice control in panoramic browsing, reduces the number of user operation steps, and satisfies user requirements more accurately.
Another of the above technical solutions has the following advantage or beneficial effect: by training the voice and image models with AI (Artificial Intelligence) technology, voice interaction tasks can be processed in batches, and entities in the 3D scene do not need to be labeled manually.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
Fig. 1 is a flowchart of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 2 is a flowchart of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 3 is a flowchart of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 4 is a block diagram of a panoramic content locating apparatus according to an embodiment of the present invention.
Fig. 5 is an exemplary diagram of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 6 is an exemplary diagram of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 7 is an exemplary diagram of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 8 is a flowchart of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 9 is a flowchart of a panoramic content positioning method according to an embodiment of the present invention.
Fig. 10 is a block diagram illustrating a panoramic content locating apparatus according to an embodiment of the present invention.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
Fig. 1 is a flowchart of a panoramic content positioning method according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step S110, performing semantic analysis on the input control voice to determine user requirements, wherein the user requirements comprise at least one of an operation page, an operation object and an operation type which are required to be operated by a user;
step S120, if the user requirement is to operate the current page of the panoramic content, performing image recognition on the current page to find whether an entity matched with the operation object exists in the current page;
step S130, if the entity matched with the operation object exists in the current page, operating the matched entity in the current page according to the interactive behavior rule and the operation type.
In this embodiment, the user's voice input is semantically analyzed to determine the user requirement. For example, if the currently displayed panoramic content is an office image including a desk and the control voice input by the user includes "zoom in on the desk in front", it can be determined that the user requirement is to operate on the current page of the panoramic content. As another example, if the currently displayed panoramic content is a teaching building of XX University and the control voice input by the user includes "switch to the school gate of XX University", it can be determined that the user requirement is to operate on a scene other than the current page. The user requirement may also include, for example, an operation object and an operation type, which may be set according to the actual application scenario and are not limited here.
If the user requirement is to operate on the current page of the panoramic content, image recognition is performed on the current page to identify the types, positions, and so on of the entities it contains. Based on the operation object included in the user requirement, the current page is searched for a matching entity. The operation object may be any real-world object displayed in the page, such as an animal, a plant, an article of daily use, or a place. Once a matching entity is found, it can be operated on according to the operation type in the user requirement and the preset interaction behavior rule. Operation types may include: zoom in, zoom out, switch scene, view an object in the content, and so on.
For example, if the user inputs the voice "enlarge the table", the user requirement is to operate on the current page of the panoramic content, the operation object is the table, and the operation type is enlargement. The current page is therefore matched to see whether a table entity exists; if one is matched, the table is enlarged according to the interaction behavior rule corresponding to enlargement.
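The mapping from a recognized utterance to a user-requirement triple can be sketched as follows. This is a minimal illustration only: the patent's semantic analysis would use a trained model, and the keyword table and field names here are assumptions.

```javascript
// Hypothetical keyword table standing in for the semantic-analysis model.
const OPERATION_TYPES = {
  'enlarge': 'zoomIn',
  'zoom in': 'zoomIn',
  'shrink': 'zoomOut',
  'switch to': 'switchScene',
  'view': 'view',
};

// Map a voice transcript to { page, object, type }, or null if no
// user requirement can be determined.
function parseControlVoice(transcript) {
  const text = transcript.toLowerCase();
  for (const [keyword, type] of Object.entries(OPERATION_TYPES)) {
    if (text.includes(keyword)) {
      // Treat everything after the keyword as the operation object.
      const object = text.split(keyword)[1].trim();
      // Scene switches target pages other than the current one.
      const page = type === 'switchScene' ? 'other' : 'current';
      return { page, object, type };
    }
  }
  return null;
}
```

So "enlarge the table" yields an operation on the current page with object "the table" and type `zoomIn`, matching the example above.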
In one possible implementation, as shown in fig. 2, the method further includes:
step S140, if the user requirement is to operate on a scene other than the current page, searching whether a matched scene exists according to the panoramic relation data;
and S150, if the matched scene is found, operating the matched scene according to the interactive behavior rule and the operation type.
The panoramic relation data can be recorded in text form, for example as scene entries such as the gate of XX school, the teaching building of XX school, and the dining room of XX school. The text description of the panoramic relation data is stored in the interaction behavior rules; if the user requirement matches an entry, an operation instruction is sent to the page interaction module.
For example, if the currently displayed panoramic content is a teaching building of the university XX, the control voice input by the user includes "switching to the school gate of the university XX", whether a scene matching the school gate of the university XX exists or not can be searched in the panoramic relationship data, and if so, the panoramic content corresponding to the scene is opened.
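The scene lookup can be sketched as a match against the text-form relation data. The entry names and the `panorama` identifiers below are illustrative, not taken from the patent.

```javascript
// Hypothetical panoramic relation data, recorded as text entries that
// pair a scene name with the panorama to open for it.
const panoramicRelations = [
  { scene: 'XX University school gate', panorama: 'pano_gate' },
  { scene: 'XX University teaching building', panorama: 'pano_teaching' },
  { scene: 'XX University dining room', panorama: 'pano_dining' },
];

// Return the panorama identifier for a spoken scene name, or null
// when no matched scene exists.
function findMatchedScene(operationObject) {
  const wanted = operationObject.toLowerCase();
  const hit = panoramicRelations.find(
    (r) => r.scene.toLowerCase().includes(wanted)
  );
  return hit ? hit.panorama : null;
}
```

On a hit, the corresponding panoramic content is opened, as in the "school gate of XX University" example above.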
In one possible implementation manner, the method further includes:
learning the features of different entities by machine learning according to preset object attribute rules, to obtain an image recognition model;
wherein the image recognition model is used for recognizing each entity included in the panoramic content and recording the coordinates of each entity in the panoramic content.
For example, according to preset object attribute rules and in combination with AI (Artificial Intelligence) technology, the features of different entities are learned by machine learning, so that entities in the 3D (three-dimensional) scene, such as the sky, the ground, rivers, plants, animals, and houses, can be identified and their coordinates in the panoramic scene recorded.
In one possible implementation, as shown in fig. 3, step S120 includes:
step S121, inputting a two-dimensional image corresponding to the three-dimensional current panoramic content into the image recognition model;
step S122, searching whether the attribute of the operation object exists in the attributes of each entity of the current page through the image recognition model;
and S123, if the current page exists, acquiring the coordinates of the entity corresponding to the existing attribute on the current page.
In a possible implementation manner, finding whether the attribute of the operation object exists in the attributes of the entities of the current page through the image recognition model includes:
reconstructing a three-dimensional environment for the two-dimensional image using WebGL (web-based graphics language) technology;
and searching whether the attribute of the operation object exists in the attributes of the entities of the current page in the three-dimensional environment through the image recognition model.
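Steps S121 to S123 can be sketched as a lookup over the recognition model's output. The data shape here (attribute labels plus yaw/pitch coordinates) is an assumption about what the model records; the patent only says that entity attributes and coordinates are stored.

```javascript
// Hypothetical output of the image recognition model for the current
// page: each recognized entity carries an attribute label and its
// coordinates in the panorama (yaw/pitch in degrees, assumed units).
const recognizedEntities = [
  { attribute: 'desk', coords: { yaw: 12.5, pitch: -8.0 } },
  { attribute: 'window', coords: { yaw: 95.0, pitch: 3.5 } },
];

// If the operation object's attribute exists among the entities of the
// current page, return that entity's coordinates; otherwise null.
function locateEntity(entities, operationObject) {
  const hit = entities.find((e) => e.attribute === operationObject);
  return hit ? hit.coords : null;
}
```

The returned coordinates are what the page interaction module would then use to operate on the matched entity.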
In a possible implementation manner, the interactive behavior rule may include JSON (JavaScript Object Notation) strings corresponding to various operation types.
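A rule set of this kind could look like the following. The field names and values inside the JSON strings are illustrative assumptions; the patent specifies only that each operation type has a corresponding JSON character string.

```javascript
// Hypothetical interaction behavior rules: one JSON string per
// operation type, parsed on demand.
const interactionRules = {
  zoomIn:      '{"action":"scale","factor":1.5,"animate":true}',
  zoomOut:     '{"action":"scale","factor":0.67,"animate":true}',
  switchScene: '{"action":"loadPanorama","transition":"fade"}',
  view:        '{"action":"rotateTo","animate":true}',
};

// Return the parsed rule for an operation type, or null if the type
// is outside the pre-edited scope of voice interaction.
function ruleFor(operationType) {
  const raw = interactionRules[operationType];
  return raw ? JSON.parse(raw) : null;
}
```

Editing these strings in advance is what bounds the range of voice interaction, as the detailed description notes later.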
Fig. 4 is a block diagram of a panoramic content locating apparatus according to an embodiment of the present invention, the apparatus including:
a voice analysis module 41, configured to perform semantic analysis on an input control voice to determine a user requirement, where the user requirement includes at least one of an operation page, an operation object, and an operation type that a user needs to operate;
an image recognition module 43, configured to perform image recognition on a current page of the panoramic content if the user requirement obtained by the voice analysis module is to perform an operation on the current page, so as to find whether an entity matching the operation object exists in the current page;
and a page interaction module 45, configured to, if there is an entity matching the operation object in the current page, operate the matching entity in the current page according to an interaction behavior rule and the operation type.
In one possible implementation manner, the method further includes:
the voice analysis module is further used for searching whether a matched scene exists according to the panoramic relation data if the user needs to operate the scene except the current page;
the page interaction module is further configured to operate the matched scene according to the interaction behavior rule and the operation type if the voice analysis module finds the matched scene.
In one possible implementation manner, the method further includes:
the machine learning module is used for learning the features of different entities by machine learning according to preset object attribute rules, to obtain an image recognition model;
wherein the image recognition model is used for recognizing each entity included in the panoramic content and recording the coordinates of each entity in the panoramic content.
In one possible implementation, the image recognition module is further configured to:
inputting a two-dimensional image corresponding to the three-dimensional current panoramic content into the image recognition model; searching whether the attribute of the operation object exists in the attributes of each entity of the current page through the image recognition model; and if so, acquiring the coordinates of the entity corresponding to the existing attribute on the current page.
In a possible implementation manner, searching whether the attribute of the operation object exists in the attributes of the entities of the current page through the image recognition model includes:
reconstructing a three-dimensional environment for the two-dimensional image using WebGL (web-based graphics language) technology;
and searching whether the attribute of the operation object exists in the attributes of the entities of the current page in the three-dimensional environment through the image recognition model.
In a possible implementation manner, the interactive behavior rule includes JSON character strings corresponding to various operation types.
The functions of each module in each apparatus in the embodiments of the present invention may refer to the corresponding description in the above method, and are not described herein again.
In an application example, the application scenario of a voice-interaction-based web panoramic content positioning method is as follows: when browsing a panoramic page, the user clicks the VR mode icon and is prompted to enable the device's voice permission, as shown in fig. 5. After the user grants the voice permission, a voice prompt is displayed to guide the user through voice interaction, as shown in fig. 6. While the user speaks, the page displays the recognized voice input in real time, as shown in fig. 7. After the input is finished, the corresponding operation is executed, for example jumping to the next panoramic page or viewing a part of the panoramic content not currently visible on the page.
Taking the above application scenario as an example, as shown in fig. 8 and 9, the principle of implementing the panoramic content positioning method according to the embodiment of the present invention by a plurality of modules includes:
1. Panoramic two-dimensional image data is input to the image recognition module, and panoramic relation data and interaction behavior rules are input to the voice analysis module.
2. The image recognition module reconstructs the 3D environment using WebGL (Web-based Graphics Language) technology. According to preset object attribute rules and in combination with AI image recognition technology, the features of different entities are learned by machine learning. The image recognition module can then recognize entities in the 3D scene, such as the sky, the ground, rivers, plants, animals, and houses, and record the coordinates of each entity in the panoramic scene.
3. The voice analysis module analyzes the user's voice input and performs semantic analysis, determining the user requirement according to the preset interaction behavior rules. Requirements fall into two main categories: first, operations on the current panoramic content; second, operations outside the current panoramic content.
In the first case, the current page of the panoramic content is matched through the image recognition module. If the entity the user wants to operate on is matched, an operation instruction for that entity is returned to the page interaction module.
In the second case, the other panoramic content the user wants to operate on is searched for in the panoramic relation data through the voice analysis module; on a hit, an operation instruction for that panoramic content is sent to the page interaction module.
4. The page interaction module operates on the current page according to the instruction returned by the voice analysis module, relying on the interaction behavior rules.
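The four-step flow above can be wired together as follows. All module internals are stand-ins supplied by the caller; only the control flow between them is meant to mirror the description.

```javascript
// End-to-end sketch: voice analysis, then either image recognition on
// the current page or a relation-data lookup, then page interaction.
function handleVoice(transcript, modules) {
  const requirement = modules.voiceAnalysis(transcript);
  if (!requirement) return { status: 'unrecognized' };

  if (requirement.page === 'current') {
    // Category 1: operate on the current panoramic content.
    const coords = modules.imageRecognition(requirement.object);
    if (!coords) return { status: 'entity-not-found' };
    return modules.pageInteraction({ type: requirement.type, coords });
  }

  // Category 2: operate outside the current panoramic content.
  const panorama = modules.sceneLookup(requirement.object);
  if (!panorama) return { status: 'scene-not-found' };
  return modules.pageInteraction({ type: requirement.type, panorama });
}
```

Each `modules.*` function corresponds to one of the modules in figs. 8 and 9; in a real system they would be backed by the trained voice and image models.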
Through this modular flow, voice interaction behaviors can be processed in batches with the strong capabilities of machine learning and AI, providing the user with a more intelligent, convenient, and accurate way of positioning panoramic content.
In the embodiment of the invention, the interaction behavior rules can be edited in advance to define the scope of voice interaction, such as the JSON character strings corresponding to operation types like enlarging, shrinking, scene switching, and viewing. In addition, the voice and image models are continuously trained with AI technology to recognize user requirements and entity information in the 3D scene, improving accuracy.
Fig. 10 is a block diagram of a panoramic content positioning apparatus according to an embodiment of the present invention. As shown in Fig. 10, the apparatus includes a memory 910 and a processor 920, the memory 910 storing a computer program executable on the processor 920. When executing the computer program, the processor 920 implements the panoramic content positioning method of the above embodiments. There may be one or more of each of the memory 910 and the processor 920.
The device also includes:
a communication interface 930, configured to communicate with external devices for interactive data transmission.
The memory 910 may include high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 910, the processor 920, and the communication interface 930 are implemented independently, they may be connected to one another and communicate through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in Fig. 10, but this does not indicate that there is only one bus or one type of bus.
Optionally, in an implementation, if the memory 910, the processor 920 and the communication interface 930 are integrated on a chip, the memory 910, the processor 920 and the communication interface 930 may complete communication with each other through an internal interface.
An embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of the above embodiments.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without being mutually inconsistent.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. Alternate implementations, in which functions may be executed out of the order shown or discussed (including substantially concurrently or in reverse order, depending on the functionality involved), are also included within the scope of the preferred embodiments of the present invention, as would be understood by those reasonably skilled in the art.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable read-only memory (CDROM). Further, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following technologies, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method for locating panoramic content, comprising:
the voice analysis module carries out semantic analysis on input control voice to determine user requirements, wherein the user requirements comprise an operation page, an operation object and an operation type which are required to be operated by a user;
if the user requirement is to operate the current page of the panoramic content, an image recognition module performs image recognition on the current page so as to search whether an entity matched with the operation object exists in the current page; searching whether an entity matched with the operation object exists in the current page or not, wherein the searching comprises the following steps: inputting a two-dimensional image corresponding to the three-dimensional current panoramic content into an image recognition model, searching whether the attribute of the operation object exists in the attributes of each entity of the current page or not through the image recognition model, and if so, acquiring the coordinate of the entity corresponding to the existing attribute on the current page; the image recognition model is obtained through machine learning;
if the entity matched with the operation object exists in the current page, a page interaction module operates the matched entity in the current page according to an interaction behavior rule and the operation type;
if the user requirement is to operate scenes except the current page, the voice analysis module searches whether matched scenes exist in the panoramic relation data; wherein the panoramic relationship data comprises a plurality of scenes in textual form;
if the matched scene is found, the voice analysis module opens the panoramic content corresponding to the matched scene and sends an operation instruction for operating the panoramic content to the page interaction module, so that the page interaction module operates the matched scene according to the interaction behavior rule and the operation type.
2. The method of claim 1, further comprising:
according to preset object attribute rules, learning the characteristics of different entities through a machine to obtain an image recognition model;
the image recognition model is used for identifying each entity included in the panoramic content and recording the coordinates of each entity in the panoramic content.
3. The method of claim 1, wherein finding whether the attribute of the operation object exists in the attributes of the entities of the current page through the image recognition model comprises:
reconstructing a three-dimensional environment for the two-dimensional image by adopting a network-based graphic language technology;
and searching whether the attribute of the operation object exists in the attributes of the entities of the current page in the three-dimensional environment through the image recognition model.
4. The method according to any one of claims 1 to 3, wherein the interaction behavior rules include JSON character strings corresponding to various operation types.
5. A panoramic content positioning apparatus, comprising:
the voice analysis module is used for performing semantic analysis on the input control voice to determine user requirements, wherein the user requirements comprise an operation page, an operation object and an operation type which are required to be operated by a user;
the image recognition module is used for carrying out image recognition on the current page if the user requirement obtained by the voice analysis module is to operate the current page of the panoramic content so as to search whether an entity matched with the operation object exists in the current page; and for: inputting a two-dimensional image corresponding to the three-dimensional current panoramic content into an image recognition model, searching whether the attribute of the operation object exists in the attributes of each entity of the current page through the image recognition model, and if so, acquiring the coordinate of the entity corresponding to the existing attribute on the current page; the image recognition model is obtained through machine learning;
the page interaction module is used for operating the matched entity in the current page according to the interaction behavior rule and the operation type if the entity matched with the operation object exists in the current page;
the voice analysis module is further configured to search whether a matching scene exists in the panoramic relation data if the user requirement is to operate a scene other than the current page; wherein the panoramic relationship data comprises a plurality of scenes in textual form;
the voice analysis module is further configured to, if the matched scene is found, open the panoramic content corresponding to the matched scene and send an operation instruction for operating the panoramic content to the page interaction module; and the page interaction module is further configured to operate the matched scene according to the interaction behavior rule and the operation type.
6. The apparatus of claim 5, further comprising:
the machine learning module is used for learning the characteristics of different entities through a machine according to preset object attribute rules to obtain an image recognition model;
the image recognition model is used for identifying each entity included in the panoramic content and recording the coordinates of each entity in the panoramic content.
7. The apparatus of claim 5, wherein the finding, through the image recognition model, whether the attribute of the operation object exists in the attributes of the entities of the current page comprises:
reconstructing a three-dimensional environment for the two-dimensional image by adopting a network-based graphic language technology;
and searching whether the attribute of the operation object exists in the attributes of the entities of the current page in the three-dimensional environment through the image recognition model.
8. The apparatus according to any one of claims 5 to 7, wherein JSON character strings corresponding to various operation types are included in the interactive behavior rule.
9. An apparatus for panoramic content positioning, the apparatus comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-4.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 4.
CN201810679316.6A 2018-06-27 2018-06-27 Panoramic content positioning method and device Active CN108874360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810679316.6A CN108874360B (en) 2018-06-27 2018-06-27 Panoramic content positioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810679316.6A CN108874360B (en) 2018-06-27 2018-06-27 Panoramic content positioning method and device

Publications (2)

Publication Number Publication Date
CN108874360A CN108874360A (en) 2018-11-23
CN108874360B true CN108874360B (en) 2023-04-07

Family

ID=64295221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810679316.6A Active CN108874360B (en) 2018-06-27 2018-06-27 Panoramic content positioning method and device

Country Status (1)

Country Link
CN (1) CN108874360B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614613B (en) * 2018-11-30 2020-07-31 北京市商汤科技开发有限公司 Image description statement positioning method and device, electronic equipment and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN105979035A (en) * 2016-06-28 2016-09-28 广东欧珀移动通信有限公司 AR image processing method and device as well as intelligent terminal
CN106033435A (en) * 2015-03-13 2016-10-19 北京贝虎机器人技术有限公司 Article identification method and apparatus, and indoor map generation method and apparatus

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN103136545A (en) * 2011-11-22 2013-06-05 中国科学院电子学研究所 High resolution remote sensing image analysis tree automatic extraction method based on space consistency
US9311525B2 (en) * 2014-03-19 2016-04-12 Qualcomm Incorporated Method and apparatus for establishing connection between electronic devices
KR102298457B1 (en) * 2014-11-12 2021-09-07 삼성전자주식회사 Image Displaying Apparatus, Driving Method of Image Displaying Apparatus, and Computer Readable Recording Medium
TWI552892B (en) * 2015-04-14 2016-10-11 鴻海精密工業股份有限公司 Control system and control method for vehicle
CN107608652B (en) * 2017-08-28 2020-05-22 三星电子(中国)研发中心 Method and device for controlling graphical interface through voice
CN107632814A (en) * 2017-09-25 2018-01-26 珠海格力电器股份有限公司 Player method, device and system, storage medium, the processor of audio-frequency information
CN107977183A (en) * 2017-11-16 2018-05-01 百度在线网络技术(北京)有限公司 voice interactive method, device and equipment

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN106033435A (en) * 2015-03-13 2016-10-19 北京贝虎机器人技术有限公司 Article identification method and apparatus, and indoor map generation method and apparatus
CN105979035A (en) * 2016-06-28 2016-09-28 广东欧珀移动通信有限公司 AR image processing method and device as well as intelligent terminal

Also Published As

Publication number Publication date
CN108874360A (en) 2018-11-23

Similar Documents

Publication Publication Date Title
JP6893233B2 (en) Image-based data processing methods, devices, electronics, computer-readable storage media and computer programs
CN108073555B (en) Method and system for generating virtual reality environment from electronic document
US10198439B2 (en) Presenting translations of text depicted in images
US10140261B2 (en) Visualizing font similarities for browsing and navigation using a font graph
CN109189879B (en) Electronic book display method and device
CN114375435A (en) Enhancing tangible content on a physical activity surface
US20170351371A1 (en) Touch interaction based search method and apparatus
CN110090444B (en) Game behavior record creating method and device, storage medium and electronic equipment
KR20160061349A (en) Actionable content displayed on a touch screen
CN113656582B (en) Training method of neural network model, image retrieval method, device and medium
CN112214271A (en) Page guiding method and device and electronic equipment
CN115658523A (en) Automatic control and test method for human-computer interaction interface and computer equipment
WO2022237117A1 (en) Touch control method and system for interactive electronic whiteboard, and readable medium
CN109858402B (en) Image detection method, device, terminal and storage medium
CN113837257B (en) Target detection method and device
CN108874360B (en) Panoramic content positioning method and device
CN112817447B (en) AR content display method and system
CN113867875A (en) Method, device, equipment and storage medium for editing and displaying marked object
TWI506569B (en) A method for image tagging that identifies regions and behavior relationship between different objects
CN115620095A (en) Hand information labeling method, device, equipment and storage medium
CN112001380B (en) Recognition method and system for Chinese meaning phrase based on artificial intelligence reality scene
KR20150097250A (en) Sketch retrieval system using tag information, user equipment, service equipment, service method and computer readable medium having computer program recorded therefor
CN114092608A (en) Expression processing method and device, computer readable storage medium and electronic equipment
KR102026475B1 (en) Processing visual input
CN109033346B (en) Method, device, storage medium and terminal equipment for three-dimensional content presentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant