WO2021143315A1 - Scene interaction method and apparatus, electronic device, and computer storage medium - Google Patents
Scene interaction method and apparatus, electronic device, and computer storage medium
- Publication number
- WO2021143315A1 (PCT/CN2020/127750; CN2020127750W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- scene
- feature
- image
- real
- audio
- Prior art date
Links
- 230000003993 interaction Effects 0.000 title claims abstract description 73
- 238000000034 method Methods 0.000 title claims abstract description 68
- 238000003860 storage Methods 0.000 title claims abstract description 19
- 238000013507 mapping Methods 0.000 claims abstract description 87
- 230000000694 effects Effects 0.000 claims abstract description 53
- 238000000605 extraction Methods 0.000 claims abstract description 49
- 238000004891 communication Methods 0.000 claims description 38
- 230000009471 action Effects 0.000 claims description 30
- 230000004044 response Effects 0.000 claims description 21
- 230000036961 partial effect Effects 0.000 claims description 13
- 230000004927 fusion Effects 0.000 claims description 11
- 230000002452 interceptive effect Effects 0.000 claims description 11
- 230000005540 biological transmission Effects 0.000 claims description 6
- 238000003708 edge detection Methods 0.000 claims description 5
- 238000001514 detection method Methods 0.000 claims description 4
- 238000013473 artificial intelligence Methods 0.000 abstract description 10
- 238000005516 engineering process Methods 0.000 description 34
- 238000010586 diagram Methods 0.000 description 32
- 238000012545 processing Methods 0.000 description 17
- 230000006870 function Effects 0.000 description 13
- 230000010354 integration Effects 0.000 description 11
- 238000004590 computer program Methods 0.000 description 9
- 230000008569 process Effects 0.000 description 9
- 238000013527 convolutional neural network Methods 0.000 description 7
- 238000013528 artificial neural network Methods 0.000 description 6
- 238000012549 training Methods 0.000 description 6
- 238000004422 calculation algorithm Methods 0.000 description 4
- 238000004364 calculation method Methods 0.000 description 4
- 230000003287 optical effect Effects 0.000 description 4
- 230000003068 static effect Effects 0.000 description 3
- 230000001360 synchronised effect Effects 0.000 description 3
- 230000003044 adaptive effect Effects 0.000 description 2
- 238000007664 blowing Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000018109 developmental process Effects 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 230000000644 propagated effect Effects 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 238000012706 support-vector machine Methods 0.000 description 2
- 230000003190 augmentative effect Effects 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 238000012790 confirmation Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 230000001351 cycling effect Effects 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 238000011478 gradient descent method Methods 0.000 description 1
- 238000005304 joining Methods 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 239000013307 optical fiber Substances 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000000704 physical effect Effects 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 230000002441 reversible effect Effects 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/16—Image acquisition using multiple overlapping images; Image stitching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/15—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/61—Scene description
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2219/00—Indexing scheme for manipulating 3D models or images for computer graphics
- G06T2219/024—Multi-user, collaborative environment
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/57—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/16—Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
Definitions
- This application relates to the field of artificial intelligence technology, and in particular, but not exclusively, to a scene interaction method and apparatus, an electronic device, and a computer storage medium.
- The embodiments of the present application provide a scene interaction method and apparatus, an electronic device, and a computer storage medium, which not only improve interaction efficiency but also produce richer and more diverse interaction effects.
- An embodiment of the present application provides a scene interaction method executed by an electronic device. The method includes: determining at least one real scene to interact with a virtual scene; acquiring real scene information of each real scene in real time; performing feature extraction on each piece of real scene information to correspondingly obtain the scene features of each real scene; and mapping the scene features of the at least one real scene into the virtual scene according to the correspondence between the virtual scene and the real scene.
- An embodiment of the present application provides a scene interaction apparatus, which includes: a scene determination module configured to determine at least one real scene to interact with a virtual scene; an information acquisition module configured to acquire real scene information of each real scene in real time; a feature extraction module configured to perform feature extraction on each piece of real scene information to correspondingly obtain the scene features of each real scene; and a feature mapping module configured to map the scene features of the at least one real scene into the virtual scene according to the correspondence between the virtual scene and the real scene.
- FIG. 1 schematically shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application are applied.
- Fig. 3 schematically shows a schematic diagram of an application scenario in which a virtual scene interacts with a real scene in an embodiment of the present application.
- FIG. 8 schematically shows a schematic diagram of the system layout of TensorFlow in an embodiment of the present application.
- Fig. 9 schematically shows a flowchart of the steps of performing feature mapping on scene features in some embodiments of the present application.
- FIG. 10 schematically shows a flow chart of steps in an application scenario of the scene interaction method provided by an embodiment of the present application.
- FIG. 11A schematically shows a schematic diagram of a display state of the stereoscopic spatial image information collected in an embodiment of the present application.
- FIG. 12 schematically shows a schematic diagram of the matching relationship of the voice waveform diagram in an embodiment of the present application.
- Fig. 13 schematically shows a change controller used for scene interaction in an embodiment of the present application.
- Fig. 14 schematically shows a structural block diagram of a scene interaction device in some embodiments of the present application.
- Computer Vision is a science that studies how to make machines "see". More specifically, it refers to using cameras and computers in place of human eyes to perform machine vision tasks such as identifying, tracking, and measuring targets, and to further process the resulting images so that they are better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies, attempting to build artificial intelligence systems that can obtain information from images or multi-dimensional data.
- Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric recognition technologies such as face recognition and fingerprint recognition.
- Key technologies of Speech Technology (ST) include Automatic Speech Recognition (ASR), Text To Speech (TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the future direction of human-computer interaction, and voice has become one of the most promising human-computer interaction methods.
- the system architecture 100 may include a client 110, a network 120 and a server 130.
- the client 110 may include various terminal devices such as a smart phone, a tablet computer, a notebook computer, and a desktop computer.
- the server 130 may include various server devices such as a web server, an application server, and a database server.
- the network 120 may be a communication medium of various connection types that can provide a communication link between the client 110 and the server 130, for example, a wired communication link, a wireless communication link, and so on.
- the system architecture in the embodiments of the present application may have any number of clients, networks, and servers.
- the server 130 may be a server group composed of multiple server devices
- the client 110 may also be a terminal device cluster composed of multiple terminal devices distributed in the same offline activity scene or in multiple different offline activity scenes.
- the scene interaction method in the embodiments of this application can be applied to the client 110 or to the server 130, or can be executed jointly by the client 110 and the server 130, which is not specifically limited in the embodiments of this application.
- The application can include a merchant version and a user version. The merchant runs the merchant-version client and logs in to initiate an activity, while online users run the user-version client on their terminals and log in to participate synchronously.
- the server 130 is the server corresponding to the application.
- the client 110 includes the client of the merchant and the clients of the online users. The merchant forms a virtual scene through the client 110, and each user uploads the user's own information through the client 110.
- Step S210 Determine at least one real scene for interaction with the virtual scene.
- A virtual scene can be an online activity scene that is displayed to users through terminal devices with display interfaces, such as mobile phones and computers, and that interacts with online users through network communication, while a real scene is an offline activity scene that interacts with the corresponding online activity scene.
- one virtual scene can interact with one real scene alone, or can also interact with two or more real scenes at the same time.
- Fig. 3 schematically shows a schematic diagram of an application scenario in which a virtual scene interacts with a real scene in an embodiment of the present application.
- the virtual scene 310 may be connected to at least one real scene 320 through network communication, so as to achieve simultaneous interaction with the at least one real scene 320.
- the virtual scene 310 shown in the figure is an application scene of a virtual lottery.
- the virtual scene 310 may also be any of various application scenes such as a virtual turntable, virtual bubble blowing, virtual car driving, and virtual voting.
- Step S220 Acquire real scene information of each real scene in real time.
- FIG. 4 schematically shows a schematic diagram of a real-time interactive scene communication model established based on WebSocket in an embodiment of the present application.
- The WebSocket protocol is a network protocol based on TCP. Like HTTP, it is an application-layer protocol, and it implements full-duplex communication between the browser and the server, that is, it allows the server to actively push information to the client.
- the communication model may include an application layer 410, a Socket abstraction layer 420, a transport layer 430, a network layer 440, and a link layer 450.
- the application layer 410 includes multiple user processes, and is mainly responsible for providing user interfaces and service support.
- the Socket abstraction layer 420 abstracts the complex operations of the TCP/IP layer into a few simple interfaces called by the application layer 410 to implement process communication in the network.
- the transport layer 430 includes a connection-oriented TCP protocol and a connectionless UDP protocol, and is mainly responsible for the transmission of the entire message from process to process.
- The UDP protocol (User Datagram Protocol) provides applications with a method to send encapsulated IP datagrams without establishing a connection. UDP and TCP are the two main complementary protocols in the transport layer 430.
- the network layer 440 includes the ICMP protocol, the IP protocol, and the IGMP protocol, and is mainly responsible for routing and transmitting packet data between hosts or between routers and switches.
- the link layer 450 includes an ARP protocol, a hardware interface, and a RARP protocol, and is mainly responsible for establishing and managing links between nodes, and is used to transform an error-prone physical channel into an error-free data link that can reliably transmit data frames.
- The ARP protocol (Address Resolution Protocol) is used to resolve the physical address (MAC address) of the target hardware device 460 from the IP address of the target hardware device 460, and the RARP protocol is used to convert a physical address into an IP address.
- Fig. 5 schematically shows a communication sequence diagram based on the WebSocket protocol in an embodiment of the present application.
- The WebSocket client 510 first sends a connection request 51 (connecting) to the TCP client 520. Based on the connection request 51, the TCP client 520 sends a Synchronize Sequence Numbers (SYN) message 52 to the TCP server 530, and the TCP server 530 responds to the TCP client 520 with a SYN+ACK data packet 53 formed by a synchronization sequence number message and an acknowledgement character (ACK).
- After receiving the SYN+ACK data packet 53, the TCP client 520 sends an ACK data packet (not shown in the figure) to the TCP server 530, and at the same time returns a connection confirmation message 54 (connected) to the WebSocket client 510.
- In this way, the WebSocket client 510 and the TCP client 520 complete a handshake 55 (handshake); the TCP client 520 and the TCP server 530 then perform message sending 56 (send) and message receiving 57 (receive), through which the TCP server 530 and the WebSocket server 540 carry out their communication interaction.
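- As an illustration of this real-time channel, the following is a minimal sketch of how a client in the real scene might push captured scene information to the server over WebSocket. The endpoint URL, message format, and the use of the third-party `websockets` library are assumptions for illustration only, not part of the patent.

```python
# Minimal sketch (illustrative only): pushing real-scene captures to a server
# over a full-duplex WebSocket connection. URL and payload are placeholders.
import asyncio
import json

import websockets  # third-party library: pip install websockets


async def push_scene_info(uri: str, frames):
    # Open a WebSocket connection; the TCP/WebSocket handshake is handled internally.
    async with websockets.connect(uri) as ws:
        for frame in frames:
            # Each message carries one time-stamped capture from the real scene.
            await ws.send(json.dumps(frame))
            # Full duplex: the server can push interaction results back at any time.
            reply = await ws.recv()
            print("server replied:", reply)


if __name__ == "__main__":
    demo_frames = [{"t": 0, "camera": 1, "note": "placeholder frame"}]
    asyncio.run(push_scene_info("ws://example.com/scene", demo_frames))
```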
- Step S230 Perform feature extraction on each real scene information, so as to correspondingly obtain the scene features of each real scene.
- the scene features obtained through feature extraction in this step may include at least one of image features and audio features.
- In this step, the image information and audio information in the real scene information can first be acquired; feature extraction is then performed on the image information to obtain the image features of the real scene, and on the audio information to obtain the audio features of the real scene.
- The scene image features are related to information such as the event venue and event background of the real scene; for example, they can reflect whether the real scene is an indoor scene or an outdoor scene, or whether it is a specific shopping mall or an open-air plaza.
- Character image features are related to people participating in offline activities in real scenes. For example, it is possible to track activity participants such as hosts, guests or audiences in real scenes based on face recognition.
- The action image features are related to the physical actions of people at the activity site; for example, a specific posture or gesture can represent a designated activity instruction.
- voice recognition can be performed on audio information to obtain text audio features of a real scene
- waveform detection of audio information can be performed to obtain waveform audio features of a real scene.
- the text audio feature is related to the voice content such as the dialogue of the participants in the activity in the real scene. For example, it may be a text character obtained by performing voice recognition on the relevant voice content or a specific character code.
- Waveform audio features are related to background music, sound effects, and live event atmosphere in the real scene, and can reflect the noisy or quiet state of the real scene, for example.
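- As an illustration of the two audio features described above, the following sketch extracts a text feature via speech recognition and a simple waveform (energy-envelope) feature from a WAV clip. The SpeechRecognition library, the Google recognizer, and the 16-bit PCM assumption are stand-ins for whatever ASR service and audio format an implementation actually uses.

```python
# Illustrative sketch: extracting text audio features and waveform audio features.
import wave

import numpy as np
import speech_recognition as sr  # third-party library: pip install SpeechRecognition


def extract_audio_features(wav_path: str):
    # Text audio feature: run automatic speech recognition over the clip.
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        clip = recognizer.record(source)
    try:
        text_feature = recognizer.recognize_google(clip, language="zh-CN")
    except sr.UnknownValueError:
        text_feature = ""  # nothing intelligible was spoken

    # Waveform audio feature: a coarse energy envelope of the raw samples
    # (assumes 16-bit PCM), usable later for background-music matching.
    with wave.open(wav_path, "rb") as wav:
        samples = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)
    frame = 1024
    envelope = np.array([np.abs(samples[i:i + frame]).mean()
                         for i in range(0, len(samples), frame)])
    return text_feature, envelope
```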
- Step S240 According to the corresponding relationship between the virtual scene and the real scene, map the scene feature of the at least one real scene to the virtual scene.
- the various scene features extracted in step S230 can be mapped to the virtual scene through a specified feature mapping method according to the corresponding relationship between the virtual scene and the real scene.
- The image features can be mapped into the virtual scene as corresponding virtual images such as a virtual background or virtual characters, and the audio features can be mapped into the virtual scene as background music, sound effects, or voice commands of the virtual scene, so as to realize the interaction between the real scene and the virtual scene in terms of scene content.
- Through the integration of offline recognition with online virtual scenes, combined with video technology, voice technology, and physical remote-sensing technology, the interest and interactivity of the activity can be enhanced. Event participants from different regions can also be integrated into one virtual scene for remote interaction, which strengthens the influence of the event on brand marketing, improves user participation in the event as well as the interest in and control over the event, enhances the value of the event, and supports a very wide range of application scenarios.
- In this way, the core features of the real scene can be displayed in the virtual scene and used for interaction.
- the image information obtained from the real scene information can generally be a dynamic video image collected by an image acquisition device such as a camera, and the same real scene can be imaged by multiple cameras at different positions.
- dynamic video images can be spliced and converted in advance to form static images.
- Fig. 6 schematically shows a flow chart of steps for feature extraction of image information in some embodiments of the present application. As shown in FIG. 6, on the basis of the above embodiments, the feature extraction of image information may include the following steps:
- Step S610 Acquire partial images of the real scene corresponding to different image acquisition parameters from the image information.
- The image capture parameters may include at least one of an image capture angle and an image capture range. For example, in the same real scene, multiple cameras with different image capture angles and image capture ranges can be arranged to shoot at the same time, and the video images captured by each camera are all partial images of the real scene.
- Step S620 Perform image stitching on the partial images belonging to the same time interval to obtain a fusion image of the real scene.
- The collected video can be segmented according to a preset time length to obtain partial images corresponding to different time intervals. Then, the partial images of the real scene that correspond to different image acquisition parameters and belong to the same time interval are stitched together to obtain a fused image of the real scene.
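- A minimal sketch of this stitching step is given below, assuming OpenCV's high-level stitcher is an acceptable implementation choice; `partial_images` stands for the frames captured in one time interval by cameras with different acquisition parameters.

```python
# Illustrative sketch of step S620: fuse partial images from one time interval.
import cv2


def fuse_partial_images(partial_images):
    # OpenCV 4.x API; on 3.x the factory is cv2.createStitcher() instead.
    stitcher = cv2.Stitcher_create()
    status, fused = stitcher.stitch(partial_images)
    if status != cv2.Stitcher_OK:
        # Fall back to the widest single view if stitching fails.
        return max(partial_images, key=lambda img: img.shape[1])
    return fused
```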
- Step S630 Perform feature extraction on the fused image to obtain image features of the real scene.
- edge detection may be performed on the fused image to obtain a characteristic area in the fused image, and then the characteristic area may be extracted to obtain image characteristics of a real scene. Edge detection can narrow the range of feature extraction and improve the speed and accuracy of feature extraction.
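- The edge-detection step might look like the following sketch, which uses OpenCV's Canny detector and contour extraction to narrow feature extraction down to a few candidate regions; the thresholds and the minimum-area filter are illustrative assumptions.

```python
# Illustrative sketch: locate candidate feature regions in the fused image.
import cv2


def find_feature_regions(fused_image, min_area=500):
    gray = cv2.cvtColor(fused_image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)                 # high-contrast edges
    # OpenCV 4.x returns (contours, hierarchy).
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    regions = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w * h >= min_area:                         # ignore tiny fragments
            regions.append(fused_image[y:y + h, x:x + w])
    return regions
```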
- FIG. 7 schematically shows a schematic diagram of the principle of extracting image features using a CNN model in an embodiment of the present application.
- the input image of the CNN model is a fused image 710 in a time interval after image stitching.
- The CNN model includes one or more convolutional layers 720, and may further include one or more pooling layers 730 and one or more other network structures 740 (for example, in some embodiments, the other network structure 740 can be a fully connected layer). After these network layers perform feature extraction and feature mapping layer by layer, the image features corresponding to the fused image 710 are finally output.
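- A minimal sketch of such a convolution-plus-pooling pipeline, written with the low-level tf.nn operations mentioned later in this description, is shown below; the filter shapes and feature-vector size are illustrative assumptions rather than values from the patent.

```python
# Illustrative sketch of the CNN feature extractor in Figure 7 (TensorFlow 2, eager mode).
import tensorflow as tf


def extract_image_features(fused_image_batch):
    # fused_image_batch: float32 tensor of shape [batch, height, width, 3]
    filters = tf.Variable(tf.random.normal([3, 3, 3, 16]))       # 3x3 kernels, 16 feature maps
    x = tf.nn.conv2d(fused_image_batch, filters, strides=1, padding="SAME")
    x = tf.nn.relu(x)
    x = tf.nn.max_pool2d(x, ksize=2, strides=2, padding="SAME")  # pooling layer 730
    # "Other network structures" 740: here a fully connected projection
    # to a fixed-length feature vector.
    x = tf.reshape(x, [tf.shape(x)[0], -1])
    return tf.keras.layers.Dense(128)(x)
```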
- FIG. 8 schematically shows a schematic diagram of the system layout of TensorFlow in the embodiment of the present application.
- A TensorFlow cluster 810 contains multiple TensorFlow servers 811 (TF Server). These TensorFlow servers 811 are divided into a series of task groups (jobs), and each job is responsible for processing a series of tasks.
- a TensorFlow cluster 810 generally focuses on a relatively high-level goal, such as training a neural network with multiple machines in parallel.
- a job will contain a series of Tasks dedicated to the same goal.
- the job n corresponding to the parameter server 812 (Parameter Server) is used to process the work related to storing and updating network parameters.
- job0...job n-1 corresponding to each computing server 813 (workers) will be used to carry those stateless nodes that are computationally intensive.
- Tasks in a job will run on different machines.
- a task is generally associated with the processing of a single TensorFlow server, belongs to a specific job and has a unique index in the task list of the job.
- A TensorFlow server runs the grpc_tensorflow_server process; it is a member of a cluster and exposes a Master Service and a Worker Service to the outside.
- the Master Service is a remote procedure call protocol (Remote Procedure Call, RPC) service used to interact with a series of remote distributed devices.
- the Master Service implements a session interface for session (Session), that is, the tensorflow::Session interface, and is used to coordinate multiple Worker services.
- Worker service is a remote procedure call service that executes part of the TensorFlow calculation graph (TF graph).
- the TensorFlow client 820 (Client) generally builds a TensorFlow calculation graph and uses the tensorflow::Session interface to complete the interaction with the TensorFlow cluster.
- TensorFlow clients are generally written in Python or C++.
- a TensorFlow client can interact with multiple TensorFlow servers at the same time, and a TensorFlow server can also serve multiple TensorFlow clients at the same time.
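- The parameter-server/worker layout in Figure 8 is typically declared with a cluster specification such as the sketch below; the host names, ports, and task counts are placeholders, and the TF1-style server API is used only as one possible way to launch the grpc_tensorflow_server processes described above.

```python
# Illustrative sketch: declaring the ps/worker jobs of a TensorFlow cluster.
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],             # parameter server job: stores/updates parameters
    "worker": ["worker0.example.com:2222",      # compute-heavy, mostly stateless tasks
               "worker1.example.com:2222"],
})

# Each server process joins the cluster as one task of one job and exposes the
# Master Service and Worker Service to clients and to the other tasks.
server = tf.distribute.Server(cluster, job_name="worker", task_index=0)
# server.join()  # block and serve requests
```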
- the sample data can be used to train the neural network.
- For example, a large number of offline activity scene videos can be recorded, or generated through simulation, as training samples. By calling tf.nn.conv2d in TensorFlow, large numbers of videos and pictures can then be used for training.
- OpenCV can be used to identify the edges in the images, and each identified block carries certain shape data.
- Iterative training with the sample data continuously updates and optimizes the network parameters of the neural network. For example, suppose a certain network layer involves the formula a*0.5+b; the iterative update process for this formula is as follows:
- Here y is the predicted value, which lies between -1 and +1, and t is the target value (-1 or +1). Values of the product t·y greater than 1 are not rewarded; that is, the classifier is discouraged from being over-confident, and a correctly classified sample whose distance from the dividing line already exceeds 1 earns no additional reward.
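- The behaviour described above corresponds to a hinge-style loss, sketched below for illustration (predictions y in [-1, +1], targets t in {-1, +1}); averaging over a batch is an assumption.

```python
# Illustrative hinge-style loss: zero loss (and zero gradient) once t*y exceeds 1,
# so the classifier gets no reward for over-confidence.
import tensorflow as tf


def hinge_loss(y, t):
    return tf.reduce_mean(tf.maximum(0.0, 1.0 - t * y))
```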
- tf.train.GradientDescentOptimizer can be used as an optimizer for implementing the gradient descent algorithm in TensorFlow.
- the gradient descent algorithm can use any one of standard gradient descent GD, batch gradient descent BGD, and stochastic gradient descent SGD.
- Assume the network parameters to be learned are W, the loss function is J(W), the partial derivative of the loss function with respect to the network parameters (that is, the gradient) is dJ(W), and the learning rate is η. The gradient descent formula for updating the network parameters is then W ← W − η·dJ(W).
- That is, the network parameters are adjusted along the descending direction of the gradient so as to minimize the loss function. The basic strategy is to find the fastest downhill path within a limited field of view: each step follows the steepest gradient direction at the current position in order to determine the next step.
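- A minimal sketch of this update with the optimizer named above is given below; it uses the TF1-style API (via tf.compat.v1) because tf.train.GradientDescentOptimizer belongs to that API, and the model and loss are placeholders rather than the patent's actual network.

```python
# Illustrative sketch: W <- W - eta * dJ(W) via tf.train.GradientDescentOptimizer.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

W = tf.Variable(tf.random_normal([10, 1]), name="W")               # network parameters W
loss = tf.reduce_mean(tf.square(tf.matmul(tf.ones([4, 10]), W)))   # stand-in for J(W)

eta = 0.01                                                         # learning rate
train_op = tf.train.GradientDescentOptimizer(learning_rate=eta).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)   # each step moves W along -eta * dJ(W)
```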
- FIG. 9 schematically shows a flowchart of steps for feature mapping of scene features in some embodiments of the present application.
- In some embodiments, in step S240, mapping the scene features of the at least one real scene into the virtual scene according to the correspondence between the virtual scene and the real scene may include the following steps:
- Step S910 According to the corresponding relationship between the virtual scene and the real scene, a feature mapping area corresponding to each real scene is determined in the virtual scene.
- a part of the designated scene display area can be determined as the feature mapping area corresponding to the real scene.
- each real scene can determine a corresponding feature mapping area in the virtual scene.
- These feature mapping areas can be display areas spaced apart from one another, or they can be partially or completely overlapping display areas.
- Step S920 Display the scene content that has a mapping relationship with the scene feature of the corresponding real scene in the feature mapping area.
- the feature mapping area includes a first feature mapping area and a second feature mapping area.
- the first feature mapping area and the second feature mapping area may be completely overlapping display areas, partially overlapping display areas, or completely non-overlapping, spaced-apart display areas.
- the image response content that has a mapping relationship with the image feature may be displayed in the first feature mapping area.
- the audio response content that has a mapping relationship with the audio feature can be displayed in the second feature mapping area.
- At least one of scene image features, character image features, and action image features can be obtained from the image features; then the virtual background image that has a mapping relationship with the scene image features, the virtual character image that has a mapping relationship with the character image features, and the action response content that has a mapping relationship with the action image features are displayed in the first feature mapping area.
- When the image features include scene image features, character image features, and action image features, these multiple image features can be displayed in the same first feature mapping area at the same time, or displayed respectively in different first feature mapping areas.
- For example, as action response content, the virtual lottery wheel in the virtual scene can be controlled to start rotating.
- Text audio features and waveform audio features can be obtained from the audio features; the text response content that has a mapping relationship with the text audio features and the audio dynamic effects that have a mapping relationship with the waveform audio features are then displayed in the second feature mapping area.
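- A simplified sketch of this mapping step is shown below. The mapping tables, the feature dictionary layout, and the render() interface of the virtual-scene engine are hypothetical placeholders used only to illustrate how extracted features could drive the content of each feature mapping area.

```python
# Illustrative sketch of steps S910/S920: dispatch scene features into the
# feature mapping areas of the virtual scene.
ACTION_RESPONSES = {"raise_hand": "spin_lottery_wheel", "wave": "start_voting"}
TEXT_COMMANDS = {"start lottery": "spin_lottery_wheel", "start voting": "start_voting"}


def map_features_to_virtual_scene(scene_features, virtual_scene):
    # scene_features: dict mapping real_scene_id -> extracted feature dict
    for real_scene_id, features in scene_features.items():
        area = virtual_scene.feature_area(real_scene_id)        # per-scene mapping area
        if "scene_image" in features:
            area.render("virtual_background", features["scene_image"])
        if "person_image" in features:
            area.render("virtual_character", features["person_image"])
        action = features.get("action")
        if action in ACTION_RESPONSES:
            area.render("action_response", ACTION_RESPONSES[action])
        text = features.get("text", "").lower()
        if text in TEXT_COMMANDS:                               # voice command, e.g. "start lottery"
            area.render("action_response", TEXT_COMMANDS[text])
```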
- FIG. 10 schematically shows a flow chart of steps in an application scenario of the scene interaction method provided by an embodiment of the present application.
- This method can be mainly applied to server devices that dynamically control virtual scenes.
- the method for scene interaction in this application scenario mainly includes the following steps:
- Step S1010 Turn on multiple cameras and multiple microphones in an offline scene. Collect stereo spatial image information related to user actions and other activities through multiple cameras, and collect stereo voice information related to user voice and other activities through multiple microphones.
- FIG. 11A schematically shows a schematic diagram of the display state of the stereoscopic spatial image information collected in an embodiment of the present application.
- the stereoscopic spatial image information collected by multiple cameras not only includes a person, but also includes the place where the person is located.
- the scene can also include more detailed information such as character actions and expressions.
- Step S1020 Receive image information and voice information in real time via WebSocket.
- Step S1030 Perform person recognition, action recognition and scene recognition on the image information.
- Step S1040 Through index traversal, the local area of the virtual scene is dynamically changed.
- The feature area can be cut out (matted) according to the image features obtained in real time. After matting, the matted image features from each client are uniformly dispatched to the virtual scene of the activity, and the characters in each real scene and their actions are projected into the virtual scene through calculation, so that the virtual scene fits the actual activity type.
- FIG. 11B schematically shows a schematic diagram of the display state of the virtual scene after fusion of the content of the real scene in the embodiment of the present application.
- the actual scene characters in the offline activity scene are placed in the virtual scene in the form of real scene objects 1110, and are presented to the user together with the virtual scene objects 1120 generated in the virtual scene.
- The character actions and posture of the real scene object 1110 change in real time following the actual person in the scene, and the virtual scene object 1120 can be configured and adjusted according to the actual activity type.
- Step S1050 The voice information is recognized and converted into text, and a voice waveform diagram is obtained.
- the text part can be used to form voice commands, such as "start lottery", “start voting” and so on.
- the voice waveform graph can be used to match the background music to which it is adapted.
- FIG. 12 schematically shows a schematic diagram of the matching relationship between the voice waveform graph and the background music. As shown in FIG. 12, according to the voice waveform diagram 121 obtained from the voice information, a similar matching waveform diagram 122 can be obtained, and the corresponding background music can be determined based on the matching waveform diagram.
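- One simple way to realize this matching, sketched below, is to compare the live voice envelope against a library of reference envelopes by normalized similarity; the candidate library and the envelope representation are illustrative assumptions.

```python
# Illustrative sketch: match a live voice waveform against candidate
# background-music waveforms (cf. Figure 12) by normalized similarity.
import numpy as np


def match_background_music(live_envelope, music_library):
    # music_library: dict mapping track name -> reference envelope (1-D array)
    live = (live_envelope - np.mean(live_envelope)) / (np.std(live_envelope) + 1e-8)
    best_track, best_score = None, -np.inf
    for name, ref in music_library.items():
        ref_n = (ref - np.mean(ref)) / (np.std(ref) + 1e-8)
        n = min(len(live), len(ref_n))
        score = float(np.dot(live[:n], ref_n[:n]) / n)   # similarity of the two shapes
        if score > best_score:
            best_track, best_score = name, score
    return best_track
```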
- Step S1060 Through index traversal, the music dynamics of the virtual scene are dynamically changed.
- The background music of the virtual scene can be matched according to the live voice waveform. For example, if the offline event scene is relatively quiet, the background music can be switched to more soothing music according to the matching result. In some embodiments, it is also possible to cut out the feature mapping area according to the image features obtained in real time; after the cut-out, the image features cut out from each client are uniformly scheduled into the virtual scene corresponding to the current activity, and the actions of the characters in each real scene are projected into the virtual scene through calculation, so that the virtual scene suits the actual activity type. At the same time, the background music of the activity can also be matched according to the voice information collected in the real scene.
- Fig. 13 schematically shows a change controller used for scene interaction in an embodiment of the present application.
- an MCU controller 1310 based on a Microcontroller Unit (MCU) can use hardware devices in the form of the Internet of Things to interactively control the physical scenes at the event site.
- Data communication can be carried out at the event site through the Bluetooth communication module 1320 or other types of short-range communication devices.
- the sensor 1330 can detect and collect interactive experience information at the event site.
- the vibration module 1340 can provide physical vibration effects at the event site.
- the lighting module 1350 can provide light visual effects at the event site, and the speaker 1360 can provide music effects at the event site.
- To sum up, the scene interaction method uses TensorFlow to recognize offline scenes and people, transmits the results to an online server for display on terminal screens, and integrates offline people and scenes with online virtual scenes for interaction, covering application scenarios such as virtual lotteries, virtual turntables, virtual bubble blowing, virtual driving, virtual cycling, and voting. Through the integration of offline recognition with online virtual scenes, combined with video technology, voice technology, and physical remote-sensing technology, the method enhances the interest and interactivity of the event, allows customers to integrate event participants from different regions into one virtual scene for remote interaction, strengthens the influence of the event on brand marketing, improves user participation in the event as well as the interest in and control over the event, enhances the value of the event, and supports a very wide range of application scenarios.
- Fig. 14 schematically shows a structural block diagram of a scene interaction device in some embodiments of the present application.
- the scene interaction device 1400 may mainly include:
- the scene determination module 1410 is configured to determine at least one real scene for interaction with the virtual scene.
- the information acquisition module 1420 is configured to acquire real scene information of each of the real scenes in real time.
- the feature extraction module 1430 is configured to perform feature extraction on each of the real scene information, so as to correspondingly obtain the scene features of each of the real scenes.
- the feature mapping module 1440 is configured to map the scene feature of the at least one real scene to the virtual scene according to the corresponding relationship between the virtual scene and the real scene.
- the scene features include at least one of image features and audio features.
- In some embodiments, the feature extraction module 1430 includes: an information extraction unit configured to obtain the image information and audio information in each piece of real scene information; an image feature extraction unit configured to perform feature extraction on the image information to obtain the image features of the real scene; and an audio feature extraction unit configured to perform feature extraction on the audio information to obtain the audio features of the real scene.
- In some embodiments, the image feature extraction unit includes: a scene recognition subunit configured to perform scene recognition on the image information to obtain the scene image features of the real scene; a face recognition subunit configured to perform face recognition on the image information to obtain the character image features of the real scene; a character action recognition subunit configured to perform character action recognition on the image information to obtain the action image features of the real scene; and a first determining subunit configured to determine the scene image features, the character image features, and the action image features as the image features of the real scene.
- In some embodiments, the image feature extraction unit includes: a partial image acquisition subunit configured to acquire, from the image information, partial images of the real scene corresponding to different image acquisition parameters; an image stitching subunit configured to perform image stitching on the partial images belonging to the same time interval to obtain a fused image of the real scene; and an image feature extraction subunit configured to perform feature extraction on the fused image to obtain the image features of the real scene.
- the image acquisition parameter includes at least one of an image acquisition angle and an image acquisition range.
- In some embodiments, the image feature extraction subunit includes: an edge detection subunit configured to perform edge detection on the fused image to obtain a feature region in the fused image; and a feature extraction subunit configured to perform feature extraction on the feature region to obtain the image features of the real scene.
- In some embodiments, the audio feature extraction unit includes: a voice recognition subunit configured to perform voice recognition on the audio information to obtain the text audio features of the real scene; a waveform detection subunit configured to perform waveform detection on the audio information to obtain the waveform audio features of the real scene; and a second determining subunit configured to determine the text audio features and the waveform audio features as the audio features of the real scene.
- In some embodiments, the feature mapping module 1440 includes: an area determination unit configured to determine, in the virtual scene, a feature mapping area corresponding to each real scene according to the correspondence between the virtual scene and the real scene; and a content display unit configured to display, in the feature mapping area, the scene content that has a mapping relationship with the scene features of the corresponding real scene.
- In some embodiments, the feature mapping area includes a first feature mapping area and a second feature mapping area, and the content display unit includes: an image response content display subunit configured to, when the scene feature is an image feature, display, in the first feature mapping area, the image response content that has a mapping relationship with the image feature; and an audio response content display subunit configured to, when the scene feature is an audio feature, display, in the second feature mapping area, the audio response content that has a mapping relationship with the audio feature.
- In some embodiments, the image response content display subunit includes: an image feature acquisition subunit configured to acquire at least one of scene image features, character image features, and action image features from the image features; a virtual background image display subunit configured to display, in the feature mapping area, a virtual background image that has a mapping relationship with the scene image features; a virtual character image display subunit configured to display, in the feature mapping area, a virtual character image that has a mapping relationship with the character image features; and an action response content display subunit configured to display, in the first feature mapping area, the action response content that has a mapping relationship with the action image features.
- In some embodiments, the audio response content display subunit includes: an audio feature acquisition subunit configured to acquire text audio features and waveform audio features from the audio features; a text response content display subunit configured to display, in the second feature mapping area, the text response content that has a mapping relationship with the text audio features; and an audio dynamic effect display subunit configured to display, in the second feature mapping area, the audio dynamic effects that have a mapping relationship with the waveform audio features.
- In some embodiments, the information acquisition module 1420 includes: a link establishment unit configured to establish, between the virtual scene and the real scene, a real-time communication link for real-time communication based on a full-duplex communication protocol over the Transmission Control Protocol; and a link communication unit configured to use the real-time communication link to obtain the real scene information of the real scene.
- FIG. 15 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
- The computer system 1500 includes a central processing unit (CPU) 1501, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1502 or a program loaded from a storage part 1508 into a random access memory (RAM) 1503. The RAM 1503 also stores various programs and data required for system operation.
- the CPU 1501, ROM 1502, and RAM 1503 are connected to each other through a bus 1504.
- An input/output (Input/Output, I/O) interface 1505 is also connected to the bus 1504.
- The following components are connected to the I/O interface 1505: an input part 1506 including a keyboard, a mouse, and the like; an output part 1507 including a cathode ray tube (CRT), a liquid crystal display (LCD), speakers, and the like; a storage part 1508 including a hard disk and the like; and a communication part 1509 including a network interface card such as a LAN (Local Area Network) card and a modem.
- the communication part 1509 performs communication processing via a network such as the Internet.
- the drive 1510 is also connected to the I/O interface 1505 as needed.
- a removable medium 1511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 1510 as required, so that the computer program read therefrom is installed into the storage portion 1508 as required.
- the example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (for example, a CD-ROM, a USB flash drive, or a removable hard disk) or on the network, and includes several instructions to cause a computing device (which can be a personal computer, a server, a touch terminal, a network device, or the like) to execute the method according to the embodiments of the present application.
Abstract
Description
Claims (16)
- A scene interaction method, executed by an electronic device, comprising: determining at least one real scene to interact with a virtual scene; acquiring real scene information of each of the real scenes in real time; performing feature extraction on each piece of the real scene information to correspondingly obtain scene features of each of the real scenes; and mapping the scene features of the at least one real scene into the virtual scene according to a correspondence between the virtual scene and the real scene.
- The scene interaction method according to claim 1, wherein the scene features comprise at least one of image features and audio features.
- The scene interaction method according to claim 2, wherein performing feature extraction on each piece of the real scene information to correspondingly obtain the scene features of each of the real scenes comprises: acquiring image information and audio information in each piece of the real scene information; performing feature extraction on the image information to obtain image features of the real scene; and performing feature extraction on the audio information to obtain audio features of the real scene.
- The scene interaction method according to claim 3, wherein performing feature extraction on the image information to obtain the image features of the real scene comprises: performing scene recognition on the image information to obtain scene image features of the real scene; performing face recognition on the image information to obtain character image features of the real scene; performing character action recognition on the image information to obtain action image features of the real scene; and determining the scene image features, the character image features, and the action image features as the image features of the real scene.
- The scene interaction method according to claim 3, wherein performing feature extraction on the image information to obtain the image features of the real scene comprises: acquiring, from the image information, partial images of the real scene corresponding to different image acquisition parameters; performing image stitching on the partial images belonging to the same time interval to obtain a fused image of the real scene; and performing feature extraction on the fused image to obtain the image features of the real scene.
- The scene interaction method according to claim 5, wherein the image acquisition parameters comprise at least one of an image acquisition angle and an image acquisition range.
- The scene interaction method according to claim 5, wherein performing feature extraction on the fused image to obtain the image features of the real scene comprises: performing edge detection on the fused image to obtain a feature region in the fused image; and performing feature extraction on the feature region to obtain the image features of the real scene.
- The scene interaction method according to claim 3, wherein performing feature extraction on the audio information to obtain the audio features of the real scene comprises: performing voice recognition on the audio information to obtain text audio features of the real scene; performing waveform detection on the audio information to obtain waveform audio features of the real scene; and determining the text audio features and the waveform audio features as the audio features of the real scene.
- The scene interaction method according to claim 1, wherein mapping the scene features of the at least one real scene into the virtual scene according to the correspondence between the virtual scene and the real scene comprises: determining, in the virtual scene, a feature mapping area corresponding to each of the real scenes according to the correspondence between the virtual scene and the real scene; and displaying, in the feature mapping area, scene content that has a mapping relationship with the scene features of the corresponding real scene.
- The scene interaction method according to claim 9, wherein the feature mapping area comprises a first feature mapping area and a second feature mapping area; and displaying, in the feature mapping area, the scene content that has a mapping relationship with the scene features of the corresponding real scene comprises: when the scene feature is an image feature, displaying, in the first feature mapping area, image response content that has a mapping relationship with the image feature; and when the scene feature is an audio feature, displaying, in the second feature mapping area, audio response content that has a mapping relationship with the audio feature.
- The scene interaction method according to claim 10, wherein displaying, in the first feature mapping area, the image response content that has a mapping relationship with the image feature comprises: acquiring at least one of scene image features, character image features, and action image features from the image features; displaying, in the feature mapping area, a virtual background image that has a mapping relationship with the scene image features; displaying, in the feature mapping area, a virtual character image that has a mapping relationship with the character image features; and displaying, in the first feature mapping area, action response content that has a mapping relationship with the action image features.
- The scene interaction method according to claim 10, wherein displaying, in the second feature mapping area, the audio response content that has a mapping relationship with the audio feature comprises: acquiring text audio features and waveform audio features from the audio features; displaying, in the second feature mapping area, text response content that has a mapping relationship with the text audio features; and displaying, in the second feature mapping area, audio dynamic effects that have a mapping relationship with the waveform audio features.
- The scene interaction method according to claim 1, wherein acquiring the real scene information of the real scene in real time comprises: establishing, between the virtual scene and the real scene, a real-time communication link for real-time communication based on a full-duplex communication protocol over the Transmission Control Protocol; and using the real-time communication link to acquire the real scene information of the real scene.
- A scene interaction apparatus, comprising: a scene determination module configured to determine at least one real scene to interact with a virtual scene; an information acquisition module configured to acquire real scene information of each of the real scenes in real time; a feature extraction module configured to perform feature extraction on each piece of the real scene information to correspondingly obtain scene features of each of the real scenes; and a feature mapping module configured to map the scene features of the at least one real scene into the virtual scene according to a correspondence between the virtual scene and the real scene.
- An electronic device, comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the scene interaction method according to any one of claims 1 to 13 by executing the executable instructions.
- A computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions, when executed by a processor, implement the scene interaction method according to any one of claims 1 to 13.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020227002916A KR20220027187A (ko) | 2020-01-16 | 2020-11-10 | 장면 인터랙션 방법 및 장치, 전자 장치 및 컴퓨터 저장 매체 |
JP2022521702A JP7408792B2 (ja) | 2020-01-16 | 2020-11-10 | シーンのインタラクション方法及び装置、電子機器並びにコンピュータプログラム |
EP20913676.1A EP3998550A4 (en) | 2020-01-16 | 2020-11-10 | METHOD AND APPARATUS FOR SCENE INTERACTION, ELECTRONIC DEVICE AND COMPUTER STORAGE MEDIA |
US17/666,081 US20220156986A1 (en) | 2020-01-16 | 2022-02-07 | Scene interaction method and apparatus, electronic device, and computer storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010049112.1A CN111274910B (zh) | 2020-01-16 | 2020-01-16 | 场景互动方法、装置及电子设备 |
CN202010049112.1 | 2020-01-16 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/666,081 Continuation US20220156986A1 (en) | 2020-01-16 | 2022-02-07 | Scene interaction method and apparatus, electronic device, and computer storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021143315A1 true WO2021143315A1 (zh) | 2021-07-22 |
Family
ID=71001711
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/127750 WO2021143315A1 (zh) | 2020-01-16 | 2020-11-10 | 场景互动方法、装置、电子设备及计算机存储介质 |
Country Status (6)
Country | Link |
---|---|
US (1) | US20220156986A1 (zh) |
EP (1) | EP3998550A4 (zh) |
JP (1) | JP7408792B2 (zh) |
KR (1) | KR20220027187A (zh) |
CN (1) | CN111274910B (zh) |
WO (1) | WO2021143315A1 (zh) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111274910B (zh) * | 2020-01-16 | 2024-01-30 | 腾讯科技(深圳)有限公司 | 场景互动方法、装置及电子设备 |
CN111986700A (zh) * | 2020-08-28 | 2020-11-24 | 广州繁星互娱信息科技有限公司 | 无接触式操作触发的方法、装置、设备及存储介质 |
CN112053450A (zh) | 2020-09-10 | 2020-12-08 | 脸萌有限公司 | 文字的显示方法、装置、电子设备及存储介质 |
CN112381564A (zh) * | 2020-11-09 | 2021-02-19 | 北京雅邦网络技术发展有限公司 | 汽车销售数字电商 |
CN112995132B (zh) * | 2021-02-01 | 2023-05-02 | 百度在线网络技术(北京)有限公司 | 在线学习的交互方法、装置、电子设备和存储介质 |
CN113377205B (zh) * | 2021-07-06 | 2022-11-11 | 浙江商汤科技开发有限公司 | 场景显示方法及装置、设备、车辆、计算机可读存储介质 |
CN113923463B (zh) * | 2021-09-16 | 2022-07-29 | 南京安汇科技发展有限公司 | 一种直播场景的实时抠像与场景合成系统及实现方法 |
CN114189743B (zh) * | 2021-12-15 | 2023-12-12 | 广州博冠信息科技有限公司 | 数据传输方法、装置、电子设备和存储介质 |
KR20230158283A (ko) * | 2022-05-11 | 2023-11-20 | 삼성전자주식회사 | 전자 장치 및 이의 제어 방법 |
CN115113737B (zh) * | 2022-08-30 | 2023-04-18 | 四川中绳矩阵技术发展有限公司 | 一种虚拟对象声音和图像的重现方法、系统、设备及介质 |
CN116709501A (zh) * | 2022-10-26 | 2023-09-05 | 荣耀终端有限公司 | 业务场景识别方法、电子设备及存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104536579A (zh) * | 2015-01-20 | 2015-04-22 | 刘宛平 | 交互式三维实景与数字图像高速融合处理系统及处理方法 |
US9292085B2 (en) * | 2012-06-29 | 2016-03-22 | Microsoft Technology Licensing, Llc | Configuring an interaction zone within an augmented reality environment |
CN106492461A (zh) * | 2016-09-13 | 2017-03-15 | 广东小天才科技有限公司 | 一种增强现实ar游戏的实现方法及装置、用户终端 |
CN108269307A (zh) * | 2018-01-15 | 2018-07-10 | 歌尔科技有限公司 | 一种增强现实交互方法及设备 |
CN111274910A (zh) * | 2020-01-16 | 2020-06-12 | 腾讯科技(深圳)有限公司 | 场景互动方法、装置及电子设备 |
Family Cites Families (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002169901A (ja) * | 2000-12-01 | 2002-06-14 | I Academy:Kk | インターネットを利用した集合参加型教育システム |
JP4150798B2 (ja) * | 2004-07-28 | 2008-09-17 | 国立大学法人徳島大学 | デジタルフィルタリング方法、デジタルフィルタ装置、デジタルフィルタプログラム及びコンピュータで読み取り可能な記録媒体 |
JP4464360B2 (ja) | 2006-03-27 | 2010-05-19 | 富士フイルム株式会社 | 監視装置、監視方法、及びプログラム |
KR101558553B1 (ko) * | 2009-02-18 | 2015-10-08 | 삼성전자 주식회사 | 아바타 얼굴 표정 제어장치 |
GB2491819A (en) * | 2011-06-08 | 2012-12-19 | Cubicspace Ltd | Server for remote viewing and interaction with a virtual 3-D scene |
US20130222371A1 (en) * | 2011-08-26 | 2013-08-29 | Reincloud Corporation | Enhancing a sensory perception in a field of view of a real-time source within a display screen through augmented reality |
JP2013161205A (ja) * | 2012-02-03 | 2013-08-19 | Sony Corp | 情報処理装置、情報処理方法、及びプログラム |
US20140278403A1 (en) * | 2013-03-14 | 2014-09-18 | Toytalk, Inc. | Systems and methods for interactive synthetic character dialogue |
CN103617432B (zh) * | 2013-11-12 | 2017-10-03 | 华为技术有限公司 | 一种场景识别方法及装置 |
JP2015126524A (ja) * | 2013-12-27 | 2015-07-06 | ブラザー工業株式会社 | 遠隔会議プログラム、端末装置および遠隔会議方法 |
CN103810353A (zh) * | 2014-03-09 | 2014-05-21 | 杨智 | 一种虚拟现实中的现实场景映射系统和方法 |
US10356393B1 (en) * | 2015-02-16 | 2019-07-16 | Amazon Technologies, Inc. | High resolution 3D content |
CN106997236B (zh) | 2016-01-25 | 2018-07-13 | 亮风台(上海)信息科技有限公司 | 基于多模态输入进行交互的方法和设备 |
JP6357595B2 (ja) | 2016-03-08 | 2018-07-11 | 一般社団法人 日本画像認識協会 | 情報伝送システム、情報受信装置、およびコンピュータプログラム |
CN105608746B (zh) * | 2016-03-16 | 2019-10-11 | 成都电锯互动科技有限公司 | 一种将现实进行虚拟实现的方法 |
CN106355153B (zh) * | 2016-08-31 | 2019-10-18 | 上海星视度科技有限公司 | 一种基于增强现实的虚拟对象显示方法、装置以及系统 |
CN106485782A (zh) * | 2016-09-30 | 2017-03-08 | 珠海市魅族科技有限公司 | 一种现实场景在虚拟场景中展示的方法以及装置 |
DE102016121281A1 (de) | 2016-11-08 | 2018-05-09 | 3Dqr Gmbh | Verfahren und Vorrichtung zum Überlagern eines Abbilds einer realen Szenerie mit virtuellen Bild- und Audiodaten und ein mobiles Gerät |
EP4202840A1 (en) | 2016-11-11 | 2023-06-28 | Magic Leap, Inc. | Periocular and audio synthesis of a full face image |
CN108881784B (zh) * | 2017-05-12 | 2020-07-03 | 腾讯科技(深圳)有限公司 | 虚拟场景实现方法、装置、终端及服务器 |
JP6749874B2 (ja) * | 2017-09-08 | 2020-09-02 | Kddi株式会社 | 音波信号から音波種別を判定するプログラム、システム、装置及び方法 |
US20190129607A1 (en) * | 2017-11-02 | 2019-05-02 | Samsung Electronics Co., Ltd. | Method and device for performing remote control |
CN108305308A (zh) * | 2018-01-12 | 2018-07-20 | 北京蜜枝科技有限公司 | 虚拟形象的线下展演系统及方法 |
JP7186009B2 (ja) * | 2018-04-03 | 2022-12-08 | 東京瓦斯株式会社 | 画像処理システム及びプログラム |
CN108985176B (zh) * | 2018-06-20 | 2022-02-25 | 阿里巴巴(中国)有限公司 | 图像生成方法及装置 |
CN109903129A (zh) * | 2019-02-18 | 2019-06-18 | 北京三快在线科技有限公司 | 增强现实显示方法与装置、电子设备、存储介质 |
CN110113298B (zh) * | 2019-03-19 | 2021-09-28 | 视联动力信息技术股份有限公司 | 数据传输方法、装置、信令服务器和计算机可读介质 |
CN110084228A (zh) * | 2019-06-25 | 2019-08-02 | 江苏德劭信息科技有限公司 | 一种基于双流卷积神经网络的危险行为自动识别方法 |
CN110365666B (zh) * | 2019-07-01 | 2021-09-14 | 中国电子科技集团公司第十五研究所 | 军事领域基于增强现实的多端融合协同指挥系统 |
US11232601B1 (en) * | 2019-12-30 | 2022-01-25 | Snap Inc. | Audio-triggered augmented reality eyewear device |
- 2020
- 2020-01-16 CN CN202010049112.1A patent/CN111274910B/zh active Active
- 2020-11-10 KR KR1020227002916A patent/KR20220027187A/ko unknown
- 2020-11-10 JP JP2022521702A patent/JP7408792B2/ja active Active
- 2020-11-10 EP EP20913676.1A patent/EP3998550A4/en active Pending
- 2020-11-10 WO PCT/CN2020/127750 patent/WO2021143315A1/zh unknown
- 2022
- 2022-02-07 US US17/666,081 patent/US20220156986A1/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9292085B2 (en) * | 2012-06-29 | 2016-03-22 | Microsoft Technology Licensing, Llc | Configuring an interaction zone within an augmented reality environment |
CN104536579A (zh) * | 2015-01-20 | 2015-04-22 | 刘宛平 | 交互式三维实景与数字图像高速融合处理系统及处理方法 |
CN106492461A (zh) * | 2016-09-13 | 2017-03-15 | 广东小天才科技有限公司 | 一种增强现实ar游戏的实现方法及装置、用户终端 |
CN108269307A (zh) * | 2018-01-15 | 2018-07-10 | 歌尔科技有限公司 | 一种增强现实交互方法及设备 |
CN111274910A (zh) * | 2020-01-16 | 2020-06-12 | 腾讯科技(深圳)有限公司 | 场景互动方法、装置及电子设备 |
Non-Patent Citations (1)
Title |
---|
See also references of EP3998550A4 * |
Also Published As
Publication number | Publication date |
---|---|
JP7408792B2 (ja) | 2024-01-05 |
US20220156986A1 (en) | 2022-05-19 |
CN111274910B (zh) | 2024-01-30 |
EP3998550A4 (en) | 2022-11-16 |
JP2022551660A (ja) | 2022-12-12 |
CN111274910A (zh) | 2020-06-12 |
EP3998550A1 (en) | 2022-05-18 |
KR20220027187A (ko) | 2022-03-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021143315A1 (zh) | 场景互动方法、装置、电子设备及计算机存储介质 | |
US11403595B2 (en) | Devices and methods for creating a collaborative virtual session | |
EP3962074A1 (en) | System and method enabling interactions in virtual environments with virtual presence | |
WO2018036149A1 (zh) | 一种多媒体交互教学系统及方法 | |
WO2015192631A1 (zh) | 视频会议系统及方法 | |
CN106547884A (zh) | 一种替身机器人的行为模式学习系统 | |
US20220070241A1 (en) | System and method enabling interactions in virtual environments with virtual presence | |
WO2019119314A1 (zh) | 一种仿真沙盘系统 | |
KR20220030177A (ko) | 가상 환경에서의 애플리케이션의 전달 시스템 및 방법 | |
CN110287947A (zh) | 互动课堂中的互动教室确定方法及装置 | |
US20200351384A1 (en) | System and method for managing virtual reality session technical field | |
CN112839196B (zh) | 一种实现在线会议的方法、装置以及存储介质 | |
WO2022223029A1 (zh) | 虚拟形象的互动方法、装置和设备 | |
KR20220029453A (ko) | 사용자 그래픽 표현 기반 사용자 인증 시스템 및 방법 | |
KR20220029454A (ko) | 가상 환경에서 가상으로 방송하기 위한 시스템 및 방법 | |
WO2022256585A2 (en) | Spatial audio in video conference calls based on content type or participant role | |
KR20220029467A (ko) | 접근하는 사용자 표현 간의 애드혹 가상통신 | |
KR20220030178A (ko) | 가상 환경에서 클라우드 컴퓨팅 기반 가상 컴퓨팅 리소스를 프로비저닝하는 시스템 및 방법 | |
US20220166918A1 (en) | Video chat with plural users using same camera | |
KR102212035B1 (ko) | 제스처 인식 기반 원격 교육서비스 시스템 및 방법 | |
WO2021241221A1 (ja) | 情報処理装置及び情報処理方法 | |
JP2023527624A (ja) | コンピュータプログラムおよびアバター表現方法 | |
WO2024051467A1 (zh) | 图像处理方法、装置、电子设备及存储介质 | |
Noor et al. | VRFlex: Towards the Design of a Virtual Reality Hyflex Class Model | |
JP2022166195A (ja) | 情報抽出装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20913676; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 20227002916; Country of ref document: KR; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 2020913676; Country of ref document: EP; Effective date: 20220209 |
ENP | Entry into the national phase | Ref document number: 2022521702; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |