WO2021143315A1 - Scene interaction method and apparatus, electronic device, and computer storage medium

Scene interaction method and apparatus, electronic device, and computer storage medium

Info

Publication number
WO2021143315A1
Authority
WO
WIPO (PCT)
Prior art keywords: scene, feature, image, real, audio
Application number
PCT/CN2020/127750
Other languages
English (en)
French (fr)
Inventor
梁宇轩
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Priority to KR1020227002916A (KR20220027187A)
Priority to JP2022521702A (JP7408792B2)
Priority to EP20913676.1A (EP3998550A4)
Publication of WO2021143315A1
Priority to US17/666,081 (US20220156986A1)

Classifications

    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06T 13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06V 10/16: Image acquisition using multiple overlapping images; image stitching
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 20/20: Scene-specific elements in augmented reality scenes
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172: Classification, e.g. identification
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G10L 25/15: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being formant information
    • G06T 2210/61: Scene description (indexing scheme for image generation or computer graphics)
    • G06T 2219/024: Multi-user, collaborative environment (indexing scheme for manipulating 3D models or images)
    • G10L 15/08: Speech classification or search
    • G10L 15/26: Speech to text systems
    • G10L 25/57: Speech or voice analysis techniques specially adapted for comparison or discrimination, for processing of video signals
    • H04L 67/02: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L 69/16: Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]

Definitions

  • This application relates to the field of artificial intelligence and, in particular but not exclusively, to a scene interaction method and apparatus, an electronic device, and a computer storage medium.
  • The embodiments of this application provide a scene interaction method and apparatus, an electronic device, and a computer storage medium, which can not only improve interaction efficiency but also achieve richer and more diverse interaction effects.
  • An embodiment of this application provides a scene interaction method, executed by an electronic device, the method including: determining at least one real scene to interact with a virtual scene; acquiring real scene information of each real scene in real time; performing feature extraction on each piece of real scene information to correspondingly obtain the scene features of each real scene; and mapping the scene features of the at least one real scene into the virtual scene according to the correspondence between the virtual scene and the real scene.
  • An embodiment of this application provides a scene interaction apparatus, including: a scene determination module configured to determine at least one real scene to interact with a virtual scene; an information acquisition module configured to acquire real scene information of each real scene in real time; a feature extraction module configured to perform feature extraction on each piece of real scene information to correspondingly obtain the scene features of each real scene; and a feature mapping module configured to map the scene features of the at least one real scene into the virtual scene according to the correspondence between the virtual scene and the real scene.
  • FIG. 1 is a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of this application can be applied.
  • FIG. 3 is a schematic diagram of an application scenario in which a virtual scene interacts with real scenes in an embodiment of this application.
  • FIG. 8 is a schematic diagram of the system layout of TensorFlow in an embodiment of this application.
  • FIG. 9 is a flowchart of the steps of performing feature mapping on scene features in some embodiments of this application.
  • FIG. 10 is a flowchart of the steps of the scene interaction method provided by an embodiment of this application in one application scenario.
  • FIG. 11A is a schematic diagram of a display state of the collected stereoscopic spatial image information in an embodiment of this application.
  • FIG. 12 is a schematic diagram of the matching relationship of voice waveform diagrams in an embodiment of this application.
  • FIG. 13 shows a change controller used for scene interaction in an embodiment of this application.
  • FIG. 14 is a structural block diagram of a scene interaction apparatus in some embodiments of this application.
  • Computer vision (CV) is a science that studies how to make machines "see". More specifically, it refers to using cameras and computers in place of human eyes to perform machine-vision tasks such as recognizing, tracking, and measuring targets, and to further process the images so that they become better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems that can obtain information from images or multi-dimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric recognition technologies such as face recognition and fingerprint recognition.
  • The key technologies of speech technology (Speech Technology, ST) are automatic speech recognition (ASR), text-to-speech (TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the future direction of human-computer interaction, and voice has become one of the most promising modes of human-computer interaction.
  • the system architecture 100 may include a client 110, a network 120 and a server 130.
  • the client 110 may include various terminal devices such as a smart phone, a tablet computer, a notebook computer, and a desktop computer.
  • the server 130 may include various server devices such as a web server, an application server, and a database server.
  • the network 120 may be a communication medium of various connection types that can provide a communication link between the client 110 and the server 130, for example, a wired communication link, a wireless communication link, and so on.
  • the system architecture in the embodiments of the present application may have any number of clients, networks, and servers.
  • For example, the server 130 may be a server group composed of multiple server devices, and the client 110 may be a terminal device cluster composed of multiple terminal devices distributed in the same offline activity scene or in multiple different offline activity scenes.
  • The scene interaction method in the embodiments of this application may be applied to the client 110, to the server 130, or executed jointly by the client 110 and the server 130, which is not specifically limited in the embodiments of this application.
  • The application can include a merchant edition and a user edition. The merchant runs and logs in to the merchant-edition client of the application to initiate the activity, and online users run and log in to the user-edition client of the application on their terminals to synchronize online.
  • The server 130 is the server corresponding to the application.
  • The client 110 includes the merchant's client and the online users' clients. The merchant forms the virtual scene through the client 110, and each user uploads, through the client 110, data corresponding to the real scene of the environment the user is currently in.
  • Step S210 Determine at least one real scene for interaction with the virtual scene.
  • A virtual scene can be an online activity scene that is displayed to users through terminal devices with display interfaces, such as mobile phones and computers, and that interacts with online users through network communication, while a real scene is the offline activity scene that interacts with the corresponding online activity scene.
  • One virtual scene can interact with a single real scene, or with two or more real scenes at the same time.
  • Fig. 3 schematically shows a schematic diagram of an application scenario in which a virtual scene interacts with a real scene in an embodiment of the present application.
  • the virtual scene 310 may be connected to at least one real scene 320 through network communication, so as to achieve simultaneous interaction with the at least one real scene 320.
  • the virtual scene 310 shown in the figure is an application scene of a virtual lottery.
  • The virtual scene 310 may alternatively be any of various application scenes such as a virtual prize wheel, virtual bubble blowing, virtual car driving, or virtual voting.
  • Step S220 Acquire real scene information of each real scene in real time.
  • FIG. 4 schematically shows a schematic diagram of a real-time interactive scene communication model established based on WebSocket in an embodiment of the present application.
  • The WebSocket protocol is a new network protocol based on TCP. Like HTTP, it is an application-layer protocol. It implements full-duplex communication between the browser and the server, that is, it allows the server to actively push information to the client.
  • the communication model may include an application layer 410, a Socket abstraction layer 420, a transport layer 430, a network layer 440, and a link layer 450.
  • the application layer 410 includes multiple user processes, and is mainly responsible for providing user interfaces and service support.
  • the Socket abstraction layer 420 abstracts the complex operations of the TCP/IP layer into a few simple interfaces called by the application layer 410 to implement process communication in the network.
  • the transport layer 430 includes a connection-oriented TCP protocol and a connectionless UDP protocol, and is mainly responsible for the transmission of the entire message from process to process.
  • UDP, the User Datagram Protocol, provides applications with a way to send encapsulated IP datagrams without establishing a connection. UDP and TCP are the two main, mutually complementary protocols of the transport layer 430.
  • the network layer 440 includes the ICMP protocol, the IP protocol, and the IGMP protocol, and is mainly responsible for routing and transmitting packet data between hosts or between routers and switches.
  • The link layer 450 includes the ARP protocol, the hardware interface, and the RARP protocol; it is mainly responsible for establishing and managing links between nodes, turning an error-prone physical channel into an error-free data link that can reliably transmit data frames.
  • the ARP protocol is the Address Resolution Protocol (Address Resolution Protocol), which is used to resolve the physical address (MAC address) of the target hardware device 460 through the IP address of the target hardware device 460, and the RARP protocol is used to convert the physical address into an IP address.
  • Fig. 5 schematically shows a communication sequence diagram based on the WebSocket protocol in an embodiment of the present application.
  • The WebSocket client 510 first sends a connection request 51 (connecting) to the TCP client 520. Based on the connection request 51, the TCP client 520 sends a synchronize sequence numbers message 52 (SYN) to the TCP server 530, and the TCP server 530 replies to the TCP client 520 with a SYN+ACK packet 53 formed by a synchronize sequence numbers message and an acknowledge character (ACK).
  • After receiving the SYN+ACK packet 53, the TCP client 520 sends an ACK packet (not shown in the figure) to the TCP server 530 and at the same time returns a connection confirmation message 54 (connected) to the WebSocket client 510.
  • After the connection is established, the WebSocket client 510 and the TCP client 520 complete a handshake 55 (handshake); message sending 56 (send) and message receiving 57 (receive) are performed through the TCP client 520 and the TCP server 530, and the TCP server 530 communicates and interacts with the WebSocket server 540.
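As a concrete illustration of this real-time link, the following is a minimal sketch of a server endpoint that receives real scene information pushed by offline clients over WebSocket. It assumes the third-party Python `websockets` library and a JSON message format with `scene_id`, `type`, and `payload` fields; neither of these is specified by the patent.

```python
# Minimal sketch: a WebSocket endpoint that receives real-scene frames/audio
# pushed by offline clients in real time. The `websockets` library and the
# JSON message format are assumptions, not part of the patent.
import asyncio
import json

import websockets


async def handle_scene_stream(websocket):
    # Each offline activity site keeps one long-lived full-duplex connection.
    # (Older versions of `websockets` also pass a `path` argument here.)
    async for message in websocket:
        packet = json.loads(message)
        scene_id = packet["scene_id"]   # which real scene sent this
        kind = packet["type"]           # "image" or "audio"
        payload = packet["payload"]     # e.g. base64-encoded frame or clip
        # Hand the raw data to the feature-extraction pipeline (step S230).
        print(f"received {kind} data from real scene {scene_id} ({len(payload)} bytes)")


async def main():
    # Serve on port 8765; offline clients connect with ws://<server>:8765
    async with websockets.serve(handle_scene_stream, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(main())
```

Because the connection is full duplex, the same link can also be used to push virtual-scene updates back to the activity site.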
  • Step S230 Perform feature extraction on each real scene information, so as to correspondingly obtain the scene features of each real scene.
  • the scene features obtained through feature extraction in this step may include at least one of image features and audio features.
  • This step can first acquire the image information and the audio information in the real scene information, then perform feature extraction on the image information to obtain the image features of the real scene, and perform feature extraction on the audio information to obtain the audio features of the real scene.
  • the scene image feature is related to information such as the event venue and event background of the real scene, for example, it can be used to reflect that the real scene is an indoor scene or an outdoor scene, or a specific shopping mall or an open-air plaza.
  • Character image features are related to people participating in offline activities in real scenes. For example, it is possible to track activity participants such as hosts, guests or audiences in real scenes based on face recognition.
  • the feature of the action image is related to the physical action of the person at the activity site, for example, a specific posture or gesture can represent a designated activity instruction.
  • Voice recognition can be performed on the audio information to obtain the text audio features of the real scene, and waveform detection can be performed on the audio information to obtain the waveform audio features of the real scene.
  • the text audio feature is related to the voice content such as the dialogue of the participants in the activity in the real scene. For example, it may be a text character obtained by performing voice recognition on the relevant voice content or a specific character code.
  • Waveform audio features are related to background music, sound effects, and live event atmosphere in the real scene, and can reflect the noisy or quiet state of the real scene, for example.
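The split into text audio features and waveform audio features can be sketched as follows, assuming 16-bit mono PCM WAV input; the per-frame RMS energy stands in for the waveform audio feature, while the speech-recognition step is left as a placeholder because the patent names no specific ASR engine.

```python
# Minimal sketch: deriving a "waveform audio feature" (frame-level loudness)
# from a WAV clip using the standard-library wave module and NumPy.
# Assumes 16-bit mono PCM audio; the ASR step is only a placeholder.
import wave

import numpy as np


def waveform_audio_feature(path, frame_ms=50):
    """Return per-frame RMS energy, a rough noisy-vs-quiet indicator."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        samples = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)
    samples = samples.astype(np.float32) / 32768.0
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt((frames ** 2).mean(axis=1))  # RMS per frame


def text_audio_feature(path):
    """Placeholder for ASR: would return recognized text or command codes."""
    raise NotImplementedError("plug in any speech-recognition service here")


if __name__ == "__main__":
    rms = waveform_audio_feature("site_clip.wav")  # hypothetical file name
    print("scene is", "noisy" if rms.mean() > 0.1 else "quiet")
```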
  • Step S240 According to the corresponding relationship between the virtual scene and the real scene, map the scene feature of the at least one real scene to the virtual scene.
  • the various scene features extracted in step S230 can be mapped to the virtual scene through a specified feature mapping method according to the corresponding relationship between the virtual scene and the real scene.
  • For example, the image features can be mapped into the virtual scene as corresponding virtual images such as a virtual background or virtual characters, and the audio features can be mapped into the virtual scene as its background music, sound effects, or voice commands, so that the real scene and the virtual scene interact through scene content.
  • The fun and interactivity of an activity can be enhanced by integrating offline recognition with the online virtual scene and by combining video technology, speech technology, and physical remote-sensing technology.
  • Activity participants from different regions are brought together in one virtual scene for remote interaction, which strengthens the influence the activity brings to brand marketing, improves user participation as well as the fun and controllability of the activity, raises the value of the activity, and has a very wide range of application scenarios.
  • The core characteristics of the real scene can thus be displayed in the virtual scene and made interactive.
  • the image information obtained from the real scene information can generally be a dynamic video image collected by an image acquisition device such as a camera, and the same real scene can be imaged by multiple cameras at different positions.
  • dynamic video images can be spliced and converted in advance to form static images.
  • Fig. 6 schematically shows a flow chart of steps for feature extraction of image information in some embodiments of the present application. As shown in FIG. 6, on the basis of the above embodiments, the feature extraction of image information may include the following steps:
  • Step S610 Acquire partial images of the real scene corresponding to different image acquisition parameters from the image information.
  • The image acquisition parameters may include at least one of an image acquisition angle and an image acquisition range. For example, in the same real scene, multiple cameras with different shooting angles and shooting ranges can be arranged to shoot simultaneously, and the video captured by each camera is a partial image of the real scene.
  • Step S620 Perform image stitching on the partial images belonging to the same time interval to obtain a fusion image of the real scene.
  • The continuously captured partial images can be segmented according to a preset time length to obtain partial images corresponding to different time intervals. The partial images of the real scene that belong to the same time interval and correspond to different image acquisition parameters are then stitched together to obtain the fused image of the real scene.
  • Step S630 Perform feature extraction on the fused image to obtain image features of the real scene.
  • edge detection may be performed on the fused image to obtain a characteristic area in the fused image, and then the characteristic area may be extracted to obtain image characteristics of a real scene. Edge detection can narrow the range of feature extraction and improve the speed and accuracy of feature extraction.
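A minimal sketch of steps S610 to S630 with OpenCV and NumPy is shown below; the grid layout of the fused image, the Canny thresholds, and the minimum contour area are illustrative assumptions rather than values taken from the patent.

```python
# Minimal sketch: frames captured by several cameras within one time interval
# are tiled into a single static fused image, and edge detection narrows the
# area on which feature extraction is run.
import cv2
import numpy as np


def fuse_partial_images(frames_per_camera):
    """frames_per_camera: one list of equally sized frames per camera,
    sampled in the same time interval; cameras along rows, time along columns."""
    rows = [np.hstack(frames) for frames in frames_per_camera]
    return np.vstack(rows)


def candidate_feature_regions(fused_bgr):
    gray = cv2.cvtColor(fused_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)  # edge map of the fused image
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Keep bounding boxes of reasonably large contours as feature regions.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]
```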
  • FIG. 7 schematically shows a schematic diagram of the principle of extracting image features using a CNN model in an embodiment of the present application.
  • The input image of the CNN model is a fused image 710 of one time interval obtained after image stitching.
  • The CNN model includes at least one or more convolutional layers 720, and may also include one or more pooling layers 730 and one or more other network structures 740 (for example, in some embodiments, the other network structure 740 may be a fully connected layer). After feature extraction and feature mapping layer by layer through the multiple network layers, the image features corresponding to the fused image 710 are finally output.
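The convolution, pooling, and fully connected stages of FIG. 7 can be expressed, for example, with tf.keras as below; all layer counts and sizes are assumptions chosen for illustration, not the patent's architecture.

```python
# Minimal sketch of the FIG. 7 pipeline with tf.keras: convolution and pooling
# layers followed by a fully connected layer that emits a fixed-size
# image-feature vector for one fused image.
import tensorflow as tf


def build_feature_extractor(input_shape=(256, 512, 3), feature_dim=128):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),          # fused image 710
        tf.keras.layers.Conv2D(32, 3, activation="relu"),  # convolutional layer 720
        tf.keras.layers.MaxPooling2D(),                    # pooling layer 730
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(feature_dim),                # other structure 740 (fully connected)
    ])

# Usage: features = build_feature_extractor()(batch_of_fused_images)
```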
  • FIG. 8 schematically shows a schematic diagram of the system layout of TensorFlow in the embodiment of the present application.
  • A TensorFlow cluster 810 contains multiple TensorFlow servers 811 (TF Server). These TensorFlow servers 811 are partitioned into a series of batch job groups (jobs), and each job is in turn responsible for processing a series of tasks (Tasks).
  • a TensorFlow cluster 810 generally focuses on a relatively high-level goal, such as training a neural network with multiple machines in parallel.
  • a job will contain a series of Tasks dedicated to the same goal.
  • the job n corresponding to the parameter server 812 (Parameter Server) is used to process the work related to storing and updating network parameters.
  • job 0 ... job n-1, corresponding to the compute servers 813 (workers), are used to host the computation-intensive stateless nodes.
  • Tasks in a job will run on different machines.
  • a task is generally associated with the processing of a single TensorFlow server, belongs to a specific job and has a unique index in the task list of the job.
  • the TensorFlow server is used to run the processing process of grpc_tensorflow_server. It is a member of a cluster and exposes a Master Service and a Worker Service to the outside.
  • the Master Service is a remote procedure call protocol (Remote Procedure Call, RPC) service used to interact with a series of remote distributed devices.
  • the Master Service implements a session interface for session (Session), that is, the tensorflow::Session interface, and is used to coordinate multiple Worker services.
  • Worker service is a remote procedure call service that executes part of the TensorFlow calculation graph (TF graph).
  • the TensorFlow client 820 (Client) generally builds a TensorFlow calculation graph and uses the tensorflow::Session interface to complete the interaction with the TensorFlow cluster.
  • TensorFlow clients are generally written in Python or C++.
  • a TensorFlow client can interact with multiple TensorFlow servers at the same time, and a TensorFlow server can also serve multiple TensorFlow clients at the same time.
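The parameter-server/worker layout described above corresponds roughly to the TensorFlow 1.x distributed API sketched below; the host names, ports, and job sizes are placeholders.

```python
# Minimal sketch of the cluster layout described above, using the
# TensorFlow 1.x distributed API that grpc_tensorflow_server corresponds to.
import tensorflow.compat.v1 as tf

cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],              # parameter server 812
    "worker": ["worker0.example.com:2222",       # compute servers 813
               "worker1.example.com:2222"],
})

# Each machine starts one server (one task of one job) and exposes the
# Master Service and Worker Service mentioned above.
server = tf.train.Server(cluster, job_name="worker", task_index=0)
# A tf.Session pointed at server.target would then coordinate the graph.
```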
  • the sample data can be used to train the neural network.
  • For the real scenes corresponding to offline activities, a large number of offline activity scene videos can be recorded and imported by simulation.
  • By calling the tf.nn.conv2d operation in TensorFlow, a large number of videos and pictures can be pulled for training.
  • OpenCV can be used for image edge detection.
  • The identified regions carry certain shape data; by comparing the shape data against statistics prepared from the training image data, the features they belong to can be identified.
  • Iterative training with sample data continuously updates and optimizes the network parameters of the neural network. For example, for a network layer involving the formula a*5.0+b, successive updates move the output from 5.4*5.0+1.88=28.88 toward 9.46154*5.0+2.69231=50.0, as listed in the Description below.
  • Taking a classifier based on a support vector machine (SVM) as an example, the hinge loss L(y) = max(0, 1 - t*y) can be used, where y is the predicted value, between -1 and +1, and t is the target value (-1 or +1). It is sufficient for y to lie between -1 and +1; |y| > 1 is not encouraged, that is, the classifier is not encouraged to be over-confident, and no reward is given for a correctly classified sample lying more than 1 away from the separating line.
  • tf.train.GradientDescentOptimizer can be used as an optimizer for implementing the gradient descent algorithm in Tensorflow.
  • the gradient descent algorithm can use any one of standard gradient descent GD, batch gradient descent BGD, and stochastic gradient descent SGD.
  • Taking standard gradient descent as an example, if the network parameters to be learned are W and the loss function is J(W), then the gradient of the loss function with respect to the network parameters is ∇J(W), and with learning rate η the update formula for the network parameters is W_{s+1} = W_s - η∇J(W_s).
  • Adjusting the network parameters along the direction of decreasing gradient minimizes the loss function.
  • The basic strategy is to find the fastest way down the mountain within a limited field of view: each step refers to the steepest gradient direction at the current position to decide the next step.
  • FIG. 9 schematically shows a flowchart of steps for feature mapping of scene features in some embodiments of the present application.
  • As shown in FIG. 9, in step S240, mapping the scene features of at least one real scene into the virtual scene according to the correspondence between the virtual scene and the real scene may include the following steps:
  • Step S910 According to the corresponding relationship between the virtual scene and the real scene, a feature mapping area corresponding to each real scene is determined in the virtual scene.
  • a part of the designated scene display area can be determined as the feature mapping area corresponding to the real scene.
  • When one virtual scene interacts with multiple real scenes at the same time, each real scene can be assigned a corresponding feature mapping area in the virtual scene. These feature mapping areas can be display areas spaced apart from one another, or display areas that partially or completely overlap.
  • Step S920 Display the scene content that has a mapping relationship with the scene feature of the corresponding real scene in the feature mapping area.
  • the feature mapping area includes a first feature mapping area and a second feature mapping area.
  • The first feature mapping area and the second feature mapping area may be completely overlapping display areas, partially overlapping display areas, or completely non-overlapping display areas spaced apart from each other.
  • the image response content that has a mapping relationship with the image feature may be displayed in the first feature mapping area.
  • the audio response content that has a mapping relationship with the audio feature can be displayed in the second feature mapping area.
  • At least one of scene image features, character image features, and action image features can be obtained from the image features; then a virtual background image that has a mapping relationship with the scene image features is displayed in the first feature mapping area, a virtual character image that has a mapping relationship with the character image features is displayed in the first feature mapping area, and action response content that has a mapping relationship with the action image features is displayed in the first feature mapping area.
  • When the image features include scene image features, character image features, and action image features, these image features can be displayed in the same first feature mapping area at the same time, or displayed separately in different first feature mapping areas.
  • For example, when an action image feature matches a designated activity instruction, the virtual lottery wheel in the virtual scene can be controlled to start spinning.
  • Text audio features and waveform audio features can be obtained from the audio features; then the text response content that has a mapping relationship with the text audio features is displayed in the second feature mapping area, and the audio dynamic effects that have a mapping relationship with the waveform audio features are displayed in the second feature mapping area.
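Purely as an illustration of how extracted features might be routed to the first and second feature mapping areas, the following sketch uses hypothetical names: the `VirtualScene` class, the region keys, and the command table are not from the patent.

```python
# Illustrative sketch only: routing extracted scene features to the first
# (image) and second (audio) feature mapping areas of a virtual scene.
ACTION_COMMANDS = {"raise_hand": "start_lottery", "thumbs_up": "start_vote"}


class VirtualScene:
    def __init__(self):
        self.regions = {"first_area": {}, "second_area": {}}

    def map_image_features(self, scene_feat, person_feat, action_feat):
        area = self.regions["first_area"]
        area["virtual_background"] = scene_feat    # e.g. "outdoor_plaza"
        area["virtual_character"] = person_feat    # e.g. tracked host id
        command = ACTION_COMMANDS.get(action_feat)
        if command == "start_lottery":
            area["lottery_wheel"] = "spinning"     # action response content

    def map_audio_features(self, text_feat, waveform_feat):
        area = self.regions["second_area"]
        area["caption"] = text_feat                # text response content
        area["audio_effect"] = ("soothing_bgm"
                                if waveform_feat < 0.1 else "upbeat_bgm")
```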
  • FIG. 10 schematically shows a flow chart of steps in an application scenario of the scene interaction method provided by an embodiment of the present application.
  • This method can be mainly applied to server devices that dynamically control virtual scenes.
  • the method for scene interaction in this application scenario mainly includes the following steps:
  • Step S1010 Turn on multiple cameras and multiple microphones in an offline scene. Collect stereo spatial image information related to user actions and other activities through multiple cameras, and collect stereo voice information related to user voice and other activities through multiple microphones.
  • FIG. 11A schematically shows a schematic diagram of the display state of the stereoscopic spatial image information collected in an embodiment of the present application.
  • the stereoscopic spatial image information collected by multiple cameras not only includes a person, but also includes the place where the person is located.
  • the scene can also include more detailed information such as character actions and expressions.
  • Step S1020 Receive image information and voice information in real time via WebSocket.
  • Step S1030 Perform person recognition, action recognition and scene recognition on the image information.
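The person-recognition part of step S1030 could, for instance, rely on the Haar cascade bundled with OpenCV as sketched below; scene and action recognition would instead go through the trained CNN described earlier. The detection parameters are illustrative assumptions.

```python
# Minimal sketch of the person-recognition part of step S1030 using the Haar
# cascade shipped with OpenCV.
import cv2

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def detect_people(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return faces  # list of (x, y, w, h) boxes for later matting/projection
```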
  • Step S1040 Through index traversal, the local area of the virtual scene is dynamically changed.
  • The feature regions can be matted out according to the image features obtained in real time; after matting, the matted image features from each client are uniformly dispatched to the virtual scene of the activity, and the people in each real scene and their actions are projected into the virtual scene through computation, so that the virtual scene fits the actual activity type.
  • FIG. 11B schematically shows a schematic diagram of the display state of the virtual scene after fusion of the content of the real scene in the embodiment of the present application.
  • the actual scene characters in the offline activity scene are placed in the virtual scene in the form of real scene objects 1110, and are presented to the user together with the virtual scene objects 1120 generated in the virtual scene.
  • the character action and posture of the real scene object 1110 changes in real time following the actual scene character, and the virtual scene object 1120 can be configured and adjusted according to the actual activity type.
  • Step S1050 Recognize the voice information, convert it into text, and obtain a voice waveform diagram.
  • the text part can be used to form voice commands, such as "start lottery", “start voting” and so on.
  • the voice waveform graph can be used to match the background music to which it is adapted.
  • FIG. 12 schematically shows a schematic diagram of the matching relationship between the voice waveform graph and the background music. As shown in FIG. 12, according to the voice waveform diagram 121 obtained from the voice information, a similar matching waveform diagram 122 can be obtained, and the corresponding background music can be determined based on the matching waveform diagram.
  • Step S1060 Through index traversal, the music dynamics of the virtual scene are dynamically changed.
  • The background music of the virtual scene can be matched according to the live voice waveform. For example, if the offline activity site is relatively quiet, the background music can be switched to something more soothing according to the matching result.
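One way to realize this matching, shown below as a hedged sketch, is to compare the loudness envelope of the live voice waveform against stored background-music envelopes by cosine similarity; the template library and envelope length are assumptions.

```python
# Minimal sketch of matching the live voice waveform (FIG. 12) against stored
# background-music waveform templates by cosine similarity of their envelopes.
import numpy as np


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


def pick_background_music(live_envelope, template_envelopes):
    """template_envelopes: dict mapping track name -> envelope of equal length."""
    scores = {name: cosine(live_envelope, env)
              for name, env in template_envelopes.items()}
    return max(scores, key=scores.get)  # e.g. a soothing track for a quiet site
```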
  • In the feature mapping area, it is also possible to matte out regions according to the image features obtained in real time; after matting, the matted image features from each client are uniformly dispatched to the virtual scene corresponding to the current activity, and the actions of the people in each real scene are projected into the virtual scene through computation so that the virtual scene fits the actual activity type. At the same time, the background music of the activity can be matched according to the voice information collected in the real scenes.
  • Fig. 13 schematically shows a change controller used for scene interaction in an embodiment of the present application.
  • an MCU controller 1310 based on a Microcontroller Unit (MCU) can use hardware devices in the form of the Internet of Things to interactively control the physical scenes at the event site.
  • Data communication can be carried out at the event site through the Bluetooth communication module 1320 or other types of short-range communication devices.
  • the sensor 1330 can detect and collect interactive experience information at the event site.
  • the vibration module 1340 can provide physical vibration effects at the event site.
  • the lighting module 1350 can provide light visual effects at the event site, and the speaker 1360 can provide music effects at the event site.
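Purely as an illustrative sketch, the effect modules could be driven over a serial link (for example Bluetooth SPP) with pyserial as below; the port name and the one-byte command protocol are assumptions, since the patent only names the Bluetooth, sensor, vibration, lighting, and speaker modules.

```python
# Illustrative sketch only: driving the vibration/light/speaker modules of the
# MCU controller 1310 over a serial link with pyserial.
import serial

COMMANDS = {"vibrate": b"\x01", "lights_on": b"\x02", "play_music": b"\x03"}


def send_site_effect(effect, port="/dev/rfcomm0", baudrate=9600):
    with serial.Serial(port, baudrate, timeout=1) as link:
        link.write(COMMANDS[effect])
        return link.read(1)  # optional acknowledgement byte from the MCU
```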
  • The scene interaction method uses TensorFlow to recognize offline scenes and people, converts and communicates the results to an online server for display on the terminal screen, and combines offline people and scenes with the online virtual scene for fusion and interaction, covering application scenarios such as virtual lotteries, virtual prize wheels, virtual bubble blowing, virtual driving, virtual cycling, and voting.
  • Through the integration of offline recognition with the online virtual scene, and the combination of video technology, speech technology, and physical remote-sensing technology, the method enhances the fun and interactivity of the activity and also allows the customer to bring activity participants from different regions together in one virtual scene for remote interaction.
  • This strengthens the influence the activity brings to brand marketing, improves user participation as well as the fun and controllability of the activity, raises the value of the activity, and has a very wide range of application scenarios.
  • Fig. 14 schematically shows a structural block diagram of a scene interaction device in some embodiments of the present application.
  • the scene interaction device 1400 may mainly include:
  • the scene determination module 1410 is configured to determine at least one real scene for interaction with the virtual scene.
  • the information acquisition module 1420 is configured to acquire real scene information of each of the real scenes in real time.
  • the feature extraction module 1430 is configured to perform feature extraction on each of the real scene information, so as to correspondingly obtain the scene features of each of the real scenes.
  • the feature mapping module 1440 is configured to map the scene feature of the at least one real scene to the virtual scene according to the corresponding relationship between the virtual scene and the real scene.
  • the scene features include at least one of image features and audio features.
  • The feature extraction module 1430 includes: an information extraction unit configured to obtain the image information and the audio information in each piece of real scene information; an image feature extraction unit configured to perform feature extraction on the image information to obtain the image features of the real scene; and an audio feature extraction unit configured to perform feature extraction on the audio information to obtain the audio features of the real scene.
  • In some embodiments, the image feature extraction unit includes: a scene recognition subunit configured to perform scene recognition on the image information to obtain the scene image features of the real scene; a face recognition subunit configured to perform face recognition on the image information to obtain the character image features of the real scene; a character action recognition subunit configured to perform character action recognition on the image information to obtain the action image features of the real scene; and a first determining subunit configured to determine the scene image features, the character image features, and the action image features as the image features of the real scene.
  • In some embodiments, the image feature extraction unit includes: a partial image acquisition subunit configured to acquire, from the image information, partial images of the real scene corresponding to different image acquisition parameters; an image stitching subunit configured to perform image stitching on the partial images belonging to the same time interval to obtain a fused image of the real scene; and an image feature extraction subunit configured to perform feature extraction on the fused image to obtain the image features of the real scene.
  • the image acquisition parameter includes at least one of an image acquisition angle and an image acquisition range.
  • The image feature extraction subunit includes: an edge detection subunit configured to perform edge detection on the fused image to obtain the feature regions in the fused image; and a feature extraction subunit configured to perform feature extraction on the feature regions to obtain the image features of the real scene.
  • the audio feature extraction unit includes: a voice recognition subunit configured to perform voice recognition on audio information to obtain text audio features of a real scene; and a waveform detection subunit configured to perform waveform detection on audio information to Obtain the waveform audio feature of the real scene; the second determining subunit is configured to determine the text audio feature and the waveform audio feature as the audio feature of the real scene.
  • the feature mapping module 1440 includes: an area determination unit configured to determine a feature mapping area corresponding to each real scene in the virtual scene according to the corresponding relationship between the virtual scene and the real scene; and a content display unit, It is configured to display scene content that has a mapping relationship with the scene feature of the corresponding real scene in the feature mapping area.
  • The feature mapping area includes a first feature mapping area and a second feature mapping area. The content display unit includes: an image response content display subunit configured to, when the scene features are image features, display image response content that has a mapping relationship with the image features in the first feature mapping area; and an audio response content display subunit configured to, when the scene features are audio features, display audio response content that has a mapping relationship with the audio features in the second feature mapping area.
  • The image response content display subunit includes: an image feature acquisition subunit configured to acquire at least one of scene image features, character image features, and action image features from the image features; a virtual background image display subunit configured to display, in the first feature mapping area, a virtual background image that has a mapping relationship with the scene image features; a virtual character image display subunit configured to display, in the first feature mapping area, a virtual character image that has a mapping relationship with the character image features; and an action response content display subunit configured to display, in the first feature mapping area, action response content that has a mapping relationship with the action image features.
  • The audio response content display subunit includes: an audio feature acquisition subunit configured to acquire the text audio features and the waveform audio features from the audio features; a text response content display subunit configured to display, in the second feature mapping area, text response content that has a mapping relationship with the text audio features; and an audio dynamic effect display subunit configured to display, in the second feature mapping area, audio dynamic effects that have a mapping relationship with the waveform audio features.
  • The information acquisition module 1420 includes: a link establishment unit configured to establish, between the virtual scene and the real scene, a real-time communication link based on the WebSocket full-duplex communication protocol over the Transmission Control Protocol; and a link communication unit configured to use the real-time communication link to acquire the real scene information of the real scene.
  • FIG. 15 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • The computer system 1500 includes a central processing unit (CPU) 1501, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1502 or a program loaded from a storage part 1508 into a random access memory (RAM) 1503. The RAM 1503 also stores the various programs and data required for system operation.
  • the CPU 1501, ROM 1502, and RAM 1503 are connected to each other through a bus 1504.
  • An input/output (Input/Output, I/O) interface 1505 is also connected to the bus 1504.
  • The following components are connected to the I/O interface 1505: an input part 1506 including a keyboard, a mouse, and the like; an output part 1507 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as speakers; a storage part 1508 including a hard disk and the like; and a communication part 1509 including a network interface card such as a LAN (Local Area Network) card and a modem.
  • the communication section 1509 performs communication processing via a network such as the Internet.
  • the driver 1510 is also connected to the I/O interface 1505 as needed.
  • a removable medium 1511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 1510 as required, so that the computer program read therefrom is installed into the storage portion 1508 as required.
  • The example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solutions according to the embodiments of this application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, or the like) or on a network, and which includes several instructions that cause a computing device (which can be a personal computer, a server, a touch terminal, a network device, or the like) to execute the method according to the embodiments of this application.

Abstract

This application provides a scene interaction method and apparatus, an electronic device, and a computer storage medium, and relates to the field of artificial intelligence. The method includes: determining at least one real scene to interact with a virtual scene; acquiring real scene information of each real scene in real time; performing feature extraction on each piece of real scene information to correspondingly obtain scene features of each real scene; and mapping the scene features of the at least one real scene into the virtual scene according to the correspondence between the virtual scene and the real scene. The embodiments of this application not only improve interaction efficiency but also achieve richer and more diverse interaction effects.

Description

Scene interaction method and apparatus, electronic device, and computer storage medium
Cross-Reference to Related Application
This application is based on and claims priority to Chinese patent application No. 202010049112.1, filed on January 16, 2020, the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular but not exclusively to a scene interaction method and apparatus, an electronic device, and a computer storage medium.
Background
With the development of the Internet and information technology, more and more enterprises can rely on network communication technology to organize and carry out online and offline marketing activities of various forms. Users at the activity venue can participate directly in the offline activity, while users who are not at the venue can participate in the online activity through network communication devices such as mobile phones and computers.
However, under traditional ways of organizing activities, online activities and offline activities are separated from each other; direct interaction is usually difficult, or only simple interaction of limited form is possible. How to improve the interaction efficiency and interaction quality of activity scenes is therefore a problem that urgently needs to be solved.
Summary
In view of this, the embodiments of this application provide a scene interaction method and apparatus, an electronic device, and a computer storage medium, which can not only improve interaction efficiency but also achieve richer and more diverse interaction effects.
The technical solutions of the embodiments of this application are implemented as follows:
An embodiment of this application provides a scene interaction method, executed by an electronic device, the method including: determining at least one real scene to interact with a virtual scene; acquiring real scene information of each real scene in real time; performing feature extraction on each piece of real scene information to correspondingly obtain the scene features of each real scene; and mapping the scene features of the at least one real scene into the virtual scene according to the correspondence between the virtual scene and the real scene.
An embodiment of this application provides a scene interaction apparatus, including: a scene determination module configured to determine at least one real scene to interact with a virtual scene; an information acquisition module configured to acquire real scene information of each real scene in real time; a feature extraction module configured to perform feature extraction on each piece of real scene information to correspondingly obtain the scene features of each real scene; and a feature mapping module configured to map the scene features of the at least one real scene into the virtual scene according to the correspondence between the virtual scene and the real scene.
An embodiment of this application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the scene interaction method of the above technical solutions.
An embodiment of this application provides an electronic device, including: a processor; and a memory configured to store executable instructions of the processor; where the processor is configured to execute the scene interaction method of the above technical solutions by executing the executable instructions.
In the technical solutions provided by the embodiments of this application, feature extraction is performed on real scene information to obtain the scene features of a real scene, and the scene features of the real scene are mapped into a virtual scene, so that offline people and scenes are fused with, and interact with, the online virtual scene in real time. This not only improves interaction efficiency but also achieves richer and more diverse interaction effects.
Brief Description of the Drawings
The accompanying drawings here are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with this application and, together with the specification, serve to explain the principles of the embodiments of this application. Obviously, the drawings described below are only some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of this application can be applied.
FIG. 2 is a flowchart of the steps of a scene interaction method in some embodiments of this application.
FIG. 3 is a schematic diagram of an application scenario in which a virtual scene interacts with real scenes in an embodiment of this application.
FIG. 4 is a schematic diagram of a real-time interactive scene communication model established based on WebSocket in an embodiment of this application.
FIG. 5 is a communication sequence diagram based on the WebSocket protocol in an embodiment of this application.
FIG. 6 is a flowchart of the steps of performing feature extraction on image information in some embodiments of this application.
FIG. 7 is a schematic diagram of the principle of extracting image features with a CNN model in an embodiment of this application.
FIG. 8 is a schematic diagram of the system layout of TensorFlow in an embodiment of this application.
FIG. 9 is a flowchart of the steps of performing feature mapping on scene features in some embodiments of this application.
FIG. 10 is a flowchart of the steps of the scene interaction method provided by an embodiment of this application in one application scenario.
FIG. 11A is a schematic diagram of a display state of the collected stereoscopic spatial image information in an embodiment of this application.
FIG. 11B is a schematic diagram of a display state of the virtual scene after the content of the real scenes has been fused in an embodiment of this application.
FIG. 12 is a schematic diagram of the matching relationship of voice waveform diagrams in an embodiment of this application.
FIG. 13 shows a change controller used for scene interaction in an embodiment of this application.
FIG. 14 is a structural block diagram of a scene interaction apparatus in some embodiments of this application.
FIG. 15 is a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of this application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the examples set forth here; rather, these embodiments are provided so that the embodiments of this application will be more thorough and complete and will fully convey the concepts of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a full understanding of the embodiments of this application. However, those skilled in the art will appreciate that the technical solutions of the embodiments of this application can be practiced without one or more of these specific details, or with other methods, components, apparatuses, steps, and so on. In other cases, well-known methods, apparatuses, implementations, or operations are not shown or described in detail so as not to obscure aspects of the embodiments of this application.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
The flowcharts shown in the drawings are only exemplary illustrations; they do not necessarily include all contents and operations/steps, nor must they be executed in the order described. For example, some operations/steps may be decomposed while others may be combined or partially combined, so the actual order of execution may change according to the actual situation.
In the related art of this application, purely online activities or purely offline activities can satisfy neither today's diversified lifestyles nor the increasingly curious and fun-seeking young user groups.
Therefore, in view of the problems in the related art, the embodiments of this application provide a scene interaction method and apparatus, an electronic device, and a computer storage medium based on artificial intelligence technologies such as computer vision, speech technology, and machine learning. The scene interaction method can be applied to the field of artificial intelligence, using AI technology to fuse offline people and scenes with an online virtual scene and have them interact in real time.
Artificial intelligence technology is introduced below. Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. AI is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. The embodiments of this application mainly involve the computer vision and speech processing technologies within AI.
It should be noted that computer vision (CV) is a science that studies how to make machines "see". More specifically, it refers to using cameras and computers in place of human eyes to perform machine-vision tasks such as recognizing, tracking, and measuring targets, and to further process the images so that they become better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems that can obtain information from images or multi-dimensional data. Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric recognition technologies such as face recognition and fingerprint recognition. The key technologies of speech technology (Speech Technology, ST) are automatic speech recognition (ASR), text-to-speech (TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the future direction of human-computer interaction, and voice has become one of the most promising modes of human-computer interaction.
FIG. 1 is a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of this application can be applied.
As shown in FIG. 1, the system architecture 100 may include a client 110, a network 120, and a server 130. The client 110 may include various terminal devices such as smartphones, tablet computers, notebook computers, and desktop computers. The server 130 may include various server devices such as web servers, application servers, and database servers. The network 120 may be a communication medium of various connection types capable of providing a communication link between the client 110 and the server 130, for example a wired communication link or a wireless communication link.
According to implementation needs, the system architecture in the embodiments of this application may have any number of clients, networks, and servers. For example, the server 130 may be a server group composed of multiple server devices, and the client 110 may be a terminal device cluster composed of multiple terminal devices distributed in the same offline activity scene or in multiple different offline activity scenes. In addition, the scene interaction method in the embodiments of this application may be applied to the client 110, to the server 130, or executed jointly by the client 110 and the server 130, which is not specifically limited in the embodiments of this application.
With reference to FIG. 1, an application scenario of the scene interaction method provided by the embodiments of this application is described as follows:
Taking an enterprise marketing activity as an example, when an enterprise organizes synchronized online and offline marketing activities, it can use an application for synchronizing the marketing activities. The application may include a merchant edition and a user edition. The enterprise runs and logs in to the merchant-edition client on a terminal to initiate the activity, and online users run and log in to the user-edition client on their terminals to synchronize online. In the embodiments of this application, the server 130 is the server corresponding to the application, and the client 110 includes the merchant's client and the online users' clients. The merchant forms a virtual scene through the client 110, and each user uploads, through the client 110, data corresponding to the real scene of the environment the user is currently in. The client 110 sends the data corresponding to the real scene to the server 130 through the network 120, so that the server 130 can acquire real scene information of the real scenes in real time and perform feature extraction on each piece of real scene information to correspondingly obtain the scene features of each real scene; finally, according to the correspondence between the virtual scene and the real scenes, the scene features of at least one real scene are mapped into the virtual scene. Offline people and scenes are thus fused with, and interact with, the online virtual scene in real time, which not only improves interaction efficiency but also achieves richer and more diverse interaction effects.
The scene interaction method and apparatus, electronic device, and computer storage medium provided by the embodiments of this application are described in detail below with reference to specific implementations.
FIG. 2 is a flowchart of the steps of a scene interaction method in some embodiments of this application. The method can be applied to a client that displays a virtual scene, for example a terminal device such as a mobile phone or a computer that displays an online activity scene by way of an online live broadcast. The method can also be applied to a server that fuses the content of online and offline activity scenes, for example a server device that provides live content and technical support for an online live-broadcast platform. As shown in FIG. 2, the method may mainly include the following steps:
Step S210: Determine at least one real scene to interact with a virtual scene.
A virtual scene may be an online activity scene that is displayed to users through terminal devices with display interfaces, such as mobile phones and computers, and that interacts with online users through network communication; a real scene is the offline activity scene that interacts with the corresponding online activity scene. In some optional implementations, one virtual scene may interact with a single real scene, or may interact with two or more real scenes at the same time.
FIG. 3 is a schematic diagram of an application scenario in which a virtual scene interacts with real scenes in an embodiment of this application. As shown in FIG. 3, the virtual scene 310 may be connected to at least one real scene 320 through network communication so as to interact with the at least one real scene 320 simultaneously. The virtual scene 310 shown in the figure is a virtual lottery application scene; the virtual scene 310 may alternatively be any of various application scenes such as a virtual prize wheel, virtual bubble blowing, virtual car driving, or virtual voting.
Step S220: Acquire real scene information of each real scene in real time.
Using the network communication connection between the virtual scene and the real scenes, the real scene information of the real scenes can be acquired in real time. For example, in a real scene, information can be collected from the activity site in real time through information collection devices such as cameras and microphones, and the collected information is then transmitted over the network communication connection to the server or client where the virtual scene resides. In some optional implementations, this step may establish, between the virtual scene and the real scene, a real-time communication link based on WebSocket, a full-duplex communication protocol over the Transmission Control Protocol (TCP), and use the real-time communication link to acquire the real scene information of the real scene.
FIG. 4 is a schematic diagram of a real-time interactive scene communication model established based on WebSocket in an embodiment of this application. The WebSocket protocol is a new network protocol based on TCP; like HTTP, it is an application-layer protocol. It implements full-duplex communication between the browser and the server, that is, it allows the server to actively push information to the client. As shown in FIG. 4, the communication model may include an application layer 410, a socket abstraction layer 420, a transport layer 430, a network layer 440, and a link layer 450. The application layer 410 includes multiple user processes and is mainly responsible for providing user interfaces and service support. The socket abstraction layer 420 abstracts the complex operations of the TCP/IP layers into a few simple interfaces that the application layer 410 calls to implement inter-process communication over the network. The transport layer 430 includes the connection-oriented TCP protocol and the connectionless UDP protocol and is mainly responsible for the transmission of whole messages from process to process. UDP, the User Datagram Protocol, provides applications with a way to send encapsulated IP datagrams without establishing a connection; UDP and TCP are the two main, mutually complementary protocols of the transport layer 430. The network layer 440 includes the ICMP, IP, and IGMP protocols and is mainly responsible for routing and forwarding packet data between hosts or between routers and switches. ICMP, the Internet Control Message Protocol, is mainly used to pass control information between hosts and routers, including reporting errors and exchanging limited control and status information. IP, the Internet Protocol, is mainly responsible for routing and transmitting data and ensures that computers can send and receive datagrams. IGMP, the Internet Group Management Protocol, runs between hosts and multicast routers and is used to manage the joining and leaving of multicast group members and to maintain multicast group membership information. The link layer 450 includes the ARP protocol, the hardware interface, and the RARP protocol; it is mainly responsible for establishing and managing links between nodes, turning an error-prone physical channel into an error-free data link that can reliably transmit data frames. ARP, the Address Resolution Protocol, is used to resolve the physical (MAC) address of a target hardware device 460 from the IP address of the target hardware device 460, while RARP is used to translate a physical address into an IP address.
FIG. 5 is a communication sequence diagram based on the WebSocket protocol in an embodiment of this application. As shown in FIG. 5, the WebSocket client 510 first sends a connection request 51 (connecting) to the TCP client 520. Based on the connection request 51, the TCP client 520 sends a synchronize sequence numbers message 52 (SYN) to the TCP server 530, and the TCP server 530 replies to the TCP client 520 with a SYN+ACK packet 53 formed by a synchronize sequence numbers message and an acknowledge character (ACK). After receiving the SYN+ACK packet 53, the TCP client 520 sends an ACK packet (not shown in the figure) to the TCP server 530 and at the same time returns a connection confirmation message 54 (connected) to the WebSocket client 510. After the connection is established, the WebSocket client 510 and the TCP client 520 complete a handshake 55 (handshake); message sending 56 (send) and message receiving 57 (receive) are performed through the TCP client 520 and the TCP server 530, and the TCP server 530 communicates and interacts with the WebSocket server 540.
Step S230: Perform feature extraction on each piece of real scene information to correspondingly obtain the scene features of each real scene.
The scene features obtained by feature extraction in this step may include at least one of image features and audio features. For the real scene information of each real scene acquired in real time in step S220, this step may first obtain the image information and the audio information in the real scene information, then perform feature extraction on the image information to obtain the image features of the real scene, and perform feature extraction on the audio information to obtain the audio features of the real scene.
For example, when feature extraction is performed on the image information, scene recognition can be performed on the image information to obtain the scene image features of the real scene, face recognition can be performed on the image information to obtain the character image features of the real scene, and character action recognition can be performed on the image information to obtain the action image features of the real scene. The scene image features relate to information such as the venue and background of the real scene; for example, they can indicate whether the real scene is indoors or outdoors, or whether it is a specific shopping mall or an open-air plaza. The character image features relate to the people taking part in the offline activity in the real scene; for example, activity participants such as hosts, guests, or audience members can be tracked in the real scene based on face recognition. The action image features relate to the body movements of the people at the activity site; for example, a specific posture or gesture can represent a designated activity instruction.
When feature extraction is performed on the audio information, voice recognition can be performed on the audio information to obtain the text audio features of the real scene, and waveform detection can be performed on the audio information to obtain the waveform audio features of the real scene. The text audio features relate to voice content such as the dialogue of the activity participants in the real scene; for example, they may be text characters obtained by speech recognition of the relevant voice content, or specific character codes. The waveform audio features relate to the background music, sound effects, and live atmosphere of the real scene, and can reflect, for example, whether the real scene is noisy or quiet.
Step S240: Map the scene features of the at least one real scene into the virtual scene according to the correspondence between the virtual scene and the real scene.
The scene features extracted in step S230 can be mapped into the virtual scene through a specified feature mapping method according to the correspondence between the virtual scene and the real scene. For example, the image features can be mapped into the virtual scene as corresponding virtual images such as a virtual background or virtual characters, and the audio features can be mapped into the virtual scene as its background music, sound effects, or voice commands, so that the real scene and the virtual scene interact through scene content.
In the scene interaction method provided by the embodiments of this application, information such as the images and audio of a real scene is recognized, converted, and communicated to an online server for display on a terminal screen, and offline people and scenes are combined with the online virtual scene for real-time fusion and interaction, which not only improves interaction efficiency but also achieves richer and more diverse interaction effects.
In some embodiments, the fun and interactivity of an activity can be further enhanced through the transmission of offline recognition results, their integration with the online virtual scene, and the combination of video technology, speech technology, and physical remote-sensing technology. In this way, activity participants in different regions can be brought together in one virtual scene for remote interaction, which strengthens the influence the activity brings to brand marketing, improves user participation as well as the fun and controllability of the activity, and raises the value of the activity; the applicable scenarios are extremely broad.
基于对现实场景信息的特征提取,可以将现实场景的场景核心特点在虚拟场景中进行展示并实现互动。由现实场景信息中获得的图像信息一般可以是通过摄像机等图像采集设备采集得到的动态视频图像,而且同一现实场景可以由多个摄像机在不同位置进行图像采集。在此基础上,为了提高图像特征提取的处理效率,可以预先将动态视频图像进行拼接和转换以 形成静态图像。图6示意性地示出了本申请一些实施例中对图像信息进行特征提取的步骤流程图。如图6所示,在以上各实施例的基础上,对图像信息进行特征提取可以包括以下步骤:
步骤S610、从图像信息中获取对应于不同图像采集参数的现实场景的局部图像。
图像采集参数可以包括图像采集角度和图像采集范围中的至少一种,例如在同一个现实场景中,可以布置具有不同图像拍摄角度和图像拍摄范围的多个摄像机同时进行拍摄,每个摄像机采集到的视频图像均为现实场景的局部图像。
步骤S620、对属于同一时间区间内的所述局部图像进行图像拼接,以得到所述现实场景的融合图像。
针对采集到的现实场景的连续的局部图像,可以按照预设的时间长度进行切分,得到对应不同时间区间的局部图像。然后将属于同一时间区间内的对应于不同图像采集参数的现实场景的局部图像进行拼接,即得到现实场景的融合图像。
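图像拼接的一种简单实现思路如下面的Python片段所示：同一时间区间内，每一路摄像头的帧沿水平方向按时间排列，不同摄像头（即不同图像采集参数）沿竖直方向排列，拼成一幅融合图像（假设各帧尺寸一致，仅为示意）：

    import numpy as np

    def build_fused_image(frames_by_camera):
        # frames_by_camera: 列表的列表，外层对应不同图像采集参数（不同摄像头），
        # 内层为该摄像头在同一时间区间内按时间顺序排列的帧（H x W x 3 数组）
        rows = [np.hstack(frames) for frames in frames_by_camera]  # 水平方向：时间顺序
        return np.vstack(rows)                                     # 竖直方向：采集参数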
步骤S630、对融合图像进行特征提取以得到现实场景的图像特征。
经过图像拼接处理后可以得到对应于不同时间区间的静态的融合图像,每个融合图像均可以通过特征提取得到相应的现实场景的图像特征。在一些可选的实施方式中,本步骤可以先对融合图像进行边缘检测以得到融合图像中的特征区域,然后再对特征区域进行特征提取以得到现实场景的图像特征。通过边缘检测可以缩小特征提取范围,提高特征提取速度和特征提取的准确性。
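边缘检测及特征区域提取可以参考如下基于OpenCV的示意性写法（其中的阈值与面积过滤条件均为经验性假设）：

    import cv2

    gray = cv2.cvtColor(fused_image, cv2.COLOR_BGR2GRAY)   # fused_image为上一步得到的融合图像
    edges = cv2.Canny(gray, 100, 200)                       # 边缘检测
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # 过滤掉面积过小的轮廓，余下轮廓的包围框即为待进一步提取特征的特征区域
    feature_regions = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]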
在对现实场景信息中的图像信息进行特征提取时,可以通过预先训练的机器学习模型来实现,例如可以采用卷积神经网络(Convolutional Neural Networks,CNN)对输入图像进行卷积和池化处理,最终输出图像特征。图7示意性地示出了本申请实施例利用CNN模型提取图像特征的原理示意图。如图7所示,CNN模型的输入图像是经过图像拼接后的一个时间区间内的融合图像710。针对现实场景对应于同一时间区间和不同图像采集参数的多组局部图像,沿水平方向按照时间顺序进行排列,沿竖直方向按照不同的图像采集参数进行排列,将动态变化的图像拼接成为一幅静态 的融合图像710。在CNN模型中至少包括有一个或者多个卷积层720,另外也可以包括一个或者多个池化层730以及一个或者多个其他网络结构740(例如,在一些实施例中,其他网络结构740可以是全连接层)。经过多个网络层逐层进行特征提取和特征映射后,最终输出得到融合图像710对应的图像特征。
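一个与上述结构对应的卷积神经网络可以用tf.keras大致写成如下形式（层数、通道数与输入尺寸均为示意性假设）：

    import tensorflow as tf

    H, W = 224, 224  # 融合图像缩放后的输入尺寸，假设值
    feature_extractor = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(H, W, 3)),  # 卷积层
        tf.keras.layers.MaxPooling2D(),                                           # 池化层
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),  # 输出128维的图像特征向量
    ])

    # features = feature_extractor(batch_of_fused_images)  # 输入形状为[批大小, H, W, 3]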
本申请实施例可以利用TensorFlow进行神经网络的训练,图8示意性地示出了本申请实施例中TensorFlow的系统布局示意图。
如图8所示,一个TensorFlow集群810(TF Cluster)包含有多个TensorFlow服务端811(TF Server),这些TensorFlow服务端811被切分为一系列批处理的任务组job,而任务组job又会负责处理一系列任务Tasks。一个TensorFlow集群810一般会专注于一个相对高层的目标,譬如用多台机器并行地训练一个神经网络。
一个job会包含一系列的致力于某个相同目标的任务Tasks。例如,对应于参数服务器812(Parameter Server)的job n会用于处理存储与更新网络参数相关的工作。而对应于各个计算服务器813(workers)的job0……job n-1会用于承载那些用于计算密集型的无状态节点。一般来说一个job中的Tasks会运行在不同的机器中。
一个Task一般会关联到某个单一的TensorFlow服务端的处理过程,属于一个特定的job并且在该job的任务列表中有个唯一的索引。
TensorFlow服务端用于运行grpc_tensorflow_server的处理过程,是一个集群中的一员,并且向外暴露了一个Master Service与一个Worker Service。
Master Service是一个远程过程调用协议(Remote Procedure Call,RPC)服务,用于与一系列远端的分布式设备进行交互。Master Service实现了用于进行会话(Session)的会话接口,即tensorflow::Session接口,并且用来协调多个Worker service。
Worker service是一个执行TensorFlow计算图(TF graph)中部分内容的远程过程调用服务。
TensorFlow客户端820(Client)一般会构建一个TensorFlow计算图并且使用tensorflow::Session接口来完成与TensorFlow集群的交互。 TensorFlow客户端一般会用Python或者C++编写,一般来说一个TensorFlow客户端可以同时与多个TensorFlow服务端进行交互,并且一个TensorFlow服务端也可以同时服务于多个TensorFlow客户端。
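按照上述布局，一个由参数服务器与计算服务器组成的TensorFlow集群可以用如下方式描述，并启动某个Task对应的服务端（其中的主机地址均为假设值，仅作示意）：

    import tensorflow as tf

    cluster = tf.train.ClusterSpec({
        "ps": ["ps0.example.com:2222"],                 # 参数服务器对应的job
        "worker": ["worker0.example.com:2222",          # 计算服务器job中的各个Task
                   "worker1.example.com:2222"],
    })
    # 启动worker job中索引为0的Task对应的TensorFlow服务端
    server = tf.distribute.Server(cluster, job_name="worker", task_index=0)
    # server.join()  # 阻塞等待，由集群调度计算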
基于TensorFlow构建神经网络后,可以利用样本数据对神经网络进行训练。针对线下活动对应的现实场景,可以通过模拟的方式来录制并录入大量的线下活动场景视频。
通过调用TensorFlow中的tf.nn.conv2d接口，可以载入大量的视频和图片图像进行训练。利用OpenCV可以进行图像边缘检测，识别出来的区块具有一定的形状数据，将形状数据与训练图像数据的统计结果进行对比，即可识别出图像属于哪些特征。利用样本数据进行迭代训练，可以实现对神经网络中网络参数的不断更新优化。例如，某一网络层中涉及算法公式a*5.0+b，针对该公式的迭代更新过程如下：
5.4*5.0+1.88=28.88
9.35805*5.0+2.67161=49.4619
9.4589*5.0+2.69178=49.9863
9.46147*5.0+2.69229=49.9996
9.46154*5.0+2.69231=50.0
基于该更新过程，可以看到参数a的值从5.4逐渐增大并收敛至9.46154，参数b的值从1.88逐渐增大并收敛至2.69231，公式的输出也随之逼近50.0。
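上述迭代数值可以用一个简单的梯度下降过程近似复现。下面的Python片段假设该层的目标输出为50.0、损失取均方误差、学习率约为0.0375（以上均为便于说明而设定的假设值），其输出与上文各轮数值基本一致：

    a, b, lr = 5.4, 1.88, 0.0375   # 初始参数与学习率（学习率为假设值）
    for _ in range(5):
        pred = a * 5.0 + b
        err = pred - 50.0          # 假设目标输出为50.0
        a -= lr * err * 5.0        # 损失对a的梯度为err*5.0
        b -= lr * err              # 损失对b的梯度为err
        print(f"{a:.5f}*5.0+{b:.5f}={a * 5.0 + b:.4f}")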
在一些可选的实施方式中，以基于支持向量机(Support Vector Machine,SVM)的分类器为例，可以使用如下损失函数：
L(y)=max(0,1-ty)
其中，y为预测值，取值在-1到+1之间，t为目标值(-1或+1)。y的取值保持在-1和+1之间即可，并不鼓励|y|>1，即不鼓励分类器过度自信：让某个被正确分类的样本距离分割线超过1，并不会带来任何额外奖励。
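该损失函数即标准的合页损失(hinge loss)，用TensorFlow可以写成如下示意形式：

    import tensorflow as tf

    def hinge_loss(y_pred, t):
        # t取-1或+1，y_pred为分类器的预测输出
        return tf.reduce_mean(tf.maximum(0.0, 1.0 - t * y_pred))

    # 示例：两个样本的预测值分别为0.8与-0.3，目标值分别为+1与-1
    loss = hinge_loss(tf.constant([0.8, -0.3]), tf.constant([1.0, -1.0]))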
另外，在本申请的一些可选的实施方式中，可以使用tf.train.GradientDescentOptimizer作为TensorFlow中实现梯度下降算法的优化器。其中，梯度下降算法可以选用标准梯度下降GD、批量梯度下降BGD和随机梯度下降SGD中的任意一种。
以标准梯度下降为例，待学习训练的网络参数为W，损失函数为J(W)，损失函数关于网络参数的梯度为∇J(W)，学习率为η，那么梯度下降法更新网络参数的公式即为：
W_{s+1} = W_s - η∇J(W_s)
网络参数沿着负梯度方向不断调整，从而使损失函数不断减小。其基本策略是在有限视野内寻找最快的下山路径：每迈出一步都参考当前位置最陡的梯度方向，从而决定下一步的走向。
基于TensorFlow训练得到的神经网络可以应用于对现实场景的现实场景信息进行特征提取,提取得到的场景特征则被映射至对应的虚拟场景中。图9示意性地示出了本申请一些实施例中对场景特征进行特征映射的步骤流程图,如图9所示,在以上各实施例的基础上,步骤S240中根据虚拟场景与现实场景的对应关系,将至少一个现实场景的场景特征映射至虚拟场景中,可以包括以下步骤:
步骤S910、根据虚拟场景与现实场景的对应关系,在虚拟场景中确定与每一现实场景相对应的特征映射区域。
在虚拟场景中可以将一部分指定的场景展示区域确定为与现实场景相对应的特征映射区域。当一个虚拟场景同时与多个现实场景进行互动时,每个现实场景都可以在虚拟场景中对应确定一个特征映射区域,这些特征映射区域可以是相互间隔的展示区域,也可以是部分重叠或者完全重叠的展示区域。
步骤S920、在特征映射区域中展示与对应的现实场景的场景特征具有映射关系的场景内容。
这里,特征映射区域包括第一特征映射区域和第二特征映射区域,第一特征映射区域与第二特征映射区域可以是完全重叠的展示区域,也可以是部分重叠的展示区域,还可以是完全不重叠且相互间隔的展示区域。
当现实场景的场景特征为图像特征时,可以在第一特征映射区域中展示与图像特征具有映射关系的图像响应内容。当场景特征为音频特征时,可以在第二特征映射区域中展示与音频特征具有映射关系的音频响应内容。
在一些可选的实施方式中,基于图像特征展示图像响应内容时,可以 从图像特征中获取场景图像特征、人物图像特征和动作图像特征中的至少一种,然后在第一特征映射区域中展示与场景图像特征具有映射关系的虚拟背景图像,在第一特征映射区域中展示与人物图像特征具有映射关系的虚拟人物图像,并在第一特征映射区域中展示与动作图像特征具有映射关系的动作响应内容。需要说明的是,如果图像特征中包括场景图像特征、人物图像特征和动作图像特征中的多个,可以在同一个第一特征映射区域中同时展示多个图像特征,还可以在不同的第一特征映射区域中分别展示这多个图像特征。以虚拟抽奖为例,当识别得到的动作图像特征对应了用户转动轮盘的动作时,可以控制虚拟场景中的虚拟抽奖轮盘开始转动。
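图像特征到虚拟场景内容的映射可以抽象为一张特征—响应的对应表，下面的Python片段给出一个高度简化的示意（其中的特征名称与虚拟场景对象的接口均为假设，仅用于说明映射关系的组织方式）：

    # 假设的动作图像特征到虚拟场景动作响应内容的映射表
    ACTION_HANDLERS = {
        "spin_wheel": lambda scene: scene.start_lottery_wheel(),    # 转动轮盘 -> 启动虚拟抽奖轮盘
        "raise_hand": lambda scene: scene.highlight_participant(),  # 举手 -> 高亮对应虚拟人物
    }

    def apply_image_features(scene, features):
        # features为特征提取阶段输出的字典，scene为虚拟场景对象（均为假设的数据结构）
        if "scene_label" in features:
            scene.set_virtual_background(features["scene_label"])   # 场景图像特征 -> 虚拟背景图像
        if "person_id" in features:
            scene.show_virtual_character(features["person_id"])     # 人物图像特征 -> 虚拟人物图像
        for action in features.get("actions", []):
            handler = ACTION_HANDLERS.get(action)                   # 动作图像特征 -> 动作响应内容
            if handler:
                handler(scene)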
在一些可选的实施方式中,基于音频特征展示音频响应内容时,可以从音频特征中获取文本音频特征和波形音频特征,然后在第二特征映射区域中展示与文本音频特征具有映射关系的文本响应内容,并在第二特征映射区域中展示与波形音频特征具有映射关系的音频动态效果。
图10示意性地示出了本申请实施例提供的场景互动方法在一应用场景中的步骤流程图。该方法主要可以应用于对虚拟场景进行动态控制的服务器设备。如图10所示,在该应用场景下进行场景互动的方法主要包括以下步骤:
步骤S1010、在线下场景打开多个摄像头和多个麦克风。通过多个摄像头采集与用户动作等活动内容相关的立体空间图像信息,并通过多个麦克风采集与用户语音等活动内容相关的立体语音信息。
图11A示意性地示出了本申请实施例中采集的立体空间图像信息的显示状态示意图,如图11A所示,通过多个摄像头采集的立体空间图像信息中不仅包括人物,还包括人物所在的场景,当然,还可以包括人物动作、神态等更加细节的信息。
步骤S1020、通过WebSocket实时接收图像信息和语音信息。
步骤S1030、对图像信息进行人物识别、动作识别和场景识别。
步骤S1040、通过索引遍历,对虚拟场景的局部区域进行动态变更。例如,可以根据实时获取到的图像特征对特征区域进行抠图,抠图后将各客户端抠图的图像特征统一调度至活动另一虚拟场景,并通过计算将各实际场景人物以及人物动作投放到虚拟场景,虚拟场景契合实际活动类型。 图11B示意性地示出了本申请实施例中融合现实场景内容后的虚拟场景的显示状态示意图。如图11B所示,线下活动场景中的实际场景人物以现实场景对象1110的方式投放在虚拟场景中,与虚拟场景中生成的虚拟场景对象1120共同呈现给用户。其中,现实场景对象1110的人物动作和姿态跟随实际场景人物实时变化,而虚拟场景对象1120可以根据实际活动类型进行配置和调整。
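将抠图得到的实际场景人物投放到虚拟场景，本质上是按掩码把人物像素叠加到虚拟背景上，下面给出一个基于OpenCV与NumPy的简化示意（人像掩码假设已由任一人像分割模型得到，此处作为输入）：

    import cv2
    import numpy as np

    def compose_into_virtual_scene(person_frame, person_mask, virtual_bg):
        # person_mask: 与person_frame同尺寸的单通道掩码，人物区域为非零值（输入假设）
        bg = cv2.resize(virtual_bg, (person_frame.shape[1], person_frame.shape[0]))
        mask = (person_mask > 0)[..., None]           # 扩展为可按通道广播的布尔掩码
        return np.where(mask, person_frame, bg)       # 人物像素取现实帧，其余像素取虚拟背景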
步骤S1050、对语音信息进行识别转化为文字,并得到语音波形图。其中文字部分可以用于形成语音指令,例如“开始抽奖”、“开始投票”等等。语音波形图可以用于匹配与之适应的背景音乐,图12示意性地示出了语音波形图与背景音乐之间的匹配关系示意图。如图12所示,根据由语音信息得到的语音波形图121,可以得到与之相似的匹配波形图122,基于该匹配波形图可以确定相应的背景音乐。
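波形匹配的一个极简思路是比较现场音频与候选背景音乐的能量特征，取最接近者作为匹配结果，如下面的示意片段所示（候选曲目的特征数据结构为假设）：

    import numpy as np

    def match_bgm(live_pcm, bgm_profiles):
        # live_pcm: 现场音频采样（一维数组）；bgm_profiles: {曲目名: 预先计算的平均RMS}
        live_rms = float(np.sqrt(np.mean(np.square(live_pcm.astype(np.float32)))))
        return min(bgm_profiles, key=lambda name: abs(bgm_profiles[name] - live_rms))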
步骤S1060、通过索引遍历，对虚拟场景的音乐动效进行动态变更。虚拟场景的背景音乐可以根据现场的语音波形图进行匹配，例如线下活动现场较为安静时，可以根据匹配结果将背景音乐变更为较为舒缓的背景音乐。
在一些可选的实施方式中,还可以根据实时获取的图像特征,对特征映射区域进行抠图,抠图后将各个客户端抠图的图像特征统一调度至当前的活动对应的虚拟场景中,并通过计算将各个现实场景的人物等动作投放到该虚拟场景中,实现虚拟场景切合实际活动的类型,同时,活动的背景音乐也可以根据现实场景所采集的语音信息来进行匹配。
在一些可选的实施方式中,除了在虚拟场景中映射现实场景的场景特征以外,还可以根据虚拟场景向现实场景反馈互动内容。图13示意性地示出了本申请实施例中用于场景互动的变更控制器。如图13所示,基于微控制单元(Microcontroller Unit,MCU)的MCU控制器1310可以利用物联网形式的硬件设备对活动现场的实体场景进行互动控制。通过蓝牙通信模块1320或者其他类型的短程通信设备可以在活动现场进行数据通信,通过传感器1330可以对活动现场的互动感受信息进行检测和采集,通过振动模块1340可以在活动现场提供物理振动效果,通过灯光模块1350可以在活动现场提供灯光视觉效果,通过扬声器1360可以在活动现场提供音乐效果。
本申请实施例提供的场景互动方法，通过使用TensorFlow对线下场景以及人物进行物理识别，并转换通信至线上服务器显示于终端屏幕，将线下人物和场景结合线上虚拟场景进行融合与互动，包括虚拟抽奖、虚拟大转盘、虚拟吹泡泡、虚拟驾驶汽车和虚拟投票等应用场景，通过线下识别的传导与线上虚拟场景的统合以及视频技术、语音技术、实体遥感技术的结合来增强活动的趣味性、增强活动的互动性，还可以让不同区域的活动参与人员都融合到一个虚拟场景来进行远程互动，增强了活动为品牌营销带来的影响力，提高了用户的活动参与度、活动的趣味性和控制性，提升了活动价值，具有极为广泛的应用场景。
应当注意,尽管在附图中以特定顺序描述了本申请实施例中方法的各个步骤,但是,这并非要求或者暗示必须按照该特定顺序来执行这些步骤,或是必须执行全部所示的步骤才能实现期望的结果。附加的或备选的,可以省略某些步骤,将多个步骤合并为一个步骤执行,以及/或者将一个步骤分解为多个步骤执行等。
以下介绍本申请实施例的装置实施例,可以用于执行本申请上述实施例中的场景互动方法。对于本申请装置实施例中未披露的细节,请参照本申请上述的场景互动方法的实施例。
图14示意性地示出了在本申请一些实施例中的场景互动装置的结构框图。如图14所示,场景互动装置1400主要可以包括:
场景确定模块1410,被配置为确定与虚拟场景进行互动的至少一个现实场景。
信息获取模块1420,被配置为实时获取每一所述现实场景的现实场景信息。
特征提取模块1430,被配置为对每一所述现实场景信息进行特征提取,以对应得到每一所述现实场景的场景特征。
特征映射模块1440,被配置为根据所述虚拟场景与所述现实场景的对应关系,将所述至少一个现实场景的场景特征映射至所述虚拟场景中。
在一些实施例中,场景特征包括图像特征和音频特征中的至少一种。
在一些实施例中,特征提取模块1430包括:信息提取单元,被配置为获取每一现实场景信息中的图像信息和音频信息;图像特征提取单元, 被配置为对图像信息进行特征提取以得到现实场景的图像特征;音频特征提取单元,被配置为对音频信息进行特征提取以得到现实场景的音频特征。
在一些实施例中，图像特征提取单元包括：场景识别子单元，被配置为对图像信息进行场景识别以得到现实场景的场景图像特征；人脸识别子单元，被配置为对图像信息进行人脸识别以得到现实场景的人物图像特征；人物动作识别子单元，被配置为对图像信息进行人物动作识别以得到现实场景的动作图像特征；第一确定子单元，被配置为将场景图像特征、人物图像特征和动作图像特征，确定为现实场景的图像特征。
在一些实施例中,图像特征提取单元包括:局部图像获取子单元,被配置为从图像信息中获取对应于不同图像采集参数的现实场景的局部图像;图像拼接子单元,被配置为对属于同一时间区间内的局部图像进行图像拼接,以得到现实场景的融合图像;图像特征提取子单元,被配置为对融合图像进行特征提取以得到现实场景的图像特征。
在一些实施例中,图像采集参数包括图像采集角度和图像采集范围中的至少一种。
在一些实施例中,图像特征提取子单元包括:边缘检测子单元,被配置为对融合图像进行边缘检测以得到融合图像中的特征区域;特征提取子单元,被配置为对特征区域进行特征提取以得到现实场景的图像特征。
在一些实施例中,音频特征提取单元包括:语音识别子单元,被配置为对音频信息进行语音识别以得到现实场景的文本音频特征;波形检测子单元,被配置为对音频信息进行波形检测以得到现实场景的波形音频特征;第二确定子单元,被配置为将文本音频特征和波形音频特征,确定为现实场景的音频特征。
在一些实施例中,特征映射模块1440包括:区域确定单元,被配置为根据虚拟场景与现实场景的对应关系,在虚拟场景中确定与每一现实场景相对应的特征映射区域;内容展示单元,被配置为在特征映射区域中展示与对应的现实场景的场景特征具有映射关系的场景内容。
在一些实施例中,所述特征映射区域包括第一特征映射区域和第二特征映射区域;内容展示单元包括:图像响应内容展示子单元,被配置为当场景特征为图像特征时,在第一特征映射区域中展示与图像特征具有映射 关系的图像响应内容;音频响应内容展示子单元,被配置为当场景特征为音频特征时,在第二特征映射区域中展示与音频特征具有映射关系的音频响应内容。
在一些实施例中,图像响应内容展示子单元包括:图像特征获取子单元,被配置为从图像特征中获取场景图像特征、人物图像特征和动作图像特征中的至少一种;虚拟背景图像展示子单元,被配置为在特征映射区域中展示与场景图像特征具有映射关系的虚拟背景图像;虚拟人物图像展示子单元,被配置为在特征映射区域中展示与人物图像特征具有映射关系的虚拟人物图像;动作响应内容展示子单元,被配置为在第一特征映射区域中展示与动作图像特征具有映射关系的动作响应内容。
在一些实施例中，音频响应内容展示子单元包括：音频特征获取子单元，被配置为从音频特征中获取文本音频特征和波形音频特征；文本响应内容展示子单元，被配置为在第二特征映射区域中展示与文本音频特征具有映射关系的文本响应内容；音频动态效果展示子单元，被配置为在第二特征映射区域中展示与波形音频特征具有映射关系的音频动态效果。
在一些实施例中，信息获取模块1420包括：链路建立单元，被配置为在所述虚拟场景与所述现实场景之间，建立基于传输控制协议的全双工通信协议进行实时通信的实时通信链路；链路通信单元，被配置为利用实时通信链路获取现实场景的现实场景信息。
本申请各实施例中提供的场景互动装置的细节已经在对应的方法实施例中进行了详细的描述,因此此处不再赘述。
图15示出了适于用来实现本申请实施例的电子设备的计算机系统的结构示意图。
需要说明的是,图15示出的电子设备的计算机系统1500仅是一个示例,不应对本申请实施例的功能和使用范围带来任何限制。
如图15所示,计算机系统1500包括中央处理单元(Central Processing Unit,CPU)1501,其可以根据存储在只读存储器(Read-Only Memory,ROM)1502中的程序或者从储存部分1508加载到随机访问存储器(Random Access Memory,RAM)1503中的程序而执行各种适当的动作和处理。在RAM 1503中,还存储有系统操作所需的各种程序和数据。CPU 1501、ROM 1502以及RAM 1503通过总线1504彼此相连。输入/输出(Input/Output,I/O)接口1505也连接至总线1504。
以下部件连接至I/O接口1505:包括键盘、鼠标等的输入部分1506;包括诸如阴极射线管(Cathode Ray Tube,CRT)、液晶显示器(Liquid Crystal Display,LCD)等以及扬声器等的输出部分1507;包括硬盘等的存储部分1508;以及包括诸如LAN(Local Area Network,局域网)卡、调制解调器等的网络接口卡的通信部分1509。通信部分1509经由诸如因特网的网络执行通信处理。驱动器1510也根据需要连接至I/O接口1505。可拆卸介质1511,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器1510上,以便于从其上读出的计算机程序根据需要被安装入存储部分1508。
特别地,根据本申请的实施例,各个方法流程图中所描述的过程可以被实现为计算机软件程序。例如,本申请的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信部分1509从网络上被下载和安装,和/或从可拆卸介质1511被安装。在该计算机程序被中央处理单元(Central Processing Unit,CPU)1501执行时,执行本申请的系统中限定的各种功能。
需要说明的是,本申请实施例所示的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的还可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(Random Access Memory,RAM)、只读存储器(Read-Only Memory,ROM)、可擦式可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)、闪存、光纤、便携式紧凑磁盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本申请实施例中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本申请实施例中,计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数 据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:无线、有线等等,或者上述的任意合适的组合。
附图中的流程图和框图,图示了按照本申请各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,上述模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图或流程图中的每个方框、以及框图或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
应当注意,尽管在上文详细描述中提及了用于动作执行的设备的若干模块或者单元,但是这种划分并非强制性的。实际上,根据本申请实施例的实施方式,上文描述的两个或更多模块或者单元的特征和功能可以在一个模块或者单元中具体化。反之,上文描述的一个模块或者单元的特征和功能可以进一步划分为由多个模块或者单元来具体化。
通过以上的实施方式的描述,本领域的技术人员易于理解,这里描述的示例实施方式可以通过软件实现,也可以通过软件结合必要的硬件的方式来实现。因此,根据本申请实施方式的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是只读光盘(Compact Disc Read-Only Memory,CD-ROM),U盘,移动硬盘等)中或网络上,包括若干指令以使得一台计算设备(可以是个人计算机、服务器、触控终端、或者网络设备等)执行根据本申请实施方式的方法。
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本申请的其它实施方案。本申请旨在涵盖本申请的任何变型、用途或者适 应性变化,这些变型、用途或者适应性变化遵循本申请的一般性原理并包括本申请未公开的本技术领域中的公知常识或惯用技术手段。
应当理解的是,本申请并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本申请的范围仅由所附的权利要求来限制。
工业实用性
本申请实施例中,当需要实现线上活动与线下活动同步时,可以通过对现实场景信息进行特征提取,得到现实场景的场景特征,并将现实场景的场景特征映射至虚拟场景中,实现了将线下人物和场景与线上虚拟场景进行实时地融合与互动,不仅可以提高互动效率,而且可以获得更加丰富多样的互动效果,且这种方式能够提高线上用户的活动参与度,提升了活动价值,具有极大的工业实用性。

Claims (16)

  1. 一种场景互动方法,所述方法由电子设备执行,包括:
    确定与虚拟场景进行互动的至少一个现实场景;
    实时获取每一所述现实场景的现实场景信息;
    对每一所述现实场景信息进行特征提取,以对应得到每一所述现实场景的场景特征;
    根据所述虚拟场景与所述现实场景的对应关系,将所述至少一个现实场景的场景特征映射至所述虚拟场景中。
  2. 根据权利要求1所述的场景互动方法,其中,所述场景特征包括图像特征和音频特征中的至少一种。
  3. 根据权利要求2所述的场景互动方法,其中,所述对每一所述现实场景信息进行特征提取,以对应得到每一所述现实场景的场景特征,包括:
    获取每一所述现实场景信息中的图像信息和音频信息;
    对所述图像信息进行特征提取以得到所述现实场景的图像特征;
    对所述音频信息进行特征提取以得到所述现实场景的音频特征。
  4. 根据权利要求3所述的场景互动方法,其中,所述对所述图像信息进行特征提取以得到所述现实场景的图像特征,包括:
    对所述图像信息进行场景识别以得到所述现实场景的场景图像特征;
    对所述图像信息进行人脸识别以得到所述现实场景的人物图像特征;
    对所述图像信息进行人物动作识别以得到所述现实场景的动作图像特征；
    将所述场景图像特征、所述人物图像特征和所述动作图像特征,确定为所述现实场景的图像特征。
  5. 根据权利要求3所述的场景互动方法,其中,所述对所述图像信息进行特征提取以得到所述现实场景的图像特征,包括:
    从所述图像信息中获取对应于不同图像采集参数的所述现实场景的局部图像;
    对属于同一时间区间内的所述局部图像进行图像拼接，以得到所述现实场景的融合图像；
    对所述融合图像进行特征提取以得到所述现实场景的图像特征。
  6. 根据权利要求5所述的场景互动方法,其中,所述图像采集参数包括图像采集角度和图像采集范围中的至少一种。
  7. 根据权利要求5所述的场景互动方法,其中,所述对所述融合图像进行特征提取以得到所述现实场景的图像特征,包括:
    对所述融合图像进行边缘检测以得到所述融合图像中的特征区域;
    对所述特征区域进行特征提取以得到所述现实场景的图像特征。
  8. 根据权利要求3所述的场景互动方法,其中,所述对所述音频信息进行特征提取以得到所述现实场景的音频特征,包括:
    对所述音频信息进行语音识别以得到所述现实场景的文本音频特征;
    对所述音频信息进行波形检测以得到所述现实场景的波形音频特征;
    将所述文本音频特征和所述波形音频特征,确定为所述现实场景的音频特征。
  9. 根据权利要求1所述的场景互动方法,其中,所述根据所述虚拟场景与所述现实场景的对应关系,将所述至少一个现实场景的场景特征映射至所述虚拟场景中,包括:
    根据所述虚拟场景与所述现实场景的对应关系,在所述虚拟场景中确定与每一所述现实场景相对应的特征映射区域;
    在所述特征映射区域中展示与对应的所述现实场景的场景特征具有映射关系的场景内容。
  10. 根据权利要求9所述的场景互动方法,其中,所述特征映射区域包括第一特征映射区域和第二特征映射区域;
    所述在所述特征映射区域中展示与对应的所述现实场景的场景特征具有映射关系的场景内容,包括:
    当所述场景特征为图像特征时,在所述第一特征映射区域中展示与所述图像特征具有映射关系的图像响应内容;
    当所述场景特征为音频特征时,在所述第二特征映射区域中展示与所述音频特征具有映射关系的音频响应内容。
  11. 根据权利要求10所述的场景互动方法，其中，所述在所述第一特征映射区域中展示与所述图像特征具有映射关系的图像响应内容，包括：
    从所述图像特征中获取场景图像特征、人物图像特征和动作图像特征中的至少一种;
    在所述特征映射区域中展示与所述场景图像特征具有映射关系的虚拟背景图像;
    在所述特征映射区域中展示与所述人物图像特征具有映射关系的虚拟人物图像;
    在所述第一特征映射区域中展示与所述动作图像特征具有映射关系的动作响应内容。
  12. 根据权利要求10所述的场景互动方法,其中,所述在所述第二特征映射区域中展示与所述音频特征具有映射关系的音频响应内容,包括:
    从所述音频特征中获取文本音频特征和波形音频特征;
    在所述第二特征映射区域中展示与所述文本音频特征具有映射关系的文本响应内容;
    在所述第二特征映射区域中展示与所述波形音频特征具有映射关系的音频动态效果。
  13. 根据权利要求1所述的场景互动方法,其中,所述实时获取所述现实场景的现实场景信息,包括:
    在所述虚拟场景与所述现实场景之间,建立基于传输控制协议的全双工通信协议进行实时通信的实时通信链路;
    利用所述实时通信链路获取所述现实场景的现实场景信息。
  14. 一种场景互动装置,包括:
    场景确定模块,被配置为确定与虚拟场景进行互动的至少一个现实场景;
    信息获取模块,被配置为实时获取每一所述现实场景的现实场景信息;
    特征提取模块,被配置为对每一所述现实场景信息进行特征提取,以对应得到每一所述现实场景的场景特征;
    特征映射模块,被配置为根据所述虚拟场景与所述现实场景的对应关系,将所述至少一个现实场景的场景特征映射至所述虚拟场景中。
  15. 一种电子设备,包括:
    处理器;以及
    存储器,用于存储所述处理器的可执行指令;
    其中,所述处理器配置为经由执行所述可执行指令来执行权利要求1至13中任一项所述的场景互动方法。
  16. 一种计算机可读存储介质,其中存储有计算机可执行指令,所述计算机可执行指令用于被处理器执行时,实现权利要求1至13中任一项所述的场景互动方法。
PCT/CN2020/127750 2020-01-16 2020-11-10 场景互动方法、装置、电子设备及计算机存储介质 WO2021143315A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020227002916A KR20220027187A (ko) 2020-01-16 2020-11-10 장면 인터랙션 방법 및 장치, 전자 장치 및 컴퓨터 저장 매체
JP2022521702A JP7408792B2 (ja) 2020-01-16 2020-11-10 シーンのインタラクション方法及び装置、電子機器並びにコンピュータプログラム
EP20913676.1A EP3998550A4 (en) 2020-01-16 2020-11-10 METHOD AND APPARATUS FOR SCENE INTERACTION, ELECTRONIC DEVICE AND COMPUTER STORAGE MEDIA
US17/666,081 US20220156986A1 (en) 2020-01-16 2022-02-07 Scene interaction method and apparatus, electronic device, and computer storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010049112.1A CN111274910B (zh) 2020-01-16 2020-01-16 场景互动方法、装置及电子设备
CN202010049112.1 2020-01-16

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/666,081 Continuation US20220156986A1 (en) 2020-01-16 2022-02-07 Scene interaction method and apparatus, electronic device, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2021143315A1 true WO2021143315A1 (zh) 2021-07-22

Family

ID=71001711

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/127750 WO2021143315A1 (zh) 2020-01-16 2020-11-10 场景互动方法、装置、电子设备及计算机存储介质

Country Status (6)

Country Link
US (1) US20220156986A1 (zh)
EP (1) EP3998550A4 (zh)
JP (1) JP7408792B2 (zh)
KR (1) KR20220027187A (zh)
CN (1) CN111274910B (zh)
WO (1) WO2021143315A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274910B (zh) * 2020-01-16 2024-01-30 腾讯科技(深圳)有限公司 场景互动方法、装置及电子设备
CN111986700A (zh) * 2020-08-28 2020-11-24 广州繁星互娱信息科技有限公司 无接触式操作触发的方法、装置、设备及存储介质
CN112053450A (zh) 2020-09-10 2020-12-08 脸萌有限公司 文字的显示方法、装置、电子设备及存储介质
CN112381564A (zh) * 2020-11-09 2021-02-19 北京雅邦网络技术发展有限公司 汽车销售数字电商
CN112995132B (zh) * 2021-02-01 2023-05-02 百度在线网络技术(北京)有限公司 在线学习的交互方法、装置、电子设备和存储介质
CN113377205B (zh) * 2021-07-06 2022-11-11 浙江商汤科技开发有限公司 场景显示方法及装置、设备、车辆、计算机可读存储介质
CN113923463B (zh) * 2021-09-16 2022-07-29 南京安汇科技发展有限公司 一种直播场景的实时抠像与场景合成系统及实现方法
CN114189743B (zh) * 2021-12-15 2023-12-12 广州博冠信息科技有限公司 数据传输方法、装置、电子设备和存储介质
KR20230158283A (ko) * 2022-05-11 2023-11-20 삼성전자주식회사 전자 장치 및 이의 제어 방법
CN115113737B (zh) * 2022-08-30 2023-04-18 四川中绳矩阵技术发展有限公司 一种虚拟对象声音和图像的重现方法、系统、设备及介质
CN116709501A (zh) * 2022-10-26 2023-09-05 荣耀终端有限公司 业务场景识别方法、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104536579A (zh) * 2015-01-20 2015-04-22 刘宛平 交互式三维实景与数字图像高速融合处理系统及处理方法
US9292085B2 (en) * 2012-06-29 2016-03-22 Microsoft Technology Licensing, Llc Configuring an interaction zone within an augmented reality environment
CN106492461A (zh) * 2016-09-13 2017-03-15 广东小天才科技有限公司 一种增强现实ar游戏的实现方法及装置、用户终端
CN108269307A (zh) * 2018-01-15 2018-07-10 歌尔科技有限公司 一种增强现实交互方法及设备
CN111274910A (zh) * 2020-01-16 2020-06-12 腾讯科技(深圳)有限公司 场景互动方法、装置及电子设备

Family Cites Families (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002169901A (ja) * 2000-12-01 2002-06-14 I Academy:Kk インターネットを利用した集合参加型教育システム
JP4150798B2 (ja) * 2004-07-28 2008-09-17 国立大学法人徳島大学 デジタルフィルタリング方法、デジタルフィルタ装置、デジタルフィルタプログラム及びコンピュータで読み取り可能な記録媒体
JP4464360B2 (ja) 2006-03-27 2010-05-19 富士フイルム株式会社 監視装置、監視方法、及びプログラム
KR101558553B1 (ko) * 2009-02-18 2015-10-08 삼성전자 주식회사 아바타 얼굴 표정 제어장치
GB2491819A (en) * 2011-06-08 2012-12-19 Cubicspace Ltd Server for remote viewing and interaction with a virtual 3-D scene
US20130222371A1 (en) * 2011-08-26 2013-08-29 Reincloud Corporation Enhancing a sensory perception in a field of view of a real-time source within a display screen through augmented reality
JP2013161205A (ja) * 2012-02-03 2013-08-19 Sony Corp 情報処理装置、情報処理方法、及びプログラム
US20140278403A1 (en) * 2013-03-14 2014-09-18 Toytalk, Inc. Systems and methods for interactive synthetic character dialogue
CN103617432B (zh) * 2013-11-12 2017-10-03 华为技术有限公司 一种场景识别方法及装置
JP2015126524A (ja) * 2013-12-27 2015-07-06 ブラザー工業株式会社 遠隔会議プログラム、端末装置および遠隔会議方法
CN103810353A (zh) * 2014-03-09 2014-05-21 杨智 一种虚拟现实中的现实场景映射系统和方法
US10356393B1 (en) * 2015-02-16 2019-07-16 Amazon Technologies, Inc. High resolution 3D content
CN106997236B (zh) 2016-01-25 2018-07-13 亮风台(上海)信息科技有限公司 基于多模态输入进行交互的方法和设备
JP6357595B2 (ja) 2016-03-08 2018-07-11 一般社団法人 日本画像認識協会 情報伝送システム、情報受信装置、およびコンピュータプログラム
CN105608746B (zh) * 2016-03-16 2019-10-11 成都电锯互动科技有限公司 一种将现实进行虚拟实现的方法
CN106355153B (zh) * 2016-08-31 2019-10-18 上海星视度科技有限公司 一种基于增强现实的虚拟对象显示方法、装置以及系统
CN106485782A (zh) * 2016-09-30 2017-03-08 珠海市魅族科技有限公司 一种现实场景在虚拟场景中展示的方法以及装置
DE102016121281A1 (de) 2016-11-08 2018-05-09 3Dqr Gmbh Verfahren und Vorrichtung zum Überlagern eines Abbilds einer realen Szenerie mit virtuellen Bild- und Audiodaten und ein mobiles Gerät
EP4202840A1 (en) 2016-11-11 2023-06-28 Magic Leap, Inc. Periocular and audio synthesis of a full face image
CN108881784B (zh) * 2017-05-12 2020-07-03 腾讯科技(深圳)有限公司 虚拟场景实现方法、装置、终端及服务器
JP6749874B2 (ja) * 2017-09-08 2020-09-02 Kddi株式会社 音波信号から音波種別を判定するプログラム、システム、装置及び方法
US20190129607A1 (en) * 2017-11-02 2019-05-02 Samsung Electronics Co., Ltd. Method and device for performing remote control
CN108305308A (zh) * 2018-01-12 2018-07-20 北京蜜枝科技有限公司 虚拟形象的线下展演系统及方法
JP7186009B2 (ja) * 2018-04-03 2022-12-08 東京瓦斯株式会社 画像処理システム及びプログラム
CN108985176B (zh) * 2018-06-20 2022-02-25 阿里巴巴(中国)有限公司 图像生成方法及装置
CN109903129A (zh) * 2019-02-18 2019-06-18 北京三快在线科技有限公司 增强现实显示方法与装置、电子设备、存储介质
CN110113298B (zh) * 2019-03-19 2021-09-28 视联动力信息技术股份有限公司 数据传输方法、装置、信令服务器和计算机可读介质
CN110084228A (zh) * 2019-06-25 2019-08-02 江苏德劭信息科技有限公司 一种基于双流卷积神经网络的危险行为自动识别方法
CN110365666B (zh) * 2019-07-01 2021-09-14 中国电子科技集团公司第十五研究所 军事领域基于增强现实的多端融合协同指挥系统
US11232601B1 (en) * 2019-12-30 2022-01-25 Snap Inc. Audio-triggered augmented reality eyewear device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9292085B2 (en) * 2012-06-29 2016-03-22 Microsoft Technology Licensing, Llc Configuring an interaction zone within an augmented reality environment
CN104536579A (zh) * 2015-01-20 2015-04-22 刘宛平 交互式三维实景与数字图像高速融合处理系统及处理方法
CN106492461A (zh) * 2016-09-13 2017-03-15 广东小天才科技有限公司 一种增强现实ar游戏的实现方法及装置、用户终端
CN108269307A (zh) * 2018-01-15 2018-07-10 歌尔科技有限公司 一种增强现实交互方法及设备
CN111274910A (zh) * 2020-01-16 2020-06-12 腾讯科技(深圳)有限公司 场景互动方法、装置及电子设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3998550A4 *

Also Published As

Publication number Publication date
JP7408792B2 (ja) 2024-01-05
US20220156986A1 (en) 2022-05-19
CN111274910B (zh) 2024-01-30
EP3998550A4 (en) 2022-11-16
JP2022551660A (ja) 2022-12-12
CN111274910A (zh) 2020-06-12
EP3998550A1 (en) 2022-05-18
KR20220027187A (ko) 2022-03-07

Similar Documents

Publication Publication Date Title
WO2021143315A1 (zh) 场景互动方法、装置、电子设备及计算机存储介质
US11403595B2 (en) Devices and methods for creating a collaborative virtual session
EP3962074A1 (en) System and method enabling interactions in virtual environments with virtual presence
WO2018036149A1 (zh) 一种多媒体交互教学系统及方法
WO2015192631A1 (zh) 视频会议系统及方法
CN106547884A (zh) 一种替身机器人的行为模式学习系统
US20220070241A1 (en) System and method enabling interactions in virtual environments with virtual presence
WO2019119314A1 (zh) 一种仿真沙盘系统
KR20220030177A (ko) 가상 환경에서의 애플리케이션의 전달 시스템 및 방법
CN110287947A (zh) 互动课堂中的互动教室确定方法及装置
US20200351384A1 (en) System and method for managing virtual reality session technical field
CN112839196B (zh) 一种实现在线会议的方法、装置以及存储介质
WO2022223029A1 (zh) 虚拟形象的互动方法、装置和设备
KR20220029453A (ko) 사용자 그래픽 표현 기반 사용자 인증 시스템 및 방법
KR20220029454A (ko) 가상 환경에서 가상으로 방송하기 위한 시스템 및 방법
WO2022256585A2 (en) Spatial audio in video conference calls based on content type or participant role
KR20220029467A (ko) 접근하는 사용자 표현 간의 애드혹 가상통신
KR20220030178A (ko) 가상 환경에서 클라우드 컴퓨팅 기반 가상 컴퓨팅 리소스를 프로비저닝하는 시스템 및 방법
US20220166918A1 (en) Video chat with plural users using same camera
KR102212035B1 (ko) 제스처 인식 기반 원격 교육서비스 시스템 및 방법
WO2021241221A1 (ja) 情報処理装置及び情報処理方法
JP2023527624A (ja) コンピュータプログラムおよびアバター表現方法
WO2024051467A1 (zh) 图像处理方法、装置、电子设备及存储介质
Noor et al. VRFlex: Towards the Design of a Virtual Reality Hyflex Class Model
JP2022166195A (ja) 情報抽出装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20913676

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20227002916

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2020913676

Country of ref document: EP

Effective date: 20220209

ENP Entry into the national phase

Ref document number: 2022521702

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE