CN117873305A - Information processing apparatus and method, and computer-readable storage medium - Google Patents


Info

Publication number
CN117873305A
Authority
CN
China
Prior art keywords
information processing
information
real
processing apparatus
sensor
Legal status
Pending
Application number
CN202211233436.6A
Other languages
Chinese (zh)
Inventor
尚弘
施展
李翔
许宽宏
Current Assignee
Sony Group Corp
Original Assignee
Sony Group Corp
Application filed by Sony Group Corp
Priority to CN202211233436.6A
Priority to PCT/CN2023/123183 (WO2024078384A1)
Publication of CN117873305A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer

Abstract

The present application relates to an information processing apparatus and method, and a computer-readable storage medium. The information processing apparatus includes processing circuitry configured to generate, in response to a trigger based on at least one element in a real space, a trigger event in a virtual space corresponding to the real space, thereby enabling real-time interaction between the real space and the virtual space.

Description

Information processing apparatus and method, and computer-readable storage medium
Technical Field
The present disclosure relates to the field of information processing technology, and in particular to enabling real-time interaction between a real space and a virtual space corresponding to the real space. More particularly, it relates to an information processing apparatus and method, and a computer-readable storage medium.
Background
Although virtual production techniques have greatly reduced the need for post-production of video, some special effects can still only be completed in post-production. For example, when an actor waves a magic wand that is supposed to emit a special effect, the changing position of the wand during shooting cannot be strictly preset, so the prior art can only realize such an effect through post-synthesis. Post-production of this kind typically incurs high time and money costs, and the special effects can only be seen after synthesis, which is very detrimental to iteration of the shooting process.
Disclosure of Invention
The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention. Its purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
According to one aspect of the present disclosure, there is provided an information processing apparatus including a processing circuit configured to: in response to a trigger based on at least one element in the real space, a trigger event is generated in a virtual space corresponding to the real space, thereby enabling real-time interaction between the real space and the virtual space.
In the information processing device according to the embodiment of the disclosure, when triggering is performed based on at least one element in the real space, a triggering event is correspondingly generated in the virtual space, so that real-time interaction between the real space and the virtual space can be achieved, and virtual production capable of real-time interaction can thus be realized.
According to another aspect of the present disclosure, there is provided an information processing method including: in response to a trigger based on at least one element in the real space, a trigger event is generated in a virtual space corresponding to the real space, thereby enabling real-time interaction between the real space and the virtual space.
According to other aspects of the present invention, there are also provided computer program code and a computer program product for implementing the above information processing method, as well as a computer-readable storage medium having recorded thereon the computer program code for implementing the above information processing method.
Drawings
To further clarify the above and other advantages and features of the present invention, a more particular description of the invention will be rendered by reference to the appended drawings. The accompanying drawings are incorporated in and form a part of this specification, along with the detailed description that follows. Elements having the same function and structure are denoted by the same reference numerals. It is appreciated that these drawings depict only typical examples of the invention and are therefore not to be considered limiting of its scope. In the drawings:
fig. 1 shows a functional block diagram of an information processing apparatus according to an embodiment of the present disclosure.
Fig. 2A to 2F are diagrams showing an example of movie virtual production by using the information processing apparatus according to the embodiment of the present disclosure.
Fig. 3A to 3D are exemplary diagrams illustrating online classroom virtual production with an information processing device according to an embodiment of the present disclosure.
Fig. 4 is a flowchart showing a flow example of an information processing method according to an embodiment of the present disclosure.
Fig. 5 is a block diagram showing an example structure of a personal computer that may be employed in the embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual implementation are described in this specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions must be made in order to achieve the developer's specific goals, such as compliance with system-and business-related constraints, and that these constraints will vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.
It is also noted herein that, in order to avoid obscuring the disclosure with unnecessary details, only the device structures and/or processing steps closely related to the solution according to the present disclosure are shown in the drawings, while other details not greatly related to the present disclosure are omitted.
Embodiments according to the present disclosure are described in detail below with reference to the accompanying drawings.
Fig. 1 shows a functional block diagram of an information processing apparatus 100 according to an embodiment of the present disclosure. As shown in fig. 1, the information processing apparatus 100 includes a processing unit 102, which may be configured to generate a trigger event in a virtual space corresponding to the real space in response to a trigger based on at least one element in the real space, thereby enabling real-time interaction between the real space and the virtual space.
The processing unit 102 may be implemented by one or more processing circuits, which may be implemented as a chip, for example.
As an example, a real space may also be referred to as a real world or a real scene, and a virtual space may also be referred to as a virtual world or a virtual scene.
The at least one element includes one or more of an entity in real space (e.g., a person, an animal, other objects, etc.), a background, light, and sound.
As an example, the real space may be a real shooting space in movie production, and the virtual space may be a predetermined virtual scene in the movie production. As an example, an actor may interact with elements in the virtual scene, such as objects, the environment, and light, through limb movements, gestures, expressions, and the like; the actor may interact with elements such as objects, the environment, and light in the virtual scene through props and the like; interaction with elements such as objects, the environment, and light in the virtual scene may also be achieved through specific sounds; and the actor's voice may be changed or converted in real time to achieve interaction.
As an example, the real space may be a real shooting space in an online classroom, and the virtual space may be a predetermined virtual scene in the online classroom. For example, a teacher may interact with objects in a virtual scene through a pointer.
Those skilled in the art will also recognize other video scenes besides movie production and online class, and will not be further described here.
Taking the magic wand special effect in film and television production as an example, because the position and posture of the wand waved by the actor cannot be predicted, the prior method needs to shoot the actor's action of waving the wand and then superimpose the wand special effect on the footage through post-production. Such post-production typically requires high time and money costs, and the special effects can only be seen after synthesis, which is very detrimental to iteration of the shooting process.
In the information processing apparatus 100 according to the embodiment of the present disclosure, when triggering is performed based on at least one element in the real space, a trigger event is generated in the virtual space accordingly, whereby real-time interaction between the real space and the virtual space can be achieved, and thus virtual production that can interact in real time can be achieved.
Again taking the wand-waving special effect in film and television production as an example, the information processing apparatus 100 according to the embodiment of the present disclosure enables an actor waving a magic wand in the real space to interact in real time with the virtual space through the real-time interaction between the real space and the virtual space; that is, it can realize virtual production capable of real-time interaction, thereby largely avoiding post-synthesis work and saving time and money costs.
Hereinafter, for convenience, a scene of movie production will be described as an example.
As an example, the processing unit 102 may be configured to perform a simulation process on a predetermined region including at least one element in the real space to construct a simulation region, and to implement real-time interaction based on an association between the simulation region and the virtual space. Thereby, the predetermined area is simulated, and a bridge between the predetermined area included in the real space and the virtual space is constructed through the simulated area.
For example, in movie production, the predetermined area may be a shooting area, and performing simulation processing on the predetermined area means simulating the shooting area (for example, digitizing the state of the shooting area) to construct a simulation area (for example, a digitized area) corresponding to the shooting area.
For example, the processing unit 102 may map (or translate) objects in the simulation area into the virtual space, thereby establishing an association between the simulation area and the virtual space.
For example, the processing unit 102 may obtain, in real time, the positions of all objects in the simulation area under the real world coordinate system through the multi-sensor joint calibration described below, so as to establish an association between the real space and the virtual space via the simulation area, so as to implement real-time interaction between the real space and the virtual space.
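To make the data flow above concrete, the following is a minimal, hypothetical sketch (not the patent's implementation) of the real-to-virtual loop: multi-sensor perception data update a simulation area, trigger rules are checked against it, and matching rules generate trigger events that are rendered into the virtual space. All class, method, and field names here are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedElement:
    """A real-space element (actor, prop, light, sound, ...) mirrored in the simulation area."""
    element_id: str
    kind: str                                        # e.g. "actor", "prop", "light", "sound"
    world_position: tuple = (0.0, 0.0, 0.0)          # coordinates in the real-world system
    attributes: dict = field(default_factory=dict)   # pose, expression, last speech, ...


class SimulationArea:
    """Digitized copy of the predetermined (shooting) area."""

    def __init__(self):
        self.elements = {}

    def update(self, perception_frames):
        """Fuse one time slice of multi-sensor perception data (assumed record format)."""
        for frame in perception_frames:
            for det in frame.detections:
                el = self.elements.setdefault(
                    det.element_id, SimulatedElement(det.element_id, det.kind))
                el.world_position = det.world_position
                el.attributes.update(det.attributes)


def real_to_virtual_step(sensors, sim_area, trigger_rules, virtual_space):
    """One iteration: sense -> simulate -> check triggers -> render the virtual space."""
    frames = [s.read() for s in sensors]              # perception data from each sensor
    sim_area.update(frames)
    for rule in trigger_rules:
        if rule.matches(sim_area):                    # predetermined trigger condition
            virtual_space.generate_event(rule.event)  # trigger event in the virtual space
    return virtual_space.render()                     # updated image for the LED wall
```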
As an example, the processing unit 102 may be configured to perform the simulation processing based on perception data obtained by at least one sensor (e.g., a data sensor) perceiving the predetermined area.
For example, at least one sensor collects data of the predetermined area in real time to obtain perception data. For example, in movie production, a plurality of sensors are deployed in the shooting area to perceive changes on set; the perception modalities include sound, light, and electrical signals, as well as video, audio, distance, motion, and the like.
The processing unit 102 may process and analyze the perceived data in real time for the simulation processing. For example, in movie production, the processing unit 102 simulates the actors, props, scenery, and the like in the real space (e.g., analyzes the coordinates, gestures, actions, expressions, voices, and the like of elements such as the actors and props in real time) based on the perception data and the like, thereby constructing the simulation area.
As an example, the at least one sensor includes one or more of a visible light sensor, an infrared sensor, an ultraviolet sensor, a distance sensing sensor, an audio sensor, a motion sensor.
For example, the visible light, infrared, and ultraviolet sensors are video sensors. They may be used to collect video or image information of the predetermined area included in the real space, providing raw data for video signal processing; such sensors may be included in cameras, video capture cards, etc., and their acquisition bands include, but are not limited to, visible light, infrared, and ultraviolet.
For example, the audio sensors are used to collect audio information within a predetermined area included in the real space, provide raw data for audio signal processing, and the number of the audio sensors is not less than 1. The audio sensor may be, for example, a microphone or the like.
For example, the distance sensing sensor is used for sensing the distance from all objects (actors, props, scenery and the like) in a preset area to each sensor in real time; the distance sensing sensor can be combined with algorithms such as point cloud recognition and the like, so that recognition, positioning and tracking accuracy of objects and human bodies can be enhanced; the distance sensing sensor can be combined with video signal processing and multi-sensor joint calibration, and the positions of all objects in the preset area under the real world coordinate system can be acquired in real time, so that the association between the real world coordinate system and the virtual world coordinate system (namely, the association between the real space and the virtual space) is established through the simulation area, and the coordinates of the objects in the preset area can be calculated more accurately; distance sensors include, but are not limited to: LIDAR, I-TOF sensor, D-TOF sensor, structured light distance sensor, millimeter wave distance sensor, etc.
For example, motion sensors are used to sense the motion of all objects within a predetermined area.
Together, the above sensors can comprehensively sense the predetermined area included in the real space.
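As a purely illustrative aid (the field names and synchronization tolerance below are assumptions, not the patent's data model), one way to represent a synchronized slice of this multi-modal perception data is sketched here.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class PerceptionFrame:
    """One sensor's contribution to a single time slice of perception data."""
    timestamp: float                           # capture time in seconds
    rgb_image: Optional[np.ndarray] = None     # visible-light frame, shape (H, W, 3)
    ir_image: Optional[np.ndarray] = None      # infrared frame
    audio_chunk: Optional[np.ndarray] = None   # mono PCM samples for this slice
    point_cloud: Optional[np.ndarray] = None   # (N, 3) points from LiDAR / ToF sensors
    imu_samples: Optional[np.ndarray] = None   # (M, 6) accel + gyro from motion-capture tags


def synchronize(frames, tolerance=0.02):
    """Group frames from different sensors whose timestamps agree within `tolerance` seconds."""
    if not frames:
        return []
    frames = sorted(frames, key=lambda f: f.timestamp)
    groups, current = [], [frames[0]]
    for f in frames[1:]:
        if f.timestamp - current[0].timestamp <= tolerance:
            current.append(f)
        else:
            groups.append(current)
            current = [f]
    groups.append(current)
    return groups
```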
As an example, the processing unit 102 may be configured to analyze at least one element included in the simulation area and to trigger when the at least one element satisfies a predetermined trigger condition, wherein the at least one element included in the simulation area corresponds to the at least one element included in the real space. Thus, by analyzing the elements included in the simulation area, different state changes of the elements in the predetermined area of the real space are analyzed, and interaction between the real space and the virtual space is triggered when the predetermined trigger condition is satisfied.
For example, the predetermined trigger condition may be the actor waving a magic wand, a change in light in the predetermined area exceeding a predetermined threshold, and so on.
As an example, the at least one element includes one or more of an entity in the simulation area, a background, a light, and a sound. For example, the entities in the simulation area may correspond to persons, animals, other objects, etc. in the predetermined area included in the real space; the background in the simulation area may correspond to the background in the predetermined area; the light in the simulation area may correspond to the light in the predetermined area; and the sound in the simulation area may correspond to the sound in the predetermined area.
The processing unit 102 may analyze changes of all elements in the simulation area in real time (e.g., analyze the actors, props, backgrounds, lights, sounds, etc. captured in the scene), track and capture key elements as targets, and acquire the spatial positions, postures, motion states, etc. of the key elements in real time (for actors, face recognition may also be performed, and the actors' expressions and the like may be analyzed). The state of a target object is compared against the preset trigger condition, and when the state of the target object satisfies the predetermined trigger condition, real-time interaction with the virtual scene is triggered (i.e., the triggering is performed).
As an example, the processing unit 102 may be configured to determine whether the at least one element satisfies a predetermined trigger condition based on the element information related to the at least one element, thereby triggering.
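Building on the SimulationArea sketch above, a predetermined trigger condition could, for example, be expressed as a small rule over element information; the rule below loosely mirrors the wand scenario described later (see fig. 2F). The rule format, helper functions, and element IDs are assumptions for illustration only.

```python
class TriggerRule:
    """A named set of predicates over the simulation area plus the event it generates."""

    def __init__(self, name, conditions, event):
        self.name = name
        self.conditions = conditions          # callables taking the simulation area
        self.event = event                    # e.g. "wand_tip_particles_and_explosion"

    def matches(self, sim_area):
        return all(cond(sim_area) for cond in self.conditions)


def attr(sim_area, element_id, key, default=None):
    """Read one attribute of a simulated element, tolerating missing elements."""
    el = sim_area.elements.get(element_id)
    return default if el is None else el.attributes.get(key, default)


def holds_prop(sim_area, actor_id, prop_id, max_dist=0.5):
    """True if the tracked prop is within max_dist metres of the actor's hand."""
    hand = attr(sim_area, actor_id, "left_hand_position")
    prop = sim_area.elements.get(prop_id)
    if hand is None or prop is None:
        return False
    dist = sum((a - b) ** 2 for a, b in zip(hand, prop.world_position)) ** 0.5
    return dist <= max_dist


wand_rule = TriggerRule(
    name="wand_spell",
    conditions=[
        lambda s: holds_prop(s, "actor_1", "magic_wand"),
        lambda s: attr(s, "actor_1", "action") == "wave_wrist",
        lambda s: attr(s, "actor_1", "last_speech") == "Confringo",
    ],
    event="render_wand_tip_particles_and_explosion",
)
```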
As an example, the element information may include one or more of video information, audio information, and three-dimensional information related to at least one element.
As an example, the video information may include one or more of face recognition information, object recognition information, gesture information, expression information, gesture information.
For example, the processing unit 102 may perform face recognition, positioning, and tracking, for example to distinguish, position, and track the identities (IDs) of interacting actors and to assign different interactive content to different IDs. For example, actors A and B may wave wands at the same time, yet the wands of A and B produce different special effects. When used for face recognition, positioning, and tracking, the processing unit 102 may include, but is not limited to, the following functions. Face entry function: entering the faces of actors not yet in the database, so that all actors can be accurately identified. Face recognition function: identifying all persons appearing in the image and giving their identities. Face positioning function: locating a human face appearing in the image and outputting the coordinates of the face in the image. Face tracking function: tracking a moving face in the image and dynamically outputting accurate face coordinates.
Face recognition, positioning, and tracking may be achieved by various suitable techniques; see, for example, Sefik Ilkin Serengil et al., "LightFace: A Hybrid Deep Face Recognition Framework," 2020 Innovations in Intelligent Systems and Applications Conference (ASYU), 2020, which is not described in further detail here.
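For illustration, the open-source deepface library accompanying the LightFace paper cited above can perform enrolment-based identification and expression analysis of the kind described here. The file and folder names are placeholders, and the exact return format varies between deepface versions, so this is a sketch rather than the patent's implementation.

```python
from deepface import DeepFace

FRAME_PATH = "shooting_area_frame.jpg"   # one video frame from the shooting area (placeholder)
ACTOR_DB = "actor_faces_db"              # "face entry": one sub-folder of images per actor ID

# Face recognition: match the faces in the frame against the enrolled actor database.
matches = DeepFace.find(img_path=FRAME_PATH, db_path=ACTOR_DB, enforce_detection=False)

# Expression recognition for the same frame.
faces = DeepFace.analyze(img_path=FRAME_PATH, actions=["emotion"], enforce_detection=False)

for face in faces:                        # recent versions return a list of per-face dicts
    region = face["region"]               # face location in the image (x, y, w, h)
    print(region["x"], region["y"], face["dominant_emotion"])
```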
For example, the processing unit 102 may perform target object recognition, positioning, and tracking, e.g., to recognize particular objects in the video signal and to locate and track those objects, so as to facilitate rendering real-time special effects on them (e.g., a magic wand held by an actor, a lit torch, a laser-emitting prop gun, etc.). When used for target object recognition, positioning, and tracking, the processing unit 102 may include, but is not limited to, the following functions. Object entry function: entering objects not yet in the database, so that all special objects can be accurately identified. Object recognition function: identifying a particular object appearing in the image. Object positioning function: locating an object appearing in the image and outputting the coordinates of the object in the image. Object tracking function: tracking a moving object in the image and dynamically outputting accurate object coordinates.
The identification, localization, and tracking of the target object may be achieved by a variety of suitable techniques, such as those described in Alexey Bochkovskiy et al., "YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv:2004.10934, 2020, and https://github.
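A hedged sketch of prop detection with OpenCV's DNN module loading YOLOv4-style weights (the detector family cited above) follows; the configuration/weight file names and the frame path are placeholders, and a production system would add tracking on top of per-frame detection.

```python
import cv2

# YOLOv4-style network trained on the "entered" special objects (wand, torch, prop gun, ...).
# These file names are placeholders, not shipped assets.
model = cv2.dnn_DetectionModel("props_yolov4.cfg", "props_yolov4.weights")
model.setInputParams(size=(416, 416), scale=1.0 / 255, swapRB=True)

frame = cv2.imread("shooting_area_frame.jpg")          # placeholder frame from the shooting area
class_ids, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)

for class_id, score, box in zip(class_ids, scores, boxes):
    x, y, w, h = box                                    # object position in image coordinates
    print(int(class_id), float(score), (x, y, w, h))
```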
For example, the processing unit 102 may perform human body keypoint detection, e.g., to identify a human body in a video signal, extract keypoints of a human body present in the video signal, to facilitate triggering some interactive effects by certain specific actions of the actor (e.g., triggering an explosion effect when the actor squats, etc.). For example, the processing unit 102 may include, but is not limited to, the following functions when used to perform human keypoint detection. Human body key point detection function: extracting key coordinate points of bones of a human body appearing in the video signal; human behavior analysis function: the behavior of the human body in the video signal, such as walking, squatting, running, etc., is analyzed.
The detection of key points in the human body can be achieved by various suitable techniques; see, for example, the techniques described at https://developer.huawei.com/consumer/en/doc/development/hiai-Guides/skin-detection-0000001051008415, which are not further described herein.
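As an illustrative stand-in for the key-point detection engine referenced above, MediaPipe Pose can extract human key points, and a simple heuristic over hip and knee landmarks can approximate the "actor squats" trigger mentioned earlier; the 0.1 margin and the frame path are arbitrary assumptions.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose


def detect_squat(bgr_frame, pose):
    """Rough squat heuristic: the hips drop towards knee height."""
    results = pose.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    if not results.pose_landmarks:
        return False
    lm = results.pose_landmarks.landmark
    hip_y = (lm[mp_pose.PoseLandmark.LEFT_HIP].y + lm[mp_pose.PoseLandmark.RIGHT_HIP].y) / 2
    knee_y = (lm[mp_pose.PoseLandmark.LEFT_KNEE].y + lm[mp_pose.PoseLandmark.RIGHT_KNEE].y) / 2
    # Normalized image coordinates grow downwards, so a squat shrinks the hip-to-knee gap.
    return (knee_y - hip_y) < 0.1


with mp_pose.Pose(static_image_mode=True) as pose:
    frame = cv2.imread("shooting_area_frame.jpg")       # placeholder frame
    if detect_squat(frame, pose):
        print("trigger: explosion effect")              # hand the event to the rule engine
```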
For example, the processing unit 102 may perform gesture recognition, e.g., to recognize and track a hand in the video signal and to recognize the gesture information it expresses, so as to facilitate triggering certain interactive special effects when an actor completes a specific gesture (e.g., the actor snapping fingers triggers an ambient light change, etc.). When used for gesture recognition, the processing unit 102 may include, but is not limited to, the following functions. Gesture entry function: entering gestures not yet in the database, so that all gestures can be accurately identified. Hand recognition function: identifying a hand appearing in the image. Hand positioning function: locating the hand appearing in the image and outputting the corresponding coordinates of the hand in the image. Hand tracking function: tracking a moving hand in the image and dynamically outputting accurate hand coordinates. Gesture recognition function: recognizing the gesture information expressed by the hand, such as a finger heart, a ring (OK) sign, a thumbs-up, etc.
Gesture recognition may be implemented by a variety of suitable techniques, such as those described at https://github.
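Similarly, as a sketch only (MediaPipe Hands is used here as a stand-in for the gesture recognizer referenced above, and the thumbs-up heuristic is illustrative rather than an enrolled gesture model):

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands


def is_thumbs_up(hand_landmarks):
    """Crude heuristic: thumb extended above the knuckles while the index finger stays curled."""
    lm = hand_landmarks.landmark
    thumb_tip = lm[mp_hands.HandLandmark.THUMB_TIP]
    index_mcp = lm[mp_hands.HandLandmark.INDEX_FINGER_MCP]
    index_tip = lm[mp_hands.HandLandmark.INDEX_FINGER_TIP]
    return thumb_tip.y < index_mcp.y and index_tip.y > index_mcp.y


with mp_hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
    frame = cv2.imread("shooting_area_frame.jpg")        # placeholder frame
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    for hand in results.multi_hand_landmarks or []:
        if is_thumbs_up(hand):
            print("trigger: ambient light change")       # example interactive special effect
```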
For example, the processing unit 102 may perform expression recognition, e.g., to recognize the facial expression of an actor who is being tracked, so as to facilitate triggering certain interactive special effects when the actor performs a particular expression (e.g., a heart-shaped flashing-light special effect is rendered on the screen when the actor is happy, etc.). When used for expression recognition, the processing unit 102 may include, but is not limited to, the following functions. Expression entry function: entering expressions not yet in the database, so that all expressions can be accurately identified. Expression recognition function: recognizing the expression information of the actor's face, such as happiness, sadness, anger, fear, etc.
Expression recognition may be implemented by various suitable techniques, such as those described at https://github.
As an example, the audio information includes voiceprint identification information and/or voice conversion information.
For example, the processing unit 102 may perform sound recognition, e.g., to identify certain sounds in the audio signal, so as to facilitate triggering certain interactive special effects with specific sounds (e.g., the sound of an actor striking a barrel filled with explosives triggers an explosion effect, etc.). When used for sound recognition, the processing unit 102 may include, but is not limited to, the following functions. Sound entry function: entering sounds not yet in the database, ensuring that all specific sounds can be accurately identified. Sound recognition function: recognizing a specific sound emitted in the predetermined area included in the real space and giving the exact time at which the sound was emitted.
For example, the processing unit 102 may perform speech recognition, e.g., to identify specific lines spoken by an actor in the audio signal, so as to facilitate triggering certain interactive special effects with specific lines (e.g., an actor casting a spell triggers a magic special effect, etc.). When used for speech recognition, the processing unit 102 may include, but is not limited to, the following functions. Line entry function: entering lines not yet in the database, ensuring that all specific lines can be accurately identified. Speech recognition function: recognizing specific lines spoken in the shooting area and giving the exact time at which they are spoken.
For example, the processing unit 102 may perform sound conversion, e.g., converting an actor's voice in the audio signal in real time. When used for sound conversion, the processing unit 102 may include, but is not limited to, the following functions. Voiceprint recognition function: distinguishing the voices of different actors, so that only the voice of a specific actor is processed. Sound conversion function: processing the actor's voice in real time to achieve different sound effects.
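As a rough sketch of the audio-side triggers (the SpeechRecognition package and its online recognizer are used purely as stand-ins; the spell line and event name are assumptions, and voiceprint matching or real-time voice conversion would need dedicated models not shown here):

```python
import speech_recognition as sr

SPELL_LINES = {"confringo": "render_wand_explosion"}     # enrolled lines mapped to events

recognizer = sr.Recognizer()
with sr.Microphone() as source:                          # one audio sensor in the shooting area
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    audio = recognizer.listen(source, phrase_time_limit=3)

try:
    text = recognizer.recognize_google(audio).lower()    # example recognizer backend
except sr.UnknownValueError:
    text = ""

for line, event in SPELL_LINES.items():
    if line in text:
        print("trigger:", event)                         # hand the event to the rule engine
```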
As an example, the three-dimensional information includes one or more of spatial location information, motion information, distance information.
For example, the processing unit 102 may perform motion capture, e.g., to capture the motion of an actor in real time; motion capture and human keypoint detection can cooperate to implement more accurate and faster capture of actor motion, and they complement each other well, particularly for highly difficult motions or visual dead angles. The motion capture function may be implemented by a dedicated motion capture module. For example, motion capture modules are often miniaturized and can be worn on an actor in a hidden manner without interfering with or affecting the shooting process; motion capture modules include, but are not limited to: motion capture devices based on inertial devices, motion capture devices based on video signals or depth sensors, motion capture devices based on visible or invisible markers, and the like.
For example, the processing unit 102 may perform multi-sensor joint calibration, for example, to achieve joint calibration of the sensors such as video, audio, distance, motion, etc. by photographing a target whose size or three-dimensional structure is completely known, obtain positions and attitudes of all the sensors in a real world coordinate system, and further obtain accurate positions of objects in a predetermined area included in the real space in the real world coordinate system; and by adjusting the virtual world coordinate system, the virtual world coordinate system can be aligned with the real world coordinate system.
The joint calibration of the multiple sensors can be achieved by various suitable techniques; see, for example, Guohang Yan et al., "OpenCalib: A Multi-sensor Calibration Toolbox for Autonomous Driving," arXiv:2205.14087, 2022, which is not described further here.
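The effect of joint calibration can be illustrated with a small numpy sketch: once each sensor's extrinsics (rotation and translation) relative to the real-world coordinate system are known, a measurement in a sensor frame can be mapped into real-world coordinates and then into the aligned virtual-world coordinates. The numeric values below are placeholders standing in for a calibration result.

```python
import numpy as np


def to_homogeneous(rotation, translation):
    """Build a 4x4 rigid transform from a 3x3 rotation matrix and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


# Placeholder extrinsics: LiDAR frame -> real world, and real world -> virtual world.
T_world_from_lidar = to_homogeneous(np.eye(3), np.array([2.0, 0.0, 1.5]))
T_virtual_from_world = to_homogeneous(np.eye(3), np.zeros(3))   # aligned coordinate systems

p_lidar = np.array([3.2, 25.0, 11.2, 1.0])      # e.g. the wand tip as measured by the LiDAR
p_world = T_world_from_lidar @ p_lidar          # position in the real-world coordinate system
p_virtual = T_virtual_from_world @ p_world      # position in the virtual-world coordinate system
print(p_world[:3], p_virtual[:3])
```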
For example, the element information may be used to provide metadata for the simulation process. For example, the processing unit 102 may analyze the states of actors, props, and scenes in the predetermined area using artificial intelligence or the like to provide metadata for the simulation process.
Further, after constructing the simulation area, the processing unit 102 may analyze a change in at least one of the actor, prop, coordinates of the scene, etc. in the simulation area, or at least one of the actor's motion, posture, expression, gesture, etc. or at least one of the actor's voice, scene sound, etc. to determine whether a predetermined trigger condition is satisfied.
For example, the processing unit 102 may also process the element information, and interact with the virtual world already constructed according to a preset scenario rule.
As an example, the processing unit 102 may be configured to render the virtual space based on the trigger event generated by the real-time interaction, resulting in an updated virtual space.
As an example, the trigger event is the generation of a movie special effect. For example, the special effect may be an object collision, an explosion effect, a spark flash, etc. For example, in the event that actor A is identified as waving a magic wand, special-effect rendering at the position of the wand tip in the virtual world is triggered. For example, upon recognizing that actor B is raising a torch, a moving light source is added at the corresponding location in the virtual world. For example, when actor C is recognized as performing a sad expression, the main light source in the virtual world is dimmed and clouds are added to create an atmosphere. Thus, an all-around stereoscopic interaction with the virtual space can be achieved.
As an example, the triggering event may also be an online teaching special effect, such as a special effect on interaction of teachers and students in an online classroom scenario. Other examples of trigger events will also occur to those skilled in the art and will not be discussed further herein.
As an example, the processing unit 102 may be configured to cause the updated virtual space to be displayed, in place of the previous virtual space, on the display device that displays the virtual space, resulting in an updated display image related to the trigger event.
As an example, the display device includes an LED screen (light emitting diode screen). The LED screen may also be referred to as an LED VP (virtual production) screen or VP large screen. Other examples of display devices will also occur to those skilled in the art and will not be further described herein.
For example, the updated display image may be used as a real-time shooting background for film shooting.
For example, during VP shooting, simulation processing (e.g., digitization) is performed on the shooting area in the real space so that real-time interaction takes place between the shooting area and the virtual space, and the rendered virtual space is displayed on a large LED screen as the VP background, thereby realizing real-time interactive VP special effects. That is, the information processing apparatus 100 can implement interaction between the real shooting area and the virtual world using a real-to-virtual (real2virtual) technique, and render the interaction results and special effects on the LED VP screen in real time.
As an example, the processing unit 102 may be configured to capture, using the image capturing device, a predetermined area at the time of generating the trigger event and an image displayed by the display device (for example, an image having a special effect related to the trigger event), thereby obtaining a captured image.
For example, the processing unit 102 may take an image obtained by photographing a predetermined area when a trigger event is generated as a foreground image and an image obtained by photographing an image displayed on the display device as a background image. By generating a trigger event according to a change in the shooting scene (e.g., a change in an element in a predetermined area) to update an image displayed by the display device (i.e., change the shooting background via the updated display image), real-time interactive virtual shooting can be realized.
As an example, the processing unit 102 may be configured to determine the position and/or size of rendering in the virtual space based on the position and/or pose relationship between the image capturing device and the display device. For example, the position and/or size at which a special effect is rendered in the virtual space is determined based on the positional and/or pose relationship between the image capturing device and the display device.
For example, for the above-mentioned scene in which an actor waves a magic wand that emits a special effect, the information processing apparatus 100 captures the positions and/or postures of the actor and the wand through the real2virtual technique, passes this information to the virtual space, produces the special effect by combining it with the position and/or pose relationship between the image capturing device and the display device, and, through real-time rendering of the virtual space, displays an image in which the wand special effect is precisely superimposed on the large LED screen. Shooting of the wand special effect is thus completed in real time, post-production is avoided, and time and money costs are saved to a great extent.
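The geometry behind this can be sketched as follows (a hypothetical illustration, with made-up coordinates and screen parameters): to make a rendered effect overlap a real prop in the captured image, the effect is drawn where the line from the tracked shooting camera through the prop meets the plane of the LED wall.

```python
import numpy as np


def screen_pixel_for_world_point(cam_pos, world_point, screen_origin, screen_u, screen_v,
                                 screen_size_m, screen_size_px):
    """Intersect the camera->point ray with the LED wall plane and return pixel coordinates."""
    normal = np.cross(screen_u, screen_v)                 # plane normal of the LED wall
    direction = world_point - cam_pos
    denom = direction @ normal
    if abs(denom) < 1e-9:
        return None                                       # ray parallel to the wall
    s = ((screen_origin - cam_pos) @ normal) / denom
    hit = cam_pos + s * direction                         # intersection point on the wall
    local = hit - screen_origin
    u_m, v_m = local @ screen_u, local @ screen_v         # metres along the wall axes
    return (u_m / screen_size_m[0] * screen_size_px[0],
            v_m / screen_size_m[1] * screen_size_px[1])


cam = np.array([0.0, 20.0, 1.6])                 # tracked camera position (placeholder)
wand_tip = np.array([1.0, 25.0, 1.2])            # wand tip position from the simulation area
origin = np.array([-5.0, 30.0, 0.0])             # bottom-left corner of the LED wall
u_axis = np.array([1.0, 0.0, 0.0])               # unit vectors spanning the wall plane
v_axis = np.array([0.0, 0.0, 1.0])
print(screen_pixel_for_world_point(cam, wand_tip, origin, u_axis, v_axis,
                                   (10.0, 5.0), (7680, 3840)))
```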
For example, the information processing apparatus 100 may be provided with high-performance computing and processing capabilities by a high-performance computer (group), a high-performance graphics card, a cloud computing technology, or the like.
Fig. 2A to 2F are diagrams showing an example of movie virtual production by the information processing apparatus 100 according to the embodiment of the present disclosure.
As shown in fig. 2A, actor 1 and actor 2 stand in front of the LED VP screen, and the shooting area includes at least actor 1 and actor 2, with actor 1 holding a prop. Although not shown in fig. 2A, actor 1 may be wearing a motion capture device (e.g., MoCap) to capture his motion. A plurality of sensors (e.g., at least one of a video sensor, an audio sensor, a distance sensing sensor, and a motion sensor) are used to collect data of the shooting area to obtain perception data, and an image pickup device photographs the shooting area and the LED VP screen. As is apparent from the above description, the information processing apparatus 100 can construct a simulation area corresponding to the shooting area based on the perception data.
As shown in fig. 2B, the information processing apparatus 100 analyzes the simulation area, for example recognizing actor 1 and actor 2 by face recognition and recognizing the prop (for example, the magic wand held by actor 1) by object recognition, and obtains, for example, the following information about actor 1 and actor 2: actor 1 is standing and holds a magic wand in the left hand, the coordinates in real space are (600, 800), and the expression is a smile; actor 2 is standing and holds a schoolbag, the coordinates in real space are (300, 600), and the expression is calm.
As shown in fig. 2C, the information processing apparatus 100 performs audio analysis on the simulation area and obtains, for example, the following information about two pieces of audio (audio 1 and audio 2): audio 1 is of type speech, the speech recognition result is "Confringo", and voiceprint recognition attributes it to actor 1; audio 2 is of type sound, and it is recognized as ambient sound.
As shown in fig. 2D, the information processing apparatus 100 performs position and motion analysis on the simulation area and obtains, for example, the following information about actor 1 and actor 2: the position of actor 1 in the simulation area is (1.2, 22.5, 10.5), the position of actor 2 in the simulation area is (2.2, 22.3, 11.2), and the position of the magic wand in the simulation area is (3.2, 25.0, 11.2); the action of actor 1 is recognized as waving the left wrist. In fig. 2D, the position information is schematically represented by a block.
As shown in fig. 2E, a virtual space is set in advance, and the preset virtual space (for example, the scene shown in fig. 2E other than actor 1, actor 2, and the image pickup device) is displayed on the LED VP screen. The information processing apparatus 100 associates the position information of elements such as the actors and props in the simulation area with the virtual space, thereby establishing a bridge (association) between the shooting area and the virtual space via the simulation area.
As shown in fig. 2F, assume that the preset trigger condition is "[actor 1] holds the [magic wand], waves the wrist, and the speech [Confringo] is received". When the trigger condition is met, a trigger event is generated and the virtual space is rendered; for example, the trigger event is a movie special effect: light particles are rendered at the tip of the magic wand, and the object in the direction the wand points explodes. As above, the position and/or size of the rendering is determined based on the positional and/or pose relationship between the camera and the LED VP screen.
As can be seen in conjunction with fig. 2A-2F, the information processing apparatus 100 according to the embodiment of the present disclosure can realize real-time interaction of a photographing region with a virtual space.
Fig. 3A to 3D are exemplary diagrams illustrating online classroom virtual production with the information processing device 100 according to the embodiment of the present disclosure.
FIG. 3A shows a general exemplary diagram of online classroom virtual production. As shown in fig. 3A, a teacher gives lessons in a lecture area; a plurality of sensors (e.g., at least one of a video sensor, an audio sensor, a distance sensing sensor, and a motion sensor) are used to collect data of the lecture area to obtain perception data, and an image pickup device photographs the lecture area and a display screen. As is apparent from the above description, the information processing apparatus 100 can construct a simulation area corresponding to the lecture area based on the perception data. The information processing apparatus 100 can perform classroom interaction with elements in the virtual space displayed on the display screen (for example, "current intensity", "resistance", and "Ohm's law" shown in fig. 3A) based on automatic analysis of information such as the actions, behaviors, and voice of the teacher in the simulation area, and display the interaction result on the student display terminals in real time.
For example, as shown in fig. 3B, the information processing apparatus 100 may analyze the position of the teacher's pointer in the simulation area and recognize actions such as the teacher tapping the screen or marking emphasis on the screen, render the teacher's interaction with the virtual space displayed on the display screen in real time based on the recognized actions, and display the interaction result on the student display terminals in real time.
For example, as shown in fig. 3C, the information processing apparatus 100 may automatically analyze the blackboard writing of the teacher, for example by optical character recognition (OCR), render superimposed special effects on the content of the writing in real time (e.g., the effect highlighting physics terms in fig. 3C), and display the rendering result on the student display terminals in real time.
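As an illustration only (Tesseract via pytesseract stands in for whatever OCR engine the system actually uses, and the image path and keyword list are assumptions), the blackboard-writing analysis could look like this:

```python
from PIL import Image
import pytesseract

KEY_TERMS = {"current intensity", "resistance", "ohm"}    # terms worth highlighting

board = Image.open("blackboard_frame.jpg")                # placeholder frame of the blackboard
text = pytesseract.image_to_string(board).lower()

for term in KEY_TERMS:
    if term in text:
        print("trigger: highlight effect for", term)      # rendered on the student displays
```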
For example, as shown in fig. 3D, the information processing apparatus 100 may automatically analyze the voice of the teacher (for example, Mr. Smith) and convert the teacher's speech (for example, "Hi @John, do you know the answer?") accordingly. In combination with natural language processing (NLP) and other techniques, simple interaction with students can be achieved.
As can be seen in conjunction with fig. 3A-3D, the information processing apparatus 100 according to the embodiment of the present disclosure can implement real-time interaction of a lecture area with a virtual space.
The present disclosure also provides embodiments of information processing methods, corresponding to the above-described information processing apparatus embodiments.
Fig. 4 is a flowchart showing a flow example of the information processing method S400 according to the embodiment of the present disclosure.
The information processing method S400 according to the embodiment of the present disclosure starts from S402.
In S404, in response to the trigger based on the at least one element in the real space, a trigger event is generated in the virtual space corresponding to the real space, thereby enabling real-time interaction between the real space and the virtual space.
The information processing method S400 ends at S406.
In the information processing method S400 according to the embodiment of the present disclosure, when triggering is performed based on at least one element in the real space, a triggering event is correspondingly generated in the virtual space, so that real-time interaction between the real space and the virtual space can be achieved, and further virtual production capable of real-time interaction can be achieved.
As an example, in S404, a predetermined region including at least one element in a real space is subjected to simulation processing to construct a simulation region, and real-time interaction is realized based on an association between the simulation region and a virtual space.
As an example, based on perception data obtained by perceiving a predetermined area by at least one sensor, analog processing is performed.
As an example, the at least one sensor includes one or more of a visible light sensor, an infrared sensor, an ultraviolet sensor, a distance sensing sensor, an audio sensor, a motion sensor.
As an example, at least one element included in the simulation area is analyzed, and when the at least one element satisfies a predetermined trigger condition, triggering is performed, and wherein the at least one element included in the simulation area corresponds to the at least one element included in the real space.
As an example, the at least one element includes one or more of an entity in the analog region, a background, a light, and a sound.
As an example, based on element information about at least one element, it is determined whether the at least one element satisfies a predetermined trigger condition, thereby triggering.
As an example, the element information includes one or more of video information, audio information, and three-dimensional information related to at least one element.
As examples, the video information includes one or more of face recognition information, object recognition information, gesture information, expression information, gesture information.
As an example, the audio information includes voiceprint identification information and/or voice conversion information.
As an example, the three-dimensional information includes one or more of spatial location information, motion information, distance information.
As an example, the virtual space is rendered based on trigger events generated by real-time interactions, resulting in an updated virtual space.
As an example, the trigger event is a movie special effect.
As an example, the updated virtual space is caused to be displayed instead of the virtual space on the display device displaying the virtual space, thereby resulting in an updated display image related to the trigger event.
As an example, a predetermined area at the time of generating a trigger event and an image displayed by a display device are photographed using an image pickup device, thereby obtaining a photographed image.
As an example, a position and/or a size of rendering on the virtual space is determined based on a position and/or a pose relationship between the image capturing apparatus and the display apparatus.
The information processing method S400 according to the embodiment of the present disclosure may be performed by the information processing apparatus 100 described above, for example, and specific details thereof may be found in the description of the related processes of the information processing apparatus 100 described above, and will not be repeated here.
While the basic principles of the invention have been described above in connection with specific embodiments, it should be noted that those skilled in the art will understand that all or any of the steps or components of the methods and apparatus of the invention can be embodied in hardware, firmware, software, or a combination thereof, in any computing device (including processors, storage media, etc.) or network of computing devices, and that this can be accomplished by one skilled in the art with basic circuit design knowledge or basic programming skills upon reading the description of the invention.
The invention also proposes a program product storing machine-readable instruction codes. The above-described methods according to embodiments of the present invention may be performed when the instruction codes are read and executed by a machine.
Accordingly, a storage medium for carrying the above-described program product storing machine-readable instruction codes is also included in the disclosure of the present invention. Storage media include, but are not limited to, floppy diskettes, compact discs, magneto-optical discs, memory cards, memory sticks, and the like.
In the case of implementing the present invention by software or firmware, a program constituting the software is installed from a storage medium or a network to a computer (for example, a general-purpose computer 500 shown in fig. 5) having a dedicated hardware structure, and the computer can execute various functions and the like when various programs are installed.
In fig. 5, a Central Processing Unit (CPU) 501 executes various processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 to a Random Access Memory (RAM) 503. In the RAM 503, data required when the CPU 501 executes various processes and the like is also stored as needed. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output interface 505 is also connected to the bus 504.
The following components are connected to the input/output interface 505: an input section 506 (including a keyboard, a mouse, and the like), an output section 507 (including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like), a storage section 508 (including a hard disk and the like), and a communication section 509 (including a network interface card such as a LAN card, a modem, and the like). The communication section 509 performs communication processing via a network such as the Internet. A drive 510 may also be connected to the input/output interface 505 as needed. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as needed, so that a computer program read out therefrom is installed into the storage section 508 as needed.
In the case of implementing the above-described series of processes by software, a program constituting the software is installed from a network such as the internet or a storage medium such as the removable medium 511.
It will be understood by those skilled in the art that such a storage medium is not limited to the removable medium 511 shown in fig. 5, in which a program is stored, which is distributed separately from the apparatus to provide the program to the user. Examples of the removable medium 511 include magnetic disks (including floppy disks (registered trademark)), optical disks (including compact disk read-only memories (CD-ROMs) and Digital Versatile Disks (DVDs)), magneto-optical disks (including Mini Disks (MDs) (registered trademark)), and semiconductor memories. Alternatively, the storage medium may be a ROM 502, a hard disk contained in the storage section 508, or the like, in which a program is stored, and distributed to users together with a device containing them.
It is also noted that in the apparatus, methods and systems of the present invention, components or steps may be disassembled and/or assembled. These decompositions and/or recombinations should be considered equivalents of the present invention. Also, the steps of executing the series of processes described above may naturally be executed in chronological order in the order of description, but are not necessarily executed in chronological order. Some steps may be performed in parallel or independently of each other.
Finally, it is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Furthermore, without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
Although the embodiments of the present invention have been described in detail above with reference to the accompanying drawings, it should be understood that the above-described embodiments are merely illustrative of the present invention and not limiting the present invention. Various modifications and alterations to the above described embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention. The scope of the invention is, therefore, indicated only by the appended claims and their equivalents.
The present technique may also be implemented as follows.
Supplementary note 1. An information processing apparatus includes:
processing circuitry configured to:
in response to a trigger based on at least one element in a real space, a trigger event is generated in a virtual space corresponding to the real space, thereby enabling real-time interaction between the real space and the virtual space.
Supplementary note 2 the information processing apparatus according to supplementary note 1, wherein the processing circuit is configured to perform simulation processing on a predetermined area including the at least one element in the real space to construct a simulation area, and to implement the real-time interaction based on an association between the simulation area and the virtual space.
Supplementary note 3 the information processing apparatus according to supplementary note 2, wherein the processing circuit is configured to perform the simulation processing based on perception data obtained by perceiving the predetermined area by at least one sensor.
Supplementary note 4. The information processing apparatus according to supplementary note 3, wherein the at least one sensor includes one or more of a visible light sensor, an infrared sensor, an ultraviolet sensor, a distance sensing sensor, an audio sensor, and a motion sensor.
Supplementary note 5. The information processing apparatus according to any one of supplementary notes 2 to 4, wherein the processing circuit is configured to analyze at least one element included in the simulation area, and to perform the triggering when the at least one element satisfies a predetermined triggering condition, and wherein the at least one element included in the simulation area corresponds to the at least one element included in the real space.
Supplementary note 6 the information processing apparatus according to supplementary note 5, wherein the at least one element includes one or more of an entity, a background, a light, and a sound in the analog region.
Supplementary note 7. The information processing apparatus according to supplementary note 5 or 6, wherein the processing circuit is configured to determine whether the at least one element satisfies the predetermined trigger condition based on element information about the at least one element, thereby performing the trigger.
Supplementary note 8 the information processing apparatus according to supplementary note 7, wherein the element information includes one or more of video information, audio information, and three-dimensional information related to the at least one element.
Supplementary note 9 the information processing apparatus according to supplementary note 8, wherein the video information includes one or more of face recognition information, object recognition information, gesture information, expression information, gesture information.
Supplementary note 10. The information processing apparatus according to supplementary note 8, wherein the audio information includes voiceprint identification information and/or voice conversion information.
Supplementary note 11 the information processing apparatus according to supplementary note 8, wherein the three-dimensional information includes one or more of spatial position information, motion information, distance information.
Supplementary note 12 the information processing apparatus according to any one of supplementary notes 2 to 11, wherein the processing circuit is configured to render the virtual space based on a trigger event generated by the real-time interaction, thereby obtaining an updated virtual space.
Supplementary note 13. The information processing apparatus according to supplementary note 12, wherein the triggering event is the generation of a movie effect or an on-line teaching effect.
Supplementary note 14. The information processing apparatus according to supplementary note 12 or 13, wherein the processing circuit is further configured to cause the updated virtual space to be displayed instead of the virtual space on a display device that displays the virtual space, thereby obtaining an updated display image related to the trigger event.
Supplementary note 15 the information processing apparatus according to supplementary note 14, wherein the display device includes an LED screen.
Supplementary note 16. The information processing apparatus according to supplementary note 14 or 15, wherein the processing circuit is configured to cause an image pickup device to capture the predetermined area at the time the trigger event is generated and the image displayed by the display device, thereby obtaining a captured image.
Supplementary note 17. The information processing apparatus according to supplementary note 16, wherein the processing circuit is configured to determine a position and/or a size at which the rendering is performed on the virtual space based on a position and/or a posture relation between the image pickup device and the display device.
Supplementary note 18. An information processing method includes:
in response to a trigger based on at least one element in a real space, a trigger event is generated in a virtual space corresponding to the real space, thereby enabling real-time interaction between the real space and the virtual space.
Supplementary note 19. The information processing method according to supplementary note 18, wherein a predetermined area including the at least one element in the real space is subjected to simulation processing to construct a simulation area, and the real-time interaction is realized based on an association between the simulation area and the virtual space.
Supplementary note 20. The information processing method according to supplementary note 19, wherein the simulation processing is performed based on perception data obtained by perceiving the predetermined area by at least one sensor.
Supplementary note 21. The information processing method according to supplementary note 20, wherein the at least one sensor includes one or more of a visible light sensor, an infrared sensor, an ultraviolet sensor, a distance sensing sensor, an audio sensor, and a motion sensor.
Supplementary note 22. The information processing method according to any one of supplementary notes 19 to 21, wherein at least one element included in the simulation area is analyzed, and the triggering is performed when the at least one element satisfies a predetermined triggering condition, and wherein the at least one element included in the simulation area corresponds to the at least one element included in the real space.
Supplementary note 23. The information processing method according to supplementary note 22, wherein the at least one element includes one or more of an entity, a background, a light, and a sound in the simulation area.
Supplementary note 24. The information processing method according to supplementary note 22 or 23, wherein the triggering is performed by determining whether the at least one element satisfies the predetermined triggering condition based on element information about the at least one element.
Supplementary note 25. The information processing method according to supplementary note 24, wherein the element information includes one or more of video information, audio information, and three-dimensional information related to the at least one element.
Supplementary note 26. The information processing method according to supplementary note 25, wherein the video information includes one or more of face recognition information, object recognition information, gesture information, expression information, and posture information.
Supplementary note 27. The information processing method according to supplementary note 25, wherein the audio information includes voiceprint identification information and/or voice conversion information.
Supplementary note 28. The information processing method according to supplementary note 25, wherein the three-dimensional information includes one or more of spatial position information, motion information, and distance information.
Supplementary note 29. The information processing method according to any one of supplementary notes 19 to 28, wherein the virtual space is rendered based on a trigger event generated by the real-time interaction, thereby obtaining an updated virtual space.
Supplementary note 30. The information processing method according to supplementary note 29, wherein the trigger event is the generation of a movie special effect or an online teaching effect.
Supplementary note 31. The information processing method according to supplementary note 29 or 30, wherein the updated virtual space is caused to be displayed, instead of the virtual space, on a display device displaying the virtual space, thereby obtaining an updated display image related to the trigger event.
Supplementary note 32. The information processing method according to supplementary note 31, wherein the display device includes an LED screen.
Supplementary note 33. The information processing method according to supplementary note 31 or 32, wherein an image pickup device is caused to pick up the predetermined area and the image displayed by the display device at the time of occurrence of the trigger event, thereby obtaining a picked-up image.
Supplementary note 34. The information processing method according to supplementary note 33, wherein the position and/or size at which the rendering is performed on the virtual space is determined based on a positional and/or posture relation between the image pickup device and the display device.
Supplementary note 35. A computer-readable storage medium having stored thereon computer-executable instructions that, when executed, perform the information processing method according to any one of supplementary notes 18 to 34.
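
Supplementary notes 18 to 34 describe a pipeline in which perception data from sensors observing a predetermined real-space area is fused into a simulation area, elements of that area are analyzed against predetermined trigger conditions, and a resulting trigger event drives rendering of the virtual space shown on the display device. The following Python fragment is a minimal, hypothetical sketch of such a trigger loop; every identifier in it (Element, TriggerRule, build_simulation_area, process_frame, the example "wand_raised" rule) is an illustrative assumption and does not come from the patent.

```python
# Minimal, hypothetical sketch of the trigger loop of supplementary notes 18-30.
# All identifiers below are illustrative assumptions, not taken from the patent.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Element:
    """A simulation-area element mirroring an element of the real space."""
    kind: str                                                  # e.g. "entity", "background", "light", "sound"
    video: Dict[str, object] = field(default_factory=dict)     # recognition, gesture, expression, posture info
    audio: Dict[str, object] = field(default_factory=dict)     # voiceprint identification, voice conversion info
    three_d: Dict[str, object] = field(default_factory=dict)   # spatial position, motion, distance info


@dataclass
class TriggerRule:
    """A predetermined trigger condition over element information (notes 22-28)."""
    name: str
    condition: Callable[[Element], bool]
    effect: str                                                 # effect to render in the virtual space (note 30)


def build_simulation_area(perception_frames: List[Dict[str, object]]) -> List[Element]:
    """Fuse perception data from the sensors (notes 20-21) into simulation-area elements.

    A real system would combine visible-light, infrared, distance, audio and motion
    sensor data; here each frame is assumed to be already parsed into element fields.
    """
    return [Element(**frame) for frame in perception_frames]


def process_frame(perception_frames: List[Dict[str, object]],
                  rules: List[TriggerRule]) -> List[str]:
    """Analyze the elements and return the effects whose trigger conditions are met."""
    triggered: List[str] = []
    for element in build_simulation_area(perception_frames):
        for rule in rules:
            if rule.condition(element):
                triggered.append(rule.effect)   # trigger event for the virtual space
    return triggered


# Example rule: when a tracked prop is raised above 1.5 m, generate a sparkle
# effect at its position in the virtual space.
wand_rule = TriggerRule(
    name="wand_raised",
    condition=lambda e: e.kind == "entity" and e.three_d.get("height_m", 0.0) > 1.5,
    effect="sparkle_at_wand",
)
```

In an actual implementation, each effect returned by process_frame would, per supplementary notes 29 to 34, be rendered into the virtual space, and the updated virtual space would replace the previous one on the LED screen while the image pickup device captures the scene.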

Claims (10)

1. An information processing apparatus comprising:
processing circuitry configured to:
in response to a trigger based on at least one element in a real space, a trigger event is generated in a virtual space corresponding to the real space, thereby enabling real-time interaction between the real space and the virtual space.
2. The information processing apparatus according to claim 1, wherein the processing circuit is configured to perform simulation processing on a predetermined region including the at least one element in the real space to construct a simulation region, and to realize the real-time interaction based on an association between the simulation region and the virtual space.
3. The information processing apparatus according to claim 2, wherein the processing circuit is configured to perform the simulation processing based on perception data obtained by perceiving the predetermined area by at least one sensor.
4. The information processing apparatus of claim 3, wherein the at least one sensor comprises one or more of a visible light sensor, an infrared sensor, an ultraviolet sensor, a distance sensing sensor, an audio sensor, and a motion sensor.
5. The information processing apparatus according to any one of claims 2 to 4, wherein the processing circuit is configured to analyze at least one element included in the simulation area and to perform the triggering when the at least one element satisfies a predetermined triggering condition, and wherein the at least one element included in the simulation area corresponds to the at least one element included in the real space.
6. The information processing apparatus of claim 5, wherein the at least one element comprises one or more of an entity, a background, a light, and a sound in the simulation area.
7. The information processing apparatus according to claim 5 or 6, wherein the processing circuit is configured to determine whether the at least one element satisfies the predetermined trigger condition based on element information about the at least one element, thereby performing the trigger.
8. The information processing apparatus according to claim 7, wherein the element information includes one or more of video information, audio information, and three-dimensional information related to the at least one element.
9. An information processing method, comprising:
in response to a trigger based on at least one element in a real space, a trigger event is generated in a virtual space corresponding to the real space, thereby enabling real-time interaction between the real space and the virtual space.
10. A computer-readable storage medium having stored thereon computer-executable instructions that, when executed, perform the information processing method of claim 9.
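
Supplementary notes 16 to 17 and 33 to 34 determine the position and/or size at which rendering is performed on the virtual space from the positional and/or posture relation between the image pickup device and the display device. The sketch below illustrates one conventional way such a placement could be computed, using a simple pinhole camera model; the model choice and all names are assumptions made for illustration, not the patent's prescribed method.

```python
# Hypothetical sketch: place a triggered effect in the displayed virtual space based
# on the camera/display pose relation (cf. supplementary notes 16-17 and 33-34).
# The pinhole-projection simplification and all identifiers are illustrative assumptions.
from typing import Tuple

import numpy as np


def world_to_pixel(point_world: np.ndarray,
                   cam_rotation: np.ndarray,      # 3x3 rotation, world -> camera
                   cam_translation: np.ndarray,   # 3-vector,   world -> camera
                   focal_px: float,
                   principal_point: Tuple[float, float]) -> Tuple[float, float]:
    """Project a 3D world point into camera pixel coordinates with a pinhole model."""
    p_cam = cam_rotation @ point_world + cam_translation
    u = focal_px * p_cam[0] / p_cam[2] + principal_point[0]
    v = focal_px * p_cam[1] / p_cam[2] + principal_point[1]
    return float(u), float(v)


def effect_size_px(base_size_m: float, distance_m: float, focal_px: float) -> float:
    """Scale the rendered effect inversely with distance so it appears attached to the prop."""
    return focal_px * base_size_m / max(distance_m, 1e-6)


# Example: project a prop tip at (0.2, 1.6, 3.0) m in front of a camera at the origin.
R = np.eye(3)
t = np.zeros(3)
u, v = world_to_pixel(np.array([0.2, 1.6, 3.0]), R, t,
                      focal_px=1400.0, principal_point=(960.0, 540.0))
```

In a virtual production setup of the kind the supplementary notes describe, the projected camera-pixel coordinates would then be mapped onto the region of the LED screen visible in the shot before the updated virtual space replaces the previous one on the display.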
CN202211233436.6A 2022-10-10 2022-10-10 Information processing apparatus and method, and computer-readable storage medium Pending CN117873305A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211233436.6A CN117873305A (en) 2022-10-10 2022-10-10 Information processing apparatus and method, and computer-readable storage medium
PCT/CN2023/123183 WO2024078384A1 (en) 2022-10-10 2023-10-07 Information processing device and method, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211233436.6A CN117873305A (en) 2022-10-10 2022-10-10 Information processing apparatus and method, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN117873305A true CN117873305A (en) 2024-04-12

Family

ID=90579792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211233436.6A Pending CN117873305A (en) 2022-10-10 2022-10-10 Information processing apparatus and method, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN117873305A (en)
WO (1) WO2024078384A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160042568A1 (en) * 2014-08-08 2016-02-11 Andrew Prestridge Computer system generating realistic virtual environments supporting interaction and/or modification
CN107231531A (en) * 2017-05-23 2017-10-03 青岛大学 A kind of networks VR technology and real scene shooting combination production of film and TV system
CN110060354B (en) * 2019-04-19 2023-08-04 苏州梦想人软件科技有限公司 Positioning and interaction method of real image in virtual space
CN213126145U (en) * 2020-09-14 2021-05-04 秀加科技(北京)有限公司 AR-virtual interactive system
CN113347373B (en) * 2021-06-16 2022-06-03 潍坊幻视软件科技有限公司 Image processing method for making special-effect video in real time through AR space positioning

Also Published As

Publication number Publication date
WO2024078384A1 (en) 2024-04-18


Legal Events

Date Code Title Description
PB01 Publication