WO2024009748A1 - Information processing device, information processing method, and recording medium - Google Patents

Information processing device, information processing method, and recording medium

Info

Publication number
WO2024009748A1
Authority
WO
WIPO (PCT)
Prior art keywords
person
representing
time
motion
graph
Prior art date
Application number
PCT/JP2023/022697
Other languages
French (fr)
Japanese (ja)
Inventor
Tomoya Ishikawa
Original Assignee
Sony Group Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2024009748A1 publication Critical patent/WO2024009748A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics

Definitions

  • the present technology particularly relates to an information processing device, an information processing method, and a recording medium that can cause a virtual character to perform an appropriate reaction.
  • AR (Augmented Reality)
  • HMD (Head Mounted Display)
  • reactions depend on the relationship between the subject and object of the interaction, the attributes of each person, the surrounding space (object arrangement), and the like.
  • the present technology was developed in view of this situation, and is intended to enable a virtual character to perform an appropriate reaction.
  • An information processing device includes a recognition unit that recognizes an interaction motion of the user with respect to a virtual character based on a measurement result of the user's motion by a sensor, a selection unit that selects a reaction action for the interaction action of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets, and a presentation unit that presents the virtual character performing the reaction action to the user.
  • FIG. 1 is a diagram showing an example of a reaction of a virtual character realized in an AR application to which the present technology is applied.
  • FIG. 2 is a diagram illustrating a series of processing flows in an information processing system according to an embodiment of the present technology.
  • FIG. 3 is a diagram illustrating an example of processing in the reaction learning phase.
  • FIG. 4 is a diagram showing an example of a graph representing spatial information.
  • FIG. 5 is a diagram showing an example of a graph representing person attribute information.
  • FIG. 6 is a diagram showing an example of a graph representing relational information.
  • FIG. 7 is a flowchart explaining reaction detection processing.
  • FIG. 8 is a diagram showing an example of a graph representing recognition results.
  • FIG. 9 is a diagram showing an example of a spatio-temporal scene graph.
  • FIG. 10 is a flowchart illustrating reaction subgraph and motion recording processing.
  • FIG. 11 is a diagram illustrating an example of extraction of a reaction subgraph.
  • FIG. 12 is a diagram illustrating an example of meta-ization of a reaction subgraph.
  • FIG. 13 is a diagram illustrating an example of processing in the reaction presentation phase.
  • FIG. 14 is a diagram illustrating an example of user interaction.
  • FIG. 15 is a diagram showing an example of matching between a current scene graph and a reaction subgraph.
  • FIG. 16 is a diagram showing an example of a reaction of a virtual character.
  • FIG. 17 is a block diagram showing an example of the functional configuration of the information processing device 12.
  • FIG. 18 is a block diagram showing an example of the functional configuration of the information processing device 22.
  • AR applications that apply this technology are applications that are used to coexist and interact with virtual characters.
  • a user using an AR application wears an AR display device 1 such as an HMD and communicates with a virtual character.
  • a virtual character C which is a humanoid virtual character
  • Display of the virtual character C is realized by rendering a 3D model of the virtual character C.
  • a user wearing the optical see-through type AR display device 1 will feel as if the virtual character C is present in the room in which he or she is.
  • the virtual character C shown in color in FIG. 1 indicates that the virtual character C is a virtual object displayed by the AR display device 1.
  • This technology allows a virtual character to efficiently perform appropriate actions as a reaction to actions (interactions) performed on the virtual character by a real person who is a user of an AR application.
  • Various actions are performed as reactions of the virtual character C according to the user's interaction; for example, in response to the user's interaction of sitting on a chair in the room, the virtual character C performs the action of sitting on another chair.
  • a database is created by continuously measuring and recording interactions and reactions between real people.
  • a scene graph is used that represents real people, objects, and attribute information as nodes, and various relationships such as the relationship between interactions and reactions and the positional relationship of real people as edges.
  • by using edges that express not only spatial relationships but also temporal relationships, a scene graph (hereinafter referred to as a spatio-temporal scene graph) that expresses spatio-temporal relationships such as interactions and reactions is generated.
  • a personalized interaction-reaction DB and a meta-ized interaction-reaction DB are generated as DBs using spatiotemporal scene graphs.
  • the personalized interaction-reaction DB is a DB that directly records information such as attribute information recognized in real space.
  • Meta-ized interaction-reaction DB is a DB that records information such as attribute information recognized in real space after abstracting (meta-izing) it.
  • a spatio-temporal scene graph is constructed from scene graphs (spatial scene graphs) that each represent real people, objects, attributes, and the like at a certain time as nodes and their relationships as edges, by connecting the same nodes in the spatial scene graphs at different times with temporal relationship edges.
  • a temporal relationship edge is an edge that represents the passage of a predetermined time.
  • to determine whether an interaction from person A to person B is a reaction, it is only necessary to check whether there was an interaction from person B to person A immediately before. If there was such an interaction immediately before, the interaction from person A to person B is determined to be a reaction. This makes it possible to efficiently access data in the DB.
  • the spatiotemporal scene graph information itself is information that abstracts the real space context. By describing the real-space context using less information and updating the spatio-temporal scene graph only when a semantic change occurs, it is possible to prevent data from becoming bloated.
  • the real space context is a concept that includes the situations and attributes of real space components such as people in real space and objects in real space.
  • FIG. 2 is a diagram illustrating a series of processing flows in an information processing system according to an embodiment of the present technology.
  • a series of processes in the information processing system consists of two processing phases: a "reaction learning phase” and a “reaction presentation phase.”
  • the information processing system includes a reaction learning side configuration and a reaction presentation side configuration.
  • a measurement device 11 and an information processing device 12 are provided as a configuration on the reaction learning side.
  • the measurement device 11 and the information processing device 12 are connected via wired or wireless communication.
  • the measurement device 11 is a sensor device equipped with various sensors such as a color image sensor and a depth sensor.
  • the measurement device 11 is installed in a room where a real person is present.
  • the measurement device 11 is installed in a space where a person A and a person B are present.
  • Person A and Person B to be measured are communicating through conversation and gestures.
  • person A and person B to be measured may be referred to as a first person and a second person.
  • the measurement device 11 measures the person to be measured and outputs measurement data such as a color image and a distance image to the information processing device 12.
  • the color image output by the measurement device 11 shows not only a person performing an interaction action or a reaction action, but also objects such as furniture around the person.
  • the distance image measured by the measurement device 11 represents the distance to a person and the distance to an object such as furniture.
  • the information processing device 12 analyzes the color image supplied from the measurement device 11 as measurement data and performs image recognition or the like to identify an individual.
  • the information processing device 12 recognizes personal attributes such as age, gender, and height for the identified person.
  • the information processing device 12 also analyzes color images and distance images to recognize objects around people, relationships between objects, relationships between people, and the like.
  • the information processing device 12 generates a spatio-temporal scene graph based on the recognition result and records it together with motion information indicating the motion of the person, thereby generating a DB.
  • the information in the DB generated by the information processing device 12 is supplied to the information processing device 22.
  • the DB included in the information processing device 12 records information on spatio-temporal scene graphs acquired using various people as measurement targets.
  • an AR display device 1 a measurement device 21, an information processing device 22, and an input device 23 are provided as a configuration on the reaction presentation side.
  • the measurement device 21 and the information processing device 22, and the information processing device 22 and the input device 23 are each connected via wired or wireless communication.
  • the AR display device 1 worn by the person A who will experience the AR application is also connected to the information processing device 22 via wired or wireless communication.
  • the measurement device 21 like the measurement device 11, is a sensor device equipped with various sensors such as a color image sensor and a depth sensor.
  • the measurement device 21 measures the motion of the user (person A) who is communicating with the virtual character C displayed on the AR display device 1, and outputs measurement data such as a color image and a distance image to the information processing device 22.
  • the information processing device 22 transmits the AR content data to the AR display device 1 and causes the virtual character C to be displayed.
  • AR content data including a 3D model of the virtual character C is input from the input device 23 to the information processing device 22 .
  • the information processing device 22 performs various recognition processes based on the measurement data supplied from the measurement device 21, and generates a spatiotemporal scene graph representing the context around the user.
  • the information processing device 22 searches for a reaction of the virtual character C by comparing the spatio-temporal scene graph representing the user's surrounding context with the spatio-temporal scene graph generated by the processing of the reaction learning phase.
  • the information processing device 22 causes the virtual character C to perform the same movement as the recorded real person's movement found through the search, and presents the user with a reaction according to the user's interaction.
  • the measurement device 11 and the measurement device 21 are devices in different casings, but they may be configured as devices in the same casing. In this case, measurement in the reaction learning phase and measurement in the reaction presentation phase are performed in the same space.
  • the functions of the information processing device 12 and the information processing device 22 may be realized by a server on the Internet. Further, the information processing device 12 and the information processing device 22 may be configured as devices in the same housing.
  • the functions of the measurement device 11 and the information processing device 12 may be realized in one device.
  • the function of the measurement device 21 and the function of the information processing device 22 may be realized in one device.
  • the functions of the information processing device 22 may be installed in the AR display device 1.
  • FIG. 3 is a diagram illustrating an example of processing in the reaction learning phase.
  • the processing of the reaction learning phase includes the following processing: recognition processing (Step 1), reaction detection processing (Step 2), and reaction subgraph & motion recording processing (Step 3).
  • the recognition process is a process of recognizing interactions and reactions between people, attribute information of each person, and spatial information based on measurement data supplied from the measurement device 11.
  • the recognition processing includes spatial recognition processing, person attribute recognition processing, and interaction recognition processing as the processing of Steps 1-1 to 1-3. Processing other than the interaction recognition processing may be omitted.
  • Step 1-1 Spatial recognition processing
  • spatial information is generated based on measurement data supplied from the measurement device 11. For example, the following information is generated as spatial information.
  • Geometric information of each object: shape, size, etc.
  • Semantic information of each object: category (wall, floor, chair, etc.), parts (backrest, doorknob, etc.)
  • Information indicating relationships between objects: distances between objects such as "object A is near object B", positional relationships between objects such as "object A is in front of object B", etc.
  • the techniques described in References 1 and 2 below, for example, can be used for spatial recognition.
  • Reference 1 “[Narita+, IROS2019] G. Narita, T. Seno, T. Ishikawa and Y. Kaji, “PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things”, IROS, 2019.”
  • Reference 2 “[Tahara+, ISMAR2020] T. Tahara, T. Seno, G. Narita and T. Ishikawa, “Retargetable AR: Context-aware Augmented Reality in Indoor Scenes based on 3D Scene Graph”, ISMAR, 2020.”
  • Step 1-2 Person Attribute Recognition Process
  • person attribute information is generated based on the measurement data supplied from the measurement device 11. For example, information such as the age, gender, and height of each person is generated as the person attribute information.
  • Personal attributes can be recognized using a library such as OpenCV, for example.
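  • as an illustrative sketch only (the publication does not specify a concrete pipeline), face regions could first be detected with OpenCV and then passed to separately trained estimators to populate attribute nodes; the Haar cascade below ships with OpenCV, while the age/gender models it alludes to are placeholders:

```python
# Illustrative sketch, not the method of the publication.
import cv2

def detect_faces(bgr_image):
    # Haar cascade bundled with OpenCV; returns (x, y, w, h) rectangles
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Each face region could then be passed to an age/gender estimator (for example
# a DNN loaded with cv2.dnn.readNet) to populate attribute nodes such as Age:60;
# height would instead be derived from the depth data of the measurement device.
```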
  • ⁇ Step 1-3 Interaction recognition processing
  • relational information indicating the interaction between an object and a person and relational information indicating the interaction between the persons are generated. For example, the following information is generated as related information.
  • Object-person: an action in which a person acts on an object, such as person A sitting on a sofa ("person A makes motion V toward object B").
  • Person-person: an action in which a person acts on another person, such as person A waving at person B ("person A makes motion V toward person B").
  • the spatial information generated by the spatial recognition process, the person attribute information generated by the person attribute recognition process, and the relational information generated by the interaction recognition process are acquired as a graph, which is data having a graph structure.
  • FIG. 4 is a diagram showing an example of a graph showing spatial information.
  • when a sofa, a chair, and a table exist in the space to be measured, the graph representing the spatial information is data composed of three nodes (sofa#1, chair#2, table#3) representing these objects, as shown in FIG. 4.
  • a node representing an object may be referred to as an object node.
  • the sofa node and the chair node are connected by an edge E1 labeled "in front of” and an edge E2 labeled "on left of.”
  • Edge E1 represented as an arrow pointing from the sofa node to the chair node, represents that the chair is in front of the sofa.
  • an edge E2 expressed as an arrow pointing from the chair node to the sofa node indicates that the sofa exists on the left side with respect to the front of the chair.
  • the sofa node and the table node, and the chair node and the table node are also connected by edges E3 and E4, respectively, which are set with labels representing their positional relationships.
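  • for illustration, the FIG. 4-style spatial graph could be built with a general-purpose graph library such as networkx; the labels of edges E3 and E4 are not given in the text and are assumed here:

```python
# Minimal sketch of the spatial information graph of FIG. 4 (not from the publication).
import networkx as nx

spatial = nx.MultiDiGraph()
spatial.add_nodes_from(["sofa#1", "chair#2", "table#3"])
spatial.add_edge("sofa#1", "chair#2", label="in front of")  # E1: the chair is in front of the sofa
spatial.add_edge("chair#2", "sofa#1", label="on left of")   # E2: the sofa is on the left of the chair
spatial.add_edge("sofa#1", "table#3", label="near")         # E3: label assumed for illustration
spatial.add_edge("chair#2", "table#3", label="near")        # E4: label assumed for illustration
```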
  • FIG. 5 is a diagram showing an example of a graph showing personal attribute information.
  • when the age of the person to be measured is recognized as 60, the graph showing the person attribute information is composed of a node (Person#1) representing the person to be measured and a node (Age:60) indicating that the person's age is 60, as shown in A of FIG. 5. The node representing the person to be measured and the node representing the age of 60 are connected by an edge E11 labeled "has".
  • a node representing an attribute may be referred to as an attribute node
  • an edge representing a relationship between an attribute and a person may be referred to as an attribute edge.
  • similarly, when the height of the person to be measured is recognized as 1.8 m, the graph showing the person attribute information is composed of a node (Person#1) representing the person to be measured and a node (Height:1.8m) representing that the person's height is 1.8 m.
  • the node representing the person to be measured and the node representing the height of 1.8 m are connected by an edge E12 labeled "has".
  • FIG. 6 is a diagram showing an example of a graph showing relational information.
  • the graph showing the relational information includes a node representing the person to be measured (Person#1), a node representing the table (table#1), and a node representing the chair (Chair#2), as shown in FIG. 6.
  • an edge representing a relationship between a person and an object based on the recognized real space context may be referred to as an object edge.
  • the node representing the person to be measured and the node representing the table are connected by an edge E21 labeled "use”. Further, the node representing the person to be measured and the node representing the chair are connected by an edge E22 labeled "sitting on”.
  • the result of the recognition process in Step 1 is thus obtained as data having a graph structure in which objects, people, and person attributes are expressed as nodes, and relationships among them are expressed as edges.
  • the reaction detection process is a process of detecting a person's reaction based on information acquired by the recognition process.
  • Step 2-1 Detection of semantic change: the detection of a semantic change is a process of detecting whether there is a change in the relationships by comparing the graph representing the recognition result at time t, which is the current time, with the graph representing the recognition result at time t-1, which is the previous measurement time. If it is determined that there is no change in the relationships, the detection of a semantic change is repeated based on information newly acquired by the recognition process.
  • Step 2-2 Generation of spatio-temporal scene graph: if a change is detected between the graph representing the recognition result at time t and the graph representing the recognition result at time t-1, the graph of spatial information, the graph of person attribute information, and the graph of relational information are integrated into one spatio-temporal scene graph. The integration is performed by temporally and spatially connecting the graphs at past measurement times within a threshold time Tth from time t, which is the current time.
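  • a minimal sketch of Steps 2-1 and 2-2 follows, assuming the per-time graphs are networkx graphs and that a "semantic change" is a change in the labelled edge set (both assumptions, for illustration only):

```python
# Illustrative sketch, not the implementation of the publication.
import networkx as nx

def snapshot(spatial, attributes, relations):
    """Merge the three per-time graphs of Step 1 into one spatial scene graph."""
    g = nx.MultiDiGraph()
    for part in (spatial, attributes, relations):
        g.add_nodes_from(part.nodes(data=True))
        g.add_edges_from(part.edges(data=True))
    return g

def edge_set(g):
    return {(u, v, d.get("label")) for u, v, d in g.edges(data=True)}

def append_snapshot(spatio_temporal, history, new_snapshot, dt):
    """Append a time slice only when a semantic change is detected (Steps 2-1 and 2-2)."""
    if history and edge_set(new_snapshot) == edge_set(history[-1][1]):
        return  # no change in relationships: the spatio-temporal scene graph is not updated
    t = history[-1][0] + 1 if history else 0
    for node, data in new_snapshot.nodes(data=True):
        spatio_temporal.add_node((t, node), **data)
    for u, v, data in new_snapshot.edges(data=True):
        spatio_temporal.add_edge((t, u), (t, v), **data)
    if history:
        prev_t = history[-1][0]
        for node in new_snapshot.nodes:
            if (prev_t, node) in spatio_temporal:
                # temporal relationship edge labelled with the elapsed time Δt
                spatio_temporal.add_edge((prev_t, node), (t, node), label=f"dt={dt}")
    history.append((t, new_snapshot))
```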
  • FIG. 8 is a diagram showing an example of a graph representing the recognition results.
  • time t and time t+ ⁇ t may be referred to as first time and second time, respectively.
  • the spatial information graph, the person attribute information graph, and the relationship information graph at each time of time t and time t+ ⁇ t become the graphs shown on the right side of FIG. 8.
  • the graph at the first time and the graph at the second time may be referred to as a first graph and a second graph, respectively.
  • the graph of spatial information at time t is composed of nodes of the floor as an object.
  • the graph of spatial information at time t+ ⁇ t is also composed of nodes of the floor as an object.
  • the floor node at time t and the floor node at time t+ ⁇ t are connected by a temporal relationship edge with a label of time ⁇ t.
  • the graph of person attribute information at time t is composed of a graph in which the node of person A and a node representing age 20 (Age:20) are connected by an edge labeled "has", and a graph in which the node of person B and a node representing age 19 (Age:19) are connected by an edge labeled "has".
  • the graph of person attribute information at time t+ ⁇ t is also composed of a graph with the same structure.
  • an edge representing a motion relationship between people may be referred to as a motion edge.
  • each node at time t and the same node at time t+ ⁇ t are connected by a temporal relationship edge labeled with time ⁇ t.
  • during time Δt, it is assumed that there is no change in the graph of person attribute information.
  • each node at time t may be referred to as a previous node, and each node at time t+ ⁇ t may be referred to as a subsequent node.
  • the graph of relational information at time t connects person A's node and floor node with an edge labeled "stand,” and connects person A's node and person B's node with an edge labeled "wave.”
  • An edge labeled "wave” from a node of person A to a node of person B represents that person A waved his hand toward person B.
  • the graph of relational information at time t also includes an edge labeled "stand" connecting the node of person B and the node of the floor.
  • the graph of the relational information at time t+ ⁇ t includes a graph in which the node of person A and the node of person B are connected by an edge from the node of person B to the node of person A that has the label "wave".
  • An edge from a node of person B to a node of person A with the label "wave” represents that person B waved his hand toward person A.
  • the other graphs at time t+ ⁇ t are the same as the relationship information graph at time t.
  • Each node at time t and the same node at time t+ ⁇ t are connected by a temporal relationship edge labeled with time ⁇ t. In this example, it is determined that there has been a change in the graph of the related information.
  • FIG. 9 is a diagram showing an example of a spatio-temporal scene graph.
  • the upper graph constituting the spatiotemporal scene graph shown in FIG. 9 is a graph that integrates the spatial information, person attribute information, and relational information graphs at time t. Furthermore, the lower graph is a graph that integrates the spatial information, person attribute information, and relational information graphs at time t+ ⁇ t.
  • Each node at time t and the same node at time t+ ⁇ t are connected by temporal relationship edges E31 to E35 labeled with time ⁇ t.
  • the fact that person A waves to person B at time t, and that person B waves to person A at time t+Δt after time Δt has elapsed, can thus be expressed by a spatio-temporal scene graph as shown in FIG. 9.
  • in this way, the spatio-temporal scene graph is a scene graph in which nodes representing the persons measured at the same time are connected by at least edges representing the relationships between the persons, and nodes representing the same person measured at different times are connected by temporal relationship edges, which are edges representing the passage of time.
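  • the FIG. 9 example could be encoded as follows, with nodes keyed by (time index, name) and temporal relationship edges joining the same name across time indices (an illustrative encoding, not from the publication):

```python
# Sketch of the FIG. 9 spatio-temporal scene graph; 0 stands for time t, 1 for time t+Δt.
import networkx as nx

g = nx.MultiDiGraph()
for t in (0, 1):
    g.add_edge((t, "PersonA"), (t, "Age:20"), label="has")
    g.add_edge((t, "PersonB"), (t, "Age:19"), label="has")
    g.add_edge((t, "PersonA"), (t, "Floor"), label="stand")
    g.add_edge((t, "PersonB"), (t, "Floor"), label="stand")
g.add_edge((0, "PersonA"), (0, "PersonB"), label="wave")  # interaction at time t
g.add_edge((1, "PersonB"), (1, "PersonA"), label="wave")  # reaction at time t+Δt

# temporal relationship edges (E31 to E35 in FIG. 9), labelled with the elapsed time
for name in ("PersonA", "PersonB", "Floor", "Age:20", "Age:19"):
    g.add_edge((0, name), (1, name), label="dt")
```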
  • in this example, one spatio-temporal scene graph is generated by integrating the graphs at two times, time t and time t+Δt, but one spatio-temporal scene graph may be generated by integrating graphs at three or more times.
  • the relationship between people is expressed as a relationship between people (S, V, O) that is a combination of "Subject”, “Verb”, and "Object".
  • for example, the action of person B waving at person A is expressed as the interpersonal relationship (person B, wave, person A).
  • the interpersonal relationship (person B, wave, person A) indicates that person B, who is the subject S, performed the action V of waving toward person A, who is the object O.
  • the condition for determining that an interpersonal relationship (S, V, O) is a reaction can be expressed as the occurrence, within a short time in the past such as a few seconds, of an interpersonal relationship (O, V, S) representing the opposite relationship to the interpersonal relationship (S, V, O).
  • the interpersonal relationship (O, V, S) represents a past relationship in which the person who is the object O at the current time acted as the subject and performed an action V toward the person who is the subject S at the current time, with that person as the object.
  • the motion V in the interpersonal relationship (S, V, O) and the motion V in the interpersonal relationship (O, V, S) may be different motions.
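  • this reaction condition can be sketched as a check over recently observed (S, V, O) triples; the window length and triple form below are assumptions for illustration:

```python
# Illustrative sketch only. An interpersonal relationship is treated as a reaction
# when the "opposite" relationship (O, V', S) was observed within a short past
# window; the verbs may differ (waving back, nodding, etc.).
from collections import deque
import time

WINDOW_SEC = 3.0  # "a short time in the past, such as a few seconds" (assumed value)

recent = deque()  # items: (timestamp, subject, verb, object)

def observe(subject, verb, obj, now=None):
    now = time.time() if now is None else now
    recent.append((now, subject, verb, obj))
    while recent and now - recent[0][0] > WINDOW_SEC:
        recent.popleft()

def is_reaction(subject, obj, now=None):
    """True if obj acted on subject within the recent window."""
    now = time.time() if now is None else now
    return any(s == obj and o == subject and now - t <= WINDOW_SEC
               for t, s, _v, o in recent)
```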
  • Step 2-4 Search for the interpersonal relationship (O, V, S) from the spatio-temporal scene graph: targeting interpersonal relationships that occurred within a short time in the past, the interpersonal relationship (O, V, S) corresponding to the interpersonal relationship (S, V, O) at the current time is searched for.
  • the search is performed by tracing back, in the spatio-temporal scene graph, from the node of the person who is the subject S at the current time, and looking for a relationship in which the person who is the object O at the current time acted as the subject with the person who is the subject S at the current time as the object.
  • This process searches for interactions by people.
  • This processing can be realized by tracing the edges of the spatio-temporal scene graph related to the person of the subject S at the current time, so the frequency of access to data can be suppressed and efficient searches can be performed.
  • when the interpersonal relationship (O, V, S) corresponding to the interpersonal relationship (S, V, O) at the current time is found within a short time in the past, the interpersonal relationship (S, V, O) at the current time is determined to be a reaction.
  • for example, when the interpersonal relationship (S, V, O) at the current time t+Δt is (person B, wave, person A), the interpersonal relationship (O, V, S), that is, the relationship (person A, wave, person B) representing that person A waved at person B, is searched for by tracing edges starting from the node of person B at the current time t+Δt.
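  • as a sketch of this edge-tracing search (using the (time index, name) node encoding assumed in the earlier snippets), the inverse interaction can be looked for among the edges incident to the subject's node at the previous time step:

```python
# Illustrative sketch: search backwards from the current subject node for an
# edge representing that the current object acted on the current subject.
def find_inverse_interaction(g, subject, obj, t_now, max_steps=1):
    """Return the time index at which (obj -> subject) occurred, or None."""
    t = t_now
    for _ in range(max_steps):
        t -= 1  # follow one temporal relationship edge back in time
        if (t, subject) not in g:
            return None
        # only edges around the subject's node are inspected, which keeps data
        # access local, as noted in the text above
        for u, _v, data in g.in_edges((t, subject), data=True):
            if u == (t, obj) and data.get("label"):
                return t
    return None
```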
  • if it is determined that the interpersonal relationship (O, V, S) did not exist within the short time in the past, the process returns to Step 2-1 and the detection of a semantic change is repeated.
  • if the interpersonal relationship (O, V, S) is found, the process of Step 3 of FIG. 3 is performed.
  • Step 3 Reaction subgraph & motion recording processing
  • the reaction subgraph & motion recording process is a process of recording, in the DB, information on the person of the subject S during the period from the occurrence time of the interpersonal relationship (O, V, S) searched for as the interaction causing the reaction to the occurrence time of the interpersonal relationship (S, V, O) determined to be the reaction.
  • a subgraph related to the person of the subject S and motion information of the person of the subject S at the time of occurrence of the interpersonal relationship (S, V, O), which is a reaction are recorded.
  • Step 3-1 Extraction of reaction subgraph: from the occurrence time of the interpersonal relationship (S, V, O) determined to be a reaction back to the occurrence time of the interpersonal relationship (O, V, S) searched for as the interaction causing the reaction, a subgraph related to the person of the subject S is extracted as a reaction subgraph. For example, at each time, a subgraph consisting of the nodes within the inter-node distance D from the node of the person of the subject S is extracted from the spatio-temporal scene graph.
  • the inter-node distance D is a threshold value indicating the degree of relationship between the subject S and the person.
  • when the value of the inter-node distance D is, for example, 1, the range from the person node of the subject S to the nodes connected via one edge is extracted as the reaction subgraph.
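  • for illustration, the extraction of the nodes within the inter-node distance D of the subject's node could be written with networkx's ego_graph (node encoding as in the earlier snippets):

```python
# Sketch only: collect the nodes within distance D of the subject at each time.
import networkx as nx

def extract_reaction_subgraph(g, subject, time_indices, D=1):
    nodes = set()
    for t in time_indices:  # the times from the interaction to the reaction
        center = (t, subject)
        if center in g:
            # treat edges as undirected so that neighbours on either side are kept
            nodes |= set(nx.ego_graph(g, center, radius=D, undirected=True).nodes)
    return g.subgraph(nodes).copy()
```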
  • FIG. 11 is a diagram showing an example of extraction of a reaction subgraph.
  • the reaction subgraph is graph information that includes at least a subgraph representing a relationship between people determined to be an interaction, and a subgraph representing a relationship between people determined to be a reaction.
  • Step3-2 Recording of reaction subgraph and motion information Motion information of the person of subject S at the time of occurrence of the interpersonal relationship (S, V, O) determined as a reaction is recorded in the DB in association with the reaction subgraph. Motion information and reaction subgraphs are associated using, for example, a common ID.
  • the motion information of the person of the subject S is recorded in the reaction motion DB 51 (FIG. 3), and the reaction partial graph is recorded in the individualized interaction-reaction DB 53, which is a DB for the person of the subject S.
  • the personalized interaction-reaction DB 53 is a DB that records various reaction subgraphs used to select the actions of the virtual character when the person S himself or herself becomes a user and uses AR content.
  • time-series data of skeletal estimation results (motion capture data) obtained based on measurement data by the measurement device 11 is recorded.
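  • the association by a common ID could look like the following sketch (the storage layout is not described in the publication; dictionaries stand in for the DBs):

```python
# Illustrative sketch only.
import uuid

reaction_motion_db = {}   # stands in for the reaction motion DB 51
personalized_db = {}      # stands in for the personalized interaction-reaction DB 53

def record_reaction(reaction_subgraph, motion_capture_frames):
    record_id = str(uuid.uuid4())                           # common ID shared by both records
    reaction_motion_db[record_id] = motion_capture_frames   # skeletal time-series data
    personalized_db[record_id] = reaction_subgraph
    return record_id
```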
  • Step3-3 Meta-ization of a reaction subgraph A subgraph is generated in which each node or a set of nodes in a reaction subgraph is replaced with a node having higher-order semantic information.
  • FIG. 12 is a diagram showing an example of meta-ization of a reaction subgraph.
  • the left side of FIG. 12 shows the reaction subgraph before meta-ization, and the right side shows the reaction subgraph after meta-ization.
  • a node representing person A as a specific person is replaced with a node representing "Person” which does not limit the person.
  • the node representing that the person's age as a person attribute is 19 years old is replaced with a node representing “teens” representing more abstract information.
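  • the abstraction can be sketched as a relabelling of nodes; the rules below (a generic "Person" node, decade-level age bands) are assumptions chosen to match the FIG. 12 example:

```python
# Illustrative sketch; node labels are treated as plain strings for readability.
import networkx as nx

def metaize_node(label):
    if label.startswith("Person"):                 # e.g. "PersonA" -> generic "Person"
        return "Person"
    if label.startswith("Age:"):
        age = int(label.split(":")[1])
        return "teens" if age < 20 else f"{(age // 10) * 10}s"  # "Age:19" -> "teens"
    return label

def metaize_subgraph(reaction_subgraph):
    mapping = {n: metaize_node(n) for n in reaction_subgraph.nodes}
    return nx.relabel_nodes(reaction_subgraph, mapping)
```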
  • Step 3-4 Recording of Meta-ized Reaction Subgraph and Motion Information
  • the meta-ized reaction subgraph generated by meta-izing the reaction subgraph is recorded in the DB in association with motion information using, for example, a common ID.
  • the meta-ized reaction subgraph is recorded in the meta-ized interaction-reaction DB 52.
  • the meta-ized interaction-reaction DB 52 is, for example, a DB that records various reaction subgraphs used to select the actions of a virtual character when a person different from the person S becomes a user and uses AR content.
  • the reaction subgraph recorded in the meta-ized interaction-reaction DB 52 may be used when the person S himself becomes a user and uses the AR content.
  • the reaction subgraph & motion recording process in Step 3 adds scene graph information that abstractly and semantically expresses the context of each time in real space. Regardless of the length or resolution of the measurement, scene graph information is recorded in Step 3 only when there is a change in the meaning of the measurement data, which makes it possible to access the data more efficiently than when all information is recorded.
  • FIG. 13 is a diagram illustrating an example of processing in the reaction presentation phase.
  • the processing of the reaction presentation phase includes recognition processing (Step 11), matching processing of the current scene graph with a subgraph in the DB (Step 12), and reaction presentation processing (Step 13).
  • the recognition process in the reaction presentation phase is basically the same process as the recognition process (Step 1 in FIG. 3) in the reaction learning phase.
  • the recognition process in the reaction presentation phase differs from the recognition process in the reaction learning phase in that the user (person A) who is experiencing the AR application and the virtual character C are the measurement targets.
  • the recognition process in the reaction presentation phase is a process of recognizing the interaction between the user and the virtual character C, user attribute information, and spatial information based on the measurement data supplied from the measurement device 21. Duplicate explanations will be omitted as appropriate.
  • Step 11-1 Spatial recognition processing
  • spatial information is generated based on the measurement data supplied from the measurement device 21.
  • spatial information geometric information of each object, semantic information of each object, and information indicating relationships between objects are generated.
  • Step 11-3 Interaction recognition processing
  • relational information between an object and the user, between an object and the virtual character C, and between the user and the virtual character C is generated based on the measurement data supplied from the measurement device 21.
  • the information on the attributes, position, and movements (interactions and reactions) of the virtual character C is information that can be acquired by the information processing device 22 itself that is displaying the virtual character C by playing back the AR content.
  • Information on the recognition results regarding the virtual character C is generated based on the virtual character C's attributes, position, motion, etc. as appropriate.
  • a current scene graph which is a spatiotemporal scene graph representing the context around the user who is presented with the virtual character, is generated based on the spatial information, person attribute information, and relationship information acquired through the recognition process.
  • the current scene graph consists of a graph in which nodes representing the user and the virtual character are connected by edges representing their relationship at a predetermined time, and a graph in which nodes representing the user and the virtual character are connected by edges representing their relationship at the current time after the predetermined time has elapsed.
  • a predetermined time regarding the current scene graph may be referred to as past time.
  • the current scene graph is considered to include a past graph that is a graph corresponding to the past time, a current graph that is a graph that corresponds to the current time, and a temporal relationship edge that connects the past node of the past graph and the current node of the current graph. Also good.
  • the current scene graph generated based on the information obtained through recognition processing is matched with each reaction subgraph in the DB. More specifically, matching the current scene graph and the reaction subgraph includes determining whether the person in the reaction subgraph corresponding to the user is the user himself or not. Note that in this disclosure, a known graph in a DB may be referred to as a known scene graph.
  • if the reaction learning phase has been performed with the user experiencing the AR application as a measurement target and a reaction subgraph for the user is recorded in the personalized interaction-reaction DB 53, the current scene graph is matched against the reaction subgraph for the user recorded in the personalized interaction-reaction DB 53.
  • if a reaction subgraph for the user is not recorded in the personalized interaction-reaction DB 53, the current scene graph is matched against the meta-ized reaction subgraphs recorded in the meta-ized interaction-reaction DB 52.
  • the current scene graph and each reaction subgraph are matched by calculating the distance between them and evaluating the extent to which their constituent elements are common. For example, a reaction subgraph that includes common components and has the closest distance to the current scene graph is used to select an action that becomes a reaction.
  • a reaction subgraph closer than the distance equal to the threshold value Eth may be used to select an action that becomes a reaction.
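  • one simple way to realize this matching is sketched below; the distance used here (symmetric difference of labelled edges) and the threshold value Eth are assumptions standing in for whatever graph distance the system actually uses:

```python
# Illustrative sketch: score stored reaction subgraphs against the current scene graph.
def labelled_edges(g):
    return {(u, v, d.get("label")) for u, v, d in g.edges(data=True)}

def distance(current, candidate):
    # stand-in distance: size of the symmetric difference of labelled edges
    return len(labelled_edges(current) ^ labelled_edges(candidate))

def select_reaction(current_graph, reaction_db, eth=5.0):
    """reaction_db maps a common ID to a reaction subgraph (see the recording sketch)."""
    best_id, best_dist = None, float("inf")
    for record_id, candidate in reaction_db.items():
        common = labelled_edges(current_graph) & labelled_edges(candidate)
        if not common:
            continue  # no common components: skip this candidate
        d = distance(current_graph, candidate)
        if d < min(best_dist, eth):  # eth stands in for the threshold Eth
            best_id, best_dist = record_id, d
    return best_id  # used to look up motion information in the reaction motion DB
```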
  • FIG. 14 is a diagram showing an example of user interaction.
  • FIG. 15 is a diagram showing an example of matching the current scene graph and the reaction subgraph.
  • the current scene graph includes a subgraph representing the relationship between people (person A, wave, virtual character C) at time t.
  • the interpersonal relationship (person B, wave, person A) extracted from the reaction subgraph is applied as the interpersonal relationship between the virtual character C and person A (the user) at the current time t+Δt, which constitutes the current scene graph.
  • An edge E51 connecting the node of the virtual character C and the node of the person A at the current time t+ ⁇ t represents the relationship between the people (virtual character C, wave, person A).
  • the graph at time t+ ⁇ t includes a component representing that the person corresponding to the virtual character C performed a certain action toward the person A, and the action is an interaction between the person A and the person corresponding to the virtual character.
  • a reaction partial graph including a component representing that the process has been performed in the graph at time t is acquired from the personalized interaction-reaction DB 53 as an evaluation result with the current scene graph.
  • the person corresponding to the virtual character C is a person who, together with the person A, is a measurement target in the real space in the process of the reaction learning phase.
  • the same action as the action performed by the person corresponding to the virtual character C toward the person A is selected as the reaction action of the virtual character C.
  • the reaction presentation process is a process of reading reaction motion information corresponding to the matched reaction subgraph from the reaction motion DB 51 and presenting it as a motion of the virtual character C.
  • by causing the virtual character C to perform the same action as the action indicated by the motion information read from the reaction motion DB 51, the user is presented with a reaction to the interaction the user has performed on the virtual character C, as shown in FIG. 16.
  • a virtual character C is shown waving toward the user.
  • FIG. 17 is a block diagram showing an example of the functional configuration of the information processing device 12. At least some of the functional units shown in FIG. 17 are realized by the CPU of the computer constituting the information processing device 12 executing a predetermined program.
  • the reaction learning processing section 101 includes a recognition section 111, a reaction detection section 112, and a recording control section 113.
  • the recognition unit 111 includes a space recognition unit 121, a person attribute recognition unit 122, and an interaction recognition unit 123.
  • the spatial recognition unit 121 performs spatial recognition processing (Step 1-1 in FIG. 3) based on the measurement data supplied from the measurement device 11, and generates spatial information.
  • the person attribute recognition unit 122 performs person attribute recognition processing (Step 1-2 in FIG. 3) based on the measurement data supplied from the measurement device 11, and generates person attribute information.
  • the interaction recognition unit 123 performs interaction recognition processing (Step 1-3 in FIG. 3) based on the measurement data supplied from the measurement device 11, and generates related information.
  • the spatial information generated by the spatial recognition unit 121, the person attribute information generated by the person attribute recognition unit 122, and the relationship information generated by the interaction recognition unit 123 are supplied to the reaction detection unit 112.
  • the reaction detection unit 112 performs reaction detection processing (Step 2 in FIG. 3, FIG. 7) based on the information supplied from each part of the recognition unit 111.
  • the reaction detection unit 112 includes a spatiotemporal scene graph generation unit 112A.
  • the spatio-temporal scene graph generation unit 112A generates a spatio-temporal scene graph based on the information supplied from the recognition unit 111.
  • the reaction detection unit 112 outputs information on a spatio-temporal scene graph including a graph representing the interpersonal relationship determined to be a reaction to the recording control unit 113.
  • the recording control unit 113 performs the reaction subgraph & motion recording processing (Step 3 in FIG. 3, FIG. 10) based on the information supplied from the reaction detection unit 112, and records the motion information of the motion determined to be a reaction in the reaction motion DB 51.
  • the recording control unit 113 causes the reaction subgraph extracted from the spatiotemporal scene graph supplied from the reaction detection unit 112 to be recorded in the individualized interaction-reaction DB 53.
  • the recording control unit 113 meta-izes the reaction subgraph extracted from the spatio-temporal scene graph supplied from the reaction detection unit 112, and records it in the meta-ized interaction-reaction DB 52.
  • the reaction motion DB 51, meta-ized interaction-reaction DB 52, and personalized interaction-reaction DB 53 are constructed in a storage unit such as an HDD of a computer that constitutes the information processing device 12. Information in the reaction motion DB 51, meta-ized interaction-reaction DB 52, and personalized interaction-reaction DB 53 is provided to the information processing device 22.
  • FIG. 18 is a block diagram showing an example of the functional configuration of the information processing device 22. At least some of the functional units shown in FIG. 18 are realized by the CPU of the computer constituting the information processing device 22 executing a predetermined program.
  • a reaction presentation processing section 151 is implemented in the information processing device 22.
  • the reaction presentation processing section 151 includes a recognition section 161, a matching section 162, and a presentation section 163.
  • the recognition unit 161 includes a space recognition unit 171, a person attribute recognition unit 172, and an interaction recognition unit 173.
  • the spatial recognition unit 171 performs spatial recognition processing (Step 11-1 in FIG. 13) based on the measurement data supplied from the measurement device 21, and generates spatial information.
  • the person attribute recognition unit 172 performs person attribute recognition processing (Step 11-2 in FIG. 13) based on the measurement data supplied from the measurement device 21, and generates person attribute information.
  • the spatial information generated by the spatial recognition unit 171, the person attribute information generated by the person attribute recognition unit 172, and the relationship information generated by the interaction recognition unit 173 are supplied to the matching unit 162.
  • the matching unit 162 performs matching processing (Step 12 in FIG. 13, FIG. 7) between the current scene graph and the reaction partial graph in the DB based on the information supplied from each part of the recognition unit 161.
  • the collation unit 162 includes a spatio-temporal scene graph generation unit 162A. Based on the information supplied from the recognition unit 161, the spatio-temporal scene graph generation unit 162A generates a current scene graph, which is a spatio-temporal scene graph including a user node and a virtual character node as constituent elements.
  • the matching unit 162 matches the current scene graph with each reaction partial graph recorded in the meta-ized interaction-reaction DB 52 or the individualized interaction-reaction DB 53.
  • the matching unit 162 selects an action that is a reaction of the virtual character based on a reaction subgraph that matches the current scene graph.
  • the matching unit 162 functions as a selection unit that selects an action that is a reaction of the virtual character based on a current scene graph that is a spatiotemporal scene graph and a reaction subgraph. Information on the action selected by the matching unit 162 as the reaction is supplied to the presenting unit 163 .
  • the presentation unit 163 generates data for displaying the virtual character C by playing the AR content and performing rendering.
  • the presentation unit 163 transmits display data to the AR display device 1 and causes the AR display device 1 to display the virtual character C.
  • the presenting unit 163 reads out the motion information of the reaction selected by the matching unit 162 from the reaction motion DB 51 and causes it to be presented as the motion of the virtual character C.
  • FIG. 19 is a block diagram showing a configuration example of an information processing system.
  • in the above description, the reaction learning processing section 101 and the reaction presentation processing section 151 are realized in different devices, but as shown in A of FIG. 19, they may be realized in the information processing device 201, which is a single device.
  • the reaction learning phase is processed in the reaction learning processing section 101
  • the reaction presentation phase is processed in the reaction presentation processing section 151.
  • the reaction presentation processing unit 151 of the information processing device 201 performs processing of the reaction presentation phase and displays the virtual character C on the AR display device 1.
  • reaction learning processing section 101 and the reaction presentation processing section 151 may be implemented in the AR display device 1.
  • the reaction presentation processing unit 151 of the AR display device 1 performs processing of the reaction presentation phase and displays the virtual character C on the display unit 211.
  • the display unit 211 includes a display that displays the virtual character C, and the like.
  • a video transmission type HMD may be used as a display device for the virtual character.
  • a mobile terminal such as a smartphone or a tablet terminal may be used as a display device for the virtual character.
  • the actions performed by the virtual character include not only actions performed by the virtual character alone, such as walking and running, but also actions using objects in real space, such as sitting on a chair or on the floor, and actions performed toward the user, such as talking to the user.
  • the series of processes described above can be executed by hardware or software.
  • a program constituting the software is installed from a program recording medium into a computer built into dedicated hardware or a general-purpose personal computer.
  • FIG. 20 is a block diagram showing an example of the hardware configuration of a computer that executes the series of processes described above using a program.
  • the computers functioning as the information processing device 12 and the information processing device 22 have a configuration similar to that shown in FIG. 20.
  • a CPU (Central Processing Unit) 1001, a ROM (Read Only Memory) 1002, and a RAM (Random Access Memory) 1003 are interconnected by a bus 1004.
  • An input/output interface 1005 is further connected to the bus 1004. Connected to the input/output interface 1005 are an input section 1006 consisting of a keyboard, a mouse, etc., and an output section 1007 consisting of a display, speakers, etc. Further, connected to the input/output interface 1005 are a storage unit 1008 consisting of a hard disk or non-volatile memory, a communication unit 1009 consisting of a network interface, etc., and a drive 1010 for driving a removable medium 1011.
  • a reaction motion DB 51, a meta-ized interaction-reaction DB 52, and a personalized interaction-reaction DB 53 are constructed in the storage unit 1008.
  • the CPU 1001, for example, loads the program stored in the storage unit 1008 into the RAM 1003 via the input/output interface 1005 and the bus 1004 and executes it, whereby the reaction learning processing unit 101 or the reaction presentation processing unit 151 described above is realized.
  • a program executed by the CPU 1001 is installed in the storage unit 1008 by being recorded on a removable medium 1011 or provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital broadcasting.
  • the program executed by the computer may be a program in which the processes are performed chronologically in the order described in this specification, or may be a program in which the processes are performed in parallel or at necessary timings such as when a call is made.
  • a system means a collection of multiple components (devices, modules (parts), etc.), regardless of whether all the components are in the same casing. Therefore, multiple devices housed in separate casings and connected via a network, and a single device with multiple modules housed in one casing, are both systems.
  • the present technology can take a cloud computing configuration in which one function is shared and jointly processed by multiple devices via a network.
  • each step described in the above flowchart can be executed by one device or can be shared and executed by multiple devices.
  • one step includes multiple processes
  • the multiple processes included in that one step can be executed by one device or can be shared and executed by multiple devices.
  • the present technology can also have the following configuration.
  • (1) An information processing device including a generation unit that generates, based on measurement results by a sensor including a first person and a second person as measurement targets, a scene graph in which a plurality of previous nodes respectively representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person, a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person, the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time, and the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
  • (2) The information processing device according to (1), further including a recognition unit that recognizes the first motion relationship and the second motion relationship based on the measurement results, in which the generation unit generates the scene graph when a second graph including the plurality of subsequent nodes and the second motion edge changes with respect to a first graph including the plurality of previous nodes and the first motion edge.
  • (3) The information processing device according to (2), in which the recognition unit further recognizes a context of a real space in which at least one of the first person and the second person is present, and the generation unit generates, based on the context of the real space, the scene graph including an object node representing an object in the real space and an object edge representing a relationship between the object and at least one of the first person and the second person.
  • (4) The information processing device according to (2) or (3), in which the recognition unit further recognizes an attribute of at least one of the first person and the second person, and the generation unit generates the scene graph including an attribute node representing the attribute and an attribute edge representing a relationship between the attribute and at least one of the first person and the second person.
  • (5) The information processing device according to any one of (2) to (4), in which the motion of the second person toward the first person is detected as a reaction motion when the first motion edge in the first graph represents that the first person performed an interaction motion with respect to the second person and the second motion edge in the second graph represents that the second person performed a motion toward the first person.
  • (6) The information processing device according to (5), further including a recording control unit that records graph information including a portion of the second graph including a component representing the reaction motion and a portion of the first graph including a component representing the interaction motion.
  • (7) The information processing device according to (6), in which the recording control unit records the graph information in association with motion information of the reaction motion.
  • (8) The information processing device according to (6) or (7), in which the recording control unit records the graph information as information for selecting an action of a virtual character when at least one of the first person and the second person receives a presentation of the virtual character as a user.
  • (9) The information processing device according to (6) or (7), in which the graph information is information that abstracts and represents the content represented by the nodes of the components, and the recording control unit records the graph information as information for selecting an action of a virtual character when a person other than the measurement targets receives a presentation of the virtual character as a user.
  • (10) An information processing method including generating, based on measurement results by a sensor including a first person and a second person as measurement targets, a scene graph in which a plurality of previous nodes respectively representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person, a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person, the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time, and the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
  • (11) A recording medium on which is recorded a program for causing a computer to execute processing of generating, based on measurement results by a sensor including a first person and a second person as measurement targets, a scene graph in which a plurality of previous nodes respectively representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person, a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person, the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time, and the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
  • the known scene graph includes: a plurality of previous nodes respectively representing the first person and the second person at a first time; a first motion edge connecting the plurality of previous nodes and representing a first motion relationship between the first person and the second person; a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time; a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person; a first temporal relationship edge connecting the previous node of the first person and the subsequent node of the first person and representing the passage of time; and a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time.
  • the information processing device further comprising a generation unit that generates a current scene graph including a plurality of temporal relationship edges that connect the current nodes of the virtual character and represent the passage of time;
  • the selection unit selects the reaction action based on the known scene graph that includes components common to the current scene graph.
  • the selection unit selects, as the reaction action, an action performed by the second person toward the first person in the graph at the second time of the known scene graph, based on the graph at the first time of the known scene graph indicating that an interaction motion from the first person corresponding to the user toward the second person corresponding to the virtual character corresponds to the interaction motion of the user.
  • the first person in the known scene graph is the user; The information processing device according to (14) above.
  • the first person in the known scene graph is a different person from the user;
  • An information processing method comprising, by an information processing device: recognizing the user's interaction motion with respect to the virtual character based on the measurement result of the user's motion by the sensor; selecting a reaction action to the interaction action of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets; and presenting the virtual character performing the reaction action to the user;
  • wherein the known scene graph is a graph in which a plurality of previous nodes each representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person, a plurality of subsequent nodes each representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person, the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time, and the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure relates to an information processing device, an information processing method, and a recording medium that make it possible to cause a virtual character to execute a suitable reaction. An information processing device according to an embodiment of the present technology: recognizes an action constituting an interaction, by a user, with a virtual character; selects an action constituting a reaction to the interaction by the user on the basis of graph information representing a scenegraph that is generated on the basis of measurement results from a plurality of persons, the scenegraph being such that nodes representing each of the persons being measured at the same time are connected by edges representing the relationships between the persons, and such that nodes representing the same person being measured at different times are connected by time-elapse relationship edges representing the elapse of time; and presents, to the user, a virtual character that performs the action constituting the reaction. The present technology can be applied to an application that presents a virtual character using a head-mounted display (HMD).

Description

Information processing device, information processing method, and recording medium
 本技術は、特に、適切なリアクションを仮想キャラクタに実行させることができるようにした情報処理装置、情報処理方法、および記録媒体に関する。 The present technology particularly relates to an information processing device, an information processing method, and a recording medium that can cause a virtual character to perform an appropriate reaction.
 仮想キャラクタとコミュニケーションをとることができるAR(Augmented Reality)アプリケーションがある。ユーザは、HMD(Head Mounted Display)などのAR表示デバイスを装着することにより、実際の空間に重ねて表示される仮想キャラクタとコミュニケーションをとることができる。 There is an AR (Augmented Reality) application that allows you to communicate with virtual characters. By wearing an AR display device such as an HMD (Head Mounted Display), a user can communicate with a virtual character superimposed on the real space.
 仮想キャラクタに対して例えばユーザが手を振った場合、そのことが各種のセンサによる計測結果に基づいて検出され、仮想キャラクタの動作として、ユーザに対して手を振り返すなどの動作が実行される。このようなコミュニケーションは、あるインタラクションをユーザがとった場合に、それに応じたリアクションを仮想キャラクタに実行させることによって実現される。 For example, when a user waves at a virtual character, this is detected based on measurement results from various sensors, and the virtual character performs an action such as waving back at the user. . Such communication is realized by having a virtual character perform a corresponding reaction when a user makes a certain interaction.
International Publication No. 2020/095368
International Publication No. 2020/217727
 実在の人物間のコミュニケーションにおいても、実在の人物と仮想キャラクタとの間のコミュニケーションにおいても、あるインタラクションに対するリアクションの候補となる動作は複数考えられる。また、リアクションは、インタラクションの主体と客体との関係、各人物の属性、周辺の空間(物体配置)などに依存する。 In both communication between real people and between a real person and a virtual character, there can be multiple actions that can be reaction candidates for a certain interaction. In addition, reactions depend on the relationship between the subject and object of the interaction, the attributes of each person, the surrounding space (object arrangement), and the like.
 そのため、仮想キャラクタのリアクションについてのルールを人手で作成することは困難である。ルールを仮に人手で作成したとしても、適切なリアクションを仮想キャラクタに実行させることができない可能性が高い。 Therefore, it is difficult to manually create rules regarding the reactions of virtual characters. Even if the rules were created manually, there is a high possibility that the virtual character would not be able to perform an appropriate reaction.
 本技術はこのような状況に鑑みてなされたものであり、適切なリアクションを仮想キャラクタに実行させることができるようにするものである。 The present technology was developed in view of this situation, and is intended to enable a virtual character to perform an appropriate reaction.
 本技術の一側面の情報処理装置は、第1の人物と第2の人物を計測対象として含むセンサによる計測結果に基づいて、第1の時刻における前記第1の人物と前記第2の人物をそれぞれ表す複数の先ノードが前記第1の人物と前記第2の人物間の第1の動作関係を表す第1の動作エッジで接続され、前記第1の時刻の後の第2の時刻における前記第1の人物と前記第2の人物をそれぞれ表す複数の後ノードが前記第1の人物と前記第2の人物間の第2の動作関係を表す第2の動作エッジで接続され、前記第1の人物の前記先ノードと前記第1の人物の前記後ノードが、時間の経過を表す第1の経時関係エッジで接続され、前記第2の人物の前記先ノードと前記第2の人物の前記後ノードが、時間の経過を表す第2の経時関係エッジで接続されたシーングラフを生成する生成部を備える。 An information processing device according to an aspect of the present technology may measure the first person and the second person at a first time based on measurement results by a sensor that includes the first person and the second person as measurement targets. A plurality of previous nodes each representing a first action edge representing a first action relationship between the first person and the second person are connected by a first action edge representing a first action relationship between the first person and the second person, and a plurality of posterior nodes representing the first person and the second person, respectively, are connected by a second movement edge representing a second movement relationship between the first person and the second person; The previous node of the person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time, and the previous node of the second person and the subsequent node of the second person are connected by a first temporal relationship edge representing the passage of time. The subsequent node includes a generation unit that generates a scene graph connected by a second temporal relationship edge representing the passage of time.
 本技術の他の側面の情報処理装置は、センサによるユーザの動作の計測結果に基づいて、仮想キャラクタに対する前記ユーザのインタラクション動作を認識する認識部と、第1の人物と第2の人物を計測対象として含むセンサによる計測結果に基づいて生成された既知シーングラフに基づいて、前記ユーザの前記インタラクション動作に対するリアクション動作を選択する選択部と、前記リアクション動作を行う前記仮想キャラクタを前記ユーザに提示する提示部とを備える。 An information processing device according to another aspect of the present technology includes a recognition unit that recognizes an interaction motion of the user with respect to a virtual character based on a measurement result of the user's motion by a sensor, and a recognition unit that measures a first person and a second person. a selection unit that selects a reaction action for the interaction action of the user based on a known scene graph generated based on measurement results by a sensor included as an object; and a selection unit that presents the virtual character that performs the reaction action to the user. and a presentation section.
 本技術の一側面においては、第1の人物と第2の人物を計測対象として含むセンサによる計測結果に基づいて、第1の時刻における前記第1の人物と前記第2の人物をそれぞれ表す複数の先ノードが前記第1の人物と前記第2の人物間の第1の動作関係を表す第1の動作エッジで接続され、前記第1の時刻の後の第2の時刻における前記第1の人物と前記第2の人物をそれぞれ表す複数の後ノードが前記第1の人物と前記第2の人物間の第2の動作関係を表す第2の動作エッジで接続され、前記第1の人物の前記先ノードと前記第1の人物の前記後ノードが、時間の経過を表す第1の経時関係エッジで接続され、前記第2の人物の前記先ノードと前記第2の人物の前記後ノードが、時間の経過を表す第2の経時関係エッジで接続されたシーングラフが生成される。 In one aspect of the present technology, based on measurement results by a sensor that includes a first person and a second person as measurement targets, a plurality of people representing the first person and the second person at a first time, respectively. are connected by a first motion edge representing a first motion relationship between the first person and the second person, and the first motion edge at a second time after the first time A plurality of posterior nodes representing a person and the second person, respectively, are connected by a second movement edge representing a second movement relationship between the first person and the second person, and a second movement edge represents a second movement relationship between the first person and the second person, The previous node and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time, and the previous node of the second person and the subsequent node of the second person are connected. , a scene graph connected by a second temporal relationship edge representing the passage of time is generated.
 本技術の他の側面においては、センサによるユーザの動作の計測結果に基づいて、仮想キャラクタに対する前記ユーザのインタラクション動作が認識され、第1の人物と第2の人物を計測対象として含むセンサによる計測結果に基づいて生成された既知シーングラフに基づいて、前記ユーザの前記インタラクション動作に対するリアクション動作が選択され、前記リアクション動作を行う前記仮想キャラクタが前記ユーザに提示される。 In another aspect of the present technology, the interaction motion of the user with respect to the virtual character is recognized based on the measurement result of the user's motion by the sensor, and the sensor includes the first person and the second person as measurement targets. Based on the known scene graph generated based on the results, a reaction action to the interaction action of the user is selected, and the virtual character performing the reaction action is presented to the user.
FIG. 1 is a diagram showing an example of a reaction of a virtual character realized in an AR application to which the present technology is applied.
FIG. 2 is a diagram illustrating a series of processing flows in an information processing system according to an embodiment of the present technology.
FIG. 3 is a diagram illustrating an example of processing in the reaction learning phase.
FIG. 4 is a diagram showing an example of a graph showing spatial information.
FIG. 5 is a diagram showing an example of a graph showing person attribute information.
FIG. 6 is a diagram showing an example of a graph showing relational information.
FIG. 7 is a flowchart explaining reaction detection processing.
FIG. 8 is a diagram showing an example of a graph representing recognition results.
FIG. 9 is a diagram showing an example of a spatio-temporal scene graph.
FIG. 10 is a flowchart explaining reaction subgraph & motion recording processing.
FIG. 11 is a diagram showing an example of extraction of a reaction subgraph.
FIG. 12 is a diagram showing an example of meta-ization of a reaction subgraph.
FIG. 13 is a diagram illustrating an example of processing in the reaction presentation phase.
FIG. 14 is a diagram showing an example of a user interaction.
FIG. 15 is a diagram showing an example of matching between a current scene graph and a reaction subgraph.
FIG. 16 is a diagram showing an example of a reaction of a virtual character.
FIG. 17 is a block diagram showing an example of the functional configuration of the information processing device 12.
FIG. 18 is a block diagram showing an example of the functional configuration of the information processing device 22.
FIG. 19 is a block diagram showing a configuration example of the information processing system.
FIG. 20 is a block diagram showing an example of the configuration of a computer.
Hereinafter, modes for implementing the present technology will be described. The explanation will be given in the following order.
1. Overview of this technology
2. Flow of processing in the information processing system
3. Reaction learning phase
4. Reaction presentation phase
5. Configuration of each device
6. Modification examples
<<Overview of this technology>>
FIG. 1 is a diagram showing an example of a reaction of a virtual character realized in an AR application to which the present technology is applied.
 本技術を適用したARアプリケーションは、仮想キャラクタとの共生や対話などを行うことに用いられるアプリケーションである。図1に示すように、ARアプリケーションを利用するユーザは、HMDなどのAR表示デバイス1を装着し、仮想キャラクタとコミュニケーションをとることになる。 AR applications that apply this technology are applications that are used to coexist and interact with virtual characters. As shown in FIG. 1, a user using an AR application wears an AR display device 1 such as an HMD and communicates with a virtual character.
 図1の例においては、人型の仮想的なキャラクタである仮想キャラクタCがAR表示デバイス1によって表示されている。仮想キャラクタCの表示は、仮想キャラクタCの3Dモデルをレンダリングすることによって実現される。例えば光学シースルー型のAR表示デバイス1を装着しているユーザは、自分がいる部屋に仮想キャラクタCがいるかのような感覚を得ることになる。図1において仮想キャラクタCに色を付して示していることは、仮想キャラクタCがAR表示デバイス1によって表示される仮想的なオブジェクトであることを示している。 In the example of FIG. 1, a virtual character C, which is a humanoid virtual character, is displayed by the AR display device 1. Display of the virtual character C is realized by rendering a 3D model of the virtual character C. For example, a user wearing the optical see-through type AR display device 1 will feel as if the virtual character C is present in the room in which he or she is. The virtual character C shown in color in FIG. 1 indicates that the virtual character C is a virtual object displayed by the AR display device 1.
 なお、AR表示デバイス1として用いられるHMDが光学シースルー型のHMDではなく、ビデオシースルー型のHMDであってもよい。また、ユーザの部屋などの実空間上に仮想キャラクタCが表示されるのではなく、3次元の仮想的な空間である仮想空間上に仮想キャラクタCが表示されるようにしてもよい。 Note that the HMD used as the AR display device 1 may be a video see-through type HMD instead of an optical see-through type HMD. Furthermore, instead of displaying the virtual character C in a real space such as a user's room, the virtual character C may be displayed in a virtual space that is a three-dimensional virtual space.
In this state, when the user waves his or her hand at the virtual character C as shown in the upper part of FIG. 1, this is detected based on measurement results from various sensors. In response to the user waving at the virtual character C, the virtual character C performs an action such as waving back at the user, as shown in the lower part of FIG. 1.
 本技術は、ARアプリケーションのユーザである実在人物が仮想キャラクタに対して行った動作(インタラクション)に対して、リアクションとなる適切な動作を、仮想キャラクタに効率的に実行させるものである。 This technology allows a virtual character to efficiently perform appropriate actions as a reaction to actions (interactions) performed on the virtual character by a real person who is a user of an AR application.
Various actions are performed as reactions of the virtual character C according to the user's interaction; for example, in response to the user's interaction of sitting on a chair in the room, the action of sitting on another chair is performed as a reaction.
 具体的には、主に以下の処理が行われる。 Specifically, the following processing is mainly performed.
(1) A database (DB) is generated by continuously measuring and recording interactions and reactions between real people. To record interactions and reactions, a scene graph is used that represents real people, objects, and attribute information as nodes, and various relationships, such as the relationship between interactions and reactions and the positional relationships of real people, as edges. By using edges that express not only spatial relationships but also temporal relationships, a scene graph that expresses spatio-temporal relationships such as interactions and reactions (hereinafter referred to as a spatio-temporal scene graph as appropriate) is generated.
 実在人物間のインタラクションとリアクションの関係を、時空間シーングラフを用いて記録しておくことにより、実在人物から仮想キャラクタに対するインタラクションに応じた仮想キャラクタのリアクションとして適切な動作を効率的に検索し、仮想キャラクタに実行させることが可能になる。ある実在人物がリアクションとして行った動作と同じ動作を再現するようにして、仮想キャラクタのリアクションが検索され、実行される。 By recording the relationship between interactions and reactions between real people using a spatio-temporal scene graph, we can efficiently search for an appropriate action as a reaction for a virtual character in response to an interaction between a real person and a virtual character. It becomes possible to have a virtual character perform the actions. The reaction of the virtual character is searched and executed by reproducing the same action as that performed by a certain real person as a reaction.
 (2)時空間シーングラフを用いたDBとして、個人特化インタラクション-リアクションDBとメタ化インタラクション-リアクションDBが生成される。 (2) A personalized interaction-reaction DB and a meta-ized interaction-reaction DB are generated as DBs using spatiotemporal scene graphs.
 個人特化インタラクション-リアクションDBは、実空間において認識された属性情報などの情報をそのまま直接的に記録したDBである。メタ化インタラクション-リアクションDBは、実空間において認識された属性情報などの情報を抽象化(メタ化)して記録したDBである。 The personalized interaction-reaction DB is a DB that directly records information such as attribute information recognized in real space. Meta-ized interaction-reaction DB is a DB that records information such as attribute information recognized in real space after abstracting (meta-izing) it.
When reproducing a reaction with the virtual character, if the real person to whom the reaction is directed is a person for whom a specific personalized interaction-reaction DB has been prepared, the search and reproduction of the virtual character's reaction are performed using that personalized interaction-reaction DB. This makes it possible to reproduce a reaction that is more appropriate for the real person to whom the reaction is directed.
On the other hand, if the real person to whom the reaction is directed is a person for whom no specific personalized interaction-reaction DB has been prepared, the search and reproduction of the virtual character's reaction are performed using the meta-ized interaction-reaction DB. By reproducing reactions using the meta-ized interaction-reaction DB, it is also possible to handle cases where a person who was not a measurement target during reaction learning receives the presentation of the virtual character as a user.
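As a rough illustration of this fallback between the two DBs, the choice could look like the following sketch; the function and argument names are placeholders and not part of the publication.

```python
def select_reaction_db(person_id, personalized_dbs, meta_db):
    """Prefer the personalized interaction-reaction DB when one exists for this person,
    otherwise fall back to the meta-ized interaction-reaction DB."""
    return personalized_dbs.get(person_id, meta_db)
```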
(3) The spatio-temporal scene graph is constructed by generating, for each time, a scene graph (spatial scene graph) that represents the real people, objects, attributes, and so on at that time as nodes and their relationships as edges, and then connecting the same nodes in the scene graphs at different times with temporal relationship edges. A temporal relationship edge is an edge that represents the passage of a predetermined time.
When determining whether an interaction from person A to person B at a certain time is a reaction, it is sufficient to go back through the past scene graphs along the temporal relationship edges connected to person A's node and check whether there was an interaction from person B to person A immediately before. If there was an interaction from person B to person A immediately before, the interaction from person A to person B is determined to be a reaction. This makes it possible to access the data in the DB efficiently.
 (4)時空間シーングラフの情報は、それ自体が、実空間のコンテキストを抽象化した情報である。より少ない情報を用いて実空間のコンテキストを記述し、かつ、意味的な変化が生じた場合にのみ時空間シーングラフを更新することにより、データの肥大化を防止することが可能となる。実空間のコンテキストは、実空間にいる人物、実空間にある物体などの実空間の構成要素のそれぞれの状況、属性を含む概念である。 (4) The spatiotemporal scene graph information itself is information that abstracts the real space context. By describing the real-space context using less information and updating the spatio-temporal scene graph only when a semantic change occurs, it is possible to prevent data from becoming bloated. The real space context is a concept that includes the situations and attributes of real space components such as people in real space and objects in real space.
<<Flow of processing in the information processing system>>
FIG. 2 is a diagram illustrating a series of processing flows in an information processing system according to an embodiment of the present technology.
 図2に示すように、情報処理システムにおける一連の処理は、「リアクション学習フェーズ」と「リアクション提示フェーズ」の2段階の処理フェーズから構成される。情報処理システムには、リアクション学習側の構成とリアクション提示側の構成が含まれる。 As shown in FIG. 2, a series of processes in the information processing system consists of two processing phases: a "reaction learning phase" and a "reaction presentation phase." The information processing system includes a reaction learning side configuration and a reaction presentation side configuration.
 図2の上段に示すように、リアクション学習側の構成として、計測デバイス11と情報処理装置12が設けられる。計測デバイス11と情報処理装置12は、有線または無線の通信を介して接続される。 As shown in the upper part of FIG. 2, a measurement device 11 and an information processing device 12 are provided as a configuration on the reaction learning side. The measurement device 11 and the information processing device 12 are connected via wired or wireless communication.
 計測デバイス11は、カラー画像センサやデプスセンサなどの各種のセンサを搭載したセンサデバイスである。計測デバイス11は、実在人物がいる部屋などに設置される。図2の例においては、人物Aと人物Bがいる空間に計測デバイス11が設置されている。計測対象となる人物Aと人物Bは、会話をしたり、身振り手振りをしたりしてコミュニケーションをとっている。本開示において、計測対象の人物Aと人物Bを第1の人物と第2の人物という場合がある。 The measurement device 11 is a sensor device equipped with various sensors such as a color image sensor and a depth sensor. The measurement device 11 is installed in a room where a real person is present. In the example of FIG. 2, the measurement device 11 is installed in a space where a person A and a person B are present. Person A and Person B to be measured are communicating through conversation and gestures. In the present disclosure, person A and person B to be measured may be referred to as a first person and a second person.
 計測デバイス11は、計測対象の人物を計測し、カラー画像や距離画像などの計測データを情報処理装置12に出力する。計測デバイス11が出力するカラー画像には、インタラクションとなる動作やリアクションとなる動作をとっている人物だけでなく、人物の周囲にある家具などの物体が写っている。計測デバイス11が計測する距離画像により、人物までの距離、家具などの物体までの距離が表される。 The measurement device 11 measures the person to be measured and outputs measurement data such as a color image and a distance image to the information processing device 12. The color image output by the measurement device 11 shows not only a person performing an interaction action or a reaction action, but also objects such as furniture around the person. The distance image measured by the measurement device 11 represents the distance to a person and the distance to an object such as furniture.
 情報処理装置12は、計測データとして計測デバイス11から供給されたカラー画像を解析し、画像認識などを行うことによって、個人を特定する。情報処理装置12は、個人を特定した人物を対象として、年齢、性別、身長等の人物属性を認識する。 The information processing device 12 analyzes the color image supplied from the measurement device 11 as measurement data and performs image recognition or the like to identify an individual. The information processing device 12 recognizes personal attributes such as age, gender, and height for the identified person.
 また、情報処理装置12は、カラー画像や距離画像を解析し、人物の周辺にある物体を認識したり、物体間の関係、人物間の関係などを認識したりする。情報処理装置12は、認識結果に基づいて時空間シーングラフを生成し、人物の動作を示すモーション情報とあわせて記録することによってDBを生成する。情報処理装置12により生成されたDBの情報は、情報処理装置22に供給される。 The information processing device 12 also analyzes color images and distance images to recognize objects around people, relationships between objects, relationships between people, and the like. The information processing device 12 generates a spatio-temporal scene graph based on the recognition result and records it together with motion information indicating the motion of the person, thereby generating a DB. The information in the DB generated by the information processing device 12 is supplied to the information processing device 22.
 リアクション学習フェーズの処理が様々な人物を計測対象として繰り返し行われる。情報処理装置12が有するDBには、様々な人物を計測対象として取得された時空間シーングラフの情報が記録される。 Processing in the reaction learning phase is repeated using various people as measurement targets. The DB included in the information processing device 12 records information on spatio-temporal scene graphs acquired using various people as measurement targets.
 一方、図2の下段に示すように、リアクション提示側の構成として、AR表示デバイス1、計測デバイス21、情報処理装置22、および入力装置23が設けられる。計測デバイス21と情報処理装置22、情報処理装置22と入力装置23は、それぞれ、有線または無線の通信を介して接続される。ARアプリケーションの体験者となる人物Aが装着するAR表示デバイス1も、有線または無線の通信を介して情報処理装置22に接続される。 On the other hand, as shown in the lower part of FIG. 2, an AR display device 1, a measurement device 21, an information processing device 22, and an input device 23 are provided as a configuration on the reaction presentation side. The measurement device 21 and the information processing device 22, and the information processing device 22 and the input device 23 are each connected via wired or wireless communication. The AR display device 1 worn by the person A who will experience the AR application is also connected to the information processing device 22 via wired or wireless communication.
Like the measurement device 11, the measurement device 21 is a sensor device equipped with various sensors such as a color image sensor and a depth sensor. The measurement device 21 measures the motion of the user (person A) who is communicating with the virtual character C displayed on the AR display device 1, and outputs measurement data such as a color image and a distance image to the information processing device 22.
 情報処理装置22は、ARコンテンツのデータをAR表示デバイス1に送信し、仮想キャラクタCを表示させる。仮想キャラクタCの3Dモデルを含むARコンテンツのデータが入力装置23から情報処理装置22に対して入力される。 The information processing device 22 transmits the AR content data to the AR display device 1 and causes the virtual character C to be displayed. AR content data including a 3D model of the virtual character C is input from the input device 23 to the information processing device 22 .
 また、情報処理装置22は、計測デバイス21から供給された計測データに基づいて、各種の認識処理を行い、ユーザの周辺のコンテキストを表す時空間シーングラフを生成する。情報処理装置22は、ユーザの周辺のコンテキストを表す時空間シーングラフと、リアクション学習フェーズの処理によって生成された時空間シーングラフとを照合し、仮想キャラクタCのリアクションを検索する。情報処理装置22は、検索により見つかった記録済みの実在人物の動きと同じ動きを仮想キャラクタCに実行させ、ユーザのインタラクションに応じたリアクションをユーザに提示する。 Furthermore, the information processing device 22 performs various recognition processes based on the measurement data supplied from the measurement device 21, and generates a spatiotemporal scene graph representing the context around the user. The information processing device 22 searches for a reaction of the virtual character C by comparing the spatio-temporal scene graph representing the user's surrounding context with the spatio-temporal scene graph generated by the processing of the reaction learning phase. The information processing device 22 causes the virtual character C to perform the same movement as the recorded real person's movement found through the search, and presents the user with a reaction according to the user's interaction.
 図2の例においては、計測デバイス11と計測デバイス21が異なる筐体のデバイスとされているが、同じ筐体のデバイスとして構成されるようにしてもよい。この場合、リアクション学習フェーズにおける計測とリアクション提示フェーズにおける計測が同じ空間において行われることになる。 In the example of FIG. 2, the measurement device 11 and the measurement device 21 are devices in different casings, but they may be configured as devices in the same casing. In this case, measurement in the reaction learning phase and measurement in the reaction presentation phase are performed in the same space.
 また、情報処理装置12と情報処理装置22のそれぞれの機能がインターネット上のサーバによって実現されるようにしてもよい。また、情報処理装置12と情報処理装置22が同じ筐体の装置として構成されるようにしてもよい。 Furthermore, the functions of the information processing device 12 and the information processing device 22 may be realized by a server on the Internet. Further, the information processing device 12 and the information processing device 22 may be configured as devices in the same housing.
 計測デバイス11の機能と情報処理装置12の機能が1つの装置において実現されるようにしてもよい。リアクション提示側についても同様に、計測デバイス21の機能と情報処理装置22の機能が1つの装置において実現されるようにしてもよい。また、情報処理装置22の機能がAR表示デバイス1に搭載されるようにしてもよい。 The functions of the measurement device 11 and the information processing device 12 may be realized in one device. Similarly, on the reaction presentation side, the function of the measurement device 21 and the function of the information processing device 22 may be realized in one device. Further, the functions of the information processing device 22 may be installed in the AR display device 1.
 以下、リアクション学習フェーズとリアクション提示フェーズの各フェーズの処理の詳細について説明する。 Hereinafter, details of the processing of each phase of the reaction learning phase and the reaction presentation phase will be explained.
<<Reaction learning phase>>
FIG. 3 is a diagram illustrating an example of processing in the reaction learning phase.
The processing of the reaction learning phase includes recognition processing (Step 1), reaction detection processing (Step 2), and reaction subgraph & motion recording processing (Step 3).
<Step 1: Recognition processing>
The recognition processing recognizes interactions and reactions between people, attribute information of each person, and spatial information based on the measurement data supplied from the measurement device 11. The recognition processing includes spatial recognition processing, person attribute recognition processing, and interaction recognition processing as Steps 1-1 to 1-3. Processing other than the interaction recognition processing may be omitted.
・Step 1-1: Spatial recognition processing
In the spatial recognition processing, spatial information is generated based on the measurement data supplied from the measurement device 11. For example, the following information is generated as spatial information:
・Geometric information of each object: shape, size, etc.
・Semantic information of each object: category (wall, floor, chair, etc.), parts (backrest, doorknob, etc.)
・Information indicating relationships between objects: distances between objects such as "object A is near object B", positional relationships between objects such as "object A is in front of object B", etc.
For example, the techniques described in Documents 1 and 2 can be used as spatial recognition techniques.
Document 1: [Narita+, IROS2019] G. Narita, T. Seno, T. Ishikawa and Y. Kaji, "PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things", IROS, 2019.
Document 2: [Tahara+, ISMAR2020] T. Tahara, T. Seno, G. Narita and T. Ishikawa, "Retargetable AR: Context-aware Augmented Reality in Indoor Scenes based on 3D Scene Graph", ISMAR, 2020.
・Step 1-2: Person attribute recognition processing
In the person attribute recognition processing, person attribute information is generated based on the measurement data supplied from the measurement device 11. For example, the following information is generated as person attribute information:
・Information obtained from appearance, such as age, gender, and race
・Information that allows personal identification in combination with the above information and image information
Person attributes can be recognized using a library such as OpenCV, for example.
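As a concrete illustration of this kind of attribute estimation, the following sketch classifies a face crop into an age bracket with OpenCV's DNN module. The model file names and mean values are placeholders taken from commonly distributed age-estimation Caffe models, not something specified by this publication.

```python
# Hedged sketch: estimating an age bracket for a detected face crop with OpenCV.
import cv2

AGE_BRACKETS = ["0-2", "4-6", "8-12", "15-20", "25-32", "38-43", "48-53", "60-100"]

def estimate_age_bracket(face_bgr):
    # Placeholder model files; any age-estimation Caffe model with this input
    # convention (227x227 BGR, mean-subtracted) could be substituted.
    net = cv2.dnn.readNetFromCaffe("age_deploy.prototxt", "age_net.caffemodel")
    blob = cv2.dnn.blobFromImage(face_bgr, scalefactor=1.0, size=(227, 227),
                                 mean=(78.43, 87.77, 114.90))
    net.setInput(blob)
    preds = net.forward()
    return AGE_BRACKETS[int(preds[0].argmax())]
```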
・Step 1-3: Interaction recognition processing
In the interaction recognition processing, relational information indicating object-person interactions and relational information indicating person-person interactions is generated based on the measurement data supplied from the measurement device 11. For example, the following information is generated as relational information:
・Object-person: an action in which a person acts on an object, such as person A sitting on a couch ("person A does V to object B")
・Person-person: an action performed by one person toward another person, such as person A waving at person B ("person A does V to person B")
For example, the technique described in Document 3 can be used as an interaction recognition technique.
Document 3: [Gao+, BMVC2018] C. Gao, Y. Zou and J.-B. Huang, "iCAN: Instance-Centric Attention Network for Human-Object Interaction", BMVC, 2018.
 空間認識処理によって生成された空間情報、人物属性認識処理によって生成された人物属性情報、および、インタラクション認識処理によって生成された関係情報は、グラフ構造を有するデータであるグラフとして取得される。 The spatial information generated by the spatial recognition process, the person attribute information generated by the person attribute recognition process, and the relational information generated by the interaction recognition process are acquired as a graph, which is data having a graph structure.
 図4は、空間情報を示すグラフの例を示す図である。 FIG. 4 is a diagram showing an example of a graph showing spatial information.
 例えば、計測対象の人物の周辺にソファ、椅子、テーブルが存在し、それらが所定の位置関係を有するように配置されていることが認識されたものとする。この場合、空間情報を示すグラフは、図4に示すように、それらの物体を表す3つのノード(sofa#1、chair#2、table#3)によって構成されるデータとなる。本開示において、物体を表すノードを物体ノードという場合がある。 For example, assume that it has been recognized that a sofa, chair, and table exist around the person to be measured, and that they are arranged in a predetermined positional relationship. In this case, the graph representing the spatial information is data composed of three nodes (sofa#1, chair#2, table#3) representing these objects, as shown in FIG. In this disclosure, a node representing an object may be referred to as an object node.
 図4の例においては、ソファのノードと椅子のノードは、「in front of」のラベルを持つエッジE1と「on left of」のラベルを持つエッジE2で接続される。ソファのノードから椅子のノードに向かう矢印として表されるエッジE1は、椅子がソファの前に存在することを表す。また、椅子のノードからソファのノードに向かう矢印として表されるエッジE2は、椅子の正面に対してソファが左側に存在することを表す。 In the example of FIG. 4, the sofa node and the chair node are connected by an edge E1 labeled "in front of" and an edge E2 labeled "on left of." Edge E1, represented as an arrow pointing from the sofa node to the chair node, represents that the chair is in front of the sofa. Furthermore, an edge E2 expressed as an arrow pointing from the chair node to the sofa node indicates that the sofa exists on the left side with respect to the front of the chair.
 ソファのノードとテーブルのノード、および、椅子のノードとテーブルのノードも、それぞれ、それらの位置関係を表すラベルが設定されたエッジE3,E4で接続される。 The sofa node and the table node, and the chair node and the table node are also connected by edges E3 and E4, respectively, which are set with labels representing their positional relationships.
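The graph of FIG. 4 can be expressed directly as a labeled directed graph. The following is a minimal sketch using the networkx library; the labels of edges E3 and E4 are not stated in the text and are assumed here for illustration.

```python
# Minimal sketch of the spatial information graph of FIG. 4 (object nodes and
# positional-relationship edges), built with networkx.
import networkx as nx

spatial_graph = nx.MultiDiGraph()

# Object nodes recognized by the spatial recognition processing (Step 1-1).
for obj in ("sofa#1", "chair#2", "table#3"):
    spatial_graph.add_node(obj, kind="object")

spatial_graph.add_edge("sofa#1", "chair#2", label="in front of")  # E1
spatial_graph.add_edge("chair#2", "sofa#1", label="on left of")   # E2
spatial_graph.add_edge("sofa#1", "table#3", label="next to")      # E3 (label assumed)
spatial_graph.add_edge("chair#2", "table#3", label="next to")     # E4 (label assumed)

print(list(spatial_graph.edges(data="label")))
```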
 図5は、人物属性情報を示すグラフの例を示す図である。 FIG. 5 is a diagram showing an example of a graph showing personal attribute information.
If the age of the person to be measured is recognized as 60, the graph showing the person attribute information is composed of a node (Person#1) representing the person to be measured and a node (Age:60) representing that the age is 60, as shown in A of FIG. 5. The node representing the person to be measured and the node representing that the age is 60 are connected by an edge E11 labeled "has". In this disclosure, a node representing an attribute may be referred to as an attribute node, and an edge representing a relationship between an attribute and a person may be referred to as an attribute edge.
If the height of the person to be measured is recognized as 1.8 m, the graph showing the person attribute information is composed of a node (Person#1) representing the person to be measured and a node (Height:1.8m) representing that the height is 1.8 m, as shown in B of FIG. 5. The node representing the person to be measured and the node representing that the height is 1.8 m are connected by an edge E12 labeled "has".
 図6は、関係情報を示すグラフの例を示す図である。 FIG. 6 is a diagram showing an example of a graph showing relational information.
Assume that, as object-person interactions, it is recognized that the person to be measured is using a table and that the person is sitting on a chair. In this case, the graph showing the relational information is composed of a node (Person#1) representing the person to be measured, a node (table#1) representing the table, and a node (chair#2) representing the chair, as shown in FIG. 6. An edge representing a relationship between a person and an object based on the recognized real-space context may be referred to as an object edge.
 計測対象の人物を表すノードとテーブルを表すノードは、「use」のラベルを持つエッジE21で接続される。また、計測対象の人物を表すノードと椅子を表すノードは、「sitting on」のラベルを持つエッジE22で接続される。 The node representing the person to be measured and the node representing the table are connected by an edge E21 labeled "use". Further, the node representing the person to be measured and the node representing the chair are connected by an edge E22 labeled "sitting on".
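The attribute edges of FIG. 5 and the object edges of FIG. 6 fit the same representation; a minimal sketch, continuing the networkx-based illustration above:

```python
# Sketch of the attribute edges (FIG. 5) and object edges (FIG. 6) for one person.
import networkx as nx

g = nx.MultiDiGraph()
g.add_node("Person#1", kind="person")

# Attribute nodes connected by "has" attribute edges (FIG. 5).
g.add_node("Age:60", kind="attribute")
g.add_edge("Person#1", "Age:60", label="has")          # E11
g.add_node("Height:1.8m", kind="attribute")
g.add_edge("Person#1", "Height:1.8m", label="has")     # E12

# Object nodes connected by object edges for person-object interactions (FIG. 6).
g.add_node("table#1", kind="object")
g.add_node("chair#2", kind="object")
g.add_edge("Person#1", "table#1", label="use")         # E21
g.add_edge("Person#1", "chair#2", label="sitting on")  # E22
```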
 Step1における認識処理の結果は、このように、物体、人物、人物属性をノードとして表現し、それらの関係性をエッジとして表現したグラフ構造を有するデータとして取得される。 The result of the recognition process in Step 1 is thus obtained as data having a graph structure in which objects, people, and person attributes are expressed as nodes, and relationships among them are expressed as edges.
<Step 2: Reaction detection processing>
The reaction detection processing is a process of detecting a person's reaction based on the information acquired by the recognition processing.
 図7のフローチャートを参照して、リアクション検出処理について説明する。 The reaction detection process will be explained with reference to the flowchart in FIG.
・Step 2-1: Detection of semantic change
The detection of semantic change is a process of detecting a change in relationships based on the graph representing the recognition result at the current time t and the graph representing the recognition result at the previous measurement time t-1. If it is determined that there is no change in the relationships, the detection of semantic change is repeated based on information newly acquired by the recognition processing.
On the other hand, if the graph representing the recognition result at the current time t, after the predetermined time has elapsed, has changed with respect to the graph representing the recognition result at the previous measurement time t-1, it is determined that the relationships have changed. In this case, since a new reaction may have occurred, the process proceeds to the subsequent processing.
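A minimal sketch of this check, assuming the per-time recognition results are held as networkx graphs whose edges carry a "label" attribute as in the earlier sketches:

```python
import networkx as nx

def labeled_edges(g: nx.MultiDiGraph) -> set:
    """Set of (subject, label, object) triples contained in a recognition-result graph."""
    return {(u, d.get("label"), v) for u, v, d in g.edges(data=True)}

def has_semantic_change(g_prev: nx.MultiDiGraph, g_now: nx.MultiDiGraph) -> bool:
    """True if any relationship appeared or disappeared between time t-1 and time t."""
    return labeled_edges(g_prev) != labeled_edges(g_now)
```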
・Step 2-2: Generation of the spatio-temporal scene graph
If a change is detected between the graph representing the recognition result at time t and the graph representing the recognition result at time t-1, the graph of spatial information, the graph of person attribute information, and the graph of relational information are integrated into one spatio-temporal scene graph. The integration is performed by temporally and spatially connecting the graphs at past measurement times within a threshold time Tth from the current time t.
 図8は、認識結果を表すグラフの例を示す図である。 FIG. 8 is a diagram showing an example of a graph representing the recognition results.
As shown on the left side of FIG. 8, assume that it has been recognized that person A waved at person B at time t and that person B waved at person A at time t+Δt, after time Δt had elapsed. In this disclosure, time t and time t+Δt may be referred to as a first time and a second time, respectively. In this case, the graph of spatial information, the graph of person attribute information, and the graph of relational information at each of time t and time t+Δt are the graphs shown on the right side of FIG. 8. In this disclosure, the graph at the first time and the graph at the second time may be referred to as a first graph and a second graph, respectively.
 例えば、時刻tにおける空間情報のグラフは、物体としての床のノードにより構成される。時刻t+Δtにおける空間情報のグラフも、物体としての床のノードにより構成される。 For example, the graph of spatial information at time t is composed of nodes of the floor as an object. The graph of spatial information at time t+Δt is also composed of nodes of the floor as an object.
 時刻tにおける床のノードと、時刻t+Δtにおける床のノードとは、時間Δtのラベルを持つ経時関係エッジで接続される。この例においては、空間情報のグラフに変化がないものとされている。 The floor node at time t and the floor node at time t+Δt are connected by a temporal relationship edge with a label of time Δt. In this example, it is assumed that there is no change in the spatial information graph.
 また、時刻tにおける人物属性情報のグラフは、人物Aのノードと、年齢が20歳であることを表すノード(Age:20)とを「has」のラベルを持つエッジで接続したグラフ、および、人物Bのノードと、年齢が19歳であることを表すノード(Age:19)とを「has」のラベルを持つエッジで接続したグラフによって構成される。時刻t+Δtにおける人物属性情報のグラフも、同じ構造のグラフにより構成される。本開示において、人物間の動作関係を表すエッジを動作エッジという場合がある。 Further, the graph of person attribute information at time t is a graph in which the node of person A and the node representing age 20 (Age:20) are connected by an edge labeled "has", and It is composed of a graph in which the node of person B and the node representing age 19 (Age:19) are connected by an edge labeled "has". The graph of person attribute information at time t+Δt is also composed of a graph with the same structure. In the present disclosure, an edge representing a motion relationship between people may be referred to as a motion edge.
 時刻tにおける各ノードと、時刻t+Δtにおける同じノードは、時間Δtのラベルを持つ経時関係エッジで接続される。この例においては、人物属性情報のグラフにも変化がないものとされている。本開示において、時刻tにおける各ノードを先ノード、時刻t+Δtにおける各ノードを後ノードという場合がある。 Each node at time t and the same node at time t+Δt are connected by a temporal relationship edge labeled with time Δt. In this example, it is assumed that there is no change in the graph of person attribute information. In this disclosure, each node at time t may be referred to as a previous node, and each node at time t+Δt may be referred to as a subsequent node.
The graph of relational information at time t includes a graph in which the node of person A and the node of the floor are connected by an edge labeled "stand", and the node of person A and the node of person B are connected by an edge labeled "wave" directed from the node of person A to the node of person B. The edge labeled "wave" directed from the node of person A to the node of person B represents that person A waved at person B. The graph of relational information at time t also includes an edge labeled "stand" connecting the node of person B and the node of the floor.
 時刻t+Δtにおける関係情報のグラフは、人物Aのノードと人物Bのノードとを、「wave」のラベルを持つ、人物Bのノードから人物Aのノードに向かうエッジで接続したグラフを含む。「wave」のラベルを持つ、人物Bのノードから人物Aのノードに向かうエッジは、人物Bが人物Aに向かって手を振ったことを表す。時刻t+Δtにおける他のグラフは、時刻tにおける関係情報のグラフと同じである。 The graph of the relational information at time t+Δt includes a graph in which the node of person A and the node of person B are connected by an edge from the node of person B to the node of person A that has the label "wave". An edge from a node of person B to a node of person A with the label "wave" represents that person B waved his hand toward person A. The other graphs at time t+Δt are the same as the relationship information graph at time t.
 時刻tにおける各ノードと、時刻t+Δtにおける同じノードは、時間Δtのラベルを持つ経時関係エッジで接続される。この例においては、関係情報のグラフに変化があったものとして判定される。 Each node at time t and the same node at time t+Δt are connected by a temporal relationship edge labeled with time Δt. In this example, it is determined that there has been a change in the graph of the related information.
 このように、時刻tにおける認識結果を表すグラフと時刻t+Δtにおける認識結果を表すグラフとの間に変化が検出された場合、グラフの統合が行われ、時空間シーングラフが生成される。 In this way, when a change is detected between the graph representing the recognition result at time t and the graph representing the recognition result at time t+Δt, the graphs are integrated and a spatiotemporal scene graph is generated.
 図9は、時空間シーングラフの例を示す図である。 FIG. 9 is a diagram showing an example of a spatio-temporal scene graph.
 図9に示す時空間シーングラフを構成する上段のグラフは、時刻tにおける空間情報、人物属性情報、関係情報のそれぞれのグラフを統合したグラフである。また、下段のグラフは、時刻t+Δtにおける空間情報、人物属性情報、関係情報のそれぞれのグラフを統合したグラフである。 The upper graph constituting the spatiotemporal scene graph shown in FIG. 9 is a graph that integrates the spatial information, person attribute information, and relational information graphs at time t. Furthermore, the lower graph is a graph that integrates the spatial information, person attribute information, and relational information graphs at time t+Δt.
 時刻tにおける各ノードと、時刻t+Δtにおける同じノードは、時間Δtのラベルを持つ経時関係エッジE31乃至E35で接続される。 Each node at time t and the same node at time t+Δt are connected by temporal relationship edges E31 to E35 labeled with time Δt.
For example, the node representing person A at time t and the node representing person A at time t+Δt are connected by a temporal relationship edge E31, and the node representing person B at time t and the node representing person B at time t+Δt are connected by a temporal relationship edge E32. Similarly, the node representing the floor at time t and the node representing the floor at time t+Δt are connected by a temporal relationship edge E33.
 時刻tにおいて人物Aが人物Bに対して手を振り、時間Δt経過後の時刻t+Δtにおいて人物Bが人物Aに対して手を振ったことは、図9に示すような時空間シーングラフによって表される。このように、時空間シーングラフは、同じ時刻に計測対象となったそれぞれの人物を表すノード間を少なくとも人物間の関係を表すエッジで接続するとともに、異なる時刻に計測対象となった同じ人物を表すノード間を、時間の経過を表すエッジである経時関係エッジで接続したシーングラフとなる。 The fact that person A waves to person B at time t, and person B waves to person A at time t+Δt after time Δt has elapsed can be expressed by a spatio-temporal scene graph as shown in FIG. be done. In this way, the spatio-temporal scene graph connects nodes representing each person who was the object of measurement at the same time with at least an edge representing the relationship between the persons, and also connects nodes representing each person who was the object of measurement at different times. It becomes a scene graph in which the nodes it represents are connected by time-related edges, which are edges that represent the passage of time.
 図9の例においては、時刻tと時刻t+Δtの2時刻分のグラフを統合することによって1つの時空間シーングラフが生成されているが、3時刻分以上のグラフを統合することによって1つの時空間シーングラフが生成されるようにしてもよい。 In the example of FIG. 9, one spatio-temporal scene graph is generated by integrating graphs for two times, time t and time t+Δt, but one time-spatial scene graph is generated by integrating graphs for three or more times. A spatial scene graph may be generated.
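The integration described above can be sketched as follows: each per-time graph keeps its own copy of the nodes, stamped with the measurement time, and the same entity at consecutive times is connected by a temporal relationship edge carrying the elapsed time. The (node, time) stamping scheme and the function name are illustrative assumptions, not the publication's implementation.

```python
import networkx as nx

def integrate(graphs_by_time):
    """Merge per-time scene graphs {time: graph} into one spatio-temporal scene graph."""
    st_graph = nx.MultiDiGraph()
    times = sorted(graphs_by_time)

    # Copy every per-time graph, stamping its nodes with the measurement time.
    for t in times:
        g = graphs_by_time[t]
        for n, attrs in g.nodes(data=True):
            st_graph.add_node((n, t), **attrs)
        for u, v, attrs in g.edges(data=True):
            st_graph.add_edge((u, t), (v, t), **attrs)

    # Connect the same entity at consecutive times with a temporal relationship edge.
    for t_prev, t_next in zip(times, times[1:]):
        for n in graphs_by_time[t_prev].nodes:
            if n in graphs_by_time[t_next]:
                st_graph.add_edge((n, t_prev), (n, t_next),
                                  label="elapsed", dt=t_next - t_prev)
    return st_graph
```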
・Step 2-3: Extraction of the interpersonal relationship (S, V, O) at the current time
A reaction is an interaction that occurs under a specific condition. In order to determine whether a certain interaction is a reaction, the interaction to be judged is extracted. In this disclosure, an interaction and a reaction may be referred to as an interaction action and a reaction action, respectively.
 人物間関係は、「主体S(Subject)」、「動作V(Verb)」、「客体O(Object)」を組み合わせた人物間関係(S,V,O)として表される。図9の時空間シーングラフが生成されている場合、現在時刻である時刻t+Δtの人物間関係(S,V,O)として、人物間関係(人物B,wave,人物A)が抽出される。人物間関係(人物B,wave,人物A)は、主体となる人物Bが、客体となる人物Aに対して、手を振るといった動作を行ったことを表す。 The relationship between people is expressed as a relationship between people (S, V, O) that is a combination of "Subject", "Verb", and "Object". When the spatio-temporal scene graph of FIG. 9 has been generated, the interpersonal relationships (person B, wave, person A) are extracted as the interpersonal relationships (S, V, O) at time t+Δt, which is the current time. The interpersonal relationship (person B, wave, person A) indicates that person B, who is the subject, performed an action such as waving to person A, who is the object.
Here, the specific condition can be expressed as: an interpersonal relationship (O, V, S), representing the reverse of the interpersonal relationship (S, V, O), has occurred within a short time in the past, such as within the last few seconds. The interpersonal relationship (O, V, S) represents a relationship in which the person who is the object O at the current time acted as the subject in the past and performed an action V on the person who is the subject S at the current time as its object. Note that the action V in the interpersonal relationship (S, V, O) and the action V in the interpersonal relationship (O, V, S) may be different actions.
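Under the same (node, time) stamping as the earlier sketches, the interpersonal relationship (S, V, O) at the current time can be read off the motion edges of the current-time layer. A minimal sketch; `person_names` is an assumed set of recognized person identifiers:

```python
def extract_svo(st_graph, t_now, person_names):
    """Yield (subject, verb, object) triples between people at measurement time t_now."""
    for (u, tu), (v, tv), attrs in st_graph.edges(data=True):
        if tu == tv == t_now and u in person_names and v in person_names:
            yield (u, attrs.get("label"), v)
```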
・Step 2-4: Search for the interpersonal relationship (O, V, S) in the spatio-temporal scene graph
Targeting the interpersonal relationships of the recent past, a search is performed for an interpersonal relationship (O, V, S) corresponding to the interpersonal relationship (S, V, O) at the current time.
The search for the interpersonal relationship (O, V, S) is a process of going back in time from the node of the subject S at the current time in the spatio-temporal scene graph and searching for an interaction performed by the person who is the object O at the current time, with the person who is the subject S at the current time as its object. Since this process can be realized by tracing the edges of the spatio-temporal scene graph related to the person who is the subject S at the current time, the frequency of access to the data can be kept low and an efficient search becomes possible.
 現在時刻の人物間関係(S,V,O)に対する人物間関係(O,V,S)が過去の短時間に見つかった場合、現在時刻の人物間関係(S,V,O)は、リアクションであると判定される。 If the interpersonal relationship (O, V, S) for the interpersonal relationship (S, V, O) at the current time is found in a short time in the past, the interpersonal relationship (S, V, O) at the current time is It is determined that
In the case of the example described above, a search for the interpersonal relationship (person A, wave, person B), representing that person A waved at person B, as the interpersonal relationship (O, V, S) corresponding to the interpersonal relationship (person B, wave, person A), which is the interpersonal relationship (S, V, O) at the current time t+Δt, is performed by tracing edges starting from the node of person B at the current time t+Δt.
Since the graph representing the interpersonal relationship (person A, wave, person B) is found within the short time Δt, the interpersonal relationship (person B, wave, person A), that is, person B waving at person A at the current time, is determined to be a reaction. In this way, when a component representing that person B performed a certain action on person A is included in the graph at time t+Δt and a component representing that person A performed a certain action on person B is included in the graph at time t, the action performed by person B on person A at time t+Δt is detected as a reaction.
 If it is determined that no interpersonal relationship (O, V, S) exists within the short time in the past, the process returns to Step 2-1 and the detection of a semantic change is repeated.
 On the other hand, if it is determined that an interpersonal relationship (O, V, S) was found within the short time in the past, the reaction subgraph & motion recording process (Step 3) of FIG. 3 is performed.
<Step 3: Reaction subgraph & motion recording process>
 The reaction subgraph & motion recording process records, in a DB, information about the person of the subject S from the occurrence time of the interpersonal relationship (S, V, O) determined to be a reaction back to the occurrence time of the interpersonal relationship (O, V, S) found as the interaction that caused the reaction. As the information about the person of the subject S, a subgraph related to the person of the subject S and motion information of the person of the subject S at the time of occurrence of the interpersonal relationship (S, V, O), which is the reaction, are recorded.
 The reaction subgraph & motion recording process will be described with reference to the flowchart in FIG. 10.
・Step 3-1. Extraction of the reaction subgraph
 The subgraph related to the person of the subject S, covering the period from the occurrence time of the interpersonal relationship (S, V, O) determined to be a reaction back to the occurrence time of the interpersonal relationship (O, V, S) found as the interaction that caused the reaction, is extracted as the reaction subgraph. For example, starting from the node of the person of the subject S at each time, a subgraph extending to nodes within an inter-node distance D is extracted from the spatio-temporal scene graph.
 The inter-node distance D is a threshold indicating the degree of relationship with the person of the subject S. The larger the value of the inter-node distance D, the more information is recorded about objects and people that are only weakly related to the person of the subject S, so a value of about 1 or 2 is set as the inter-node distance D. When the value of the inter-node distance D is 1, for example, the range from the node of the person of the subject S to the nodes connected via one edge is extracted as the reaction subgraph.
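 The extraction in Step 3-1 can be illustrated, for example, by the following sketch, which assumes the spatio-temporal scene graph is held as a networkx graph; the graph representation and the threshold name are assumptions used only for illustration.

import networkx as nx

def extract_reaction_subgraph(scene_graph: nx.Graph, subject_nodes, d: int = 1):
    """Collect, for each per-time node of subject S, all nodes within hop
    distance D and return the induced subgraph as the reaction subgraph."""
    keep = set()
    for s in subject_nodes:                       # subject S's node at each time
        reachable = nx.single_source_shortest_path_length(scene_graph, s, cutoff=d)
        keep.update(reachable.keys())             # nodes within D edges of S
    return scene_graph.subgraph(keep).copy()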
 FIG. 11 is a diagram showing an example of extraction of a reaction subgraph.
 When the interpersonal relationship (person B, wave, person A) at time t + Δt, shown surrounded by broken line #1, is determined to be a reaction and the interpersonal relationship (person A, wave, person B) at time t, shown surrounded by broken line #2, is determined to be an interaction, the entire spatio-temporal scene graph shown in FIG. 11, for example, is extracted as the reaction subgraph. The reaction subgraph is graph information that includes at least a subgraph representing the interpersonal relationship determined to be the interaction and a subgraph representing the interpersonal relationship determined to be the reaction.
・Step 3-2. Recording of the reaction subgraph and motion information
 The motion information of the person of the subject S at the occurrence time of the interpersonal relationship (S, V, O) determined to be a reaction is recorded in a DB in association with the reaction subgraph. The motion information and the reaction subgraph are associated with each other using, for example, a common ID.
 For example, the motion information of the person of the subject S is recorded in the reaction motion DB 51 (FIG. 3), and the reaction subgraph is recorded in the personalized interaction-reaction DB 53, which is a DB for the person of the subject S. The personalized interaction-reaction DB 53 records the various reaction subgraphs used to select the actions of the virtual character when the person of the subject S himself or herself becomes a user and uses the AR content.
 As the motion information, for example, time-series data of skeletal estimation results (motion capture data) obtained from the measurement data of the measurement device 11 is recorded.
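 The association through a common ID described in Step 3-2 can be illustrated, for example, by the following sketch. The dictionaries stand in for the reaction motion DB 51 and the personalized interaction-reaction DB 53, and the use of a UUID is an assumption for illustration only.

import uuid

reaction_motion_db = {}          # stand-in for the reaction motion DB 51
personalized_reaction_db = {}    # stand-in for the personalized interaction-reaction DB 53

def record_reaction(subgraph, motion_capture_frames):
    """Store the skeletal time series and the reaction subgraph under one ID."""
    record_id = str(uuid.uuid4())                          # common ID linking both records
    reaction_motion_db[record_id] = motion_capture_frames  # motion information (time series)
    personalized_reaction_db[record_id] = subgraph         # reaction subgraph
    return record_id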
・Step 3-3. Meta-ization of the reaction subgraph
 A subgraph is generated in which each node, or a set of nodes, of the reaction subgraph is replaced with a node having higher-order semantic information.
 FIG. 12 is a diagram showing an example of meta-ization of a reaction subgraph. The left side of FIG. 12 shows the reaction subgraph before meta-ization, and the right side shows the reaction subgraph after meta-ization.
 For example, the node representing person A as a specific person is replaced with a node representing "Person", which does not specify a particular individual. Likewise, the node representing an age of 19 as a person attribute is replaced with a node representing "teens", which carries more abstract information.
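 The abstraction rules shown in FIG. 12 (a specific person becomes "Person", an exact age becomes an age band) can be illustrated, for example, as follows; the attribute names used on the nodes are assumptions made for this sketch.

def metaize(node_attrs: dict) -> dict:
    """Return a copy of the node attributes with higher-order semantic labels."""
    meta = dict(node_attrs)
    if meta.get("type") == "person":
        meta["label"] = "Person"                 # drop the specific identity
    if "age" in meta:                            # 19 -> "teens", 34 -> "30s", ...
        age = meta.pop("age")
        meta["age_band"] = "teens" if age < 20 else f"{(age // 10) * 10}s"
    return meta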
・Step 3-4. Recording of the meta-ized reaction subgraph and motion information
 The meta-ized reaction subgraph generated by meta-izing the reaction subgraph is recorded in a DB in association with the motion information, for example using a common ID. The meta-ized reaction subgraph is recorded in the meta-ized interaction-reaction DB 52.
 The meta-ized interaction-reaction DB 52 records, for example, the various reaction subgraphs used to select the actions of the virtual character when a person other than the person of the subject S becomes a user and uses the AR content.
 As a result, even when a person other than the specific person is the user of the AR content, processing similar to that performed when the specific person is the user can be applied, as long as the higher-order meaning is the same. The reaction subgraphs recorded in the meta-ized interaction-reaction DB 52 may also be used when the person of the subject S himself or herself becomes a user and uses the AR content.
 As described above, the reaction subgraph & motion recording process in Step 3 adds scene graph information that abstractly and semantically represents the context of the real space at each time. Regardless of the length or resolution of the measurement, scene graph information is recorded in Step 3 only when the meaning of the measurement data changes, so access to the data can be made more efficient than when all information is recorded.
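 The record-only-on-change behavior summarized above can be illustrated, for example, by the following sketch; representing each scene-graph slice as a set of edges and comparing consecutive slices is a simplified assumption about what constitutes a semantic change.

def append_if_changed(history: list, new_slice_edges: set) -> bool:
    """Append a new scene-graph slice only when its semantic content differs
    from the previous slice, regardless of the sensor sampling rate."""
    if history and history[-1] == new_slice_edges:
        return False                   # same meaning: nothing is recorded
    history.append(new_slice_edges)    # meaning changed: record the new slice
    return True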
<<Reaction presentation phase>>
 FIG. 13 is a diagram illustrating an example of processing in the reaction presentation phase.
 The processing of the reaction presentation phase includes recognition processing (Step 11), processing of matching the current scene graph against the subgraphs in the DBs (Step 12), and reaction presentation processing (Step 13).
<Step 11: Recognition processing>
 The recognition processing in the reaction presentation phase is basically the same as the recognition processing in the reaction learning phase (Step 1 in FIG. 3). It differs in that the user experiencing the AR application (person A) and the virtual character C are the measurement targets.
 That is, the recognition processing in the reaction presentation phase recognizes the interaction between the user and the virtual character C, the user's attribute information, and the spatial information based on the measurement data supplied from the measurement device 21. Overlapping explanations are omitted as appropriate.
・Step 11-1: Spatial recognition processing
 In the spatial recognition processing, spatial information is generated based on the measurement data supplied from the measurement device 21. As the spatial information, geometric information of each object, semantic information of each object, and information indicating the relationships between objects are generated.
・Step 11-2: Person attribute recognition processing
 In the person attribute recognition processing, the user's person attribute information is generated based on the measurement data supplied from the measurement device 21. As the person attribute information, information such as age, gender, and race obtained from the user's appearance, as well as information enabling personal identification, is generated.
・Step 11-3: Interaction recognition processing
 In the interaction recognition processing, relationship information between objects and the user, between objects and the virtual character C, and between the user and the virtual character C is generated based on the measurement data supplied from the measurement device 21.
 The spatial information generated by the spatial recognition processing, the person attribute information generated by the person attribute recognition processing, and the relationship information generated by the interaction recognition processing are acquired as graphs, that is, as data having a graph structure.
 The attributes, position, and actions (interactions and reactions) of the virtual character C are information that the information processing device 22 itself, which plays back the AR content and displays the virtual character C, can acquire. Recognition-result information about the virtual character C is generated as appropriate based on the attributes, position, actions, and the like of the virtual character C.
<Step 12: Matching the current scene graph against the reaction subgraphs in the DBs>
 Based on the spatial information, person attribute information, and relationship information acquired through the recognition processing, a current scene graph is generated. The current scene graph is a spatio-temporal scene graph representing the context around the user to whom the virtual character is being presented.
 The current scene graph is a spatio-temporal scene graph that includes a graph in which the nodes representing the user and the virtual character are connected by edges representing their relationship at a predetermined time, and a graph in which they are connected by edges representing their relationship at the current time after a predetermined time has elapsed. The user's nodes at the respective times, and likewise the virtual character's nodes, are connected by temporal relationship edges. In the present disclosure, the predetermined time relating to the current scene graph may be referred to as the past time. The current scene graph may be regarded as including a past graph corresponding to the past time, a current graph corresponding to the current time, and temporal relationship edges connecting the past nodes of the past graph with the current nodes of the current graph.
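 The structure of the current scene graph described above can be illustrated, for example, by the following sketch; the node and edge labels, and the use of a networkx directed graph, are assumptions made only for illustration.

import networkx as nx

def build_current_scene_graph(past_relation: str, dt: float) -> nx.DiGraph:
    """Build a two-slice current scene graph: a past slice with the user's
    interaction toward the character, a current slice with no reaction yet,
    and temporal edges joining each entity's past and current nodes."""
    g = nx.DiGraph()
    # Past slice (time t): the user's interaction toward the virtual character.
    g.add_edge(("user", "t"), ("character", "t"), relation=past_relation)
    # Current slice (time t + dt): nodes exist, the reaction is not yet decided.
    g.add_node(("user", "t+dt"))
    g.add_node(("character", "t+dt"))
    # Temporal relationship edges connecting past and current nodes.
    g.add_edge(("user", "t"), ("user", "t+dt"), relation="temporal", dt=dt)
    g.add_edge(("character", "t"), ("character", "t+dt"), relation="temporal", dt=dt)
    return g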
 The current scene graph generated based on the information acquired through the recognition processing is matched against each of the reaction subgraphs in the DBs. More specifically, matching the current scene graph against a reaction subgraph includes determining whether the person in the reaction subgraph corresponding to the user is the user himself or herself. In the present disclosure, a known graph in a DB may be referred to as a known scene graph.
 If a reaction learning phase has been performed with the user experiencing the AR application as a measurement target, and reaction subgraphs for that user are recorded in the personalized interaction-reaction DB 53, the current scene graph is matched against the user's reaction subgraphs recorded in the personalized interaction-reaction DB 53.
 If no reaction subgraphs for the user are recorded in the personalized interaction-reaction DB 53, the current scene graph is matched against the meta-ized reaction subgraphs recorded in the meta-ized interaction-reaction DB 52.
 The matching between the current scene graph and each reaction subgraph is performed by calculating the distance between the two graphs and evaluating the degree to which their components are common. For example, a reaction subgraph that contains common components and whose graph distance to the current scene graph is the smallest is used to select the action that becomes the reaction.
 Instead of the reaction subgraph whose distance to the current scene graph is the smallest, a reaction subgraph whose distance is smaller than a threshold Eth may be used to select the action that becomes the reaction.
 Based on a reaction subgraph that contains components common to the current scene graph, the interpersonal relationship (O, V, S), in which the object O becomes the subject, recorded as the reaction to the interaction at the past time represented by the interpersonal relationship (S, V, O), is extracted.
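 The distance-based selection in Step 12 can be illustrated, for example, by the following sketch. Graph edit distance is used here as one possible graph distance; the embodiment does not prescribe a specific metric, and the parameter `eth` corresponds to the threshold Eth mentioned above.

import networkx as nx

def select_reaction_subgraph(current_graph, reaction_subgraphs, eth=None):
    """Return the recorded reaction subgraph closest to the current scene graph,
    or the first one whose distance falls below the threshold Eth if given."""
    best, best_dist = None, float("inf")
    for sg in reaction_subgraphs:
        dist = nx.graph_edit_distance(current_graph, sg, timeout=1.0)
        if dist is None:
            continue                              # no distance obtained within the timeout
        if eth is not None and dist < eth:
            return sg                             # close enough: use it directly
        if dist < best_dist:
            best, best_dist = sg, dist
    return best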
 FIG. 14 is a diagram showing an example of a user interaction.
 As shown in FIG. 14, it is assumed that person A, who is the user, is recognized as having waved at the virtual character C at time t. Which action the virtual character C should perform as a reaction at the current time t + Δt is selected based on the result of matching the current scene graph against the reaction subgraphs.
 FIG. 15 is a diagram showing an example of matching the current scene graph against a reaction subgraph.
 As shown surrounded by broken line #11 on the left side of FIG. 15, the current scene graph includes a subgraph representing the interpersonal relationship (person A, wave, virtual character C) at time t.
 For example, the distance to each of the reaction subgraphs recorded in the meta-ized interaction-reaction DB 52 is calculated, and a reaction subgraph containing the subgraph generated when a certain person waved at another person, as shown on the right side of FIG. 15, is selected. As shown surrounded by broken line #12, the reaction subgraph shown on the right side of FIG. 15 includes a subgraph representing the interpersonal relationship (person, wave, person). FIG. 15 shows an example in which the action that becomes the reaction is selected using a reaction subgraph recorded in the meta-ized interaction-reaction DB 52.
 For convenience of explanation, the person (the former person) who is the subject S in the interpersonal relationship (person, wave, person) as the interaction, shown surrounded by broken line #12, is referred to as person a, and the person (the latter person) who is the object O is referred to as person b. The subgraph surrounded by broken line #12 is thus a subgraph representing the interpersonal relationship (person a, wave, person b). Person a corresponds to the user (person A), and person b corresponds to the virtual character.
 As the reaction at time t + Δt to the interpersonal relationship (person a, wave, person b), the interpersonal relationship (person b, wave, person a), which is the interpersonal relationship (O, V, S) in which person b, the object O, becomes the subject, is extracted from the reaction subgraph.
 As indicated by arrow #13, the interpersonal relationship (person b, wave, person a) extracted from the reaction subgraph is applied as the interpersonal relationship between the virtual character C and person A (the user) at the current time t + Δt in the current scene graph. The edge E51 connecting the node of the virtual character C and the node of person A at the current time t + Δt represents the interpersonal relationship (virtual character C, wave, person A).
 The action of the virtual character C waving at person A, represented by the interpersonal relationship (virtual character C, wave, person A), is selected as the reaction action of the virtual character C.
 Similar processing is performed when the action that becomes the reaction is selected using a reaction subgraph recorded in the personalized interaction-reaction DB 53.
 In this case, a reaction subgraph whose graph at time t + Δt includes a component representing that the person corresponding to the virtual character C performed a certain action on person A, and whose graph at time t includes a component representing that person A performed the action serving as the interaction on the person corresponding to the virtual character, is acquired from the personalized interaction-reaction DB 53 as the result of the evaluation against the current scene graph. The person corresponding to the virtual character C is a person who, together with person A, was a measurement target in the real space in the processing of the reaction learning phase.
 Based on the reaction subgraph acquired as the evaluation result, for example, the same action as the action that the person corresponding to the virtual character C performed on person A is selected as the reaction action of the virtual character C.
 In this way, because the optimal reaction subgraph is selected based on the distance between the graphs, an appropriate reaction can be presented even when the components of the current scene graph and the reaction subgraph do not completely match.
<Step 13: Reaction presentation>
 The reaction presentation processing reads the motion information of the reaction corresponding to the matched reaction subgraph from the reaction motion DB 51 and presents it as the action of the virtual character C.
 By causing the virtual character C to perform the same action as the action indicated by the motion information read from the reaction motion DB 51, the reaction to the interaction the user performed toward the virtual character C is presented to the user, as shown in FIG. 16. In the example of FIG. 16, the virtual character C is shown waving toward the user.
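 Step 13 can be illustrated, for example, by the following sketch, continuing the stand-in DB of the earlier sketches; `character.play_motion` is an assumed interface and not an actual API of the embodiment.

def present_reaction(matched_record_id: str, character, reaction_motion_db: dict):
    """Look up the motion clip linked to the matched reaction subgraph by its
    shared ID and play it back as the virtual character's reaction."""
    frames = reaction_motion_db[matched_record_id]   # skeletal time-series data
    character.play_motion(frames)                    # drive the avatar with the clip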
 Motion generation is described in, for example, Document 3.
 Document 4: [Starke+, SIGGRAPH Asia 2019] S. Starke, H. Zhang, T. Komura and J. Saito, "Neural State Machine for Character-Scene Interactions", SIGGRAPH Asia, 2019.
 This makes it possible to cause the virtual character C to perform a natural action as a reaction based on the information constituting the reaction subgraph.
<<Configuration of each device>>
<Configuration of the information processing device 12>
 FIG. 17 is a block diagram showing an example of the functional configuration of the information processing device 12. At least some of the functional units shown in FIG. 17 are realized by a CPU of the computer constituting the information processing device 12 executing a predetermined program.
 As shown in FIG. 17, a reaction learning processing unit 101 is realized in the information processing device 12. The reaction learning processing unit 101 includes a recognition unit 111, a reaction detection unit 112, and a recording control unit 113.
 The recognition unit 111 includes a spatial recognition unit 121, a person attribute recognition unit 122, and an interaction recognition unit 123.
 The spatial recognition unit 121 performs the spatial recognition processing (Step 1-1 in FIG. 3) based on the measurement data supplied from the measurement device 11 and generates the spatial information.
 The person attribute recognition unit 122 performs the person attribute recognition processing (Step 1-2 in FIG. 3) based on the measurement data supplied from the measurement device 11 and generates the person attribute information.
 The interaction recognition unit 123 performs the interaction recognition processing (Step 1-3 in FIG. 3) based on the measurement data supplied from the measurement device 11 and generates the relationship information.
 The spatial information generated by the spatial recognition unit 121, the person attribute information generated by the person attribute recognition unit 122, and the relationship information generated by the interaction recognition unit 123 are supplied to the reaction detection unit 112.
 The reaction detection unit 112 performs the reaction detection processing (Step 2 in FIG. 3, FIG. 7) based on the information supplied from each unit of the recognition unit 111.
 The reaction detection unit 112 includes a spatio-temporal scene graph generation unit 112A. The spatio-temporal scene graph generation unit 112A generates the spatio-temporal scene graph based on the information supplied from the recognition unit 111.
 The reaction detection unit 112 outputs, to the recording control unit 113, information of the spatio-temporal scene graph including the graph representing the interpersonal relationship determined to be a reaction.
 The recording control unit 113 performs the reaction subgraph & motion recording processing (Step 3 in FIG. 3, FIG. 10) based on the information supplied from the reaction detection unit 112, and records the motion information of the action determined to be a reaction in the reaction motion DB 51.
 The recording control unit 113 also records the reaction subgraph extracted from the spatio-temporal scene graph supplied from the reaction detection unit 112 in the personalized interaction-reaction DB 53.
 In addition, the recording control unit 113 meta-izes the reaction subgraph extracted from the spatio-temporal scene graph supplied from the reaction detection unit 112 and records the result in the meta-ized interaction-reaction DB 52.
 The reaction motion DB 51, the meta-ized interaction-reaction DB 52, and the personalized interaction-reaction DB 53 are constructed in a storage unit such as an HDD of the computer constituting the information processing device 12. The information in the reaction motion DB 51, the meta-ized interaction-reaction DB 52, and the personalized interaction-reaction DB 53 is provided to the information processing device 22.
<Configuration of the information processing device 22>
 FIG. 18 is a block diagram showing an example of the functional configuration of the information processing device 22. At least some of the functional units shown in FIG. 18 are realized by a CPU of the computer constituting the information processing device 22 executing a predetermined program.
 As shown in FIG. 18, a reaction presentation processing unit 151 is realized in the information processing device 22. The reaction presentation processing unit 151 includes a recognition unit 161, a matching unit 162, and a presentation unit 163.
 The recognition unit 161 includes a spatial recognition unit 171, a person attribute recognition unit 172, and an interaction recognition unit 173.
 The spatial recognition unit 171 performs the spatial recognition processing (Step 11-1 in FIG. 13) based on the measurement data supplied from the measurement device 21 and generates the spatial information.
 The person attribute recognition unit 172 performs the person attribute recognition processing (Step 11-2 in FIG. 13) based on the measurement data supplied from the measurement device 21 and generates the person attribute information.
 The interaction recognition unit 173 performs the interaction recognition processing (Step 11-3 in FIG. 13) based on the measurement data supplied from the measurement device 21 and generates the relationship information.
 The spatial information generated by the spatial recognition unit 171, the person attribute information generated by the person attribute recognition unit 172, and the relationship information generated by the interaction recognition unit 173 are supplied to the matching unit 162.
 The matching unit 162 performs the processing of matching the current scene graph against the reaction subgraphs in the DBs (Step 12 in FIG. 13, FIG. 7) based on the information supplied from each unit of the recognition unit 161.
 The matching unit 162 includes a spatio-temporal scene graph generation unit 162A. The spatio-temporal scene graph generation unit 162A generates, based on the information supplied from the recognition unit 161, the current scene graph, which is a spatio-temporal scene graph containing the user's nodes and the virtual character's nodes as components.
 The matching unit 162 matches the current scene graph against each of the reaction subgraphs recorded in the meta-ized interaction-reaction DB 52 or the personalized interaction-reaction DB 53.
 The matching unit 162 selects the action that becomes the reaction of the virtual character based on the reaction subgraph that matches the current scene graph. The matching unit 162 thus functions as a selection unit that selects the action serving as the reaction of the virtual character based on the current scene graph, which is a spatio-temporal scene graph, and the reaction subgraphs. Information on the action selected by the matching unit 162 as the reaction is supplied to the presentation unit 163.
 The presentation unit 163 generates data for displaying the virtual character C by playing back the AR content and performing rendering and the like. The presentation unit 163 transmits the display data to the AR display device 1 and causes the AR display device 1 to display the virtual character C. The presentation unit 163 also reads the motion information of the reaction selected by the matching unit 162 from the reaction motion DB 51 and causes it to be presented as the action of the virtual character C.
<<Modified examples>>
<Configuration example of the information processing system>
 FIG. 19 is a block diagram showing a configuration example of the information processing system.
 In the above description, the reaction learning processing unit 101 and the reaction presentation processing unit 151 are realized in different devices, but they may be realized in a single device, an information processing device 201, as shown in A of FIG. 19.
 The information processing device 201 performs the processing of the reaction learning phase in the reaction learning processing unit 101 and the processing of the reaction presentation phase in the reaction presentation processing unit 151. The reaction presentation processing unit 151 of the information processing device 201 performs the processing of the reaction presentation phase and causes the AR display device 1 to display the virtual character C.
 Alternatively, as shown in B of FIG. 19, the reaction learning processing unit 101 and the reaction presentation processing unit 151 may be realized in the AR display device 1.
 In this case, the reaction presentation processing unit 151 of the AR display device 1 performs the processing of the reaction presentation phase and displays the virtual character C on the display unit 211. The display unit 211 includes a display or the like that displays the virtual character C.
 Instead of an optical see-through HMD, a video see-through HMD may be used as the display device for the virtual character. A mobile terminal such as a smartphone or a tablet terminal may also be used as the display device for the virtual character.
<Others>
 The description has mainly covered the case where the virtual character is caused to perform a waving motion as a reaction, but the virtual character can also be caused to perform various motions other than waving. The motions performed by the virtual character include not only motions the virtual character performs alone, such as walking and running, but also motions using objects in the real space, such as sitting on a chair or sitting on the floor, as well as motions directed at the user, such as talking to the user.
 The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed from a program recording medium into a computer incorporated in dedicated hardware, a general-purpose personal computer, or the like.
 FIG. 20 is a block diagram showing an example of the hardware configuration of a computer that executes the series of processes described above using a program. The computers functioning as the information processing device 12 and the information processing device 22 have a configuration similar to that shown in FIG. 20.
 A CPU (Central Processing Unit) 1001, a ROM (Read Only Memory) 1002, and a RAM (Random Access Memory) 1003 are interconnected by a bus 1004.
 An input/output interface 1005 is further connected to the bus 1004. An input unit 1006 including a keyboard, a mouse, and the like, and an output unit 1007 including a display, speakers, and the like are connected to the input/output interface 1005. Also connected to the input/output interface 1005 are a storage unit 1008 including a hard disk, a non-volatile memory, or the like, a communication unit 1009 including a network interface or the like, and a drive 1010 that drives a removable medium 1011.
 The reaction motion DB 51, the meta-ized interaction-reaction DB 52, and the personalized interaction-reaction DB 53 are constructed in the storage unit 1008.
 In the computer configured as described above, the CPU 1001 realizes the reaction learning processing unit 101 and the reaction presentation processing unit 151 by, for example, loading the program stored in the storage unit 1008 into the RAM 1003 via the input/output interface 1005 and the bus 1004 and executing it.
 The program executed by the CPU 1001 is provided, for example, recorded on the removable medium 1011 or via a wired or wireless transmission medium such as a local area network, the Internet, or digital broadcasting, and is installed in the storage unit 1008.
 The program executed by the computer may be a program whose processing is performed chronologically in the order described in this specification, or a program whose processing is performed in parallel or at necessary timing, such as when a call is made.
 In this specification, a system means a collection of a plurality of components (devices, modules (parts), and the like), and it does not matter whether all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a single device in which a plurality of modules are housed in one housing, are both systems.
 The effects described in this specification are merely examples and are not limiting, and other effects may exist.
 The embodiments of the present technology are not limited to the embodiments described above, and various modifications can be made without departing from the gist of the present technology.
 For example, the present technology can adopt a cloud computing configuration in which one function is shared and processed jointly by a plurality of devices via a network.
 Each step described in the above flowcharts can be executed by one device or shared and executed by a plurality of devices.
 Furthermore, when one step includes a plurality of processes, the plurality of processes included in that one step can be executed by one device or shared and executed by a plurality of devices.
<Examples of combinations of configurations>
 The present technology can also have the following configurations.
(1)
Based on the measurement results by the sensor including the first person and the second person as measurement targets,
A plurality of previous nodes each representing the first person and the second person at a first time are first motion edges representing a first motion relationship between the first person and the second person. connected,
A plurality of subsequent nodes each representing the first person and the second person at a second time after the first time define a second operational relationship between the first person and the second person. connected by a second motion edge representing
The previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time,
The information processing apparatus includes a generation unit that generates a scene graph in which the previous node of the second person and the subsequent node of the second person are connected by a second temporal relation edge representing the passage of time.
(2)
further comprising a recognition unit that recognizes the first operational relationship and the second operational relationship based on the measurement results,
The generation unit is configured to generate, when a second graph including the plurality of subsequent nodes and the second action edge changes with respect to a first graph including the plurality of previous nodes and the first action edge. The information processing device according to (1) above, which generates the scene graph.
(3)
The recognition unit further recognizes a real space context in which at least one of the first person and the second person is present,
The generation unit generates, based on the context of the real space, an object node representing an object in the real space, and an object representing a relationship between the object and at least one of the first person and the second person. The information processing device according to (2) above, which generates the scene graph including edges.
(4)
The recognition unit further recognizes an attribute of at least one of the first person and the second person,
(2) The generation unit generates the scene graph including an attribute node representing the attribute and an attribute edge representing a relationship between the attribute and at least one of the first person and the second person. Or the information processing device according to (3).
(5)
The generation unit is configured such that in the first graph, the first motion edge represents that the first person performed an interaction motion with respect to the second person, and in the second graph, the first motion edge represents the second motion edge in the second graph. If the motion edge indicates that the second person has performed a motion toward the first person, the motion of the second person toward the first person is detected as a reaction motion. The information processing device according to any one of (4).
(6)
The method further includes a recording control unit that records graph information including a portion of the second graph including a component representing the reaction behavior and a portion of the first graph including a component representing the interaction behavior. 5) The information processing device according to item 5).
(7)
The information processing device according to (6), wherein the recording control unit records the graph information in association with motion information of the reaction action.
(8)
The recording control unit records the graph information as information for selecting an action of the virtual character when at least one of the first person and the second person receives a presentation of the virtual character as a user. The information processing device according to (6) or (7).
(9)
The graph information is information that abstracts and represents the content represented by the component node,
(6) The recording control unit records the graph information as information for selecting an action of the virtual character when a person other than the measurement target receives a presentation of the virtual character as a user. Or the information processing device according to (7).
(10)
The information processing device
Based on the measurement results by the sensor including the first person and the second person as measurement targets,
A plurality of previous nodes each representing the first person and the second person at a first time are first motion edges representing a first motion relationship between the first person and the second person. connected,
A plurality of subsequent nodes each representing the first person and the second person at a second time after the first time define a second operational relationship between the first person and the second person. connected by a second motion edge representing
The previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time,
An information processing method, comprising: generating a scene graph in which the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
(11)
to the computer,
Based on the measurement results by the sensor including the first person and the second person as measurement targets,
A plurality of previous nodes each representing the first person and the second person at a first time are first motion edges representing a first motion relationship between the first person and the second person. connected,
A plurality of subsequent nodes each representing the first person and the second person at a second time after the first time define a second operational relationship between the first person and the second person. connected by a second motion edge representing
The previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time,
A record of a program that executes a process that generates a scene graph in which the previous node of the second person and the subsequent node of the second person are connected by a second temporal relation edge representing the passage of time. Medium.
(12)
a recognition unit that recognizes the user's interaction motion with respect to the virtual character based on the measurement result of the user's motion by the sensor;
a selection unit that selects a reaction action to the interaction action of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets;
a presentation unit that presents the virtual character performing the reaction action to the user;
The known scene graph is
a plurality of destination nodes each representing the first person and the second person at a first time;
a first motion edge connecting the plurality of destination nodes and representing a first motion relationship between the first person and the second person;
a plurality of subsequent nodes each representing the first person and the second person at a second time after the first time;
a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person;
a first temporal relationship edge that connects the previous node of the first person and the subsequent node of the first person and represents the passage of time;
a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time;
Information processing device.
(13)
A past graph in which a plurality of past nodes each representing the user and the virtual character at the past time are connected by a past edge representing the operational relationship between the user and the virtual character, and a past graph representing the user and the virtual character at the current time, respectively. A current graph in which a plurality of current nodes are connected by current edges representing a motion relationship between the user and the virtual character, and between the past node of the user and the current node of the user, and between the past node of the virtual character. further comprising a generation unit that generates a current scene graph including a plurality of time-related edges that connect the current nodes of the virtual character and represent the passage of time;
The information processing device according to (12), wherein the selection unit selects the reaction action based on the known scene graph that includes components common to the current scene graph.
(14)
The selection unit selects the known scene graph from the known scene graph indicating that an interaction movement from the first person corresponding to the user to the second person corresponding to the virtual character corresponds to the interaction movement of the user. Based on the graph at the first time, an action performed by the second person toward the first person in the graph at the second time of the known scene graph is selected as the reaction action. ).
(15)
the first person in the known scene graph is the user;
The information processing device according to (14) above.
(16)
the first person in the known scene graph is a different person from the user;
The information processing device according to (14) above.
(17)
An information processing method comprising, by an information processing device:
recognizing an interaction motion of a user with respect to a virtual character based on a measurement result of the user's motion by a sensor;
selecting a reaction action for the interaction motion of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets; and
presenting the virtual character performing the reaction action to the user,
wherein the known scene graph includes:
a plurality of previous nodes respectively representing the first person and the second person at a first time;
a first motion edge connecting the plurality of previous nodes and representing a first motion relationship between the first person and the second person;
a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time;
a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person;
a first temporal relationship edge connecting the previous node of the first person and the subsequent node of the first person and representing the passage of time; and
a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time.
(18)
A recording medium recording a program for causing a computer to execute processing of:
recognizing an interaction motion of a user with respect to a virtual character based on a measurement result of the user's motion by a sensor;
selecting a reaction action for the interaction motion of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets; and
presenting the virtual character performing the reaction action to the user,
wherein the known scene graph includes:
a plurality of previous nodes respectively representing the first person and the second person at a first time;
a first motion edge connecting the plurality of previous nodes and representing a first motion relationship between the first person and the second person;
a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time;
a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person;
a first temporal relationship edge connecting the previous node of the first person and the subsequent node of the first person and representing the passage of time; and
a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time.
1 AR display device, 11 measurement device, 12 information processing device, 21 measurement device, 22 information processing device, 23 input device, 51 reaction motion DB, 52 meta-ized interaction-reaction DB, 53 personalized interaction-reaction DB, 101 reaction learning processing unit, 111 recognition unit, 112 reaction detection unit, 113 recording control unit, 151 reaction presentation processing unit, 161 recognition unit, 162 matching unit, 163 presentation unit
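
The scene graph recited in items (11) through (18) above can be pictured concretely. The following Python sketch is purely illustrative and non-limiting: it shows one possible in-memory representation of a known scene graph, with person nodes at a first and a second time, motion edges between the persons, and temporal relationship edges linking each person's earlier and later nodes. All class, field, and relation names (PersonNode, MotionEdge, "greets", and so on) are hypothetical and do not appear in this publication.

from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PersonNode:
    person_id: str   # e.g., "first_person" or "second_person"
    time: int        # 1 = first time, 2 = second time


@dataclass(frozen=True)
class MotionEdge:
    source: PersonNode   # person performing the motion
    target: PersonNode   # person toward whom the motion is directed
    relation: str        # motion relationship, e.g., "greets"


@dataclass(frozen=True)
class TemporalEdge:
    earlier: PersonNode  # node of a person at the first time
    later: PersonNode    # node of the same person at the second time


@dataclass
class SceneGraph:
    nodes: List[PersonNode] = field(default_factory=list)
    motion_edges: List[MotionEdge] = field(default_factory=list)
    temporal_edges: List[TemporalEdge] = field(default_factory=list)


def build_known_scene_graph() -> SceneGraph:
    """One known scene graph: the first person greets the second person
    at the first time, and the second person waves back at the second time."""
    a1 = PersonNode("first_person", time=1)
    b1 = PersonNode("second_person", time=1)
    a2 = PersonNode("first_person", time=2)
    b2 = PersonNode("second_person", time=2)
    return SceneGraph(
        nodes=[a1, b1, a2, b2],
        motion_edges=[
            MotionEdge(a1, b1, "greets"),      # first motion edge (interaction)
            MotionEdge(b2, a2, "waves_back"),  # second motion edge (reaction)
        ],
        temporal_edges=[
            TemporalEdge(a1, a2),  # first temporal relationship edge
            TemporalEdge(b1, b2),  # second temporal relationship edge
        ],
    )

Frozen dataclasses are chosen here only so that person nodes can serve as hashable graph keys; the publication does not prescribe any particular data layout.
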

Claims (18)

  1.  An information processing device comprising a generation unit that generates, based on measurement results by a sensor including a first person and a second person as measurement targets, a scene graph in which:
      a plurality of previous nodes respectively representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person;
      a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person;
      the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time; and
      the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
  2.  The information processing device according to claim 1, further comprising a recognition unit that recognizes the first motion relationship and the second motion relationship based on the measurement results,
      wherein the generation unit generates the scene graph when a second graph including the plurality of subsequent nodes and the second motion edge has changed with respect to a first graph including the plurality of previous nodes and the first motion edge.
  3.  The information processing device according to claim 2, wherein the recognition unit further recognizes a context of a real space in which at least one of the first person and the second person is present, and
      the generation unit generates, based on the context of the real space, the scene graph including an object node representing an object in the real space and an object edge representing a relationship between the object and at least one of the first person and the second person.
  4.  The information processing device according to claim 2, wherein the recognition unit further recognizes an attribute of at least one of the first person and the second person, and
      the generation unit generates the scene graph including an attribute node representing the attribute and an attribute edge representing a relationship between the attribute and at least one of the first person and the second person.
  5.  The information processing device according to claim 2, wherein, when the first motion edge in the first graph represents that the first person has performed an interaction motion toward the second person and the second motion edge in the second graph represents that the second person has performed a motion toward the first person, the generation unit detects the motion of the second person toward the first person as a reaction action.
  6.  The information processing device according to claim 5, further comprising a recording control unit that records graph information composed of a portion of the second graph including a component representing the reaction action and a portion of the first graph including a component representing the interaction motion.
  7.  The information processing device according to claim 6, wherein the recording control unit records the graph information in association with motion information of the reaction action.
  8.  The information processing device according to claim 6, wherein the recording control unit records the graph information as information for selecting a motion of a virtual character when at least one of the first person and the second person receives, as a user, presentation of the virtual character.
  9.  The information processing device according to claim 6, wherein the graph information is information representing, in abstracted form, content represented by nodes of its components, and
      the recording control unit records the graph information as information for selecting a motion of a virtual character when a person other than the measurement targets receives, as a user, presentation of the virtual character.
  10.  An information processing method comprising generating, by an information processing device, based on measurement results by a sensor including a first person and a second person as measurement targets, a scene graph in which:
      a plurality of previous nodes respectively representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person;
      a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person;
      the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time; and
      the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
  11.  A recording medium recording a program for causing a computer to execute processing of generating, based on measurement results by a sensor including a first person and a second person as measurement targets, a scene graph in which:
      a plurality of previous nodes respectively representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person;
      a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person;
      the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time; and
      the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
  12.  An information processing device comprising:
      a recognition unit that recognizes an interaction motion of a user with respect to a virtual character based on a measurement result of the user's motion by a sensor;
      a selection unit that selects a reaction action for the interaction motion of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets; and
      a presentation unit that presents the virtual character performing the reaction action to the user,
      wherein the known scene graph includes:
      a plurality of previous nodes respectively representing the first person and the second person at a first time;
      a first motion edge connecting the plurality of previous nodes and representing a first motion relationship between the first person and the second person;
      a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time;
      a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person;
      a first temporal relationship edge connecting the previous node of the first person and the subsequent node of the first person and representing the passage of time; and
      a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time.
  13.  The information processing device according to claim 12, further comprising a generation unit that generates a current scene graph including:
      a past graph in which a plurality of past nodes respectively representing the user and the virtual character at a past time are connected by a past edge representing a motion relationship between the user and the virtual character;
      a current graph in which a plurality of current nodes respectively representing the user and the virtual character at a current time are connected by a current edge representing a motion relationship between the user and the virtual character; and
      a plurality of temporal relationship edges that connect the past node of the user with the current node of the user and the past node of the virtual character with the current node of the virtual character and that represent the passage of time,
      wherein the selection unit selects the reaction action based on the known scene graph including components in common with the current scene graph.
  14.  The information processing device according to claim 13, wherein the selection unit selects, as the reaction action, a motion performed by the second person toward the first person in the graph at the second time of the known scene graph, based on the graph at the first time of the known scene graph indicating that an interaction motion from the first person corresponding to the user to the second person corresponding to the virtual character corresponds to the interaction motion of the user.
  15.  The information processing device according to claim 14, wherein the first person in the known scene graph is the user.
  16.  The information processing device according to claim 14, wherein the first person in the known scene graph is a person different from the user.
  17.  An information processing method comprising, by an information processing device:
      recognizing an interaction motion of a user with respect to a virtual character based on a measurement result of the user's motion by a sensor;
      selecting a reaction action for the interaction motion of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets; and
      presenting the virtual character performing the reaction action to the user,
      wherein the known scene graph includes:
      a plurality of previous nodes respectively representing the first person and the second person at a first time;
      a first motion edge connecting the plurality of previous nodes and representing a first motion relationship between the first person and the second person;
      a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time;
      a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person;
      a first temporal relationship edge connecting the previous node of the first person and the subsequent node of the first person and representing the passage of time; and
      a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time.
  18.  A recording medium recording a program for causing a computer to execute processing of:
      recognizing an interaction motion of a user with respect to a virtual character based on a measurement result of the user's motion by a sensor;
      selecting a reaction action for the interaction motion of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets; and
      presenting the virtual character performing the reaction action to the user,
      wherein the known scene graph includes:
      a plurality of previous nodes respectively representing the first person and the second person at a first time;
      a first motion edge connecting the plurality of previous nodes and representing a first motion relationship between the first person and the second person;
      a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time;
      a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person;
      a first temporal relationship edge connecting the previous node of the first person and the subsequent node of the first person and representing the passage of time; and
      a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time.
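
As a purely illustrative, non-limiting companion to claims 12 to 14, the following self-contained Python sketch shows one way a selection unit could compare the user's recognized interaction motion with the first-time portion of each known scene graph and return the second person's second-time motion as the reaction action. The tuple layout, the function name select_reaction, and the relation labels are hypothetical and are not specified in this publication.

from typing import List, Optional, Tuple

# A motion edge is written here as (source_person, target_person, time, relation).
MotionEdge = Tuple[str, str, int, str]


def select_reaction(known_graphs: List[List[MotionEdge]],
                    user_interaction: str) -> Optional[str]:
    """Returns the second person's second-time motion from the first known
    scene graph whose first-time interaction matches the user's interaction."""
    for edges in known_graphs:
        first_time = [e for e in edges if e[2] == 1]
        second_time = [e for e in edges if e[2] == 2]
        # The first-person -> second-person edge at the first time must
        # correspond to the user's recognized interaction motion (claim 14).
        if any(src == "first_person" and dst == "second_person"
               and rel == user_interaction
               for src, dst, _, rel in first_time):
            # The second person's motion toward the first person at the
            # second time is selected as the reaction action.
            for src, dst, _, rel in second_time:
                if src == "second_person" and dst == "first_person":
                    return rel
    return None


# Usage: a user who greets the virtual character receives the learned reaction.
known = [[("first_person", "second_person", 1, "greets"),
          ("second_person", "first_person", 2, "waves_back")]]
print(select_reaction(known, "greets"))  # -> waves_back
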
PCT/JP2023/022697 2022-07-04 2023-06-20 Information processing device, information processing method, and recording medium WO2024009748A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022107563 2022-07-04
JP2022-107563 2022-07-04

Publications (1)

Publication Number Publication Date
WO2024009748A1 true WO2024009748A1 (en) 2024-01-11

Family

ID=89453299

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/022697 WO2024009748A1 (en) 2022-07-04 2023-06-20 Information processing device, information processing method, and recording medium

Country Status (1)

Country Link
WO (1) WO2024009748A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013210868A (en) * 2012-03-30 2013-10-10 Sony Corp Information processing apparatus, information processing method and computer program
WO2021045082A1 (en) * 2019-09-05 2021-03-11 国立大学法人東京工業大学 Method for expressing action of virtual character on screen
WO2021095704A1 (en) * 2019-11-15 2021-05-20 ソニー株式会社 Information processing device, information processing method, and program
WO2022113439A1 (en) * 2020-11-30 2022-06-02 株式会社日立製作所 Data analysis device and data analysis method

Similar Documents

Publication Publication Date Title
US20230105027A1 (en) Adapting a virtual reality experience for a user based on a mood improvement score
CN109740466B (en) Method for acquiring advertisement putting strategy and computer readable storage medium
CN108876526B (en) Commodity recommendation method and device and computer-readable storage medium
Joho et al. Looking at the viewer: analysing facial activity to detect personal highlights of multimedia contents
JP4736511B2 (en) Information providing method and information providing apparatus
US20190340649A1 (en) Generating and providing augmented reality representations of recommended products based on style compatibility in relation to real-world surroundings
JP6267861B2 (en) Usage measurement techniques and systems for interactive advertising
TW201301177A (en) Selection of advertisements via viewer feedback
EP2960815A1 (en) System and method for dynamically generating contextualised and personalised digital content
TW201301891A (en) Video highlight identification based on environmental sensing
US10642346B2 (en) Action control method and device
CN116484318B (en) Lecture training feedback method, lecture training feedback device and storage medium
JP6783479B1 (en) Video generation program, video generation device and video generation method
EP2874102A2 (en) Generating models for identifying thumbnail images
US11762900B2 (en) Customized selection of video thumbnails to present on social media webpages
Bouzakraoui et al. Appreciation of customer satisfaction through analysis facial expressions and emotions recognition
US11042749B2 (en) Augmented reality mapping systems and related methods
KR102119518B1 (en) Method and system for recommending product based style space created using artificial intelligence
EP2905678A1 (en) Method and system for displaying content to a user
WO2024009748A1 (en) Information processing device, information processing method, and recording medium
US20200098012A1 (en) Recommendation Method and Reality Presenting Device
EP4113413A1 (en) Automatic purchase of digital wish lists content based on user set thresholds
JP2017130170A (en) Conversation interlocking system, conversation interlocking device, conversation interlocking method, and conversation interlocking program
JP6794740B2 (en) Presentation material generation device, presentation material generation system, computer program and presentation material generation method
Ayush Context aware recommendations embedded in augmented viewpoint to retarget consumers in v-commerce

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23835282

Country of ref document: EP

Kind code of ref document: A1