WO2024009748A1 - Information processing device, information processing method, and recording medium - Google Patents

Information processing device, information processing method, and recording medium

Info

Publication number
WO2024009748A1
Authority
WO
WIPO (PCT)
Prior art keywords
person
representing
time
motion
graph
Prior art date
Application number
PCT/JP2023/022697
Other languages
French (fr)
Japanese (ja)
Inventor
Tomoya Ishikawa
Original Assignee
Sony Group Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2024009748A1 publication Critical patent/WO2024009748A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics

Definitions

  • the present technology particularly relates to an information processing device, an information processing method, and a recording medium that can cause a virtual character to perform an appropriate reaction.
  • AR (Augmented Reality)
  • HMD (Head Mounted Display)
  • reactions depend on the relationship between the subject and object of the interaction, the attributes of each person, the surrounding space (object arrangement), and the like.
  • the present technology was developed in view of this situation, and is intended to enable a virtual character to perform an appropriate reaction.
  • An information processing device includes a recognition unit that recognizes an interaction motion of the user with respect to a virtual character based on a measurement result of the user's motion by a sensor, a selection unit that selects a reaction action for the interaction action of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets, and a presentation unit that presents the virtual character performing the reaction action to the user.
  • FIG. 1 is a diagram showing an example of a reaction of a virtual character realized in an AR application to which the present technology is applied.
  • FIG. 2 is a diagram illustrating a series of processing flows in an information processing system according to an embodiment of the present technology.
  • FIG. 3 is a diagram illustrating an example of processing in the reaction learning phase.
  • FIG. 4 is a diagram showing an example of a graph representing spatial information.
  • FIG. 5 is a diagram showing an example of a graph representing person attribute information.
  • FIG. 6 is a diagram showing an example of a graph representing relational information.
  • FIG. 7 is a flowchart explaining reaction detection processing.
  • FIG. 8 is a diagram showing an example of a graph representing recognition results.
  • FIG. 9 is a diagram showing an example of a spatio-temporal scene graph.
  • FIG. 10 is a flowchart illustrating reaction subgraph and motion recording processing.
  • FIG. 11 is a diagram illustrating an example of extraction of a reaction subgraph.
  • FIG. 12 is a diagram illustrating an example of meta-ization of a reaction subgraph.
  • FIG. 13 is a diagram illustrating an example of processing in the reaction presentation phase.
  • FIG. 14 is a diagram illustrating an example of user interaction.
  • FIG. 15 is a diagram showing an example of matching between a current scene graph and a reaction subgraph.
  • FIG. 16 is a diagram showing an example of a reaction of a virtual character.
  • FIG. 17 is a block diagram showing an example of the functional configuration of the information processing device 12.
  • FIG. 18 is a block diagram showing an example of the functional configuration of the information processing device 22.
  • AR applications that apply this technology are applications that are used to coexist and interact with virtual characters.
  • a user using an AR application wears an AR display device 1 such as an HMD and communicates with a virtual character.
  • a virtual character C which is a humanoid virtual character
  • Display of the virtual character C is realized by rendering a 3D model of the virtual character C.
  • a user wearing the optical see-through type AR display device 1 will feel as if the virtual character C is present in the room in which he or she is.
  • the virtual character C shown in color in FIG. 1 indicates that the virtual character C is a virtual object displayed by the AR display device 1.
  • This technology allows a virtual character to efficiently perform appropriate actions as a reaction to actions (interactions) performed on the virtual character by a real person who is a user of an AR application.
  • Various actions are performed as reactions of the virtual character C according to the user's interaction; for example, in response to the user's interaction of sitting on a chair in the room, the virtual character C performs the action of sitting on another chair.
  • a database is created by continuously measuring and recording interactions and reactions between real people.
  • a scene graph is used that represents real people, objects, and attribute information as nodes, and various relationships such as the relationship between interactions and reactions and the positional relationship of real people as edges.
  • by using edges that express not only spatial relationships but also temporal relationships, a scene graph (hereinafter referred to as a spatio-temporal scene graph) that expresses spatio-temporal relationships such as interactions and reactions is generated.
  • a personalized interaction-reaction DB and a meta-ized interaction-reaction DB are generated as DBs using spatiotemporal scene graphs.
  • the personalized interaction-reaction DB is a DB that directly records information such as attribute information recognized in real space.
  • Meta-ized interaction-reaction DB is a DB that records information such as attribute information recognized in real space after abstracting (meta-izing) it.
  • a spatio-temporal scene graph is constructed from scene graphs (spatial scene graphs) that each represent real people, objects, attributes, and the like at a certain time as nodes and their relationships as edges, by connecting the same nodes in the spatial scene graphs at different times with temporal relationship edges.
  • a temporal relationship edge is an edge that represents the passage of a predetermined time.
  • to determine whether an interaction from person A to person B is a reaction, it is only necessary to check whether there was an interaction from person B to person A immediately before. If there was such an interaction immediately before, the interaction from person A to person B is determined to be a reaction. This makes it possible to efficiently access data in the DB.
  • the spatiotemporal scene graph information itself is information that abstracts the real space context. By describing the real-space context using less information and updating the spatio-temporal scene graph only when a semantic change occurs, it is possible to prevent data from becoming bloated.
  • the real space context is a concept that includes the situations and attributes of real space components such as people in real space and objects in real space.
  • FIG. 2 is a diagram illustrating a series of processing flows in an information processing system according to an embodiment of the present technology.
  • a series of processes in the information processing system consists of two processing phases: a "reaction learning phase” and a “reaction presentation phase.”
  • the information processing system includes a reaction learning side configuration and a reaction presentation side configuration.
  • a measurement device 11 and an information processing device 12 are provided as a configuration on the reaction learning side.
  • the measurement device 11 and the information processing device 12 are connected via wired or wireless communication.
  • the measurement device 11 is a sensor device equipped with various sensors such as a color image sensor and a depth sensor.
  • the measurement device 11 is installed in a room where a real person is present.
  • the measurement device 11 is installed in a space where a person A and a person B are present.
  • Person A and Person B to be measured are communicating through conversation and gestures.
  • person A and person B to be measured may be referred to as a first person and a second person.
  • the measurement device 11 measures the person to be measured and outputs measurement data such as a color image and a distance image to the information processing device 12.
  • the color image output by the measurement device 11 shows not only a person performing an interaction action or a reaction action, but also objects such as furniture around the person.
  • the distance image measured by the measurement device 11 represents the distance to a person and the distance to an object such as furniture.
  • the information processing device 12 analyzes the color image supplied from the measurement device 11 as measurement data and performs image recognition or the like to identify an individual.
  • the information processing device 12 recognizes personal attributes such as age, gender, and height for the identified person.
  • the information processing device 12 also analyzes color images and distance images to recognize objects around people, relationships between objects, relationships between people, and the like.
  • the information processing device 12 generates a spatio-temporal scene graph based on the recognition result and records it together with motion information indicating the motion of the person, thereby generating a DB.
  • the information in the DB generated by the information processing device 12 is supplied to the information processing device 22.
  • the DB included in the information processing device 12 records information on spatio-temporal scene graphs acquired using various people as measurement targets.
  • an AR display device 1 a measurement device 21, an information processing device 22, and an input device 23 are provided as a configuration on the reaction presentation side.
  • the measurement device 21 and the information processing device 22, and the information processing device 22 and the input device 23 are each connected via wired or wireless communication.
  • the AR display device 1 worn by the person A who will experience the AR application is also connected to the information processing device 22 via wired or wireless communication.
  • the measurement device 21 like the measurement device 11, is a sensor device equipped with various sensors such as a color image sensor and a depth sensor.
  • the measurement device 21 measures the motion of the user (person A) who is communicating with the virtual character C displayed on the AR display device 1, and outputs measurement data such as a color image and a distance image to the information processing device 22.
  • the information processing device 22 transmits the AR content data to the AR display device 1 and causes the virtual character C to be displayed.
  • AR content data including a 3D model of the virtual character C is input from the input device 23 to the information processing device 22 .
  • the information processing device 22 performs various recognition processes based on the measurement data supplied from the measurement device 21, and generates a spatiotemporal scene graph representing the context around the user.
  • the information processing device 22 searches for a reaction of the virtual character C by comparing the spatio-temporal scene graph representing the user's surrounding context with the spatio-temporal scene graph generated by the processing of the reaction learning phase.
  • the information processing device 22 causes the virtual character C to perform the same movement as the recorded real person's movement found through the search, and presents the user with a reaction according to the user's interaction.
  • the measurement device 11 and the measurement device 21 are devices in different casings, but they may be configured as devices in the same casing. In this case, measurement in the reaction learning phase and measurement in the reaction presentation phase are performed in the same space.
  • the functions of the information processing device 12 and the information processing device 22 may be realized by a server on the Internet. Further, the information processing device 12 and the information processing device 22 may be configured as devices in the same housing.
  • the functions of the measurement device 11 and the information processing device 12 may be realized in one device.
  • the function of the measurement device 21 and the function of the information processing device 22 may be realized in one device.
  • the functions of the information processing device 22 may be installed in the AR display device 1.
  • FIG. 3 is a diagram illustrating an example of processing in the reaction learning phase.
  • the processing of the reaction learning phase includes the following processing: recognition processing (Step 1), reaction detection processing (Step 2), and reaction subgraph & motion recording processing (Step 3).
  • the recognition process is a process of recognizing interactions and reactions between people, attribute information of each person, and spatial information based on measurement data supplied from the measurement device 11.
  • the recognition processing includes spatial recognition processing, person attribute recognition processing, and interaction recognition processing as the processing of Steps 1-1 to 1-3. Processing other than the interaction recognition processing may be omitted.
  • Step 1-1 Spatial recognition processing
  • spatial information is generated based on measurement data supplied from the measurement device 11. For example, the following information is generated as spatial information.
  • Geometric information of each object: shape, size, etc.
  • Semantic information of each object: category (wall, floor, chair, etc.), parts (backrest, doorknob, etc.)
  • Information indicating relationships between objects: distances between objects such as "object A is near object B", positional relationships between objects such as "object A is in front of object B", etc.
  • the techniques described in References 1 and 2 below, for example, can be used for spatial recognition.
  • Reference 1 “[Narita+, IROS2019] G. Narita, T. Seno, T. Ishikawa and Y. Kaji, “PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things”, IROS, 2019.”
  • Reference 2 “[Tahara+, ISMAR2020] T. Tahara, T. Seno, G. Narita and T. Ishikawa, “Retargetable AR: Context-aware Augmented Reality in Indoor Scenes based on 3D Scene Graph”, ISMAR, 2020.”
  • Step 1-2 Person Attribute Recognition Process
  • person attribute information is generated based on the measurement data supplied from the measurement device 11. For example, information such as the age, gender, and height of each person is generated as the person attribute information.
  • Personal attributes can be recognized using a library such as OpenCV, for example.
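  • as an illustrative sketch only (the publication does not specify a concrete pipeline), face regions could first be detected with OpenCV and then passed to separately trained estimators to populate attribute nodes; the Haar cascade below ships with OpenCV, while the age/gender models it alludes to are placeholders:

```python
# Illustrative sketch, not the method of the publication.
import cv2

def detect_faces(bgr_image):
    # Haar cascade bundled with OpenCV; returns (x, y, w, h) rectangles
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Each face region could then be passed to an age/gender estimator (for example
# a DNN loaded with cv2.dnn.readNet) to populate attribute nodes such as Age:60;
# height would instead be derived from the depth data of the measurement device.
```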
  • ⁇ Step 1-3 Interaction recognition processing
  • relational information indicating the interaction between an object and a person and relational information indicating the interaction between the persons are generated. For example, the following information is generated as related information.
  • Object-person: an action in which a person acts on an object, such as person A sitting on a sofa ("person A makes motion V toward object B").
  • Person-person: an action in which a person acts on another person, such as person A waving at person B ("person A makes motion V toward person B").
  • the spatial information generated by the spatial recognition process, the person attribute information generated by the person attribute recognition process, and the relational information generated by the interaction recognition process are acquired as a graph, which is data having a graph structure.
  • FIG. 4 is a diagram showing an example of a graph showing spatial information.
  • when a sofa, a chair, and a table exist in the space to be measured, the graph representing the spatial information is data composed of three nodes (sofa#1, chair#2, table#3) representing these objects, as shown in FIG. 4.
  • a node representing an object may be referred to as an object node.
  • the sofa node and the chair node are connected by an edge E1 labeled "in front of” and an edge E2 labeled "on left of.”
  • Edge E1 represented as an arrow pointing from the sofa node to the chair node, represents that the chair is in front of the sofa.
  • an edge E2 expressed as an arrow pointing from the chair node to the sofa node indicates that the sofa exists on the left side with respect to the front of the chair.
  • the sofa node and the table node, and the chair node and the table node are also connected by edges E3 and E4, respectively, which are set with labels representing their positional relationships.
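  • for illustration, the FIG. 4-style spatial graph could be built with a general-purpose graph library such as networkx; the labels of edges E3 and E4 are not given in the text and are assumed here:

```python
# Minimal sketch of the spatial information graph of FIG. 4 (not from the publication).
import networkx as nx

spatial = nx.MultiDiGraph()
spatial.add_nodes_from(["sofa#1", "chair#2", "table#3"])
spatial.add_edge("sofa#1", "chair#2", label="in front of")  # E1: the chair is in front of the sofa
spatial.add_edge("chair#2", "sofa#1", label="on left of")   # E2: the sofa is on the left of the chair
spatial.add_edge("sofa#1", "table#3", label="near")         # E3: label assumed for illustration
spatial.add_edge("chair#2", "table#3", label="near")        # E4: label assumed for illustration
```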
  • FIG. 5 is a diagram showing an example of a graph showing personal attribute information.
  • when the age of the person to be measured is recognized as 60, the graph showing the person attribute information is composed of a node (Person#1) representing the person to be measured and a node (Age:60) indicating that the person's age is 60, as shown in A of FIG. 5. The node representing the person to be measured and the node representing the age of 60 are connected by an edge E11 labeled "has".
  • a node representing an attribute may be referred to as an attribute node
  • an edge representing a relationship between an attribute and a person may be referred to as an attribute edge.
  • similarly, when the height of the person to be measured is recognized as 1.8 m, the graph showing the person attribute information is composed of a node (Person#1) representing the person to be measured and a node (Height:1.8m) representing that the person's height is 1.8 m.
  • the node representing the person to be measured and the node representing the height of 1.8 m are connected by an edge E12 labeled "has".
  • FIG. 6 is a diagram showing an example of a graph showing relational information.
  • the graph showing the relational information includes a node representing the person to be measured (Person#1), a node representing the table (table#1), and a node representing the chair (Chair#2), as shown in FIG. 6.
  • an edge representing a relationship between a person and an object based on the recognized real space context may be referred to as an object edge.
  • the node representing the person to be measured and the node representing the table are connected by an edge E21 labeled "use”. Further, the node representing the person to be measured and the node representing the chair are connected by an edge E22 labeled "sitting on”.
  • the result of the recognition process in Step 1 is thus obtained as data having a graph structure in which objects, people, and person attributes are expressed as nodes, and relationships among them are expressed as edges.
  • the reaction detection process is a process of detecting a person's reaction based on information acquired by the recognition process.
  • Step 2-1 Detection of semantic change: the detection of a semantic change is a process of detecting whether there is a change in the relationships by comparing the graph representing the recognition result at time t, which is the current time, with the graph representing the recognition result at time t-1, which is the previous measurement time. If it is determined that there is no change in the relationships, the detection of a semantic change is repeated based on information newly acquired by the recognition process.
  • Step 2-2 Generation of spatio-temporal scene graph: if a change is detected between the graph representing the recognition result at time t and the graph representing the recognition result at time t-1, the graph of spatial information, the graph of person attribute information, and the graph of relational information are integrated into one spatio-temporal scene graph. The integration is performed by temporally and spatially connecting the graphs at past measurement times within a threshold time Tth from time t, which is the current time.
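  • a minimal sketch of Steps 2-1 and 2-2 follows, assuming the per-time graphs are networkx graphs and that a "semantic change" is a change in the labelled edge set (both assumptions, for illustration only):

```python
# Illustrative sketch, not the implementation of the publication.
import networkx as nx

def snapshot(spatial, attributes, relations):
    """Merge the three per-time graphs of Step 1 into one spatial scene graph."""
    g = nx.MultiDiGraph()
    for part in (spatial, attributes, relations):
        g.add_nodes_from(part.nodes(data=True))
        g.add_edges_from(part.edges(data=True))
    return g

def edge_set(g):
    return {(u, v, d.get("label")) for u, v, d in g.edges(data=True)}

def append_snapshot(spatio_temporal, history, new_snapshot, dt):
    """Append a time slice only when a semantic change is detected (Steps 2-1 and 2-2)."""
    if history and edge_set(new_snapshot) == edge_set(history[-1][1]):
        return  # no change in relationships: the spatio-temporal scene graph is not updated
    t = history[-1][0] + 1 if history else 0
    for node, data in new_snapshot.nodes(data=True):
        spatio_temporal.add_node((t, node), **data)
    for u, v, data in new_snapshot.edges(data=True):
        spatio_temporal.add_edge((t, u), (t, v), **data)
    if history:
        prev_t = history[-1][0]
        for node in new_snapshot.nodes:
            if (prev_t, node) in spatio_temporal:
                # temporal relationship edge labelled with the elapsed time Δt
                spatio_temporal.add_edge((prev_t, node), (t, node), label=f"dt={dt}")
    history.append((t, new_snapshot))
```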
  • FIG. 8 is a diagram showing an example of a graph representing the recognition results.
  • time t and time t+ ⁇ t may be referred to as first time and second time, respectively.
  • the spatial information graph, the person attribute information graph, and the relationship information graph at each time of time t and time t+ ⁇ t become the graphs shown on the right side of FIG. 8.
  • the graph at the first time and the graph at the second time may be referred to as a first graph and a second graph, respectively.
  • the graph of spatial information at time t is composed of nodes of the floor as an object.
  • the graph of spatial information at time t+ ⁇ t is also composed of nodes of the floor as an object.
  • the floor node at time t and the floor node at time t+ ⁇ t are connected by a temporal relationship edge with a label of time ⁇ t.
  • the graph of person attribute information at time t is composed of a graph in which the node of person A and a node representing age 20 (Age:20) are connected by an edge labeled "has", and a graph in which the node of person B and a node representing age 19 (Age:19) are connected by an edge labeled "has".
  • the graph of person attribute information at time t+ ⁇ t is also composed of a graph with the same structure.
  • an edge representing a motion relationship between people may be referred to as a motion edge.
  • each node at time t and the same node at time t+ ⁇ t are connected by a temporal relationship edge labeled with time ⁇ t.
  • during time Δt, it is assumed that there is no change in the graph of person attribute information.
  • each node at time t may be referred to as a previous node, and each node at time t+ ⁇ t may be referred to as a subsequent node.
  • the graph of relational information at time t connects person A's node and floor node with an edge labeled "stand,” and connects person A's node and person B's node with an edge labeled "wave.”
  • An edge labeled "wave” from a node of person A to a node of person B represents that person A waved his hand toward person B.
  • the graph of relational information at time t also includes an edge labeled "stand" connecting the node of person B and the node of the floor.
  • the graph of the relational information at time t+ ⁇ t includes a graph in which the node of person A and the node of person B are connected by an edge from the node of person B to the node of person A that has the label "wave".
  • An edge from a node of person B to a node of person A with the label "wave” represents that person B waved his hand toward person A.
  • the other graphs at time t+ ⁇ t are the same as the relationship information graph at time t.
  • Each node at time t and the same node at time t+ ⁇ t are connected by a temporal relationship edge labeled with time ⁇ t. In this example, it is determined that there has been a change in the graph of the related information.
  • FIG. 9 is a diagram showing an example of a spatio-temporal scene graph.
  • the upper graph constituting the spatiotemporal scene graph shown in FIG. 9 is a graph that integrates the spatial information, person attribute information, and relational information graphs at time t. Furthermore, the lower graph is a graph that integrates the spatial information, person attribute information, and relational information graphs at time t+ ⁇ t.
  • Each node at time t and the same node at time t+ ⁇ t are connected by temporal relationship edges E31 to E35 labeled with time ⁇ t.
  • the fact that person A waves to person B at time t, and that person B waves to person A at time t+Δt after time Δt has elapsed, can thus be expressed by a spatio-temporal scene graph as shown in FIG. 9.
  • in this way, the spatio-temporal scene graph is a scene graph in which nodes representing the persons measured at the same time are connected by at least edges representing the relationships between the persons, and nodes representing the same person measured at different times are connected by temporal relationship edges, which are edges representing the passage of time.
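  • the FIG. 9 example could be encoded as follows, with nodes keyed by (time index, name) and temporal relationship edges joining the same name across time indices (an illustrative encoding, not from the publication):

```python
# Sketch of the FIG. 9 spatio-temporal scene graph; 0 stands for time t, 1 for time t+Δt.
import networkx as nx

g = nx.MultiDiGraph()
for t in (0, 1):
    g.add_edge((t, "PersonA"), (t, "Age:20"), label="has")
    g.add_edge((t, "PersonB"), (t, "Age:19"), label="has")
    g.add_edge((t, "PersonA"), (t, "Floor"), label="stand")
    g.add_edge((t, "PersonB"), (t, "Floor"), label="stand")
g.add_edge((0, "PersonA"), (0, "PersonB"), label="wave")  # interaction at time t
g.add_edge((1, "PersonB"), (1, "PersonA"), label="wave")  # reaction at time t+Δt

# temporal relationship edges (E31 to E35 in FIG. 9), labelled with the elapsed time
for name in ("PersonA", "PersonB", "Floor", "Age:20", "Age:19"):
    g.add_edge((0, name), (1, name), label="dt")
```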
  • in this example, one spatio-temporal scene graph is generated by integrating the graphs at two times, time t and time t+Δt, but one spatio-temporal scene graph may be generated by integrating graphs at three or more times.
  • the relationship between people is expressed as a relationship between people (S, V, O) that is a combination of "Subject”, “Verb”, and "Object".
  • for example, the action of person B waving at person A is expressed as the interpersonal relationship (person B, wave, person A).
  • the interpersonal relationship (person B, wave, person A) indicates that person B, who is the subject S, performed the action V of waving toward person A, who is the object O.
  • the condition for determining that an interpersonal relationship (S, V, O) is a reaction can be expressed as the occurrence, within a short time in the past such as a few seconds, of an interpersonal relationship (O, V, S) representing the opposite relationship to the interpersonal relationship (S, V, O).
  • the interpersonal relationship (O, V, S) represents a past relationship in which the person who is the object O at the current time acted as the subject and performed an action V toward the person who is the subject S at the current time, with that person as the object.
  • the motion V in the interpersonal relationship (S, V, O) and the motion V in the interpersonal relationship (O, V, S) may be different motions.
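  • this reaction condition can be sketched as a check over recently observed (S, V, O) triples; the window length and triple form below are assumptions for illustration:

```python
# Illustrative sketch only. An interpersonal relationship is treated as a reaction
# when the "opposite" relationship (O, V', S) was observed within a short past
# window; the verbs may differ (waving back, nodding, etc.).
from collections import deque
import time

WINDOW_SEC = 3.0  # "a short time in the past, such as a few seconds" (assumed value)

recent = deque()  # items: (timestamp, subject, verb, object)

def observe(subject, verb, obj, now=None):
    now = time.time() if now is None else now
    recent.append((now, subject, verb, obj))
    while recent and now - recent[0][0] > WINDOW_SEC:
        recent.popleft()

def is_reaction(subject, obj, now=None):
    """True if obj acted on subject within the recent window."""
    now = time.time() if now is None else now
    return any(s == obj and o == subject and now - t <= WINDOW_SEC
               for t, s, _v, o in recent)
```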
  • Step 2-4 Search for the interpersonal relationship (O, V, S) from the spatio-temporal scene graph: targeting interpersonal relationships that occurred within a short time in the past, the interpersonal relationship (O, V, S) corresponding to the interpersonal relationship (S, V, O) at the current time is searched for.
  • the search is performed by tracing back, in the spatio-temporal scene graph, from the node of the person who is the subject S at the current time, and looking for a relationship in which the person who is the object O at the current time acted as the subject with the person who is the subject S at the current time as the object.
  • This process searches for interactions by people.
  • This processing can be realized by tracing the edges of the spatio-temporal scene graph related to the person of the subject S at the current time, so the frequency of access to data can be suppressed and efficient searches can be performed.
  • when the interpersonal relationship (O, V, S) corresponding to the interpersonal relationship (S, V, O) at the current time is found within a short time in the past, the interpersonal relationship (S, V, O) at the current time is determined to be a reaction.
  • for example, when the interpersonal relationship (S, V, O) at the current time t+Δt is (person B, wave, person A), the interpersonal relationship (O, V, S), that is, the relationship (person A, wave, person B) representing that person A waved at person B, is searched for by tracing edges starting from the node of person B at the current time t+Δt.
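  • as a sketch of this edge-tracing search (using the (time index, name) node encoding assumed in the earlier snippets), the inverse interaction can be looked for among the edges incident to the subject's node at the previous time step:

```python
# Illustrative sketch: search backwards from the current subject node for an
# edge representing that the current object acted on the current subject.
def find_inverse_interaction(g, subject, obj, t_now, max_steps=1):
    """Return the time index at which (obj -> subject) occurred, or None."""
    t = t_now
    for _ in range(max_steps):
        t -= 1  # follow one temporal relationship edge back in time
        if (t, subject) not in g:
            return None
        # only edges around the subject's node are inspected, which keeps data
        # access local, as noted in the text above
        for u, _v, data in g.in_edges((t, subject), data=True):
            if u == (t, obj) and data.get("label"):
                return t
    return None
```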
  • if it is determined that the interpersonal relationship (O, V, S) did not exist within the short time in the past, the process returns to Step 2-1 and the detection of a semantic change is repeated.
  • if the interpersonal relationship (O, V, S) is found, the process of Step 3 of FIG. 3 is performed.
  • Step 3 Reaction subgraph & motion recording processing
  • the reaction subgraph & motion recording process is a process of recording, in the DB, information on the person of the subject S during the period from the occurrence time of the interpersonal relationship (O, V, S) searched for as the interaction causing the reaction to the occurrence time of the interpersonal relationship (S, V, O) determined to be the reaction.
  • a subgraph related to the person of the subject S and motion information of the person of the subject S at the time of occurrence of the interpersonal relationship (S, V, O), which is a reaction are recorded.
  • Step 3-1 Extraction of reaction subgraph: from the occurrence time of the interpersonal relationship (S, V, O) determined to be a reaction back to the occurrence time of the interpersonal relationship (O, V, S) searched for as the interaction causing the reaction, a subgraph related to the person of the subject S is extracted as a reaction subgraph. For example, at each time, a subgraph consisting of the nodes within the inter-node distance D from the node of the person of the subject S is extracted from the spatio-temporal scene graph.
  • the inter-node distance D is a threshold value indicating the degree of relationship between the subject S and the person.
  • when the value of the inter-node distance D is, for example, 1, the range from the person node of the subject S to the nodes connected via one edge is extracted as the reaction subgraph.
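  • for illustration, the extraction of the nodes within the inter-node distance D of the subject's node could be written with networkx's ego_graph (node encoding as in the earlier snippets):

```python
# Sketch only: collect the nodes within distance D of the subject at each time.
import networkx as nx

def extract_reaction_subgraph(g, subject, time_indices, D=1):
    nodes = set()
    for t in time_indices:  # the times from the interaction to the reaction
        center = (t, subject)
        if center in g:
            # treat edges as undirected so that neighbours on either side are kept
            nodes |= set(nx.ego_graph(g, center, radius=D, undirected=True).nodes)
    return g.subgraph(nodes).copy()
```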
  • FIG. 11 is a diagram showing an example of extraction of a reaction subgraph.
  • the reaction subgraph is graph information that includes at least a subgraph representing a relationship between people determined to be an interaction, and a subgraph representing a relationship between people determined to be a reaction.
  • Step3-2 Recording of reaction subgraph and motion information Motion information of the person of subject S at the time of occurrence of the interpersonal relationship (S, V, O) determined as a reaction is recorded in the DB in association with the reaction subgraph. Motion information and reaction subgraphs are associated using, for example, a common ID.
  • the motion information of the person of the subject S is recorded in the reaction motion DB 51 (FIG. 3), and the reaction partial graph is recorded in the individualized interaction-reaction DB 53, which is a DB for the person of the subject S.
  • the personalized interaction-reaction DB 53 is a DB that records various reaction subgraphs used to select the actions of the virtual character when the person S himself or herself becomes a user and uses AR content.
  • time-series data of skeletal estimation results (motion capture data) obtained based on measurement data by the measurement device 11 is recorded.
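  • the association by a common ID could look like the following sketch (the storage layout is not described in the publication; dictionaries stand in for the DBs):

```python
# Illustrative sketch only.
import uuid

reaction_motion_db = {}   # stands in for the reaction motion DB 51
personalized_db = {}      # stands in for the personalized interaction-reaction DB 53

def record_reaction(reaction_subgraph, motion_capture_frames):
    record_id = str(uuid.uuid4())                           # common ID shared by both records
    reaction_motion_db[record_id] = motion_capture_frames   # skeletal time-series data
    personalized_db[record_id] = reaction_subgraph
    return record_id
```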
  • Step3-3 Meta-ization of a reaction subgraph A subgraph is generated in which each node or a set of nodes in a reaction subgraph is replaced with a node having higher-order semantic information.
  • FIG. 12 is a diagram showing an example of meta-ization of a reaction subgraph.
  • the left side of FIG. 12 shows the reaction subgraph before meta-ization, and the right side shows the reaction subgraph after meta-ization.
  • a node representing person A as a specific person is replaced with a node representing "Person” which does not limit the person.
  • the node representing that the person's age as a person attribute is 19 years old is replaced with a node representing “teens” representing more abstract information.
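  • the abstraction can be sketched as a relabelling of nodes; the rules below (a generic "Person" node, decade-level age bands) are assumptions chosen to match the FIG. 12 example:

```python
# Illustrative sketch; node labels are treated as plain strings for readability.
import networkx as nx

def metaize_node(label):
    if label.startswith("Person"):                 # e.g. "PersonA" -> generic "Person"
        return "Person"
    if label.startswith("Age:"):
        age = int(label.split(":")[1])
        return "teens" if age < 20 else f"{(age // 10) * 10}s"  # "Age:19" -> "teens"
    return label

def metaize_subgraph(reaction_subgraph):
    mapping = {n: metaize_node(n) for n in reaction_subgraph.nodes}
    return nx.relabel_nodes(reaction_subgraph, mapping)
```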
  • Step 3-4 Recording of Meta-ized Reaction Subgraph and Motion Information
  • the meta-ized reaction subgraph generated by meta-izing the reaction subgraph is recorded in the DB in association with motion information using, for example, a common ID.
  • the meta-ized reaction subgraph is recorded in the meta-ized interaction-reaction DB 52.
  • the meta-ized interaction-reaction DB 52 is, for example, a DB that records various reaction subgraphs used to select the actions of a virtual character when a person different from the person S becomes a user and uses AR content.
  • the reaction subgraph recorded in the meta-ized interaction-reaction DB 52 may be used when the person S himself becomes a user and uses the AR content.
  • the reaction subgraph & motion recording process in Step 3 adds scene graph information that abstractly and semantically expresses the context of each time in real space. Regardless of the length or resolution of the measurement, scene graph information is recorded in Step 3 only when there is a change in the meaning of the measurement data, which makes it possible to access the data more efficiently than when all information is recorded.
  • FIG. 13 is a diagram illustrating an example of processing in the reaction presentation phase.
  • the processing of the reaction presentation phase includes recognition processing (Step 11), matching processing of the current scene graph with a subgraph in the DB (Step 12), and reaction presentation processing (Step 13).
  • the recognition process in the reaction presentation phase is basically the same process as the recognition process (Step 1 in FIG. 3) in the reaction learning phase.
  • the recognition process in the reaction presentation phase differs from the recognition process in the reaction learning phase in that the user (person A) who is experiencing the AR application and the virtual character C are the measurement targets.
  • the recognition process in the reaction presentation phase is a process of recognizing the interaction between the user and the virtual character C, user attribute information, and spatial information based on the measurement data supplied from the measurement device 21. Duplicate explanations will be omitted as appropriate.
  • Step 11-1 Spatial recognition processing
  • spatial information is generated based on the measurement data supplied from the measurement device 21.
  • spatial information geometric information of each object, semantic information of each object, and information indicating relationships between objects are generated.
  • Step 11-3 Interaction recognition processing
  • relational information between an object and the user, between an object and the virtual character C, and between the user and the virtual character C is generated based on the measurement data supplied from the measurement device 21.
  • the information on the attributes, position, and movements (interactions and reactions) of the virtual character C is information that can be acquired by the information processing device 22 itself that is displaying the virtual character C by playing back the AR content.
  • Information on the recognition results regarding the virtual character C is generated based on the virtual character C's attributes, position, motion, etc. as appropriate.
  • a current scene graph which is a spatiotemporal scene graph representing the context around the user who is presented with the virtual character, is generated based on the spatial information, person attribute information, and relationship information acquired through the recognition process.
  • the current scene graph consists of a graph in which nodes representing the user and the virtual character are connected by edges representing their relationship at a predetermined time, and a graph in which nodes representing the user and the virtual character are connected by edges representing their relationship at the current time after the predetermined time has elapsed.
  • a predetermined time regarding the current scene graph may be referred to as past time.
  • the current scene graph is considered to include a past graph that is a graph corresponding to the past time, a current graph that is a graph that corresponds to the current time, and a temporal relationship edge that connects the past node of the past graph and the current node of the current graph. Also good.
  • the current scene graph generated based on the information obtained through recognition processing is matched with each reaction subgraph in the DB. More specifically, matching the current scene graph and the reaction subgraph includes determining whether the person in the reaction subgraph corresponding to the user is the user himself or not. Note that in this disclosure, a known graph in a DB may be referred to as a known scene graph.
  • if the reaction learning phase has been performed with the user experiencing the AR application as a measurement target and a reaction subgraph for the user is recorded in the personalized interaction-reaction DB 53, the current scene graph is matched against the reaction subgraph for the user recorded in the personalized interaction-reaction DB 53.
  • if a reaction subgraph for the user is not recorded in the personalized interaction-reaction DB 53, the current scene graph is matched against the meta-ized reaction subgraphs recorded in the meta-ized interaction-reaction DB 52.
  • the current scene graph and each reaction subgraph are matched by calculating the distance between them and evaluating the extent to which their constituent elements are common. For example, a reaction subgraph that includes common components and has the closest distance to the current scene graph is used to select an action that becomes a reaction.
  • a reaction subgraph closer than the distance equal to the threshold value Eth may be used to select an action that becomes a reaction.
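  • one simple way to realize this matching is sketched below; the distance used here (symmetric difference of labelled edges) and the threshold value Eth are assumptions standing in for whatever graph distance the system actually uses:

```python
# Illustrative sketch: score stored reaction subgraphs against the current scene graph.
def labelled_edges(g):
    return {(u, v, d.get("label")) for u, v, d in g.edges(data=True)}

def distance(current, candidate):
    # stand-in distance: size of the symmetric difference of labelled edges
    return len(labelled_edges(current) ^ labelled_edges(candidate))

def select_reaction(current_graph, reaction_db, eth=5.0):
    """reaction_db maps a common ID to a reaction subgraph (see the recording sketch)."""
    best_id, best_dist = None, float("inf")
    for record_id, candidate in reaction_db.items():
        common = labelled_edges(current_graph) & labelled_edges(candidate)
        if not common:
            continue  # no common components: skip this candidate
        d = distance(current_graph, candidate)
        if d < min(best_dist, eth):  # eth stands in for the threshold Eth
            best_id, best_dist = record_id, d
    return best_id  # used to look up motion information in the reaction motion DB
```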
  • FIG. 14 is a diagram showing an example of user interaction.
  • FIG. 15 is a diagram showing an example of matching the current scene graph and the reaction subgraph.
  • the current scene graph includes a subgraph representing the relationship between people (person A, wave, virtual character C) at time t.
  • the interpersonal relationship (person B, wave, person A) extracted from the reaction subgraph is applied as the interpersonal relationship between the virtual character C and person A (the user) at the current time t+Δt, which constitutes the current scene graph.
  • An edge E51 connecting the node of the virtual character C and the node of the person A at the current time t+ ⁇ t represents the relationship between the people (virtual character C, wave, person A).
  • the graph at time t+ ⁇ t includes a component representing that the person corresponding to the virtual character C performed a certain action toward the person A, and the action is an interaction between the person A and the person corresponding to the virtual character.
  • a reaction partial graph including a component representing that the process has been performed in the graph at time t is acquired from the personalized interaction-reaction DB 53 as an evaluation result with the current scene graph.
  • the person corresponding to the virtual character C is a person who, together with the person A, is a measurement target in the real space in the process of the reaction learning phase.
  • the same action as the action performed by the person corresponding to the virtual character C toward the person A is selected as the reaction action of the virtual character C.
  • the reaction presentation process is a process of reading reaction motion information corresponding to the matched reaction subgraph from the reaction motion DB 51 and presenting it as a motion of the virtual character C.
  • by causing the virtual character C to perform the same action as the action indicated by the motion information read from the reaction motion DB 51, the user is presented with a reaction to the interaction the user has performed on the virtual character C, as shown in FIG. 16.
  • a virtual character C is shown waving toward the user.
  • FIG. 17 is a block diagram showing an example of the functional configuration of the information processing device 12. At least some of the functional units shown in FIG. 17 are realized by the CPU of the computer constituting the information processing device 12 executing a predetermined program.
  • the reaction learning processing section 101 includes a recognition section 111, a reaction detection section 112, and a recording control section 113.
  • the recognition unit 111 includes a space recognition unit 121, a person attribute recognition unit 122, and an interaction recognition unit 123.
  • the spatial recognition unit 121 performs spatial recognition processing (Step 1-1 in FIG. 3) based on the measurement data supplied from the measurement device 11, and generates spatial information.
  • the person attribute recognition unit 122 performs person attribute recognition processing (Step 1-2 in FIG. 3) based on the measurement data supplied from the measurement device 11, and generates person attribute information.
  • the interaction recognition unit 123 performs interaction recognition processing (Step 1-3 in FIG. 3) based on the measurement data supplied from the measurement device 11, and generates related information.
  • the spatial information generated by the spatial recognition unit 121, the person attribute information generated by the person attribute recognition unit 122, and the relationship information generated by the interaction recognition unit 123 are supplied to the reaction detection unit 112.
  • the reaction detection unit 112 performs reaction detection processing (Step 2 in FIG. 3, FIG. 7) based on the information supplied from each part of the recognition unit 111.
  • the reaction detection unit 112 includes a spatiotemporal scene graph generation unit 112A.
  • the spatio-temporal scene graph generation unit 112A generates a spatio-temporal scene graph based on the information supplied from the recognition unit 111.
  • the reaction detection unit 112 outputs information on a spatio-temporal scene graph including a graph representing the interpersonal relationship determined to be a reaction to the recording control unit 113.
  • the recording control unit 113 performs the reaction subgraph & motion recording processing (Step 3 in FIG. 3, FIG. 10) based on the information supplied from the reaction detection unit 112, and records the motion information of the motion determined to be a reaction in the reaction motion DB 51.
  • the recording control unit 113 causes the reaction subgraph extracted from the spatiotemporal scene graph supplied from the reaction detection unit 112 to be recorded in the individualized interaction-reaction DB 53.
  • the recording control unit 113 meta-izes the reaction subgraph extracted from the spatio-temporal scene graph supplied from the reaction detection unit 112, and records it in the meta-ized interaction-reaction DB 52.
  • the reaction motion DB 51, meta-ized interaction-reaction DB 52, and personalized interaction-reaction DB 53 are constructed in a storage unit such as an HDD of a computer that constitutes the information processing device 12. Information in the reaction motion DB 51, meta-ized interaction-reaction DB 52, and personalized interaction-reaction DB 53 is provided to the information processing device 22.
  • FIG. 18 is a block diagram showing an example of the functional configuration of the information processing device 22. At least some of the functional units shown in FIG. 18 are realized by the CPU of the computer constituting the information processing device 22 executing a predetermined program.
  • a reaction presentation processing section 151 is implemented in the information processing device 22.
  • the reaction presentation processing section 151 includes a recognition section 161, a matching section 162, and a presentation section 163.
  • the recognition unit 161 includes a space recognition unit 171, a person attribute recognition unit 172, and an interaction recognition unit 173.
  • the spatial recognition unit 171 performs spatial recognition processing (Step 11-1 in FIG. 13) based on the measurement data supplied from the measurement device 21, and generates spatial information.
  • the person attribute recognition unit 172 performs person attribute recognition processing (Step 11-2 in FIG. 13) based on the measurement data supplied from the measurement device 21, and generates person attribute information.
  • the spatial information generated by the spatial recognition unit 171, the person attribute information generated by the person attribute recognition unit 172, and the relationship information generated by the interaction recognition unit 173 are supplied to the matching unit 162.
  • the matching unit 162 performs matching processing (Step 12 in FIG. 13, FIG. 7) between the current scene graph and the reaction partial graph in the DB based on the information supplied from each part of the recognition unit 161.
  • the collation unit 162 includes a spatio-temporal scene graph generation unit 162A. Based on the information supplied from the recognition unit 161, the spatio-temporal scene graph generation unit 162A generates a current scene graph, which is a spatio-temporal scene graph including a user node and a virtual character node as constituent elements.
  • the matching unit 162 matches the current scene graph with each reaction partial graph recorded in the meta-ized interaction-reaction DB 52 or the individualized interaction-reaction DB 53.
  • the matching unit 162 selects an action that is a reaction of the virtual character based on a reaction subgraph that matches the current scene graph.
  • the matching unit 162 functions as a selection unit that selects an action that is a reaction of the virtual character based on a current scene graph that is a spatiotemporal scene graph and a reaction subgraph. Information on the action selected by the matching unit 162 as the reaction is supplied to the presenting unit 163 .
  • the presentation unit 163 generates data for displaying the virtual character C by playing the AR content and performing rendering.
  • the presentation unit 163 transmits display data to the AR display device 1 and causes the AR display device 1 to display the virtual character C.
  • the presenting unit 163 reads out the motion information of the reaction selected by the matching unit 162 from the reaction motion DB 51 and causes it to be presented as the motion of the virtual character C.
  • FIG. 19 is a block diagram showing a configuration example of an information processing system.
  • in the above description, the reaction learning processing section 101 and the reaction presentation processing section 151 are realized in different devices, but as shown in A of FIG. 19, they may be realized in the information processing device 201, which is a single device.
  • the reaction learning phase is processed in the reaction learning processing section 101
  • the reaction presentation phase is processed in the reaction presentation processing section 151.
  • the reaction presentation processing unit 151 of the information processing device 201 performs processing of the reaction presentation phase and displays the virtual character C on the AR display device 1.
  • reaction learning processing section 101 and the reaction presentation processing section 151 may be implemented in the AR display device 1.
  • the reaction presentation processing unit 151 of the AR display device 1 performs processing of the reaction presentation phase and displays the virtual character C on the display unit 211.
  • the display unit 211 includes a display that displays the virtual character C, and the like.
  • a video transmission type HMD may be used as a display device for the virtual character.
  • a mobile terminal such as a smartphone or a tablet terminal may be used as a display device for the virtual character.
  • the actions performed by the virtual character include not only actions performed by the virtual character alone, such as walking and running, but also actions using objects in real space, such as sitting on a chair or on the floor, and actions performed toward the user, such as talking to the user.
  • the series of processes described above can be executed by hardware or software.
  • a program constituting the software is installed from a program recording medium into a computer built into dedicated hardware or a general-purpose personal computer.
  • FIG. 20 is a block diagram showing an example of the hardware configuration of a computer that executes the series of processes described above using a program.
  • the computers functioning as the information processing device 12 and the information processing device 22 have a configuration similar to that shown in FIG. 20.
  • a CPU (Central Processing Unit) 1001, a ROM (Read Only Memory) 1002, and a RAM (Random Access Memory) 1003 are interconnected by a bus 1004.
  • An input/output interface 1005 is further connected to the bus 1004. Connected to the input/output interface 1005 are an input section 1006 consisting of a keyboard, a mouse, etc., and an output section 1007 consisting of a display, speakers, etc. Further, connected to the input/output interface 1005 are a storage unit 1008 consisting of a hard disk or non-volatile memory, a communication unit 1009 consisting of a network interface, etc., and a drive 1010 for driving a removable medium 1011.
  • a reaction motion DB 51, a meta-ized interaction-reaction DB 52, and a personalized interaction-reaction DB 53 are constructed in the storage unit 1008.
  • the CPU 1001, for example, loads the program stored in the storage unit 1008 into the RAM 1003 via the input/output interface 1005 and the bus 1004 and executes it, whereby the reaction learning processing unit 101 or the reaction presentation processing unit 151 described above is realized.
  • a program executed by the CPU 1001 is installed in the storage unit 1008 by being recorded on a removable medium 1011 or provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital broadcasting.
  • the program executed by the computer may be a program in which the processes are performed chronologically in the order described in this specification, or may be a program in which the processes are performed in parallel or at necessary timings such as when a call is made.
  • a system means a collection of multiple components (devices, modules (parts), etc.), regardless of whether all the components are in the same casing. Therefore, multiple devices housed in separate casings and connected via a network, and a single device with multiple modules housed in one casing, are both systems.
  • the present technology can take a cloud computing configuration in which one function is shared and jointly processed by multiple devices via a network.
  • each step described in the above flowchart can be executed by one device or can be shared and executed by multiple devices.
  • one step includes multiple processes
  • the multiple processes included in that one step can be executed by one device or can be shared and executed by multiple devices.
  • the present technology can also have the following configuration.
  • (1) An information processing device including a generation unit that generates, based on measurement results by a sensor including a first person and a second person as measurement targets, a scene graph in which a plurality of previous nodes respectively representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person, a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person, the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time, and the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
  • (2) The information processing device according to (1), further including a recognition unit that recognizes the first motion relationship and the second motion relationship based on the measurement results, in which the generation unit generates the scene graph when a second graph including the plurality of subsequent nodes and the second motion edge changes with respect to a first graph including the plurality of previous nodes and the first motion edge.
  • (3) The information processing device according to (2), in which the recognition unit further recognizes a context of a real space in which at least one of the first person and the second person is present, and the generation unit generates, based on the context of the real space, the scene graph including an object node representing an object in the real space and an object edge representing a relationship between the object and at least one of the first person and the second person.
  • (4) The information processing device according to (2) or (3), in which the recognition unit further recognizes an attribute of at least one of the first person and the second person, and the generation unit generates the scene graph including an attribute node representing the attribute and an attribute edge representing a relationship between the attribute and at least one of the first person and the second person.
  • (5) The information processing device according to any one of (2) to (4), in which the motion of the second person toward the first person is detected as a reaction motion when the first motion edge in the first graph represents that the first person performed an interaction motion with respect to the second person and the second motion edge in the second graph represents that the second person performed a motion toward the first person.
  • (6) The information processing device according to (5), further including a recording control unit that records graph information including a portion of the second graph including a component representing the reaction motion and a portion of the first graph including a component representing the interaction motion.
  • (7) The information processing device according to (6), in which the recording control unit records the graph information in association with motion information of the reaction motion.
  • (8) The information processing device according to (6) or (7), in which the recording control unit records the graph information as information for selecting an action of a virtual character when at least one of the first person and the second person receives a presentation of the virtual character as a user.
  • (9) The information processing device according to (6) or (7), in which the graph information is information that abstracts and represents the content represented by the nodes of the components, and the recording control unit records the graph information as information for selecting an action of a virtual character when a person other than the measurement targets receives a presentation of the virtual character as a user.
  • (10) An information processing method including generating, based on measurement results by a sensor including a first person and a second person as measurement targets, a scene graph in which a plurality of previous nodes respectively representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person, a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person, the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time, and the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
  • (11) A recording medium on which is recorded a program for causing a computer to execute processing of generating, based on measurement results by a sensor including a first person and a second person as measurement targets, a scene graph in which a plurality of previous nodes respectively representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person, a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person, the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time, and the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
  • the known scene graph includes: a plurality of previous nodes respectively representing the first person and the second person at a first time; a first motion edge connecting the plurality of previous nodes and representing a first motion relationship between the first person and the second person; a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time; a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person; a first temporal relationship edge connecting the previous node of the first person and the subsequent node of the first person and representing the passage of time; and a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time.
  • the information processing device further comprising a generation unit that generates a current scene graph including a plurality of temporal relationship edges that connect the current nodes of the virtual character and represent the passage of time;
  • the selection unit selects the reaction action based on the known scene graph that includes components common to the current scene graph.
  • the selection unit selects, as the reaction action, an action performed by the second person toward the first person in the graph at the second time of the known scene graph, based on the graph at the first time of the known scene graph indicating that an interaction motion from the first person corresponding to the user toward the second person corresponding to the virtual character corresponds to the interaction motion of the user.
  • the first person in the known scene graph is the user; The information processing device according to (14) above.
  • the first person in the known scene graph is a different person from the user;
  • An information processing method comprising, by an information processing device: recognizing the user's interaction motion with respect to the virtual character based on the measurement result of the user's motion by the sensor; selecting a reaction action to the interaction action of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets; and presenting the virtual character performing the reaction action to the user;
  • wherein the known scene graph is a graph in which a plurality of previous nodes each representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person, a plurality of subsequent nodes each representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person, the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time, and the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure relates to an information processing device, an information processing method, and a recording medium that make it possible to cause a virtual character to execute a suitable reaction. An information processing device according to an embodiment of the present technology: recognizes an action constituting an interaction, by a user, with a virtual character; selects an action constituting a reaction to the interaction by the user on the basis of graph information representing a scenegraph that is generated on the basis of measurement results from a plurality of persons, the scenegraph being such that nodes representing each of the persons being measured at the same time are connected by edges representing the relationships between the persons, and such that nodes representing the same person being measured at different times are connected by time-elapse relationship edges representing the elapse of time; and presents, to the user, a virtual character that performs the action constituting the reaction. The present technology can be applied to an application that presents a virtual character using a head-mounted display (HMD).

Description

Information processing device, information processing method, and recording medium
 本技術は、特に、適切なリアクションを仮想キャラクタに実行させることができるようにした情報処理装置、情報処理方法、および記録媒体に関する。 The present technology particularly relates to an information processing device, an information processing method, and a recording medium that can cause a virtual character to perform an appropriate reaction.
 仮想キャラクタとコミュニケーションをとることができるAR(Augmented Reality)アプリケーションがある。ユーザは、HMD(Head Mounted Display)などのAR表示デバイスを装着することにより、実際の空間に重ねて表示される仮想キャラクタとコミュニケーションをとることができる。 There is an AR (Augmented Reality) application that allows you to communicate with virtual characters. By wearing an AR display device such as an HMD (Head Mounted Display), a user can communicate with a virtual character superimposed on the real space.
 仮想キャラクタに対して例えばユーザが手を振った場合、そのことが各種のセンサによる計測結果に基づいて検出され、仮想キャラクタの動作として、ユーザに対して手を振り返すなどの動作が実行される。このようなコミュニケーションは、あるインタラクションをユーザがとった場合に、それに応じたリアクションを仮想キャラクタに実行させることによって実現される。 For example, when a user waves at a virtual character, this is detected based on measurement results from various sensors, and the virtual character performs an action such as waving back at the user. . Such communication is realized by having a virtual character perform a corresponding reaction when a user makes a certain interaction.
International Publication No. 2020/095368
International Publication No. 2020/217727
 実在の人物間のコミュニケーションにおいても、実在の人物と仮想キャラクタとの間のコミュニケーションにおいても、あるインタラクションに対するリアクションの候補となる動作は複数考えられる。また、リアクションは、インタラクションの主体と客体との関係、各人物の属性、周辺の空間(物体配置)などに依存する。 In both communication between real people and between a real person and a virtual character, there can be multiple actions that can be reaction candidates for a certain interaction. In addition, reactions depend on the relationship between the subject and object of the interaction, the attributes of each person, the surrounding space (object arrangement), and the like.
 そのため、仮想キャラクタのリアクションについてのルールを人手で作成することは困難である。ルールを仮に人手で作成したとしても、適切なリアクションを仮想キャラクタに実行させることができない可能性が高い。 Therefore, it is difficult to manually create rules regarding the reactions of virtual characters. Even if the rules were created manually, there is a high possibility that the virtual character would not be able to perform an appropriate reaction.
 本技術はこのような状況に鑑みてなされたものであり、適切なリアクションを仮想キャラクタに実行させることができるようにするものである。 The present technology was developed in view of this situation, and is intended to enable a virtual character to perform an appropriate reaction.
 本技術の一側面の情報処理装置は、第1の人物と第2の人物を計測対象として含むセンサによる計測結果に基づいて、第1の時刻における前記第1の人物と前記第2の人物をそれぞれ表す複数の先ノードが前記第1の人物と前記第2の人物間の第1の動作関係を表す第1の動作エッジで接続され、前記第1の時刻の後の第2の時刻における前記第1の人物と前記第2の人物をそれぞれ表す複数の後ノードが前記第1の人物と前記第2の人物間の第2の動作関係を表す第2の動作エッジで接続され、前記第1の人物の前記先ノードと前記第1の人物の前記後ノードが、時間の経過を表す第1の経時関係エッジで接続され、前記第2の人物の前記先ノードと前記第2の人物の前記後ノードが、時間の経過を表す第2の経時関係エッジで接続されたシーングラフを生成する生成部を備える。 An information processing device according to an aspect of the present technology may measure the first person and the second person at a first time based on measurement results by a sensor that includes the first person and the second person as measurement targets. A plurality of previous nodes each representing a first action edge representing a first action relationship between the first person and the second person are connected by a first action edge representing a first action relationship between the first person and the second person, and a plurality of posterior nodes representing the first person and the second person, respectively, are connected by a second movement edge representing a second movement relationship between the first person and the second person; The previous node of the person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time, and the previous node of the second person and the subsequent node of the second person are connected by a first temporal relationship edge representing the passage of time. The subsequent node includes a generation unit that generates a scene graph connected by a second temporal relationship edge representing the passage of time.
 本技術の他の側面の情報処理装置は、センサによるユーザの動作の計測結果に基づいて、仮想キャラクタに対する前記ユーザのインタラクション動作を認識する認識部と、第1の人物と第2の人物を計測対象として含むセンサによる計測結果に基づいて生成された既知シーングラフに基づいて、前記ユーザの前記インタラクション動作に対するリアクション動作を選択する選択部と、前記リアクション動作を行う前記仮想キャラクタを前記ユーザに提示する提示部とを備える。 An information processing device according to another aspect of the present technology includes a recognition unit that recognizes an interaction motion of the user with respect to a virtual character based on a measurement result of the user's motion by a sensor, and a recognition unit that measures a first person and a second person. a selection unit that selects a reaction action for the interaction action of the user based on a known scene graph generated based on measurement results by a sensor included as an object; and a selection unit that presents the virtual character that performs the reaction action to the user. and a presentation section.
 本技術の一側面においては、第1の人物と第2の人物を計測対象として含むセンサによる計測結果に基づいて、第1の時刻における前記第1の人物と前記第2の人物をそれぞれ表す複数の先ノードが前記第1の人物と前記第2の人物間の第1の動作関係を表す第1の動作エッジで接続され、前記第1の時刻の後の第2の時刻における前記第1の人物と前記第2の人物をそれぞれ表す複数の後ノードが前記第1の人物と前記第2の人物間の第2の動作関係を表す第2の動作エッジで接続され、前記第1の人物の前記先ノードと前記第1の人物の前記後ノードが、時間の経過を表す第1の経時関係エッジで接続され、前記第2の人物の前記先ノードと前記第2の人物の前記後ノードが、時間の経過を表す第2の経時関係エッジで接続されたシーングラフが生成される。 In one aspect of the present technology, based on measurement results by a sensor that includes a first person and a second person as measurement targets, a plurality of people representing the first person and the second person at a first time, respectively. are connected by a first motion edge representing a first motion relationship between the first person and the second person, and the first motion edge at a second time after the first time A plurality of posterior nodes representing a person and the second person, respectively, are connected by a second movement edge representing a second movement relationship between the first person and the second person, and a second movement edge represents a second movement relationship between the first person and the second person, The previous node and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time, and the previous node of the second person and the subsequent node of the second person are connected. , a scene graph connected by a second temporal relationship edge representing the passage of time is generated.
 本技術の他の側面においては、センサによるユーザの動作の計測結果に基づいて、仮想キャラクタに対する前記ユーザのインタラクション動作が認識され、第1の人物と第2の人物を計測対象として含むセンサによる計測結果に基づいて生成された既知シーングラフに基づいて、前記ユーザの前記インタラクション動作に対するリアクション動作が選択され、前記リアクション動作を行う前記仮想キャラクタが前記ユーザに提示される。 In another aspect of the present technology, the interaction motion of the user with respect to the virtual character is recognized based on the measurement result of the user's motion by the sensor, and the sensor includes the first person and the second person as measurement targets. Based on the known scene graph generated based on the results, a reaction action to the interaction action of the user is selected, and the virtual character performing the reaction action is presented to the user.
FIG. 1 is a diagram showing an example of a reaction of a virtual character realized in an AR application to which the present technology is applied.
FIG. 2 is a diagram illustrating a series of processing flows in an information processing system according to an embodiment of the present technology.
FIG. 3 is a diagram illustrating an example of processing in the reaction learning phase.
FIG. 4 is a diagram showing an example of a graph showing spatial information.
FIG. 5 is a diagram showing an example of a graph showing person attribute information.
FIG. 6 is a diagram showing an example of a graph showing relational information.
FIG. 7 is a flowchart explaining reaction detection processing.
FIG. 8 is a diagram showing an example of a graph representing recognition results.
FIG. 9 is a diagram showing an example of a spatio-temporal scene graph.
FIG. 10 is a flowchart explaining reaction subgraph & motion recording processing.
FIG. 11 is a diagram showing an example of extraction of a reaction subgraph.
FIG. 12 is a diagram showing an example of meta-ization of a reaction subgraph.
FIG. 13 is a diagram illustrating an example of processing in the reaction presentation phase.
FIG. 14 is a diagram showing an example of a user interaction.
FIG. 15 is a diagram showing an example of matching between a current scene graph and a reaction subgraph.
FIG. 16 is a diagram showing an example of a reaction of a virtual character.
FIG. 17 is a block diagram showing an example of the functional configuration of the information processing device 12.
FIG. 18 is a block diagram showing an example of the functional configuration of the information processing device 22.
FIG. 19 is a block diagram showing a configuration example of the information processing system.
FIG. 20 is a block diagram showing an example of the configuration of a computer.
Hereinafter, modes for implementing the present technology will be described. The explanation will be given in the following order.
1. Overview of this technology
2. Flow of processing in the information processing system
3. Reaction learning phase
4. Reaction presentation phase
5. Configuration of each device
6. Modification examples
<<Overview of this technology>>
FIG. 1 is a diagram showing an example of a reaction of a virtual character realized in an AR application to which the present technology is applied.
 本技術を適用したARアプリケーションは、仮想キャラクタとの共生や対話などを行うことに用いられるアプリケーションである。図1に示すように、ARアプリケーションを利用するユーザは、HMDなどのAR表示デバイス1を装着し、仮想キャラクタとコミュニケーションをとることになる。 AR applications that apply this technology are applications that are used to coexist and interact with virtual characters. As shown in FIG. 1, a user using an AR application wears an AR display device 1 such as an HMD and communicates with a virtual character.
 図1の例においては、人型の仮想的なキャラクタである仮想キャラクタCがAR表示デバイス1によって表示されている。仮想キャラクタCの表示は、仮想キャラクタCの3Dモデルをレンダリングすることによって実現される。例えば光学シースルー型のAR表示デバイス1を装着しているユーザは、自分がいる部屋に仮想キャラクタCがいるかのような感覚を得ることになる。図1において仮想キャラクタCに色を付して示していることは、仮想キャラクタCがAR表示デバイス1によって表示される仮想的なオブジェクトであることを示している。 In the example of FIG. 1, a virtual character C, which is a humanoid virtual character, is displayed by the AR display device 1. Display of the virtual character C is realized by rendering a 3D model of the virtual character C. For example, a user wearing the optical see-through type AR display device 1 will feel as if the virtual character C is present in the room in which he or she is. The virtual character C shown in color in FIG. 1 indicates that the virtual character C is a virtual object displayed by the AR display device 1.
 なお、AR表示デバイス1として用いられるHMDが光学シースルー型のHMDではなく、ビデオシースルー型のHMDであってもよい。また、ユーザの部屋などの実空間上に仮想キャラクタCが表示されるのではなく、3次元の仮想的な空間である仮想空間上に仮想キャラクタCが表示されるようにしてもよい。 Note that the HMD used as the AR display device 1 may be a video see-through type HMD instead of an optical see-through type HMD. Furthermore, instead of displaying the virtual character C in a real space such as a user's room, the virtual character C may be displayed in a virtual space that is a three-dimensional virtual space.
In this state, when the user waves his or her hand at the virtual character C as shown in the upper part of FIG. 1, this is detected based on measurement results from various sensors. In response to the user waving at the virtual character C, the virtual character C performs an action such as waving back at the user, as shown in the lower part of FIG. 1.
 本技術は、ARアプリケーションのユーザである実在人物が仮想キャラクタに対して行った動作(インタラクション)に対して、リアクションとなる適切な動作を、仮想キャラクタに効率的に実行させるものである。 This technology allows a virtual character to efficiently perform appropriate actions as a reaction to actions (interactions) performed on the virtual character by a real person who is a user of an AR application.
Various actions are performed as reactions of the virtual character C according to the user's interaction; for example, in response to the user's interaction of sitting on a chair in the room, the action of sitting on another chair is performed as a reaction.
 具体的には、主に以下の処理が行われる。 Specifically, the following processing is mainly performed.
(1) A database (DB) is generated by continuously measuring and recording interactions and reactions between real people. To record interactions and reactions, a scene graph is used that represents real people, objects, and attribute information as nodes, and various relationships, such as the relationship between interactions and reactions and the positional relationships of real people, as edges. By using edges that express not only spatial relationships but also temporal relationships, a scene graph that expresses spatio-temporal relationships such as interactions and reactions (hereinafter referred to as a spatio-temporal scene graph as appropriate) is generated.
 実在人物間のインタラクションとリアクションの関係を、時空間シーングラフを用いて記録しておくことにより、実在人物から仮想キャラクタに対するインタラクションに応じた仮想キャラクタのリアクションとして適切な動作を効率的に検索し、仮想キャラクタに実行させることが可能になる。ある実在人物がリアクションとして行った動作と同じ動作を再現するようにして、仮想キャラクタのリアクションが検索され、実行される。 By recording the relationship between interactions and reactions between real people using a spatio-temporal scene graph, we can efficiently search for an appropriate action as a reaction for a virtual character in response to an interaction between a real person and a virtual character. It becomes possible to have a virtual character perform the actions. The reaction of the virtual character is searched and executed by reproducing the same action as that performed by a certain real person as a reaction.
 (2)時空間シーングラフを用いたDBとして、個人特化インタラクション-リアクションDBとメタ化インタラクション-リアクションDBが生成される。 (2) A personalized interaction-reaction DB and a meta-ized interaction-reaction DB are generated as DBs using spatiotemporal scene graphs.
 個人特化インタラクション-リアクションDBは、実空間において認識された属性情報などの情報をそのまま直接的に記録したDBである。メタ化インタラクション-リアクションDBは、実空間において認識された属性情報などの情報を抽象化(メタ化)して記録したDBである。 The personalized interaction-reaction DB is a DB that directly records information such as attribute information recognized in real space. Meta-ized interaction-reaction DB is a DB that records information such as attribute information recognized in real space after abstracting (meta-izing) it.
When reproducing a reaction with the virtual character, if the real person to whom the reaction is directed is a person for whom a specific personalized interaction-reaction DB has been prepared, the search and reproduction of the virtual character's reaction are performed using that personalized interaction-reaction DB. This makes it possible to reproduce a reaction that is more appropriate for the real person to whom the reaction is directed.
On the other hand, if the real person to whom the reaction is directed is a person for whom no specific personalized interaction-reaction DB has been prepared, the search and reproduction of the virtual character's reaction are performed using the meta-ized interaction-reaction DB. By reproducing reactions using the meta-ized interaction-reaction DB, it is also possible to handle cases where a person who was not a measurement target during reaction learning receives the presentation of the virtual character as a user.
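As a rough illustration of this fallback between the two DBs, the choice could look like the following sketch; the function and argument names are placeholders and not part of the publication.

```python
def select_reaction_db(person_id, personalized_dbs, meta_db):
    """Prefer the personalized interaction-reaction DB when one exists for this person,
    otherwise fall back to the meta-ized interaction-reaction DB."""
    return personalized_dbs.get(person_id, meta_db)
```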
(3) The spatio-temporal scene graph is constructed by generating, for each time, a scene graph (spatial scene graph) that represents the real people, objects, attributes, and so on at that time as nodes and their relationships as edges, and then connecting the same nodes in the scene graphs at different times with temporal relationship edges. A temporal relationship edge is an edge that represents the passage of a predetermined time.
When determining whether an interaction from person A to person B at a certain time is a reaction, it is sufficient to go back through the past scene graphs along the temporal relationship edges connected to person A's node and check whether there was an interaction from person B to person A immediately before. If there was an interaction from person B to person A immediately before, the interaction from person A to person B is determined to be a reaction. This makes it possible to access the data in the DB efficiently.
 (4)時空間シーングラフの情報は、それ自体が、実空間のコンテキストを抽象化した情報である。より少ない情報を用いて実空間のコンテキストを記述し、かつ、意味的な変化が生じた場合にのみ時空間シーングラフを更新することにより、データの肥大化を防止することが可能となる。実空間のコンテキストは、実空間にいる人物、実空間にある物体などの実空間の構成要素のそれぞれの状況、属性を含む概念である。 (4) The spatiotemporal scene graph information itself is information that abstracts the real space context. By describing the real-space context using less information and updating the spatio-temporal scene graph only when a semantic change occurs, it is possible to prevent data from becoming bloated. The real space context is a concept that includes the situations and attributes of real space components such as people in real space and objects in real space.
<<Flow of processing in the information processing system>>
FIG. 2 is a diagram illustrating a series of processing flows in an information processing system according to an embodiment of the present technology.
 図2に示すように、情報処理システムにおける一連の処理は、「リアクション学習フェーズ」と「リアクション提示フェーズ」の2段階の処理フェーズから構成される。情報処理システムには、リアクション学習側の構成とリアクション提示側の構成が含まれる。 As shown in FIG. 2, a series of processes in the information processing system consists of two processing phases: a "reaction learning phase" and a "reaction presentation phase." The information processing system includes a reaction learning side configuration and a reaction presentation side configuration.
 図2の上段に示すように、リアクション学習側の構成として、計測デバイス11と情報処理装置12が設けられる。計測デバイス11と情報処理装置12は、有線または無線の通信を介して接続される。 As shown in the upper part of FIG. 2, a measurement device 11 and an information processing device 12 are provided as a configuration on the reaction learning side. The measurement device 11 and the information processing device 12 are connected via wired or wireless communication.
 計測デバイス11は、カラー画像センサやデプスセンサなどの各種のセンサを搭載したセンサデバイスである。計測デバイス11は、実在人物がいる部屋などに設置される。図2の例においては、人物Aと人物Bがいる空間に計測デバイス11が設置されている。計測対象となる人物Aと人物Bは、会話をしたり、身振り手振りをしたりしてコミュニケーションをとっている。本開示において、計測対象の人物Aと人物Bを第1の人物と第2の人物という場合がある。 The measurement device 11 is a sensor device equipped with various sensors such as a color image sensor and a depth sensor. The measurement device 11 is installed in a room where a real person is present. In the example of FIG. 2, the measurement device 11 is installed in a space where a person A and a person B are present. Person A and Person B to be measured are communicating through conversation and gestures. In the present disclosure, person A and person B to be measured may be referred to as a first person and a second person.
 計測デバイス11は、計測対象の人物を計測し、カラー画像や距離画像などの計測データを情報処理装置12に出力する。計測デバイス11が出力するカラー画像には、インタラクションとなる動作やリアクションとなる動作をとっている人物だけでなく、人物の周囲にある家具などの物体が写っている。計測デバイス11が計測する距離画像により、人物までの距離、家具などの物体までの距離が表される。 The measurement device 11 measures the person to be measured and outputs measurement data such as a color image and a distance image to the information processing device 12. The color image output by the measurement device 11 shows not only a person performing an interaction action or a reaction action, but also objects such as furniture around the person. The distance image measured by the measurement device 11 represents the distance to a person and the distance to an object such as furniture.
 情報処理装置12は、計測データとして計測デバイス11から供給されたカラー画像を解析し、画像認識などを行うことによって、個人を特定する。情報処理装置12は、個人を特定した人物を対象として、年齢、性別、身長等の人物属性を認識する。 The information processing device 12 analyzes the color image supplied from the measurement device 11 as measurement data and performs image recognition or the like to identify an individual. The information processing device 12 recognizes personal attributes such as age, gender, and height for the identified person.
 また、情報処理装置12は、カラー画像や距離画像を解析し、人物の周辺にある物体を認識したり、物体間の関係、人物間の関係などを認識したりする。情報処理装置12は、認識結果に基づいて時空間シーングラフを生成し、人物の動作を示すモーション情報とあわせて記録することによってDBを生成する。情報処理装置12により生成されたDBの情報は、情報処理装置22に供給される。 The information processing device 12 also analyzes color images and distance images to recognize objects around people, relationships between objects, relationships between people, and the like. The information processing device 12 generates a spatio-temporal scene graph based on the recognition result and records it together with motion information indicating the motion of the person, thereby generating a DB. The information in the DB generated by the information processing device 12 is supplied to the information processing device 22.
 リアクション学習フェーズの処理が様々な人物を計測対象として繰り返し行われる。情報処理装置12が有するDBには、様々な人物を計測対象として取得された時空間シーングラフの情報が記録される。 Processing in the reaction learning phase is repeated using various people as measurement targets. The DB included in the information processing device 12 records information on spatio-temporal scene graphs acquired using various people as measurement targets.
 一方、図2の下段に示すように、リアクション提示側の構成として、AR表示デバイス1、計測デバイス21、情報処理装置22、および入力装置23が設けられる。計測デバイス21と情報処理装置22、情報処理装置22と入力装置23は、それぞれ、有線または無線の通信を介して接続される。ARアプリケーションの体験者となる人物Aが装着するAR表示デバイス1も、有線または無線の通信を介して情報処理装置22に接続される。 On the other hand, as shown in the lower part of FIG. 2, an AR display device 1, a measurement device 21, an information processing device 22, and an input device 23 are provided as a configuration on the reaction presentation side. The measurement device 21 and the information processing device 22, and the information processing device 22 and the input device 23 are each connected via wired or wireless communication. The AR display device 1 worn by the person A who will experience the AR application is also connected to the information processing device 22 via wired or wireless communication.
Like the measurement device 11, the measurement device 21 is a sensor device equipped with various sensors such as a color image sensor and a depth sensor. The measurement device 21 measures the motion of the user (person A) who is communicating with the virtual character C displayed on the AR display device 1, and outputs measurement data such as a color image and a distance image to the information processing device 22.
 情報処理装置22は、ARコンテンツのデータをAR表示デバイス1に送信し、仮想キャラクタCを表示させる。仮想キャラクタCの3Dモデルを含むARコンテンツのデータが入力装置23から情報処理装置22に対して入力される。 The information processing device 22 transmits the AR content data to the AR display device 1 and causes the virtual character C to be displayed. AR content data including a 3D model of the virtual character C is input from the input device 23 to the information processing device 22 .
 また、情報処理装置22は、計測デバイス21から供給された計測データに基づいて、各種の認識処理を行い、ユーザの周辺のコンテキストを表す時空間シーングラフを生成する。情報処理装置22は、ユーザの周辺のコンテキストを表す時空間シーングラフと、リアクション学習フェーズの処理によって生成された時空間シーングラフとを照合し、仮想キャラクタCのリアクションを検索する。情報処理装置22は、検索により見つかった記録済みの実在人物の動きと同じ動きを仮想キャラクタCに実行させ、ユーザのインタラクションに応じたリアクションをユーザに提示する。 Furthermore, the information processing device 22 performs various recognition processes based on the measurement data supplied from the measurement device 21, and generates a spatiotemporal scene graph representing the context around the user. The information processing device 22 searches for a reaction of the virtual character C by comparing the spatio-temporal scene graph representing the user's surrounding context with the spatio-temporal scene graph generated by the processing of the reaction learning phase. The information processing device 22 causes the virtual character C to perform the same movement as the recorded real person's movement found through the search, and presents the user with a reaction according to the user's interaction.
 図2の例においては、計測デバイス11と計測デバイス21が異なる筐体のデバイスとされているが、同じ筐体のデバイスとして構成されるようにしてもよい。この場合、リアクション学習フェーズにおける計測とリアクション提示フェーズにおける計測が同じ空間において行われることになる。 In the example of FIG. 2, the measurement device 11 and the measurement device 21 are devices in different casings, but they may be configured as devices in the same casing. In this case, measurement in the reaction learning phase and measurement in the reaction presentation phase are performed in the same space.
 また、情報処理装置12と情報処理装置22のそれぞれの機能がインターネット上のサーバによって実現されるようにしてもよい。また、情報処理装置12と情報処理装置22が同じ筐体の装置として構成されるようにしてもよい。 Furthermore, the functions of the information processing device 12 and the information processing device 22 may be realized by a server on the Internet. Further, the information processing device 12 and the information processing device 22 may be configured as devices in the same housing.
 計測デバイス11の機能と情報処理装置12の機能が1つの装置において実現されるようにしてもよい。リアクション提示側についても同様に、計測デバイス21の機能と情報処理装置22の機能が1つの装置において実現されるようにしてもよい。また、情報処理装置22の機能がAR表示デバイス1に搭載されるようにしてもよい。 The functions of the measurement device 11 and the information processing device 12 may be realized in one device. Similarly, on the reaction presentation side, the function of the measurement device 21 and the function of the information processing device 22 may be realized in one device. Further, the functions of the information processing device 22 may be installed in the AR display device 1.
 以下、リアクション学習フェーズとリアクション提示フェーズの各フェーズの処理の詳細について説明する。 Hereinafter, details of the processing of each phase of the reaction learning phase and the reaction presentation phase will be explained.
<<Reaction learning phase>>
FIG. 3 is a diagram illustrating an example of processing in the reaction learning phase.
The processing of the reaction learning phase includes recognition processing (Step 1), reaction detection processing (Step 2), and reaction subgraph & motion recording processing (Step 3).
<Step 1: Recognition processing>
The recognition processing recognizes interactions and reactions between people, attribute information of each person, and spatial information based on the measurement data supplied from the measurement device 11. The recognition processing includes spatial recognition processing, person attribute recognition processing, and interaction recognition processing as Steps 1-1 to 1-3. Processing other than the interaction recognition processing may be omitted.
・Step 1-1: Spatial recognition processing
In the spatial recognition processing, spatial information is generated based on the measurement data supplied from the measurement device 11. For example, the following information is generated as spatial information:
・Geometric information of each object: shape, size, etc.
・Semantic information of each object: category (wall, floor, chair, etc.), parts (backrest, doorknob, etc.)
・Information indicating relationships between objects: distances between objects such as "object A is near object B", positional relationships between objects such as "object A is in front of object B", etc.
For example, the techniques described in Documents 1 and 2 can be used as spatial recognition techniques.
Document 1: [Narita+, IROS2019] G. Narita, T. Seno, T. Ishikawa and Y. Kaji, "PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things", IROS, 2019.
Document 2: [Tahara+, ISMAR2020] T. Tahara, T. Seno, G. Narita and T. Ishikawa, "Retargetable AR: Context-aware Augmented Reality in Indoor Scenes based on 3D Scene Graph", ISMAR, 2020.
・Step 1-2: Person attribute recognition processing
In the person attribute recognition processing, person attribute information is generated based on the measurement data supplied from the measurement device 11. For example, the following information is generated as person attribute information:
・Information obtained from appearance, such as age, gender, and race
・Information that allows personal identification in combination with the above information and image information
Person attributes can be recognized using a library such as OpenCV, for example.
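As a concrete illustration of this kind of attribute estimation, the following sketch classifies a face crop into an age bracket with OpenCV's DNN module. The model file names and mean values are placeholders taken from commonly distributed age-estimation Caffe models, not something specified by this publication.

```python
# Hedged sketch: estimating an age bracket for a detected face crop with OpenCV.
import cv2

AGE_BRACKETS = ["0-2", "4-6", "8-12", "15-20", "25-32", "38-43", "48-53", "60-100"]

def estimate_age_bracket(face_bgr):
    # Placeholder model files; any age-estimation Caffe model with this input
    # convention (227x227 BGR, mean-subtracted) could be substituted.
    net = cv2.dnn.readNetFromCaffe("age_deploy.prototxt", "age_net.caffemodel")
    blob = cv2.dnn.blobFromImage(face_bgr, scalefactor=1.0, size=(227, 227),
                                 mean=(78.43, 87.77, 114.90))
    net.setInput(blob)
    preds = net.forward()
    return AGE_BRACKETS[int(preds[0].argmax())]
```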
・Step 1-3: Interaction recognition processing
In the interaction recognition processing, relational information indicating object-person interactions and relational information indicating person-person interactions is generated based on the measurement data supplied from the measurement device 11. For example, the following information is generated as relational information:
・Object-person: an action in which a person acts on an object, such as person A sitting on a couch ("person A does V to object B")
・Person-person: an action performed by one person toward another person, such as person A waving at person B ("person A does V to person B")
For example, the technique described in Document 3 can be used as an interaction recognition technique.
Document 3: [Gao+, BMVC2018] C. Gao, Y. Zou and J.-B. Huang, "iCAN: Instance-Centric Attention Network for Human-Object Interaction", BMVC, 2018.
 空間認識処理によって生成された空間情報、人物属性認識処理によって生成された人物属性情報、および、インタラクション認識処理によって生成された関係情報は、グラフ構造を有するデータであるグラフとして取得される。 The spatial information generated by the spatial recognition process, the person attribute information generated by the person attribute recognition process, and the relational information generated by the interaction recognition process are acquired as a graph, which is data having a graph structure.
 図4は、空間情報を示すグラフの例を示す図である。 FIG. 4 is a diagram showing an example of a graph showing spatial information.
 例えば、計測対象の人物の周辺にソファ、椅子、テーブルが存在し、それらが所定の位置関係を有するように配置されていることが認識されたものとする。この場合、空間情報を示すグラフは、図4に示すように、それらの物体を表す3つのノード(sofa#1、chair#2、table#3)によって構成されるデータとなる。本開示において、物体を表すノードを物体ノードという場合がある。 For example, assume that it has been recognized that a sofa, chair, and table exist around the person to be measured, and that they are arranged in a predetermined positional relationship. In this case, the graph representing the spatial information is data composed of three nodes (sofa#1, chair#2, table#3) representing these objects, as shown in FIG. In this disclosure, a node representing an object may be referred to as an object node.
 図4の例においては、ソファのノードと椅子のノードは、「in front of」のラベルを持つエッジE1と「on left of」のラベルを持つエッジE2で接続される。ソファのノードから椅子のノードに向かう矢印として表されるエッジE1は、椅子がソファの前に存在することを表す。また、椅子のノードからソファのノードに向かう矢印として表されるエッジE2は、椅子の正面に対してソファが左側に存在することを表す。 In the example of FIG. 4, the sofa node and the chair node are connected by an edge E1 labeled "in front of" and an edge E2 labeled "on left of." Edge E1, represented as an arrow pointing from the sofa node to the chair node, represents that the chair is in front of the sofa. Furthermore, an edge E2 expressed as an arrow pointing from the chair node to the sofa node indicates that the sofa exists on the left side with respect to the front of the chair.
 ソファのノードとテーブルのノード、および、椅子のノードとテーブルのノードも、それぞれ、それらの位置関係を表すラベルが設定されたエッジE3,E4で接続される。 The sofa node and the table node, and the chair node and the table node are also connected by edges E3 and E4, respectively, which are set with labels representing their positional relationships.
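The graph of FIG. 4 can be expressed directly as a labeled directed graph. The following is a minimal sketch using the networkx library; the labels of edges E3 and E4 are not stated in the text and are assumed here for illustration.

```python
# Minimal sketch of the spatial information graph of FIG. 4 (object nodes and
# positional-relationship edges), built with networkx.
import networkx as nx

spatial_graph = nx.MultiDiGraph()

# Object nodes recognized by the spatial recognition processing (Step 1-1).
for obj in ("sofa#1", "chair#2", "table#3"):
    spatial_graph.add_node(obj, kind="object")

spatial_graph.add_edge("sofa#1", "chair#2", label="in front of")  # E1
spatial_graph.add_edge("chair#2", "sofa#1", label="on left of")   # E2
spatial_graph.add_edge("sofa#1", "table#3", label="next to")      # E3 (label assumed)
spatial_graph.add_edge("chair#2", "table#3", label="next to")     # E4 (label assumed)

print(list(spatial_graph.edges(data="label")))
```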
 図5は、人物属性情報を示すグラフの例を示す図である。 FIG. 5 is a diagram showing an example of a graph showing personal attribute information.
If the age of the person to be measured is recognized as 60, the graph showing the person attribute information is composed of a node (Person#1) representing the person to be measured and a node (Age:60) representing that the age is 60, as shown in A of FIG. 5. The node representing the person to be measured and the node representing that the age is 60 are connected by an edge E11 labeled "has". In this disclosure, a node representing an attribute may be referred to as an attribute node, and an edge representing a relationship between an attribute and a person may be referred to as an attribute edge.
If the height of the person to be measured is recognized as 1.8 m, the graph showing the person attribute information is composed of a node (Person#1) representing the person to be measured and a node (Height:1.8m) representing that the height is 1.8 m, as shown in B of FIG. 5. The node representing the person to be measured and the node representing that the height is 1.8 m are connected by an edge E12 labeled "has".
 図6は、関係情報を示すグラフの例を示す図である。 FIG. 6 is a diagram showing an example of a graph showing relational information.
Assume that, as object-person interactions, it is recognized that the person to be measured is using a table and that the person is sitting on a chair. In this case, the graph showing the relational information is composed of a node (Person#1) representing the person to be measured, a node (table#1) representing the table, and a node (chair#2) representing the chair, as shown in FIG. 6. An edge representing a relationship between a person and an object based on the recognized real-space context may be referred to as an object edge.
 計測対象の人物を表すノードとテーブルを表すノードは、「use」のラベルを持つエッジE21で接続される。また、計測対象の人物を表すノードと椅子を表すノードは、「sitting on」のラベルを持つエッジE22で接続される。 The node representing the person to be measured and the node representing the table are connected by an edge E21 labeled "use". Further, the node representing the person to be measured and the node representing the chair are connected by an edge E22 labeled "sitting on".
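The attribute edges of FIG. 5 and the object edges of FIG. 6 fit the same representation; a minimal sketch, continuing the networkx-based illustration above:

```python
# Sketch of the attribute edges (FIG. 5) and object edges (FIG. 6) for one person.
import networkx as nx

g = nx.MultiDiGraph()
g.add_node("Person#1", kind="person")

# Attribute nodes connected by "has" attribute edges (FIG. 5).
g.add_node("Age:60", kind="attribute")
g.add_edge("Person#1", "Age:60", label="has")          # E11
g.add_node("Height:1.8m", kind="attribute")
g.add_edge("Person#1", "Height:1.8m", label="has")     # E12

# Object nodes connected by object edges for person-object interactions (FIG. 6).
g.add_node("table#1", kind="object")
g.add_node("chair#2", kind="object")
g.add_edge("Person#1", "table#1", label="use")         # E21
g.add_edge("Person#1", "chair#2", label="sitting on")  # E22
```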
 Step1における認識処理の結果は、このように、物体、人物、人物属性をノードとして表現し、それらの関係性をエッジとして表現したグラフ構造を有するデータとして取得される。 The result of the recognition process in Step 1 is thus obtained as data having a graph structure in which objects, people, and person attributes are expressed as nodes, and relationships among them are expressed as edges.
<Step 2: Reaction detection processing>
The reaction detection processing is a process of detecting a person's reaction based on the information acquired by the recognition processing.
 図7のフローチャートを参照して、リアクション検出処理について説明する。 The reaction detection process will be explained with reference to the flowchart in FIG.
・Step 2-1: Detection of semantic change
The detection of semantic change is a process of detecting a change in relationships based on the graph representing the recognition result at the current time t and the graph representing the recognition result at the previous measurement time t-1. If it is determined that there is no change in the relationships, the detection of semantic change is repeated based on information newly acquired by the recognition processing.
On the other hand, if the graph representing the recognition result at the current time t, after the predetermined time has elapsed, has changed with respect to the graph representing the recognition result at the previous measurement time t-1, it is determined that the relationships have changed. In this case, since a new reaction may have occurred, the process proceeds to the subsequent processing.
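A minimal sketch of this check, assuming the per-time recognition results are held as networkx graphs whose edges carry a "label" attribute as in the earlier sketches:

```python
import networkx as nx

def labeled_edges(g: nx.MultiDiGraph) -> set:
    """Set of (subject, label, object) triples contained in a recognition-result graph."""
    return {(u, d.get("label"), v) for u, v, d in g.edges(data=True)}

def has_semantic_change(g_prev: nx.MultiDiGraph, g_now: nx.MultiDiGraph) -> bool:
    """True if any relationship appeared or disappeared between time t-1 and time t."""
    return labeled_edges(g_prev) != labeled_edges(g_now)
```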
・Step 2-2: Generation of the spatio-temporal scene graph
If a change is detected between the graph representing the recognition result at time t and the graph representing the recognition result at time t-1, the graph of spatial information, the graph of person attribute information, and the graph of relational information are integrated into one spatio-temporal scene graph. The integration is performed by temporally and spatially connecting the graphs at past measurement times within a threshold time Tth from the current time t.
 図8は、認識結果を表すグラフの例を示す図である。 FIG. 8 is a diagram showing an example of a graph representing the recognition results.
As shown on the left side of FIG. 8, assume that it has been recognized that person A waved at person B at time t and that person B waved at person A at time t+Δt, after time Δt had elapsed. In this disclosure, time t and time t+Δt may be referred to as a first time and a second time, respectively. In this case, the graph of spatial information, the graph of person attribute information, and the graph of relational information at each of time t and time t+Δt are the graphs shown on the right side of FIG. 8. In this disclosure, the graph at the first time and the graph at the second time may be referred to as a first graph and a second graph, respectively.
 例えば、時刻tにおける空間情報のグラフは、物体としての床のノードにより構成される。時刻t+Δtにおける空間情報のグラフも、物体としての床のノードにより構成される。 For example, the graph of spatial information at time t is composed of nodes of the floor as an object. The graph of spatial information at time t+Δt is also composed of nodes of the floor as an object.
 時刻tにおける床のノードと、時刻t+Δtにおける床のノードとは、時間Δtのラベルを持つ経時関係エッジで接続される。この例においては、空間情報のグラフに変化がないものとされている。 The floor node at time t and the floor node at time t+Δt are connected by a temporal relationship edge with a label of time Δt. In this example, it is assumed that there is no change in the spatial information graph.
 また、時刻tにおける人物属性情報のグラフは、人物Aのノードと、年齢が20歳であることを表すノード(Age:20)とを「has」のラベルを持つエッジで接続したグラフ、および、人物Bのノードと、年齢が19歳であることを表すノード(Age:19)とを「has」のラベルを持つエッジで接続したグラフによって構成される。時刻t+Δtにおける人物属性情報のグラフも、同じ構造のグラフにより構成される。本開示において、人物間の動作関係を表すエッジを動作エッジという場合がある。 Further, the graph of person attribute information at time t is a graph in which the node of person A and the node representing age 20 (Age:20) are connected by an edge labeled "has", and It is composed of a graph in which the node of person B and the node representing age 19 (Age:19) are connected by an edge labeled "has". The graph of person attribute information at time t+Δt is also composed of a graph with the same structure. In the present disclosure, an edge representing a motion relationship between people may be referred to as a motion edge.
 時刻tにおける各ノードと、時刻t+Δtにおける同じノードは、時間Δtのラベルを持つ経時関係エッジで接続される。この例においては、人物属性情報のグラフにも変化がないものとされている。本開示において、時刻tにおける各ノードを先ノード、時刻t+Δtにおける各ノードを後ノードという場合がある。 Each node at time t and the same node at time t+Δt are connected by a temporal relationship edge labeled with time Δt. In this example, it is assumed that there is no change in the graph of person attribute information. In this disclosure, each node at time t may be referred to as a previous node, and each node at time t+Δt may be referred to as a subsequent node.
The graph of relational information at time t includes a graph in which the node of person A and the node of the floor are connected by an edge labeled "stand", and the node of person A and the node of person B are connected by an edge labeled "wave" directed from the node of person A to the node of person B. The edge labeled "wave" directed from the node of person A to the node of person B represents that person A waved at person B. The graph of relational information at time t also includes an edge labeled "stand" connecting the node of person B and the node of the floor.
 時刻t+Δtにおける関係情報のグラフは、人物Aのノードと人物Bのノードとを、「wave」のラベルを持つ、人物Bのノードから人物Aのノードに向かうエッジで接続したグラフを含む。「wave」のラベルを持つ、人物Bのノードから人物Aのノードに向かうエッジは、人物Bが人物Aに向かって手を振ったことを表す。時刻t+Δtにおける他のグラフは、時刻tにおける関係情報のグラフと同じである。 The graph of the relational information at time t+Δt includes a graph in which the node of person A and the node of person B are connected by an edge from the node of person B to the node of person A that has the label "wave". An edge from a node of person B to a node of person A with the label "wave" represents that person B waved his hand toward person A. The other graphs at time t+Δt are the same as the relationship information graph at time t.
 時刻tにおける各ノードと、時刻t+Δtにおける同じノードは、時間Δtのラベルを持つ経時関係エッジで接続される。この例においては、関係情報のグラフに変化があったものとして判定される。 Each node at time t and the same node at time t+Δt are connected by a temporal relationship edge labeled with time Δt. In this example, it is determined that there has been a change in the graph of the related information.
 このように、時刻tにおける認識結果を表すグラフと時刻t+Δtにおける認識結果を表すグラフとの間に変化が検出された場合、グラフの統合が行われ、時空間シーングラフが生成される。 In this way, when a change is detected between the graph representing the recognition result at time t and the graph representing the recognition result at time t+Δt, the graphs are integrated and a spatiotemporal scene graph is generated.
 図9は、時空間シーングラフの例を示す図である。 FIG. 9 is a diagram showing an example of a spatio-temporal scene graph.
 図9に示す時空間シーングラフを構成する上段のグラフは、時刻tにおける空間情報、人物属性情報、関係情報のそれぞれのグラフを統合したグラフである。また、下段のグラフは、時刻t+Δtにおける空間情報、人物属性情報、関係情報のそれぞれのグラフを統合したグラフである。 The upper graph constituting the spatiotemporal scene graph shown in FIG. 9 is a graph that integrates the spatial information, person attribute information, and relational information graphs at time t. Furthermore, the lower graph is a graph that integrates the spatial information, person attribute information, and relational information graphs at time t+Δt.
 時刻tにおける各ノードと、時刻t+Δtにおける同じノードは、時間Δtのラベルを持つ経時関係エッジE31乃至E35で接続される。 Each node at time t and the same node at time t+Δt are connected by temporal relationship edges E31 to E35 labeled with time Δt.
For example, the node representing person A at time t and the node representing person A at time t+Δt are connected by a temporal relationship edge E31, and the node representing person B at time t and the node representing person B at time t+Δt are connected by a temporal relationship edge E32. Similarly, the node representing the floor at time t and the node representing the floor at time t+Δt are connected by a temporal relationship edge E33.
 時刻tにおいて人物Aが人物Bに対して手を振り、時間Δt経過後の時刻t+Δtにおいて人物Bが人物Aに対して手を振ったことは、図9に示すような時空間シーングラフによって表される。このように、時空間シーングラフは、同じ時刻に計測対象となったそれぞれの人物を表すノード間を少なくとも人物間の関係を表すエッジで接続するとともに、異なる時刻に計測対象となった同じ人物を表すノード間を、時間の経過を表すエッジである経時関係エッジで接続したシーングラフとなる。 The fact that person A waves to person B at time t, and person B waves to person A at time t+Δt after time Δt has elapsed can be expressed by a spatio-temporal scene graph as shown in FIG. be done. In this way, the spatio-temporal scene graph connects nodes representing each person who was the object of measurement at the same time with at least an edge representing the relationship between the persons, and also connects nodes representing each person who was the object of measurement at different times. It becomes a scene graph in which the nodes it represents are connected by time-related edges, which are edges that represent the passage of time.
 図9の例においては、時刻tと時刻t+Δtの2時刻分のグラフを統合することによって1つの時空間シーングラフが生成されているが、3時刻分以上のグラフを統合することによって1つの時空間シーングラフが生成されるようにしてもよい。 In the example of FIG. 9, one spatio-temporal scene graph is generated by integrating graphs for two times, time t and time t+Δt, but one time-spatial scene graph is generated by integrating graphs for three or more times. A spatial scene graph may be generated.
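The integration described above can be sketched as follows: each per-time graph keeps its own copy of the nodes, stamped with the measurement time, and the same entity at consecutive times is connected by a temporal relationship edge carrying the elapsed time. The (node, time) stamping scheme and the function name are illustrative assumptions, not the publication's implementation.

```python
import networkx as nx

def integrate(graphs_by_time):
    """Merge per-time scene graphs {time: graph} into one spatio-temporal scene graph."""
    st_graph = nx.MultiDiGraph()
    times = sorted(graphs_by_time)

    # Copy every per-time graph, stamping its nodes with the measurement time.
    for t in times:
        g = graphs_by_time[t]
        for n, attrs in g.nodes(data=True):
            st_graph.add_node((n, t), **attrs)
        for u, v, attrs in g.edges(data=True):
            st_graph.add_edge((u, t), (v, t), **attrs)

    # Connect the same entity at consecutive times with a temporal relationship edge.
    for t_prev, t_next in zip(times, times[1:]):
        for n in graphs_by_time[t_prev].nodes:
            if n in graphs_by_time[t_next]:
                st_graph.add_edge((n, t_prev), (n, t_next),
                                  label="elapsed", dt=t_next - t_prev)
    return st_graph
```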
・Step 2-3: Extraction of the interpersonal relationship (S, V, O) at the current time
A reaction is an interaction that occurs under a specific condition. In order to determine whether a certain interaction is a reaction, the interaction to be judged is extracted. In this disclosure, an interaction and a reaction may be referred to as an interaction action and a reaction action, respectively.
 人物間関係は、「主体S(Subject)」、「動作V(Verb)」、「客体O(Object)」を組み合わせた人物間関係(S,V,O)として表される。図9の時空間シーングラフが生成されている場合、現在時刻である時刻t+Δtの人物間関係(S,V,O)として、人物間関係(人物B,wave,人物A)が抽出される。人物間関係(人物B,wave,人物A)は、主体となる人物Bが、客体となる人物Aに対して、手を振るといった動作を行ったことを表す。 The relationship between people is expressed as a relationship between people (S, V, O) that is a combination of "Subject", "Verb", and "Object". When the spatio-temporal scene graph of FIG. 9 has been generated, the interpersonal relationships (person B, wave, person A) are extracted as the interpersonal relationships (S, V, O) at time t+Δt, which is the current time. The interpersonal relationship (person B, wave, person A) indicates that person B, who is the subject, performed an action such as waving to person A, who is the object.
Here, the specific condition can be expressed as: an interpersonal relationship (O, V, S), representing the reverse of the interpersonal relationship (S, V, O), has occurred within a short time in the past, such as within the last few seconds. The interpersonal relationship (O, V, S) represents a relationship in which the person who is the object O at the current time acted as the subject in the past and performed an action V on the person who is the subject S at the current time as its object. Note that the action V in the interpersonal relationship (S, V, O) and the action V in the interpersonal relationship (O, V, S) may be different actions.
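Under the same (node, time) stamping as the earlier sketches, the interpersonal relationship (S, V, O) at the current time can be read off the motion edges of the current-time layer. A minimal sketch; `person_names` is an assumed set of recognized person identifiers:

```python
def extract_svo(st_graph, t_now, person_names):
    """Yield (subject, verb, object) triples between people at measurement time t_now."""
    for (u, tu), (v, tv), attrs in st_graph.edges(data=True):
        if tu == tv == t_now and u in person_names and v in person_names:
            yield (u, attrs.get("label"), v)
```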
・Step 2-4: Search for the interpersonal relationship (O, V, S) in the spatio-temporal scene graph
Targeting the interpersonal relationships of the recent past, a search is performed for an interpersonal relationship (O, V, S) corresponding to the interpersonal relationship (S, V, O) at the current time.
The search for the interpersonal relationship (O, V, S) is a process of going back in time from the node of the subject S at the current time in the spatio-temporal scene graph and searching for an interaction performed by the person who is the object O at the current time, with the person who is the subject S at the current time as its object. Since this process can be realized by tracing the edges of the spatio-temporal scene graph related to the person who is the subject S at the current time, the frequency of access to the data can be kept low and an efficient search becomes possible.
 現在時刻の人物間関係(S,V,O)に対する人物間関係(O,V,S)が過去の短時間に見つかった場合、現在時刻の人物間関係(S,V,O)は、リアクションであると判定される。 If the interpersonal relationship (O, V, S) for the interpersonal relationship (S, V, O) at the current time is found in a short time in the past, the interpersonal relationship (S, V, O) at the current time is It is determined that
In the case of the example described above, a search for the interpersonal relationship (person A, wave, person B), representing that person A waved at person B, as the interpersonal relationship (O, V, S) corresponding to the interpersonal relationship (person B, wave, person A), which is the interpersonal relationship (S, V, O) at the current time t+Δt, is performed by tracing edges starting from the node of person B at the current time t+Δt.
Since the graph representing the interpersonal relationship (person A, wave, person B) is found within the short time Δt, the interpersonal relationship (person B, wave, person A), that is, person B waving at person A at the current time, is determined to be a reaction. In this way, when a component representing that person B performed a certain action on person A is included in the graph at time t+Δt and a component representing that person A performed a certain action on person B is included in the graph at time t, the action performed by person B on person A at time t+Δt is detected as a reaction.
 If it is determined that no interpersonal relationship (O, V, S) exists within the short time in the past, the process returns to Step 2-1 and the detection of a semantic change is repeated.
 On the other hand, if it is determined that an interpersonal relationship (O, V, S) was found within the short time in the past, the reaction subgraph & motion recording process (Step 3) of FIG. 3 is performed.
<Step 3: Reaction subgraph & motion recording process>
 The reaction subgraph & motion recording process records, in a DB, information about the person of the subject S from the occurrence time of the interpersonal relationship (S, V, O) determined to be a reaction back to the occurrence time of the interpersonal relationship (O, V, S) found as the interaction that caused the reaction. As the information about the person of the subject S, a subgraph related to the person of the subject S and motion information of the person of the subject S at the time of occurrence of the interpersonal relationship (S, V, O), which is the reaction, are recorded.
 The reaction subgraph & motion recording process will be described with reference to the flowchart in FIG. 10.
・Step 3-1. Extraction of the reaction subgraph
 The subgraph related to the person of the subject S, covering the period from the occurrence time of the interpersonal relationship (S, V, O) determined to be a reaction back to the occurrence time of the interpersonal relationship (O, V, S) found as the interaction that caused the reaction, is extracted as the reaction subgraph. For example, starting from the node of the person of the subject S at each time, a subgraph extending to nodes within an inter-node distance D is extracted from the spatio-temporal scene graph.
 The inter-node distance D is a threshold indicating the degree of relationship with the person of the subject S. The larger the value of the inter-node distance D, the more information is recorded about objects and people that are only weakly related to the person of the subject S, so a value of about 1 or 2 is set as the inter-node distance D. When the value of the inter-node distance D is 1, for example, the range from the node of the person of the subject S to the nodes connected via one edge is extracted as the reaction subgraph.
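 The extraction in Step 3-1 can be illustrated, for example, by the following sketch, which assumes the spatio-temporal scene graph is held as a networkx graph; the graph representation and the threshold name are assumptions used only for illustration.

import networkx as nx

def extract_reaction_subgraph(scene_graph: nx.Graph, subject_nodes, d: int = 1):
    """Collect, for each per-time node of subject S, all nodes within hop
    distance D and return the induced subgraph as the reaction subgraph."""
    keep = set()
    for s in subject_nodes:                       # subject S's node at each time
        reachable = nx.single_source_shortest_path_length(scene_graph, s, cutoff=d)
        keep.update(reachable.keys())             # nodes within D edges of S
    return scene_graph.subgraph(keep).copy()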
 FIG. 11 is a diagram showing an example of extraction of a reaction subgraph.
 When the interpersonal relationship (person B, wave, person A) at time t + Δt, shown surrounded by broken line #1, is determined to be a reaction and the interpersonal relationship (person A, wave, person B) at time t, shown surrounded by broken line #2, is determined to be an interaction, the entire spatio-temporal scene graph shown in FIG. 11, for example, is extracted as the reaction subgraph. The reaction subgraph is graph information that includes at least a subgraph representing the interpersonal relationship determined to be the interaction and a subgraph representing the interpersonal relationship determined to be the reaction.
・Step 3-2. Recording of the reaction subgraph and motion information
 The motion information of the person of the subject S at the occurrence time of the interpersonal relationship (S, V, O) determined to be a reaction is recorded in a DB in association with the reaction subgraph. The motion information and the reaction subgraph are associated with each other using, for example, a common ID.
 For example, the motion information of the person of the subject S is recorded in the reaction motion DB 51 (FIG. 3), and the reaction subgraph is recorded in the personalized interaction-reaction DB 53, which is a DB for the person of the subject S. The personalized interaction-reaction DB 53 records the various reaction subgraphs used to select the actions of the virtual character when the person of the subject S himself or herself becomes a user and uses the AR content.
 As the motion information, for example, time-series data of skeletal estimation results (motion capture data) obtained from the measurement data of the measurement device 11 is recorded.
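 The association through a common ID described in Step 3-2 can be illustrated, for example, by the following sketch. The dictionaries stand in for the reaction motion DB 51 and the personalized interaction-reaction DB 53, and the use of a UUID is an assumption for illustration only.

import uuid

reaction_motion_db = {}          # stand-in for the reaction motion DB 51
personalized_reaction_db = {}    # stand-in for the personalized interaction-reaction DB 53

def record_reaction(subgraph, motion_capture_frames):
    """Store the skeletal time series and the reaction subgraph under one ID."""
    record_id = str(uuid.uuid4())                          # common ID linking both records
    reaction_motion_db[record_id] = motion_capture_frames  # motion information (time series)
    personalized_reaction_db[record_id] = subgraph         # reaction subgraph
    return record_id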
・Step 3-3. Meta-ization of the reaction subgraph
 A subgraph is generated in which each node, or a set of nodes, of the reaction subgraph is replaced with a node having higher-order semantic information.
 FIG. 12 is a diagram showing an example of meta-ization of a reaction subgraph. The left side of FIG. 12 shows the reaction subgraph before meta-ization, and the right side shows the reaction subgraph after meta-ization.
 For example, the node representing person A as a specific person is replaced with a node representing "Person", which does not specify a particular individual. Likewise, the node representing an age of 19 as a person attribute is replaced with a node representing "teens", which carries more abstract information.
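 The abstraction rules shown in FIG. 12 (a specific person becomes "Person", an exact age becomes an age band) can be illustrated, for example, as follows; the attribute names used on the nodes are assumptions made for this sketch.

def metaize(node_attrs: dict) -> dict:
    """Return a copy of the node attributes with higher-order semantic labels."""
    meta = dict(node_attrs)
    if meta.get("type") == "person":
        meta["label"] = "Person"                 # drop the specific identity
    if "age" in meta:                            # 19 -> "teens", 34 -> "30s", ...
        age = meta.pop("age")
        meta["age_band"] = "teens" if age < 20 else f"{(age // 10) * 10}s"
    return meta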
・Step 3-4. Recording of the meta-ized reaction subgraph and motion information
 The meta-ized reaction subgraph generated by meta-izing the reaction subgraph is recorded in a DB in association with the motion information, for example using a common ID. The meta-ized reaction subgraph is recorded in the meta-ized interaction-reaction DB 52.
 The meta-ized interaction-reaction DB 52 records, for example, the various reaction subgraphs used to select the actions of the virtual character when a person other than the person of the subject S becomes a user and uses the AR content.
 As a result, even when a person other than the specific person is the user of the AR content, processing similar to that performed when the specific person is the user can be applied, as long as the higher-order meaning is the same. The reaction subgraphs recorded in the meta-ized interaction-reaction DB 52 may also be used when the person of the subject S himself or herself becomes a user and uses the AR content.
 As described above, the reaction subgraph & motion recording process in Step 3 adds scene graph information that abstractly and semantically represents the context of the real space at each time. Regardless of the length or resolution of the measurement, scene graph information is recorded in Step 3 only when the meaning of the measurement data changes, so access to the data can be made more efficient than when all information is recorded.
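 The record-only-on-change behavior summarized above can be illustrated, for example, by the following sketch; representing each scene-graph slice as a set of edges and comparing consecutive slices is a simplified assumption about what constitutes a semantic change.

def append_if_changed(history: list, new_slice_edges: set) -> bool:
    """Append a new scene-graph slice only when its semantic content differs
    from the previous slice, regardless of the sensor sampling rate."""
    if history and history[-1] == new_slice_edges:
        return False                   # same meaning: nothing is recorded
    history.append(new_slice_edges)    # meaning changed: record the new slice
    return True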
<<Reaction presentation phase>>
 FIG. 13 is a diagram illustrating an example of processing in the reaction presentation phase.
 The processing of the reaction presentation phase includes recognition processing (Step 11), processing of matching the current scene graph against the subgraphs in the DBs (Step 12), and reaction presentation processing (Step 13).
<Step 11: Recognition processing>
 The recognition processing in the reaction presentation phase is basically the same as the recognition processing in the reaction learning phase (Step 1 in FIG. 3). It differs in that the user experiencing the AR application (person A) and the virtual character C are the measurement targets.
 That is, the recognition processing in the reaction presentation phase recognizes the interaction between the user and the virtual character C, the user's attribute information, and the spatial information based on the measurement data supplied from the measurement device 21. Overlapping explanations are omitted as appropriate.
・Step 11-1: Spatial recognition processing
 In the spatial recognition processing, spatial information is generated based on the measurement data supplied from the measurement device 21. As the spatial information, geometric information of each object, semantic information of each object, and information indicating the relationships between objects are generated.
・Step 11-2: Person attribute recognition processing
 In the person attribute recognition processing, the user's person attribute information is generated based on the measurement data supplied from the measurement device 21. As the person attribute information, information such as age, gender, and race obtained from the user's appearance, as well as information enabling personal identification, is generated.
・Step 11-3: Interaction recognition processing
 In the interaction recognition processing, relationship information between objects and the user, between objects and the virtual character C, and between the user and the virtual character C is generated based on the measurement data supplied from the measurement device 21.
 The spatial information generated by the spatial recognition processing, the person attribute information generated by the person attribute recognition processing, and the relationship information generated by the interaction recognition processing are acquired as graphs, that is, as data having a graph structure.
 The attributes, position, and actions (interactions and reactions) of the virtual character C are information that the information processing device 22 itself, which plays back the AR content and displays the virtual character C, can acquire. Recognition-result information about the virtual character C is generated as appropriate based on the attributes, position, actions, and the like of the virtual character C.
<Step 12: Matching the current scene graph against the reaction subgraphs in the DBs>
 Based on the spatial information, person attribute information, and relationship information acquired through the recognition processing, a current scene graph is generated. The current scene graph is a spatio-temporal scene graph representing the context around the user to whom the virtual character is being presented.
 The current scene graph is a spatio-temporal scene graph that includes a graph in which the nodes representing the user and the virtual character are connected by edges representing their relationship at a predetermined time, and a graph in which they are connected by edges representing their relationship at the current time after a predetermined time has elapsed. The user's nodes at the respective times, and likewise the virtual character's nodes, are connected by temporal relationship edges. In the present disclosure, the predetermined time relating to the current scene graph may be referred to as the past time. The current scene graph may be regarded as including a past graph corresponding to the past time, a current graph corresponding to the current time, and temporal relationship edges connecting the past nodes of the past graph with the current nodes of the current graph.
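 The structure of the current scene graph described above can be illustrated, for example, by the following sketch; the node and edge labels, and the use of a networkx directed graph, are assumptions made only for illustration.

import networkx as nx

def build_current_scene_graph(past_relation: str, dt: float) -> nx.DiGraph:
    """Build a two-slice current scene graph: a past slice with the user's
    interaction toward the character, a current slice with no reaction yet,
    and temporal edges joining each entity's past and current nodes."""
    g = nx.DiGraph()
    # Past slice (time t): the user's interaction toward the virtual character.
    g.add_edge(("user", "t"), ("character", "t"), relation=past_relation)
    # Current slice (time t + dt): nodes exist, the reaction is not yet decided.
    g.add_node(("user", "t+dt"))
    g.add_node(("character", "t+dt"))
    # Temporal relationship edges connecting past and current nodes.
    g.add_edge(("user", "t"), ("user", "t+dt"), relation="temporal", dt=dt)
    g.add_edge(("character", "t"), ("character", "t+dt"), relation="temporal", dt=dt)
    return g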
 The current scene graph generated based on the information acquired through the recognition processing is matched against each of the reaction subgraphs in the DBs. More specifically, matching the current scene graph against a reaction subgraph includes determining whether the person in the reaction subgraph corresponding to the user is the user himself or herself. In the present disclosure, a known graph in a DB may be referred to as a known scene graph.
 If a reaction learning phase has been performed with the user experiencing the AR application as a measurement target, and reaction subgraphs for that user are recorded in the personalized interaction-reaction DB 53, the current scene graph is matched against the user's reaction subgraphs recorded in the personalized interaction-reaction DB 53.
 If no reaction subgraphs for the user are recorded in the personalized interaction-reaction DB 53, the current scene graph is matched against the meta-ized reaction subgraphs recorded in the meta-ized interaction-reaction DB 52.
 The matching between the current scene graph and each reaction subgraph is performed by calculating the distance between the two graphs and evaluating the degree to which their components are common. For example, a reaction subgraph that contains common components and whose graph distance to the current scene graph is the smallest is used to select the action that becomes the reaction.
 Instead of the reaction subgraph whose distance to the current scene graph is the smallest, a reaction subgraph whose distance is smaller than a threshold Eth may be used to select the action that becomes the reaction.
 Based on a reaction subgraph that contains components common to the current scene graph, the interpersonal relationship (O, V, S), in which the object O becomes the subject, recorded as the reaction to the interaction at the past time represented by the interpersonal relationship (S, V, O), is extracted.
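 The distance-based selection in Step 12 can be illustrated, for example, by the following sketch. Graph edit distance is used here as one possible graph distance; the embodiment does not prescribe a specific metric, and the parameter `eth` corresponds to the threshold Eth mentioned above.

import networkx as nx

def select_reaction_subgraph(current_graph, reaction_subgraphs, eth=None):
    """Return the recorded reaction subgraph closest to the current scene graph,
    or the first one whose distance falls below the threshold Eth if given."""
    best, best_dist = None, float("inf")
    for sg in reaction_subgraphs:
        dist = nx.graph_edit_distance(current_graph, sg, timeout=1.0)
        if dist is None:
            continue                              # no distance obtained within the timeout
        if eth is not None and dist < eth:
            return sg                             # close enough: use it directly
        if dist < best_dist:
            best, best_dist = sg, dist
    return best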
 FIG. 14 is a diagram showing an example of a user interaction.
 As shown in FIG. 14, it is assumed that person A, who is the user, is recognized as having waved at the virtual character C at time t. Which action the virtual character C should perform as a reaction at the current time t + Δt is selected based on the result of matching the current scene graph against the reaction subgraphs.
 FIG. 15 is a diagram showing an example of matching the current scene graph against a reaction subgraph.
 As shown surrounded by broken line #11 on the left side of FIG. 15, the current scene graph includes a subgraph representing the interpersonal relationship (person A, wave, virtual character C) at time t.
 For example, the distance to each of the reaction subgraphs recorded in the meta-ized interaction-reaction DB 52 is calculated, and a reaction subgraph containing the subgraph generated when a certain person waved at another person, as shown on the right side of FIG. 15, is selected. As shown surrounded by broken line #12, the reaction subgraph shown on the right side of FIG. 15 includes a subgraph representing the interpersonal relationship (person, wave, person). FIG. 15 shows an example in which the action that becomes the reaction is selected using a reaction subgraph recorded in the meta-ized interaction-reaction DB 52.
 For convenience of explanation, the person (the former person) who is the subject S in the interpersonal relationship (person, wave, person) as the interaction, shown surrounded by broken line #12, is referred to as person a, and the person (the latter person) who is the object O is referred to as person b. The subgraph surrounded by broken line #12 is thus a subgraph representing the interpersonal relationship (person a, wave, person b). Person a corresponds to the user (person A), and person b corresponds to the virtual character.
 As the reaction at time t + Δt to the interpersonal relationship (person a, wave, person b), the interpersonal relationship (person b, wave, person a), which is the interpersonal relationship (O, V, S) in which person b, the object O, becomes the subject, is extracted from the reaction subgraph.
 As indicated by arrow #13, the interpersonal relationship (person b, wave, person a) extracted from the reaction subgraph is applied as the interpersonal relationship between the virtual character C and person A (the user) at the current time t + Δt in the current scene graph. The edge E51 connecting the node of the virtual character C and the node of person A at the current time t + Δt represents the interpersonal relationship (virtual character C, wave, person A).
 The action of the virtual character C waving at person A, represented by the interpersonal relationship (virtual character C, wave, person A), is selected as the reaction action of the virtual character C.
 Similar processing is performed when the action that becomes the reaction is selected using a reaction subgraph recorded in the personalized interaction-reaction DB 53.
 In this case, a reaction subgraph whose graph at time t + Δt includes a component representing that the person corresponding to the virtual character C performed a certain action on person A, and whose graph at time t includes a component representing that person A performed the action serving as the interaction on the person corresponding to the virtual character, is acquired from the personalized interaction-reaction DB 53 as the result of the evaluation against the current scene graph. The person corresponding to the virtual character C is a person who, together with person A, was a measurement target in the real space in the processing of the reaction learning phase.
 Based on the reaction subgraph acquired as the evaluation result, for example, the same action as the action that the person corresponding to the virtual character C performed on person A is selected as the reaction action of the virtual character C.
 In this way, because the optimal reaction subgraph is selected based on the distance between the graphs, an appropriate reaction can be presented even when the components of the current scene graph and the reaction subgraph do not completely match.
<Step 13: Reaction presentation>
 The reaction presentation processing reads the motion information of the reaction corresponding to the matched reaction subgraph from the reaction motion DB 51 and presents it as the action of the virtual character C.
 By causing the virtual character C to perform the same action as the action indicated by the motion information read from the reaction motion DB 51, the reaction to the interaction the user performed toward the virtual character C is presented to the user, as shown in FIG. 16. In the example of FIG. 16, the virtual character C is shown waving toward the user.
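 Step 13 can be illustrated, for example, by the following sketch, continuing the stand-in DB of the earlier sketches; `character.play_motion` is an assumed interface and not an actual API of the embodiment.

def present_reaction(matched_record_id: str, character, reaction_motion_db: dict):
    """Look up the motion clip linked to the matched reaction subgraph by its
    shared ID and play it back as the virtual character's reaction."""
    frames = reaction_motion_db[matched_record_id]   # skeletal time-series data
    character.play_motion(frames)                    # drive the avatar with the clip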
 Motion generation is described in, for example, Document 3.
 Document 4: [Starke+, SIGGRAPH Asia 2019] S. Starke, H. Zhang, T. Komura and J. Saito, "Neural State Machine for Character-Scene Interactions", SIGGRAPH Asia, 2019.
 This makes it possible to cause the virtual character C to perform a natural action as a reaction based on the information constituting the reaction subgraph.
<<Configuration of each device>>
<Configuration of the information processing device 12>
 FIG. 17 is a block diagram showing an example of the functional configuration of the information processing device 12. At least some of the functional units shown in FIG. 17 are realized by a CPU of the computer constituting the information processing device 12 executing a predetermined program.
 As shown in FIG. 17, a reaction learning processing unit 101 is realized in the information processing device 12. The reaction learning processing unit 101 includes a recognition unit 111, a reaction detection unit 112, and a recording control unit 113.
 The recognition unit 111 includes a spatial recognition unit 121, a person attribute recognition unit 122, and an interaction recognition unit 123.
 The spatial recognition unit 121 performs the spatial recognition processing (Step 1-1 in FIG. 3) based on the measurement data supplied from the measurement device 11 and generates the spatial information.
 The person attribute recognition unit 122 performs the person attribute recognition processing (Step 1-2 in FIG. 3) based on the measurement data supplied from the measurement device 11 and generates the person attribute information.
 The interaction recognition unit 123 performs the interaction recognition processing (Step 1-3 in FIG. 3) based on the measurement data supplied from the measurement device 11 and generates the relationship information.
 The spatial information generated by the spatial recognition unit 121, the person attribute information generated by the person attribute recognition unit 122, and the relationship information generated by the interaction recognition unit 123 are supplied to the reaction detection unit 112.
 The reaction detection unit 112 performs the reaction detection processing (Step 2 in FIG. 3, FIG. 7) based on the information supplied from each unit of the recognition unit 111.
 The reaction detection unit 112 includes a spatio-temporal scene graph generation unit 112A. The spatio-temporal scene graph generation unit 112A generates the spatio-temporal scene graph based on the information supplied from the recognition unit 111.
 The reaction detection unit 112 outputs, to the recording control unit 113, information of the spatio-temporal scene graph including the graph representing the interpersonal relationship determined to be a reaction.
 The recording control unit 113 performs the reaction subgraph & motion recording processing (Step 3 in FIG. 3, FIG. 10) based on the information supplied from the reaction detection unit 112, and records the motion information of the action determined to be a reaction in the reaction motion DB 51.
 The recording control unit 113 also records the reaction subgraph extracted from the spatio-temporal scene graph supplied from the reaction detection unit 112 in the personalized interaction-reaction DB 53.
 In addition, the recording control unit 113 meta-izes the reaction subgraph extracted from the spatio-temporal scene graph supplied from the reaction detection unit 112 and records the result in the meta-ized interaction-reaction DB 52.
 The reaction motion DB 51, the meta-ized interaction-reaction DB 52, and the personalized interaction-reaction DB 53 are constructed in a storage unit such as an HDD of the computer constituting the information processing device 12. The information in the reaction motion DB 51, the meta-ized interaction-reaction DB 52, and the personalized interaction-reaction DB 53 is provided to the information processing device 22.
<Configuration of the information processing device 22>
 FIG. 18 is a block diagram showing an example of the functional configuration of the information processing device 22. At least some of the functional units shown in FIG. 18 are realized by a CPU of the computer constituting the information processing device 22 executing a predetermined program.
 As shown in FIG. 18, a reaction presentation processing unit 151 is realized in the information processing device 22. The reaction presentation processing unit 151 includes a recognition unit 161, a matching unit 162, and a presentation unit 163.
 The recognition unit 161 includes a spatial recognition unit 171, a person attribute recognition unit 172, and an interaction recognition unit 173.
 The spatial recognition unit 171 performs the spatial recognition processing (Step 11-1 in FIG. 13) based on the measurement data supplied from the measurement device 21 and generates the spatial information.
 The person attribute recognition unit 172 performs the person attribute recognition processing (Step 11-2 in FIG. 13) based on the measurement data supplied from the measurement device 21 and generates the person attribute information.
 The interaction recognition unit 173 performs the interaction recognition processing (Step 11-3 in FIG. 13) based on the measurement data supplied from the measurement device 21 and generates the relationship information.
 The spatial information generated by the spatial recognition unit 171, the person attribute information generated by the person attribute recognition unit 172, and the relationship information generated by the interaction recognition unit 173 are supplied to the matching unit 162.
 The matching unit 162 performs the processing of matching the current scene graph against the reaction subgraphs in the DBs (Step 12 in FIG. 13, FIG. 7) based on the information supplied from each unit of the recognition unit 161.
 The matching unit 162 includes a spatio-temporal scene graph generation unit 162A. The spatio-temporal scene graph generation unit 162A generates, based on the information supplied from the recognition unit 161, the current scene graph, which is a spatio-temporal scene graph containing the user's nodes and the virtual character's nodes as components.
 The matching unit 162 matches the current scene graph against each of the reaction subgraphs recorded in the meta-ized interaction-reaction DB 52 or the personalized interaction-reaction DB 53.
 The matching unit 162 selects the action that becomes the reaction of the virtual character based on the reaction subgraph that matches the current scene graph. The matching unit 162 thus functions as a selection unit that selects the action serving as the reaction of the virtual character based on the current scene graph, which is a spatio-temporal scene graph, and the reaction subgraphs. Information on the action selected by the matching unit 162 as the reaction is supplied to the presentation unit 163.
 The presentation unit 163 generates data for displaying the virtual character C by playing back the AR content and performing rendering and the like. The presentation unit 163 transmits the display data to the AR display device 1 and causes the AR display device 1 to display the virtual character C. The presentation unit 163 also reads the motion information of the reaction selected by the matching unit 162 from the reaction motion DB 51 and causes it to be presented as the action of the virtual character C.
<<Modified examples>>
<Configuration example of the information processing system>
 FIG. 19 is a block diagram showing a configuration example of the information processing system.
 In the above description, the reaction learning processing unit 101 and the reaction presentation processing unit 151 are realized in different devices, but they may be realized in a single device, an information processing device 201, as shown in A of FIG. 19.
 The information processing device 201 performs the processing of the reaction learning phase in the reaction learning processing unit 101 and the processing of the reaction presentation phase in the reaction presentation processing unit 151. The reaction presentation processing unit 151 of the information processing device 201 performs the processing of the reaction presentation phase and causes the AR display device 1 to display the virtual character C.
 Alternatively, as shown in B of FIG. 19, the reaction learning processing unit 101 and the reaction presentation processing unit 151 may be realized in the AR display device 1.
 In this case, the reaction presentation processing unit 151 of the AR display device 1 performs the processing of the reaction presentation phase and displays the virtual character C on the display unit 211. The display unit 211 includes a display or the like that displays the virtual character C.
 Instead of an optical see-through HMD, a video see-through HMD may be used as the display device for the virtual character. A mobile terminal such as a smartphone or a tablet terminal may also be used as the display device for the virtual character.
<Others>
 The description has mainly covered the case where the virtual character is caused to perform a waving motion as a reaction, but the virtual character can also be caused to perform various motions other than waving. The motions performed by the virtual character include not only motions the virtual character performs alone, such as walking and running, but also motions using objects in the real space, such as sitting on a chair or sitting on the floor, as well as motions directed at the user, such as talking to the user.
 The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed from a program recording medium into a computer incorporated in dedicated hardware, a general-purpose personal computer, or the like.
 FIG. 20 is a block diagram showing an example of the hardware configuration of a computer that executes the series of processes described above using a program. The computers functioning as the information processing device 12 and the information processing device 22 have a configuration similar to that shown in FIG. 20.
 A CPU (Central Processing Unit) 1001, a ROM (Read Only Memory) 1002, and a RAM (Random Access Memory) 1003 are interconnected by a bus 1004.
 An input/output interface 1005 is further connected to the bus 1004. An input unit 1006 including a keyboard, a mouse, and the like, and an output unit 1007 including a display, speakers, and the like are connected to the input/output interface 1005. Also connected to the input/output interface 1005 are a storage unit 1008 including a hard disk, a non-volatile memory, or the like, a communication unit 1009 including a network interface or the like, and a drive 1010 that drives a removable medium 1011.
 The reaction motion DB 51, the meta-ized interaction-reaction DB 52, and the personalized interaction-reaction DB 53 are constructed in the storage unit 1008.
 In the computer configured as described above, the CPU 1001 realizes the reaction learning processing unit 101 and the reaction presentation processing unit 151 by, for example, loading the program stored in the storage unit 1008 into the RAM 1003 via the input/output interface 1005 and the bus 1004 and executing it.
 The program executed by the CPU 1001 is provided, for example, recorded on the removable medium 1011 or via a wired or wireless transmission medium such as a local area network, the Internet, or digital broadcasting, and is installed in the storage unit 1008.
 The program executed by the computer may be a program whose processing is performed chronologically in the order described in this specification, or a program whose processing is performed in parallel or at necessary timing, such as when a call is made.
 In this specification, a system means a collection of a plurality of components (devices, modules (parts), and the like), and it does not matter whether all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a single device in which a plurality of modules are housed in one housing, are both systems.
 The effects described in this specification are merely examples and are not limiting, and other effects may exist.
 The embodiments of the present technology are not limited to the embodiments described above, and various modifications can be made without departing from the gist of the present technology.
 For example, the present technology can adopt a cloud computing configuration in which one function is shared and processed jointly by a plurality of devices via a network.
 Each step described in the above flowcharts can be executed by one device or shared and executed by a plurality of devices.
 Furthermore, when one step includes a plurality of processes, the plurality of processes included in that one step can be executed by one device or shared and executed by a plurality of devices.
<Examples of combinations of configurations>
 The present technology can also have the following configurations.
(1)
Based on the measurement results by the sensor including the first person and the second person as measurement targets,
A plurality of previous nodes each representing the first person and the second person at a first time are first motion edges representing a first motion relationship between the first person and the second person. connected,
A plurality of subsequent nodes each representing the first person and the second person at a second time after the first time define a second operational relationship between the first person and the second person. connected by a second motion edge representing
The previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time,
The information processing apparatus includes a generation unit that generates a scene graph in which the previous node of the second person and the subsequent node of the second person are connected by a second temporal relation edge representing the passage of time.
(2)
further comprising a recognition unit that recognizes the first operational relationship and the second operational relationship based on the measurement results,
The generation unit is configured to generate, when a second graph including the plurality of subsequent nodes and the second action edge changes with respect to a first graph including the plurality of previous nodes and the first action edge. The information processing device according to (1) above, which generates the scene graph.
(3)
The recognition unit further recognizes a real space context in which at least one of the first person and the second person is present,
The generation unit generates, based on the context of the real space, an object node representing an object in the real space, and an object representing a relationship between the object and at least one of the first person and the second person. The information processing device according to (2) above, which generates the scene graph including edges.
(4)
The recognition unit further recognizes an attribute of at least one of the first person and the second person,
(2) The generation unit generates the scene graph including an attribute node representing the attribute and an attribute edge representing a relationship between the attribute and at least one of the first person and the second person. Or the information processing device according to (3).
(5)
The generation unit is configured such that in the first graph, the first motion edge represents that the first person performed an interaction motion with respect to the second person, and in the second graph, the first motion edge represents the second motion edge in the second graph. If the motion edge indicates that the second person has performed a motion toward the first person, the motion of the second person toward the first person is detected as a reaction motion. The information processing device according to any one of (4).
(6)
The method further includes a recording control unit that records graph information including a portion of the second graph including a component representing the reaction behavior and a portion of the first graph including a component representing the interaction behavior. 5) The information processing device according to item 5).
(7)
The information processing device according to (6), wherein the recording control unit records the graph information in association with motion information of the reaction action.
(8)
The recording control unit records the graph information as information for selecting an action of the virtual character when at least one of the first person and the second person receives a presentation of the virtual character as a user. The information processing device according to (6) or (7).
(9)
The graph information is information that abstracts and represents the content represented by the component node,
(6) The recording control unit records the graph information as information for selecting an action of the virtual character when a person other than the measurement target receives a presentation of the virtual character as a user. Or the information processing device according to (7).
(10)
The information processing device
Based on the measurement results by the sensor including the first person and the second person as measurement targets,
A plurality of previous nodes each representing the first person and the second person at a first time are first motion edges representing a first motion relationship between the first person and the second person. connected,
A plurality of subsequent nodes each representing the first person and the second person at a second time after the first time define a second operational relationship between the first person and the second person. connected by a second motion edge representing
The previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time,
An information processing method, comprising: generating a scene graph in which the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
(11)
to the computer,
Based on the measurement results by the sensor including the first person and the second person as measurement targets,
A plurality of previous nodes each representing the first person and the second person at a first time are first motion edges representing a first motion relationship between the first person and the second person. connected,
A plurality of subsequent nodes each representing the first person and the second person at a second time after the first time define a second operational relationship between the first person and the second person. connected by a second motion edge representing
The previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time,
A record of a program that executes a process that generates a scene graph in which the previous node of the second person and the subsequent node of the second person are connected by a second temporal relation edge representing the passage of time. Medium.
(12)
a recognition unit that recognizes the user's interaction motion with respect to the virtual character based on the measurement result of the user's motion by the sensor;
a selection unit that selects a reaction action to the interaction action of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets;
a presentation unit that presents the virtual character performing the reaction action to the user;
The known scene graph is
a plurality of destination nodes each representing the first person and the second person at a first time;
a first motion edge connecting the plurality of destination nodes and representing a first motion relationship between the first person and the second person;
a plurality of subsequent nodes each representing the first person and the second person at a second time after the first time;
a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person;
a first temporal relationship edge that connects the previous node of the first person and the subsequent node of the first person and represents the passage of time;
a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time;
Information processing device.
(13)
A past graph in which a plurality of past nodes each representing the user and the virtual character at the past time are connected by a past edge representing the operational relationship between the user and the virtual character, and a past graph representing the user and the virtual character at the current time, respectively. A current graph in which a plurality of current nodes are connected by current edges representing a motion relationship between the user and the virtual character, and between the past node of the user and the current node of the user, and between the past node of the virtual character. further comprising a generation unit that generates a current scene graph including a plurality of time-related edges that connect the current nodes of the virtual character and represent the passage of time;
The information processing device according to (12), wherein the selection unit selects the reaction action based on the known scene graph that includes components common to the current scene graph.
(14)
The selection unit selects the known scene graph from the known scene graph indicating that an interaction movement from the first person corresponding to the user to the second person corresponding to the virtual character corresponds to the interaction movement of the user. Based on the graph at the first time, an action performed by the second person toward the first person in the graph at the second time of the known scene graph is selected as the reaction action. ).
(15)
the first person in the known scene graph is the user;
The information processing device according to (14) above.
(16)
the first person in the known scene graph is a different person from the user;
The information processing device according to (14) above.
(17)
An information processing method comprising, by an information processing device:
recognizing an interaction motion of a user with respect to a virtual character based on a measurement result of the user's motion by a sensor;
selecting a reaction action for the interaction motion of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets; and
presenting the virtual character performing the reaction action to the user,
wherein the known scene graph includes:
a plurality of previous nodes respectively representing the first person and the second person at a first time;
a first motion edge connecting the plurality of previous nodes and representing a first motion relationship between the first person and the second person;
a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time;
a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person;
a first temporal relationship edge connecting the previous node of the first person and the subsequent node of the first person and representing the passage of time; and
a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time.
(18)
A recording medium recording a program for causing a computer to execute processing of:
recognizing an interaction motion of a user with respect to a virtual character based on a measurement result of the user's motion by a sensor;
selecting a reaction action for the interaction motion of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets; and
presenting the virtual character performing the reaction action to the user,
wherein the known scene graph includes:
a plurality of previous nodes respectively representing the first person and the second person at a first time;
a first motion edge connecting the plurality of previous nodes and representing a first motion relationship between the first person and the second person;
a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time;
a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person;
a first temporal relationship edge connecting the previous node of the first person and the subsequent node of the first person and representing the passage of time; and
a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time.
1 AR display device, 11 measurement device, 12 information processing device, 21 measurement device, 22 information processing device, 23 input device, 51 reaction motion DB, 52 meta-ized interaction-reaction DB, 53 personalized interaction-reaction DB, 101 reaction learning processing unit, 111 recognition unit, 112 reaction detection unit, 113 recording control unit, 151 reaction presentation processing unit, 161 recognition unit, 162 matching unit, 163 presentation unit
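
The scene graph recited in items (11) through (18) above can be pictured concretely. The following Python sketch is purely illustrative and non-limiting: it shows one possible in-memory representation of a known scene graph, with person nodes at a first and a second time, motion edges between the persons, and temporal relationship edges linking each person's earlier and later nodes. All class, field, and relation names (PersonNode, MotionEdge, "greets", and so on) are hypothetical and do not appear in this publication.

from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PersonNode:
    person_id: str   # e.g., "first_person" or "second_person"
    time: int        # 1 = first time, 2 = second time


@dataclass(frozen=True)
class MotionEdge:
    source: PersonNode   # person performing the motion
    target: PersonNode   # person toward whom the motion is directed
    relation: str        # motion relationship, e.g., "greets"


@dataclass(frozen=True)
class TemporalEdge:
    earlier: PersonNode  # node of a person at the first time
    later: PersonNode    # node of the same person at the second time


@dataclass
class SceneGraph:
    nodes: List[PersonNode] = field(default_factory=list)
    motion_edges: List[MotionEdge] = field(default_factory=list)
    temporal_edges: List[TemporalEdge] = field(default_factory=list)


def build_known_scene_graph() -> SceneGraph:
    """One known scene graph: the first person greets the second person
    at the first time, and the second person waves back at the second time."""
    a1 = PersonNode("first_person", time=1)
    b1 = PersonNode("second_person", time=1)
    a2 = PersonNode("first_person", time=2)
    b2 = PersonNode("second_person", time=2)
    return SceneGraph(
        nodes=[a1, b1, a2, b2],
        motion_edges=[
            MotionEdge(a1, b1, "greets"),      # first motion edge (interaction)
            MotionEdge(b2, a2, "waves_back"),  # second motion edge (reaction)
        ],
        temporal_edges=[
            TemporalEdge(a1, a2),  # first temporal relationship edge
            TemporalEdge(b1, b2),  # second temporal relationship edge
        ],
    )

Frozen dataclasses are chosen here only so that person nodes can serve as hashable graph keys; the publication does not prescribe any particular data layout.
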

Claims (18)

  1.  An information processing device comprising a generation unit that generates, based on measurement results by a sensor including a first person and a second person as measurement targets, a scene graph in which:
      a plurality of previous nodes respectively representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person;
      a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person;
      the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time; and
      the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
  2.  The information processing device according to claim 1, further comprising a recognition unit that recognizes the first motion relationship and the second motion relationship based on the measurement results,
      wherein the generation unit generates the scene graph when a second graph including the plurality of subsequent nodes and the second motion edge has changed with respect to a first graph including the plurality of previous nodes and the first motion edge.
  3.  The information processing device according to claim 2, wherein the recognition unit further recognizes a context of a real space in which at least one of the first person and the second person is present, and
      the generation unit generates, based on the context of the real space, the scene graph including an object node representing an object in the real space and an object edge representing a relationship between the object and at least one of the first person and the second person.
  4.  The information processing device according to claim 2, wherein the recognition unit further recognizes an attribute of at least one of the first person and the second person, and
      the generation unit generates the scene graph including an attribute node representing the attribute and an attribute edge representing a relationship between the attribute and at least one of the first person and the second person.
  5.  The information processing device according to claim 2, wherein, when the first motion edge in the first graph represents that the first person has performed an interaction motion toward the second person and the second motion edge in the second graph represents that the second person has performed a motion toward the first person, the generation unit detects the motion of the second person toward the first person as a reaction action.
  6.  The information processing device according to claim 5, further comprising a recording control unit that records graph information composed of a portion of the second graph including a component representing the reaction action and a portion of the first graph including a component representing the interaction motion.
  7.  The information processing device according to claim 6, wherein the recording control unit records the graph information in association with motion information of the reaction action.
  8.  The information processing device according to claim 6, wherein the recording control unit records the graph information as information for selecting a motion of a virtual character when at least one of the first person and the second person receives, as a user, presentation of the virtual character.
  9.  The information processing device according to claim 6, wherein the graph information is information representing, in abstracted form, content represented by nodes of its components, and
      the recording control unit records the graph information as information for selecting a motion of a virtual character when a person other than the measurement targets receives, as a user, presentation of the virtual character.
  10.  An information processing method comprising generating, by an information processing device, based on measurement results by a sensor including a first person and a second person as measurement targets, a scene graph in which:
      a plurality of previous nodes respectively representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person;
      a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person;
      the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time; and
      the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
  11.  A recording medium recording a program for causing a computer to execute processing of generating, based on measurement results by a sensor including a first person and a second person as measurement targets, a scene graph in which:
      a plurality of previous nodes respectively representing the first person and the second person at a first time are connected by a first motion edge representing a first motion relationship between the first person and the second person;
      a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time are connected by a second motion edge representing a second motion relationship between the first person and the second person;
      the previous node of the first person and the subsequent node of the first person are connected by a first temporal relationship edge representing the passage of time; and
      the previous node of the second person and the subsequent node of the second person are connected by a second temporal relationship edge representing the passage of time.
  12.  An information processing device comprising:
      a recognition unit that recognizes an interaction motion of a user with respect to a virtual character based on a measurement result of the user's motion by a sensor;
      a selection unit that selects a reaction action for the interaction motion of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets; and
      a presentation unit that presents the virtual character performing the reaction action to the user,
      wherein the known scene graph includes:
      a plurality of previous nodes respectively representing the first person and the second person at a first time;
      a first motion edge connecting the plurality of previous nodes and representing a first motion relationship between the first person and the second person;
      a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time;
      a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person;
      a first temporal relationship edge connecting the previous node of the first person and the subsequent node of the first person and representing the passage of time; and
      a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time.
  13.  The information processing device according to claim 12, further comprising a generation unit that generates a current scene graph including:
      a past graph in which a plurality of past nodes respectively representing the user and the virtual character at a past time are connected by a past edge representing a motion relationship between the user and the virtual character;
      a current graph in which a plurality of current nodes respectively representing the user and the virtual character at a current time are connected by a current edge representing a motion relationship between the user and the virtual character; and
      a plurality of temporal relationship edges that connect the past node of the user with the current node of the user and the past node of the virtual character with the current node of the virtual character and that represent the passage of time,
      wherein the selection unit selects the reaction action based on the known scene graph including components in common with the current scene graph.
  14.  The information processing device according to claim 13, wherein the selection unit selects, as the reaction action, a motion performed by the second person toward the first person in the graph at the second time of the known scene graph, based on the graph at the first time of the known scene graph indicating that an interaction motion from the first person corresponding to the user to the second person corresponding to the virtual character corresponds to the interaction motion of the user.
  15.  The information processing device according to claim 14, wherein the first person in the known scene graph is the user.
  16.  The information processing device according to claim 14, wherein the first person in the known scene graph is a person different from the user.
  17.  An information processing method comprising, by an information processing device:
      recognizing an interaction motion of a user with respect to a virtual character based on a measurement result of the user's motion by a sensor;
      selecting a reaction action for the interaction motion of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets; and
      presenting the virtual character performing the reaction action to the user,
      wherein the known scene graph includes:
      a plurality of previous nodes respectively representing the first person and the second person at a first time;
      a first motion edge connecting the plurality of previous nodes and representing a first motion relationship between the first person and the second person;
      a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time;
      a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person;
      a first temporal relationship edge connecting the previous node of the first person and the subsequent node of the first person and representing the passage of time; and
      a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time.
  18.  A recording medium recording a program for causing a computer to execute processing of:
      recognizing an interaction motion of a user with respect to a virtual character based on a measurement result of the user's motion by a sensor;
      selecting a reaction action for the interaction motion of the user based on a known scene graph generated based on measurement results by a sensor including a first person and a second person as measurement targets; and
      presenting the virtual character performing the reaction action to the user,
      wherein the known scene graph includes:
      a plurality of previous nodes respectively representing the first person and the second person at a first time;
      a first motion edge connecting the plurality of previous nodes and representing a first motion relationship between the first person and the second person;
      a plurality of subsequent nodes respectively representing the first person and the second person at a second time after the first time;
      a second motion edge connecting the plurality of subsequent nodes and representing a second motion relationship between the first person and the second person;
      a first temporal relationship edge connecting the previous node of the first person and the subsequent node of the first person and representing the passage of time; and
      a second temporal relationship edge connecting the previous node of the second person and the subsequent node of the second person and representing the passage of time.
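
As a purely illustrative, non-limiting companion to claims 12 to 14, the following self-contained Python sketch shows one way a selection unit could compare the user's recognized interaction motion with the first-time portion of each known scene graph and return the second person's second-time motion as the reaction action. The tuple layout, the function name select_reaction, and the relation labels are hypothetical and are not specified in this publication.

from typing import List, Optional, Tuple

# A motion edge is written here as (source_person, target_person, time, relation).
MotionEdge = Tuple[str, str, int, str]


def select_reaction(known_graphs: List[List[MotionEdge]],
                    user_interaction: str) -> Optional[str]:
    """Returns the second person's second-time motion from the first known
    scene graph whose first-time interaction matches the user's interaction."""
    for edges in known_graphs:
        first_time = [e for e in edges if e[2] == 1]
        second_time = [e for e in edges if e[2] == 2]
        # The first-person -> second-person edge at the first time must
        # correspond to the user's recognized interaction motion (claim 14).
        if any(src == "first_person" and dst == "second_person"
               and rel == user_interaction
               for src, dst, _, rel in first_time):
            # The second person's motion toward the first person at the
            # second time is selected as the reaction action.
            for src, dst, _, rel in second_time:
                if src == "second_person" and dst == "first_person":
                    return rel
    return None


# Usage: a user who greets the virtual character receives the learned reaction.
known = [[("first_person", "second_person", 1, "greets"),
          ("second_person", "first_person", 2, "waves_back")]]
print(select_reaction(known, "greets"))  # -> waves_back
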
PCT/JP2023/022697 2022-07-04 2023-06-20 Information processing device, information processing method, and recording medium WO2024009748A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022107563 2022-07-04
JP2022-107563 2022-07-04

Publications (1)

Publication Number Publication Date
WO2024009748A1 true WO2024009748A1 (en) 2024-01-11

Family

ID=89453299

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/022697 WO2024009748A1 (en) 2022-07-04 2023-06-20 Information processing device, information processing method, and recording medium

Country Status (1)

Country Link
WO (1) WO2024009748A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013210868A (en) * 2012-03-30 2013-10-10 Sony Corp Information processing apparatus, information processing method and computer program
WO2021045082A1 (en) * 2019-09-05 2021-03-11 国立大学法人東京工業大学 Method for expressing action of virtual character on screen
WO2021095704A1 (en) * 2019-11-15 2021-05-20 ソニー株式会社 Information processing device, information processing method, and program
WO2022113439A1 (en) * 2020-11-30 2022-06-02 株式会社日立製作所 Data analysis device and data analysis method

Similar Documents

Publication Publication Date Title
US20230105027A1 (en) Adapting a virtual reality experience for a user based on a mood improvement score
CN109740466B (en) Method for acquiring advertisement putting strategy and computer readable storage medium
CN108876526B (en) Commodity recommendation method and device and computer-readable storage medium
Joho et al. Looking at the viewer: analysing facial activity to detect personal highlights of multimedia contents
JP4736511B2 (en) Information providing method and information providing apparatus
US20190340649A1 (en) Generating and providing augmented reality representations of recommended products based on style compatibility in relation to real-world surroundings
JP6267861B2 (en) Usage measurement techniques and systems for interactive advertising
TW201301177A (en) Selection of advertisements via viewer feedback
EP2960815A1 (en) System and method for dynamically generating contextualised and personalised digital content
TW201301891A (en) Video highlight identification based on environmental sensing
US10642346B2 (en) Action control method and device
CN116484318B (en) Lecture training feedback method, lecture training feedback device and storage medium
JP6783479B1 (en) Video generation program, video generation device and video generation method
EP2874102A2 (en) Generating models for identifying thumbnail images
US11762900B2 (en) Customized selection of video thumbnails to present on social media webpages
Bouzakraoui et al. Appreciation of customer satisfaction through analysis facial expressions and emotions recognition
US11042749B2 (en) Augmented reality mapping systems and related methods
KR102119518B1 (en) Method and system for recommending product based style space created using artificial intelligence
EP2905678A1 (en) Method and system for displaying content to a user
WO2024009748A1 (en) Information processing device, information processing method, and recording medium
US20200098012A1 (en) Recommendation Method and Reality Presenting Device
EP4113413A1 (en) Automatic purchase of digital wish lists content based on user set thresholds
JP2017130170A (en) Conversation interlocking system, conversation interlocking device, conversation interlocking method, and conversation interlocking program
JP6794740B2 (en) Presentation material generation device, presentation material generation system, computer program and presentation material generation method
Ayush Context aware recommendations embedded in augmented viewpoint to retarget consumers in v-commerce

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23835282

Country of ref document: EP

Kind code of ref document: A1