WO2022102550A1 - Information processing device and information processing method - Google Patents

Information processing device and information processing method Download PDF

Info

Publication number
WO2022102550A1
WO2022102550A1 (PCT/JP2021/040879)
Authority
WO
WIPO (PCT)
Prior art keywords
information
user
information processing
event
unit
Prior art date
Application number
PCT/JP2021/040879
Other languages
French (fr)
Japanese (ja)
Inventor
Akane Kondo
Kenji Sugihara
Hiroshi Iwase
Fumihiko Iida
Original Assignee
Sony Group Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2022102550A1 publication Critical patent/WO2022102550A1/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 — Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 — Services
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis

Definitions

  • This disclosure relates to an information processing device and an information processing method.
  • The excitement of live performances and sporting events, and the effects of their staging, are subjective; there is no effective means of measuring those effects quantitatively, so in practice they are evaluated based on the subjectivity and experience of the person in charge.
  • Event staging follows a static scenario, and it is practically difficult to stage effects that respond immediately to the internal state of the audience.
  • Patent Document 1 discloses estimating the degree of excitement of spectators in real time with seat sensors installed in the audience seats and wearable sensors worn by the spectators, and providing staging content to the audience at effective timing according to the degree of excitement.
  • In Patent Document 1, however, only the degree of excitement is estimated; it is not possible to infer the internal state of each spectator or to stage effects according to the spectators' attributes and tastes.
  • Patent Document 1 is also intended only for spectators who actually visit the event venue, and takes no account of spectators who participate in the event from a remote environment via a network. Recently, the number of users participating in events from remote environments is increasing, and so is the importance of event staging that takes such users into consideration.
  • The present disclosure therefore provides an information processing device and an information processing method capable of grasping the internal state of a user participating in an event and reflecting it in the staging of the event.
  • a feature amount extraction unit that extracts feature amounts based on the user's sensing information
  • a first estimation unit that estimates at least one of the user's attributes and behavior based on the feature amount
  • a clustering unit that classifies the user or a group consisting of a plurality of users into a plurality of clusters based on the estimation by the first estimation unit.
  • An information processing apparatus including an information processing unit that performs predetermined information processing based on at least one of estimation by the first estimation unit and classification by the clustering unit is provided.
  • the sensing information includes an image captured by the image pickup device.
  • The clustering unit may perform the classification into the plurality of clusters based on the analysis result of the captured image.
  • The captured image includes an image of the user.
  • the feature amount extraction unit may extract the feature amount including at least one of the user's face, posture, body movement, and skeletal information.
  • the feature amount extraction unit may extract the feature amount based on at least one of acoustic data, object recognition, and frequency analysis information.
  • the first estimation unit may estimate at least one of the attributes and actions of the user who participates in the event based on the feature amount and the progress information of the event.
  • a tagging unit for adding tag information may be provided in units of the user or the group.
  • the information processing unit may provide information based on the tag information to the user or the group to which the same tag information is given.
  • the information processing unit may provide and exchange information according to at least one of the attributes and actions of the user.
  • a situation image generation unit that generates a situation image in which an identifier indicating the internal state of the user, which is determined based on the sensing information, is added to the image taken by the user.
  • the situation image generation unit may generate the situation image including the progress information of the event in which the user participates and the information regarding at least one of the degree of excitement and the degree of concentration of the user.
  • the first estimation unit estimates an internal state including at least one of the degree of excitement and the degree of concentration of the user based on the sensing information.
  • The clustering unit may perform the classification into the plurality of clusters based on the feature amount and the internal state.
  • The clustering unit may perform the classification into the plurality of clusters based on changes in the internal state according to the progress information of the event.
  • A sensing information acquisition unit may be provided that acquires the sensing information regarding at least one of a user at the event venue and a user who participates in the event from a remote environment.
  • The clustering unit may perform the classification into the plurality of clusters in units of a user who participates in the event from a remote environment, or a group consisting of a plurality of such users.
  • a second estimation unit that estimates an internal state including at least one of the degree of excitement and concentration of users who participate in the event from a remote environment based on the sensing information.
  • a display control unit that adjusts the size of the display area for displaying information about the event based on the internal state estimated by the second estimation unit may be provided.
  • The display control unit may display, on the display unit viewed by the user who participates in the event from the remote environment, an image that enhances the sense of being one with the audience seats of the event venue as at least one of the user's degree of excitement and degree of concentration increases.
  • When a user who participates in the event from a remote environment satisfies a predetermined condition, the display control unit may display at least one of an information providing image and a visual effect image according to the predetermined condition within a range visible to the user.
  • The visual effect image may be a virtual person image of another user who participates at the venue of the event and whose internal state satisfies the predetermined condition.
  • A user who participates in the event from a remote environment may be provided with an information exchange unit that, when the predetermined condition is satisfied, exchanges information with the other person corresponding to the virtual person image via the virtual person image.
  • An information processing method is also provided, comprising: a step of extracting a feature amount based on a user's sensing information; a step of estimating at least one of the user's attributes and behavior based on the feature amount; a step of classifying, based on the estimation, the user or a group consisting of a plurality of users into a plurality of clusters; and a step of performing predetermined information processing based on at least one of the estimation and the classification.
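  • As a rough illustration, the four claimed steps can be sketched as a toy pipeline. All names, feature fields, weights, and thresholds below are hypothetical and not taken from the publication:

```python
from dataclasses import dataclass

# Hypothetical per-spectator feature record (illustrative fields only).
@dataclass
class SpectatorFeatures:
    smile: float          # degree of smiling, 0..1
    wrist_motion: float   # normalized amount of wrist movement, 0..1
    gaze_on_court: float  # fraction of time looking at the court, 0..1

def estimate_excitement(f: SpectatorFeatures) -> float:
    """Estimation step reduced to a toy weighted sum of two features."""
    return min(1.0, 0.5 * f.smile + 0.5 * f.wrist_motion)

def classify_cluster(excitement: float) -> str:
    """Clustering step reduced to fixed thresholds for illustration."""
    if excitement > 0.7:
        return "highly_engaged"
    if excitement > 0.3:
        return "engaged"
    return "passive"

def process(features: list[SpectatorFeatures]) -> dict[str, int]:
    """Predetermined information processing: count spectators per cluster."""
    counts: dict[str, int] = {}
    for f in features:
        c = classify_cluster(estimate_excitement(f))
        counts[c] = counts.get(c, 0) + 1
    return counts
```

The `counts` summary stands in for whatever downstream processing (information distribution, staging control) the embodiment actually performs.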
  • A block diagram showing the schematic configuration of an information processing system provided with an information processing device according to an embodiment of the present disclosure.
  • A flowchart showing the processing operation of the information processing device according to the present embodiment.
  • A diagram showing the types of "meaningful attribute" while a game is interrupted.
  • A diagram showing a processing operation for automatically verifying the tag information of each spectator generated by executing the fan tagging program of FIG. 4.
  • A diagram showing video data including a correct-answer tag.
  • A diagram illustrating main application examples of the information processing device according to the present embodiment.
  • A diagram following FIG. 17G.
  • A diagram showing the similarity between various sports and basketball.
  • FIG. 1 is a block diagram showing a schematic configuration of an information processing system 2 provided with an information processing device 1 according to an embodiment of the present disclosure.
  • The information processing system 2 of FIG. 1 performs various information processing based on video data obtained by photographing the spectators in the stadium 3 with cameras 4 installed at the court (stadium) 3 where a sporting event is held.
  • As the sporting event, an example of watching a basketball game will mainly be described, but the type of sport does not matter.
  • The present embodiment can be widely applied to various events other than sports (for example, live music performances, entertainment events, and the like).
  • The event is not limited to one held at a specific venue such as a stadium, and may be an event delivered by live distribution, described later. The event may be one in which users can participate from a remote environment away from the venue, such as at a live-distribution destination.
  • The information processing system 2 of FIG. 1 includes a plurality of cameras 4 fixed by jigs 4a at positions where the spectators at the court (stadium) 3 can be photographed, a network device 5, a processing server 6, and a database (hereinafter abbreviated as DB) server 7. A device other than those shown in FIG. 1 may be connected to the information processing system 2 according to the present embodiment. It is also assumed that a separate camera photographs the game held on the court (stadium) 3.
  • the network device 5 controls to transmit the video data taken by the plurality of cameras 4 to the processing server 6 via the network.
  • the network may be a public line such as the Internet or a dedicated line. Further, the network may be either wireless or wired.
  • The processing server 6 receives video data captured by the plurality of cameras 4 via the network device 5 and performs various information processing. For example, the processing server 6 performs distortion correction, color correction, and control processing of the cameras 4 to normalize the plurality of video data shot by the plurality of cameras 4, and then performs various information processing.
  • the DB server 7 stores game progress information of the sporting event being held, stats information such as the results of participating athletes, and video data processed by the processing server 6.
  • the processing server 6 and the DB server 7 may be integrated into one server, or at least one of the processing server 6 and the DB server 7 may be divided into two or more servers.
  • the information processing system 2 of FIG. 1 may include a distribution server 9 that distributes information to a mobile terminal 8 or the like possessed by the spectators of the stadium 3.
  • the distribution server 9 controls the distribution information for the spectators transmitted from the processing server 6 to be transmitted to the corresponding spectator's mobile terminal 8 or the like.
  • the spectator's mobile terminal 8 is, for example, a smartphone, a wearable device such as a watch, a penlight possessed by the spectator for supporting an event, or the like.
  • the information processing apparatus 1 includes at least a processing server 6, and may also include a DB server 7, a distribution server 9, and the like.
  • Spectators of sporting events do not always watch the game at the court (stadium) 3; they may watch it on a TV, PC, or mobile terminal 8 at home, or at a public viewing.
  • As wireless networks that communicate large volumes of data at high speed and low cost rapidly become widespread, the number of spectators watching games outside the stadium 3 is expected to increase.
  • Participating in an event from somewhere other than the event venue such as the stadium 3 is referred to as event participation in a remote environment, or online watching (participation).
  • the information processing system 2 of FIG. 1 may include an operator server (not shown) for distributing various information to the event operator's PC or the like.
  • In the case of a sporting event, the operator server generates video data in which identifiers are added to the video data of the spectators so that the spectators' degrees of excitement and concentration can be grasped, and displays it on the operator tool screen described later.
  • the processing server 6 and the distribution server 9 may have the function of the operator server.
  • FIG. 2 is a block diagram showing a schematic configuration of an information processing system 2 provided with an information processing device 1 corresponding to event participation in a remote environment.
  • The information processing system 2 of FIG. 2 shows a system configuration in which a spectator watches a sporting event on a TV 10a or PC 10b at a place other than the court (stadium) 3 (for example, at home), or at a public viewing venue 10c. It is assumed that the TV 10a and the PC 10b used for watching the sporting event are equipped with a camera 4 for photographing the spectator. It is also assumed that a camera 4 for photographing spectators watching the game in public viewing is installed at the public viewing venue 10c.
  • the information processing system 2 of FIG. 2 includes a processing server 6, a DB server 7, and a distribution server 9 that acquire video data from the above-mentioned camera 4 and perform various information processing.
  • the basic processing of these servers is the same as that of each server shown in FIG.
  • The processing server 6 and the distribution server 9 can be devised in various ways to give spectators watching the game in a remote environment the same sense of presence as spectators watching at the stadium 3. Specific examples will be described later.
  • The processing server 6 of FIG. 1 may be integrated with the processing server 6 of FIG. 2; similarly, the DB server 7 of FIG. 1 may be integrated with the DB server 7 of FIG. 2, and the distribution server 9 of FIG. 1 may be integrated with the distribution server 9 of FIG. 2. Further, at least two of the processing server 6, the DB server 7, and the distribution server 9 of FIGS. 1 and 2 may be integrated; conversely, the various information processing may be distributed over and executed by a larger number of servers. That is, the server configurations of FIGS. 1 and 2 are only examples.
  • FIG. 3 is a functional block diagram of the information processing apparatus 1 according to the present embodiment.
  • FIG. 3 mainly shows, as functional blocks, the functions of the processing server 6, the DB server 7, and the distribution server 9 of FIG. 1 or FIG. 2.
  • the information processing device 1 in FIG. 3 includes a sensing information acquisition unit 11, a feature amount extraction unit 12, a first estimation unit 13, a clustering unit 14, and an information processing unit 15.
  • the sensing information acquisition unit 11 acquires sensing information.
  • Sensing information is detection information detected by various sensors.
  • a typical example of sensing information is an image taken by an image pickup device.
  • The sensing information may be video data captured by an image sensor. Note, however, that a captured image from an image pickup device or video data from an image sensor is not necessarily essential sensing information.
  • the sensing information may include acoustic data of an event venue such as a stadium 3. Acoustic data is useful for determining the degree of excitement of an event.
  • the sensing information may include the detection information of the vibration sensor installed in the audience seats of the event venue such as the stadium 3.
  • the sensing information may include detection information such as an acceleration sensor or a gyro sensor built in the mobile terminal 8 possessed by the spectator or the penlight for cheering. As described above, as long as the sensing information can be used to determine the internal state such as the degree of excitement and the degree of concentration of the audience, the specific type thereof does not matter.
  • the sensing information acquisition unit 11 acquires sensing information periodically, irregularly, or continuously from the start of the match to the end of the match.
  • The feature amount extraction unit 12 extracts a feature amount based on the user's sensing information. For example, the feature amount extraction unit 12 extracts a feature amount including at least one of the face, posture, body movement, and skeletal information (also referred to as bone data) of a user (for example, an event participant).
  • When the sensing information includes video data, the feature amount extraction unit 12 analyzes the video data to identify each spectator watching at the stadium 3, and can extract a feature amount according to each identified spectator's degree of smiling, amount of wrist movement, amount of head movement, degree of eye opening, direction of gaze, and the like. The feature amount extraction unit 12 may also extract the feature amount based on at least one of acoustic data, object recognition, and frequency analysis information.
  • the first estimation unit 13 estimates at least one of the user's attributes and actions based on the feature amount extracted by the feature amount extraction unit 12.
  • the specific estimation contents of the first estimation unit 13 will be described later, but in the case of a sporting event, for example, the degree of excitement and concentration of each spectator in the game are estimated.
  • The clustering unit 14 classifies the user, or a group consisting of a plurality of users, into a plurality of clusters based on the estimation by the first estimation unit 13.
  • The type of cluster is arbitrary. For example, in the case of a sporting event, attention may be paid to the cheering style, and spectators with the same cheering style may be assigned to the same cluster. Alternatively, attention may be paid to the degree of interest in the game, and each spectator may be classified into one of a plurality of clusters according to that degree of interest.
  • the information processing unit 15 performs predetermined information processing based on at least one of the estimation by the first estimation unit 13 and the classification by the clustering unit 14.
  • the specific content of information processing performed by the information processing unit 15 is arbitrary.
  • the information processing unit 15 may perform information processing for providing information suitable for each spectator to the clustered spectators.
  • The information processing unit 15 may provide information to spectators watching the game in a remote environment so as to control the degree of presence according to the spectators' degree of excitement.
  • the information processing apparatus 1 may include an event information acquisition unit 16.
  • the event information acquisition unit 16 acquires progress information of an event in which the user participates. For example, in the case of a sporting event, the event information acquisition unit 16 acquires progress information such as who scored a score on which team several minutes after the start of the match.
  • the first estimation unit 13 may estimate at least one of the attributes and actions of the user who participates in the event based on the feature amount and the progress information of the event.
  • the information processing apparatus 1 may include a tagging unit 17.
  • the tagging unit 17 adds tag information in units of users or groups participating in the event based on the estimation by the first estimation unit 13.
  • The tag information may be information that identifies a home fan who supports the home team, an away fan who supports the away team, or a beginner at watching games.
  • The tagging unit 17 tags spectators who get excited when the home team scores as home fans, and spectators who get excited when the away team scores as away fans. Spectators who do not get excited no matter which team scores may be tagged as beginners.
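  • The home-fan / away-fan / beginner rule described above can be sketched as a simple threshold function. The inputs (per-spectator average excitement at each team's scoring events) and the 0.5 threshold are illustrative assumptions, not values from the publication:

```python
def tag_spectator(home_score_excitement: float,
                  away_score_excitement: float,
                  threshold: float = 0.5) -> str:
    """Assign a fan tag from excitement measured at scoring events.

    Both arguments are hypothetical averages (0..1) of the estimated
    excitement of one spectator when the home team / away team scored.
    """
    if home_score_excitement >= threshold and home_score_excitement > away_score_excitement:
        return "home_fan"
    if away_score_excitement >= threshold and away_score_excitement > home_score_excitement:
        return "away_fan"
    # Spectators who do not get excited for either team's scores.
    return "beginner"
```

A real system would derive these excitement values from the estimation unit and the event progress information rather than take them as inputs.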
  • the information processing unit 15 may provide information based on the tag information to the user or group to which the same tag information is given. Further, the information processing unit 15 may provide at least one of information provision and information exchange according to at least one of the user's attributes and actions.
  • Information provision is, for example, the provision of information related to an event and which the user may be interested in.
  • Information exchange is, for example, conversation with other spectators participating in the event.
  • the function of the information processing unit 15 can be built in, for example, the distribution server 9 of FIG.
  • The distribution server 9 transmits various information related to the tag information to the tagged spectator's mobile terminal 8 or the like based on instructions from the processing server 6. As an example, spectators tagged as beginners may be provided with information about the rules of the match, and spectators rooting for the team that scored may be provided with stats information for the players who scored.
  • the information processing apparatus 1 may include a situation image generation unit 18.
  • the situation image generation unit 18 generates a situation image in which an identifier indicating the internal state of the user is added to the image of the user in the video data obtained by shooting the venue of the event.
  • the situation image is used as an image for the operator for the operator to confirm the event situation.
  • the situation image generation unit 18 may generate a situation image (operator image) including information on the progress of the event and information on at least one of the degree of excitement and the degree of concentration of the user.
  • a specific example of the situation image (image for the operator) will be described later.
  • The operator who stages the event can use the situation image (operator image) to confirm whether the staging produced the intended effect.
  • the internal state of the user is, for example, the degree of excitement and the degree of concentration of the user.
  • The identifier indicating the internal state is, for example, a circle whose diameter is sized according to the user's degree of excitement; this circle may be superimposed on the user's face image.
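  • As a rough sketch of such an identifier, assuming the frame is a NumPy image array and the face position is already known, a disc whose radius scales with the estimated excitement could be blended over the frame. The color, opacity, and maximum radius below are arbitrary choices, not from the publication:

```python
import numpy as np

def overlay_excitement_circle(frame: np.ndarray, center: tuple[int, int],
                              excitement: float, max_radius: int = 40) -> np.ndarray:
    """Superimpose a translucent disc whose radius scales with excitement.

    frame: H x W x 3 uint8 image; center: (row, col) of the face;
    excitement: estimated internal state in 0..1.
    """
    out = frame.copy()
    radius = max(1, int(excitement * max_radius))
    h, w = frame.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2
    # Blend a red disc at 50% opacity over the masked region.
    out[mask] = (0.5 * out[mask] + 0.5 * np.array([255, 0, 0])).astype(np.uint8)
    return out
```

An operator tool would draw one such disc per detected spectator face, so the whole audience's state can be read off at a glance.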
  • the information processing apparatus 1 may include a second estimation unit 19 and a display control unit 20.
  • the second estimation unit 19 estimates the internal state including at least one of the degree of excitement and the degree of concentration of the users who participate in the event from the remote environment based on the sensing information.
  • The display control unit 20 adjusts the size of the display area for displaying information about the event based on the internal state estimated by the second estimation unit 19. Further, the display control unit 20 may display, on the display unit viewed by the user who participates in the event from the remote environment, an image whose degree of similarity to the actual event venue is adjusted according to the internal state. Further, when a user who participates in the event from a remote environment satisfies a predetermined condition, the display control unit 20 may display at least one of an information providing image and a visual effect image according to the predetermined condition within a range visible to the user.
  • the visual effect image may include a virtual person image of another user who is participating in the venue of the event and whose internal state satisfies a predetermined condition.
  • the information processing apparatus 1 may include an information exchange unit 21.
  • The information exchange unit 21 exchanges information with the other person corresponding to a virtual person image (avatar) via that avatar.
  • FIG. 4 is a block diagram showing a software configuration of the information processing apparatus 1 according to the present embodiment.
  • the software configuration of FIG. 4 is executed by the processing server 6, the DB server 7, and the distribution server 9.
  • the software configuration of the information processing apparatus 1 includes a real-time processing 31, a prediction processing 32, and a data distribution processing 33.
  • the real-time processing 31 and the prediction processing 32 are mainly executed by the processing server 6 and the DB server 7.
  • the data distribution process 33 is mainly executed by the distribution server 9.
  • the real-time processing 31 performs processing for extracting features in real time based on sensing information (for example, video data).
  • the real-time processing 31 includes a face recognition execution program 31a, a bone data extraction program 31b, an acoustic data extraction program 31c, a match live data extraction program 31d, and an extraction result storage process 31e.
  • the face recognition execution program 31a performs face recognition processing based on the video data, and estimates the degree of smile of each spectator, the amount of movement of the wrist, the amount of movement of the head, the degree of opening of the eyes, the direction of the line of sight, and the like.
  • The bone data extraction program 31b extracts the bone data of each spectator based on the video data.
  • the acoustic data extraction program 31c extracts the acoustic data of the event venue.
  • the extracted acoustic data contains, for example, information about the direction and volume of sound.
  • the game live data extraction program 31d acquires the game live data, for example, when the game live data is distributed on the official site of the host team.
  • The extraction result storage process 31e performs control to store, as feature amounts in the DB server 7, the output data of the face recognition execution program 31a, the data extracted by the bone data extraction program 31b, the data extracted by the acoustic data extraction program 31c, and the data extracted by the match live data extraction program 31d.
  • the prediction process 32 performs a process of assigning tag information to each spectator, a process of classifying each spectator into a plurality of clusters, and a process of determining the behavior of each spectator. More specifically, the prediction process 32 includes a prediction data creation program 32a, a fan tagging program 32b, a fan clustering program 32c, a fan behavior determination program 32d, and a prediction result storage process 32e.
  • The prediction data creation program 32a is a program that creates the prediction data used by the fan tagging program 32b, the fan clustering program 32c, and the fan behavior determination program 32d.
  • the fan tagging program 32b adds tag information to users who participate in the event (fans in the case of sports events) based on the feature amount obtained by the real-time processing 31.
  • the tag information is, for example, a home fan, an away fan, a beginner, and the like.
  • The fan clustering program 32c classifies users (fans, etc.) who participate in the event into a plurality of clusters based on the feature amounts obtained by the real-time processing 31. As described above, the users may be classified into clusters according to their degree of excitement about the game, according to their degree of concentration on the game, or according to a combination of several conditions.
  • The fan behavior determination program 32d determines the behavior of users (fans, etc.) who participate in the event. For example, it determines whether a spectator is an enthusiastic fan staring at the game, a fan eating, drinking, or conversing without watching the game, a fan who seems bored and looks only at a smartphone, and so on.
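  • A minimal sketch of such a behavior determination, assuming per-spectator observation ratios in 0..1 as inputs; the labels and thresholds are illustrative stand-ins for the publication's actual (presumably learned) model:

```python
def determine_behavior(gaze_on_court: float, phone_use: float,
                       conversation: float) -> str:
    """Map hypothetical per-spectator observation ratios to a behavior label.

    gaze_on_court: fraction of time looking at the game
    phone_use:     fraction of time looking at a smartphone
    conversation:  fraction of time talking with neighbors
    """
    if phone_use > 0.6:
        return "bored"          # mostly looking at a smartphone
    if conversation > 0.5 and gaze_on_court < 0.3:
        return "socializing"    # eating, drinking, or chatting
    if gaze_on_court > 0.7:
        return "enthusiastic"   # staring at the game
    return "casual"
```

The input ratios would come from the gaze-direction and pose features extracted by the real-time processing 31.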
  • the prediction result storage process 32e performs a process of storing the prediction result obtained by executing the fan tagging program 32b, the fan clustering program 32c, and the fan behavior determination program 32d in a predetermined database.
  • The data distribution process 33 generates information for distribution to the spectators and the operator, based on the prediction result data obtained by the prediction process 32, and passes it to the distribution server 9 and the operator server (tool selection server) 30. Information for distribution to spectators is transmitted from the distribution server 9 to the corresponding spectator's mobile terminal 8. Information for distribution to the operator is transmitted from the operator server 30 to the operator's PC or the like. The video data of the spectator seats and the video data of the match stored in the cloud storage 22 are input to the operator server 30.
  • FIG. 5 is a flowchart showing the processing operation of the information processing apparatus 1 according to the present embodiment. This flow chart is performed regularly, irregularly, or continuously during the duration of the event.
  • First, user information such as the names of the users participating in the event is acquired (step S1). If users have registered for the event in advance, that registration information may be acquired.
  • The user information may also include the seat information of the spectator seats. If the user's seat information can be acquired, it becomes possible to grasp who is sitting in which seat, improving the reliability of information distribution.
  • the sensing information is acquired, and the feature amount of the user is extracted based on the sensing information (step S2).
  • When the sensing information includes video data of the spectator seats, the face recognition execution program 31a and the bone data extraction program 31b described above are executed, whereby features including the user's facial expression, posture, body movement, skeletal information, and the like can be extracted.
  • A feature amount including the user's movement amount can also be extracted by optical flow.
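  • As an illustration of the point above, the per-user movement amount can be summarized as the mean magnitude of the displacement vectors that an optical-flow algorithm outputs for the region around that user. The following minimal Python sketch assumes a list of `(dx, dy)` vectors as input; the function name and data format are illustrative assumptions, not part of the disclosure.

```python
import math

def movement_amount(flow):
    """Mean magnitude of per-pixel (dx, dy) displacement vectors,
    e.g. optical-flow output sampled around one spectator."""
    vectors = list(flow)
    if not vectors:
        return 0.0
    return sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)

# A spectator waving (large displacements) vs. one sitting still.
waving = [(3.0, 4.0), (0.0, 5.0), (4.0, 3.0)]
still = [(0.1, 0.0), (0.0, 0.1)]
```

A dense optical-flow routine from a vision library would supply the vectors; only the summarization step is sketched here.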
  • When the sensing information includes acoustic data, a feature amount including the user's voice can be extracted.
  • Next, the event environment information and content information are acquired (step S3).
  • the game progress information may be acquired by the game live data extraction program 31d shown in FIG.
  • Stats information on the players participating in the match and on the teams to which they belong may be acquired from the DB server 7 or the like.
  • environmental information such as the number of participants in the event, the venue, the weather, the temperature, the season, and the time of the event may be acquired.
  • Next, attribute determination and behavior determination are performed for each user or group participating in the event, and whether the clustering process has been performed normally is determined based on the determination results (step S4).
  • The attribute determination and behavior determination of each user are performed by applying supervised learning, trained on an existing database, to the features extracted from the video data.
  • a clustering process is performed in which a plurality of participants having similar attributes are grouped and classified into a plurality of clusters.
  • If the clustering process cannot be performed normally, for example because learning was not performed correctly, or if its reliability is low even when it can be performed, the analysis results are registered in the data retention DB without classification into a plurality of clusters (step S5).
  • If the attribute determination and behavior determination are performed in step S4 and the clustering process is performed normally, the classification results for the plurality of clusters are registered in the result DB (step S6).
  • Next, output processing is performed based on the processing results of steps S5 and S6 (step S7). If the clustering process was performed normally, information corresponding to each cluster is provided to the users classified into that cluster, and the classification results for the plurality of clusters are provided to the operator. Alternatively, all or part of the event venue may be given a performance to liven up the event. As described above, the specific content of the output processing is arbitrary.
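  • The branch between steps S4 to S6 can be sketched as follows. This is a minimal illustration, assuming a hypothetical `cluster_fn` that returns grouped users together with a reliability score; all names are illustrative, not the disclosed implementation.

```python
def run_clustering_step(analysis_results, cluster_fn, min_reliability=0.7):
    """Steps S4-S6 in miniature: try to cluster per-user analysis
    results; on failure or low reliability, fall back to the
    data-retention path of step S5."""
    try:
        clusters, reliability = cluster_fn(analysis_results)
    except Exception:
        clusters, reliability = None, 0.0
    if clusters is None or reliability < min_reliability:
        return {"db": "data_retention", "payload": analysis_results}
    return {"db": "result", "payload": clusters}

def by_attribute(results):
    """Toy cluster function: group users by their determined attribute,
    returning a fixed reliability score."""
    groups = {}
    for r in results:
        groups.setdefault(r["attribute"], []).append(r["user"])
    return groups, 0.9

users = [
    {"user": "u1", "attribute": "home_fan"},
    {"user": "u2", "attribute": "away_fan"},
    {"user": "u3", "attribute": "home_fan"},
]
```

The fallback path mirrors the data retention DB of step S5; a real system would persist both branches to actual databases.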
  • FIG. 6A is a diagram showing an example of bone data 34 extracted by the bone data extraction program 31b.
  • FIG. 6A shows the bone data 34 superimposed on the video data obtained by photographing the audience seats of the stadium 3.
  • the bone data 34 is composed of a circle and a polygonal line.
  • The circle indicates the position of a spectator's face, and the polyline connected to the circle indicates the movement of the body. Spectators who stand up to cheer have a long polyline. Whether a person is applauding can also be detected from the bone data 34. For example, a spectator whose polyline lengthens at the moment a score is added can be presumed to be a fan of the team that scored.
  • FIG. 6B is a diagram showing an example of the result of attribute determination. As mentioned above, it is possible to determine whether a spectator is a home fan, an away fan, or a beginner based on whether the body moves significantly when a score is added.
  • FIG. 6B shows an example in which, by analyzing video data of the spectator seats, an area 35b within the seating area 35a where home fans gather, an area 35c where away fans gather, and an area 35d where beginners gather are automatically determined.
  • the information processing apparatus 1 prepares "meaningful attributes" in advance according to the situation of the event in order to estimate the internal state of the user participating in the event and determine the attributes.
  • The information processing apparatus 1 performs a process of mapping the result of analyzing the video data to one of the "meaningful attributes" prepared in advance. This makes it possible to easily determine the attributes of the user.
  • FIG. 7A, 7B and 7C are diagrams showing an example of a type of "meaningful attribute” for estimating the internal state of the spectator of a sporting event and determining the attribute.
  • FIG. 7A shows the types of "meaningful attributes" during a match (also referred to as the period), FIG. 7B shows the types of "meaningful attributes" while the match is interrupted (also referred to as the non-period), and FIG. 7C shows the types of "meaningful attributes" that apply to all time zones during the match.
  • the attributes for enlivening the match include the reaction to the match development and whether or not the match is being watched.
  • The reaction to the game's development can be judged from the degree of smiling, the amount of wrist movement, and the degree of eye opening. Whether a spectator is watching the game can be judged from the degree of smiling and whether the spectator is looking at the court (stadium) 3.
  • Other candidate attributes include the presence or absence of an unintentional reaction, intentional reactions, reactions before and after scoring, the audio volume of the entire event venue, and commentary.
  • The degree of concentration during play can be determined by whether the spectator is looking at the court (stadium) 3, and the degree of eye opening can be used to determine whether the spectator is staring at the court.
  • An intentional reaction, such as deliberately waving the wrist, can be determined from the amount of wrist movement, and applause can be determined from the amount of optical-flow movement in the area below the face.
  • The sound volume of the entire venue can be judged from the magnitude of the acoustic data. In addition, from the way a spectator opens the mouth, it can be judged whether the spectator is explaining the game or listening to an explanation.
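  • A simple way to judge the overall venue loudness from acoustic data, as described above, is the root-mean-square amplitude of the samples. The following is a minimal sketch; the function name and sample format are illustrative assumptions.

```python
import math

def venue_volume(samples):
    """Root-mean-square amplitude of acoustic samples as a simple
    measure of overall venue loudness."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

In practice the samples would come from microphones at the venue, and the RMS would be computed over short sliding windows.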
  • Attributes for responsiveness to performances include the degree of participation in spectator-involvement performances and the degree of interest in non-spectator-involvement performances.
  • As for the degree of participation in spectator-involvement performances, participation in a giveaway can be determined from whether the spectator is looking at the court (stadium) 3 where the performance takes place, and participation in a spectator-involvement cheerleader performance can be determined from the optical flow.
  • The degree of interest in non-spectator-involvement performances, such as interest in the cheerleaders, the staff dance, free throws, commemorative photos, and control, can be judged from the average degree of eye opening and whether the spectator is watching the court (stadium) 3.
  • The degree of interest in the DJ can be determined from the amount of movement of the face, the amount of movement of the skeleton, and the amount of optical-flow movement.
  • The attribute of whether a person is away from the seat can be determined by the presence or absence of detection data.
  • The attribute of whether both hands are raised can be determined by whether the wrists are above the face; the attribute of whether the arms are crossed can be determined by whether the wrists cross in the X direction; and the attribute of whether the face is being touched can be determined from the distance between the face and a wrist.
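  • The three skeletal rules above can be expressed directly over 2-D keypoints. The sketch below assumes image coordinates with the y-axis pointing down and a convention in which the subject's left wrist normally has the smaller x value; the pixel threshold and all names are illustrative assumptions.

```python
import math

def pose_attributes(face, left_wrist, right_wrist, touch_threshold=30.0):
    """Rule-based attributes from 2-D keypoints (x, y), image y-axis
    pointing down.  Assumes the subject's left wrist normally has the
    smaller x value; the touch threshold is in pixels and illustrative."""
    hands_raised = left_wrist[1] < face[1] and right_wrist[1] < face[1]
    arms_crossed = left_wrist[0] > right_wrist[0]  # wrists swapped in X
    touching_face = min(math.dist(face, left_wrist),
                        math.dist(face, right_wrist)) < touch_threshold
    return {"hands_raised": hands_raised,
            "arms_crossed": arms_crossed,
            "touching_face": touching_face}
```

The keypoints would come from the bone data extraction program; only the rule evaluation is sketched here.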
  • There are also attributes for actions that may be related to basketball, such as whether the spectator is looking at a smartphone.
  • The attribute of calmness can be determined by whether the body moves regardless of the game's development.
  • Attributes of facial expression can be determined from the degree of smiling.
  • Other attributes include whether the spectator is looking at a pamphlet and whether the spectator is having many conversations.
  • Another attribute is whether the spectator is taking pictures (of the match, selfies, or of the performances).
  • An operator tool screen is prepared for the operator to objectively verify the effect of the performances staged by the operator.
  • the above-mentioned situation image (image for the operator) is displayed on the tool screen for the operator.
  • FIG. 8 is a diagram showing an example of the operator tool screen 40.
  • the operator tool screen 40 of FIG. 8 can be displayed on a PC or the like owned by the operator.
  • the operator can verify in detail the reaction of the event participants to the production performed during the event by the situation image (image for the operator) displayed on the operator tool screen 40.
  • the operator tool screen 40 in FIG. 8 shows an example in which a performance or the like is produced during a basketball game.
  • The operator tool screen 40 of FIG. 8 has a first area 40a that displays the video data of the game, a second area 40b that displays the reactions of the spectators, and a third area 40c that displays the score changes of the game and the spectators' degree of excitement and concentration.
  • the images displayed in the second area 40b and the third area 40c are collectively referred to as a situation image (operator image).
  • FIG. 9A is an enlarged view of the image displayed in the second area 40b of FIG.
  • a circle (identifier) 41 indicating the degree of excitement is superimposed on the face of the audience. The greater the degree of excitement, the larger the radius of the circle 41.
  • a plurality of buttons 42 for selecting the range of spectator seats to be displayed in the second area 40b are arranged.
  • In the illustrated example, the range of the Tokyo bench is selected.
  • FIG. 9B is an enlarged view of a part of the image displayed in the third area 40c of FIG.
  • the gray range 43 of FIG. 9B is the match duration, and the black range 44 between the grays is the match interruption period.
  • the horizontal axis is time, and the vertical solid line shows the scores of the opposing teams.
  • two polygonal lines are shown, one of which shows the degree of excitement of the audience and the other of which shows the degree of concentration of the audience.
  • the numbers in the rectangular frame 45 indicate the score and the score difference of each team at that time.
  • The operator can examine in detail how the spectators' degree of excitement and concentration changed as the game progressed. At a sporting event, home fans and away fans react differently depending on the scoring situation, but on the operator tool screen 40 of FIG. 8, the ranges in which home fans and away fans are seated can be examined separately and in detail.
  • FIG. 10 is a diagram showing an example of a processing operation for generating the operator tool screen 40.
  • the processing operation of FIG. 10 may be performed by the processing server 6 or the operator server 30.
  • The process of generating the operator tool screen 40 is referred to as an excitement visualization tool.
  • the excitement visualization tool has a match analysis process 46a, an excitement circle drawing program 46b, an analysis data processing program 46c, a web display program 46d, and CSS (Cascading Style Sheets) 46e.
  • the match analysis process 46a outputs information on the concentration, excitement, and fan attributes of each spectator as analysis items for each spectator based on the result of analyzing the video data.
  • the match analysis process 46a outputs information on the score, player substitution, match suspension period, performance content performed during the match interruption, and performance time as match analysis items.
  • the game analysis process 46a outputs information on the degree of concentration and the degree of excitement for each frame of the video data and for each fan.
  • The excitement circle drawing program 46b superimposes a circle whose size is proportional to each spectator's degree of excitement on that spectator's face image, based on the video data of the spectator seats and the output data of the game analysis process 46a.
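  • The mapping from degree of excitement to circle size can be as simple as a clamped linear scale. The following sketch is illustrative only; the base radius, scale factor, and input format are assumptions rather than the disclosed implementation.

```python
def excitement_radius(excitement, base_radius=10.0, scale=40.0):
    """Radius of an identifier circle: the greater the (clamped 0-1)
    degree of excitement, the larger the circle."""
    excitement = max(0.0, min(1.0, excitement))
    return base_radius + scale * excitement

def draw_overlay(spectators):
    """Produce (x, y, radius) circles to superimpose on detected faces."""
    return [(s["x"], s["y"], excitement_radius(s["excitement"]))
            for s in spectators]
```

The actual compositing onto video frames would be done by a graphics or vision library; only the geometry is sketched.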
  • The analysis data processing program 46c extracts only the information required by the tool from the post-analysis CSV data, and converts post-analysis pickle data to JSON format.
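  • Both conversions described for the analysis data processing program 46c can be done with standard-library tools. The sketch below is illustrative; the column names and data shapes are invented for the example.

```python
import csv
import io
import json
import pickle

def filter_csv_columns(csv_text, wanted):
    """Keep only the columns the tool needs from the post-analysis CSV."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [{k: row[k] for k in wanted} for row in rows]

def pickle_to_json(pickle_bytes):
    """Convert post-analysis pickle data to a JSON string."""
    return json.dumps(pickle.loads(pickle_bytes))

# Hypothetical post-analysis data for the example.
raw = "frame,fan_id,excitement,debug\n0,f1,0.8,x\n1,f1,0.6,y\n"
payload = pickle.dumps({"fan": "f1", "excitement": [0.8, 0.6]})
```

Converting pickle to JSON at this boundary lets the web display program consume the data without depending on Python serialization.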
  • The web display program 46d, based on the video image of the game, the image generated by the excitement circle drawing program 46b, and the CSS 46e describing the layout definition of the tool, performs the process of reproducing the game video in the first area 40a of the operator tool screen 40 and generates the images of the second area 40b and the third area 40c.
  • the WebAP server 47 may perform the processing of the web display program 46d and the CSS46e.
  • FIG. 11 is a diagram showing details of the game analysis process 46a of FIG.
  • the game analysis process 46a includes a spectator video data processing program 48a, a game data processing program 48b, a graph display program 48c, an index evaluation program 48d, a simple CSV creation program 48e for designers, and a setting change program 48f.
  • Data related to the feature amount generated by the feature quantification process 48g is input to the audience video data processing program 48a.
  • In the feature quantification process 48g, features are extracted by executing the face recognition execution program 31a, the bone data extraction program 31b, optical-flow processing, and the like on the video of the spectator seats and the video of the game.
  • the detection data representing the feature quantity and the fan attribute information are output.
  • the attributes of the fans are determined from the behavior of the spectators and the development of the game.
  • The spectator video data processing program 48a performs personal identification, addition of analysis items, integration of the video data from the plurality of cameras 4, correction between cameras 4, and the like based on the video data of the spectator seats, and outputs each spectator's degree of concentration, degree of excitement, fan attributes, and so on.
  • Match progress data is input to the match data processing program 48b via, for example, the Internet.
  • the match progress data includes information such as performance information performed during the match and video time of the match, in addition to the match progress information.
  • The match data processing program 48b performs match data processing, per-person item integration, performance information processing, manual inconsistency detection, and the like based on the output data of the spectator video data processing program 48a and the match progress data. In addition to outputting each spectator's degree of concentration, degree of excitement, and fan attributes, it outputs information on scores, player substitutions, match suspension periods, and performance content and periods as analysis items for the match.
  • the graph display program 48c generates a graph 49 showing the time change of the degree of concentration and the degree of excitement based on the data obtained by executing the game data processing program 48b.
  • FIG. 12 is a diagram showing an example of the graph 49 generated by the graph display program 48c, and is an enlarged view of the graph 49 shown in FIG.
  • the horizontal axis shows the elapsed time of the game, and the vertical axis shows the degree of excitement and concentration of the spectators.
  • FIG. 12 shows a graph of the degree of excitement and a graph of the degree of concentration of the audience.
  • the gray period 49a in FIG. 12 is the game continuation period, and the dark gray period 49b indicates the period in which some kind of production is performed. By presenting this graph to the operator, it is possible to grasp at a glance whether or not the audience was excited by the production.
  • FIG. 13 is a diagram showing an example of the index 50 generated by the index evaluation program 48d, and is an enlarged view of the one shown in FIG.
  • In FIG. 13, for each of the correlations with the home team scoring 1 to 3 points and with the away team scoring 1 to 3 points, an index of the total number of spectators whose degree of excitement increased, an index of the total number of spectators whose degree of excitement decreased, an index of the rate of increase, and an index of the average degree of excitement within a predetermined period are shown.
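  • One way such indices can be computed is to compare each spectator's mean excitement just before and just after a scoring event. The following sketch is an assumption about the computation, not the disclosed implementation; all names are illustrative, and the score frame is assumed to lie at least one frame into each series.

```python
def scoring_index(excitement_by_fan, score_frame, window=3):
    """Compare each fan's mean excitement in the `window` frames before
    a scoring event against the `window` frames from the event on."""
    increased = decreased = 0
    post_values = []
    for series in excitement_by_fan.values():
        before = series[max(0, score_frame - window):score_frame]
        after = series[score_frame:score_frame + window]
        post_values.extend(after)
        if sum(after) / len(after) > sum(before) / len(before):
            increased += 1
        elif sum(after) / len(after) < sum(before) / len(before):
            decreased += 1
    total = len(excitement_by_fan)
    return {"increased": increased,
            "decreased": decreased,
            "increase_rate": increased / total if total else 0.0,
            "mean_after": sum(post_values) / len(post_values) if post_values else 0.0}

# Hypothetical per-frame excitement series for two fans around a score at frame 3.
fans = {"f1": [0.2, 0.2, 0.2, 0.8, 0.8, 0.8],
        "f2": [0.5, 0.5, 0.5, 0.3, 0.3, 0.3]}
```

Running this once per scoring event, separately for home and away scores, yields the per-correlation indices of FIG. 13.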
  • the simple CSV creation program 48e for designers in FIG. 11 creates a CSV file containing information on the degree of concentration and excitement for each frame of video and for each fan based on the execution result of the game data processing program 48b.
  • the setting change program 48f in FIG. 11 switches the input / output folder for storing the files obtained by executing the game data processing program 48b, the graph display program 48c, the index evaluation program 48d, and the simple CSV creation program 48e for the designer. Further, the setting change program 48f changes the color and display position of the graph generated by the graph display program 48c. Further, the setting change program 48f switches the video frame for calculating the index generated by the index evaluation program 48d.
  • FIG. 14 is a diagram showing a processing operation for automatically verifying the tag information of each spectator generated by executing the fan tagging program 32b of FIG.
  • the process of automatically verifying tag information is referred to as a tag information automatic verification tool.
  • the processing of the tag information automatic verification tool may be performed by the processing server 6 of FIG. 1, the operator server 30, or the like.
  • the tag information automatic verification tool has a score information file generation program 51a, a bone data cleaning program 51b, and a fan tagging / evaluation program 51c.
  • The score information file generation program 51a corrects the time lag between the match video and the spectator video based on the feature data obtained by the feature quantification process 48g, and generates a file recording the scoring team and the corresponding time in the spectator video (hereinafter, score information file).
  • The bone data cleaning program 51b extracts the bone coordinates of the spectators at the moments the home team and the away team score, based on the score information file, and outputs each spectator's amount of movement at the time of scoring based on those coordinates.
  • The fan tagging / evaluation program 51c compares the change in bone data at scoring times with the average bone data, separately for when the home team scores and when the away team scores, and determines which tag to attach to each spectator. The determination is made, for example, by a rule-based method or by deep learning (DNN: Deep Neural Network).
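  • A rule-based version of this tagging can compare a spectator's average movement at home and away scoring moments against a baseline movement level. The sketch below, including the ratio threshold and all names, is an illustrative assumption rather than the disclosed method.

```python
def tag_fan(home_score_moves, away_score_moves, baseline, ratio=1.5):
    """Tag a spectator by whether their average movement at one team's
    scoring moments clearly exceeds their baseline movement."""
    home = sum(home_score_moves) / len(home_score_moves)
    away = sum(away_score_moves) / len(away_score_moves)
    home_reacts = home > ratio * baseline
    away_reacts = away > ratio * baseline
    if home_reacts and not away_reacts:
        return "home_fan"
    if away_reacts and not home_reacts:
        return "away_fan"
    return "unknown"
```

A learned (DNN) classifier could replace this rule while consuming the same per-scoring movement amounts.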
  • FIG. 15A shows video data 52a including tag information (also referred to as estimated tags) generated by the fan tagging / evaluation program 51c, and FIG. 15B shows video data 52b including correct tag information (also referred to as correct tags).
  • The result of comparing the correct tags and the estimated tags is presented in table form as shown in FIG. 14, and the accuracy rate of the tag information is presented numerically.
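  • The comparison of correct tags against estimated tags reduces to an accuracy computation with a per-tag breakdown. The following is a minimal sketch; the names and data layout are assumptions made for the example.

```python
def tag_accuracy(correct_tags, estimated_tags):
    """Overall accuracy rate of estimated tags against correct tags,
    plus a per-tag breakdown table."""
    per_tag = {}
    hits = 0
    for spectator, correct in correct_tags.items():
        row = per_tag.setdefault(correct, {"total": 0, "correct": 0})
        row["total"] += 1
        if estimated_tags.get(spectator) == correct:
            row["correct"] += 1
            hits += 1
    return hits / len(correct_tags), per_tag

# Hypothetical labels for four spectators.
correct = {"s1": "home", "s2": "home", "s3": "away", "s4": "away"}
estimated = {"s1": "home", "s2": "away", "s3": "away", "s4": "away"}
```

The per-tag table corresponds to the tabular comparison of FIG. 14, and the returned rate to the numeric accuracy presented alongside it.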
  • FIG. 16 is a diagram illustrating a main application example of the information processing apparatus 1 according to the present embodiment.
  • the information processing apparatus 1 according to the present embodiment can be applied to applications other than those shown in FIG.
  • The information processing apparatus 1 may have a function of automatically archiving the moments at which the event becomes exciting. This makes it easy to obtain video of those exciting moments.
  • The information processing apparatus 1 may have a function of staging a performance in, or performing cleaning work on, an area of the court (stadium) 3 that is not attracting attention.
  • By staging a performance there, the spectators' line of sight can be directed to the area that is not attracting attention.
  • By cleaning an area that is not attracting attention, the court can be cleaned without catching the spectators' eyes.
  • the information processing device 1 may change the production according to the cheering style of the audience.
  • the type of production can be switched according to the degree of excitement and concentration of the audience. For example, for a spectator who is speaking out, the degree of excitement may be further increased by vibrating the spectator seats.
  • the information processing device 1 may provide advertising information or food and drink information when the concentration of the spectators to the game is reduced. For example, by providing advertising information and food and drink information in anticipation of a period during which the game is temporarily interrupted, it is possible to enhance the advertising effect and improve the sales of restaurants and the like in the stadium 3.
  • the information processing device 1 may have a function of distributing information on the goods of the attention player to the smartphone of the spectator or the like.
  • the information processing device 1 may have a function of giving benefits such as virtual currency and points to the spectator according to the support style of the spectator.
  • information on the number of times the event has been attended may be acquired for each spectator, and benefits such as points may be given to the spectators who have participated frequently.
  • the information processing device 1 may have a function of visualizing the excitement of the entire event venue or between teams and feeding back to the organizer team. As a result, it is possible to increase the spirit of each player in the organizer team.
  • The information processing apparatus 1 may have a function of delivering images of a spectator and his or her friends, captured at moments of excitement, to that spectator's smartphone.
  • The information processing apparatus 1 may have a function of delivering a predetermined sound source only to spectators in an area whose attribute indicates low excitement.
  • The information processing apparatus 1 may have a function of creating an image of a virtual stadium 3 with a simulated crowd that captures the characteristics of each attribute. For example, when a spectator's degree of excitement increases, a virtual person (avatar) may be displayed so that the spectator can cheer together with the avatar, and when the degree of excitement increases further, functions such as enabling conversation with the avatar or enabling a high five may be provided. Specific examples of this function will be described with reference to FIGS. 17D and 17E described later.
  • The information processing apparatus 1 may have a function of transmitting the degree of excitement of spectators watching the game online (in a remote environment) to the event venue, and of transmitting the degree of excitement at the event venue to the online viewing location. For example, when the total amount of excitement at the online viewing location reaches a reference amount, information that the reference amount has been reached is transmitted to the event venue via the network, and a performance using sound effects, visual effect images, or the like may be staged at the event venue. Conversely, when the amount of sound or the like at the event venue reaches a reference amount, some visual effect may be produced at the online viewing location.
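  • The reference-amount trigger described above can be modeled as a simple accumulator that fires once when the running total crosses the threshold. The sketch below uses illustrative names and is only one possible realization.

```python
class ExcitementRelay:
    """Accumulates excitement reported from an online viewing location
    and signals the event venue once, when a reference amount is reached."""

    def __init__(self, reference_amount):
        self.reference_amount = reference_amount
        self.total = 0.0
        self.triggered = False

    def report(self, amount):
        """Add one measurement; return True only the first time the
        accumulated total reaches the reference amount."""
        self.total += amount
        if not self.triggered and self.total >= self.reference_amount:
            self.triggered = True
            return True
        return False
```

A True return would be the point at which the notification is sent over the network and the venue's sound or visual effect is fired.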
  • the information processing apparatus 1 may have a function of matching spectators who are reacting at the same timing with a high synchronization rate.
  • The matched spectators may be far from each other; for example, a spectator at the event venue may be matched with a spectator watching online. An avatar cheering at another location in a style matching that of the online spectator may be displayed so that conversation and high fives are possible. Specific examples of this function will also be described with reference to FIGS. 17D and 17E described later.
  • FIGS. 17A to 17H are diagrams showing an example of participating in an event in a remote environment, that is, watching a game online.
  • FIGS. 17A to 17H show a spectator being guided to the online viewing location, receiving an explanation of how to watch, and then actually watching the game. This location, rather than the spectator's home, is assumed to have a function for displaying the video of the game on a wall and a function for displaying 3D images such as avatars using AR (Augmented Reality) technology.
  • A function may also be provided to display, in part of the image shown on the wall surface, spectators cheering at other locations such as the event venue, virtual people, and the like.
  • FIGS. 17A to 17F show an example of watching a soccer game online, but any type of sporting event may be watched online.
  • FIG. 17A shows how the guide guides the spectators to the spectator place.
  • The spectator is guided to a viewing location consisting of a space equipped with a multi-faceted projector system (hereinafter referred to as the "warp square").
  • There is a sofa in Warp Square and the image of Stadium 3 is displayed on the wall in front of the sofa.
  • the display on the wall surface is performed by, for example, a projector.
  • a camera 4 (not shown) for capturing an image of the audience sitting on the sofa is provided, and the degree of excitement and concentration of the audience are analyzed based on the video data captured by the camera 4.
  • the guide gives the audience a megaphone and explains that they can use the megaphone to cheer loudly.
  • the spectators in Warp Square can cheer in the same cheering style as the spectators in Stadium 3.
  • When the spectator sits on the sofa in the warp square and presses the execute button on the remote controller, or when the user's actions (e.g., sitting on the sofa, looking at the wall) are automatically recognized by behavior sensing, the image of the stadium 3 is projected on the front wall surface.
  • A 3D image of the avatar rejoicing may be displayed.
  • an example is shown in which an avatar makes a gesture for a high five via a megaphone.
  • an effect sound may be produced or a visual effect image may be displayed at the place where the high five is performed.
  • The images described above are examples; the method of displaying the images is not limited to that shown in FIGS. 17C to 17F.
  • For example, the screen may be divided into two vertically, with the video of the game displayed on the upper side and, on the lower side, a panoramic image as if viewed from the spectator seats of the stadium 3.
  • Alternatively, as shown in the figure on the right side of FIG. 17G, silhouettes of avatars of the spectators in the stadium 3 may be displayed, and the avatars' cheering voices may be heard.
  • FIG. 18 is a diagram showing the similarity between various sports and basketball.
  • the horizontal axis of FIG. 18 indicates the degree to which the shooting conditions are different from those of basketball, and the right side indicates that the shooting conditions are more different.
  • the shooting conditions are indoor / outdoor, dark / bright, wide / narrow, and large / small number of people.
  • the vertical axis of FIG. 18 shows the degree to which the behaviors of the basketball and the spectators are different, and the lower side shows that the behaviors of the spectators are more different.
  • As shown in FIG. 18, when the shooting conditions and the behavior of the spectators are considered comprehensively, ice hockey is the most similar to basketball and golf is the most different from basketball. Therefore, for golf, it may be necessary to substantially change the processes shown in FIGS. 5, 10, 11, and 14 described above. For sports other than golf as well, the processes of FIGS. 5, 10, 11, and 14 may be changed as necessary.
  • FIG. 19 is a diagram showing the similarity between various non-sports events and basketball. Similar to FIG. 18, the horizontal axis of FIG. 19 indicates the degree to which the shooting conditions differ from that of basketball, and the vertical axis indicates the degree to which the behavior of the basketball and the spectator differ.
  • As shown in FIG. 19, when the shooting conditions and the behavior of the audience are considered comprehensively, a music (live) event is the most similar to basketball, and watching a movie in a movie theater is the most different from basketball. Therefore, for movie theaters, it may be necessary to substantially change the processes shown in FIGS. 5, 10, 11, and 14 described above. For events other than movies as well, the processes of FIGS. 5, 10, 11, and 14 may be changed as necessary.
  • As described above, in the present embodiment, since the internal state of a user participating in an event is estimated based on sensing information, at least one of the user's attributes and behavior can be estimated, and users can be classified into a plurality of clusters. Further, according to the present embodiment, a user or a group of users can be tagged based on the estimation results of the attributes and behaviors of the users participating in the event and the classification results of the plurality of clusters, and information suitable for that user or group can be provided. As a result, for example, it is possible to provide information suitable for each of the home team's fans, the away team's fans, and beginners, further improving the appeal of watching sports.
  • In addition, the effect of a performance, which in the past could be judged only subjectively, can be analyzed and evaluated in detail and objectively, which is useful for staging more attractive performances.
  • Furthermore, the staging method can be changed according to the degree of excitement of a user in a remote environment to raise that degree of excitement further, an effect can be produced as if the user were at the event venue, or a user at the event venue can be displayed as an avatar.
  • At least a part of the information processing apparatus described in the above-described embodiment may be configured by hardware or software.
  • a program that realizes at least a part of the functions of the information processing apparatus may be stored in a recording medium such as a flexible disk or a CD-ROM, read by a computer, and executed.
  • the recording medium is not limited to a removable one such as a magnetic disk or an optical disk, and may be a fixed recording medium such as a hard disk device or a memory.
  • a program that realizes at least a part of the functions of the information processing device may be distributed via a communication line (including wireless communication) such as the Internet. Further, the program may be encrypted, modulated, compressed, and distributed via a wired line or a wireless line such as the Internet, or stored in a recording medium.
	• Note that the present technology can also have the following configurations.
  • a feature amount extraction unit that extracts a feature amount based on the user's sensing information
  • a first estimation unit that estimates at least one of the user's attributes and behavior based on the feature amount
  • a clustering unit that classifies the user or a group consisting of a plurality of users into a plurality of clusters based on the estimation by the first estimation unit.
  • An information processing apparatus including an information processing unit that performs predetermined information processing based on at least one of estimation by the first estimation unit and classification by the clustering unit.
	• The information processing apparatus according to (1), wherein the sensing information includes an image captured by an image pickup device, and the clustering unit classifies into the plurality of clusters based on an analysis result of the captured image.
	• The information processing apparatus according to (2), wherein the captured image includes an image of the user, and the feature amount extraction unit extracts the feature amount including at least one of the user's face, posture, body movement, and skeletal information.
	• The information processing apparatus according to any one of (1) to (4), further comprising an event information acquisition unit that acquires progress information of the event in which the user participates, wherein the first estimation unit estimates at least one of the attributes and behavior of the user participating in the event based on the feature amount and the progress information of the event.
	• (10) The information processing apparatus according to (9), wherein the situation image generation unit generates the situation image including the progress information of the event in which the user participates and information about at least one of the degree of excitement and the degree of concentration of the user.
  • the first estimation unit estimates an internal state including at least one of the degree of excitement and the degree of concentration of the user based on the sensing information.
	• The information processing apparatus according to any one of (1) to (10), wherein the clustering unit classifies into the plurality of clusters based on the feature amount and the internal state. (12) The information processing apparatus according to (11), wherein the clustering unit classifies into the plurality of clusters based on a change in the internal state according to the progress information of the event. (13) The information processing apparatus according to any one of (1) to (12), further comprising a sensing information acquisition unit that acquires the sensing information regarding at least one of a user at the event venue and a user participating in the event from a remote environment. (14) The clustering unit classifies into the plurality of clusters in units of users who participate in the event from a remote environment or a group consisting of a plurality of users.
  • a second estimation unit that estimates an internal state including at least one of the degree of excitement and the degree of concentration of users who participate in the event from a remote environment based on the sensing information.
	• The information processing apparatus according to (15), wherein the display control unit causes a display unit viewed by a user who participates in the event from a remote environment to display an image that enhances a sense of unity with the audience seats at the venue of the event, in response to an increase in at least one of the degree of excitement and the degree of concentration of the user.
	• The information processing apparatus according to (15) or (16), wherein, when the internal state of the user satisfies a predetermined condition, the display control unit displays, within a range visible to the user, at least one of an information providing image and a visual effect image according to the predetermined condition.
  • the visual effect image is a virtual person image of another user who participates in the venue of the event and whose internal state satisfies the predetermined condition.
	• An information exchange unit by which a user who participates in the event from a remote environment, when the predetermined condition is satisfied, exchanges information via the virtual person image with the other user corresponding to the virtual person image.
	• 1 information processing device, 2 information processing system, 3 court (stadium), 4 camera, 5 network equipment, 6 processing server, 7 DB server, 8 mobile terminal, 9 distribution server, 11 sensing information acquisition unit, 12 feature amount extraction unit, 13 first estimation unit, 14 clustering unit, 15 information processing unit, 16 event information acquisition unit, 17 tagging unit, 18 situation image generation unit, 19 second estimation unit, 20 display control unit, 21 information exchange unit, 22 cloud storage, 30 operator server (server for tool selection)

Abstract

[Problem] To ascertain the internal state of a participant in an event and reflect that internal state in the staging of the event. [Solution] This information processing device comprises: a feature amount extraction unit which extracts a feature amount on the basis of sensing information of a user; a first estimation unit which estimates, on the basis of the feature amount, at least one of the attributes and behavior of the user; a clustering unit which performs classification into a plurality of clusters in units of the user or a group composed of a plurality of users on the basis of the estimation by the first estimation unit; and an information processing unit which performs prescribed information processing on the basis of at least one of the estimation by the first estimation unit and the classification by the clustering unit.

Description

Information processing device and information processing method
This disclosure relates to an information processing device and an information processing method.
The excitement of live performances and sporting events and the effects of their staging are subjective; there is no effective means to measure those effects quantitatively, and in practice they are evaluated based on the subjectivity and experience of the person in charge. In addition, event staging follows a static scenario, and it is practically difficult to stage effects that respond immediately to the internal state of the audience.
Patent Document 1 discloses estimating the degree of excitement of spectators in real time with seat sensors installed in the audience seats and wearable sensors worn by the spectators, and providing staging content to the spectators at effective timing according to the degree of excitement.
Japanese Unexamined Patent Publication No. 2019-144882
However, Patent Document 1 only estimates the degree of excitement; it cannot infer the internal state of each individual spectator or provide staging tailored to the spectators' attributes and tastes.
Moreover, Patent Document 1 contemplates only staging for spectators who actually visit the event venue, and takes no account of providing any staging for spectators who participate in the event from a remote environment via a network. Recently, the number of users who participate in events from remote environments has been increasing, and event staging that considers such users has become increasingly important.
The present disclosure therefore provides an information processing device and an information processing method capable of grasping the internal state of users participating in an event and reflecting it in the staging of the event.
In order to solve the above problems, according to one aspect of the present disclosure, there is provided an information processing device comprising:
a feature amount extraction unit that extracts a feature amount based on sensing information of a user;
a first estimation unit that estimates at least one of the attributes and behavior of the user based on the feature amount;
a clustering unit that classifies, in units of the user or a group consisting of a plurality of users, into a plurality of clusters based on the estimation by the first estimation unit; and
an information processing unit that performs predetermined information processing based on at least one of the estimation by the first estimation unit and the classification by the clustering unit.
The sensing information may include an image captured by an image pickup device, and the clustering unit may classify into the plurality of clusters based on an analysis result of the captured image.
The captured image may include an image of the user, and the feature amount extraction unit may extract the feature amount including at least one of the user's face, posture, body movement, and skeletal information.
The feature amount extraction unit may extract the feature amount based on at least one of acoustic data, object recognition, and frequency analysis information.
An event information acquisition unit may be provided that acquires progress information of the event in which the user participates, and the first estimation unit may estimate at least one of the attributes and behavior of the user participating in the event based on the feature amount and the progress information of the event.
A tagging unit may be provided that adds tag information in units of the user or the group based on the estimation by the first estimation unit.
The information processing unit may provide information based on the tag information to the user or the group to which the same tag information has been added.
The information processing unit may perform at least one of information provision and information exchange according to at least one of the attributes and behavior of the user.
A situation image generation unit may be provided that generates a situation image in which an identifier indicating the internal state of the user, determined based on the sensing information, is added to an image of the user.
The situation image generation unit may generate the situation image including the progress information of the event in which the user participates and information about at least one of the degree of excitement and the degree of concentration of the user.
The first estimation unit may estimate an internal state including at least one of the degree of excitement and the degree of concentration of the user based on the sensing information, and the clustering unit may classify into the plurality of clusters based on the feature amount and the internal state.
The clustering unit may classify into the plurality of clusters based on a change in the internal state according to the progress information of the event.
A sensing information acquisition unit may be provided that acquires the sensing information regarding at least one of a user at the event venue and a user participating in the event from a remote environment.
The clustering unit may classify into the plurality of clusters in units of users who participate in the event from a remote environment or a group consisting of a plurality of such users.
A second estimation unit may be provided that estimates, based on the sensing information, an internal state including at least one of the degree of excitement and the degree of concentration of a user who participates in the event from a remote environment, together with a display control unit that adjusts the size of a display area for displaying information about the event based on the internal state estimated by the second estimation unit.
The display control unit may cause a display unit viewed by a user who participates in the event from a remote environment to display an image that enhances a sense of unity with the audience seats at the venue of the event, in response to an increase in at least one of the degree of excitement and the degree of concentration of the user.
When the internal state of a user who participates in the event from a remote environment satisfies a predetermined condition, the display control unit may display, within a range visible to the user, at least one of an information providing image and a visual effect image according to the predetermined condition.
The visual effect image may be a virtual person image of another user who is participating at the venue of the event and whose internal state satisfies the predetermined condition.
An information exchange unit may be provided by which a user who participates in the event from a remote environment, when the predetermined condition is satisfied, exchanges information via the virtual person image with the other user corresponding to the virtual person image.
According to another aspect of the present disclosure, there is provided an information processing method comprising: extracting a feature amount based on sensing information of a user; estimating at least one of the attributes and behavior of the user based on the feature amount; classifying, in units of the user or a group consisting of a plurality of users, into a plurality of clusters based on the estimation; and performing predetermined information processing based on at least one of the estimation and the classification into the plurality of clusters.
A block diagram showing a schematic configuration of an information processing system provided with an information processing device according to an embodiment of the present disclosure.
A block diagram showing a schematic configuration of an information processing system provided with an information processing device that supports event participation from a remote environment.
A functional block diagram of the information processing device according to the present embodiment.
A block diagram showing the software configuration of the information processing device according to the present embodiment.
A flowchart showing the processing operation of the information processing device according to the present embodiment.
A diagram showing an example of bone data extracted by the bone data extraction program.
A diagram showing an example of the result of attribute determination.
A diagram showing the types of "meaningful attributes" while a game is in progress.
A diagram showing the types of "meaningful attributes" while a game is interrupted.
A diagram showing the types of "meaningful attributes" for all time periods during a game.
A diagram showing an example of the operator tool screen.
An enlarged view of the image displayed in the second area of FIG. 8.
An enlarged view of part of the image displayed in the third area of FIG. 8.
A diagram showing an example of the processing operation for generating the operator tool screen.
A diagram showing the details of the game analysis processing of FIG. 10.
A diagram showing an example of a graph generated by the graph display program.
A diagram showing an example of an index generated by the index evaluation program.
A diagram showing a processing operation for automatically verifying the tag information of each spectator generated by executing the fan tagging program of FIG. 4.
A diagram showing video data including estimated tags.
A diagram showing video data including correct tags.
A diagram illustrating main application examples of the information processing device according to the present embodiment.
A diagram showing an example of online game watching.
The figures following FIG. 17A (continuing through FIG. 17H).
A diagram showing the similarity between various sports and basketball.
A diagram showing the similarity between various events other than sports and basketball.
Hereinafter, embodiments of the information processing device and the information processing method will be described with reference to the drawings. The following description focuses on the main components of the information processing device and the information processing method, but the device and method may have components and functions that are not shown or described; the description below does not exclude such components or functions.
FIG. 1 is a block diagram showing a schematic configuration of an information processing system 2 provided with an information processing device 1 according to an embodiment of the present disclosure. The information processing system 2 of FIG. 1 performs various information processing based on video data obtained by photographing the spectators at a court (stadium) 3, where a sporting event is held, with cameras 4 installed at the stadium 3. In the following, watching a basketball game is mainly described as an example of a sporting event, but the type of sport does not matter. The present embodiment is also widely applicable to various events other than sports (for example, live music performances, entertainment events, and the like). Furthermore, events are not limited to those held at a specific venue such as a stadium; they may be events delivered by the live distribution described later, or events that users can join both at the venue and from the remote environment of a live distribution destination.
The information processing system 2 of FIG. 1 includes a plurality of cameras 4 fixed by jigs 4a at positions from which the spectators at the court (stadium) 3 can be photographed, a network device 5, a processing server 6, and a database (hereinafter abbreviated as DB) server 7. Devices other than those shown in FIG. 1 may also be connected to the information processing system 2 according to the present embodiment. It is assumed that separate cameras photograph the game played on the court (stadium) 3.
The network device 5 controls transmission of the video data captured by the plurality of cameras 4 to the processing server 6 via a network. The network may be a public line such as the Internet or a dedicated line, and may be either wireless or wired.
The processing server 6 receives the video data captured by the plurality of cameras 4 via the network device 5 and performs various information processing. For example, the processing server 6 applies distortion correction, color correction, and camera control processing that normalizes the plural video data captured by the plurality of cameras 4, and then performs various information processing.
The DB server 7 stores game progress information of the sporting event being held and stats information such as the results of participating athletes, as well as video data processed by the processing server 6. The processing server 6 and the DB server 7 may be integrated into one server, or at least one of them may be divided into two or more servers.
In addition, the information processing system 2 of FIG. 1 may include a distribution server 9 that distributes information to mobile terminals 8 and the like possessed by the spectators at the stadium 3. The distribution server 9 controls transmission of the distribution information for spectators sent from the processing server 6 to the corresponding spectators' mobile terminals 8 and the like. A spectator's mobile terminal 8 is, for example, a smartphone, a wearable device such as a watch, or a penlight the spectator carries to cheer at the event.
The information processing device 1 according to the present embodiment includes at least the processing server 6, and may also include the DB server 7, the distribution server 9, and the like.
Spectators watching a sporting event do not necessarily watch it at the court (stadium) 3; they may watch it through a TV, PC, or mobile terminal 8 at home or elsewhere, or at a public viewing. In particular, as wireless networks that carry large volumes of data at high speed and low cost rapidly spread, the number of spectators watching from outside the stadium 3 is expected to increase. In this specification, participating in an event somewhere other than the event venue, such as the stadium 3, is referred to as event participation from a remote environment, or online watching (participation).
The information processing system 2 of FIG. 1 may include an operator server (not shown) for distributing various information to the PC or the like of the event operator. In the case of a sporting event, the operator server generates video data in which identifiers are added to the video data of the spectators so that the spectators' degree of excitement and degree of concentration can be grasped, and displays the video data on the operator tool screen described later. The processing server 6 or the distribution server 9 may have the function of the operator server.
FIG. 2 is a block diagram showing a schematic configuration of an information processing system 2 provided with an information processing device 1 that supports event participation from a remote environment. The information processing system 2 of FIG. 2 shows a system configuration in which spectators watch a sporting event at a place other than the court (stadium) 3, for example at home on a TV 10a or PC 10b, or at a public viewing venue 10c. It is assumed that the TV 10a and PC 10b used to watch the sporting event are equipped with cameras 4 that photograph the spectators, and that cameras 4 that photograph the spectators watching by public viewing are installed at the public viewing venue 10c.
The information processing system 2 of FIG. 2 includes a processing server 6 that acquires the video data from the cameras 4 described above and performs various information processing, a DB server 7, and a distribution server 9. The basic processing of these servers is the same as that of the corresponding servers shown in FIG. 1. However, the processing server 6 and the distribution server 9 can also apply various measures to give spectators watching in a remote environment the same sense of presence as spectators watching at the stadium 3. Specific examples are described later.
The processing server 6 of FIG. 1 may be integrated with the processing server 6 of FIG. 2; similarly, the DB server 7 of FIG. 1 may be integrated with the DB server 7 of FIG. 2, and the distribution server 9 of FIG. 1 may be integrated with the distribution server 9 of FIG. 2. Further, at least two of the processing server 6, the DB server 7, and the distribution server 9 of FIGS. 1 and 2 may be integrated, or conversely, the various information processing may be distributed across a larger number of servers. That is, the server configurations of FIGS. 1 and 2 are only examples.
FIG. 3 is a functional block diagram of the information processing device 1 according to the present embodiment. FIG. 3 mainly shows, as blocks, the functions of the processing server 6, the DB server 7, and the distribution server 9 of FIG. 1 or FIG. 2.
The information processing device 1 of FIG. 3 includes a sensing information acquisition unit 11, a feature amount extraction unit 12, a first estimation unit 13, a clustering unit 14, and an information processing unit 15.
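The five functional blocks above form a simple data flow: sensing information in, then feature extraction, estimation, clustering, and finally information processing. A minimal sketch of that wiring is shown below; all names and the callable-based design are illustrative assumptions, not anything taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Pipeline:
    """Hypothetical wiring of the functional blocks in FIG. 3."""
    acquire: Callable   # sensing information acquisition unit (11)
    extract: Callable   # feature amount extraction unit (12)
    estimate: Callable  # first estimation unit (13)
    cluster: Callable   # clustering unit (14)
    process: Callable   # information processing unit (15)

    def run(self, raw):
        sensing = self.acquire(raw)          # raw sensor data in
        features = self.extract(sensing)     # per-user feature amounts
        estimates = self.estimate(features)  # attributes / behavior
        clusters = self.cluster(estimates)   # user or group clusters
        return self.process(estimates, clusters)
```

For instance, `estimate` could threshold a motion feature into "high"/"low" excitement and `cluster` could group users per level; the point of the sketch is only that each block consumes the previous block's output.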
The sensing information acquisition unit 11 acquires sensing information, that is, detection information detected by various sensors. A typical example of sensing information is an image captured by an image pickup device; more specifically, the sensing information may be video data captured by an image sensor. However, captured images and image-sensor video data are not necessarily essential sensing information. The sensing information may include acoustic data of the event venue such as the stadium 3, which is useful for judging the degree of excitement of the event. It may also include detection information from vibration sensors installed in the audience seats of the event venue such as the stadium 3, or detection information from an acceleration sensor, gyro sensor, or the like built into a spectator's mobile terminal 8 or cheering penlight. In this way, any specific type of sensing information may be used as long as it can be used to judge the spectators' internal state, such as their degree of excitement and degree of concentration.
 スポーツイベントは、試合の経過に従って、観客の盛り上がり度合いや集中度が変化する。このため、センシング情報取得部11は、試合開始から試合終了までの間、定期的又は不定期的に、あるいは継続的に、センシング情報を取得する。 For sporting events, the degree of excitement and concentration of the spectators changes as the game progresses. Therefore, the sensing information acquisition unit 11 acquires sensing information periodically, irregularly, or continuously from the start of the match to the end of the match.
 特徴量抽出部12は、ユーザをセンシングしたセンシング情報に基づいて特徴量を抽出する。例えば、特徴量抽出部12は、ユーザ(例えば、イベントの参加者)の顔、姿勢、体の動き、及び骨格情報(ボーンデータとも呼ぶ)の少なくとも一つを含む特徴量を抽出する。センシング情報が映像データを含んでいる場合、特徴量抽出部12は、映像データを解析して、スタジアム3で観戦している各観客を識別し、識別された各観客の笑顔の度合い、手首の移動量、頭の移動量、目の開き度合い、視線方向などにより、特徴量を抽出することができる。また、特徴量抽出部12は、音響データ、物体認識、及び周波数解析情報の少なくとも一つに基づいて、特徴量を抽出してもよい。 The feature amount extraction unit 12 extracts the feature amount based on the sensing information sensed by the user. For example, the feature amount extraction unit 12 extracts a feature amount including at least one of the face, posture, body movement, and skeletal information (also referred to as bone data) of a user (for example, an event participant). When the sensing information includes video data, the feature amount extraction unit 12 analyzes the video data to identify each spectator spectating at the stadium 3, the degree of smile of each identified spectator, and the wrist. The feature amount can be extracted according to the amount of movement, the amount of movement of the head, the degree of opening of the eyes, the direction of the line of sight, and the like. Further, the feature amount extraction unit 12 may extract the feature amount based on at least one of the acoustic data, the object recognition, and the frequency analysis information.
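As a minimal sketch of how the per-spectator feature amounts listed above might be bundled and combined into a single excitement value, consider the following. The field names, value ranges, and weights are illustrative assumptions, not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class SpectatorFeatures:
    """Per-spectator feature amounts (hypothetical names and ranges)."""
    smile_degree: float      # 0.0 (neutral) .. 1.0 (broad smile)
    wrist_movement: float    # e.g. pixels moved between frames
    head_movement: float
    eye_openness: float      # 0.0 (closed) .. 1.0 (wide open)
    looking_at_court: bool   # gaze direction toward the court

def excitement_score(f: SpectatorFeatures) -> float:
    """Combine feature amounts into one excitement value in [0, 1].
    The weights and the motion normalizer are assumptions."""
    motion = min(1.0, (f.wrist_movement + f.head_movement) / 200.0)
    return 0.5 * f.smile_degree + 0.3 * motion + 0.2 * f.eye_openness

cheering = SpectatorFeatures(0.9, 150.0, 80.0, 1.0, True)
idle = SpectatorFeatures(0.1, 5.0, 2.0, 0.4, False)
```

A downstream estimation unit would then compare such scores across spectators or over time.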
The first estimation unit 13 estimates at least one of the user's attributes and behavior based on the feature amounts extracted by the feature amount extraction unit 12. The specific estimations performed by the first estimation unit 13 are described later; in the case of a sporting event, for example, it estimates each spectator's degree of excitement about and concentration on the game.
The clustering unit 14 classifies users, or groups of users, into a plurality of clusters based on the estimation by the first estimation unit 13. The types of clusters are arbitrary. In the case of a sporting event, for example, the clustering may focus on cheering style, assigning spectators who cheer in a similar way to the same cluster. Alternatively, focusing on interest in the game, each spectator may be classified into one of a plurality of clusters according to their degree of interest.
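The classification by degree of interest described above can be sketched as a simple binning of interest scores. This is only an illustration; a real clustering unit would more likely use k-means or a similar algorithm, and the score range and bin boundaries here are assumptions.

```python
def classify_by_interest(interest_scores, boundaries=(0.33, 0.66)):
    """Bin each spectator's interest score (assumed 0..1) into clusters:
    0 = low, 1 = medium, 2 = high interest. Boundaries are assumed."""
    clusters = {0: [], 1: [], 2: []}
    for spectator_id, score in interest_scores.items():
        if score < boundaries[0]:
            clusters[0].append(spectator_id)
        elif score < boundaries[1]:
            clusters[1].append(spectator_id)
        else:
            clusters[2].append(spectator_id)
    return clusters

# Hypothetical seat IDs and scores.
scores = {"seat_A1": 0.9, "seat_A2": 0.2, "seat_B1": 0.5}
result = classify_by_interest(scores)
```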
The information processing unit 15 performs predetermined information processing based on at least one of the estimation by the first estimation unit 13 and the classification by the clustering unit 14. The specific content of this processing is arbitrary. As described later, the information processing unit 15 may perform processing to provide clustered spectators with information suited to each individual spectator. Alternatively, it may provide information to spectators watching in a remote environment in order to control their degree of presence according to their degree of excitement.
As shown in FIG. 3, the information processing device 1 may include an event information acquisition unit 16, which acquires progress information on the event in which the users participate. In the case of a sporting event, for example, the event information acquisition unit 16 acquires progress information such as which player on which team scored how many minutes after the start of the game. The first estimation unit 13 may estimate at least one of the attributes and behavior of the users participating in the event based on the feature amounts and the event progress information.
As shown in FIG. 3, the information processing device 1 may include a tagging unit 17, which assigns tag information to users or groups participating in the event based on the estimation by the first estimation unit 13. In the case of a sporting event, for example, the tag information may identify home fans who support the home team, away fans who support the away team, and beginners at watching games. Based on the spectator video data, the tagging unit 17 may tag spectators who get excited when the home team scores as home fans, spectators who get excited when the away team scores as away fans, and spectators who do not get excited no matter which team scores as beginners.
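The home-fan / away-fan / beginner tagging rule described above can be sketched directly. The excitement threshold and the averaging over scoring moments are assumptions for illustration.

```python
def tag_spectator(excitement_at_home_goals, excitement_at_away_goals,
                  threshold=0.5):
    """Tag a spectator from their average excitement (assumed 0..1)
    observed at home-team and away-team scoring moments.
    The threshold value is an illustrative assumption."""
    home = excitement_at_home_goals >= threshold
    away = excitement_at_away_goals >= threshold
    if home and not away:
        return "home_fan"
    if away and not home:
        return "away_fan"
    # Excited at neither team's goals (or indiscriminately at both):
    # treated as a beginner at watching games.
    return "beginner"
```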
The information processing unit 15 may provide information based on the tag information to users or groups that share the same tag. It may also perform at least one of information provision and information exchange according to at least one of a user's attributes and behavior. Information provision means, for example, providing information that relates to the event and is likely to interest the user; information exchange means, for example, conversation with other spectators participating in the event. The function of the information processing unit 15 can be built into, for example, the distribution server 9 of FIG. 1. Based on instructions from the processing server 6, the distribution server 9 transmits various information related to the tag information to the mobile terminals 8 and the like of the tagged spectators. As an example, spectators tagged as beginners may be provided with information about the rules of the game, and spectators supporting the team that scored may be provided with stats information about the player who scored.
As shown in FIG. 3, the information processing device 1 may include a situation image generation unit 18, which generates a situation image by adding, to the images of the users in the video data of the event venue, identifiers indicating the users' internal states. At an event, the situation image is used as an operator image with which the operator checks the state of the event. More specifically, the situation image generation unit 18 may generate a situation image (operator image) that includes event progress information and information on at least one of the users' degree of excitement and degree of concentration. Specific examples of the situation image (operator image) are described later. From the situation image (operator image), the operator staging the event can check whether a production achieved its intended effect. A user's internal state is, for example, the user's degree of excitement or concentration. An identifier indicating the internal state is, for example, a circle whose diameter corresponds to the user's degree of excitement; this circle may be superimposed on the user's face image.
As shown in FIG. 3, the information processing device 1 may include a second estimation unit 19 and a display control unit 20. The second estimation unit 19 estimates, based on the sensing information, an internal state including at least one of the degree of excitement and the degree of concentration of a user participating in the event from a remote environment.
The display control unit 20 adjusts the size of the display area in which information about the event is displayed, based on the internal state estimated by the second estimation unit 19. The display control unit 20 may also cause a display unit viewed by a user participating in the event from a remote environment to display video whose similarity to the actual event venue is adjusted according to the internal state. Further, when a user participating in the event from a remote environment satisfies a predetermined condition, the display control unit 20 may display, within the range visible to the user, at least one of an information-providing image and a visual-effect image corresponding to that condition. The visual-effect image may include a virtual person image of another user who is present at the event venue and whose internal state satisfies a predetermined condition.
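How the display control unit 20 might adjust the display-area size from the estimated internal state can be sketched as follows. The base resolution, linear scaling, and maximum scale factor are assumptions; the disclosure only states that the size is adjusted according to the internal state.

```python
def display_area_px(excitement, base=(640, 360), max_scale=2.0):
    """Scale the event display area with a remote user's excitement
    (assumed 0..1, clamped). Linear scaling between 1x and max_scale
    is an illustrative choice only."""
    clamped = max(0.0, min(1.0, excitement))
    scale = 1.0 + (max_scale - 1.0) * clamped
    w, h = base
    return int(w * scale), int(h * scale)
```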
As shown in FIG. 3, the information processing device 1 may include an information exchange unit 21. When a user participating in the event from a remote environment satisfies a predetermined condition, the information exchange unit 21 exchanges information, via an avatar, with another person corresponding to that avatar.
FIG. 4 is a block diagram showing the software configuration of the information processing device 1 according to the present embodiment. The software configuration of FIG. 4 is executed by the processing server 6, the DB server 7, and the distribution server 9.
As shown in FIG. 4, the software configuration of the information processing device 1 comprises real-time processing 31, prediction processing 32, and data distribution processing 33. The real-time processing 31 and the prediction processing 32 are executed mainly by the processing server 6 and the DB server 7; the data distribution processing 33 is executed mainly by the distribution server 9.
The real-time processing 31 extracts feature amounts in real time based on sensing information (e.g., video data). It comprises a face recognition execution program 31a, a bone data extraction program 31b, an acoustic data extraction program 31c, a game live-commentary data extraction program 31d, and an extraction result storage process 31e.
The face recognition execution program 31a performs face recognition on the video data and estimates each spectator's degree of smiling, amount of wrist movement, amount of head movement, degree of eye opening, gaze direction, and the like. The bone data extraction program 31b extracts each spectator's bone data from the video data. The acoustic data extraction program 31c extracts acoustic data from the event venue; the extracted acoustic data includes, for example, information on the direction and volume of sounds. The game live-commentary data extraction program 31d acquires live game data when, for example, such data is distributed on the official site of the host team.
The extraction result storage process 31e stores, as feature amounts in the DB server 7, the output data of the face recognition execution program 31a and the data extracted by the bone data extraction program 31b, the acoustic data extraction program 31c, and the game live-commentary data extraction program 31d.
The prediction processing 32 assigns tag information to each spectator, classifies the spectators into a plurality of clusters, and judges each spectator's behavior. More specifically, it comprises a prediction data creation program 32a, a fan tagging program 32b, a fan clustering program 32c, a fan behavior judgment program 32d, and a prediction result storage process 32e.
The prediction data creation program 32a is a program for creating the fan tagging program 32b, the fan clustering program 32c, and the fan behavior judgment program 32d.
The fan tagging program 32b assigns tag information to users participating in the event (fans, in the case of a sporting event) based on the feature amounts obtained by the real-time processing 31. As described above, the tag information is, for example, home fan, away fan, or beginner.
The fan clustering program 32c classifies users participating in the event (fans and the like) into a plurality of clusters based on the feature amounts obtained by the real-time processing 31. As described above, the users may be classified into clusters by, for example, their degree of excitement about the game, their degree of concentration on the game, or a combination of several such conditions.
The fan behavior judgment program 32d judges the behavior of users (fans and the like) participating in the event. For example, it distinguishes enthusiastic fans who are staring at the game, fans who are eating, drinking, or chatting without watching the game, and fans who seem bored and are only looking at their smartphones.
The prediction result storage process 32e stores the prediction results obtained by executing the fan tagging program 32b, the fan clustering program 32c, and the fan behavior judgment program 32d in a predetermined database.
The data distribution processing 33 generates information for distribution to the spectators and the operator, based on the prediction result data obtained by the prediction processing 32, for the distribution server 9 and the operator server (tool selection server) 30. Information for distribution to spectators is transmitted from the distribution server 9 to the corresponding spectators' mobile terminals 8; information for distribution to the operator is transmitted from the operator server 30 to the operator's PC or the like. Video data of the spectator seats and video data of the game stored in the cloud storage 22 are input to the operator server 30.
FIG. 5 is a flowchart showing the processing operation of the information processing device 1 according to the present embodiment. This flowchart is executed periodically, irregularly, or continuously while the event is being held. First, user information such as the names of the users participating in the event is acquired (step S1). If users registered user information such as their names and contact details when applying to participate in the event, this registration information may be acquired. For example, if spectator seats for the event are assigned in advance, the user information may be acquired together with the seat information. If a user's seat information can be acquired, it is possible to know who is sitting in which seat, which improves the reliability of information distribution.
Next, sensing information is acquired, and feature amounts of the users are extracted based on it (step S2). If the sensing information includes video data of the spectator seats, the face recognition execution program 31a and the bone data extraction program 31b described above are executed to extract feature amounts including each user's facial expression, posture, body movement, skeletal information, and the like. Feature amounts including a user's amount of movement can also be extracted from the video data by optical flow. Further, if the sensing information includes acoustic data, feature amounts including the users' voices can be extracted.
Next, environmental information and content information about the event are acquired (step S3). In the case of a sporting event, for example, game progress information may be acquired by the game live-commentary data extraction program 31d shown in FIG. 4. Alternatively, stats information on the players in the game or on the players of the participating teams may be acquired from the DB server 7 or the like, or environmental information such as the number of participants, the venue, the weather, the temperature, the season, and the starting time of the event may be acquired.
Next, attribute judgment and behavior judgment are performed for each user or group participating in the event, and based on the results it is determined whether the clustering process was performed normally (step S4). Here, for example, each user's attributes and behavior are judged using the results of supervised learning, performed on an existing database, applied to the feature amounts extracted from the video data. In addition, a clustering process is performed that groups participants with similar attributes and classifies them into a plurality of clusters.
If the clustering process cannot be performed normally, for example because learning was not performed correctly, or if it can be performed but with low reliability, the analysis results are registered in a data retention DB without classification into clusters (step S5).
If the attribute judgment and behavior judgment of step S4 succeed and the clustering process is performed normally, the classification results are registered in a result DB (step S6).
Next, output processing is performed based on the processing results of steps S5 and S6 (step S7). If the clustering process was performed normally, information corresponding to each cluster is provided to the users classified into that cluster, and the classification results of the clusters are provided to the operator. Alternatively, productions to liven up the event may be performed for all or part of the event venue. The specific content of the output processing is thus arbitrary.
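The branch between steps S5 and S6 above can be sketched as follows. The databases are modeled as plain dictionaries, and the reliability threshold is an assumed parameter not specified in the disclosure.

```python
def route_results(clustering_ok, reliability, analysis, min_reliability=0.7):
    """Route analysis results per steps S4-S6: reliable clustering goes
    to the result DB (S6), otherwise the raw analysis goes to the
    data-retention DB (S5). DBs are modeled as dicts for illustration."""
    result_db, retention_db = {}, {}
    if clustering_ok and reliability >= min_reliability:
        result_db["clusters"] = analysis      # step S6
    else:
        retention_db["analysis"] = analysis   # step S5
    return result_db, retention_db
```

Step S7 would then read from whichever store was populated.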
FIG. 6A is a diagram showing an example of bone data 34 extracted by the bone data extraction program 31b; it shows the bone data 34 superimposed on video data of the spectator seats of the stadium 3. As illustrated, the bone data 34 consists of circles and polylines: a circle indicates the position of a spectator's face, and the polyline connected to the circle represents the movement of the body. For a spectator who stands up to cheer, the polyline becomes longer. Whether a spectator is applauding can also be detected from the bone data 34. For example, a spectator whose polyline becomes longer at the moment a score is made can be presumed to be a fan supporting the team that scored.
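The "polyline becomes longer at a scoring moment" cue can be sketched with simple geometry. The growth-ratio threshold is an assumption for illustration.

```python
def polyline_length(points):
    """Total length of a bone polyline given as (x, y) points."""
    return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def reacted_to_goal(polyline_before, polyline_after, ratio=1.5):
    """Presume a spectator reacted to a score if their bone polyline
    grew markedly at the scoring moment. The ratio is an assumed
    threshold, not a value from the disclosure."""
    before = polyline_length(polyline_before)
    after = polyline_length(polyline_after)
    return before > 0 and after / before >= ratio
```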
FIG. 6B is a diagram showing an example of the result of attribute judgment. As described above, whether a spectator is a home fan, an away fan, or a beginner can be determined from whether their body movement was large when a score was made. FIG. 6B shows an example in which, by analyzing video data of the spectator seats, an area 35b where home fans gather, an area 35c where away fans gather, and an area 35d where beginners gather were automatically identified within the home area 35a.
To estimate the internal states and judge the attributes of users participating in an event, the information processing device 1 according to the present embodiment prepares "meaningful attributes" in advance according to the situation of the event. Based on the results of analyzing the video data, the information processing device 1 assigns each user to one of the prepared "meaningful attributes". This makes it possible to judge user attributes easily.
FIGS. 7A, 7B, and 7C are diagrams showing examples of the types of "meaningful attributes" used to estimate the internal states and judge the attributes of spectators at a sporting event. FIG. 7A shows the attribute types used while the game is in progress (also called "during a period"), FIG. 7B shows the attribute types used while the game is interrupted (also called "outside a period"), and FIG. 7C shows the attribute types used across all time periods of the game.
As shown in FIG. 7A, while the game is in progress, the attributes for livening up the game include reaction to the game's development and whether the spectator is watching the game. Reaction to the game's development can be judged from the degree of smiling, the amount of wrist movement, and the degree of eye opening. Whether the spectator is watching the game can be judged from the degree of smiling and whether the spectator is looking at the court (stadium) 3.
Also while the game is in progress, other candidate attributes include the presence or absence of a reaction (unintentional reactions), intentional reactions, reactions around scoring, the overall sound volume of the event venue, and commentary. For unintentional reactions, the degree of concentration during play can be judged from whether the spectator is looking at the court (stadium) 3, and whether the spectator is staring at the court can be judged from whether they are looking at it combined with the degree of eye opening. For intentional reactions, deliberate wrist movement can be judged from the amount of wrist movement, and applause can be judged from the amount of optical-flow movement in the region below the face. Whether a score made the spectator happy or sad can be judged from the degree of smiling. The overall sound volume of the venue can be judged from the magnitude of the acoustic data, and whether the spectator is commentating on the game or listening to commentary can be judged from how the mouth opens.
As shown in FIG. 7B, while the game is interrupted, the attributes for responsiveness to performances include the degree of participation in spectator-involving performances and the degree of interest in non-spectator-involving performances. For spectator-involving performances, the degree of participation in a giveaway can be judged from whether the spectator is looking at the court (stadium) 3 where the performance is taking place, and the degree of participation in a spectator-involving cheer dance can be judged by optical flow. For non-spectator-involving performances, the degree of interest in a cheer dance, staff dance, free throw, commemorative photo session, or comedy skit can be judged from whether the spectator is looking at the court (stadium) 3 and the average degree of eye opening, and the degree of interest in the DJ can be judged from the amount of face movement, skeletal movement, and optical-flow movement.
As shown in FIG. 7C, across all time periods of the game there are attributes concerning the synchronization rate with other people and attributes concerning behavior not directly related to the game. Regarding the synchronization rate with other people, supervised learning on the bone data, degree of eye opening, degree of smiling, and amount of face movement can be used to judge fan attributes (home fan, away fan, beginner) and cheering-style attributes (watching quietly, cheering vocally, cheering with gestures, focusing on taking photos). By detecting the colors around the face, attributes such as wearing team goods or team-colored clothing can be judged. Other attributes concerning the synchronization rate with other people include whether the spectator is in plain clothes, their age group, whether they came to cheer as a group, and whether they came with a small child.
Regarding behavior not directly related to the game, attributes unrelated to basketball include whether the spectator is drinking and whether they are eating. As to whether the spectator is in their seat, whether they have left it can be judged from the presence or absence of data. Whether both hands are raised can be judged from whether the wrists are above the face; whether the arms are crossed can be judged from whether the wrists cross in the X direction; and whether a hand is touching the face can be judged from the distance between the face and the wrist.
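The geometric posture rules above can be sketched directly from face and wrist coordinates. The coordinate conventions (image coordinates with y growing downward, the uncrossed left wrist having the smaller x) and the touch-distance threshold are assumptions for illustration.

```python
def posture_attributes(face, left_wrist, right_wrist, touch_dist=30.0):
    """Judge posture attributes from face/wrist image coordinates
    (y grows downward). Conventions and thresholds are assumptions."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    return {
        # Wrists above the face (smaller y) => both hands raised.
        "hands_raised": left_wrist[1] < face[1] and right_wrist[1] < face[1],
        # Wrists crossed in the X direction => arms crossed.
        "arms_crossed": left_wrist[0] > right_wrist[0],
        # Either wrist close to the face => hand touching the face.
        "hand_on_face": min(dist(left_wrist, face),
                            dist(right_wrist, face)) < touch_dist,
    }
```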
For behavior not directly related to the game, attributes of behavior that may be related to basketball include whether the spectator is looking at a smartphone. Whether the spectator is calm can be determined from whether the body is moving regardless of the game's development. The attributes of facial expression (worry, frequency of change) can be determined from the degree of smiling. Other attributes include whether the spectator is reading a pamphlet, whether conversation is frequent, whether the spectator is taking photographs (game shots, selfies, shots of the staging), and whether the spectator is looking at the screen of the court (stadium) 3.
In various events, including sporting events, the event operator often performs various kinds of staging in order to increase the degree of excitement and concentration of the participating users. In this embodiment, an operator tool screen is prepared so that the operator can objectively verify the effect of the staging the operator performed. The situation image (operator image) described above is displayed on this operator tool screen.
FIG. 8 is a diagram showing an example of the operator tool screen 40. The operator tool screen 40 of FIG. 8 can be displayed on a PC or the like owned by the operator. Using the situation image (operator image) displayed on the operator tool screen 40, the operator can verify in detail the reactions of the event participants to the staging performed during the event.
The operator tool screen 40 of FIG. 8 shows an example in which a performance or the like was staged during a basketball game. The operator tool screen 40 of FIG. 8 has a first area 40a that displays the video data of the game, a second area 40b that displays the reactions of the spectators, and a third area 40c that displays the score changes of the game and the spectators' degree of excitement and concentration. In this specification, the images displayed in the second area 40b and the third area 40c are collectively referred to as the situation image (operator image).
FIG. 9A is an enlarged view of the image displayed in the second area 40b of FIG. 8. In the image of the second area 40b, a circle (identifier) 41 indicating the degree of excitement is superimposed on each spectator's face. The greater the degree of excitement, the larger the radius of the circle 41. Above the second area 40b, a plurality of buttons 42 for selecting the range of spectator seats to be displayed in the second area 40b are arranged. In the example of FIG. 9A, the Tokyo bench range is selected. By selecting any of the buttons, the operator can display any location among the spectator seats of the entire stadium 3 and visually grasp the degree of excitement of the spectators at that location.
FIG. 9B is an enlarged view of part of the image displayed in the third area 40c of FIG. 8. The gray ranges 43 in FIG. 9B are periods during which the game was in progress, and the black ranges 44 between them are periods during which the game was interrupted. The horizontal axis is time, and the vertical solid lines represent the scores of the opposing teams. Two polygonal lines are shown in FIG. 9B; one shows the spectators' degree of excitement and the other shows the spectators' degree of concentration. The numbers in the rectangular frame 45 indicate each team's score and the score difference at that point in time.
Using the images in the second area 40b and the third area 40c of the operator tool screen 40, the operator can examine in detail how the spectators' degree of excitement and concentration changed as the game progressed. At a sporting event, home fans and away fans react differently depending on the scoring situation, but on the operator tool screen 40 of FIG. 8 the areas where home fans and away fans are seated can be examined separately and in detail.
FIG. 10 is a diagram showing an example of the processing operations that generate the operator tool screen 40. The processing operations of FIG. 10 may be performed by the processing server 6 or by the operator server 30. In this specification, the processing that generates the operator tool screen 40 is referred to as the excitement visualization tool.
The excitement visualization tool has a game analysis process 46a, an excitement circle drawing program 46b, an analysis data processing program 46c, a web display program 46d, and CSS (Cascading Style Sheets) 46e.
As will be described later, the game analysis process 46a outputs, as analysis items for each spectator, information on the spectator's degree of concentration, degree of excitement, and fan attributes, based on the result of analyzing the video data. The game analysis process 46a also outputs, as analysis items for the game, information on scores, player substitutions, game stoppage periods, the content of performances staged during game interruptions, and performance times. Further, the game analysis process 46a outputs information on the degree of concentration and degree of excitement for each frame of the video data and for each fan.
The excitement circle drawing program 46b superimposes, on each spectator's face image, a circle proportional to that spectator's degree of excitement, based on the video data of the spectator seats and the output data of the game analysis process 46a.
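The proportionality between excitement and circle size can be sketched as below. The base radius, the scale factor, and the assumption that excitement is normalized to [0, 1] are all illustrative choices, not values from the embodiment.

```python
BASE_RADIUS = 10      # radius at zero excitement, in pixels (assumed)
RADIUS_PER_UNIT = 40  # extra radius per unit of excitement (assumed)

def excitement_circle(face_center, excitement):
    """Return (cx, cy, r) for the circle drawn over one spectator's face.

    `excitement` is assumed normalized to [0, 1]."""
    cx, cy = face_center
    r = BASE_RADIUS + RADIUS_PER_UNIT * excitement
    return (cx, cy, r)

# One circle per detected face in a frame (face positions hypothetical):
faces = {"fan_01": ((120, 80), 0.25), "fan_02": ((300, 90), 1.0)}
circles = {fan: excitement_circle(c, e) for fan, (c, e) in faces.items()}
print(circles["fan_02"])  # (300, 90, 50.0)
```

An actual implementation would then rasterize these circles onto the spectator-seat video frame, e.g. with an image library, before handing the frame to the web display program 46d.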
The analysis data processing program 46c extracts only the information required by the tool from the post-analysis CSV data, and converts the post-analysis pickle data into JSON format.
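Both operations can be sketched with the Python standard library. The column names (`frame`, `fan_id`, `excitement`, `concentration`) and the record layout are assumptions made for illustration; the embodiment's actual CSV schema is not specified.

```python
import csv
import io
import json
import pickle

NEEDED = ["frame", "fan_id", "excitement", "concentration"]

def extract_needed(csv_text):
    """Keep only the columns the tool requires from the analysis CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k: row[k] for k in NEEDED} for row in reader]

def pickle_to_json(pickled_bytes):
    """Convert pickled analysis data into a JSON string."""
    return json.dumps(pickle.loads(pickled_bytes))

csv_text = "frame,fan_id,excitement,concentration,debug\n1,7,0.8,0.6,x\n"
rows = extract_needed(csv_text)
print(rows[0])  # the "debug" column is dropped

blob = pickle.dumps({"fan_id": 7, "excitement": 0.8})
print(pickle_to_json(blob))
```

Note that CSV values arrive as strings, so a real pipeline would also cast `excitement` and `concentration` to floats before further processing.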
The web display program 46d, based on the video footage of the game, the images generated by the circle drawing program 46b, and the CSS 46e describing the tool's layout definition, performs processing to play back the game video in the first area 40a of the operator tool screen 40 and generates the images of the second area 40b and the third area 40c. The processing of the web display program 46d and the CSS 46e may be performed by the web application server 47.
FIG. 11 is a diagram showing the details of the game analysis process 46a of FIG. 10. The game analysis process 46a has a spectator video data processing program 48a, a game data processing program 48b, a graph display program 48c, an index evaluation program 48d, a simple CSV creation program 48e for designers, and a setting change program 48f.
Data on the feature quantities generated by the feature quantification process 48g are input to the spectator video data processing program 48a. The feature quantification process 48g extracts feature quantities by executing the face recognition execution program 31a, the bone data extraction program 31b, optical flow processing, and the like on the video of the spectator seats and the video of the game. The feature quantification process 48g outputs detection data representing the feature quantities and fan attribute information. In the feature quantification process 48g, the fans' attributes are determined from the behavior of the spectators and the development of the game.
The spectator video data processing program 48a performs personal identification, addition of analysis items, integration of the video data from the plurality of cameras 4, correction between the cameras 4, and the like, based on the video data of the spectator seats, and outputs each spectator's degree of concentration, degree of excitement, fan attributes, and the like.
Game progress data are input to the game data processing program 48b via, for example, the Internet. In addition to game progress information, the game progress data include information such as performance information for performances staged during the game and the video time of the game. The game data processing program 48b performs game data processing, integration of personal items, performance information processing, detection of contradictions in manually entered values, and the like, based on the output data of the spectator video data processing program 48a and the game progress data, and outputs each spectator's degree of concentration, degree of excitement, and fan attributes, and also outputs, as analysis items for the game, information on scores, player substitutions, game stoppage periods, and performance content and periods.
The graph display program 48c generates a graph 49 showing the change over time of the degree of concentration and the degree of excitement, based on the data obtained by executing the game data processing program 48b.
FIG. 12 is a diagram showing an example of the graph 49 generated by the graph display program 48c, and is an enlarged view of the graph shown in FIG. 11. The horizontal axis shows the elapsed time of the game, and the vertical axis shows the spectators' degree of excitement and concentration. FIG. 12 shows both a graph of the spectators' degree of excitement and a graph of their degree of concentration. The gray periods 49a in FIG. 12 are periods during which the game was in progress, and the dark gray periods 49b indicate periods during which some kind of staging was performed. By presenting this graph to the operator, it is possible to grasp at a glance whether the staging excited the spectators.
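Preparing the data behind such a time-series graph can be sketched as follows: the per-frame excitement (or concentration) values output by the game analysis process are averaged into one plotted point per second. The frame rate and the averaging window are assumptions; the embodiment does not specify how the curve is smoothed.

```python
FPS = 30  # assumed video frame rate

def per_second_series(per_frame_values, fps=FPS):
    """Average each run of `fps` consecutive frames into one plotted point."""
    points = []
    for start in range(0, len(per_frame_values), fps):
        chunk = per_frame_values[start:start + fps]
        points.append(sum(chunk) / len(chunk))
    return points

# One quiet second followed by one more excited second (values hypothetical):
excitement = [0.5] * 30 + [0.75] * 30
print(per_second_series(excitement))  # [0.5, 0.75]
```

The resulting per-second series would then be drawn by the graph display program 48c, with the game-in-progress and staging periods shaded as in FIG. 12.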
The index evaluation program 48d of FIG. 11 generates indices for evaluating the degree of concentration and the degree of excitement. FIG. 13 is a diagram showing an example of the indices 50 generated by the index evaluation program 48d, and is an enlarged view of the indices shown in FIG. 11. In the example of the indices 50 of FIG. 13, for each of the cases in which the home team scored 1 to 3 points and their correlation with the home team's score, and the cases in which the away team scored 1 to 3 points and their correlation with the away team's score, the following are shown: an index of the total number of spectators whose degree of excitement increased, an index of the total number of spectators whose degree of excitement decreased, an index of the rate of increase, and an index of the average degree of excitement within a predetermined period.
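The counting indices around a single scoring event can be sketched as below. How the "before" and "after" windows are chosen, and the field names, are assumptions for illustration.

```python
def scoring_indices(before, after):
    """Indices around one scoring event.

    `before` / `after`: {fan_id: excitement} just before / just after
    the score (window selection is assumed to happen upstream)."""
    increased = sum(1 for f in before if after[f] > before[f])
    decreased = sum(1 for f in before if after[f] < before[f])
    total = len(before)
    return {
        "increased": increased,
        "decreased": decreased,
        "increase_rate": increased / total if total else 0.0,
    }

# Hypothetical per-spectator excitement around a home-team score:
before = {"a": 0.3, "b": 0.5, "c": 0.7, "d": 0.4}
after = {"a": 0.8, "b": 0.9, "c": 0.6, "d": 0.4}
print(scoring_indices(before, after))
```

Aggregating these per-event results separately over home-team and away-team scoring events would yield the per-team rows shown in FIG. 13.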
The simple CSV creation program 48e for designers of FIG. 11 creates, based on the execution results of the game data processing program 48b, a CSV file containing information on the degree of concentration and degree of excitement for each frame of the video and for each fan.
The setting change program 48f of FIG. 11 switches the input/output folders that store the files obtained by executing the game data processing program 48b, the graph display program 48c, the index evaluation program 48d, and the simple CSV creation program 48e for designers. The setting change program 48f also changes the colors and display positions of the graphs generated by the graph display program 48c. Further, the setting change program 48f switches the video frames used to calculate the indices generated by the index evaluation program 48d.
FIG. 14 is a diagram showing processing operations for automatically verifying the tag information of each spectator generated by executing the fan tagging program 32b of FIG. 4. In this specification, the processing that automatically verifies the tag information is referred to as the tag information automatic verification tool. The processing of the tag information automatic verification tool may be performed by the processing server 6 of FIG. 1, or by the operator server 30 or the like.
The tag information automatic verification tool has a score information file generation program 51a, a bone data cleaning program 51b, and a fan tagging/evaluation program 51c.
The score information file generation program 51a corrects the time offset between the game video and the spectator video, based on the data on the feature quantities obtained by the feature quantification process 48g, and generates a file of the scoring teams and the corresponding times in the spectator video (hereinafter, the score information file).
The bone data cleaning program 51b extracts the spectators' bone coordinates at the times when the home team and the away team scored, based on the score information file, and outputs the amount of spectator movement at the times of scoring based on the bone coordinates.
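One plausible definition of the "amount of movement" is the summed displacement of corresponding keypoints between the frames around a scoring time, sketched below. The keypoint layout and the choice of summed Euclidean displacement are assumptions; the embodiment leaves the exact metric unspecified.

```python
def movement_amount(bones_t0, bones_t1):
    """Sum of Euclidean displacements of corresponding keypoints
    between two frames; each frame is a list of (x, y) points."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(bones_t0, bones_t1):
        total += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return total

# Hypothetical keypoints (e.g. two wrists) before and after a score:
frame_before = [(100.0, 200.0), (140.0, 200.0)]
frame_after = [(100.0, 170.0), (140.0, 160.0)]  # both wrists moved up
print(movement_amount(frame_before, frame_after))  # 30.0 + 40.0 = 70.0
```

Evaluating this quantity at each scoring time listed in the score information file gives the per-event movement amounts consumed by the fan tagging/evaluation program 51c.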
The fan tagging/evaluation program 51c compares the change in bone data and the average value of the bone data at scoring times between the times when the home team scored and the times when the away team scored, and determines the tagging of each spectator. The determination is made, for example, on a rule basis or by deep learning (DNN: Deep Neural Network).
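The rule-based variant of this decision can be sketched as comparing a spectator's average movement at home-team scoring times against that at away-team scoring times. The decision margin and the tag names are illustrative assumptions, not parameters disclosed by the embodiment.

```python
def tag_fan(moves_home, moves_away, margin=1.5):
    """Tag one spectator from movement amounts at scoring times.

    moves_home / moves_away: per-event movement amounts when the home
    team / away team scored. `margin` (assumed) controls how much larger
    one side's average reaction must be to assign a tag."""
    avg_home = sum(moves_home) / len(moves_home)
    avg_away = sum(moves_away) / len(moves_away)
    if avg_home > avg_away * margin:
        return "home_fan"
    if avg_away > avg_home * margin:
        return "away_fan"
    return "neutral"

print(tag_fan([80.0, 90.0], [10.0, 20.0]))  # reacts to home scores
print(tag_fan([15.0, 20.0], [18.0, 17.0]))  # similar either way
```

A DNN-based variant would instead learn this decision boundary from labeled examples rather than using a fixed margin.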
FIG. 15A shows video data 52a including tag information (also called estimated tags) generated by the fan tagging/evaluation program 51c, and FIG. 15B shows video data 52b including correct tag information (also called correct tags). FIGS. 15A and 15B are enlargements of the images shown in FIG. 14. In FIGS. 15A and 15B, the different pieces of tag information 52a to 52c are displayed in different colors.
The fan tagging/evaluation program 51c presents the result of comparing the correct tags with the estimated tags in a table format as shown in FIG. 14, and presents the accuracy rate of the tag information numerically.
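The numeric accuracy rate amounts to the fraction of spectators whose estimated tag matches the correct tag, which can be sketched directly (the tag names below are hypothetical):

```python
def tag_accuracy(correct, estimated):
    """Fraction of spectators whose estimated tag matches the correct tag.

    correct / estimated: {fan_id: tag}."""
    hits = sum(1 for fan, tag in correct.items() if estimated.get(fan) == tag)
    return hits / len(correct)

correct = {"a": "home_fan", "b": "away_fan", "c": "home_fan", "d": "neutral"}
estimated = {"a": "home_fan", "b": "home_fan", "c": "home_fan", "d": "neutral"}
print(tag_accuracy(correct, estimated))  # 3 of 4 match -> 0.75
```

The table of FIG. 14 would additionally break this comparison down per tag rather than reporting only the overall rate.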
The information processing device 1 according to the present embodiment can be used for various purposes. FIG. 16 is a diagram illustrating major application examples of the information processing device 1 according to the present embodiment. Note that the information processing device 1 according to the present embodiment can also be applied to applications other than those shown in FIG. 16.
As shown in FIG. 16, the information processing device 1 according to the present embodiment may have a function of automatically capturing archive footage at the times when the event becomes exciting. This makes it easy to obtain video of the moments when the event was exciting.
The information processing device 1 according to the present embodiment may also have a function of staging effects or performing cleaning work in areas on the court (stadium) 3 that are not attracting attention. By staging effects in an area that is not attracting attention, the spectators' gaze can be directed to that area. Also, by performing cleaning work in an area that is not attracting attention, the court can be cleaned without disturbing the spectators.
The information processing device 1 according to the present embodiment may also vary the staging according to the spectators' cheering styles. According to the present embodiment, since the spectators' cheering styles can be automatically extracted, the type of staging can be switched according to the spectators' degree of excitement and concentration. For example, for spectators who are cheering aloud, the degree of excitement may be further increased by, for example, vibrating the spectator seats.
The information processing device 1 according to the present embodiment may also provide advertising information or food and drink information when the spectators' concentration on the game declines. For example, by providing advertising information or food and drink information during periods when the game is temporarily interrupted, the advertising effect can be enhanced and the sales of restaurants and the like in the stadium 3 can be improved.
The information processing device 1 according to the present embodiment may also have a function of distributing information on the goods of players attracting attention to the spectators' smartphones or the like.
The information processing device 1 according to the present embodiment may also have a function of granting benefits such as virtual currency or points to spectators according to their cheering styles. Information on the number of times each spectator has attended events may also be acquired, and benefits such as points may be granted to spectators who have attended frequently.
The information processing device 1 according to the present embodiment may also have a function of visualizing the excitement across the entire event venue or between teams and feeding it back to the host team. This can raise the spirits of each player on the host team.
The information processing device 1 according to the present embodiment may also have a function of distributing, to the corresponding spectator's smartphone, captured images showing the spectator and their friends at exciting moments.
The information processing device 1 according to the present embodiment may also have a function of delivering a predetermined sound source only to spectators belonging to attributes, or located in areas, that show little excitement.
The information processing device 1 according to the present embodiment may also have a function of creating video of a virtual stadium 3 using a fake crowd that captures the characteristics of each attribute. For example, when the spectators' degree of excitement rises, a virtual person (avatar) may be displayed so that the spectators can cheer together with the avatar, and when the degree of excitement rises further, functions may be provided that enable conversation with the avatar or a high five with the avatar. Specific examples of this function are described later with reference to FIGS. 17D, 17E, and the like.
The information processing device 1 according to the present embodiment may also have a function of transmitting the degree of excitement of spectators watching online (watching in a remote environment) to the event venue, and transmitting the degree of excitement at the event venue to the online viewing locations. For example, when the total amount of excitement at an online viewing location reaches a reference amount, information to that effect may be transmitted to the event venue via the network, and staging using effect sounds, visual effect images, or the like may be performed at the event venue. Conversely, when the amount of sound or the like at the event venue reaches a reference amount, some kind of visual effect staging may be performed at the online viewing locations.
The information processing device 1 according to the present embodiment may also have a function of matching spectators with a high synchronization rate who are reacting at the same times. The spectators are assumed to be in locations distant from each other; for example, a spectator at the event venue may be matched with a spectator watching online. For example, an avatar of a spectator cheering in another location whose cheering style matches that of the spectator watching online may be displayed, enabling conversation, high fives, and the like. Specific examples of this function are also described with reference to FIGS. 17D, 17E, and the like, described later.
FIGS. 17A to 17H are diagrams showing an example of event participation in a remote environment, that is, online viewing. FIGS. 17A to 17H show a spectator being guided to an online viewing location and, after being given an explanation of how to watch, actually watching the game. This location is not the spectator's home; it is assumed to have a function for displaying video of the game on a wall surface and a function for displaying 3D video such as avatars using AR (Augmented Reality) technology. Instead of 3D video using AR, a function may be provided for displaying, in part of the video shown on the wall surface, spectators cheering at other locations such as the event venue, virtual persons, and the like. FIGS. 17A to 17F show an example of watching a soccer game online, but the type of sporting event watched online does not matter.
First, FIG. 17A shows a guide leading a spectator to the viewing location. The spectator is guided to a viewing location called a warp square, a space equipped with a multi-surface projector system. The warp square contains a sofa, and video of the stadium 3 is displayed on the wall surface in front of the sofa. The display on the wall surface is performed by, for example, a projector. A camera 4 (not shown) that captures images of the spectator sitting on the sofa is also provided, and the spectator's degree of excitement and concentration are analyzed based on the video data captured by this camera 4. The guide hands the spectator a megaphone and explains that the spectator can cheer loudly using the megaphone. This allows the spectator in the warp square to cheer in the same cheering style as the spectators in the stadium 3.
Next, as shown in FIG. 17B, the spectator sits on the sofa in the warp square, and either pressing the execution button of a remote controller or behavior sensing that automatically recognizes the user's actions (for example, sitting on the sofa or directing the gaze toward the wall surface) causes the video of the stadium 3 to be projected on the front wall surface.
Next, as shown in FIG. 17C, when the game begins and the spectator cheers, the size of the video displayed on the wall surface increases as the spectator's degree of excitement and concentration rise, and/or when the excitement of the content provided at the event is estimated from the spectator's behavior. The more the spectator not only cheers aloud but also swings the megaphone vigorously, the larger the video becomes. As shown in the right-hand part of FIG. 17C, when the degree of excitement reaches its peak, the video of the game in the stadium 3 is projected on the entire wall surface in front of the sofa. By varying the video size in this way according to the spectator's degree of excitement and concentration, the degree of excitement of spectators who want to enjoy watching at a larger video size can be further increased.
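A minimal sketch of this size control maps the spectator's excitement and concentration to a display scale, where 1.0 means the full wall surface. The minimum scale, the equal weighting of the two signals, and the linear mapping are all assumptions; the embodiment does not disclose the actual control law.

```python
MIN_SCALE = 0.3  # fraction of the wall used at rest (assumed)

def video_scale(excitement, concentration):
    """Map excitement/concentration (each assumed in [0, 1]) to a display
    scale in [MIN_SCALE, 1.0]; 1.0 fills the entire wall surface."""
    level = 0.5 * excitement + 0.5 * concentration
    return min(1.0, MIN_SCALE + (1.0 - MIN_SCALE) * level)

print(video_scale(0.0, 0.0))  # small window while the spectator is calm
print(video_scale(1.0, 1.0))  # full wall at peak excitement
```

A real system would also smooth this scale over time so the projected video does not flicker between sizes frame by frame.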
Next, as shown in the left-hand part of FIG. 17D, when the spectator's favored team scores, AR technology may be used to display a virtual person (avatar) as a 3D image so that the spectator can converse with the avatar. Also, as shown in the right-hand part of FIG. 17D, a function may be provided such that, when the favored team scores, vigorously shaking the megaphone displays an image of confetti and plays a "Goooooal" sound. The avatar may be a virtual person reflecting the appearance of a spectator with a similar cheering style in the stadium 3. This can enhance the sense of unity and solidarity with the spectators in the stadium 3.
When a player on the favored team scores, a 3D image in which the avatar is also overjoyed may be displayed, as shown in FIG. 17E. The right-hand part of FIG. 17E shows an example in which the avatar makes a gesture asking for a high five via the megaphone. When the spectator gives a high five via the megaphone in accordance with the avatar's movement, staging in which an effect sound rings out at the location of the high five, or the display of a visual effect image, may be performed, as shown in the left-hand part of FIG. 17F.
Thereafter, during a break in the game, the spectator sits on the sofa and stops cheering, so the size of the video displayed on the wall surface shrinks accordingly, as shown in the right-hand part of FIG. 17F.
 When the game resumes, images similar to those of FIGS. 17C to 17F described above are displayed. The manner of displaying the images is not limited to those shown in FIGS. 17C to 17F. For example, as shown on the left side of FIG. 17G, when the game becomes exciting, the screen may be split vertically into two, with the video of the game displayed in the upper half and, in the lower half, a panoramic video of the spectator seats of the stadium 3 displayed as if the user were cheering there. When the user cheers by shaking the megaphone in this state, silhouettes of the avatars of the spectators in the stadium 3 may be displayed and the avatars' cheering voices may be heard, as shown on the right side of FIG. 17G.
 When the degree of excitement rises just before a goal, a neighboring avatar may speak to the user, as shown on the left side of FIG. 17H, and when the user answers by voice, a "Like" icon may be displayed. When a goal is scored, a "Goooooal!" sound may be played and a confetti image may be displayed in synchronization with the movement of the megaphone, as shown on the right side of FIG. 17H.
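The interaction flow described above, in which a goal event combined with a vigorous megaphone shake triggers confetti and sound, can be sketched as a simple event-driven rule. The function name, effect labels, and threshold below are illustrative assumptions, not part of the disclosed embodiment.

```python
# Hypothetical sketch: trigger celebration effects when a goal event
# coincides with vigorous megaphone motion. Threshold is an assumption.

SHAKE_THRESHOLD = 8.0  # assumed motion-intensity level for a "vigorous" shake

def select_effects(event: str, shake_intensity: float) -> list[str]:
    """Return the presentation effects for the current game event."""
    effects = []
    if event == "goal":
        effects.append("play_goal_audio")            # "Goooooal!" sound
        if shake_intensity >= SHAKE_THRESHOLD:
            effects.append("show_confetti")          # confetti synced to the megaphone
            effects.append("show_avatar_high_five")  # avatar offers a high five
    elif event == "pre_goal_excitement":
        effects.append("avatar_speaks_to_user")      # neighboring avatar talks
    return effects

print(select_effects("goal", 9.5))
# -> ['play_goal_audio', 'show_confetti', 'show_avatar_high_five']
```

A real implementation would derive the event from the progress information described earlier and the shake intensity from the megaphone's motion sensor.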
 The above-described embodiment has mainly described an example of watching a basketball sporting event, but some sports differ from basketball in the shooting conditions of the video data or in the behavior of the spectators. In such cases, the processes shown in FIGS. 5, 10, 11, 14, and the like need to be modified as necessary.
 FIG. 18 is a diagram showing the similarity between various sports and basketball. The horizontal axis of FIG. 18 indicates the degree to which the shooting conditions differ from those of basketball; the farther to the right, the more the shooting conditions differ. The shooting conditions include indoor/outdoor, dark/bright, wide/narrow, and a large/small number of people. The vertical axis of FIG. 18 indicates the degree to which the spectators' behavior differs from that of basketball; the farther down, the more the spectators' behavior differs.
 As can be seen from FIG. 18, when the shooting conditions and the spectators' behavior are considered together, ice hockey is the most similar to basketball and golf differs the most. For golf, therefore, the processes shown in FIGS. 5, 10, 11, 14, and the like described above may need to be changed substantially. For sports other than golf as well, it may be necessary to modify the processes of FIGS. 5, 10, 11, 14, and the like as appropriate.
 The information processing device 1 and the information processing system 2 according to the present embodiment are also applicable to various events other than sporting events. FIG. 19 is a diagram showing the similarity between various non-sports events and basketball. As in FIG. 18, the horizontal axis of FIG. 19 indicates the degree to which the shooting conditions differ from those of basketball, and the vertical axis indicates the degree to which the spectators' behavior differs.
 As can be seen from FIG. 19, when the shooting conditions and the audience's behavior are considered together, a music (live) event is the most similar to basketball and watching a movie in a movie theater differs the most. For movie theaters, therefore, the processes shown in FIGS. 5, 10, 11, 14, and the like described above may need to be changed substantially. For events other than movie theaters as well, it may be necessary to modify the processes of FIGS. 5, 10, 11, 14, and the like as appropriate.
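One way to operationalize the similarity axes of FIGS. 18 and 19 is to place each event type on a (shooting-condition difference, spectator-behavior difference) plane relative to basketball and reuse the processing profile of the nearest already-tuned event. The coordinates and profile names below are illustrative assumptions, not values taken from the disclosure.

```python
import math

# Hypothetical (shooting-condition difference, behavior difference) coordinates
# relative to basketball at the origin; larger means less similar. Assumed values.
TUNED_PROFILES = {
    "basketball": (0.0, 0.0),
    "ice_hockey": (0.1, 0.1),
    "music_live": (0.2, 0.2),
    "golf":       (0.9, 0.9),
}

def nearest_profile(coords: tuple[float, float]) -> str:
    """Pick the tuned processing profile closest on the similarity plane."""
    return min(TUNED_PROFILES, key=lambda name: math.dist(coords, TUNED_PROFILES[name]))

# A new sport with moderately different shooting conditions and behavior
# would reuse the ice hockey settings rather than require full retuning:
print(nearest_profile((0.15, 0.12)))  # -> ice_hockey
```

This is only a selection heuristic; how much of the processing of FIGS. 5, 10, 11, and 14 can actually be reused still has to be verified per event type.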
 As described above, in the present embodiment, the internal state of a user participating in an event is inferred on the basis of sensing information, so that at least one of the user's attributes and behavior can be estimated and users can be classified into a plurality of clusters. Further, according to the present embodiment, users or groups of users can be tagged on the basis of the estimation results of the attributes and behavior of the users participating in the event and the classification results into the plurality of clusters, and information suited to each user or group can be provided. As a result, for example, information appropriate to home-team fans, away-team fans, and beginners can be provided, further enhancing the appeal of watching sports.
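The estimation, clustering, and tagging flow summarized above can be sketched as a small pipeline. The feature names, rules, and tag labels here are illustrative assumptions; in the embodiment this work is done by the feature amount extraction unit 12, the first estimation unit 13, the clustering unit 14, and the tagging unit 17.

```python
# Hypothetical sketch of the sensing -> estimation -> clustering/tagging flow.
# All feature names, rules, and tag labels are illustrative assumptions.

def extract_features(sensing: dict) -> dict:
    """Reduce raw sensing info to simple per-user features."""
    return {
        "cheer_rate": sensing["cheers"] / max(sensing["minutes"], 1),
        "wears_home_goods": sensing["home_goods"],
    }

def estimate_attributes(features: dict) -> dict:
    """Estimate attributes/behavior from the features (rule-based stand-in)."""
    if features["wears_home_goods"]:
        team = "home"
    elif features["cheer_rate"] > 0:
        team = "away"
    else:
        team = "unknown"
    return {"team": team, "engaged": features["cheer_rate"] >= 1.0}

def assign_tag(attrs: dict) -> str:
    """Map the estimate to a cluster tag used for targeted information."""
    if attrs["team"] == "unknown":
        return "beginner"
    return f"{attrs['team']}_fan" if attrs["engaged"] else f"{attrs['team']}_casual"

sensing = {"cheers": 30, "minutes": 20, "home_goods": True}
print(assign_tag(estimate_attributes(extract_features(sensing))))  # -> home_fan
```

Users or groups sharing a tag would then receive the same targeted information, as in configuration (7).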
 Furthermore, effects for raising the degree of excitement can be produced on the basis of the users' degree of excitement and degree of concentration. In addition, by providing the operator tool screen 40, which displays the progress information of the event together with information indicating the temporal changes in the users' degree of excitement and degree of concentration, the effectiveness of staging, which conventionally could only be judged subjectively, can be analyzed and evaluated in detail and objectively, which is useful for producing more attractive staging.
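The operator-side analysis described above, comparing the excitement time series against the event's progress, could under simple assumptions be reduced to measuring the change in average excitement in a window before and after each staging action. The data, window length, and function name are illustrative, not part of the disclosure.

```python
# Hypothetical sketch: quantify a staging effect as the change in mean
# excitement in a fixed window before vs. after the staging timestamp.

def staging_effect(excitement: list[float], t: int, window: int = 3) -> float:
    """Mean excitement after minus mean excitement before time index t."""
    before = excitement[max(t - window, 0):t]
    after = excitement[t:t + window]
    if not before or not after:
        raise ValueError("window falls outside the series")
    return sum(after) / len(after) - sum(before) / len(before)

# Per-interval excitement scores with a staging action fired at index 4:
series = [0.2, 0.3, 0.2, 0.3, 0.8, 0.9, 0.7]
print(round(staging_effect(series, 4), 2))  # -> 0.53
```

A positive value suggests the staging raised excitement; aggregating such deltas over many performances would give the objective evaluation the operator tool screen 40 is meant to support.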
 Furthermore, in the present embodiment, the internal state of a user participating in an event from a remote environment can be inferred to provide staging as if the user were at the event venue. For example, the staging method may be varied according to the degree of excitement of the user in the remote environment so as to raise the degree of excitement further, staging may be provided as if the user were at the event venue, or a user at the event venue may be displayed as an avatar so that conversations and high fives with the avatar become possible. This enhances the sense of unity and solidarity among fans cheering from separate locations and increases their motivation to participate in the event.
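The remote-side display behavior, with the wall image growing as excitement rises and shrinking during breaks as in FIGS. 17C to 17F, corresponds to the display control unit of configuration (15). A minimal sketch follows; the 0-to-1 state range, the size tiers, and the max-of-both-signals rule are assumptions, not the disclosed control law.

```python
# Hypothetical sketch of the display control unit of configuration (15):
# map the estimated internal state (assumed 0..1 excitement/concentration)
# to a scale factor for the display area. Size tiers are assumptions.

def display_scale(excitement: float, concentration: float) -> float:
    """Return a display-area scale factor from the estimated internal state."""
    state = max(excitement, concentration)  # either signal can drive the size
    state = min(max(state, 0.0), 1.0)       # clamp to the assumed 0..1 range
    return 0.3 + 0.7 * state                # 30% of the wall at rest, 100% at peak

print(display_scale(0.0, 0.0))  # during a break -> 0.3
print(display_scale(1.0, 0.4))  # peak excitement -> 1.0
```

In practice the internal state would come from the second estimation unit 19, and the scale factor would drive the projected or rendered display area.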
 At least part of the information processing device described in the above embodiment may be configured by hardware or by software. When configured by software, a program that implements at least part of the functions of the information processing device may be stored on a recording medium such as a flexible disk or CD-ROM and read and executed by a computer. The recording medium is not limited to a removable one such as a magnetic disk or optical disk, and may be a fixed recording medium such as a hard disk device or a memory.
 A program that implements at least part of the functions of the information processing device may also be distributed via a communication line (including wireless communication) such as the Internet. Further, the program may be distributed in an encrypted, modulated, or compressed state via a wired or wireless line such as the Internet, or stored on a recording medium and distributed.
 Note that the present technology can have the following configurations.
 (1) An information processing device comprising:
 a feature amount extraction unit that extracts a feature amount based on sensing information of a user;
 a first estimation unit that estimates at least one of an attribute and a behavior of the user based on the feature amount;
 a clustering unit that classifies, in units of the user or a group consisting of a plurality of users, into a plurality of clusters based on the estimation by the first estimation unit; and
 an information processing unit that performs predetermined information processing based on at least one of the estimation by the first estimation unit and the classification by the clustering unit.
 (2) The information processing device according to (1), wherein the sensing information includes an image captured by an imaging device, and the clustering unit performs the classification into the plurality of clusters based on an analysis result of the captured image.
 (3) The information processing device according to (2), wherein the captured image includes an image of the user, and the feature amount extraction unit extracts the feature amount including at least one of the user's face, posture, body movement, and skeleton information.
 (4) The information processing device according to any one of (1) to (3), wherein the feature amount extraction unit extracts the feature amount based on at least one of acoustic data, object recognition, and frequency analysis information.
 (5) The information processing device according to any one of (1) to (4), further comprising an event information acquisition unit that acquires progress information of an event in which the user participates, wherein the first estimation unit estimates at least one of the attribute and the behavior of the user participating in the event based on the feature amount and the progress information of the event.
 (6) The information processing device according to any one of (1) to (5), further comprising a tagging unit that attaches tag information in units of the user or the group based on the estimation by the first estimation unit.
 (7) The information processing device according to (6), wherein the information processing unit provides information based on the tag information to users or groups to which the same tag information is attached.
 (8) The information processing device according to (7), wherein the information processing unit performs at least one of information provision and information exchange according to at least one of the attribute and the behavior of the user.
 (9) The information processing device according to (7) or (8), further comprising a situation image generation unit that generates a situation image in which an identifier indicating an internal state of the user, determined based on the sensing information, is added to a captured image of the user.
 (10) The information processing device according to (9), wherein the situation image generation unit generates the situation image including progress information of the event in which the user participates and information on at least one of the user's degree of excitement and degree of concentration.
 (11) The information processing device according to any one of (1) to (10), wherein the first estimation unit estimates, based on the sensing information, an internal state including at least one of the user's degree of excitement and degree of concentration, and the clustering unit performs the classification into the plurality of clusters based on the feature amount and the internal state.
 (12) The information processing device according to (11), wherein the clustering unit performs the classification into the plurality of clusters based on a change in the internal state according to progress information of the event.
 (13) The information processing device according to any one of (1) to (12), further comprising a sensing information acquisition unit that acquires the sensing information on at least one of a user at a venue of an event and a user participating in the event from a remote environment.
 (14) The information processing device according to (13), wherein the clustering unit performs the classification into the plurality of clusters in units of a user participating in the event from a remote environment or a group consisting of a plurality of such users.
 (15) The information processing device according to (13) or (14), further comprising:
 a second estimation unit that estimates, based on the sensing information, an internal state including at least one of a degree of excitement and a degree of concentration of a user participating in the event from a remote environment; and
 a display control unit that adjusts a size of a display area that displays information about the event based on the internal state estimated by the second estimation unit.
 (16) The information processing device according to (15), wherein the display control unit causes a display unit viewed by a user participating in the event from a remote environment to display an image that enhances a sense of unity with the spectator seats at the venue of the event in response to an increase in at least one of the user's degree of excitement and degree of concentration.
 (17) The information processing device according to (15) or (16), wherein, when an internal state of a user participating in the event from a remote environment satisfies a predetermined condition, the display control unit displays, within a range visible to the user, at least one of an information providing image and a visual effect image according to the predetermined condition.
 (18) The information processing device according to (17), wherein the visual effect image is a virtual person image of another user who is at the venue of the event and whose internal state satisfies the predetermined condition.
 (19) The information processing device according to (18), further comprising an information exchange unit with which the user participating in the event from the remote environment exchanges information, via the virtual person image and when the predetermined condition is satisfied, with the other user corresponding to the virtual person image.
 (20) An information processing method comprising:
 a step of extracting a feature amount based on sensing information of a user;
 a step of estimating at least one of an attribute and a behavior of the user based on the feature amount;
 a step of classifying, based on the estimation, in units of the user or a group consisting of a plurality of users, into a plurality of clusters; and
 a step of performing predetermined information processing based on at least one of the estimation and the plurality of clusters.
 The aspects of the present disclosure are not limited to the individual embodiments described above and include various modifications that may occur to those skilled in the art; the effects of the present disclosure are also not limited to the contents described above. That is, various additions, changes, and partial deletions are possible without departing from the conceptual idea and spirit of the present disclosure derived from the contents defined in the claims and their equivalents.
 1 information processing device, 2 information processing system, 3 court (stadium), 4 camera, 5 network device, 6 processing server, 7 DB server, 8 mobile terminal, 9 distribution server, 11 sensing information acquisition unit, 12 feature amount extraction unit, 13 first estimation unit, 14 clustering unit, 15 information processing unit, 16 event information acquisition unit, 17 tagging unit, 18 situation image generation unit, 19 second estimation unit, 20 display control unit, 21 information exchange unit, 22 cloud storage, 30 operator server (tool selection server)

Claims (20)

  1. An information processing device comprising:
     a feature amount extraction unit that extracts a feature amount based on sensing information of a user;
     a first estimation unit that estimates at least one of an attribute and a behavior of the user based on the feature amount;
     a clustering unit that classifies, in units of the user or a group consisting of a plurality of users, into a plurality of clusters based on the estimation by the first estimation unit; and
     an information processing unit that performs predetermined information processing based on at least one of the estimation by the first estimation unit and the classification by the clustering unit.
  2. The information processing device according to claim 1, wherein
     the sensing information includes an image captured by an imaging device, and
     the clustering unit performs the classification into the plurality of clusters based on an analysis result of the captured image.
  3. The information processing device according to claim 2, wherein
     the captured image includes an image of the user, and
     the feature amount extraction unit extracts the feature amount including at least one of the user's face, posture, body movement, skeleton information, and voice.
  4. The information processing device according to claim 1, wherein the feature amount extraction unit extracts the feature amount based on at least one of acoustic data, object recognition, and frequency analysis information.
  5. The information processing device according to claim 1, further comprising an event information acquisition unit that acquires progress information of an event in which the user participates, wherein
     the first estimation unit estimates at least one of the attribute and the behavior of the user participating in the event based on the feature amount and the progress information of the event.
  6. The information processing device according to claim 1, further comprising a tagging unit that attaches tag information in units of the user or the group based on the estimation by the first estimation unit.
  7. The information processing device according to claim 6, wherein the information processing unit provides information based on the tag information to users or groups to which the same tag information is attached.
  8. The information processing device according to claim 7, wherein the information processing unit performs at least one of information provision and information exchange according to at least one of an attribute and a behavior of the user.
  9. The information processing device according to claim 1, further comprising a situation image generation unit that generates a situation image in which an identifier indicating an internal state of the user, determined based on the sensing information, is added to a captured image of the user.
  10. The information processing device according to claim 9, wherein the situation image generation unit generates the situation image including progress information of an event in which the user participates and information on at least one of the user's degree of excitement and degree of concentration.
  11. The information processing device according to claim 1, wherein
     the first estimation unit estimates, based on the sensing information, an internal state including at least one of the user's degree of excitement and degree of concentration, and
     the clustering unit performs the classification into the plurality of clusters based on the feature amount and the internal state.
  12. The information processing device according to claim 11, wherein the clustering unit performs the classification into the plurality of clusters based on a change in the internal state according to progress information of an event.
  13. The information processing device according to claim 1, further comprising a sensing information acquisition unit that acquires the sensing information on at least one of a user at a venue of an event and a user participating in the event from a remote environment.
  14. The information processing device according to claim 13, wherein the clustering unit performs the classification into the plurality of clusters in units of a user participating in the event from a remote environment or a group consisting of a plurality of such users.
  15. The information processing device according to claim 13, further comprising:
     a second estimation unit that estimates, based on the sensing information, an internal state including at least one of a degree of excitement and a degree of concentration of a user participating in the event from a remote environment; and
     a display control unit that adjusts a size of a display area that displays information about the event based on the internal state estimated by the second estimation unit.
  16. The information processing device according to claim 15, wherein the display control unit causes a display unit viewed by a user participating in the event from a remote environment to display an image that enhances a sense of unity with the spectator seats at the venue of the event in response to an increase in at least one of the user's degree of excitement and degree of concentration.
  17. The information processing device according to claim 15, wherein, when an internal state of a user participating in the event from a remote environment satisfies a predetermined condition, the display control unit displays, within a range visible to the user, at least one of an information providing image and a visual effect image according to the predetermined condition.
  18. The information processing device according to claim 17, wherein the visual effect image is a virtual person image of another user who is at the venue of the event and whose internal state satisfies the predetermined condition.
  19. The information processing device according to claim 18, further comprising an information exchange unit with which the user participating in the event from the remote environment exchanges information, via the virtual person image and when the predetermined condition is satisfied, with the other user corresponding to the virtual person image.
  20. An information processing method comprising:
     a step of extracting a feature amount based on sensing information of a user;
     a step of estimating at least one of an attribute and a behavior of the user based on the feature amount;
     a step of classifying, based on the estimation, in units of the user or a group consisting of a plurality of users, into a plurality of clusters; and
     a step of performing predetermined information processing based on at least one of the estimation and the plurality of clusters.
PCT/JP2021/040879 2020-11-16 2021-11-05 Information processing device and information processing method WO2022102550A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-190541 2020-11-16
JP2020190541A JP2023181568A (en) 2020-11-16 2020-11-16 Information processing apparatus and information processing method

Publications (1)

Publication Number Publication Date
WO2022102550A1 true WO2022102550A1 (en) 2022-05-19

Family

ID=81602276

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/040879 WO2022102550A1 (en) 2020-11-16 2021-11-05 Information processing device and information processing method

Country Status (2)

Country Link
JP (1) JP2023181568A (en)
WO (1) WO2022102550A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019012974A1 (en) * 2017-07-14 2019-01-17 シャープ株式会社 Information processing apparatus, terminal apparatus, information providing system, program for causing computer to function as information processing apparatus, program for causing computer to function as terminal apparatus, and method for controlling information processing apparatus
JP2020042557A (en) * 2018-09-11 2020-03-19 トヨタ紡織株式会社 Excitement promotion system
JP2020144622A (en) * 2019-03-06 2020-09-10 株式会社Creator’s NEXT Information processing device, information processing method, and program


Also Published As

Publication number Publication date
JP2023181568A (en) 2023-12-25

Similar Documents

Publication Publication Date Title
US11660531B2 (en) Scaled VR engagement and views in an e-sports event
US11538213B2 (en) Creating and distributing interactive addressable virtual content
JP6965896B2 (en) Display control system and display control method
US11218783B2 (en) Virtual interactive audience interface
KR101711488B1 (en) Method and System for Motion Based Interactive Service
CN102656542B (en) Camera navigation for presentations
US8177611B2 (en) Scheme for inserting a mimicked performance into a scene and providing an evaluation of same
CN112543669A (en) Discovery and detection of events in interactive content
CN111971097A (en) Online championship integration
JP2020039029A (en) Video distribution system, video distribution method, and video distribution program
JP2017504457A (en) Method and system for displaying a portal site containing user selectable icons on a large display system
US9025832B2 (en) Automated sensor driven friending
JP7202935B2 (en) Attention level calculation device, attention level calculation method, and attention level calculation program
US20210342886A1 (en) Virtual Advertising Database with Command Input Data
US11941177B2 (en) Information processing device and information processing terminal
CN110507996A (en) Keep the user experience in gaming network personalized
WO2022102550A1 (en) Information processing device and information processing method
US20230033892A1 (en) Information processing device and information processing terminal
US20230009322A1 (en) Information processing device, information processing terminal, and program
WO2021070733A1 (en) Information processing device, information processing method, and program
US20230038998A1 (en) Information processing device, information processing terminal, and program
JP7027300B2 (en) Information processing equipment, information processing methods and information processing programs
US20230381673A1 (en) eSPORTS SPECTATOR ONBOARDING
JP2015219423A (en) Try-on pseudo experience system, control method of try-on pseudo experience, and computer program
WO2023091564A1 (en) System and method for provision of personalized multimedia avatars that provide studying companionship

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21891796

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21891796

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP