WO2023120263A1 - Information processing device and information processing method

Information processing device and information processing method

Info

Publication number
WO2023120263A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
information processing
event
information
highlight
Application number
PCT/JP2022/045588
Other languages
French (fr)
Japanese (ja)
Inventor
広 岩瀬
悠 朽木
俊之 荒木
Original Assignee
ソニーグループ株式会社
Application filed by ソニーグループ株式会社
Publication of WO2023120263A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer

Definitions

  • the present disclosure relates to an information processing device and an information processing method.
  • Technology is provided that automatically generates highlights (digests) of content such as video. For example, a technology has been provided that identifies whether or not a visitor to an event such as a concert, a sports game, or a lecture is smiling and generates highlights (for example, Patent Document 1).
  • the present disclosure proposes an information processing device and an information processing method capable of generating user-specific highlights.
  • an information processing apparatus includes: an acquisition unit that acquires state information indicating the state of a first user who viewed an event in real time, at the time of real-time viewing of the event, and event content, which is a video of the event; and a generation unit that generates highlights of the event content to be provided to a second user, using a part of the event content determined by the state information of the first user acquired by the acquisition unit.
  • FIG. 1 is a diagram showing an example of information processing according to an embodiment of the present disclosure.
  • FIG. 2 is a diagram showing an example of information processing according to an embodiment of the present disclosure.
  • FIG. 3 is a diagram illustrating a configuration example of an information processing system according to an embodiment of the present disclosure.
  • FIG. 4 is a diagram showing an example of arrangement of spectator imaging devices.
  • FIG. 5 is a diagram illustrating a configuration example of a highlight generation server according to an embodiment of the present disclosure.
  • FIG. 6 is a diagram illustrating an example of a dataset storage unit according to an embodiment of the present disclosure.
  • FIG. 7 is a diagram illustrating an example of a model information storage unit according to an embodiment of the present disclosure.
  • FIG. 8 is a diagram illustrating an example of a threshold information storage unit according to an embodiment of the present disclosure.
  • FIG. 9 is a diagram illustrating a configuration example of an edge viewing terminal according to an embodiment of the present disclosure.
  • FIG. 10 is a flowchart showing a processing procedure of the information processing device according to the embodiment of the present disclosure.
  • FIG. 11 is a diagram showing a functional configuration example related to learning of the information processing system.
  • FIG. 12 is a diagram showing an example of learning and inference regarding the personal attribute determiner of the information processing system.
  • FIG. 13 is a diagram showing an example of learning and inference regarding the action index determiner of the information processing system.
  • FIG. 14 is a diagram showing an example of learning and inference regarding the highlight scene predictor of the information processing system.
  • FIG. 15 is a diagram showing a functional configuration example related to highlight generation of the information processing system.
  • FIG. 16 is a flowchart showing a processing procedure regarding highlight generation.
  • FIG. 17 is a diagram showing an example of superimposed display of a wipe on a highlight.
  • FIG. 18 is a diagram showing an example of angle estimation of a highlight.
  • FIG. 19 is a diagram showing an example of presentation of information about spectators.
  • FIG. 20 is a hardware configuration diagram showing an example of a computer that implements the functions of the information processing device.
  • FIGS. 1 and 2 are diagrams illustrating an example of information processing according to an embodiment of the present disclosure.
  • the information processing according to the embodiment of the present disclosure is realized by the information processing system 1 including the highlight generation server 100, the plurality of edge viewing terminals 10, and the like.
  • a sports event such as a basketball game will be described as an example of an event for which highlights are generated (hereinafter also referred to as a "target event").
  • the highlight is a video generated using video content (also referred to as “event content”) that captures the event, and is, for example, a digest video that is shorter than the event content.
  • the sports here also include e-sports (electronic sports) performed using electronic devices (computers).
  • the target event is not limited to sports, and may be various events in which there are users (also referred to as “spectators”) who watch the event.
  • a spectator is a user who watches or views an event.
  • the target events may be music events such as live performances and concerts, events involving the improvisation of creative works such as calligraphy and painting, and art-related events such as plays, musicals, vaudeville, and live comedy.
  • the target event may be a lecture, talk show, seminar, or the like.
  • the target event may be an event in which a large number of people view content, such as a movie screening event.
  • the target event is not limited to an event that takes place in a real space as described above, but may be an event that takes place in a virtual space.
  • the target event may be a virtual event such as a live music performance held within an online game.
  • the event (target event) targeted for highlight generation by the information processing system 1 may be any event that can be targeted for highlight generation.
  • the user who watched the event in real time may be referred to as the "first user”, and the user to whom the highlights are provided may be referred to as the "second user”. That is, when a user to whom highlights are provided has viewed an event in real time, the first user and the second user may be the same user.
  • real-time viewing of an event means, for example, viewing the event at the date and time (period) when the event is held.
  • FIGS. 1 and 2 show two cases, i.e., the case where the entire event is viewed in real time and the case where the event is not viewed in real time at all. In some cases, only part of the event is viewed in real time, and processing in that case will be described later.
  • image information, such as a video (moving image) of the first user captured while viewing the event in real time, is used as information indicating the state of the first user viewing the event in real time (also referred to as "state information").
  • state information is not limited to image information obtained by imaging the first user, and may be any information as long as it indicates the state of the first user.
  • the state information may be biometric information detected about the first user, such as information such as the first user's heartbeat, body temperature, and breathing.
  • FIGS. 1 and 2 are diagrams illustrating an example of information processing according to an embodiment of the present disclosure.
  • FIGS. 1 and 2 show an example of highlight generation processing executed by a highlight generation server 100, which is an example of an information processing apparatus.
  • FIGS. 1 and 2 show a case where event A, which is a sporting event, is the target event. That is, FIGS. 1 and 2 show a case where the video (event content) capturing event A is the content targeted for highlight generation (also referred to as "target content").
  • FIGS. 1 and 2 also show the case where the model M3 used for highlight scene prediction is used; the details of the model M3 will be described later.
  • FIG. 1 shows a case where a user to whom highlights are provided has viewed an event in real time, that is, a case where the first user and the second user are the same user.
  • FIG. 1 shows an example of highlight generation processing in the case where the user to whom highlights are provided is the user U1 and the target event is viewed in real time. That is, FIG. 1 shows a case where user U1 is a user who watched event A in real time (real time spectator).
  • the highlight generation server 100 inputs the input data IND1, which is the data of the user U1, to the model M3 (step S11).
  • the highlight generation server 100 inputs the input data IND1 based on the information (state information) indicating the state of the event A viewed by the user U1 in real time to the model M3.
  • the state information of the user U1 includes image information such as a video (moving image) of the user U1 when viewing the event A in real time.
  • FIG. 1 shows a case where user U1's state information is used as input data IND1. That is, in FIG. 1, the image of the user U1 when viewing the event A in real time is used as the input data IND1.
  • Information (input information) to be input to the model M3, such as the input data IND1, may be information relating to feature amounts generated based on the state information of the first user, which will be described later.
  • the model M3 to which the input data IND1 is input outputs the output data OD1 (step S12).
  • the model M3 outputs the score used for highlight generation of the event A as the output data OD1.
  • model M3 outputs a score corresponding to each point in time during event A.
  • model M3 outputs a score corresponding to each time point in the hour if event A is one hour in duration.
  • the model M3 may output a score corresponding to each time point (for example, 1 minute, 2 minutes, 3 minutes, etc.) at predetermined intervals (for example, 1 minute intervals) during the period of the event A.
  • alternatively, a continuous score (waveform) may be output for the period of the event A.
  • the highlight generation server 100 uses the output data OD1 to determine a period to be highlighted (also referred to as a "highlight target period") (step S13).
  • the highlight generation server 100 compares the score output by the model M3 with a predetermined threshold value to determine the highlight target period of the highlight to be provided to the user U1.
  • the highlight generation server 100 determines the highlight target period of the highlights to be provided to the user U1 based on the period during which the score is equal to or greater than a predetermined threshold.
  • the highlight generation server 100 determines a period during which the score is equal to or greater than a predetermined threshold as the highlight target period of the highlights to be provided to the user U1.
  • the highlight generation server 100 determines a period such as 1-3 minutes, 15-20 minutes, etc. as the highlight target period of the highlight to be provided to the user U1, as shown in the target period information PTD1.
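  • As a concrete sketch of steps S12 and S13, the following minimal Python example turns a per-minute score series output by a model such as model M3 into highlight target periods by thresholding. The function name, the fixed one-minute interval, and the threshold value of 0.6 are assumptions for illustration and are not taken from the disclosure.

```python
from typing import List, Tuple


def determine_highlight_periods(
    scores: List[float],
    threshold: float,
    interval_minutes: int = 1,
) -> List[Tuple[int, int]]:
    """Return (start, end) minute ranges in which the score is at or above the threshold.

    scores[i] is assumed to be the model output for the i-th interval of the event.
    """
    periods: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i * interval_minutes                        # a highlight period opens
        elif score < threshold and start is not None:
            periods.append((start, i * interval_minutes))       # the period closes
            start = None
    if start is not None:                                       # period runs to the end of the event
        periods.append((start, len(scores) * interval_minutes))
    return periods


# Example: scores for a 30-minute event A at 1-minute intervals.
scores = [0.2, 0.7, 0.8, 0.1] + [0.3] * 11 + [0.9] * 5 + [0.2] * 10
print(determine_highlight_periods(scores, threshold=0.6))  # [(1, 3), (15, 20)]
```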
  • the highlight generation server 100 generates highlights to be provided to the user U1 based on the determined highlight target period (step S14).
  • the highlight generation server 100 uses target period information PTD1 for user U1 and target content TCV1, which is video of event A, to generate highlight HLD1 for user U1.
  • the highlight generation server 100 generates highlights HLD1 for the user U1 by using portions of the target content TCV1 that correspond to periods such as 1-3 minutes and 15-20 minutes indicated in the target period information PTD1. That is, the highlight generating server 100 generates, as the highlight HLD1 for the user U1, the content extracted from the target content TCV1 corresponding to a period of 1-3 minutes, 15-20 minutes, or the like.
  • Highlight HLD1 for user U1 is a video of event A that includes only periods of 1-3 minutes, 15-20 minutes, etc. that are estimated to be appropriate highlights for user U1.
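  • Step S14 can be pictured as cutting the determined periods out of the target content and concatenating them. The sketch below, which reuses the periods computed in the previous example, treats the event content as a simple sequence of one-minute segments; this data representation is an assumption for illustration and not the actual format handled by the highlight generation server 100.

```python
from typing import List, Sequence, Tuple


def generate_highlight(
    event_content: Sequence[str],            # e.g. one video segment per minute of event A
    target_periods: List[Tuple[int, int]],   # e.g. the output of determine_highlight_periods()
) -> List[str]:
    """Concatenate only the segments that fall inside the highlight target periods."""
    highlight: List[str] = []
    for start, end in target_periods:
        highlight.extend(event_content[start:end])
    return highlight


# Example: 30 one-minute segments of target content TCV1.
target_content = [f"segment_{minute:02d}" for minute in range(30)]
highlight_hld1 = generate_highlight(target_content, [(1, 3), (15, 20)])
print(highlight_hld1)  # segments for minutes 1-2 and 15-19
```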
  • in this way, the highlight generation server 100 generates the highlights to be provided to the user U1 based on the state information of the user U1. Thereby, the highlight generation server 100 can generate appropriate highlights for the user U1.
  • the highlight generation server 100 transmits the generated highlight HLD1 for the user U1 to the edge viewing terminal 10 used by the user U1. Then, the edge viewing terminal 10 used by the user U1 outputs (reproduces) the highlight HLD1. Thereby, in the information processing system 1, the user U1 can browse the highlights customized for himself/herself.
  • FIG. 2 shows a case where a user to whom highlights are provided is not viewing the event in real time, that is, a case where the first user and the second user are different users.
  • FIG. 2 shows an example of highlight generation processing when the user to whom highlights are provided is the user U2 and the target event is not being viewed in real time. That is, FIG. 2 shows a case where the user U2 is not the user (real-time spectator) who watched the event A in real-time.
  • the user U2 is a user (also referred to as a “remote viewer”) who is located remotely from the venue of the event and views (views) content. Note that description of the same points as in FIG. 1 will be omitted as appropriate.
  • user U2 is not a real-time spectator of event A, so the highlight generation server 100 performs processing with a user similar to user U2 as the first user.
  • the highlight generating server 100 takes a user who is similar to the user U2 and who is a real-time spectator of the event A as the first user, and inputs the input data IND2, which is the first user's data, to the model M3 (step S21).
  • the highlight generation server 100 determines, among the real-time spectators of the event A, a user similar to the attributes of the user U2 as the first user.
  • the highlight generation server 100 determines a first user among the real-time spectators of event A whose demographic attribute or psychographic attribute is similar to that of user U2.
  • the highlight generation server 100 determines, as the first user, a user who is similar to user U2 in terms of age, gender, preferences such as favorite objects (teams to support, etc.), family structure, income, lifestyle, and the like.
  • in the following, a case where the highlight generation server 100 determines a user (referred to as "user U50") whose age and gender match those of user U2 as the first user will be described as an example.
  • the highlight generation server 100 determines (identifies) the real-time spectators of the event A using the information indicating the users who are associated with each event and have viewed the event in real time.
  • the highlight generation server 100 may determine (identify) users similar to user U2 by comparing attribute information associated with each user. Note that the highlight generation server 100 may use a model to determine attributes, but this point will be described later.
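  • One simple way to realize the similarity determination described above is exact matching on demographic attributes such as age group and gender, as in the minimal sketch below. The attribute field names and the sample data are hypothetical, and the disclosure also allows the attributes themselves to be determined by a model (model M1).

```python
from typing import Dict, List, Optional


def find_similar_realtime_spectator(
    second_user: Dict[str, str],
    realtime_spectators: List[Dict[str, str]],
    keys: List[str] = ["age_group", "gender"],
) -> Optional[Dict[str, str]]:
    """Return the first real-time spectator whose listed attributes match the second user."""
    for spectator in realtime_spectators:
        if all(spectator.get(key) == second_user.get(key) for key in keys):
            return spectator
    return None


# Example: user U2 is matched with real-time spectator U50 of event A.
user_u2 = {"user_id": "U2", "age_group": "20s", "gender": "female"}
spectators = [
    {"user_id": "U49", "age_group": "30s", "gender": "male"},
    {"user_id": "U50", "age_group": "20s", "gender": "female"},
]
print(find_similar_realtime_spectator(user_u2, spectators))  # user U50 is selected
```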
  • the highlight generation server 100 inputs input data IND2 based on information (state information) indicating the state of real-time viewing of event A by user U50, who is the first user corresponding to user U2, to model M3.
  • the state information of the user U50 includes image information such as a video (moving image) of the user U50 when viewing the event A in real time.
  • a video image of the user U50 when viewing the event A in real time is used as the input data IND2.
  • the model M3 to which the input data IND2 is input outputs the output data OD2 (step S22).
  • the model M3 outputs the score used for highlight generation of the event A as the output data OD2. For example, model M3 outputs a score corresponding to each point in time during event A.
  • the highlight generation server 100 determines a period to be highlighted (highlight target period) using the output data OD2 (step S23).
  • the highlight generation server 100 compares the score output by the model M3 with a predetermined threshold to determine the highlight target period of the highlight to be provided to the user U2.
  • the highlight generation server 100 determines periods such as 5-10 minutes, 25-30 minutes, etc., as the highlight target periods of the highlights to be provided to the user U2, as indicated by the target period information PTD2.
  • the highlight generation server 100 generates highlights to be provided to the user U2 based on the determined highlight target period (step S24).
  • the highlight generation server 100 uses target period information PTD2 for user U2 and target content TCV1, which is video of event A, to generate highlight HLD2 for user U2.
  • the highlight generation server 100 generates highlights HLD2 for the user U2 using portions of the target content TCV1 that correspond to periods such as 5-10 minutes and 25-30 minutes indicated in the target period information PTD2.
  • the highlight generation server 100 generates a video of the event A including only the period of 5-10 minutes, 25-30 minutes, etc. as the highlight HLD2 for the user U2.
  • in this way, the highlight generation server 100 generates the highlights to be provided to the user U2 based on the state information of a user similar to the user U2. As a result, the highlight generation server 100 can generate appropriate highlights for the user U2 even when the second user is not viewing in real time.
  • the highlight generation server 100 transmits the generated highlight HLD2 for the user U2 to the edge viewing terminal 10 used by the user U2. Then, the edge viewing terminal 10 used by the user U2 outputs (reproduces) the highlight HLD2. Thereby, in the information processing system 1, the user U2 can browse the highlights customized for himself/herself.
  • as described above, the information processing system 1 uses the state information of the users (spectators) watching the event in real time to generate the highlights, instead of analyzing the video of the event (event content) itself.
  • that is, the information processing system 1 selects scenes according to the attributes of viewers based on recognition and analysis of video of the spectators, and generates the highlights.
  • for example, based on video of spectators (or remote viewers) at an event venue such as a sports venue, the information processing system 1 learns (generates) a personal attribute determiner, a behavior index determiner for indices such as excitement or degree of concentration, and a highlight scene predictor.
  • for example, the information processing system 1 uses information obtained from the video, such as skeleton information, face recognition information, motion detection, and line of sight, as inputs to these determiners.
  • the information processing system 1 learns (generates) a personal attribute determiner, an action index determiner, and a highlight scene predictor by supervised learning or rule base.
  • the information processing system 1 generates highlights corresponding to the viewer himself/herself or to (a set of) real-time spectators with the same attributes, according to the real-time viewing time of the remote viewer (or the spectator at the venue) who views the highlights and the personal attributes determined by the personal attribute determiner.
  • the information processing system 1 provides a viewer (user) viewing highlights with highlights generated using a highlight scene predictor (or behavior index determiner).
  • the information processing system 1 may estimate the optimal gaze point position and angle from feature amounts obtained by image recognition of a group of spectators at the venue who have the same attributes as the viewer viewing the extracted highlight scenes on the time axis described above, and may determine the camera work of the highlight scenes. Note that these points will be described later.
  • the information processing system 1 shown in FIG. 3 will be described.
  • the information processing system 1 includes a highlight generation server 100 , a plurality of edge viewing terminals 10 , a content distribution server 50 and an audience video collection server 60 .
  • the highlight generation server 100, each of the plurality of edge viewing terminals 10, the content distribution server 50, and the spectator video collection server 60 are communicably connected by wire or wirelessly via a predetermined communication network (network N).
  • FIG. 3 is a diagram illustrating a configuration example of an information processing system according to the embodiment
  • the information processing system 1 may include four or more edge viewing terminals 10 .
  • each edge viewing terminal 10 may be described as an edge viewing terminal 10a, an edge viewing terminal 10b, and an edge viewing terminal 10c in order to distinguish and describe each edge viewing terminal 10.
  • the information processing system 1 shown in FIG. 3 may include a plurality of highlight generation servers 100, a plurality of content distribution servers 50, and a plurality of spectator video collection servers 60.
  • the highlight generation server 100 is a computer used to provide highlight services to users.
  • the highlight generation server 100 generates highlights of event content to be provided to the second user, using part of the event content determined by the state information indicating the state of the first user's event during real-time viewing.
  • the edge viewing terminal 10 is a computer used by the user.
  • edge viewing terminals 10 are utilized by remote viewers or spectators.
  • the edge viewing terminal 10 is used by a user who accesses content such as web pages displayed on a browser and content for applications.
  • the edge viewing terminal 10 is used by the user to browse content.
  • the edge viewing terminal 10 may be, for example, a notebook PC (Personal Computer), a tablet terminal, a desktop PC, a smartphone, a smart speaker, a television, a mobile phone, a PDA (Personal Digital Assistant), or other device.
  • the edge viewing terminal 10 may be referred to as a user hereinafter. That is, hereinafter, the user can also be read as the edge viewing terminal 10 .
  • the edge viewing terminal 10 outputs information about the event.
  • the edge viewing terminal 10 outputs information related to various contents such as videos and highlights of the event.
  • the edge viewing terminal 10 displays the image (video) of the highlight and outputs the audio of the highlight.
  • the edge viewing terminal 10 transmits user's utterances and images (video) to the highlight generation server 100 and receives highlight audio and images (video) from the highlight generation server 100 .
  • the edge viewing terminal 10 transmits the captured video of the user to the highlight generation server 100 .
  • the edge viewing terminal 10 accepts input from the user.
  • the edge viewing terminal 10 receives voice input by user's utterance and input by user's operation.
  • the edge viewing terminal 10 may be any device as long as it can implement the processing in the embodiments.
  • the edge viewing terminal 10 may be any device as long as it has a function of displaying content information and outputting audio.
  • each edge viewing terminal 10 has a camera 14 and a display unit 15 .
  • the content distribution server 50 is a server device (computer) that provides a service for distributing content of photographed events. Note that the content distribution server 50 is the same as a server having a function of distributing content, so detailed description thereof will be omitted.
  • the spectator video collection server 60 is a server device (computer) that collects videos of spectators watching the event in real time.
  • the spectator video collection server 60 transmits the collected video to the highlight generation server 100 .
  • the spectator video collection server 60 is the same as the content distribution server 50 except that the object to be photographed is the spectator, so a detailed description thereof will be omitted.
  • at a sports or live venue, an imaging device group FIA, which is content imaging equipment such as cameras for photographing the game, performers, and the like, and a sound collecting device group SCD, which is content sound collecting equipment such as microphones for collecting the sound of the game, performers, and the like, are arranged. Information collected by the imaging device group FIA and the sound collecting device group SCD is transmitted by the content distribution server 50 to the highlight generation server 100.
  • an imaging equipment group SIA which is spectator imaging equipment such as a camera for capturing an image of an audience at the venue, is arranged at a sports or live venue.
  • the spectator video collection server 60 transmits the information collected by the imaging device group SIA to the highlight generation server 100 .
  • the environment for remote viewing, in which content is viewed in real time or as highlights from a location other than the venue, is composed of the edge viewing terminal 10, which includes the display unit 15, a display device for viewing content, and the camera 14 or the like, a viewer imaging device such as a camera for capturing the remote viewer himself/herself.
  • the image of the remote viewer is sent to the highlight generation server 100 through the network N.
  • the edge viewing terminal 10 can receive and view highlights (moving images) individually distributed from the highlight generation server 100 .
  • the highlight generation server 100 collects the video and audio data of the content, the captured image data of the spectators at the venue, and the captured image data of the remote viewers who view the content in real time or as highlights.
  • the highlight generation server 100 generates a highlight video optimized for the individual highlight viewer (second user) and distributes it to the individual highlight viewer.
  • the device configuration of the information processing system 1 is not limited to the configuration described above, and any device configuration may be adopted. That is, the information processing system 1 may have a configuration other than that described above.
  • the highlight generation server 100 may be integrated with any one of the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60. That is, any one of the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60 may have the function of the highlight generation server 100.
  • FIG. 4 is a diagram showing an example of arrangement of spectator imaging devices.
  • the spectator imaging device may be a 4K video camera.
  • FIG. 4 shows a case where at least one spectator imaging device is arranged at each of four points PT1 to PT4 in a basketball game venue.
  • the spectator imaging device placed at point PT1 photographs spectators (users) positioned in area AR1.
  • the spectator imaging device placed at the point PT2 photographs the spectators (users) positioned in the area AR2.
  • the spectator imaging device placed at the point PT3 photographs the spectators (users) positioned in the area AR3.
  • the spectator imaging device placed at the point PT4 photographs the spectators (users) positioned in the area AR4.
  • Areas AR1 to AR4 in FIG. 4 show outlines of shooting areas from points PT1 to PT4.
  • areas AR1 to AR4 may cover all the spectators at the venue.
  • each of the areas AR1 to AR4 may partially overlap with another area.
  • FIG. 4 is merely an example, and the spectator imaging device may be arranged in any manner as long as it is possible to photograph a desired spectator.
  • FIG. 5 is a diagram illustrating a configuration example of a highlight generation server according to an embodiment of the present disclosure
  • the highlight generation server 100 has a communication unit 110, a storage unit 120, and a control unit 130.
  • the highlight generation server 100 may include an input unit (for example, a keyboard, a mouse, etc.) that receives various operations from the administrator of the highlight generation server 100, and a display unit (for example, a liquid crystal display, etc.) that displays various information.
  • the communication unit 110 is implemented by, for example, a NIC (Network Interface Card) or the like.
  • the communication unit 110 is connected to the network N (see FIG. 3) by wire or wirelessly, and exchanges information with other information processing devices such as the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60. Send and receive.
  • the communication unit 110 may transmit and receive information to and from a user terminal (not shown) used by the user.
  • the storage unit 120 is implemented by, for example, a semiconductor memory device such as RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or optical disk.
  • the storage unit 120 according to the embodiment has a dataset storage unit 121, a model information storage unit 122, a threshold information storage unit 123, and a content information storage unit 124, as shown in FIG.
  • the data set storage unit 121 stores various information related to data used for learning.
  • the dataset storage unit 121 stores datasets used for learning.
  • FIG. 6 is a diagram illustrating an example of a dataset storage unit according to an embodiment of the present disclosure; FIG. 6 shows an example of the dataset storage unit 121 according to the embodiment.
  • each table in the data set storage unit 121 includes items such as "target model ID”, "data ID”, “data”, “label”, and "date and time”.
  • the data set storage unit 121 stores data used for learning each of a plurality of models in association with the model to be learned, such as tables TB1, TB2, TB3, etc. in FIG. Although only three tables TB1, TB2, and TB3 are illustrated in FIG. 6, the data set storage unit 121 may include tables corresponding to the number of learned models.
  • Target model ID indicates identification information for identifying a model to be learned (target model).
  • Data ID indicates identification information for identifying data used in the learning process of the target model.
  • Data indicates data identified by a data ID.
  • Label indicates the label (correct label) attached to the corresponding data.
  • the “label” may be information (correct answer information) indicating the classification (category) of the corresponding data.
  • label is correct information (correct label) corresponding to the output of the target model.
  • “Date and time” indicates the time (date and time) related to the corresponding data.
  • in the example of FIG. 6, conceptual information such as "DA1" is shown, but the "date and time" may be a specific date and time such as "17:48:35 on December 15, 2021".
  • information indicating from which model's learning the data started to be used may be stored, such as "used from the learning of model version XX".
  • the example of FIG. 6 indicates that the data in the table TB1 is the data used for learning the target model (model M1) identified by the target model ID "M1".
  • the data used for learning the model M1 includes a plurality of data identified by data IDs "DID1", “DID2", “DID3”, and the like.
  • each data (data DT1, DT2, DT3, etc.) identified by data IDs "DID1", “DID2", “DID3”, etc. is information used for learning of the model M1 that performs personal attribute determination.
  • data DT1, DT2, DT3, etc. are input data for the model M1, and labels LB1, LB2, LB3, etc. corresponding to each data indicate the desired output of the model M1 when each data is input.
  • the data in the table TB2 indicates that it is the data used for learning the target model (model M2) identified by the target model ID "M2". Model M2 used for action index determination is learned using the data in table TB2.
  • the data in the table TB3 are data used for learning the target model (model M3) identified by the target model ID "M3”. Model M3 used for highlight scene prediction is learned using the data in table TB3.
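  • The dataset tables described above can be pictured as records that pair input data with a correct label for a target model. The sketch below is a hypothetical in-memory representation; the field names mirror the items of FIG. 6 but are not the actual storage format of the dataset storage unit 121.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class DatasetRecord:
    """One row of a dataset table: input data and its correct label for a target model."""
    target_model_id: str   # e.g. "M1", "M2", "M3"
    data_id: str           # e.g. "DID1"
    data: Any              # input data for the target model
    label: str             # correct label attached to the data
    date_time: datetime    # time related to the data


# Example row of table TB3 used for learning the highlight scene prediction model M3
# (the data values and label are illustrative only).
dataset: Dict[str, List[DatasetRecord]] = {
    "M3": [
        DatasetRecord("M3", "DID1", data=[0.1, 0.7, 0.9], label="highlight",
                      date_time=datetime(2021, 12, 15, 17, 48, 35)),
    ],
}
```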
  • the data set storage unit 121 may store various information not limited to the above, depending on the purpose.
  • the data set storage unit 121 may store data such as whether each data is learning data or evaluation data so as to be identifiable.
  • the data set storage unit 121 may store various information related to various data such as learning data used for learning and evaluation data used for accuracy evaluation (calculation).
  • the data set storage unit 121 stores learning data and evaluation data in a distinguishable manner.
  • the data set storage unit 121 may store information identifying whether each data is learning data or evaluation data.
  • the highlight generation server 100 learns a model based on each data used as learning data and the correct answer information.
  • the highlight generation server 100 calculates the accuracy of the model based on each data used as the evaluation data and the correct answer information.
  • the highlight generation server 100 calculates the accuracy of the model by collecting the result of comparing the output result output by the model when the evaluation data is input with the correct answer information.
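  • The accuracy calculation described here can be sketched as a simple exact-match comparison between model outputs and correct labels over the evaluation data. The model interface and the toy data below are hypothetical and only illustrate the idea of collecting comparison results.

```python
from typing import Callable, List, Tuple


def evaluate_accuracy(
    model: Callable[[float], str],
    evaluation_data: List[Tuple[float, str]],  # (input data, correct label) pairs
) -> float:
    """Compare the model output for each evaluation input with its correct label."""
    if not evaluation_data:
        return 0.0
    correct = sum(1 for data, label in evaluation_data if model(data) == label)
    return correct / len(evaluation_data)


# Example with a toy model that labels inputs greater than 0.5 as "positive".
toy_model = lambda x: "positive" if x > 0.5 else "negative"
evaluation_set = [(0.9, "positive"), (0.2, "negative"), (0.7, "negative")]
print(evaluate_accuracy(toy_model, evaluation_set))  # 0.666...
```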
  • the model information storage unit 122 stores information about models.
  • the model information storage unit 122 stores information (model data) indicating the structure of a model (network).
  • FIG. 7 is a diagram illustrating an example of a model information storage unit according to an embodiment of the present disclosure; FIG. 7 shows an example of the model information storage unit 122 according to the embodiment.
  • the model information storage unit 122 includes items such as "model ID", "usage", and "model data”.
  • Model ID indicates identification information for identifying a model.
  • "Usage" indicates the use of the corresponding model.
  • Model data indicates model data.
  • FIG. 7 shows an example in which conceptual information such as “MDT1" is stored in “model data”, but in reality, various types of information that make up the model, such as network information and functions included in the model, are stored. included.
  • model (model M1) identified by the model ID "M1" indicates that the application is "personal attribute determination”.
  • Model M1 indicates that it is a model used for personal attribute determination. It also indicates that the model data of the model M1 is the model data MDT1.
  • model (model M2) identified by the model ID "M2” indicates that the application is "behavior index determination”. Model M2 indicates that it is a model used for action index determination. It also indicates that the model data of the model M2 is the model data MDT2.
  • model (model M3) identified by the model ID "M3" indicates that the application is "highlight scene prediction”. Model M3 indicates that it is a model used for highlight scene prediction. It also indicates that the model data of the model M3 is the model data MDT3. Also, the model (model M11) identified by the model ID "M11” indicates that the application is "angle estimation”. Model M11 indicates that it is a model used for angle estimation. It also indicates that the model data of the model M11 is the model data MDT11.
  • model information storage unit 122 may store various types of information, not limited to the above, depending on the purpose.
  • the model information storage unit 122 stores parameter information of the model learned (generated) by the learning process.
  • the threshold information storage unit 123 stores various information regarding thresholds.
  • the threshold information storage unit 123 stores various information related to thresholds used for comparison with model outputs (scores, etc.).
  • FIG. 8 is a diagram illustrating an example of a threshold information storage unit according to an embodiment of the present disclosure;
  • the threshold information storage unit 123 shown in FIG. 8 includes items such as "threshold ID", "usage”, and "threshold”.
  • Threshold ID indicates identification information for identifying the threshold.
  • "Usage" indicates the use of the corresponding threshold.
  • Threshold indicates a specific value of the threshold identified by the corresponding threshold ID.
  • the threshold (threshold TH1) identified by the threshold ID "TH1" is stored in association with information indicating that it is used for viewing determination. For example, the threshold TH1 is used to determine whether the user is viewing in real time.
  • the value of the threshold TH1 indicates "VL1".
  • the value of the threshold TH1 is a specific numerical value (for example, 0.4, 0.6, etc.), although it is indicated by an abstract code such as "VL1".
  • the threshold TH2 is used to determine which part of the event content is used for highlighting.
  • the threshold TH2 is used to determine whether or not to include a video of event content at a certain point in time.
  • the value of the threshold TH2 indicates "VL2".
  • the value of the threshold TH2 is a specific numerical value (for example, 0.5, 0.8, etc.), although it is indicated here by an abstract code such as "VL2".
  • the threshold information storage unit 123 may store various types of information, not limited to the above, depending on the purpose.
  • the content information storage unit 124 stores various types of information regarding content displayed on the edge viewing terminal 10 .
  • the content information storage unit 124 stores information about content displayed by an application (also referred to as “app”) installed in the edge viewing terminal 10 .
  • the content information storage unit 124 stores event content, which is video of the event.
  • the content information storage unit 124 stores the event content of the event in association with the event. Note that the above is merely an example, and the content information storage unit 124 may store various types of information according to the content for which response candidates are displayed.
  • the content information storage unit 124 stores various kinds of information necessary for providing content to the edge viewing terminal 10, displaying response candidates on the edge viewing terminal 10, and the like.
  • the storage unit 120 may store various information other than the above.
  • the storage unit 120 stores various information regarding highlight generation.
  • the storage unit 120 stores various data for providing data to the edge viewing terminal 10 .
  • the storage unit 120 stores various information used to generate information displayed on the edge viewing terminal 10 .
  • the storage unit 120 stores information about content displayed by an application (content display application or the like) installed in the edge viewing terminal 10 .
  • the storage unit 120 stores information about content displayed by a content display application. Note that the above is merely an example, and the storage unit 120 may store various types of information used to provide the highlight service to the user.
  • the storage unit 120 stores attribute information and the like of each user.
  • the storage unit 120 stores user information corresponding to information identifying each user (user ID, etc.) in association with each other.
  • the storage unit 120 stores information indicating personal attributes determined by the model M1 in association with the user.
  • the control unit 130 is implemented by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing a program stored inside the highlight generation server 100 (for example, an information processing program according to the present disclosure) using a RAM (Random Access Memory) or the like as a work area. The control unit 130 may also be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
  • the control unit 130 includes an acquisition unit 131, a learning unit 132, an image processing unit 133, a generation unit 134, and a transmission unit 135, and realizes or executes the functions and actions of the information processing described below.
  • the internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 5, and may be another configuration as long as it performs information processing described later.
  • the connection relationship between the processing units of the control unit 130 is not limited to the connection relationship shown in FIG. 5, and may be another connection relationship.
  • the acquisition unit 131 acquires various types of information. Acquisition unit 131 acquires various types of information from an external information processing device. The acquisition unit 131 acquires various types of information from the edge viewing terminal 10 , the content distribution server 50 and the spectator video collection server 60 . The acquisition unit 131 acquires information detected by the edge viewing terminal 10 from the edge viewing terminal 10 .
  • the acquisition unit 131 receives information from the content distribution server 50 or the spectator video collection server 60 .
  • the acquisition unit 131 acquires requested information from the content distribution server 50 or the spectator video collection server 60 .
  • Acquisition unit 131 acquires video from content distribution server 50 .
  • the acquisition unit 131 acquires the video imaged by the imaging device group FIA from the content distribution server 50 .
  • the acquisition unit 131 acquires the sound detected by the sound collection device group SCD from the content distribution server 50 .
  • the acquisition unit 131 acquires the video from the spectator video collection server 60 .
  • Acquisition unit 131 acquires various types of information from storage unit 120 .
  • the acquisition unit 131 acquires the video captured by the imaging device group SIA from the spectator video collection server 60 .
  • the acquisition unit 131 acquires state information indicating the state of the real-time viewing of the event by the first user who has viewed the event in real-time.
  • Acquisition unit 131 acquires event content, which is video of an event.
  • the acquisition unit 131 acquires the status information of the first user who watched in real time at the venue of the event. Acquisition unit 131 acquires the state information of the first user who watched the sports or art event in real time. Acquisition unit 131 acquires the state information of the first user who watched the event locally. Acquisition unit 131 acquires state information including image information of the first user. Acquisition unit 131 acquires state information including biometric information of the first user.
  • the acquiring unit 131 acquires the state information indicating the state of the second user viewing the event in real time as the state information of the first user. If the second user is not viewing the event in real time, the acquisition unit 131 acquires the state information of the first user who is different from the second user.
  • the acquisition unit 131 acquires the state information of the first user, who is a user similar to the attributes of the second user.
  • the acquisition unit 131 acquires the state information of the first user who is a user similar to the demographic attributes of the second user.
  • the acquisition unit 131 acquires the state information of the first user who is a user similar in at least one of age and sex to the second user.
  • the acquisition unit 131 acquires the state information of the first user, who is a user similar to the psychographic attributes of the second user.
  • the acquisition unit 131 acquires the state information of the first user who is a user similar to the second user's preference.
  • the acquisition unit 131 acquires the state information of the first user who is a user whose favorite object (for example, a team to support) matches that of the second user.
  • the learning unit 132 learns various types of information.
  • the learning unit 132 learns various types of information based on information from an external information processing device and information stored in the storage unit 120 .
  • the learning unit 132 learns various types of information based on the information stored in the data set storage unit 121 .
  • the learning unit 132 stores the model generated by learning in the model information storage unit 122 .
  • the learning unit 132 stores the model updated by learning in the model information storage unit 122 .
  • the learning unit 132 performs learning processing.
  • the learning unit 132 performs various types of learning.
  • the learning unit 132 learns various types of information based on the information acquired by the acquisition unit 131 .
  • the learning unit 132 learns (generates) a model.
  • the learning unit 132 learns various types of information such as models.
  • the learning unit 132 generates a model through learning.
  • the learning unit 132 learns the model using various machine learning techniques. For example, the learning unit 132 learns model (network) parameters.
  • the learning unit 132 learns the model using learning data including highlights of past events and status information of users who watched the past events in real time.
  • the learning unit 132 generates various models such as models M1, M2, M3, and M11.
  • the learning unit 132 generates a model M3 for determining highlight scenes.
  • the learning unit 132 learns network parameters.
  • the learning unit 132 learns network parameters of various models such as models M1, M2, M3, and M11.
  • the learning unit 132 learns network parameters of the model M3 for determining highlight scenes.
  • the learning unit 132 performs learning processing based on the learning data (teacher data) stored in the data set storage unit 121.
  • the learning unit 132 generates various models such as models M1, M2, M3, and M11 by performing learning processing using the learning data stored in the data set storage unit 121 .
  • the learning unit 132 may generate a model used for image recognition.
  • the learning unit 132 generates the model M1 by learning parameters of the network of the model M1.
  • the learning unit 132 generates the model M3 by learning parameters of the network of the model M3.
  • the method of learning by the learning unit 132 is not particularly limited, and the learning unit 132 may learn using any machine learning technique. For example, a technique based on a DNN (Deep Neural Network), such as a CNN (Convolutional Neural Network) or a 3D-CNN, may be used.
  • when targeting time-series data such as moving images (video), the learning unit 132 may use a method based on a recurrent neural network (RNN) or an LSTM (Long Short-Term Memory), which is an extension of the RNN.
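  • As one illustration of the DNN- and RNN-based approaches mentioned above, the following PyTorch sketch shows a highlight-scene predictor that encodes per-time-step feature vectors and applies an LSTM over time, outputting a score for each time point. The layer sizes and the overall architecture are assumptions for illustration and do not describe the actual configuration of model M3.

```python
import torch
import torch.nn as nn


class HighlightScenePredictor(nn.Module):
    """Outputs a per-time-step highlight score from a sequence of spectator-video features."""

    def __init__(self, feature_dim: int = 128, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(            # per-time-step feature encoder
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)     # one score per time step

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, feature_dim) -> scores: (batch, time)
        encoded = self.encoder(features)
        temporal, _ = self.lstm(encoded)
        return torch.sigmoid(self.head(temporal)).squeeze(-1)


# Example: one 30-step sequence of 128-dimensional features.
model = HighlightScenePredictor()
scores = model(torch.randn(1, 30, 128))
print(scores.shape)  # torch.Size([1, 30])
```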
  • the image processing unit 133 executes various processes related to image processing.
  • the image processing unit 133 executes processing on an image (video) of the user.
  • the image processing unit 133 performs processing on a video image of the spectators of the event.
  • the image processing unit 133 recognizes a person (user) in the video by image recognition processing. For example, the image processing unit 133 detects the orientation of the user's face and the line of sight in the video through image recognition processing.
  • the image processing unit 133 generates information to be input to the model from the video. For example, the image processing unit 133 generates feature amounts to be input to the model from the video. For example, the image processing unit 133 extracts feature amounts from video by image processing. The image processing unit 133 may extract the feature amount from the video using a model (feature amount extraction model) that outputs the feature amount of the video as input. Note that the above-described processing is an example, and the image processing unit 133 may appropriately use various techniques related to image processing to extract feature amounts from video. Also, when each model receives video (image) itself instead of a feature amount, the highlight generation server 100 does not need to have the image processing unit 133 .
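  • The concrete feature extraction performed by the image processing unit 133 is not fixed by the disclosure; as one hedged illustration, the sketch below uses OpenCV face detection to turn sampled frames of a spectator video into a crude per-frame feature (the number of detected faces). OpenCV, the sampling rate, and the file name are assumptions standing in for the skeleton, face recognition, motion, and line-of-sight features mentioned above.

```python
from typing import List

import cv2  # OpenCV is an assumed dependency for this illustration

# Haar cascade face detector bundled with OpenCV.
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)


def extract_frame_features(video_path: str, sample_every: int = 30) -> List[int]:
    """Return a feature per sampled frame: here, simply the number of detected faces."""
    capture = cv2.VideoCapture(video_path)
    features: List[int] = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % sample_every == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = face_detector.detectMultiScale(gray)
            features.append(len(faces))
        index += 1
    capture.release()
    return features


# features = extract_frame_features("spectators_event_a.mp4")  # hypothetical file name
```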
  • the generation unit 134 generates various types of information.
  • the generation unit 134 generates various types of information based on information from an external information processing device and information stored in the storage unit 120 .
  • the generation unit 134 generates various types of information based on information from other information processing devices such as the edge viewing terminal 10, the content distribution server 50, the spectator video collection server 60, and the like.
  • the generation unit 134 generates various types of information based on the information stored in the data set storage unit 121, the model information storage unit 122, the threshold information storage unit 123, and the content information storage unit 124.
  • the generation unit 134 generates various information to be displayed on the edge viewing terminal 10 based on the model learned by the learning unit 132 .
  • the generation unit 134 uses part of the event content determined by the first user's state information acquired by the acquisition unit 131 to generate highlights of the event content to be provided to the second user.
  • the generation unit 134 generates highlights of event content using a model that outputs a score corresponding to the period of the event in response to input of input data based on state information.
  • the generator 134 uses the model to determine part of the event content.
  • the generation unit 134 generates highlights of event content using a model with state information as input.
  • the generation unit 134 generates highlights of event content using a model that receives an image of a user as an input.
  • the generation unit 134 generates highlights of event content using a model whose input is the feature amount extracted from the state information.
  • the generation unit 134 generates highlights of event content using a model whose input is a feature amount extracted from a video of a user.
  • the generation unit 134 generates highlights of the event content using part of the determined event content.
  • the generation unit 134 determines, as the part of the event content, the portion of the event content corresponding to the periods in which the score is equal to or greater than the threshold, and uses the determined portion of the event content to generate the highlights of the event content.
  • the generating unit 134 uses the model learned by the learning unit 132 to generate event content highlights.
  • when the second user is viewing the event in real time, the generation unit 134 generates the highlights of the event content to be provided to the second user using a part of the event content determined by the state information of the second user. When the second user is not viewing the event in real time, the generation unit 134 generates the highlights of the event content to be provided to the second user using a part of the event content determined by the state information of a first user who is a user different from the second user.
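  • Putting the pieces together, the branching between the two cases handled by the generation unit 134 can be sketched as follows. The sketch reuses the hypothetical helpers defined in the earlier examples (find_similar_realtime_spectator, determine_highlight_periods, generate_highlight), and the data layout is an assumption for illustration only.

```python
from typing import Dict, List, Sequence


def generate_highlight_for_second_user(
    second_user: Dict[str, str],
    viewed_in_realtime: bool,
    realtime_spectators: List[Dict[str, str]],
    scores_by_user: Dict[str, List[float]],   # stand-in for per-user model M3 output
    event_content: Sequence[str],
    threshold: float = 0.6,
) -> Sequence[str]:
    """Choose whose state information drives the highlight, then threshold and clip."""
    if viewed_in_realtime:
        first_user = second_user                           # FIG. 1 case: same user
    else:
        first_user = find_similar_realtime_spectator(      # FIG. 2 case: similar spectator
            second_user, realtime_spectators
        )                                                  # assumes a match exists
    scores = scores_by_user[first_user["user_id"]]
    periods = determine_highlight_periods(scores, threshold)
    return generate_highlight(event_content, periods)
```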
  • the generation unit 134 executes processing for generating information to be provided to the edge viewing terminal 10 .
  • the generation unit 134 may generate a display screen (content) to be displayed on the edge viewing terminal 10 as data.
  • the generation unit 134 may generate a screen (content) to be provided to the edge viewing terminal 10 by appropriately using various techniques such as Java (registered trademark).
  • the generation unit 134 may generate a screen (content) to be provided to the edge viewing terminal 10 based on CSS, JavaScript (registered trademark), or HTML format.
  • the generation unit 134 may generate screens (contents) in various formats such as JPEG (Joint Photographic Experts Group), GIF (Graphics Interchange Format), and PNG (Portable Network Graphics).
  • the transmission unit 135 transmits information to the edge viewing terminal 10.
  • the transmission unit 135 transmits the information generated by the generation unit 134 to the edge viewing terminal 10 .
  • the transmission unit 135 transmits the data generated by the generation unit 134 to the edge viewing terminal 10 .
  • the transmission unit 135 transmits the highlight of the event content generated by the generation unit 134 to the edge viewing terminal 10 used by the second user.
  • the transmission unit 135 transmits information requesting information to the content distribution server 50 .
  • the transmission unit 135 transmits information indicating information requested to be acquired to the content distribution server 50 .
  • the transmission unit 135 transmits information requesting information to the spectator video collection server 60 .
  • the transmission unit 135 transmits information indicating information requested to be acquired to the spectator video collection server 60 .
  • FIG. 9 is a diagram showing a configuration example of an edge viewing terminal according to an embodiment of the present disclosure.
  • the edge viewing terminal 10 includes a communication unit 11, an audio input unit 12, an audio output unit 13, a camera 14, a display unit 15, an operation unit 16, a storage unit 17, and a control unit 18.
  • the communication unit 11 is implemented by, for example, a NIC, a communication circuit, or the like.
  • the communication unit 11 is connected to a predetermined communication network (network) by wire or wirelessly, and transmits and receives information to and from an external information processing device.
  • the communication unit 11 is connected to a predetermined communication network by wire or wirelessly, and transmits and receives information to and from the highlight generation server 100 .
  • the voice input unit 12 functions as an input unit that receives operations by voice (utterance) of the user.
  • the voice input unit 12 is, for example, a microphone or the like, and detects voice.
  • the voice input unit 12 detects user's speech.
  • the voice input unit 12 may have any configuration as long as it can detect the user's speech information necessary for processing.
  • the audio output unit 13 is realized by a speaker that outputs audio, and is an output device for outputting various types of information as audio.
  • the audio output unit 13 audio-outputs the content provided from the highlight generation server 100 .
  • the audio output unit 13 outputs audio corresponding to information displayed on the display unit 15 .
  • the edge viewing terminal 10 inputs and outputs audio through the audio input section 12 and the audio output section 13 .
  • the camera 14 has an image sensor that detects images, and photographs the user.
  • when the edge viewing terminal 10 is a desktop personal computer (desktop PC), the camera 14 may be separate from the device (main device) in which the control unit 18 is mounted, for example provided on a display device or the like.
  • the camera 14 may be integrated with the display (display device) or may be arranged above the display section 15 .
  • the camera 14 may be built in the edge viewing terminal 10 and arranged above the display section 15 .
  • the camera 14 may be an in-camera built into the edge viewing terminal 10 .
  • the display unit 15 is a display screen of a tablet terminal realized by, for example, a liquid crystal display or an organic EL (Electro-Luminescence) display, and is a display device for displaying various information.
  • the display unit 15 may be separate (separate device) from a device (main device) in which the control unit 18 is mounted.
  • the display unit 15 may be integrated with a device (main device) in which the control unit 18 is mounted.
  • the display unit 15 displays various information related to the event.
  • the display unit 15 displays content.
  • the display unit 15 displays various information received from the highlight generation server 100 .
  • the display unit 15 displays highlights of events received from the highlight generation server 100 .
  • the display unit 15 displays the video of the event.
  • the display unit 15 displays the highlight of the event.
  • the operation unit 16 functions as an input unit that receives various user operations.
  • the operation unit 16 is a keyboard, mouse, or the like.
  • the operation unit 16 may have a touch panel capable of realizing functions equivalent to those of a keyboard and a mouse.
  • the operation unit 16 receives various operations from the user through the display screen by the function of a touch panel realized by various sensors.
  • the operation unit 16 receives various operations from the user via the display unit 15 .
  • the operation unit 16 may receive an operation such as a user's designation operation via the display unit 15 of the edge viewing terminal 10 .
  • although the tablet terminal mainly adopts the capacitive method, other detection methods such as the resistive film method, the surface acoustic wave method, the infrared method, and the electromagnetic induction method may be adopted, as long as the user's operation can be detected and the touch panel function can be realized.
  • the edge viewing terminal 10 may have a configuration that accepts (detects) various types of information as input, not limited to the above.
  • the edge viewing terminal 10 may have a line-of-sight sensor that detects the user's line of sight.
  • the line-of-sight sensor detects the user's line-of-sight direction using eye-tracking technology based on detection results from, for example, the camera 14 mounted on the edge viewing terminal 10, an optical sensor, and a motion sensor (all of which are not shown).
  • the line-of-sight sensor determines a region of the screen that the user is gazing at based on the detected line-of-sight direction.
  • the line-of-sight sensor may transmit line-of-sight information including the determined gaze area to the highlight generation server 100 .
  • the edge viewing terminal 10 may have a motion sensor that detects user gestures and the like.
  • the edge viewing terminal 10 may receive an operation by a user's gesture using a motion sensor.
  • the storage unit 17 is implemented by, for example, a semiconductor memory device such as RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or optical disk.
  • the storage unit 17 stores, for example, various information received from the highlight generation server 100 .
  • the storage unit 17 stores, for example, information about an application (for example, a content display application or the like) installed in the edge viewing terminal 10, such as a program.
  • the storage unit 17 stores user information.
  • the storage unit 17 stores the user's utterance history (speech recognition result history) and action history.
  • the control unit 18 is a controller, and is realized by, for example, a CPU or MPU executing various programs stored in a storage device such as the storage unit 17 inside the edge viewing terminal 10, using the RAM as a work area. These various programs include, for example, programs of applications (for example, a content display application) that perform information processing. The control unit 18 may also be realized by an integrated circuit such as an ASIC or FPGA.
  • control unit 18 includes an acquisition unit 181, a transmission unit 182, a reception unit 183, and a processing unit 184, and implements or executes the information processing functions and actions described below.
  • the internal configuration of the control unit 18 is not limited to the configuration shown in FIG. 9, and may be another configuration as long as it performs the information processing described later.
  • the connection relationship between the processing units of the control unit 18 is not limited to the connection relationship shown in FIG. 9, and may be another connection relationship.
  • the acquisition unit 181 acquires various types of information. For example, the acquisition unit 181 acquires various information from an external information processing device. For example, the acquisition unit 181 stores the acquired various information in the storage unit 17 . The acquisition unit 181 acquires user operation information accepted by the operation unit 16 .
  • the acquisition unit 181 acquires state information indicating the state of the user.
  • the acquisition unit 181 acquires state information including user image information captured by the camera 14 .
  • Acquisition unit 181 acquires utterance information of a user.
  • the acquisition unit 181 acquires user utterance information detected by the voice input unit 12 .
  • the transmission unit 182 transmits information to the highlight generation server 100 via the communication unit 11 .
  • the transmission unit 182 transmits information about the user to the highlight generation server 100 .
  • the transmission unit 182 transmits information about the user's video captured by the camera 14 to the highlight generation server 100 .
  • the transmitting unit 182 transmits state information indicating the state of the user.
  • the transmission unit 182 transmits state information including user image information captured by the camera 14 .
  • the transmission unit 182 transmits information input by user's speech or operation.
  • the receiving section 183 receives information from the highlight generation server 100 via the communication section 11 .
  • the receiving unit 183 receives information provided by the highlight generation server 100 .
  • the receiving unit 183 receives content from the highlight generation server 100 .
  • the receiving unit 183 receives highlights from the highlight generation server 100 .
  • the processing unit 184 executes various types of processing.
  • the processing unit 184 executes processing according to the user's operation accepted by the voice input unit 12 or the operation unit 16 .
  • the processing unit 184 displays various information via the display unit 15.
  • the processing unit 184 functions as a display control unit that controls display on the display unit 15 .
  • the processing unit 184 outputs various kinds of information as voice through the voice output unit 13 .
  • the processing unit 184 functions as an audio output control unit that controls audio output of the audio output unit 13 .
  • the processing unit 184 outputs the information received by the acquisition unit 181.
  • the processing unit 184 outputs the content provided by the highlight generation server 100 .
  • Processing unit 184 outputs the content received by acquisition unit 181 via audio output unit 13 or display unit 15 .
  • the processing unit 184 displays content via the display unit 15 .
  • the processing unit 184 outputs the contents as audio through the audio output unit 13 .
  • the processing unit 184 transmits various information to an external information processing device via the communication unit 11 .
  • the processing unit 184 transmits various information to the highlight generation server 100 .
  • the processing unit 184 transmits various information stored in the storage unit 17 to an external information processing device.
  • the processing unit 184 transmits various information acquired by the acquisition unit 181 to the highlight generation server 100 .
  • the processing unit 184 transmits the sensor information acquired by the acquisition unit 181 to the highlight generation server 100 .
  • the processing unit 184 transmits the user operation information received by the operation unit 16 to the highlight generation server 100 .
  • the processing unit 184 transmits information such as an utterance and an image of the user using the edge viewing terminal 10 to the highlight generation server 100 .
  • each process performed by the control unit 18 described above may be implemented by, for example, JavaScript (registered trademark).
  • each unit of the control unit 18 may be realized by the predetermined application, for example.
  • processing such as information processing by the control unit 18 may be realized by control information received from an external information processing device.
  • the control unit 18 may have an application control unit that controls the predetermined application or a dedicated application, for example.
  • FIG. 10 is a flow chart showing the processing procedure of the information processing device according to the embodiment of the present disclosure. Specifically, FIG. 10 is a flowchart showing the procedure of information processing by the highlight generation server 100, which is an example of an information processing apparatus.
  • the highlight generation server 100 acquires state information indicating the state of real-time viewing of the event by the first user who viewed the event in real time, and event content, which is a video of the event (step S101).
  • the highlight generation server 100 generates highlights of event content to be provided to the second user, using part of the event content determined by the state information of the first user (step S102).
  • FIG. 11 is a diagram illustrating a functional configuration example regarding learning of the information processing system.
  • the dashed line BS indicates a functional interface in the system; the left side of the dashed line BS corresponds to the equipment at the site venue (corresponding to the event site in FIG. 3) or the edge viewing terminal 10 side, and the right side of the dashed line BS corresponds to the highlight generation server 100 side.
  • a dashed line BS indicates an example of allocation of functions in the information processing system 1 .
  • each component shown on the left side of the dashed line BS is implemented by the equipment at the site or the edge viewing terminal 10 .
  • each component shown on the right side of the dashed line BS is implemented by the highlight generation server 100 .
  • the boundary (interface) of the device configuration in the information processing system 1 is not limited to the dashed line BS, and the functions assigned to the equipment at the venue, the edge viewing terminal 10, the highlight generation server 100, and the like may be combined in any combination. For example, if the highlight generation server 100 is integrated with any of the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60, the interface indicated by the dashed line BS may be omitted.
  • FIG. 11 shows a learning flow for machine learning a behavior analysis algorithm based on data analysis in advance.
  • in the information processing system 1, local spectators (real-time spectators) watching an event in real time at a sports game venue or a live venue are photographed by an imaging device such as a camera, and the footage is stored as spectator video data.
  • the imaging device in FIG. 11 corresponds to the audience imaging equipment or the camera 14 of the edge viewing terminal 10 or the like.
  • the information processing system 1 may set viewers watching in real time in a remote environment as learning target real time spectators.
  • the information processing system 1 accumulates feature amounts obtained by performing image recognition processing on spectator photographed data as spectator feature amount data.
  • Image recognition processing in FIG. 11 is executed by the image processing unit 133 of the highlight generation server 100 .
  • the spectator feature amount data is time-series data, covering the entire duration of the match, of the feature amounts obtained by image recognition processing.
  • the spectator feature amount data is accumulated as time-series data of individual feature amounts for all the spectators who are photographed.
  • feature amounts include body part points obtained by skeleton recognition, information indicating facial orientation and the like, facial part points obtained by face recognition, information indicating facial attributes such as smiles and emotions, information obtained by motion detection (motion information), and information obtained by line-of-sight detection (line-of-sight information).
  • the highlight generation server 100 applies audience feature data and teacher data to each learning device to generate a behavior analysis algorithm model. Processing (learning processing) corresponding to each learning device in FIG. 11 is executed by the learning unit 132 of the highlight generation server 100 .
  • the highlight generation server 100 generates a model M1, which is a personal attribute determiner.
  • the highlight generation server 100 generates a model M2, which is an action index determiner.
  • the highlight generation server 100 generates model M3, which is a highlight scene predictor. Details of each of the models M1 to M3 will be described later.
  • the teacher data is a label (correct answer information) that indicates the correct answer that the model is expected to output when the corresponding audience feature data is input to the model.
  • Teacher data may be generated by a content analysis process that analyzes event content. For example, training data is automatically generated from the results of image recognition processing of content video data captured by an imaging device such as a camera, and the acoustic processing results of content audio data recorded with a sound pickup device such as a microphone.
  • when training data is created manually, the manager of the information processing system 1 may use an analysis tool for the development of the match. Since teacher data can also be created entirely by hand in this way, teacher data does not have to be generated from the video and audio of the content, as indicated by the dotted line in FIG. 11.
  • FIG. 12 is a diagram showing an example of learning and inference regarding the personal attribute determiner of the information processing system.
  • the part above the dotted line in FIG. 12 shows the process related to model generation in the learning phase, and the part below the dotted line in FIG. 12 shows the process in the inference phase using the model generated by learning. It should be noted that descriptions of the same points as those described above will be omitted as appropriate.
  • FIG. 12 shows a case of determining which of three fan attributes each individual (user) has: home fan (first fan type), away fan (second fan type), or beginner (third fan type).
  • data in which a fan attribute label is assigned (manually) to each individual appearing in the spectator video data is prepared as training data TD1.
  • the highlight generation server 100 generates the model M1 by performing supervised learning on the audience feature amount data of each individual using the teacher data TD1. Any algorithm such as DNN, XGBoost (eXtreme Gradient Boosting), or the like can be adopted as the learning algorithm of the highlight generation server 100 .
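  • As an illustration only (not part of this disclosure), a minimal Python sketch of such supervised learning of the model M1 could look as follows, assuming per-spectator feature vectors have already been aggregated from the spectator feature amount data; the data shapes, the use of scikit-learn's GradientBoostingClassifier, and all names are assumptions.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical data: one aggregated feature vector per spectator, with
# teacher data TD1 labels 0 = home fan, 1 = away fan, 2 = beginner.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 32))           # 300 spectators, 32-dimensional features
y = rng.integers(0, 3, size=300)         # manually assigned fan-attribute labels

model_m1 = GradientBoostingClassifier()  # any algorithm (DNN, XGBoost, ...) could be used
model_m1.fit(X, y)

# Inference phase: determine the fan attribute of a target user.
target = rng.normal(size=(1, 32))
print(model_m1.predict(target))          # e.g. [0] -> home fan
print(model_m1.predict_proba(target))    # plausibility scores for each attribute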
  • the teacher data TD1 shown in FIG. 12 is data in which each individual appearing in the audience video data is labeled with a fan attribute.
  • the teacher data TD1 assigns each individual (user) a label indicating one of the three types: home fan (first fan type), away fan (second fan type), or beginner (third fan type). In FIG. 12, hatching attached to each individual's face is shown as an example of such a label and indicates one of the three types. A mode in which the label (correct answer information) is associated with information (a user ID) identifying each individual may also be used.
  • through learning processing using the above-described information, the highlight generation server 100 generates the model M1, which is a determiner that takes personal feature amount data as input and identifies which of the three types the attribute of the user corresponding to the input data belongs to.
  • the highlight generation server 100 performs inference processing using the model M1 generated in the learning phase. For example, the highlight generation server 100 inputs feature amount data as input data to the model M1, thereby causing the model M1 to output information (output data OD11) indicating the personal attribute of the user of the input data.
  • the highlight generation server 100 inputs the feature amount data of a user whose attributes are unknown (to be inferred) (also referred to as a "target user") to the model M1, thereby causing the model M1 to output information indicating the fan attribute of the target user.
  • the personal attribute determiner shown in FIG. 12 is merely an example, and the information processing system 1 may use various personal attribute determiners.
  • the information processing system 1 may use the attribute recognition function of an image recognizer as a personal attribute determiner.
  • the information processing system 1 may determine age and gender by attribute recognition of a face recognizer.
  • the information processing system 1 may use a personal attribute determiner that determines enjoyable attributes.
  • the information processing system 1 may determine the enjoyment attribute from the average value of smile levels over the entire period of a sports game or live event.
  • the information processing system 1 may store personal attribute determination values of the viewer's past viewing (such as another game of the same sport), and use the stored personal attribute determination values at the time of determination.
  • the information processing system 1 may use the saved value from past viewing to determine attributes that are enjoyable or attributes that cannot be instantly determined by image recognition based on behavior during viewing, which will be described later. Further, the information processing system 1 may determine attributes related to situations in which scenes desired to be viewed in highlights are different.
  • the information processing system 1 may use a personal attribute determiner that determines attributes that can be entered.
  • the information processing system 1 may make the determination from the average value of the amount of movement over the entire period of the sports game or live event.
  • the information processing system 1 may use a personal attribute determiner that determines the ticket price attribute.
  • the information processing system 1 may determine expenditure costs such as the ticket price from the photographed seat position of the individual. Note that the information processing system 1 may determine the expenditure cost by referring to the purchase history when the target user is not a local spectator.
  • the information processing system 1 may use a personal attribute determiner that determines vocalization attributes.
  • the information processing system 1 may make the determination based on the average value of the opening degree of the mouth over the entire period of the game/event.
  • the information processing system 1 may use a personal attribute determiner that determines a "while viewing" attribute.
  • the information processing system 1 may generate a personal attribute determiner that determines the "while viewing" attribute by assigning a label indicating whether the user is watching while doing something else, such as watching a game while looking at a smartphone, and learning from it.
  • the information processing system 1 may use a personal attribute determiner that determines a support ("endorsement") attribute.
  • the information processing system 1 may generate a personal attribute determiner that determines the endorsement attribute by assigning a label indicating whether or not the user supports a specific performer at a live event or the like and learning from it.
  • the information processing system 1 may use a personal attribute determiner that determines empathy attributes.
  • for lectures, speeches, and the like, the content the audience wants to hear differs depending on whether they sympathize with the speaker; the information processing system 1 may therefore generate a personal attribute determiner that determines a sympathy attribute by assigning a label indicating whether or not a listener sympathizes with (believes in) the principles and assertions of the speaker and learning from it.
  • the information processing system 1 may use a personal attribute determiner that determines a sense of togetherness attribute.
  • the information processing system 1 may generate a personal attribute determiner that determines a sense-of-togetherness attribute by assigning a label indicating whether the spectator cheers together with other spectators (routines and the like) at a sports game or a live performance and learning from it.
  • the information processing system 1 may use a personal attribute determiner that determines party relationship attributes.
  • the information processing system 1 may determine the relationship (relatives, friends, co-workers, etc.) with the main characters such as the bride and groom at a party such as a wedding ceremony from the seat position of the photographed individual.
  • the information processing system 1 may input non-attendance viewers as attributes.
  • the information processing system 1 may use a personal attribute determiner that determines concentration attributes.
  • the information processing system 1 may generate a personal attribute determiner that determines a concentration attribute by assigning a label indicating whether or not the spectator can concentrate on watching the entire event, such as a classical music concert where there is little reaction from the audience, and learning from it. When the personal attribute determiner generated in this way is used, for audience members (viewers) who cannot concentrate, the number of scenes to be extracted can be reduced and the duration of the highlight moving image can be shortened.
  • the information processing system 1 may use a personal attribute determiner that determines the emotional expression behavior attribute.
  • the information processing system 1 may create a personal attribute determiner that determines an emotional expression behavior attribute by assigning and learning labels of types of emotional expression behavior (crying, laughing, being scared, excitement, and the like) when watching a movie or a play.
  • with the personal attribute determiner generated in this way, it is possible to handle cases where the scenes desired to be seen in the highlights differ, such as excitement, comedy, horror, or action, owing to differences in the amount of emotional expression behavior.
  • FIG. 13 is a diagram illustrating an example of learning and inference regarding the action index determiner of the information processing system. It should be noted that descriptions of the same points as those described above will be omitted as appropriate.
  • the highlight generation server 100 defines objective variables from the development of the game for various indicators related to user behavior, and performs regression analysis using spectator feature amount data as explanatory variables, thereby generating a model M2, which is an action index determiner.
  • the model M2 may be a regression equation for inputting user feature amount data corresponding to each point in time during an event and outputting (calculating) a value indicating the degree of excitement at each point in time.
  • as an example of action index determination, a case of determining the degree of excitement of each individual (user) will be described.
  • the highlight generation server 100 uses the feature amounts of users with the home fan attribute as explanatory variables, defines an objective variable whose value changes from 0 to 1 when the home team scores, and executes learning processing to generate the model M2.
  • the teacher data TD2 shown in FIG. 13 corresponds to the case where the home team scores at time t1, and shows an example of the teacher data for changing the value of the objective variable from 0 to 1 at time t1.
  • the highlight generation server 100 generates a model M2 that receives individual feature amount data as input and outputs information indicating the excitement during the event period of the user corresponding to the input data through learning processing using the above-described information. .
  • the highlight generation server 100 generates a model M2 that outputs information indicating changes in the user's degree of excitement (score) during the period of the event.
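  • As an illustration only, a minimal Python sketch of such a regression-based action index determiner (model M2) could look as follows; the feature shapes, the scoring time, and the use of ridge regression are assumptions.

import numpy as np
from sklearn.linear_model import Ridge

T, D = 600, 16                        # 600 time steps, 16-dimensional features per step
rng = np.random.default_rng(0)
features = rng.normal(size=(T, D))    # spectator feature amount data (time series)

t1 = 400                              # hypothetical time index of the home team's goal
excitement_target = np.zeros(T)       # objective variable for the excitement index
excitement_target[t1:] = 1.0          # teacher data TD2: changes from 0 to 1 at time t1

model_m2 = Ridge().fit(features, excitement_target)   # regression analysis

# Inference: time-series excitement score for a target user's feature data.
score_series = model_m2.predict(features)
print(score_series.shape)             # (600,) score at each point in time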
  • the highlight generation server 100 generates the regression equation of each index obtained as a result of the regression analysis as an action index determiner (for example, the model M2). In this way, an action index determiner is obtained that outputs time-series data of scores (results of the regression equation) representing the degree of each index.
  • the action index determiner receives individual feature amount data as input and outputs a score for each individual index. Also, the action index determination device may input the feature amount data for each attribute or for the entire set and output the score of each index as an average value.
  • the highlight generation server 100 performs inference processing using the model M2 generated in the learning phase. For example, the highlight generation server 100 inputs feature amount data as input data to the model M2, thereby causing the model M2 to output information indicating the degree of excitement of the user corresponding to the input data.
  • the highlight generation server 100 inputs the feature amount data of a user (target user) whose index (degree of excitement or the like) is unknown (to be inferred) into the model M2, thereby causing the model M2 to output information indicating the index for the target user.
  • the output data OD21 in FIG. 13 shows an output example of the model M2 corresponding to the users of the home team, and shows the case where the degree of excitement rises sharply at time t1 when the home team scores.
  • Output data OD22 in FIG. 13 shows an output example of the model M2 corresponding to the users of the away team.
  • a further output example in FIG. 13 shows the output of the model M2 corresponding to a beginner user (for example, a user who is neither a home fan nor an away fan), and shows a case where the degree of excitement rises to an intermediate degree at time t1 when the home team scores.
  • the highlight generation server 100 may combine with attribute determination to generate score time-series data obtained by determining the degree of excitement for each attribute.
  • the highlight generation server 100 may use an average score value of sets for each attribute.
  • the highlight generation server 100 may generate a model M2 in which the score of home fans increases when a home score is scored, the score of away fans does not increase, and the score of beginners is intermediate.
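  • As an illustration only, averaging the per-user scores within each attribute group, as described above, could be sketched in Python as follows; the arrays and attribute codes are assumptions.

import numpy as np

def attribute_average_scores(score_per_user, attribute_per_user, n_attrs=3):
    # score_per_user: (n_users, T) score time series, attribute_per_user: (n_users,)
    return {a: score_per_user[attribute_per_user == a].mean(axis=0)
            for a in range(n_attrs) if np.any(attribute_per_user == a)}

scores = np.random.default_rng(0).random((6, 4))   # 6 users, 4 time steps
attrs = np.array([0, 0, 1, 1, 2, 2])               # e.g. home / away / beginner
print(attribute_average_scores(scores, attrs))     # one averaged series per attribute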
  • the action index determiner shown in FIG. 13 is merely an example, and the information processing system 1 may use action index determiners for various indicators.
  • the information processing system 1 may use an action index determiner that determines the degree of concentration of each individual (user).
  • the information processing system 1 may generate an action index determiner that determines the degree of concentration by using the feature amounts of the user as explanatory variables, defining an objective variable that changes from 1 to 0 when the game is interrupted, and executing the learning process.
  • the information processing system 1 may use an action index determiner that determines the degree of disappointment of each individual (user).
  • the information processing system 1 may generate an action index determiner that determines the degree of disappointment by using the feature amounts of the user as explanatory variables, defining an objective variable that changes from 0 to 1 when a shot is missed, and executing the learning process.
  • the information processing system 1 may also use an action index determiner that determines the degree of tension of each individual (user).
  • the information processing system 1 uses the feature amount of the user as an explanatory variable, defines an objective variable that becomes 1 when the score difference is balanced (for example, 1 point in the case of soccer), and executes the learning process. By doing so, an action index determiner for determining the degree of tension may be generated.
  • the information processing system 1 may also use an action index determiner that determines the degree of anger of each individual (user).
  • the information processing system 1 may generate an action index determiner that determines the degree of anger by using the feature amounts of the user as explanatory variables, defining an objective variable that changes from 0 to 1 when a mistake is made, and executing the learning process.
  • the information processing system 1 may also use an action index determiner that determines the degree of boredom of each individual (user).
  • the information processing system 1 may generate an action index determiner that determines the degree of boredom by using the feature amounts of the user as explanatory variables, defining an objective variable that becomes 1 in time periods during which the game is delayed, and executing the learning process.
  • FIG. 14 is a diagram illustrating an example of learning and reasoning for a highlight scene predictor of an information processing system. It should be noted that descriptions of the same points as those described above will be omitted as appropriate.
  • FIG. 14 shows a case where a highlight scene predictor is used to predict which period of the event period should be used as a highlight.
  • data indicating periods used as highlight scenes and periods not used as highlight scenes are prepared as teacher data TD3.
  • the teacher data TD3 is data in which the period used as the highlight scene is 1 and the non-highlight period is 0 in the period of the event.
  • the training data TD3 may be manually extracted or automatically generated by content analysis.
  • the teacher data TD3 is information indicating the period during which the event for which the feature amount as the input data was collected was used as the highlight scene in the actually generated highlight.
  • the highlight generation server 100 uses the manually extracted highlight scenes as the teacher data TD3, and generates data in which the audience feature amount data within the highlight scene periods is labeled True and the audience feature amount data outside the highlight scene periods is labeled False.
  • the highlight generation server 100 generates the model M3 by performing supervised learning on the audience feature amount data of each individual using the teacher data TD3.
  • learning algorithm of the highlight generation server 100 arbitrary algorithms such as DNN and XGBoost can be adopted.
  • time-series data of individual feature amounts for all spectators photographed at an event corresponding to the teacher data TD3 may be used as input data for learning.
  • the highlight scene automatically extracted by analyzing the video and audio of the content shown by the dotted line in the lower left of FIG. 11 may be used as the training data TD3.
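  • As an illustration only, a minimal Python sketch of training the highlight scene predictor (model M3) with such True/False period labels could look as follows; the shapes, the labeled interval, and the classifier choice are assumptions.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

T, D = 600, 16
rng = np.random.default_rng(0)
features = rng.normal(size=(T, D))    # audience feature amount data (time series)

labels = np.zeros(T, dtype=int)       # teacher data TD3: 1 = used as a highlight scene
labels[380:420] = 1                   # hypothetical period used as a highlight scene

model_m3 = GradientBoostingClassifier().fit(features, labels)

# The predicted probability of the True class can serve as a highlight-likeness score.
highlight_score = model_m3.predict_proba(features)[:, 1]
print(highlight_score.shape)          # (600,) time series of highlight-likeness scores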
  • the highlight generation server 100 may generate a highlight scene predictor for each personal attribute by classifying the learning target input audience feature amount data by personal attribute and learning.
  • through learning processing using the above-described information, the highlight generation server 100 generates the model M3, which takes feature amount data (for example, time-series data) as input and outputs the degree of highlight-likeness, such as a likelihood, a reliability, or a moving average of judgment values.
  • the model M3 receives individual feature amount data as an input and outputs an individual highlight-likeness score.
  • the model M3 may receive feature amount data for each attribute or for the entire set as input, and output a score of highlight-likeness for each attribute or for the entire set.
  • the highlight generation server 100 performs inference processing using the model M3 generated in the learning phase. For example, the highlight generation server 100 inputs feature amount data as input data to the model M3, thereby causing the model M3 to output time-series data (output data OD31) of the highlight-likeness score corresponding to the user of the input data.
  • highlight scene predictor shown in FIG. 14 is merely an example, and the information processing system 1 may use various highlight scene predictors.
  • FIG. 15 is a diagram illustrating a functional configuration example related to highlight generation of the information processing system. It should be noted that descriptions of the same points as those described above will be omitted as appropriate.
  • the dashed line BS in FIG. 15 indicates a functional interface in the system like the dashed line BS in FIG.
  • the viewer is constantly photographed by an imaging device such as a camera (for example, the camera 14 of the edge viewing terminal 10), and the footage is stored as viewer video data.
  • image recognition results are accumulated as viewer feature amount data.
  • the information is stored and accumulated as information associated with the content viewed in real time.
  • feature amounts obtained by photographing and image recognition of a group of local spectators (or remote viewers) watching the content in real time are accumulated as real-time spectator feature amount data.
  • real-time spectator feature amount data is stored and accumulated as information associated with content.
  • the configuration for accumulating the real-time spectator feature amount data is the same as the configuration for accumulating the spectator feature amount data regarding the real-time spectator users in FIG.
  • the highlight generation server 100 determines personal attributes using the model M1, which is a personal attribute determiner. Further, in the information processing system 1, the personal attribute result determined by the personal attribute determiner and the real-time viewing determination result determined by the real-time viewing determiner (described later) are input to the highlight generation control unit (for example, the generation unit 134 of the highlight generation server 100).
  • the real-time viewing determination device may be replaced with a real-time viewing determination by a behavior index determination device.
  • the highlight generation server 100 performs real-time viewing determination using, for example, the model M2, which is a behavior index determination device. For example, the highlight generation server 100 determines that the period during which the score output by the model M2 is equal to or greater than a predetermined threshold is the period during which the user is viewing in real time.
  • the highlight generation control unit instructs which feature amount data is to be input to the highlight scene predictor, based on the personal attribute result of the highlight viewing user input at the start of highlight viewing and on the real-time viewing determination result.
  • the highlight generation server 100 determines feature amount data to be input to the model M3, which is a highlight scene predictor.
  • a highlight scene predictor predicts a highlight scene from the input feature amount data instructed by the highlight generation controller.
  • the highlight generation server 100 inputs input feature amount data to the model M3 and predicts highlight scenes based on the output of the model M3.
  • the highlight generation server 100 determines that a scene corresponding to a period in which the score output by the model M3 is equal to or greater than a predetermined threshold is to be a highlight scene.
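  • As an illustration only, turning a highlight-likeness score time series into highlight periods by such thresholding could be sketched in Python as follows; the score values and the threshold are assumptions.

import numpy as np

def extract_highlight_periods(score, threshold=0.7):
    # Return (start, end) index pairs of periods in which score >= threshold.
    above = score >= threshold
    periods, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            periods.append((start, i))
            start = None
    if start is not None:
        periods.append((start, len(score)))
    return periods

score = np.array([0.1, 0.2, 0.8, 0.9, 0.85, 0.3, 0.75, 0.8, 0.2])
print(extract_highlight_periods(score))   # [(2, 5), (6, 8)]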
  • the highlight scene predictor may be replaced with scene prediction by the action index determiner.
  • the highlight generation server 100 performs scene prediction using, for example, the model M2, which is the action index determiner.
  • the highlight generation server 100 may determine that a scene corresponding to a period in which the score output by the model M2 is equal to or greater than a predetermined threshold is set as a highlight scene.
  • the position angle estimation unit (for example, corresponding to the generation unit 134 of the highlight generation server 100) estimates the position, angle, and the like used for the highlight. The highlight video generation unit (for example, likewise corresponding to the generation unit 134 of the highlight generation server 100) generates highlight video data from the content video (audio) data and the viewer video data. The highlight video data is presented to the highlight viewing user through an image output device such as a monitor (for example, the display unit 15) and an audio output device such as a speaker (for example, the audio output unit 13) of the edge viewing terminal 10.
  • the real-time viewing determiner determines whether the highlight viewer has viewed the target content in real time.
  • the highlight generation server 100 uses a real-time viewing determiner (model) stored in the storage unit 120 to determine real-time viewing.
  • the real-time viewing determination is merely an example, and if real-time viewing can be determined, real-time viewing may be determined by any process such as the process using the behavior index determiner described above.
  • if the storage unit 120 contains viewer feature amount data for real-time viewing of the highlight viewing target content, the highlight generation server 100 determines that the user was viewing the content in real time during the period in which that data exists. If the storage unit 120 does not contain such data for the highlight viewing target content, the highlight generation server 100 determines that the user did not view the content in real time during that period.
  • the behavior index determiner may be used as a real-time viewing determiner.
  • the highlight generating server 100 determines that there is real-time viewing during the period when the value of the index such as the degree of concentration during real-time viewing of the highlight viewing target content is equal to or greater than a threshold value. Also, the highlight generation server 100 determines that there is no real-time viewing if the value of the index such as the degree of concentration during real-time viewing of the highlight viewing target content is less than a threshold value.
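  • As an illustration only, the real-time viewing determination described above could be sketched in Python as follows, assuming stored per-period feature data (NaN where no data was recorded) and an optional concentration-index criterion; the data layout and threshold are assumptions.

import numpy as np

def realtime_viewing_mask(stored_features, concentration=None, threshold=0.5):
    # Return a boolean mask over the event timeline marking real-time viewing.
    if stored_features is None:                        # nothing accumulated for this content
        return None                                    # treated as "no real-time viewing"
    viewed = ~np.isnan(stored_features).any(axis=1)    # periods with recorded data
    if concentration is not None:                      # optional behavior-index criterion
        viewed &= concentration >= threshold
    return viewed

features = np.full((10, 4), np.nan)
features[3:8] = 0.0                                    # data exists only for periods 3..7
print(realtime_viewing_mask(features))                 # True only where data was stored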
  • the information processing system 1 may determine that real-time viewing is present even if there is no real-time viewing data for the highlight viewing target content in the accumulated viewer feature amount data.
  • for example, if a person identified (by face recognition) as the same person as the highlight viewer appears in the real-time spectator feature amount data, the highlight generation server 100 may determine that real-time viewing as a local spectator occurred during the period in which that person was present.
  • the information processing system 1 can cope with a usage scene in which viewing is performed on the edge viewing terminal after watching the game on site.
  • in this case, the video data and the image recognition feature amount data of the person photographed at the venue may be used as the viewer video data and the viewer feature amount data.
  • FIG. 16 is a flowchart showing a processing procedure regarding highlight generation.
  • the case where the information processing system 1 performs the processing is described below as an example, but any device such as the content distribution server 50 or the spectator video collection server 60 may perform this processing.
  • the information processing system 1 branches the process according to the viewing period in real time (step S201).
  • when the user partially watched the game in real time, for example watching in real time from the middle of the game or only until the middle of the game (the middle case in FIG. 16), the information processing system 1 separates the period into the real-time viewing period and the other periods (step S202). For example, when the game was partially watched in real time, the information processing system 1 uses the user's data to specify the period during which the event was viewed in real time and separates it from the remaining periods. Then, the information processing system 1 performs the processing shown in step S203 for the real-time viewing period, and performs the processing of steps S204 and S205 for the non-real-time viewing period.
  • for the real-time viewing period, the information processing system 1 uses the viewer's own feature amounts at the time of real-time viewing as the input feature amount data of the highlight scene predictor for that user (step S203). For example, when a user watched the whole game in real time, including as a local spectator at the venue, the information processing system 1 uses the viewer's own feature amounts at the time of real-time viewing as the input feature amount data of the model M3 for that user.
  • for the periods not viewed in real time, the information processing system 1 refers to the viewer's personal attribute determination result for that user (step S204), and uses, as the input feature amount data of the highlight scene predictor, the feature amounts of a set of real-time spectators similar to the viewer in the real-time spectator feature amount data (step S205). For example, when the user did not watch in real time at all, including when the user only watches the highlights afterwards, the information processing system 1 uses, as the input feature amount data of the model M3, the feature amounts of a set of real-time spectators similar to that user in the real-time spectator feature amount data.
  • the information processing system 1 designates the input feature amount data to the highlight scene predictor and instructs scene extraction (step S206). For example, the information processing system 1 inputs the feature amount data determined in steps S204 and S205 to the model M3, and generates highlights based on the output of the model M3.
  • the information processing system 1 then generates the highlights. For example, when the viewer's own feature amounts at the time of real-time viewing are used as the input feature amount data for the highlight scene predictor (in the case of personalization), the information processing system 1 may select scenes whose highlight scene prediction score or duration is high, by threshold or by ranking, to generate the highlight video. For example, the information processing system 1 may display, as a wipe in the generated highlight video, the viewer's own video of the corresponding scene at the time of real-time viewing.
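  • As an illustration only, the branching above (full, partial, or no real-time viewing) could be sketched in Python as follows; the inputs are hypothetical stand-ins for the feature data described above.

import numpy as np

def choose_predictor_input(own_features, similar_group_features, viewed_mask):
    # Select the input feature data for the highlight scene predictor per period.
    if viewed_mask is None:                     # no real-time viewing at all
        return similar_group_features           # attribute optimization
    if viewed_mask.all():                       # full-time real-time viewing
        return own_features                     # personalization
    # Partial real-time viewing: mix the two sources per period.
    return [own_features[t] if viewed_mask[t] else similar_group_features[t]
            for t in range(len(viewed_mask))]

own = np.arange(5)            # stand-in for the viewer's own per-period features
group = np.arange(5) * 10     # stand-in for a similar real-time spectator group
mask = np.array([True, True, False, False, True])
print(choose_predictor_input(own, group, mask))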
  • FIG. 17 is a diagram illustrating an example of superimposed display of wipes on highlights.
  • FIG. 17 shows an example of a wipe display of the video of the viewer's own corresponding scene in the highlight video when watching the game in real time.
  • FIG. 17 shows a case where a wipe WP1, in which the viewer's own video of the corresponding scene in real time watching the game is superimposed on the content CT21, which is a highlight video of a basketball game, is displayed.
  • the highlight generation server 100 provides the edge viewing terminal 10 with the content CT21 on which the wipe WP1 is superimposed.
  • the edge viewing terminal 10 provided by the highlight generation server 100 displays the content CT21 on which the wipe WP1 is superimposed.
  • when the feature amounts of a set of real-time spectators having attributes closest to the viewer are used as the input feature amount data for the highlight scene predictor (in the case of attribute optimization), the information processing system 1 determines the viewer's personal attributes by analyzing the camera image at the start of highlight viewing. Note that the attributes that can be determined may differ depending on the camera imaging time.
  • the information processing system 1 predicts highlight scenes from the time series of the highlight scores of the set of real-time spectators having those personal attributes in the real-time spectator feature amount data.
  • the information processing system 1 may select scenes with high scores/durations based on a threshold or ranking to generate a highlight video.
  • the information processing system 1 may use various information as appropriate to generate a highlight video. An example of this point will be described below.
  • the information processing system 1 may take the AND (logical product) of personal attributes and predict highlight scenes from the time series of highlight scores of a set of real-time spectators who have all the determined personal attributes. For example, if the attributes that could be determined for the viewer are the two attributes age "30s" and gender "male", the information processing system 1 may predict highlight scenes from the time series of the highlight scores of the set of real-time spectators whose age is "30s" and whose gender is "male".
  • the information processing system 1 may use the following formula (1).
  • a_i in Equation (1) indicates the plausibility score of each of the viewer's own personal attribute determination results, and H_i indicates the highlight score of the set of real-time spectators having the corresponding attribute. S_j in Equation (1) indicates the combined score, and the information processing system 1 may predict highlight scenes from the time series of the combined scores S_j.
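  • The formula of Equation (1) is not reproduced in this text (it appears only as an image in the original publication); a plausible form consistent with the definitions above, given here only as an assumption, is a plausibility-weighted sum over the determined personal attributes i at each point in time j:

S_j = \sum_i a_i \, H_i(j)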
  • when a plurality of viewers watch the highlights together, the information processing system 1 may predict highlight scenes from a set of real-time spectators who have personal attributes common to the multiple viewers. For example, the information processing system 1 may calculate the time series of the combined score S_j using Equation (1), with a_i as the plausibility scores of the personal attribute results of the multiple viewers, and predict highlight scenes from it.
  • in this case, the scene prediction range extends to personal attributes different from those of the viewer themselves (that is, to the personal attributes of the people watching together), which is expected to be effective.
  • the information processing system 1 may also take as input the feature amounts of the viewer themselves, or of a real-time spectator group having the attributes closest to the viewer, and perform highlight scene prediction using the action index determiner.
  • the information processing system 1 may use only the degree of excitement and determine a period in which the degree of excitement is equal to or greater than a threshold as a highlight scene. For example, in the example of FIG. 13, the information processing system 1 determines the scene at the time of home scoring as a highlight scene for (a user of) the home fan attribute.
  • the information processing system 1 may determine highlight scenes using a plurality of indices: Positive indices (first indices) such as the degree of excitement and the degree of concentration, and Negative indices (second indices) such as the degree of disappointment and the degree of boredom.
  • the information processing system 1 may determine the highlight scene using Equation (2).
  • k_i in Equation (2) indicates a weighting factor for each behavior index.
  • the weighting factor k i is defined as a positive value for a positive index and a negative value for a negative index.
  • B i in Equation (2) indicates the determination score of the action index determiner for the input feature amount of each action index.
  • S t in Equation (2) indicates the combined score, and the information processing system 1 may predict the highlight scene from the time series of the combined score S t .
  • for example, the information processing system 1 may calculate a combined score S_t that combines the above-described Positive indices (first indices) and Negative indices (second indices) with signed weighting, and determine periods in which the combined score S_t is equal to or greater than a threshold as highlight scenes.
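  • The formula of Equation (2) is likewise not reproduced in this text; a plausible form consistent with the definitions above, given here only as an assumption, is a signed weighted sum of the behavior-index scores at each time t, with k_i positive for Positive indices and negative for Negative indices:

S_t = \sum_i k_i \, B_i(t)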
  • the information processing system 1 may appropriately use various information to determine a highlight scene.
  • the information processing system 1 may also select, for each index, the period with the highest score as a scene, thereby extracting highlight scenes that include variety, such as excitement, concentration, tension, disappointment, and anger, and give the highlight a sense of ebb and flow.
  • FIG. 18 is a diagram illustrating an example of highlight angle estimation.
  • FIG. 18 shows a case where the information processing system 1 divides the audience seats into eight areas of equal angle using eight dotted lines AG1 to AG8 extending radially from the focus position CN1.
  • the information processing system 1 calculates the average value of the degree of excitement in each area, and estimates the direction of the area with the highest degree of excitement as the optimum angle.
  • the information processing system 1 may estimate the angle and the like of the highlight scene using the model M11, which is a position and angle estimator used for estimating the appropriate position, angle and the like.
  • the information processing system 1 may estimate the focus position. For example, the information processing system 1 may estimate the focus position of the camera in the highlight scene from the statistical values of the face orientation and line-of-sight direction of the real-time audience feature amount data. Further, for example, the information processing system 1 may divide the audience (real-time audience feature amount data) in the venue into areas based on the angle with respect to the focus position, and estimate the optimum angle from the degree of excitement for each area. In this case, the information processing system 1 may predict the angle from the area with the highest swelling as the optimum angle.
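  • As an illustration only, the area-based angle estimation described above could be sketched in Python as follows; the seat coordinates, the focus position, and the excitement scores are assumptions.

import numpy as np

def estimate_best_angle(seat_xy, focus_xy, excitement, n_areas=8):
    # Bin spectators into n_areas equal angular areas around the focus position
    # and return the central direction (radians) of the most excited area.
    angles = np.arctan2(seat_xy[:, 1] - focus_xy[1], seat_xy[:, 0] - focus_xy[0])
    area = ((angles + np.pi) / (2 * np.pi) * n_areas).astype(int) % n_areas
    means = np.array([excitement[area == a].mean() if np.any(area == a) else -np.inf
                      for a in range(n_areas)])
    best = int(means.argmax())
    return (best + 0.5) * 2 * np.pi / n_areas - np.pi

rng = np.random.default_rng(0)
seats = rng.uniform(-10.0, 10.0, size=(200, 2))     # hypothetical seat coordinates
print(estimate_best_angle(seats, np.array([0.0, 0.0]), rng.random(200)))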
  • the information processing system 1 may remove biases such as fan attributes in angle estimation.
  • the information processing system 1 may estimate the angle or the like from the excitement level of the spectators of the beginner attribute, for example. Further, the information processing system 1 may estimate the angle and the like from the degree of excitement of the real-time spectator group having the attribute closest to the viewer, similar to extracting the highlight scene.
  • the information processing system 1 may use various information as appropriate to estimate angles and the like.
  • the information processing system 1 may determine the camerawork for generating a highlight video by using the estimated angle to decide which camera's video to select.
  • the information processing system 1 may propose the optimum position and angle for viewing free-viewpoint video such as sports.
  • the information processing system 1 can propose the optimum angle and the like in the free-viewpoint video for which it is difficult to manually operate the position and angle.
  • FIG. 19 is a diagram showing an example of presentation of information about spectators.
  • FIG. 19 shows an example of a UI (user interface) presented as a heat map of the attributes of spectator video of free viewpoint video or excitement.
  • the highlight generation server 100 of the information processing system 1 generates a content CT32 in which a heat map is superimposed on the audience seats according to the degree of excitement of the audience, targeting the content CT31 (step S31).
  • the highlight generation server 100 generates the content CT32 superimposed with a heat map indicating that the seats on the right side of the audience area are the most exciting and that the degree of excitement decreases toward the left.
  • the highlight generation server 100 provides the edge viewing terminal 10 with the generated content CT32.
  • the highlight generation server 100 may place arbitrary information (a flame icon in FIG. 19) in the portion of the heat map with the highest degree of excitement.
  • the edge viewing terminal 10 of the information processing system 1 displays the attribute of the spectator video of the free viewpoint video or the heat map of the excitement.
  • the edge viewing terminal 10 displays a heat map of the spectator seats according to the degree of excitement of the spectators.
  • the processing described above is merely an example, and the information processing system 1 may present various types of information.
  • the information processing system 1 may present a heat map that indicates fan attributes with colors and the degree of excitement with transparency.
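  • As an illustration only, encoding the fan attribute as a color and the degree of excitement as transparency could be sketched in Python as follows; the attribute codes and color assignments are assumptions.

import numpy as np

ATTRIBUTE_COLORS = {0: (1.0, 0.0, 0.0),   # e.g. home fan -> red
                    1: (0.0, 0.0, 1.0),   # e.g. away fan -> blue
                    2: (0.0, 1.0, 0.0)}   # e.g. beginner -> green

def seat_heatmap(attributes, excitement):
    # attributes: int code per seat, excitement: float in [0, 1] per seat.
    rgba = np.zeros((len(attributes), 4))
    for i, (attr, exc) in enumerate(zip(attributes, excitement)):
        rgba[i, :3] = ATTRIBUTE_COLORS[int(attr)]
        rgba[i, 3] = float(exc)            # transparency encodes the degree of excitement
    return rgba

print(seat_heatmap(np.array([0, 1, 2]), np.array([0.9, 0.2, 0.5])))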
  • the information processing system 1 may perform presentation display of the content, for example, by arranging an icon or the like at a location exceeding the threshold value.
  • the information processing system 1 may sell personalized highlight videos to visitors (full-time real-time viewing) at the site of the venue. For example, the information processing system 1 may use attribute optimization highlighting during a period in which the user is absent due to late arrival or the like.
  • the information processing system 1 may provide an attribute-optimized highlight video when viewed on a camera-equipped TV, PC, or the like (without real-time viewing).
  • the information processing system 1 may also perform personal attribute determination during highlight viewing and update (correction, addition, or the like) the stored personal attribute determination value.
  • the information processing system 1 may generate attribute optimization highlights for the periods not viewed and personalized highlights for the periods viewed, and combine them.
  • the information processing system 1 may present the unwatched first half with attribute optimization highlights when playing back after watching from the middle.
  • the information processing system 1 may provide automatic highlight reproduction based on an estimated angle as an additional function of free-viewpoint content.
  • the information processing system 1 may propose camera work based on extracted highlight scenes and angle estimation.
  • the information processing system 1 described above can be used as a generalized scene extraction engine that does not depend on the type of sport or live performance, so it can be deployed to a wide variety of content.
  • the above-described information processing system 1 does not use data on the content side, such as the sports game itself or the live event; it uses an analysis algorithm whose input is only data captured on the spectator side. An algorithm learned from spectator behavior can therefore be applied to other types of content.
  • the algorithm has high versatility and can be applied to other types of content at low cost.
  • for action index determination and highlight scene prediction, individual identification in images makes it possible to perform determination and prediction not only for the entire group, as with cheers and the like, but also for each individual and each attribute.
  • it is possible to extract highlight scenes that match individual attributes and tastes and generate moving images.
  • the viewer of the highlight can see the scenes that were exciting during real-time viewing together with the video of the viewer at that time, and can watch the highlight as a retrospective video of the memories of participating in the event.
  • the information processing system 1 described above generates a highlight video that matches the attributes of the viewer, so the viewer can view a highlight video that matches his or her own tastes.
  • a highlight video with a storyline can be viewed by highlight scene prediction (selecting the scene with the highest score for each index) using action index determination.
  • the above-described information processing system 1 also performs scene extraction based on the Negative index, so that the effect of serendipity can be expected.
  • scene extraction optimized for personal attributes can be applied not only at sports venues, but also in remote viewing environments (camera-equipped TVs, PCs with web cameras, etc.).
  • the above-described information processing system 1 estimates angles and positions in the same way as it extracts highlight scenes, so it can be applied to other types of content; it is highly versatile and can respond to individual and attribute differences in viewing angles. With the information processing system 1 described above, it is possible to improve the added value of content by proposing positions and angles for free-viewpoint video content. With the information processing system 1 described above, performance can be continuously improved through cycles of data collection, learning, and analysis algorithm improvement.
  • the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60 may be an information processing device that generates highlights like the highlight generation server 100.
  • instead of the highlight generation server 100, the information processing system 1 may include an information providing server that collects information from each device, such as the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60, and provides the information to each device.
  • when the edge viewing terminal 10 has the functions of the highlight generation server 100 described above, the edge viewing terminal 10 stores the information held in the storage unit 120 and has the functions of the learning unit 132, the image processing unit 133, and the generation unit 134.
  • the edge viewing terminal 10 may acquire various types of information from the information providing server, the content distribution server 50, and the spectator video collection server 60, and generate highlights using the acquired information.
  • the information processing system 1 may have any division of functions and any device configuration as long as it can provide the above-described highlight service.
  • each component of each device illustrated is functionally conceptual and does not necessarily need to be physically configured as illustrated.
  • the specific form of distribution and integration of each device is not limited to the one shown in the figure, and all or part of the devices can be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.
  • the information processing apparatus includes an acquisition unit (the acquisition unit 131 in the embodiment) and a generation unit (the generation unit 134 in the embodiment).
  • the acquisition unit acquires state information indicating the state, at the time of real-time viewing of the event, of a first user who has viewed the event in real time, and event content, which is video of the event.
  • the generator generates highlights of the event content to be provided to the second user, using a portion of the event content determined by the state information of the first user acquired by the acquirer.
  • in this way, the information processing apparatus generates highlights of the event content using part of the event content determined based on the state of the user who viewed the event in real time, and can therefore generate highlights suited to the user.
  • the acquisition unit acquires event content, which is video of the event.
  • the information processing apparatus can generate highlights according to the user by generating highlights of the event content, which is video of the event, based on the state of the user who watched the event in real time.
  • the acquisition unit acquires the status information of the first user who viewed the event in real time at the venue of the event.
  • the information processing apparatus can generate highlights according to the user by generating highlights of event content based on the user's state of real-time viewing at the venue of the event.
  • the acquisition unit acquires the state information of the first user who watched the sports or art event in real time.
  • the information processing apparatus can generate highlights according to the user by generating event content highlights based on the state of the user who watched the sports or arts event in real time.
  • the acquisition unit acquires the state information of the first user who watched the event locally.
  • the information processing apparatus can generate highlights according to the user by generating event content highlights based on the state of the user who watched the sports or arts event at the site.
  • the generating unit generates event content highlights using a model that outputs a score corresponding to the period of the event in response to the input of input data based on the state information.
  • in this way, the information processing apparatus generates highlights of the event content using a model that outputs a score corresponding to the period of the event in response to input data based on the user's state, and can therefore generate highlights according to the user.
  • the generation unit uses the model to determine part of the event content, and uses the determined part of the event content to generate highlights of the event content. In this way, the information processing apparatus can generate highlights according to the user by generating highlights of the event content using part of the event content determined using the model.
  • the generation unit determines, as the part of the event content, a portion of the event content corresponding to a period whose score is equal to or greater than the threshold, and uses the determined part of the event content to generate highlights of the event content.
  • in this way, the information processing apparatus generates highlights of the event content from the part corresponding to periods in which the score output by the model is equal to or greater than the threshold value, and can thereby provide highlights according to the user; a minimal sketch of this thresholding step follows below.
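  • A minimal sketch of the thresholding step described above is shown below; it groups the model's per-minute scores into contiguous highlight periods. The score format and threshold value are assumptions for illustration, not part of the present disclosure.

```python
def determine_highlight_periods(scores: list[float], threshold: float) -> list[tuple[int, int]]:
    """Group minutes whose score is >= threshold into contiguous (start, end) periods."""
    periods, start = [], None
    for minute, score in enumerate(scores):
        if score >= threshold and start is None:
            start = minute                       # a highlight period begins
        elif score < threshold and start is not None:
            periods.append((start, minute))      # the period ends just before this minute
            start = None
    if start is not None:
        periods.append((start, len(scores)))
    return periods

# Example: scores for an 8-minute event, threshold 0.7 -> periods [(1, 3), (5, 7)]
scores = [0.2, 0.8, 0.9, 0.1, 0.3, 0.75, 0.8, 0.4]
print(determine_highlight_periods(scores, 0.7))
```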
  • the information processing apparatus includes a learning unit (learning unit 132 in the embodiment).
  • the learning unit learns the model using learning data including highlights of past events and status information of users who viewed the past events in real time. In this way, the information processing apparatus can generate highlights according to the user by generating highlights of event content using the learned model.
  • the generation unit generates highlights of event content using the model learned by the learning unit.
  • the information processing apparatus can generate highlights according to the user by generating highlights of event content using the learned model.
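  • As a rough illustration of how such a model could be trained, the sketch below pairs spectator-state features for each period of past events with labels indicating whether that period was included in the past event's highlights, and fits a simple supervised classifier that outputs a score per period. scikit-learn is used purely for illustration; the actual feature set and model are not specified by the present disclosure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per (past event, period),
# features derived from spectator state (e.g. mean excitement, motion, cheering level)
X_train = np.array([
    [0.9, 0.8, 0.7],   # period that was in the past highlight
    [0.2, 0.1, 0.3],   # period that was not
    [0.8, 0.9, 0.6],
    [0.3, 0.2, 0.1],
])
y_train = np.array([1, 0, 1, 0])  # 1 = included in the past event's highlights

model = LogisticRegression()
model.fit(X_train, y_train)

# Inference: score each period of a new event from its spectator-state features
X_new = np.array([[0.85, 0.7, 0.75], [0.25, 0.3, 0.2]])
scores = model.predict_proba(X_new)[:, 1]   # probability of being a highlight period
print(scores.round(2))
```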
  • the acquisition unit acquires state information indicating the state of the second user viewing the event in real time as the state information of the first user.
  • the generator generates highlights of event content to be provided to the second user, using a portion of the event content determined by the second user's state information. In this way, the information processing apparatus can generate highlights according to the user by generating highlights to be provided to the user based on the state of the user who has watched the event in real time.
  • the acquisition unit acquires the state information of the first user who is different from the second user.
  • the information processing apparatus generates the highlights to be provided to a user based on the state of a user different from that user, and can thereby generate highlights suitable for the user even when that user did not view the event in real time.
  • the acquisition unit acquires the state information of the first user who is a user similar to the attributes of the second user.
  • the information processing apparatus can generate highlights suitable for the user by generating the highlights to be provided to the user based on the states of similar users whose attributes are similar to those of the user to whom the highlights are provided.
  • the acquisition unit acquires the state information of the first user, who is a user similar to the demographic attributes of the second user.
  • the information processing apparatus can generate highlights suitable for the user by generating the highlights to be provided to the user based on the states of similar users whose demographic attributes are similar to those of the user to whom the highlights are provided.
  • the acquisition unit acquires the state information of the first user who is a user similar to the second user in at least one of age and sex.
  • the information processing apparatus can generate highlights according to the user by generating the highlights to be provided to the user based on the states of similar users who are similar in at least one of age and gender to the user to whom the highlights are provided.
  • the acquisition unit acquires the state information of the first user, who is a user similar to the psychographic attributes of the second user.
  • the information processing apparatus can generate highlights suitable for the user by generating the highlights to be provided to the user based on the states of similar users whose psychographic attributes are similar to those of the user to whom the highlights are provided.
  • the acquisition unit acquires the state information of the first user, who is a user similar to the second user's preferences.
  • the information processing apparatus can generate highlights according to the user by generating the highlights to be provided to the user based on the states of similar users whose preferences are similar to those of the user to whom the highlights are provided.
  • the acquisition unit acquires the state information of the first user, who is a user whose favorite object (for example, a team being supported) matches that of the second user.
  • the information processing apparatus can generate highlights according to the user by generating the highlights to be provided to the user based on the states of similar users whose favorite objects match that of the user to whom the highlights are provided; a minimal sketch of such similar-user selection follows below.
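  • Selecting a similar first user, as described in the items above, might look like the following minimal sketch, which scores candidate real-time spectators by matching demographic and psychographic attributes against the second user. The attribute fields and weights are hypothetical, not part of the present disclosure.

```python
from dataclasses import dataclass

@dataclass
class UserProfile:
    user_id: str
    age_group: str        # demographic attribute, e.g. "20s"
    gender: str           # demographic attribute
    favorite_object: str  # psychographic attribute, e.g. the team or performer being supported

def similarity(a: UserProfile, b: UserProfile) -> int:
    """Count matching attributes; a matching favorite object is weighted more heavily."""
    score = 0
    score += 1 if a.age_group == b.age_group else 0
    score += 1 if a.gender == b.gender else 0
    score += 2 if a.favorite_object == b.favorite_object else 0
    return score

def select_first_user(second_user: UserProfile, spectators: list[UserProfile]) -> UserProfile:
    """Choose the real-time spectator most similar to the user receiving the highlights."""
    return max(spectators, key=lambda s: similarity(second_user, s))

viewer = UserProfile("U2", "20s", "female", "team_A")
spectators = [
    UserProfile("U40", "30s", "male", "team_B"),
    UserProfile("U50", "20s", "female", "team_A"),
]
print(select_first_user(viewer, spectators).user_id)  # -> "U50"
```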
  • the information processing device includes a transmission unit (transmission unit 135 in the embodiment).
  • the transmission unit transmits the highlight of the event content generated by the generation unit to the terminal device (the edge viewing terminal 10 in the embodiment) used by the second user.
  • the information processing apparatus can appropriately provide highlights according to the user by transmitting the generated highlights to the terminal device used by the user.
  • FIG. 20 is a hardware configuration diagram showing an example of a computer 1000 that implements the functions of the information processing apparatus.
  • the highlight generation server 100 according to the embodiment will be described below as an example.
  • the computer 1000 has a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600.
  • Each part of the computer 1000 is connected by a bus 1050.
  • the CPU 1100 operates based on programs stored in the ROM 1300 or HDD 1400 and controls each section. For example, the CPU 1100 loads programs stored in the ROM 1300 or HDD 1400 into the RAM 1200 and executes processes corresponding to various programs.
  • the ROM 1300 stores a boot program such as BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 is started, and programs dependent on the hardware of the computer 1000.
  • the HDD 1400 is a computer-readable recording medium that non-temporarily records programs executed by the CPU 1100 and data used by such programs.
  • the HDD 1400 is a recording medium that records an information processing program according to the present disclosure, which is an example of the program data 1450.
  • a communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet).
  • CPU 1100 receives data from another device via communication interface 1500, and transmits data generated by CPU 1100 to another device.
  • the input/output interface 1600 is an interface for connecting the input/output device 1650 and the computer 1000.
  • the CPU 1100 receives data from input devices such as a keyboard and mouse via the input/output interface 1600.
  • the CPU 1100 also transmits data to output devices such as a display, a speaker, or a printer via the input/output interface 1600.
  • the input/output interface 1600 may function as a media interface for reading a program or the like recorded on a predetermined recording medium.
  • Media include, for example, optical recording media such as a DVD (Digital Versatile Disc) or PD (Phase change rewritable Disk), magneto-optical recording media such as an MO (Magneto-Optical disk), tape media, magnetic recording media, and semiconductor memories.
  • the CPU 1100 of the computer 1000 implements the functions of the control unit 130 and the like by executing the information processing program loaded on the RAM 1200.
  • the HDD 1400 also stores an information processing program according to the present disclosure and the data in the storage unit 120.
  • the CPU 1100 reads and executes the program data 1450 from the HDD 1400; as another example, these programs may be obtained from another device via the external network 1550.
  • the present technology can also take the following configuration.
  • (1) An information processing device comprising: an acquisition unit that acquires state information indicating a state of a first user who has viewed an event in real time, and event content that is video of the event; and a generation unit that generates highlights of the event content to be provided to a second user using a portion of the event content determined by the state information of the first user acquired by the acquisition unit.
  • (2) The information processing apparatus according to (1), wherein the acquisition unit acquires the event content, which is a video image of the event.
  • (3) The information processing apparatus according to (1) or (2), wherein the acquisition unit acquires the state information of the first user who has performed the real-time viewing at the venue of the event.
  • (4) The information processing apparatus according to any one of (1) to (3), wherein the acquisition unit acquires the state information of the first user who viewed the event of sports or art in real time.
  • (5) The information processing apparatus according to any one of (1) to (4), wherein the acquisition unit acquires the state information including image information of the first user.
  • (6) The information processing apparatus according to any one of (1) to (5), wherein the generation unit generates highlights of the event content using a model that outputs a score corresponding to the period of the event in response to input of input data based on the state information.
  • (7) The information processing apparatus according to (6), wherein the generation unit determines a portion of the event content using the model, and generates highlights of the event content using the determined portion of the event content.
  • (8) The information processing apparatus according to (7), wherein the generation unit determines, as the portion of the event content, a portion of the event content corresponding to a period corresponding to the score equal to or greater than a threshold, and generates highlights of the event content using the determined portion of the event content.
  • (9) The information processing apparatus according to any one of (6) to (8), further comprising a learning unit that learns the model using learning data including highlights of past events and the state information of users who viewed the past events in real time.
  • (10) The information processing apparatus according to (9), wherein the generation unit generates highlights of the event content using the model learned by the learning unit.
  • (11) The information processing apparatus according to any one of (1) to (10), wherein, when the second user is viewing the event in real time, the acquisition unit acquires, as the state information of the first user, the state information indicating the state of the second user when the second user is viewing the event in real time, and the generation unit generates highlights of the event content to be provided to the second user using a portion of the event content determined by the state information of the second user.
  • (12) The information processing apparatus according to any one of (1) to (11), wherein, when the second user is not viewing the event in real time, the acquisition unit acquires the state information of the first user who is a user different from the second user.
  • (13) The information processing apparatus according to (12), wherein the acquisition unit acquires the state information of the first user who is a user similar to the attributes of the second user.
  • (14) The information processing apparatus according to (13), wherein the acquisition unit acquires the state information of the first user who is a user similar to the demographic attributes of the second user.
  • (15) The information processing apparatus according to (14), wherein the acquisition unit acquires the state information of the first user who is a user similar to the second user in at least one of age and sex.
  • (16) The information processing apparatus according to any one of (13) to (15), wherein the acquisition unit acquires the state information of the first user who is a user similar to the psychographic attributes of the second user.
  • (17) The information processing apparatus according to any one of (13) to (16), wherein the acquisition unit acquires the state information of the first user who is a user similar to the preference of the second user.
  • (18) The information processing apparatus according to any one of (13) to (17), wherein the acquisition unit acquires the state information of the first user whose favorite object matches that of the second user.
  • (19) The information processing apparatus according to any one of (1) to (18), further comprising a transmission unit configured to transmit a highlight of the event content generated by the generation unit to a terminal device used by the second user.
  • 1 information processing system; 100 highlight generation server (information processing device); 110 communication unit; 120 storage unit; 121 data set storage unit; 122 model information storage unit; 123 threshold information storage unit; 124 content information storage unit; 130 control unit; 131 acquisition unit; 132 learning unit; 133 image processing unit; 134 generation unit; 135 transmission unit; 10 edge viewing terminal (terminal device); 11 communication unit; 12 audio input unit; 13 audio output unit; 14 camera; 15 display unit; 16 operation unit; 17 storage unit; 18 control unit; 181 acquisition unit; 182 transmission unit; 183 reception unit; 184 processing unit; 50 content distribution server; 60 spectator video collection server

Abstract

An information processing device according to the present disclosure includes: an acquisition unit for acquiring state information indicating the state of a first user in real-time viewing of an event and an event content, which is a video of the event, said first user being a user having viewed the event in real time; and a generation unit for generating highlights of the event content to be provided to a second user using a part of the event content determined by the state information of the first user acquired by the acquisition unit.

Description

Information processing device and information processing method
 The present disclosure relates to an information processing device and an information processing method.
 Technology is provided that automatically generates highlights (digests) of content such as video. For example, a technology has been provided that identifies whether or not a visitor to an event such as a concert, a sports game, or a lecture is smiling and generates highlights (for example, Patent Document 1).
JP 2007-104091 A
 However, with the conventional technology, it is not always possible to generate highlights according to the user. In the conventional technology, highlights are generated according to the number of smiling visitors, and it may be difficult to generate highlights appropriately, for example when the content of the event does not make people smile, so there is room for improvement. Therefore, it is desired to generate highlights according to the user.
 Therefore, the present disclosure proposes an information processing device and an information processing method capable of generating highlights according to the user.
 In order to solve the above problems, an information processing apparatus according to one embodiment of the present disclosure includes: an acquisition unit that acquires state information indicating the state, at the time of real-time viewing of an event, of a first user who has viewed the event in real time, and event content, which is video of the event; and a generation unit that generates, using a portion of the event content determined by the state information of the first user acquired by the acquisition unit, highlights of the event content to be provided to a second user.
FIG. 1 is a diagram showing an example of information processing according to an embodiment of the present disclosure.
FIG. 2 is a diagram showing an example of information processing according to an embodiment of the present disclosure.
FIG. 3 is a diagram showing a configuration example of an information processing system according to an embodiment of the present disclosure.
FIG. 4 is a diagram showing an arrangement example of spectator imaging devices.
FIG. 5 is a diagram showing a configuration example of a highlight generation server according to an embodiment of the present disclosure.
FIG. 6 is a diagram showing an example of a data set storage unit according to an embodiment of the present disclosure.
FIG. 7 is a diagram showing an example of a model information storage unit according to an embodiment of the present disclosure.
FIG. 8 is a diagram showing an example of a threshold information storage unit according to an embodiment of the present disclosure.
FIG. 9 is a diagram showing a configuration example of an edge viewing terminal according to an embodiment of the present disclosure.
FIG. 10 is a flowchart showing a processing procedure of the information processing device according to an embodiment of the present disclosure.
FIG. 11 is a diagram showing a functional configuration example related to learning of the information processing system.
FIG. 12 is a diagram showing an example of learning and inference regarding the personal attribute determiner of the information processing system.
FIG. 13 is a diagram showing an example of learning and inference regarding the action index determiner of the information processing system.
FIG. 14 is a diagram showing an example of learning and inference regarding the highlight scene predictor of the information processing system.
FIG. 15 is a diagram showing a functional configuration example related to highlight generation of the information processing system.
FIG. 16 is a flowchart showing a processing procedure regarding highlight generation.
FIG. 17 is a diagram showing an example of superimposed display of a wipe on a highlight.
FIG. 18 is a diagram showing an example of angle estimation for a highlight.
FIG. 19 is a diagram showing an example of presentation of information about spectators.
FIG. 20 is a hardware configuration diagram showing an example of a computer that implements the functions of the information processing device.
 Below, embodiments of the present disclosure will be described in detail based on the drawings. The information processing apparatus and information processing method according to the present application are not limited to this embodiment. Further, in each of the following embodiments, the same parts are denoted by the same reference numerals, thereby omitting redundant explanations.
 The present disclosure will be described according to the order of items shown below.
  1. Embodiment
   1-1. Outline of information processing according to embodiment of present disclosure
    1-1-1. First example (first user = second user)
    1-1-2. Second example (first user ≠ second user)
    1-1-3. Background and effects
   1-2. Configuration of information processing system according to embodiment
    1-2-1. Arrangement example of spectator imaging devices
   1-3. Configuration of information processing apparatus according to embodiment
   1-4. Configuration of terminal device according to embodiment
   1-5. Information processing procedure according to the embodiment
   1-6. Configuration and processing of information processing system
    1-6-1. Functional configuration example related to learning of information processing system
    1-6-2. Learning and reasoning about personal attribute determiner
    1-6-3. Learning and reasoning about action index determiner
    1-6-4. Learning and reasoning about highlight scene predictor
    1-6-5. Functional configuration example related to highlight generation of information processing system
   1-7. Example of highlight generation process flow
   1-8. Processing examples
    1-8-1. Highlight scene prediction by action index determiner
    1-8-2. Angle estimation example
    1-8-3. Presentation example
   1-9. Examples of applications, modifications, effects, etc.
  2. Other embodiments
   2-1. Other configuration examples
   2-2. Others
  3. Effects of the present disclosure
  4. Hardware configuration
[1. Embodiment]
[1-1. Overview of information processing according to the embodiment of the present disclosure]
 FIGS. 1 and 2 are diagrams illustrating an example of information processing according to an embodiment of the present disclosure. Although the system configuration and the like will be described later, the information processing according to the embodiment of the present disclosure is realized by the information processing system 1 including the highlight generation server 100, the plurality of edge viewing terminals 10, and the like. In the following description, a sports event such as a basketball game will be described as an example of an event for which highlights are generated (hereinafter also referred to as a "target event"). Here, a highlight is a video generated using video content (also referred to as "event content") that captures the event, and is, for example, a digest video that is shorter than the event content. The sports here also include e-sports (electronic sports) played using electronic devices (computers). Note that the target event is not limited to sports and may be any of various events in which there are users (also referred to as "spectators") who watch the event. A spectator is a user who watches or views an event.
 For example, the target event may be a music event such as a live performance or concert, an event in which creative works such as calligraphy or paintings are produced by improvisation, or an art-related event such as a play, musical, vaudeville, or live comedy. The target event may also be a lecture, talk show, seminar, or the like. The target event may also be an event in which a large number of people view content, such as a movie screening event.
 The target event is not limited to an event that takes place in a real space as described above, and may be an event that takes place in a virtual space. For example, the target event may be a virtual event such as a live music performance held within an online game. As described above, the event targeted for highlight generation by the information processing system 1 (target event) may be any event that can be targeted for highlight generation.
 In the following, the user who viewed the event in real time may be referred to as the "first user", and the user to whom the highlights are provided may be referred to as the "second user". That is, when the user to whom the highlights are provided has viewed the event in real time, the first user and the second user may be the same user. Here, real-time viewing of an event means, for example, viewing the event at the date and time (period) when the event is held. FIGS. 1 and 2 show two cases as examples: the case where the entire event is viewed in real time and the case where the event is not viewed in real time at all. In some cases, only part of the event is viewed in real time; processing in that case will be described later.
 In the following, a case will be described as an example in which image information, such as video (moving images) of the first user captured during real-time viewing of the event, is used as the information indicating the state of the first user at the time of real-time viewing (also referred to as "state information"). Note that the state information is not limited to image information obtained by imaging the first user and may be any information that indicates the state of the first user. For example, the state information may be biometric information detected from the first user, such as the first user's heartbeat, body temperature, or breathing.
 From here, an overview of the highlight-related services provided by the information processing system 1 will be described with reference to FIGS. 1 and 2. FIGS. 1 and 2 are diagrams illustrating an example of information processing according to an embodiment of the present disclosure. Specifically, FIGS. 1 and 2 show an example of highlight generation processing executed by the highlight generation server 100, which is an example of an information processing apparatus. FIGS. 1 and 2 show a case where event A, which is a sporting event, is the target event. That is, FIGS. 1 and 2 show a case where the video obtained by shooting event A (event content) is the content targeted for highlight generation (also referred to as "target content"). FIGS. 1 and 2 also show a case where the model M3 used for highlight scene prediction is used; the details of the model M3 will be described later.
[1-1-1. First example (first user = second user)]
 First, the processing example (first example) shown in FIG. 1 will be described. FIG. 1 shows a case where the user to whom the highlights are provided has viewed the event in real time, that is, a case where the first user and the second user are the same user. Specifically, FIG. 1 shows an example of highlight generation processing in the case where the user to whom the highlights are provided is the user U1 and the user U1 has viewed the target event in real time. That is, FIG. 1 shows a case where the user U1 is a user who viewed event A in real time (a real-time spectator).
 The highlight generation server 100 inputs the input data IND1, which is data of the user U1, to the model M3 (step S11). For example, the highlight generation server 100 inputs to the model M3 the input data IND1 based on the information (state information) indicating the state of the user U1 at the time of real-time viewing of event A. For example, the state information of the user U1 includes image information such as video (moving images) of the user U1 captured during real-time viewing of event A. To simplify the explanation, FIG. 1 shows a case where the state information of the user U1 is used as the input data IND1. That is, in FIG. 1, video of the user U1 captured during real-time viewing of event A is used as the input data IND1. Note that the information (input information) input to the model M3, such as the input data IND1, may be information on feature amounts generated based on the state information of the first user; this point will be described later.
 The model M3 to which the input data IND1 is input outputs the output data OD1 (step S12). The model M3 outputs, as the output data OD1, the scores used for generating the highlights of event A. For example, the model M3 outputs a score corresponding to each point in time during the period of event A. For example, if the period of event A is one hour, the model M3 outputs a score corresponding to each point in that hour. Note that the model M3 may output a score corresponding to each time point at predetermined intervals (for example, every minute: 1 minute, 2 minutes, 3 minutes, and so on) during the period of event A, or may output a continuous score (waveform) over the period.
 The highlight generation server 100 uses the output data OD1 to determine the periods to be highlighted (also referred to as "highlight target periods") (step S13). The highlight generation server 100 determines the highlight target periods of the highlights to be provided to the user U1 by comparing the scores output by the model M3 with a predetermined threshold. For example, the highlight generation server 100 determines the highlight target periods of the highlights to be provided to the user U1 based on the periods during which the score is equal to or greater than the predetermined threshold. For example, the highlight generation server 100 determines the periods during which the score is equal to or greater than the predetermined threshold as the highlight target periods of the highlights to be provided to the user U1. In FIG. 1, as shown in the target period information PTD1, the highlight generation server 100 determines periods such as minutes 1-3 and minutes 15-20 as the highlight target periods of the highlights to be provided to the user U1.
 The highlight generation server 100 generates the highlights to be provided to the user U1 based on the determined highlight target periods (step S14). In FIG. 1, the highlight generation server 100 generates the highlight HLD1 for the user U1 using the target period information PTD1 for the user U1 and the target content TCV1, which is the video of event A. The highlight generation server 100 generates the highlight HLD1 for the user U1 using the portions of the target content TCV1 that correspond to the periods indicated in the target period information PTD1, such as minutes 1-3 and minutes 15-20. That is, the highlight generation server 100 generates, as the highlight HLD1 for the user U1, content extracted from the target content TCV1 that corresponds to periods such as minutes 1-3 and minutes 15-20. The highlight HLD1 for the user U1 is a video of event A that includes only the periods, such as minutes 1-3 and minutes 15-20, that are estimated to be appropriate as highlights for the user U1.
 In this way, when the user U1 (second user) is a user (first user) who performed real-time viewing, the highlight generation server 100 generates the highlights to be provided to the user U1 based on the state information of the user U1. As a result, the highlight generation server 100 can generate highlights appropriate for the user U1. The highlight generation server 100 transmits the generated highlight HLD1 for the user U1 to the edge viewing terminal 10 used by the user U1. The edge viewing terminal 10 used by the user U1 then outputs (reproduces) the highlight HLD1. Thus, in the information processing system 1, the user U1 can view highlights customized for himself or herself. A minimal code sketch of this overall flow is shown below.
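 The flow of steps S11 to S14 above can be pictured with the following minimal sketch; the model is replaced by a stub and the clip handling is schematic, and none of this code is part of the present disclosure. State information is passed to a scoring model, the per-minute scores are thresholded into highlight target periods, and the corresponding parts of the target content are concatenated into the highlight.

```python
def model_m3_stub(state_frames: list[float]) -> list[float]:
    """Stand-in for model M3: here the viewer's per-minute excitement is used directly."""
    return state_frames

def generate_highlight(state_frames: list[float], content_minutes: list[str],
                       threshold: float = 0.7) -> list[str]:
    # Step S11/S12: input the state information and obtain per-minute scores
    scores = model_m3_stub(state_frames)
    # Step S13: determine the highlight target periods (minutes with score >= threshold)
    target_minutes = [m for m, s in enumerate(scores) if s >= threshold]
    # Step S14: extract the corresponding parts of the target content and join them
    return [content_minutes[m] for m in target_minutes]

# Example with a 6-minute event: minutes 1, 2, and 4 become the highlight
excitement = [0.1, 0.9, 0.8, 0.2, 0.75, 0.3]
content = [f"clip_minute_{m}" for m in range(6)]
print(generate_highlight(excitement, content))
# -> ['clip_minute_1', 'clip_minute_2', 'clip_minute_4']
```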
[1-1-2. Second example (first user ≠ second user)]
 Next, the processing example (second example) shown in FIG. 2 will be described. FIG. 2 shows a case where the user to whom the highlights are provided has not viewed the event in real time, that is, a case where the first user and the second user are different users. Specifically, FIG. 2 shows an example of highlight generation processing in the case where the user to whom the highlights are provided is the user U2 and the user U2 has not viewed the target event in real time. That is, FIG. 2 shows a case where the user U2 is not a user who viewed event A in real time (not a real-time spectator). For example, the user U2 is a user who is located remotely from the venue of the event and views content (also referred to as a "remote viewer"). Description of the points that are the same as in FIG. 1 will be omitted as appropriate.
 In FIG. 2, since the user U2 is not a real-time spectator of event A, the highlight generation server 100 performs the processing with a user similar to the user U2 as the first user. The highlight generation server 100 takes, as the first user, a user who is similar to the user U2 and who is a real-time spectator of event A, and inputs the input data IND2, which is data of that first user, to the model M3 (step S21). For example, the highlight generation server 100 determines, as the first user, a user whose attributes are similar to those of the user U2 from among the real-time spectators of event A. For example, the highlight generation server 100 determines, as the first user, a user whose demographic or psychographic attributes are similar to those of the user U2 from among the real-time spectators of event A. For example, the highlight generation server 100 determines, as the first user, a user who is similar to the user U2 in terms of age, gender, preferences such as favorite objects (a team being supported, etc.), family structure, income, lifestyle, and the like from among the real-time spectators of event A.
 To simplify the explanation, FIG. 2 illustrates a case where the highlight generation server 100 determines, as the first user, a user whose age and gender match those of the user U2 (referred to as "user U50"). For example, the highlight generation server 100 determines (identifies) the real-time spectators of event A using information, associated with each event, that indicates the users who viewed that event in real time. For example, the highlight generation server 100 may determine (identify) users similar to the user U2 by comparing the attribute information associated with each user. Note that the highlight generation server 100 may determine attributes using a model; this point will be described later.
 For example, the highlight generation server 100 inputs input data IND2 based on information (state information) indicating the state of real-time viewing of event A by user U50, who is the first user corresponding to user U2, to model M3. For example, the state information of the user U50 includes image information such as a video (moving image) of the user U50 when viewing the event A in real time. In FIG. 2, a video image of the user U50 when viewing the event A in real time is used as the input data IND2.
 The model M3 to which the input data IND2 is input outputs the output data OD2 (step S22). The model M3 outputs the score used for highlight generation of the event A as the output data OD2. For example, the model M3 outputs a score corresponding to each point in time during event A.
 The highlight generation server 100 determines a period to be highlighted (highlight target period) using the output data OD2 (step S23). The highlight generation server 100 compares the score output by the model M3 with a predetermined threshold to determine the highlight target period of the highlight to be provided to the user U2. In FIG. 2, the highlight generation server 100 determines periods such as 5-10 minutes and 25-30 minutes as the highlight target periods of the highlights to be provided to the user U2, as indicated by the target period information PTD2.
 The highlight generation server 100 generates highlights to be provided to the user U2 based on the determined highlight target period (step S24). In FIG. 2, the highlight generation server 100 uses target period information PTD2 for user U2 and target content TCV1, which is video of event A, to generate highlight HLD2 for user U2. The highlight generation server 100 generates highlights HLD2 for the user U2 using portions of the target content TCV1 that correspond to periods such as 5-10 minutes and 25-30 minutes indicated in the target period information PTD2. The highlight generation server 100 generates a video of the event A including only the periods of 5-10 minutes, 25-30 minutes, etc. as the highlight HLD2 for the user U2.
 In this way, when the user U2 (second user) is not a user (first user) who performed real-time viewing, the highlight generation server 100 generates the highlights to be provided to the user U2 based on the state information of a user similar to the user U2. As a result, the highlight generation server 100 can generate highlights appropriate for the user U2 even when the second user has not performed real-time viewing. The highlight generation server 100 transmits the generated highlight HLD2 for the user U2 to the edge viewing terminal 10 used by the user U2. The edge viewing terminal 10 used by the user U2 then outputs (reproduces) the highlight HLD2. Thus, in the information processing system 1, the user U2 can view highlights customized for himself or herself.
[1-1-3. Background and effects, etc.]
 There is a growing need to automatically generate highlight videos of sports, live performances, and the like, to reduce production costs, and to provide the videos to users more quickly. Conventionally, there is a method of analyzing video of the game itself and video of the players with AI (artificial intelligence) and extracting highlight scenes, but this method requires developing a different AI for each type of sports competition or live performance, making it difficult to suppress the increase in cost. For example, in scene extraction by image analysis of game video, the actions or states of players and the like that should be highlighted (that cause excitement) differ from sport to sport and must be individually defined as rules or learned, so deployment to another sport requires separate data collection, analysis, model learning, and so on.
 There is also a method of extracting highlight scenes from the cheering of the audience, but this method cannot respond to individual differences and attribute differences in the scenes that highlight viewers want to see. For example, scene extraction based on cheers can only capture the excitement of the entire venue, or at best of each area depending on the microphone arrangement, and cannot meet the needs of individual viewers and attribute groups who want to see different scenes in the highlights.
 Therefore, the information processing system 1 generates highlights using the state information of users (spectators) who are viewing the event in real time, and can thus generate highlights without analyzing the video of the event (event content) itself. That is, the information processing system 1 generates highlights from information about the users (spectators) viewing the event in real time rather than from the video of the event, and can therefore generate highlights appropriately regardless of the type of event.
 For example, in automatically generating highlights (videos) of events such as sports, the information processing system 1 selects scenes according to the attributes of the viewer based on recognition and analysis of video (images) of the spectators, and generates the highlights. For example, the information processing system 1 learns (generates) a personal attribute determiner, an action index determiner for indices such as excitement or degree of concentration, and a highlight scene predictor, based on video of the spectators (or remote viewers) at an event venue such as a sports venue. For example, the information processing system 1 learns (generates) the personal attribute determiner, the action index determiner, and the highlight scene predictor using, as feature amounts, the skeleton, face recognition information, motion detection, line of sight, and the like obtained by image recognition of video of the spectators (or remote viewers) at the event venue. For example, the information processing system 1 learns (generates) the personal attribute determiner, the action index determiner, and the highlight scene predictor by supervised learning or on a rule basis. A rough sketch of such feature extraction is shown below.
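 As a rough illustration of the feature extraction described above, the following sketch turns per-frame recognition results (pose, facial expression, gaze) into a per-minute feature vector that could be fed to the determiners. The recognition results themselves are assumed to come from an external image-recognition step; the field names and the aggregation are hypothetical, not part of the present disclosure.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class FrameRecognition:
    minute: int
    arm_raise: float      # from skeleton/pose estimation, 0-1
    smile: float          # from face recognition, 0-1
    gaze_on_field: float  # fraction of the frame in which the gaze stays on the field, 0-1

def features_per_minute(frames: list[FrameRecognition]) -> dict[int, list[float]]:
    """Aggregate per-frame recognition results into one feature vector per minute."""
    grouped: dict[int, list[FrameRecognition]] = {}
    for f in frames:
        grouped.setdefault(f.minute, []).append(f)
    return {
        minute: [mean(f.arm_raise for f in fs),
                 mean(f.smile for f in fs),
                 mean(f.gaze_on_field for f in fs)]
        for minute, fs in grouped.items()
    }

frames = [
    FrameRecognition(0, 0.1, 0.2, 0.9),
    FrameRecognition(0, 0.2, 0.3, 0.8),
    FrameRecognition(1, 0.9, 0.8, 1.0),
]
print(features_per_minute(frames))  # approximately {0: [0.15, 0.25, 0.85], 1: [0.9, 0.8, 1.0]}
```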
 For example, according to the real-time viewing time of the remote viewer (or venue spectator) who will view the highlights and the personal attributes determined by the personal attribute determiner, the information processing system 1 generates highlights corresponding to the viewer himself or herself, or to (the set of) spectators of the same attributes who viewed the event in real time. For example, the information processing system 1 provides the viewer (user) with highlights generated using the highlight scene predictor (or the action index determiner). In addition, for the highlight scenes extracted on the time axis as described above, the information processing system 1 may estimate the optimum gaze point position and angle from feature amounts obtained by image recognition of the set of venue spectators having the same attributes as the viewer of the highlights, and may determine the camerawork of the highlight scenes. These points will be described later.
[1-2. Configuration of the information processing system according to the embodiment]
 The information processing system 1 shown in FIG. 3 will be described. FIG. 3 is a diagram illustrating a configuration example of the information processing system according to the embodiment. As shown in FIG. 3, the information processing system 1 includes the highlight generation server 100, the plurality of edge viewing terminals 10, the content distribution server 50, and the spectator video collection server 60. The highlight generation server 100, each of the plurality of edge viewing terminals 10, the content distribution server 50, and the spectator video collection server 60 are communicably connected by wire or wirelessly via a predetermined communication network (network N).
Although only three edge viewing terminals 10 are illustrated in FIG. 3, the information processing system 1 may include four or more edge viewing terminals 10. In FIG. 3, in order to distinguish and describe the individual edge viewing terminals 10, they may be described as an edge viewing terminal 10a, an edge viewing terminal 10b, and an edge viewing terminal 10c. When the edge viewing terminal 10a, the edge viewing terminal 10b, the edge viewing terminal 10c, and the like are described without particular distinction, they are referred to as the "edge viewing terminal 10". Further, the information processing system 1 shown in FIG. 3 may include a plurality of highlight generation servers 100, a plurality of content distribution servers 50, and a plurality of spectator video collection servers 60.
The highlight generation server 100 is a computer used to provide a highlight service to users. The highlight generation server 100 generates a highlight of event content to be provided to a second user, using a part of the event content determined from state information indicating the state of a first user while viewing the event in real time.
The edge viewing terminal 10 is a computer used by a user. For example, the edge viewing terminal 10 is used by a remote viewer or a spectator. For example, the edge viewing terminal 10 is used by a user who accesses content such as a web page displayed on a browser or content for an application. For example, the edge viewing terminal 10 is used by the user to view content.
The edge viewing terminal 10 may be, for example, a device such as a notebook PC (Personal Computer), a tablet terminal, a desktop PC, a smartphone, a smart speaker, a television, a mobile phone, or a PDA (Personal Digital Assistant). In the following, the edge viewing terminal 10 may be referred to as a user. That is, in the following, the user can also be read as the edge viewing terminal 10.
The edge viewing terminal 10 outputs information about the event. The edge viewing terminal 10 outputs information about various kinds of content such as video of the event and highlights. The edge viewing terminal 10 displays the highlight images (video) and outputs the highlight audio. For example, the edge viewing terminal 10 transmits the user's utterances and images (video) to the highlight generation server 100, and receives highlight audio and images (video) from the highlight generation server 100. The edge viewing terminal 10 transmits the captured video of the user to the highlight generation server 100.
The edge viewing terminal 10 accepts input from the user. The edge viewing terminal 10 accepts voice input by the user's utterances and input by the user's operations. The edge viewing terminal 10 may be any device as long as it can implement the processing in the embodiment. The edge viewing terminal 10 may be any device as long as it has functions such as displaying content information and outputting audio.
In FIG. 3, reference numerals for the camera 14 and the display unit 15 are given only to the edge viewing terminal 10a and are omitted for the other edge viewing terminals 10b and 10c, but each edge viewing terminal 10 has a camera 14 and a display unit 15.
The content distribution server 50 is a server device (computer) that provides a service for distributing content in which the event is captured. Since the content distribution server 50 is similar to a server having a function of distributing content, a detailed description thereof is omitted.
The spectator video collection server 60 is a server device (computer) that collects video of spectators viewing the event in real time. The spectator video collection server 60 transmits the collected video to the highlight generation server 100. The spectator video collection server 60 is similar to the content distribution server 50 except that the objects to be captured are the spectators, so a detailed description thereof is omitted.
In the information processing system 1 shown in FIG. 3, an imaging device group FIA, which is imaging equipment such as cameras that capture the game, performers, and the like, and a sound collection device group SCD, which is content sound collection equipment such as microphones that pick up the sound of the game, performers, and the like, are arranged at the sports or live venue. The content distribution server 50 transmits the information collected by the imaging device group FIA and the sound collection device group SCD to the highlight generation server 100. In the information processing system 1 shown in FIG. 3, an imaging device group SIA, which is spectator imaging equipment such as cameras that capture the spectators at the venue, is also arranged at the sports or live venue. The spectator video collection server 60 transmits the information collected by the imaging device group SIA to the highlight generation server 100.
In the information processing system 1 shown in FIG. 3, the remote viewing environment, in which content is viewed in real time or as highlights from a location other than the venue, is composed of an edge viewing terminal 10 including the display unit 15, which is a display device for viewing the content, and the camera 14 or the like, which is viewer imaging equipment such as a camera that captures the remote viewer himself or herself. In the information processing system 1 shown in FIG. 3, the video of the remote viewer is sent to the highlight generation server 100 through the network N. When viewing highlights, the highlights (videos) individually distributed from the highlight generation server 100 can be received and viewed on the edge viewing terminal 10.
In the information processing system 1 shown in FIG. 3, the highlight generation server 100 collects the video and audio data of the content, captured image data of the spectators at the venue, and captured image data of remote viewers taken while they view the content in real time or as highlights. The highlight generation server 100 generates a highlight video optimized for an individual highlight viewer (the second user) and distributes it to that individual highlight viewer.
The device configuration of the information processing system 1 is not limited to the configuration described above, and any device configuration can be adopted. That is, the information processing system 1 may have a configuration other than the above; for example, the highlight generation server 100 may be integrated with any of the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60. That is, any of the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60 may have the functions of the highlight generation server 100.
[1-2-1. Arrangement example of spectator imaging equipment]
Here, an example of the arrangement of the spectator imaging equipment (the imaging device group SIA in FIG. 3) in the information processing system 1 will be described with reference to FIG. 4. FIG. 4 is a diagram showing an example of the arrangement of the spectator imaging equipment. The spectator imaging equipment may be, for example, 4K video cameras.
FIG. 4 shows a case in which at least one spectator imaging device is arranged at each of four points PT1 to PT4 in a basketball game venue. For example, the spectator imaging device arranged at point PT1 captures spectators (users) located in area AR1. The spectator imaging device arranged at point PT2 captures spectators (users) located in area AR2. The spectator imaging device arranged at point PT3 captures spectators (users) located in area AR3. The spectator imaging device arranged at point PT4 captures spectators (users) located in area AR4. The areas AR1 to AR4 in FIG. 4 show the outlines of the imaging areas covered from the points PT1 to PT4; for example, all spectators at the venue may be covered by the areas AR1 to AR4. Each of the areas AR1 to AR4 may partially overlap with another area. FIG. 4 is merely an example, and the spectator imaging devices may be arranged in any manner as long as the desired spectators can be captured.
[1-3. Configuration of information processing device according to embodiment]
Next, the configuration of the highlight generation server 100, which is an example of an information processing apparatus that executes the information processing according to the embodiment, will be described. FIG. 5 is a diagram illustrating a configuration example of the highlight generation server according to the embodiment of the present disclosure.
As shown in FIG. 5, the highlight generation server 100 has a communication unit 110, a storage unit 120, and a control unit 130. The highlight generation server 100 may also have an input unit (for example, a keyboard, a mouse, or the like) that receives various operations from an administrator or the like of the highlight generation server 100, and a display unit (for example, a liquid crystal display or the like) for displaying various kinds of information.
The communication unit 110 is implemented by, for example, a NIC (Network Interface Card) or the like. The communication unit 110 is connected to the network N (see FIG. 3) by wire or wirelessly, and transmits and receives information to and from other information processing devices such as the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60. The communication unit 110 may also transmit and receive information to and from a user terminal (not shown) used by a user.
The storage unit 120 is implemented by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. As shown in FIG. 5, the storage unit 120 according to the embodiment has a dataset storage unit 121, a model information storage unit 122, a threshold information storage unit 123, and a content information storage unit 124.
The dataset storage unit 121 according to the embodiment stores various kinds of information about the data used for learning. The dataset storage unit 121 stores the datasets used for learning. FIG. 6 is a diagram illustrating an example of the dataset storage unit according to the embodiment of the present disclosure. FIG. 6 shows an example of the dataset storage unit 121 according to the embodiment. In the example of FIG. 6, each table in the dataset storage unit 121 includes items such as "target model ID", "data ID", "data", "label", and "date and time".
The dataset storage unit 121 stores the data used for learning each of a plurality of models in association with the model to be learned, as in the tables TB1, TB2, TB3, and the like in FIG. 6. Although only the three tables TB1, TB2, and TB3 are illustrated in FIG. 6, the dataset storage unit 121 may include tables corresponding to the number of models to be learned.
"Target model ID" indicates identification information for identifying the model to be learned (target model). "Data ID" indicates identification information for identifying the data used in the learning process of the target model. "Data" indicates the data identified by the data ID.
"Label" indicates the label (correct label) attached to the corresponding data. For example, the "label" may be information (correct answer information) indicating the classification (category) of the corresponding data. For example, the "label" is the correct answer information (correct label) corresponding to the output of the target model.
"Date and time" indicates the time (date and time) related to the corresponding data. In the example of FIG. 6, this is illustrated as "DA1" or the like, but the "date and time" may be a specific date and time such as "17:48:35 on December 15, 2021", or information indicating from which model learning the data started to be used, such as "used since learning of model version XX", may be stored.
The example of FIG. 6 indicates that the data in the table TB1 is the data used for learning the target model (model M1) identified by the target model ID "M1". The data used for learning the model M1 includes a plurality of pieces of data identified by the data IDs "DID1", "DID2", "DID3", and so on. For example, the pieces of data identified by the data IDs "DID1", "DID2", "DID3", and so on (data DT1, DT2, DT3, and so on) are information used for learning the model M1, which performs personal attribute determination. For example, the data DT1, DT2, DT3, and so on are input data for the model M1, and the labels LB1, LB2, LB3, and so on corresponding to each piece of data indicate the desired output of the model M1 when that data is input.
The data in the table TB2 is the data used for learning the target model (model M2) identified by the target model ID "M2"; that is, the model M2 used for behavior index determination has been learned using the data in the table TB2. The data in the table TB3 is the data used for learning the target model (model M3) identified by the target model ID "M3"; that is, the model M3 used for highlight scene prediction has been learned using the data in the table TB3.
The dataset storage unit 121 is not limited to the above, and may store various kinds of information depending on the purpose. For example, the dataset storage unit 121 may store, in an identifiable manner, whether each piece of data is learning data or evaluation data. For example, the dataset storage unit 121 may store various kinds of information about data such as learning data used for learning and evaluation data used for accuracy evaluation (calculation). For example, the dataset storage unit 121 stores the learning data and the evaluation data in a distinguishable manner. The dataset storage unit 121 may store information identifying whether each piece of data is learning data or evaluation data. The highlight generation server 100 learns a model based on each piece of data used as learning data and the correct answer information. The highlight generation server 100 calculates the accuracy of the model based on each piece of data used as evaluation data and the correct answer information. The highlight generation server 100 calculates the accuracy of the model by collecting the results of comparing the output that the model produces when the evaluation data is input with the correct answer information.
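As a rough illustration of how such a dataset record and the accuracy calculation described above could be represented, the following Python sketch uses plain data structures. The field names mirror the items of FIG. 6, while the record layout and the train/eval split flag are assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Any

# One row of a table such as TB1 in the dataset storage unit 121 (illustrative).
@dataclass
class DatasetRecord:
    target_model_id: str   # e.g. "M1"
    data_id: str           # e.g. "DID1"
    data: Any              # input data (e.g. spectator-image features)
    label: Any             # correct label corresponding to the model output
    datetime: str          # e.g. "2021-12-15 17:48:35"
    split: str = "train"   # "train" or "eval" (assumed flag for the split)

def accuracy(model, records):
    """Compare model outputs with correct labels over the evaluation data."""
    eval_records = [r for r in records if r.split == "eval"]
    if not eval_records:
        return None
    correct = sum(1 for r in eval_records if model(r.data) == r.label)
    return correct / len(eval_records)
```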
The model information storage unit 122 according to the embodiment stores information about the models. For example, the model information storage unit 122 stores information (model data) indicating the structure of each model (network). FIG. 7 is a diagram illustrating an example of the model information storage unit according to the embodiment of the present disclosure. FIG. 7 shows an example of the model information storage unit 122 according to the embodiment. In the example shown in FIG. 7, the model information storage unit 122 includes items such as "model ID", "usage", and "model data".
"Model ID" indicates identification information for identifying a model. "Usage" indicates the usage of the corresponding model. "Model data" indicates the data of the model. FIG. 7 shows an example in which conceptual information such as "MDT1" is stored in "model data", but in reality it includes various kinds of information constituting the model, such as information about the network included in the model and its functions.
In the example shown in FIG. 7, the model identified by the model ID "M1" (model M1) has the usage "personal attribute determination". This indicates that the model M1 is a model used for personal attribute determination. It also indicates that the model data of the model M1 is the model data MDT1. The model identified by the model ID "M2" (model M2) has the usage "behavior index determination". This indicates that the model M2 is a model used for behavior index determination. It also indicates that the model data of the model M2 is the model data MDT2.
The model identified by the model ID "M3" (model M3) has the usage "highlight scene prediction". This indicates that the model M3 is a model used for highlight scene prediction. It also indicates that the model data of the model M3 is the model data MDT3. The model identified by the model ID "M11" (model M11) has the usage "angle estimation". This indicates that the model M11 is a model used for angle estimation. It also indicates that the model data of the model M11 is the model data MDT11.
The model information storage unit 122 is not limited to the above, and may store various kinds of information depending on the purpose. For example, the model information storage unit 122 stores parameter information of the models learned (generated) by the learning process.
The threshold information storage unit 123 according to the embodiment stores various kinds of information about thresholds. For example, the threshold information storage unit 123 stores various kinds of information about thresholds used for comparison with model outputs (scores and the like). FIG. 8 is a diagram illustrating an example of the threshold information storage unit according to the embodiment of the present disclosure. The threshold information storage unit 123 shown in FIG. 8 includes items such as "threshold ID", "usage", and "threshold".
"Threshold ID" indicates identification information for identifying a threshold. "Usage" indicates the usage of the threshold. "Threshold" indicates the specific value of the threshold identified by the corresponding threshold ID.
In the example of FIG. 8, the threshold identified by the threshold ID "TH1" (threshold TH1) is stored in association with information indicating that it is used for viewing determination. For example, the threshold TH1 is used to determine whether a user is viewing in real time. The value of the threshold TH1 is "VL1". In the example of FIG. 8, it is shown with an abstract symbol such as "VL1", but the value of the threshold TH1 is a specific numerical value (for example, 0.4, 0.6, or the like).
The threshold identified by the threshold ID "TH2" (threshold TH2) is stored in association with information indicating that it is used for highlight generation. For example, the threshold TH2 is used to determine which part of the event content is used for the highlight. For example, the threshold TH2 is used to determine whether to include the video of the event content at a certain point in time in the highlight. The value of the threshold TH2 is "VL2". In the example of FIG. 8, it is shown with an abstract symbol such as "VL2", but the value of the threshold TH2 is a specific numerical value (for example, 0.5, 0.8, or the like).
The threshold information storage unit 123 is not limited to the above, and may store various kinds of information depending on the purpose.
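A minimal sketch of how the thresholds of FIG. 8 could be looked up and applied is shown below. The dictionary layout and the numeric values are illustrative assumptions, not values taken from the disclosure.

```python
# Illustrative threshold table mirroring FIG. 8 (values are placeholders).
THRESHOLDS = {
    "TH1": {"usage": "viewing determination", "value": 0.6},
    "TH2": {"usage": "highlight generation",  "value": 0.8},
}

def is_viewing_in_real_time(viewing_score):
    """TH1: decide whether the user is viewing the event in real time."""
    return viewing_score >= THRESHOLDS["TH1"]["value"]

def include_in_highlight(scene_score):
    """TH2: decide whether a point of the event content enters the highlight."""
    return scene_score >= THRESHOLDS["TH2"]["value"]
```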
The content information storage unit 124 according to the embodiment stores various kinds of information about the content displayed on the edge viewing terminal 10. For example, the content information storage unit 124 stores information about content displayed by an application (also referred to as an "app") installed on the edge viewing terminal 10.
The content information storage unit 124 stores the event content, which is the video of the event. The content information storage unit 124 stores the event content of each event in association with that event. The above is merely an example, and the content information storage unit 124 may store various kinds of information according to the content on which response candidates are displayed and the like. The content information storage unit 124 stores various kinds of information necessary for providing content to the edge viewing terminal 10, displaying response candidates on the edge viewing terminal 10, and the like.
The storage unit 120 may also store various kinds of information other than the above. For example, the storage unit 120 stores various kinds of information about highlight generation. The storage unit 120 stores various kinds of data for providing data to the edge viewing terminal 10. For example, the storage unit 120 stores various kinds of information used to generate the information displayed on the edge viewing terminal 10. For example, the storage unit 120 stores information about content displayed by an application (a content display app or the like) installed on the edge viewing terminal 10. For example, the storage unit 120 stores information about content displayed by the content display app. The above is merely an example, and the storage unit 120 may store various kinds of information used to provide the highlight service to users.
The storage unit 120 stores attribute information and the like of each user. The storage unit 120 stores user information in association with information identifying each user (a user ID or the like). For example, the storage unit 120 stores information indicating the personal attributes determined by the model M1 in association with the user.
Returning to FIG. 5, the description will be continued. The control unit 130 is implemented, for example, by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing a program stored inside the highlight generation server 100 (for example, an information processing program according to the present disclosure) using a RAM (Random Access Memory) or the like as a work area. The control unit 130 may also be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
As shown in FIG. 5, the control unit 130 has an acquisition unit 131, a learning unit 132, an image processing unit 133, a generation unit 134, and a transmission unit 135, and implements or executes the functions and actions of the information processing described below. The internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 5, and may be another configuration as long as it performs the information processing described later. The connection relationship between the processing units of the control unit 130 is not limited to the connection relationship shown in FIG. 5, and may be another connection relationship.
The acquisition unit 131 acquires various kinds of information. The acquisition unit 131 acquires various kinds of information from external information processing devices. The acquisition unit 131 acquires various kinds of information from the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60. The acquisition unit 131 acquires, from the edge viewing terminal 10, information detected by the edge viewing terminal 10.
The acquisition unit 131 receives information from the content distribution server 50 or the spectator video collection server 60. The acquisition unit 131 acquires requested information from the content distribution server 50 or the spectator video collection server 60. The acquisition unit 131 acquires video from the content distribution server 50. The acquisition unit 131 acquires, from the content distribution server 50, the video captured by the imaging device group FIA. The acquisition unit 131 acquires, from the content distribution server 50, the audio detected by the sound collection device group SCD. The acquisition unit 131 acquires video from the spectator video collection server 60. The acquisition unit 131 acquires various kinds of information from the storage unit 120. The acquisition unit 131 acquires, from the spectator video collection server 60, the video captured by the imaging device group SIA.
The acquisition unit 131 acquires state information indicating the state of the first user, who is a user who viewed the event in real time, at the time of the real-time viewing of the event. The acquisition unit 131 acquires the event content, which is video of the event. The acquisition unit 131 acquires the event content, which is video in which the event is captured.
The acquisition unit 131 acquires the state information of the first user who performed the real-time viewing at the venue of the event. The acquisition unit 131 acquires the state information of the first user who viewed a sports or artistic event in real time. The acquisition unit 131 acquires the state information of the first user who watched the event at the venue. The acquisition unit 131 acquires state information including image information in which the first user is captured. The acquisition unit 131 acquires state information including biometric information of the first user.
When the second user was viewing the event in real time, the acquisition unit 131 acquires state information indicating the state of the second user at the time of the real-time viewing of the event as the state information of the first user. When the second user was not viewing the event in real time, the acquisition unit 131 acquires the state information of a first user who is a user different from the second user.
The acquisition unit 131 acquires the state information of a first user who is a user whose attributes are similar to those of the second user. The acquisition unit 131 acquires the state information of a first user who is a user whose demographic attributes are similar to those of the second user. The acquisition unit 131 acquires the state information of a first user who is a user similar to the second user in at least one of age and gender.
The acquisition unit 131 acquires the state information of a first user who is a user whose psychographic attributes are similar to those of the second user. The acquisition unit 131 acquires the state information of a first user who is a user whose preferences are similar to those of the second user. The acquisition unit 131 acquires the state information of a first user who is a user whose favorite target matches that of the second user.
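The selection of whose state information to use, as described above, can be sketched as follows. This Python snippet is purely illustrative: the attribute keys ("age", "gender", "favorite"), the ten-year age band, and the order of the similarity checks are assumptions, not criteria stated in the disclosure.

```python
# Illustrative selection of "first users" whose state information is used
# when generating a highlight for a given second user.

def select_first_users(second_user, realtime_viewers):
    """Return the users whose state information should drive the highlight.

    second_user: dict with keys such as "id", "age", "gender", "favorite".
    realtime_viewers: list of dicts for users who watched the event in real time.
    """
    # If the second user watched in real time, use their own state information.
    own = [v for v in realtime_viewers if v["id"] == second_user["id"]]
    if own:
        return own

    # Otherwise, fall back to real-time viewers with similar attributes:
    # demographic (age band, gender) or psychographic (favorite team/player).
    def similar(v):
        same_age_band = abs(v["age"] - second_user["age"]) <= 10
        same_gender = v["gender"] == second_user["gender"]
        same_favorite = v["favorite"] == second_user["favorite"]
        return same_favorite or same_age_band or same_gender

    return [v for v in realtime_viewers if similar(v)]
```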
The learning unit 132 learns various kinds of information. The learning unit 132 learns various kinds of information based on information from external information processing devices and information stored in the storage unit 120. The learning unit 132 learns various kinds of information based on the information stored in the dataset storage unit 121. The learning unit 132 stores the models generated by learning in the model information storage unit 122. The learning unit 132 stores the models updated by learning in the model information storage unit 122.
The learning unit 132 performs the learning process. The learning unit 132 performs various kinds of learning. The learning unit 132 learns various kinds of information based on the information acquired by the acquisition unit 131. The learning unit 132 learns (generates) models. The learning unit 132 learns various kinds of information such as models. The learning unit 132 generates a model by learning. The learning unit 132 learns the models using various machine learning techniques. For example, the learning unit 132 learns the parameters of a model (network).
The learning unit 132 learns a model using learning data that includes highlights of past events and the state information of users who viewed those past events in real time. The learning unit 132 generates various models such as the models M1, M2, M3, and M11. For example, the learning unit 132 generates the model M3 for determining highlight scenes. The learning unit 132 learns the parameters of the networks. For example, the learning unit 132 learns the network parameters of various models such as the models M1, M2, M3, and M11. The learning unit 132 also learns the network parameters of the model M3 for determining highlight scenes.
The learning unit 132 performs the learning process based on the learning data (teacher data) stored in the dataset storage unit 121. The learning unit 132 generates various models such as the models M1, M2, M3, and M11 by performing the learning process using the learning data stored in the dataset storage unit 121. For example, the learning unit 132 may generate a model used for image recognition. For example, the learning unit 132 generates the model M1 by learning the network parameters of the model M1. For example, the learning unit 132 generates the model M3 by learning the network parameters of the model M3.
The learning method used by the learning unit 132 is not particularly limited; for example, learning data in which labels and data (images) are linked may be prepared, and that learning data may be input into a calculation model based on a multilayer neural network for learning. In addition, for example, a technique based on a DNN (Deep Neural Network) such as a CNN (Convolutional Neural Network) or a 3D-CNN may be used. When targeting time-series data such as moving images (video), the learning unit 132 may use a technique based on a recurrent neural network (RNN) or LSTM (Long Short-Term Memory units), which extends the RNN.
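As one concrete, purely illustrative instance of the RNN/LSTM option mentioned above, the following PyTorch sketch scores each time step of a sequence of spectator features and is trained in a supervised manner on labeled past events. The use of PyTorch, the layer sizes, and the dummy data are assumptions, not choices stated in the disclosure.

```python
import torch
import torch.nn as nn

class HighlightScenePredictor(nn.Module):
    """Illustrative LSTM that outputs a per-time-step highlight score in [0, 1]."""

    def __init__(self, feature_dim=64, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        # x: (batch, time_steps, feature_dim) spectator features per period
        out, _ = self.lstm(x)                    # (batch, time_steps, hidden_dim)
        scores = torch.sigmoid(self.head(out))   # (batch, time_steps, 1)
        return scores.squeeze(-1)                # (batch, time_steps)

# Supervised training on labeled past events (label 1 = highlight scene).
model = HighlightScenePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

features = torch.randn(8, 120, 64)               # dummy batch: 8 events x 120 periods
labels = torch.randint(0, 2, (8, 120)).float()   # dummy per-period labels

for _ in range(3):                               # a few illustrative epochs
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()
```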
The image processing unit 133 executes various kinds of processing related to image processing. The image processing unit 133 executes processing on images (video) in which users are captured. The image processing unit 133 executes processing on video in which the spectators of the event are captured. The image processing unit 133 recognizes people (users) in the video by image recognition processing. For example, the image processing unit 133 detects the orientation of a user's face and the user's line of sight in the video by image recognition processing.
The image processing unit 133 generates, from the video, information to be input to the models. For example, the image processing unit 133 generates, from the video, feature values to be input to the models. For example, the image processing unit 133 extracts feature values from the video by image processing. The image processing unit 133 may extract the feature values from the video using a model (feature extraction model) that receives video as input and outputs the feature values of that video. The above processing is an example, and the image processing unit 133 may extract feature values from the video by appropriately using various techniques related to image processing. When each model receives the video (images) themselves as input instead of feature values, the highlight generation server 100 does not need to have the image processing unit 133.
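A small sketch of the kind of interface the image processing unit 133 could expose is shown below. The feature fields and the per-person aggregation are illustrative assumptions; an actual implementation might instead rely on an existing pose-estimation or face-analysis component, which is left here as a placeholder callable.

```python
from dataclasses import dataclass

# Illustrative per-person features obtained by image recognition of one frame.
@dataclass
class SpectatorFeatures:
    skeleton: list         # joint coordinates from pose estimation
    face_direction: tuple  # (yaw, pitch) of the face
    gaze: tuple            # estimated gaze direction
    motion: float          # amount of motion relative to the previous frame

def extract_features_per_frame(frame, person_detector, feature_estimator):
    """Detect each spectator in the frame and compute their feature vector.

    person_detector and feature_estimator stand in for concrete image
    recognition components (e.g. a pose/face model); they are placeholders.
    """
    features = []
    for person_region in person_detector(frame):
        features.append(feature_estimator(person_region))
    return features
```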
The generation unit 134 generates various kinds of information. The generation unit 134 generates various kinds of information based on information from external information processing devices and information stored in the storage unit 120. The generation unit 134 generates various kinds of information based on information from other information processing devices such as the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60. The generation unit 134 generates various kinds of information based on the information stored in the dataset storage unit 121, the model information storage unit 122, the threshold information storage unit 123, and the content information storage unit 124. The generation unit 134 generates various kinds of information to be displayed on the edge viewing terminal 10 based on the models learned by the learning unit 132.
The generation unit 134 generates a highlight of the event content to be provided to the second user, using a part of the event content determined from the state information of the first user acquired by the acquisition unit 131. The generation unit 134 generates the highlight of the event content using a model that outputs scores corresponding to periods of the event in response to the input of input data based on the state information. The generation unit 134 determines the part of the event content using the model.
For example, the generation unit 134 generates the highlight of the event content using a model that takes the state information as input. The generation unit 134 generates the highlight of the event content using a model that takes video in which users are captured as input. For example, the generation unit 134 generates the highlight of the event content using a model that takes, as input, feature values extracted from the state information. The generation unit 134 generates the highlight of the event content using a model that takes, as input, feature values extracted from video in which users are captured.
The generation unit 134 generates the highlight of the event content using the determined part of the event content. The generation unit 134 determines, as the part of the event content, the portion of the event content corresponding to the periods whose scores are equal to or greater than the threshold, and generates the highlight of the event content using the determined part of the event content. The generation unit 134 generates the highlight of the event content using the models learned by the learning unit 132.
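The selection of the part of the event content whose scores reach the threshold can be sketched as follows. This is a minimal illustration, assuming fixed-length periods and the merging of adjacent selected periods into contiguous time ranges; neither assumption is stated in the disclosure.

```python
# Illustrative conversion of per-period scores into highlight time ranges.

def scores_to_highlight_ranges(scores, threshold, period_seconds=10):
    """Return (start_sec, end_sec) ranges whose scores reach the threshold.

    scores: list of model scores, one per consecutive period of the event.
    threshold: e.g. the value of threshold TH2.
    """
    ranges = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i * period_seconds
        elif score < threshold and start is not None:
            ranges.append((start, i * period_seconds))
            start = None
    if start is not None:
        ranges.append((start, len(scores) * period_seconds))
    return ranges

# Example: with threshold 0.8, only the runs of high-scoring periods are kept.
print(scores_to_highlight_ranges([0.2, 0.9, 0.95, 0.4, 0.85], threshold=0.8))
# -> [(10, 30), (40, 50)]
```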
When the second user was viewing the event in real time, the generation unit 134 generates the highlight of the event content to be provided to the second user using a part of the event content determined from the state information of the second user. When the second user was not viewing the event in real time, the generation unit 134 generates the highlight of the event content to be provided to the second user using a part of the event content determined from the state information of a first user who is a user different from the second user.
The generation unit 134 executes processing for generating information to be provided to the edge viewing terminal 10. The generation unit 134 may also generate, as data, the display screens (content) to be displayed on the edge viewing terminal 10. For example, the generation unit 134 may generate the screens (content) to be provided to the edge viewing terminal 10 by appropriately using various techniques such as Java (registered trademark). The generation unit 134 may also generate the screens (content) to be provided to the edge viewing terminal 10 based on formats such as CSS, JavaScript (registered trademark), and HTML. Further, for example, the generation unit 134 may generate the screens (content) in various formats such as JPEG (Joint Photographic Experts Group), GIF (Graphics Interchange Format), and PNG (Portable Network Graphics).
The transmission unit 135 transmits information to the edge viewing terminal 10. The transmission unit 135 transmits the information generated by the generation unit 134 to the edge viewing terminal 10. The transmission unit 135 transmits the data generated by the generation unit 134 to the edge viewing terminal 10. The transmission unit 135 transmits the highlight of the event content generated by the generation unit 134 to the edge viewing terminal 10 used by the second user.
The transmission unit 135 transmits information requesting information to the content distribution server 50. The transmission unit 135 transmits, to the content distribution server 50, information indicating the information it requests to acquire. The transmission unit 135 transmits information requesting information to the spectator video collection server 60. The transmission unit 135 transmits, to the spectator video collection server 60, information indicating the information it requests to acquire.
[1-4. Configuration of terminal device according to embodiment]
Next, the configuration of the edge viewing terminal 10, which is an example of a terminal device that executes the information processing according to the embodiment, will be described. FIG. 9 is a diagram showing a configuration example of the edge viewing terminal according to the embodiment of the present disclosure.
As shown in FIG. 9, the edge viewing terminal 10 has a communication unit 11, an audio input unit 12, an audio output unit 13, a camera 14, a display unit 15, an operation unit 16, a storage unit 17, and a control unit 18.
The communication unit 11 is implemented by, for example, a NIC, a communication circuit, or the like. The communication unit 11 is connected to a predetermined communication network by wire or wirelessly, and transmits and receives information to and from external information processing devices. For example, the communication unit 11 is connected to a predetermined communication network by wire or wirelessly, and transmits and receives information to and from the highlight generation server 100.
The audio input unit 12 functions as an input unit that receives operations by the user's voice (utterances). The audio input unit 12 is, for example, a microphone or the like, and detects sound. For example, the audio input unit 12 detects the user's utterances. The audio input unit 12 may have any configuration as long as it can detect the user's utterance information necessary for the processing.
The audio output unit 13 is realized by a speaker that outputs sound, and is an output device for outputting various kinds of information as audio. The audio output unit 13 outputs, as audio, the content provided from the highlight generation server 100. For example, the audio output unit 13 outputs audio corresponding to the information displayed on the display unit 15. The edge viewing terminal 10 inputs and outputs audio through the audio input unit 12 and the audio output unit 13.
The camera 14 has an image sensor that detects images. The camera 14 captures the user. For example, when the edge viewing terminal 10 is a desktop personal computer (desktop PC), the camera 14 may be a separate device from the device (main unit) in which the control unit 18 is mounted, the display device, and the like.
When the edge viewing terminal 10 is a desktop personal computer, the camera 14 may be integrated with the display (display device) and may be arranged at the top of the display unit 15. For example, when the edge viewing terminal 10 is a notebook computer (laptop PC), the camera 14 may be built into the edge viewing terminal 10 and arranged at the top of the display unit 15. For example, in the case of a smartphone, the camera 14 may be an in-camera built into the edge viewing terminal 10.
The display unit 15 is a display screen such as that of a tablet terminal, realized by, for example, a liquid crystal display or an organic EL (Electro-Luminescence) display, and is a display device for displaying various kinds of information. For example, when the edge viewing terminal 10 is a desktop personal computer, the display unit 15 may be separate from the device (main unit) in which the control unit 18 is mounted. For example, when the edge viewing terminal 10 is a notebook computer or a smartphone, the display unit 15 may be integrated with the device (main unit) in which the control unit 18 is mounted.
 表示部15は、イベントに関する各種情報を表示する。表示部15は、コンテンツを表示する。表示部15は、ハイライト生成サーバ100から受信した各種情報を表示する。表示部15は、ハイライト生成サーバ100から受信したイベントのハイライトを表示する。表示部15は、コンテンツを表示する。表示部15は、イベントを撮影した映像を表示する。表示部15は、イベントのハイライトを表示する。 The display unit 15 displays various information related to the event. The display unit 15 displays content. The display unit 15 displays various information received from the highlight generation server 100 . The display unit 15 displays highlights of events received from the highlight generation server 100 . The display unit 15 displays content. The display unit 15 displays the video of the event. The display unit 15 displays the highlight of the event.
 操作部16は、様々なユーザの操作を受け付ける入力部として機能する。図9の例では、操作部16は、キーボード、マウス等である。また、操作部16は、キーボードやマウスと同等の機能を実現できるタッチパネルを有してもよい。この場合、操作部16は、各種センサにより実現されるタッチパネルの機能により、表示画面を介してユーザから各種操作を受け付ける。例えば、操作部16は、表示部15を介してユーザから各種操作を受け付ける。 The operation unit 16 functions as an input unit that receives various user operations. In the example of FIG. 9, the operation unit 16 is a keyboard, mouse, or the like. Also, the operation unit 16 may have a touch panel capable of realizing functions equivalent to those of a keyboard and a mouse. In this case, the operation unit 16 receives various operations from the user through the display screen by the function of a touch panel realized by various sensors. For example, the operation unit 16 receives various operations from the user via the display unit 15 .
 例えば、操作部16は、エッジ視聴端末10の表示部15を介してユーザの指定操作等の操作を受け付けてもよい。なお、操作部16によるユーザの操作の検知方式には、タブレット端末では主に静電容量方式が採用されるが、他の検知方式である抵抗膜方式、表面弾性波方式、赤外線方式、電磁誘導方式など、ユーザの操作を検知できタッチパネルの機能が実現できればどのような方式を採用してもよい。 For example, the operation unit 16 may receive an operation such as a user's designation operation via the display unit 15 of the edge viewing terminal 10 . As for the detection method of the user's operation by the operation unit 16, the tablet terminal mainly adopts the capacitance method, but there are other detection methods such as the resistive film method, the surface acoustic wave method, the infrared method, and the electromagnetic induction method. Any method may be adopted as long as the user's operation can be detected and the function of the touch panel can be realized.
 上記のキーボード、マウス、タッチパネル等は一例に過ぎず、エッジ視聴端末10は、上記に限らず様々な情報を入力として受け付ける(検知する)構成を有してもよい。例えば、エッジ視聴端末10は、ユーザの視線を検知する視線センサを有してもよい。視線センサは、例えば、エッジ視聴端末10に搭載されたカメラ14や光センサ、動きセンサ(いずれも図示省略)等の検出結果に基づき、アイトラッキング技術を利用して、ユーザの視線方向を検出する。視線センサは、検出した視線方向に基づき、画面のうち、ユーザが注視している注視領域を決定する。視線センサは、決定した注視領域を含む視線情報をハイライト生成サーバ100に送信してもよい。例えば、エッジ視聴端末10は、ユーザのジェスチャ等を検知するモーションセンサを有してもよい。エッジ視聴端末10は、モーションセンサにより、ユーザのジェスチャによる操作を受け付けてもよい。 The keyboard, mouse, touch panel, etc. described above are merely examples, and the edge viewing terminal 10 may have a configuration that accepts (detects) various types of information as input, not limited to the above. For example, the edge viewing terminal 10 may have a line-of-sight sensor that detects the user's line of sight. The line-of-sight sensor detects the user's line-of-sight direction using eye-tracking technology based on detection results from, for example, the camera 14 mounted on the edge viewing terminal 10, an optical sensor, and a motion sensor (all of which are not shown). . The line-of-sight sensor determines a region of the screen that the user is gazing at based on the detected line-of-sight direction. The line-of-sight sensor may transmit line-of-sight information including the determined gaze area to the highlight generation server 100 . For example, the edge viewing terminal 10 may have a motion sensor that detects user gestures and the like. The edge viewing terminal 10 may receive an operation by a user's gesture using a motion sensor.
The storage unit 17 is implemented by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 17 stores, for example, various kinds of information received from the highlight generation server 100. The storage unit 17 also stores information, such as programs, related to applications installed in the edge viewing terminal 10 (for example, a content display application).
The storage unit 17 stores information about the user. The storage unit 17 stores the user's utterance history (history of speech recognition results) and action history.
The control unit 18 is a controller, and is realized, for example, by a CPU or an MPU executing various programs stored in a storage device such as the storage unit 17 inside the edge viewing terminal 10, using a RAM as a work area. These various programs include, for example, programs of applications that perform information processing (for example, a content display application). The control unit 18 may also be a controller realized by an integrated circuit such as an ASIC or an FPGA.
As shown in FIG. 9, the control unit 18 includes an acquisition unit 181, a transmission unit 182, a reception unit 183, and a processing unit 184, and implements or executes the information processing functions and operations described below. Note that the internal configuration of the control unit 18 is not limited to the configuration shown in FIG. 9, and may be any other configuration as long as it performs the information processing described later. In addition, the connection relationship among the processing units of the control unit 18 is not limited to the connection relationship shown in FIG. 9, and may be another connection relationship.
The acquisition unit 181 acquires various kinds of information. For example, the acquisition unit 181 acquires various kinds of information from an external information processing device and stores the acquired information in the storage unit 17. The acquisition unit 181 acquires the user's operation information accepted by the operation unit 16.
The acquisition unit 181 acquires state information indicating the state of the user, including image information of the user captured by the camera 14. The acquisition unit 181 also acquires the user's utterance information detected by the voice input unit 12.
The transmission unit 182 transmits information to the highlight generation server 100 via the communication unit 11. The transmission unit 182 transmits information about the user to the highlight generation server 100, such as information about the user's video captured by the camera 14. The transmission unit 182 transmits state information indicating the state of the user, including image information of the user captured by the camera 14. The transmission unit 182 also transmits information input by the user's utterances or operations.
The reception unit 183 receives information from the highlight generation server 100 via the communication unit 11. The reception unit 183 receives information provided by the highlight generation server 100, such as content and highlights.
The processing unit 184 executes various kinds of processing. The processing unit 184 executes processing according to user operations accepted by the voice input unit 12 or the operation unit 16.
The processing unit 184 displays various kinds of information via the display unit 15; for example, the processing unit 184 functions as a display control unit that controls display on the display unit 15. The processing unit 184 also outputs various kinds of information as audio via the audio output unit 13; for example, the processing unit 184 functions as an audio output control unit that controls audio output by the audio output unit 13.
The processing unit 184 outputs the information received by the acquisition unit 181. The processing unit 184 outputs content provided by the highlight generation server 100 via the audio output unit 13 or the display unit 15: it displays the content via the display unit 15 and outputs the content as audio via the audio output unit 13.
The processing unit 184 transmits various kinds of information to external information processing devices via the communication unit 11. For example, the processing unit 184 transmits to the highlight generation server 100 various kinds of information stored in the storage unit 17, various kinds of information acquired by the acquisition unit 181 such as sensor information, the user's operation information accepted by the operation unit 16, and information such as utterances and images of the user using the edge viewing terminal 10.
Each process performed by the control unit 18 described above may be realized by, for example, JavaScript (registered trademark). When processing such as the information processing by the control unit 18 described above is performed by a predetermined application, each unit of the control unit 18 may be realized by that predetermined application. For example, processing such as the information processing by the control unit 18 may be realized by control information received from an external information processing device. When the display processing described above is performed by a predetermined application (for example, a content display application), the control unit 18 may have an application control unit that controls the predetermined application or a dedicated application.
[1-5. Information processing procedure according to the embodiment]
Next, various information processing procedures according to the embodiment will be described with reference to FIG. 10. FIG. 10 is a flowchart showing the processing procedure of the information processing device according to the embodiment of the present disclosure. Specifically, FIG. 10 is a flowchart showing the procedure of information processing performed by the highlight generation server 100, which is an example of the information processing device.
As shown in FIG. 10, the highlight generation server 100 acquires state information indicating the state, during real-time viewing of the event, of a first user who viewed the event in real time, and event content, which is video of the event (step S101). The highlight generation server 100 then generates a highlight of the event content to be provided to a second user, using a portion of the event content determined from the state information of the first user (step S102).
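The two steps above can be pictured with a small, runnable sketch. It assumes, purely for illustration, that the first user's state information has already been reduced to a per-second engagement score and that the event content is a list of one-second segments; the function and variable names are hypothetical and do not appear in the disclosure.

```python
def generate_highlight(state_scores, event_segments, threshold=0.8):
    """Step S102: keep the segments of the event whose timestamps coincide with
    high engagement in the first user's state information (step S101 is assumed
    to have produced state_scores and event_segments already)."""
    return [seg for score, seg in zip(state_scores, event_segments) if score >= threshold]

# Example: the highlight provided to the second user keeps only the segments at t=2s and t=5s.
state_scores = [0.1, 0.2, 0.9, 0.3, 0.4, 0.95, 0.2]
event_segments = [f"segment_{t}s" for t in range(7)]
print(generate_highlight(state_scores, event_segments))  # ['segment_2s', 'segment_5s']
```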
[1-6. Configuration and processing of the information processing system]
The configuration and processing of the information processing system will now be described with reference to FIGS. 11 to 15. Note that the points described below may be applied to either the information processing system 1 according to the first embodiment or the information processing system 1 according to the second embodiment.
[1-6-1. Example of functional configuration related to learning in the information processing system]
FIG. 11 will now be described. FIG. 11 is a diagram showing an example of the functional configuration related to learning in the information processing system. In FIG. 11, the dashed line BS indicates a functional interface in the system: the left side of the dashed line BS corresponds to the equipment at the local venue (corresponding to the event site in FIG. 3) or the edge viewing terminal 10 side, and the right side of the dashed line BS corresponds to the highlight generation server 100 side. The dashed line BS indicates one example of how functions are allocated in the information processing system 1. In FIG. 11, each component shown on the left side of the dashed line BS is realized by the equipment at the local venue or the edge viewing terminal 10, and each component shown on the right side of the dashed line BS is realized by the highlight generation server 100.
Note that the boundary (interface) of the device configuration in the information processing system 1 is not limited to the dashed line BS, and the functions may be allocated to the equipment at the local venue, the edge viewing terminal 10, the highlight generation server 100, and so on in any combination. For example, when the highlight generation server 100 is integrated with any of the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60, the interface indicated by the dashed line BS may be absent.
For example, FIG. 11 shows a learning flow in which a behavior analysis algorithm is machine-learned in advance through data analysis. In the information processing system 1, local spectators (real-time spectators) watching an event in real time at a sports match venue, a live venue, or the like are photographed by imaging devices such as cameras, and the footage is accumulated as spectator video data. The imaging devices in FIG. 11 correspond, for example, to the spectator imaging equipment or the camera 14 of the edge viewing terminal 10. Note that the information processing system 1 may also treat viewers watching in real time in a remote environment as real-time spectators to be learned from. The information processing system 1 accumulates feature quantities obtained by performing image recognition processing on the spectator video data as spectator feature data. The image recognition processing in FIG. 11 is executed by the image processing unit 133 of the highlight generation server 100.
For example, the spectator feature data is time-series data, covering the entire match duration, of the feature quantities obtained by the image recognition processing. For example, the spectator feature data accumulates the individual feature quantities of all photographed spectators as time-series data. Examples of feature quantities include body part points obtained by skeleton recognition, information indicating face orientation and the like, face part points obtained by face recognition, information indicating face attributes such as smiles and emotions, information obtained by motion detection (motion information), and information obtained by line-of-sight detection (line-of-sight information).
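As a rough illustration of how such per-spectator, per-frame feature quantities might be organized as time-series data, the following sketch uses a simple Python data class; the field names and value ranges are assumptions for illustration, not the actual schema used by the system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SpectatorFrameFeatures:
    person_id: str                                    # identifies one spectator across the event
    timestamp: float                                  # seconds from the start of the event
    body_keypoints: List[Tuple[float, float]] = field(default_factory=list)  # skeleton recognition
    face_keypoints: List[Tuple[float, float]] = field(default_factory=list)  # face recognition
    face_yaw_pitch: Tuple[float, float] = (0.0, 0.0)  # face orientation
    smile_score: float = 0.0                          # face attribute: smile intensity (0..1)
    emotion: str = "neutral"                          # face attribute: categorical emotion
    motion_magnitude: float = 0.0                     # motion detection
    gaze_point: Tuple[float, float] = (0.0, 0.0)      # line-of-sight detection

# The accumulated spectator feature data is then a time series of such records per person.
spectator_feature_data = [
    SpectatorFrameFeatures("person_001", 0.0, smile_score=0.2),
    SpectatorFrameFeatures("person_001", 1.0, smile_score=0.8, motion_magnitude=0.5),
]
print(len(spectator_feature_data))
```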
For example, the highlight generation server 100 feeds the spectator feature data and teacher data into the respective learning devices to generate the models of the behavior analysis algorithm. The processing (learning processing) corresponding to each learning device in FIG. 11 is executed by the learning unit 132 of the highlight generation server 100. In FIG. 11, the highlight generation server 100 generates a model M1, which is a personal attribute determiner, a model M2, which is an action index determiner, and a model M3, which is a highlight scene predictor. Details of the models M1 to M3 will be described later.
For example, the teacher data is a label (correct answer information) indicating the correct answer that a model is expected to output when the corresponding spectator feature data is input to the model. The teacher data may be generated by content analysis processing that analyzes the event content. For example, the teacher data may be automatically generated from the results of image recognition processing on content video data obtained by photographing the match, the performers, or other content with an imaging device such as a camera, and from the results of acoustic processing on content audio data recorded with a sound pickup device such as a microphone; alternatively, such analysis may be used by an administrator of the information processing system 1 or the like as a tool for analyzing the development of the match when creating teacher data by hand. Since teacher data can thus also be created entirely by hand, the generation of teacher data from the video and audio of the content, indicated by the dotted lines in FIG. 11, does not have to be performed.
[1-6-2. Learning and inference related to the personal attribute determiner]
Learning and inference related to each determiner (model) will now be described, starting with an example of learning and inference related to the personal attribute determiner. FIG. 12 is a diagram showing an example of learning and inference related to the personal attribute determiner of the information processing system. The part above the dotted line in FIG. 12 shows the processing related to model generation in the learning phase, and the part below the dotted line shows the processing in the inference phase using the model generated by learning. Descriptions of points similar to those described above will be omitted as appropriate.
FIG. 12 shows, as an example of personal attribute determination, a case in which each individual (user) is judged to have one of three fan attributes: home fan (first fan type), away fan (second fan type), or beginner (third fan type).
In FIG. 12, data in which a fan attribute label has been (manually) assigned to each individual appearing in the spectator video data is prepared as the teacher data TD1. For example, the highlight generation server 100 generates the model M1 by performing supervised learning on the spectator feature data of each individual using the teacher data TD1. Any learning algorithm, such as a DNN or XGBoost (eXtreme Gradient Boosting), can be adopted as the learning algorithm of the highlight generation server 100.
The teacher data TD1 shown in FIG. 12 is data in which a fan attribute label is assigned to each individual appearing in the spectator video data: each individual (user) is given a label indicating one of the three types, home fan (first fan type), away fan (second fan type), or beginner (third fan type). In FIG. 12, hatching applied to each individual's face indicating one of the three types is shown as an example of the label, but the label may also take the form of a label (correct answer information) indicating one of the three types attached to information identifying each individual (a user ID).
Through the learning processing using the information described above, the highlight generation server 100 generates the model M1, a personal attribute determiner that takes an individual's feature data as input and identifies which of the three types the attribute of the user corresponding to the input data is.
Then, in the inference phase, the highlight generation server 100 performs inference processing using the model M1 generated in the learning phase. For example, the highlight generation server 100 inputs feature data as input data into the model M1, thereby causing the model M1 to output information indicating the personal attribute of the user corresponding to the input data (output data OD11). In FIG. 12, the highlight generation server 100 inputs the feature data of a user whose attribute is unknown (the user to be inferred, also referred to as the "target user") into the model M1, thereby causing the model M1 to output information indicating the fan attribute of the target user.
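A minimal sketch of this supervised learning and inference is shown below. It assumes that each spectator's feature time series has been flattened into a fixed-length vector, and it uses scikit-learn's gradient boosting classifier as a stand-in for the DNN or XGBoost learners mentioned above; the data here is random placeholder data and the names are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

HOME_FAN, AWAY_FAN, BEGINNER = 0, 1, 2

# Learning phase: per-spectator feature vectors with manually assigned fan-attribute labels (TD1).
X_train = np.random.rand(300, 64)        # 300 spectators, 64-dimensional flattened features
y_train = np.random.choice([HOME_FAN, AWAY_FAN, BEGINNER], size=300)
model_m1 = GradientBoostingClassifier().fit(X_train, y_train)

# Inference phase: classify a target user whose fan attribute is unknown.
target_features = np.random.rand(1, 64)
print(model_m1.predict(target_features))  # e.g. [0] -> judged to be a home fan
```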
Note that the personal attribute determiner shown in FIG. 12 is merely one example, and the information processing system 1 may use various personal attribute determiners. For example, the information processing system 1 may use the attribute recognition function of an image recognizer as a personal attribute determiner; the information processing system 1 may determine age group and gender by attribute recognition of a face recognizer.
For example, the information processing system 1 may use a personal attribute determiner that determines an "enjoyment" attribute. The information processing system 1 may use the average smile level over the entire period of a sports match or live event as the enjoyment attribute. The information processing system 1 may also store the viewer's personal attribute determination values from past viewing (such as another match of the same sport) and use the stored personal attribute determination values at determination time.
The information processing system 1 may use values stored from past viewing to determine attributes, such as the enjoyment attribute or the attributes based on viewing behavior described later, that cannot be determined instantaneously by image recognition. The information processing system 1 may also determine attributes related to situations in which the scenes a user wants to watch in a highlight differ.
For example, the information processing system 1 may use a personal attribute determiner that determines an "immersion" attribute. The information processing system 1 may make this determination from the average amount of movement over the entire period of the sports match or live event.
For example, the information processing system 1 may use a personal attribute determiner that determines a ticket price attribute. The information processing system 1 may determine the expenditure, such as the ticket price, from the seat position of the photographed individual. Note that if the target user is not a local spectator, the information processing system 1 may determine the expenditure by referring to the purchase history.
For example, the information processing system 1 may use a personal attribute determiner that determines a vocalization attribute. The information processing system 1 may make this determination from the average degree of mouth opening over the entire period of the match or event.
For example, the information processing system 1 may use a personal attribute determiner that determines a "distracted viewing" attribute. The information processing system 1 may generate a personal attribute determiner that determines the distracted viewing attribute by assigning and learning labels indicating whether the user is watching while doing something else, such as watching the match while looking at a smartphone.
For example, the information processing system 1 may use a personal attribute determiner that determines a "favorite performer" attribute. The information processing system 1 may generate a personal attribute determiner that determines the favorite performer attribute by assigning and learning labels indicating whether the user supports a specific performer at a live event or the like.
For example, the information processing system 1 may use a personal attribute determiner that determines an empathy attribute. Since the content a listener wants to hear differs at, for example, lectures and speeches, the information processing system 1 may generate a personal attribute determiner that determines the empathy attribute by assigning and learning labels indicating whether the listener sympathizes with (believes in) the speaker's views.
For example, the information processing system 1 may use a personal attribute determiner that determines a sense-of-unity attribute. The information processing system 1 may generate a personal attribute determiner that determines the sense-of-unity attribute by assigning and learning labels indicating whether the spectator cheers together with other spectators (with routines or the like) at sports or live events.
For example, the information processing system 1 may use a personal attribute determiner that determines a party relationship attribute. The information processing system 1 may determine the relationship with the main figures, such as the bride and groom at a party such as a wedding reception (relative, friend, work colleague, etc.), from the seat position of the photographed individual. For viewers who did not attend, the information processing system 1 may accept the attribute as an input.
For example, the information processing system 1 may use a personal attribute determiner that determines a concentration attribute. For events with little audience reaction, such as classical music concerts, the information processing system 1 may generate a personal attribute determiner that determines the concentration attribute by assigning and learning labels indicating whether the spectator was able to watch with concentration throughout the event. When the personal attribute determiner generated in this way is used, fewer scenes are extracted for spectators (viewers) who cannot concentrate, and the duration of the highlight video can consequently be shortened.
For example, the information processing system 1 may use a personal attribute determiner that determines an emotional expression behavior attribute. For watching movies or plays, the information processing system 1 may generate a personal attribute determiner that determines the emotional expression behavior attribute by assigning and learning labels of the type of emotional expression behavior (crying, laughing, being scared, excitement, etc.). When the personal attribute determiner generated in this way is used, it becomes possible to handle cases in which, owing to differences in the amount of emotional expression behavior, the scenes a user wants to see in a highlight differ, such as moving scenes, comedy, horror, or action.
[1-6-3. Learning and inference related to the action index determiner]
Next, an example of learning and inference related to the action index determiner will be described. FIG. 13 is a diagram showing an example of learning and inference related to the action index determiner of the information processing system. Descriptions of points similar to those described above will be omitted as appropriate.
For example, the highlight generation server 100 defines objective variables from the development of the match for various indices related to user behavior, and performs regression analysis with the spectator feature data as explanatory variables, thereby generating the model M2, which is the action index determiner. The model M2 may be a regression equation that takes as input the user's feature data corresponding to each point in time during the event and outputs (calculates) a value indicating the degree of excitement at each point in time. In the following, determining the degree of excitement of each individual (user) is described as an example of action index determination.
For example, the highlight generation server 100 generates the model M2 by using the feature quantities of users with the home fan attribute as explanatory variables, defining an objective variable whose value changes from 0 to 1 when the home team scores, and executing the learning processing. The teacher data TD2 shown in FIG. 13 corresponds to a case in which the home team scores at time t1, and shows an example of teacher data in which the value of the objective variable changes from 0 to 1 at time t1.
Through the learning processing using the information described above, the highlight generation server 100 generates the model M2, which takes an individual's feature data as input and outputs information indicating the excitement of the user corresponding to the input data over the period of the event. For example, the highlight generation server 100 generates a model M2 that outputs information indicating how the user's degree of excitement (score) changes over the period of the event.
The highlight generation server 100 generates the regression equation of the index obtained as a result of the regression analysis as the action index determiner (for example, the model M2). As a result, an action index determiner is obtained that, when feature data in the form of time-series data is input, outputs time-series data of a score (the result of the regression equation) representing the degree of each index. The action index determiner takes an individual's feature data as input and outputs the individual's score for each index. The action index determiner may also take the feature data of a group, per attribute or as a whole, as input and output the score of each index as an average value.
Then, in the inference phase, the highlight generation server 100 performs inference processing using the model M2 generated in the learning phase. For example, the highlight generation server 100 inputs feature data as input data into the model M2, thereby causing the model M2 to output information indicating the degree of excitement of the user corresponding to the input data. In FIG. 13, the highlight generation server 100 inputs into the model M2 the feature data of a user (target user) whose index (degree of excitement, etc.) is unknown (the user to be inferred), thereby causing the model M2 to output time-series data of the index score for the target user.
The output data OD21 in FIG. 13 shows an example of the output of the model M2 for a home team user, in which the degree of excitement rises sharply at time t1 when the home team scores. The output data OD22 in FIG. 13 shows an example of the output of the model M2 for an away team user, in which the degree of excitement drops slightly at time t1, when the home team scores, that is, when the away team concedes. The output data OD23 in FIG. 13 shows an example of the output of the model M2 for a beginner user (for example, a user who is neither a home fan nor an away fan), in which the degree of excitement rises slightly at time t1 when the home team scores.
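The regression underlying the action index determiner can be sketched as follows. The sketch assumes per-frame feature vectors as explanatory variables and an objective variable that switches from 0 to 1 at the home-team scoring time t1, as in the teacher data TD2; ridge regression from scikit-learn is used as one possible regression method, and all sizes and values are illustrative placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge

T = 600                                  # frames covering the match period
t1 = 400                                 # the home team scores at frame t1

# Explanatory variables: feature time series of a home-fan spectator.
X_home_fan = np.random.rand(T, 16)
X_home_fan[t1:, 0] += 1.0                # e.g. a motion feature jumps after the goal

# Objective variable (teacher data TD2): 0 before the goal, 1 from the goal onward.
y = np.zeros(T)
y[t1:] = 1.0

model_m2 = Ridge().fit(X_home_fan, y)    # the fitted regression acts as model M2

# Inference: the regression yields a per-frame "excitement" score for a target user.
X_target = np.random.rand(T, 16)
excitement_scores = model_m2.predict(X_target)
print(excitement_scores.shape)           # (600,) -> a score time series like OD21-OD23
```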
As described above, for example, the highlight generation server 100 may combine this with attribute determination and generate time-series data of scores in which the degree of excitement is determined for each attribute. For example, the highlight generation server 100 may use the average score of the group for each attribute. For example, the highlight generation server 100 may generate a model M2 such that, when the home team scores, the score of home fans rises, the score of away fans does not rise, and beginners have an intermediate score.
Note that the action index determiner shown in FIG. 13 is merely one example, and the information processing system 1 may use action index determiners for various indices. For example, the information processing system 1 may use an action index determiner that determines the degree of concentration of each individual (user). In this case, the information processing system 1 may generate an action index determiner that determines the degree of concentration by using the user's feature quantities as explanatory variables, defining an objective variable that changes from 1 to 0 when the match is interrupted, and executing the learning processing.
The information processing system 1 may also use an action index determiner that determines the degree of disappointment of each individual (user). In this case, the information processing system 1 may generate an action index determiner that determines the degree of disappointment by using the user's feature quantities as explanatory variables, defining an objective variable that changes from 0 to 1 when a shot is missed, and executing the learning processing.
The information processing system 1 may also use an action index determiner that determines the degree of tension of each individual (user). In this case, the information processing system 1 may generate an action index determiner that determines the degree of tension by using the user's feature quantities as explanatory variables, defining an objective variable that becomes 1 when the score difference is close (for example, one point in the case of soccer), and executing the learning processing.
The information processing system 1 may also use an action index determiner that determines the degree of anger of each individual (user). In this case, the information processing system 1 may generate an action index determiner that determines the degree of anger by using the user's feature quantities as explanatory variables, defining an objective variable that changes from 0 to 1 when a mistake is made, and executing the learning processing.
The information processing system 1 may also use an action index determiner that determines the degree of boredom of each individual (user). In this case, the information processing system 1 may generate an action index determiner that determines the degree of boredom by using the user's feature quantities as explanatory variables, defining an objective variable that becomes 1 during periods when the match drags on, and executing the learning processing.
[1-6-4. Learning and inference related to the highlight scene predictor]
Next, an example of learning and inference related to the highlight scene predictor will be described. FIG. 14 is a diagram showing an example of learning and inference related to the highlight scene predictor of the information processing system. Descriptions of points similar to those described above will be omitted as appropriate.
FIG. 14 shows a case in which a highlight scene predictor is used to predict which periods of the event should be used as highlights.
In FIG. 14, data indicating the periods used as highlight scenes and the periods not used as highlight scenes (non-highlight periods) is prepared as the teacher data TD3. For example, the teacher data TD3 is data in which, over the period of the event, the periods used as highlight scenes take the value 1 and the non-highlight periods take the value 0. The teacher data TD3 may be extracted by hand or automatically generated by content analysis. For example, the teacher data TD3 is information indicating, for the event from which the feature quantities serving as the input data were collected, the periods that were used as highlight scenes in an actually produced highlight.
For example, the highlight generation server 100 uses manually extracted highlight scenes as the teacher data TD3 and generates labeled data in which the spectator feature data during highlight scene periods is labeled True and the spectator feature data outside highlight scene periods is labeled False. For example, the highlight generation server 100 generates the model M3 by performing supervised learning on the spectator feature data of each individual using the teacher data TD3. Any learning algorithm, such as a DNN or XGBoost, can be adopted as the learning algorithm of the highlight generation server 100. For example, the time-series data of the individual feature quantities of all spectators photographed at the event corresponding to the teacher data TD3 may be used as the input data for learning.
Note that the above is merely one example, and highlight scenes automatically extracted by analyzing the video and audio of the content, using the configuration indicated by the dotted lines at the lower left of FIG. 11, may be used as the teacher data TD3. The highlight generation server 100 may also generate a highlight scene predictor for each personal attribute by dividing the input spectator feature data to be learned by personal attribute and learning separately.
Through the learning processing using the information described above, the highlight generation server 100 generates the model M3, a highlight scene predictor that takes feature data, for example time-series data, as input and outputs time-series data of a score representing the degree of highlight-likeness, such as a likelihood, a confidence, or a moving average of determination values. For example, the model M3 takes an individual's feature data as input and outputs the individual's highlight-likeness score. The model M3 may also take the feature data of a group, per attribute or as a whole, as input and output the highlight-likeness score per attribute or for the whole group.
Then, in the inference phase, the highlight generation server 100 performs inference processing using the model M3 generated in the learning phase. For example, the highlight generation server 100 inputs feature data as input data into the model M3, thereby causing the model M3 to output time-series data of the highlight-likeness score corresponding to the user of the input data (output data OD31).
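A minimal sketch of the highlight scene predictor follows. It assumes per-frame spectator feature vectors labeled 1 inside the highlight periods of the teacher data TD3 and 0 elsewhere, and uses scikit-learn's gradient boosting classifier as a stand-in for the DNN or XGBoost learners; predict_proba then yields the per-frame highlight-likeness score time series, optionally smoothed with a moving average as mentioned above. All data and sizes are illustrative placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

T = 600
X = np.random.rand(T, 32)                # per-frame spectator feature vectors

# Teacher data TD3: frames 200-259 were used as a highlight scene (1), the rest were not (0).
y = np.zeros(T, dtype=int)
y[200:260] = 1

model_m3 = GradientBoostingClassifier().fit(X, y)

# Inference: a highlight-likeness score per frame, then smoothed with a moving average.
scores = model_m3.predict_proba(np.random.rand(T, 32))[:, 1]
smoothed = np.convolve(scores, np.ones(10) / 10, mode="same")
print(smoothed.shape)                    # (600,) -> a score time series like OD31
```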
Note that the highlight scene predictor shown in FIG. 14 is merely one example, and the information processing system 1 may use various highlight scene predictors.
[1-6-5. Example of functional configuration related to highlight generation in the information processing system]
The functions related to highlight generation in the information processing system 1 will now be described with reference to FIG. 15. FIG. 15 is a diagram showing an example of the functional configuration related to highlight generation in the information processing system. Descriptions of points similar to those described above will be omitted as appropriate. For example, the dashed line BS in FIG. 15 indicates a functional interface in the system in the same manner as the dashed line BS in FIG. 11, so its description is omitted as appropriate.
For example, at the edge viewing terminal 10 used by the highlight viewing user in FIG. 15, while the remote viewer is viewing the content in real time, the viewer is continuously photographed by an imaging device such as a camera (for example, the camera 14), and the footage is accumulated as viewer video data. In the information processing system 1, the image recognition results are also accumulated as viewer feature data. For example, in the information processing system 1, these are stored and accumulated as information linked to the content viewed in real time.
In the information processing system 1, the feature quantities obtained by photographing and performing image recognition on the set of local spectators (or remote viewers) watching the content in real time are accumulated as real-time spectator feature data. For example, in the information processing system 1, the real-time spectator feature data is stored and accumulated as information linked to the content. The configuration for accumulating the real-time spectator feature data is, for example, the same as the configuration for accumulating the spectator feature data of the real-time spectating users in FIG. 11.
In the information processing system 1, when highlight viewing starts, the individual highlight viewing user is photographed, and the personal attribute is determined by the personal attribute determiner from the viewer feature data obtained by image recognition. For example, the highlight generation server 100 determines the personal attribute using the model M1, which is the personal attribute determiner. In the information processing system 1, the personal attribute result determined by the personal attribute determiner and the real-time viewing determination result determined by the real-time viewing determiner described later are sent to the highlight generation control unit (corresponding, for example, to the generation unit 134 of the highlight generation server 100).
Note that the real-time viewing determiner may be replaced by real-time viewing determination using the action index determiner. In this case, the highlight generation server 100 performs the real-time viewing determination using, for example, the model M2, which is the action index determiner. For example, the highlight generation server 100 determines that a period in which the score output by the model M2 is equal to or greater than a predetermined threshold is a period in which the user was viewing in real time.
In the information processing system 1, the highlight generation control unit specifies the feature data to be input to the highlight scene predictor based on the personal attribute result of the highlight viewing user and the real-time viewing determination result, which are input when highlight viewing starts. Although the details of the control flow will be described later, for example, the highlight generation server 100 determines the feature data to be input to the model M3, which is the highlight scene predictor. The highlight scene predictor predicts highlight scenes from the input feature data specified by the highlight generation control unit. For example, the highlight generation server 100 inputs the specified feature data into the model M3 and predicts highlight scenes based on the output of the model M3; for example, the highlight generation server 100 determines that scenes corresponding to periods in which the score output by the model M3 is equal to or greater than a predetermined threshold are to be highlight scenes.
Note that the highlight scene predictor may be replaced by scene prediction using the action index determiner. In this case, the highlight generation server 100 performs the scene prediction using, for example, the model M2, which is the action index determiner. For example, the highlight generation server 100 may determine that scenes corresponding to periods in which the score output by the model M2 is equal to or greater than a predetermined threshold are to be highlight scenes.
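Turning such a score time series (from the model M3, or from the model M2 when it substitutes for the predictor) into highlight scenes by thresholding might look like the following sketch; the frame rate, the threshold value, and the helper name are assumptions for illustration.

```python
import numpy as np

def scenes_from_scores(scores, threshold=0.7, fps=1.0):
    """Return (start_sec, end_sec) intervals in which the score stays at or above the threshold."""
    above = scores >= threshold
    scenes, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                                   # a candidate highlight scene begins
        elif not flag and start is not None:
            scenes.append((start / fps, i / fps))       # the scene ends when the score drops
            start = None
    if start is not None:                               # the score stayed high until the end
        scenes.append((start / fps, len(scores) / fps))
    return scenes

scores = np.array([0.1, 0.2, 0.9, 0.95, 0.8, 0.3, 0.75, 0.76, 0.2])
print(scenes_from_scores(scores))                       # [(2.0, 5.0), (6.0, 8.0)]
```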
Although details will be described later, in the information processing system 1, the position/angle estimation unit (corresponding, for example, to the generation unit 134 of the highlight generation server 100) estimates the optimal camera position and angle from the real-time spectator feature data for the highlight scenes predicted by the highlight scene predictor. In the information processing system 1, the highlight video generation unit (corresponding, for example, to the generation unit 134 of the highlight generation server 100) generates highlight video data from the content video (audio) data and the viewer video data, based on the scene prediction results and the camera position and angle estimation results. The highlight video data is presented to the highlight viewing user through an image output device such as a monitor (for example, the display unit 15) and an audio output device such as a speaker (for example, the audio output unit 13) of the edge viewing terminal 10.
Here, an example of real-time viewing determination will be described, taking processing with a real-time viewing determiner as an example. The real-time viewing determiner determines whether the highlight viewer viewed the target content in real time. For example, the highlight generation server 100 performs the real-time viewing determination using a real-time viewing determiner (model) stored in the storage unit 120. Note that this real-time viewing determination is merely one example; as long as real-time viewing can be determined, it may be determined by any processing, such as the processing using the action index determiner described above.
For example, in the information processing system 1, if data from real-time viewing of the content targeted for highlight viewing exists in the accumulated viewer feature data, the period in which that data exists is determined to be a period of real-time viewing, and if no such data exists, that period is determined to be a period without real-time viewing. In this case, if data from real-time viewing of the content targeted for highlight viewing exists in the storage unit 120, the highlight generation server 100 determines that the user was viewing in real time during the period in which that data exists; if no such data exists in the storage unit 120, the highlight generation server 100 determines that the user was not viewing in real time during that period.
As described above, in the information processing system 1, the action index determiner may also be used as the real-time viewing determiner. In this case, the highlight generation server 100 determines, for example, that a period is one of real-time viewing if the value of an index such as the degree of concentration during real-time viewing of the content targeted for highlight viewing is equal to or greater than a threshold, and that there is no real-time viewing if the value of the index is less than the threshold.
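The two variants of real-time viewing determination described above (presence of stored viewer feature data, or a threshold on an index such as the degree of concentration) can be sketched as follows; the data shapes, names, and threshold are hypothetical and chosen only for illustration.

```python
def viewed_periods_by_data(viewer_feature_data, content_id):
    """Variant 1: a period counts as real-time viewing if viewer feature data exists for it."""
    return [period for period, data in viewer_feature_data.get(content_id, {}).items() if data]

def viewed_periods_by_concentration(concentration_scores, threshold=0.5):
    """Variant 2: a period counts as real-time viewing if a concentration-type index
    from the action index determiner is at or above a threshold."""
    return [t for t, score in enumerate(concentration_scores) if score >= threshold]

stored = {"match_42": {"first_half": [0.3, 0.4], "second_half": []}}
print(viewed_periods_by_data(stored, "match_42"))        # ['first_half']
print(viewed_periods_by_concentration([0.8, 0.2, 0.6]))  # [0, 2]
```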
For example, the information processing system 1 may determine that real-time viewing took place even when no data from real-time viewing of the content targeted for highlight viewing exists in the accumulated viewer feature data. In this case, for example, if a person identified (by face recognition) as the same individual as the highlight viewer exists in the real-time spectator feature data, the highlight generation server 100 may determine that the period of that person's presence was real-time viewing as a local spectator. This allows the information processing system 1 to handle usage scenarios in which a user watches on the edge viewing terminal after having watched the event at the venue.
For example, in the information processing system 1, when it is determined that real-time viewing took place as a local spectator, the video data and image recognition feature data of the corresponding person captured at the venue may be used as the viewer video data and the viewer feature data.
[1-7. Example of the processing flow for highlight generation]
Next, the processing related to highlight generation in the information processing system 1 will be described with reference to FIG. 16. FIG. 16 is a flowchart showing the processing procedure related to highlight generation. In the following, a case in which the information processing system 1 performs the processing is described as an example, but the processing shown in FIG. 16 may be performed by any of the devices included in the information processing system 1, such as the highlight generation server 100, the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60, depending on the device configuration.
 以下では、図16に示す処理フローの流れの概要を説明した後、各処理についての詳細を説明する。図16では、情報処理システム1は、リアルタイムに視聴した期間に応じて処理を分岐させる(ステップS201)。 Below, after explaining the outline of the processing flow shown in FIG. 16, the details of each process will be explained. In FIG. 16, the information processing system 1 branches the process according to the viewing period in real time (step S201).
 情報処理システム1は、ユーザが途中からリアルタイムに視聴したり、途中までリアルタイムに視聴したりする等、部分的にリアルタイム観戦を行っている場合（図16中の中央）、そのユーザについてはコンテンツをリアルタイム視聴した期間とそれ以外の期間とに分離する（ステップS202）。例えば、情報処理システム1は、部分的にリアルタイム観戦を行っている場合、ユーザのデータを用いて、イベントをリアルタイム視聴した期間を特定することにより、リアルタイム視聴した期間と、それ以外の期間とに分離する。そして、情報処理システム1は、リアルタイム観戦した期間についてはステップS203に示す処理を行い、リアルタイム観戦してない期間についてはステップS204~ステップS205に示す処理を行う。 When a user watches the game in real time only partially, for example starting partway through or stopping partway through (the middle branch in FIG. 16), the information processing system 1 separates, for that user, the period during which the content was viewed in real time from the other periods (step S202). For example, in the case of partial real-time watching, the information processing system 1 uses the user's data to identify the period during which the event was viewed in real time, thereby separating the real-time viewing period from the other periods. The information processing system 1 then performs the processing shown in step S203 for the period watched in real time, and the processing shown in steps S204 to S205 for the periods not watched in real time.
 情報処理システム1は、ユーザがフルタイム観戦を行っている場合、そのユーザについてはハイライトシーン予測器の入力特徴量データを視聴者自身のリアルタイム視聴時の特徴量とする（ステップS203）。例えば、情報処理システム1は、ユーザが会場での現地観客を含むフルタイム観戦を行っている場合、そのユーザについてはモデルM3の入力特徴量データを視聴者自身のリアルタイム視聴時の特徴量とする。 When the user has watched the game full-time, the information processing system 1 uses, as the input feature amount data for the highlight scene predictor for that user, the viewer's own feature amounts from real-time viewing (step S203). For example, when the user has watched full-time, including as an on-site spectator at the venue, the information processing system 1 uses the viewer's own feature amounts from real-time viewing as the input feature amount data of the model M3 for that user.
 また、情報処理システム1は、ユーザがリアルタイムで全く見ていない場合、そのユーザについては、視聴者個人の属性判定結果を参照し（ステップS204）、ハイライトシーン予測器の入力特徴量データをリアルタイム観客特徴量データ内の視聴者に類似するリアルタイム観客の集合の特徴量とする（ステップS205）。例えば、情報処理システム1は、ユーザが後からハイライトのみ視聴する場合を含めリアルタイムで全く見ていない場合、そのユーザについてはモデルM3の入力特徴量データをリアルタイム観客特徴量データ内のそのユーザに類似するリアルタイム観客の集合の特徴量とする。 When the user has not viewed the event in real time at all, the information processing system 1 refers to the viewer's personal attribute determination result for that user (step S204), and uses, as the input feature amount data for the highlight scene predictor, the feature amounts of the set of real-time spectators in the real-time spectator feature amount data who are similar to the viewer (step S205). For example, when the user has not viewed the event in real time at all, including the case where the user later watches only the highlights, the information processing system 1 uses, as the input feature amount data of the model M3 for that user, the feature amounts of the set of real-time spectators in the real-time spectator feature amount data who are similar to that user.
 そして、情報処理システム1は、ハイライトシーン予測器へ入力特徴量データを指定してシーン抽出を指示する(ステップS206)。例えば、情報処理システム1は、ステップS204、ステップS205により決定した特徴量データをモデルM3へ入力し、モデルM3の出力に基づいてハイライトを生成する。 Then, the information processing system 1 designates the input feature amount data to the highlight scene predictor and instructs scene extraction (step S206). For example, the information processing system 1 inputs the feature amount data determined in steps S204 and S205 to the model M3, and generates highlights based on the output of the model M3.
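 The branching of steps S201 to S206 can be illustrated with the following sketch, which only shapes the input handed to the highlight scene predictor (model M3); the array layouts and the simple mean aggregation over similar spectators are assumptions made for illustration.

    import numpy as np

    def build_predictor_input(viewed_periods, own_features, similar_audience_features):
        """Choose, per second, the feature vector fed to the highlight scene predictor (model M3).

        viewed_periods:            list of (start, end) seconds judged as real-time viewing.
        own_features:              (T, D) array of the viewer's own real-time viewing features.
        similar_audience_features: (T, N, D) array of features of real-time spectators whose
                                   attributes are similar to the viewer.
        """
        T = similar_audience_features.shape[0]
        viewed = np.zeros(T, dtype=bool)
        for start, end in viewed_periods:
            viewed[start:end] = True
        features = np.empty_like(own_features)
        # Step S203: personalization, using the viewer's own features for viewed periods.
        features[viewed] = own_features[viewed]
        # Steps S204-S205: attribute optimization for the remaining periods, here a simple
        # mean over the similar real-time spectators.
        features[~viewed] = similar_audience_features[~viewed].mean(axis=1)
        return features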
 上述した処理により、情報処理システム1は、ハイライト生成を行う。例えば、情報処理システム1は、視聴者自身のリアルタイム視聴時の特徴量をハイライトシーン予測器の入力特徴量データとする場合（個人化の場合）、ハイライトシーン予測のスコア・継続時間の高いシーンを閾値もしくはランキングで選択し、ハイライト動画を生成してもよい。例えば、情報処理システム1は、生成するハイライト動画には視聴者自身の該当シーンのリアルタイム観戦時の映像をワイプ表示してもよい。 Through the above-described processing, the information processing system 1 generates highlights. For example, when the viewer's own feature amounts at the time of real-time viewing are used as the input feature amount data for the highlight scene predictor (the personalization case), the information processing system 1 may select scenes whose highlight scene prediction scores and durations are high, by a threshold or by ranking, and generate a highlight video. For example, the information processing system 1 may wipe-display, in the generated highlight video, the viewer's own video captured while watching the corresponding scene in real time.
 この点について、図17を用いて説明する。図17は、ハイライトへのワイプの重畳表示の一例を示す図である。例えば、図17は、ハイライト動画への視聴者自身の該当シーンのリアルタイム観戦時の映像をワイプ表示の一例を示す。図17では、バスケットボールの試合のハイライト動画であるコンテンツCT21に、視聴者自身の該当シーンのリアルタイム観戦時の映像が配置されるワイプWP1を重畳表示した場合を示す。例えば、ハイライト生成サーバ100は、ワイプWP1が重畳表示されるコンテンツCT21をエッジ視聴端末10に提供する。ハイライト生成サーバ100から提供を受けたエッジ視聴端末10は、ワイプWP1が重畳表示されるコンテンツCT21を表示する。 This point will be explained using FIG. FIG. 17 is a diagram illustrating an example of superimposed display of wipes on highlights. For example, FIG. 17 shows an example of a wipe display of the video of the viewer's own corresponding scene in the highlight video when watching the game in real time. FIG. 17 shows a case where a wipe WP1, in which the viewer's own video of the corresponding scene in real time watching the game is superimposed on the content CT21, which is a highlight video of a basketball game, is displayed. For example, the highlight generation server 100 provides the edge viewing terminal 10 with the content CT21 on which the wipe WP1 is superimposed. The edge viewing terminal 10 provided by the highlight generation server 100 displays the content CT21 on which the wipe WP1 is superimposed.
 また、情報処理システム1は、視聴者に最も近い属性を持つリアルタイム観客の集合の特徴量をハイライトシーン予測器の入力特徴量データとする場合（属性最適化の場合）、ハイライト視聴開始時のカメラ画像解析により視聴者の個人属性を判定する。この場合、カメラ撮像時間により判定できる属性が異なってもよい。例えば、情報処理システム1は、判定できた個人属性が一つの場合は、リアルタイム観客特徴量データ内のその個人属性を持つリアルタイム観客の集合のハイライトスコアの時系列から予測する。例えば、情報処理システム1は、スコア・継続時間の高いシーンを閾値もしくはランキングで選択し、ハイライト動画を生成してもよい。 When the feature amounts of the set of real-time spectators having attributes closest to the viewer are used as the input feature amount data for the highlight scene predictor (the attribute optimization case), the information processing system 1 determines the viewer's personal attributes by analyzing the camera image at the start of highlight viewing. In this case, the attributes that can be determined may differ depending on the camera imaging time. For example, when only one personal attribute could be determined, the information processing system 1 makes the prediction from the time series of highlight scores of the set of real-time spectators having that personal attribute in the real-time spectator feature amount data. For example, the information processing system 1 may select scenes with high scores and long durations by a threshold or by ranking to generate a highlight video.
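 A sketch of this single-attribute, attribute-optimized selection is given below; the record layout, the threshold value, and the minimum scene length are illustrative assumptions.

    import numpy as np

    def attribute_optimized_scenes(spectators, attribute, threshold=0.7, min_len=5):
        """Select highlight scenes from the highlight-score time series of spectators
        who share a single determined attribute with the viewer.

        spectators: list of dicts like {"attrs": {"age": "30s"}, "scores": (T,) array}.
        attribute:  (key, value) pair determined for the viewer, e.g. ("age", "30s").
        """
        key, value = attribute
        matching = [np.asarray(s["scores"], dtype=float)
                    for s in spectators if s["attrs"].get(key) == value]
        if not matching:
            return []
        series = np.mean(matching, axis=0)      # score time series of the matching set
        above = series >= threshold             # threshold-based selection
        scenes, start = [], None
        for t, flag in enumerate(above):
            if flag and start is None:
                start = t
            elif not flag and start is not None:
                if t - start >= min_len:        # keep only scenes with sufficient duration
                    scenes.append((start, t))
                start = None
        if start is not None and len(series) - start >= min_len:
            scenes.append((start, len(series)))
        return scenes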
 また、判定できた個人属性が複数の場合、情報処理システム1は、様々な情報を適宜用いて、ハイライト動画を生成してもよい。この点について以下例示を記載する。 In addition, when there are a plurality of determined personal attributes, the information processing system 1 may use various information as appropriate to generate a highlight video. An example of this point will be described below.
 例えば、情報処理システム1は、個人属性のAND(論理積)を取り、判定した全ての個人属性を持つリアルタイム観客の集合のハイライトスコアの時系列から、ハイライトシーンを予測してもよい。例えば、情報処理システム1は、視聴者の判別できた属性が年代「30代」及び性別「男性」の2つの属性である場合、リアルタイム観客の中から年代「30代」かつ性別「男性」の集合のハイライトスコアの時系列から、ハイライトシーンを予測してもよい。 For example, the information processing system 1 may take AND (logical product) of personal attributes and predict highlight scenes from the time series of highlight scores of a set of real-time spectators who have all the determined personal attributes. For example, if the attributes that the information processing system 1 has been able to distinguish from the viewer are two attributes of the age “30s” and the gender “male”, the information processing system 1 selects the age “30s” and the gender “male” from the real-time audience. A highlight scene may be predicted from the time series of the highlight scores of the set.
 例えば、情報処理システム1は、以下の式(1)を用いてもよい。 For example, the information processing system 1 may use the following formula (1).
（式(1)：原文では数式画像として掲載 / Equation (1), presented as an equation image in the original publication）
 式(1)のAは、視聴者自身の個人属性結果のもっともらしさのスコアを示す。また、式(1)のHは、該当する属性を持つリアルタイム観客の集合のハイライトスコアを示す。式(1)の以下のSは結合スコアを示し、情報処理システム1は、結合スコアSの時系列から、ハイライトシーンを予測してもよい。 A_i in Equation (1) indicates the plausibility score of the viewer's own personal attribute determination result. H_i in Equation (1) indicates the highlight score of the set of real-time spectators having the corresponding attribute. S_j in Equation (1) indicates the combined score, and the information processing system 1 may predict highlight scenes from the time series of the combined score S_j.
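 From these definitions, a plausible reconstruction of Equation (1), whose body appears only as an image in the original publication, is the plausibility-weighted sum below; the exact form is an assumption made here for illustration.

    % Assumed reconstruction of Equation (1): combined score at time j,
    % weighting each attribute's highlight score H_{i,j} by the plausibility A_i
    % that the viewer has attribute i.
    S_j = \sum_{i} A_i \, H_{i,j}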
 例えば、情報処理システム1は、家族でテレビ視聴している場合等、視聴者が複数人いる場合、複数人の視聴者に共通する個人属性を持つリアルタイム観客の集合から、ハイライトシーンを予測してもよい。例えば、情報処理システム1は、共通する個人属性が無い場合は、式(1)Aを複数人の視聴者の個人属性結果のもっともらしさのスコアとして結合スコアSの時系列を算出し、ハイライトシーンを予測してもよい。このように、情報処理システム1においては、複数人で視聴することにより、シーン予測範囲が視聴者自身と異なる個人属性に広がる（一緒に視聴している人の個人属性まで広がる）ため、serendipityの効果が見込める。 For example, when there are multiple viewers, such as when a family is watching television together, the information processing system 1 may predict highlight scenes from the set of real-time spectators having the personal attributes common to the multiple viewers. For example, when there is no common personal attribute, the information processing system 1 may calculate the time series of the combined score S_j by using A_i in Equation (1) as the plausibility scores of the personal attribute results of the multiple viewers, and predict highlight scenes from it. In this way, viewing by multiple people widens the scene prediction range to personal attributes different from the viewer's own (it extends to the personal attributes of the people watching together), so a serendipity effect can be expected in the information processing system 1.
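 The multi-viewer case can be sketched as follows; the attribute records, the fallback rule used when no attribute is shared, and the max-based weighting are assumptions used only to illustrate applying the combination of Equation (1) across viewers.

    import numpy as np

    def multi_viewer_combined_scores(viewers, attr_highlight_series):
        """Combine attribute-wise highlight score series for one or more viewers.

        viewers:               list of dicts mapping attribute -> plausibility score A_i,
                               e.g. [{"30s": 0.8, "male": 0.9}, {"10s": 0.7, "female": 0.95}]
        attr_highlight_series: dict mapping attribute -> (T,) array of highlight scores H_i.
        """
        common = set.intersection(*(set(v) for v in viewers))
        if common:
            # Attributes shared by all viewers: use only those spectator sets.
            weights = {a: float(np.mean([v[a] for v in viewers])) for a in common}
        else:
            # No common attribute: fall back to a plausibility-weighted combination over
            # every attribute seen for any viewer.
            weights = {}
            for v in viewers:
                for a, score in v.items():
                    weights[a] = max(weights.get(a, 0.0), score)
        T = len(next(iter(attr_highlight_series.values())))
        combined = np.zeros(T)
        for a, w in weights.items():
            combined += w * np.asarray(attr_highlight_series.get(a, np.zeros(T)))
        return combined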
[1-8.処理例]
 ここから、その他の処理例について説明する。なお、上述した内容と同様の点については適宜説明を省略する。
[1-8. Processing example]
Other processing examples will now be described. It should be noted that descriptions of the same points as those described above will be omitted as appropriate.
[1-8-1.行動指標判定器によるハイライトシーン予測]
 ここから、上述した行動指標判定器によるハイライトシーン予測の一例について説明する。以下に例示を示すように、情報処理システム1は、視聴者自身もしくは視聴者に最も近い属性を持つリアルタイム観客集合の特徴量を入力とし、行動指標判定器によりハイライトシーン予測を行ってもよい。
[1-8-1. Highlight scene prediction by action index determiner]
From here, an example of highlight scene prediction by the action index determination device described above will be described. As exemplified below, the information processing system 1 may receive as input the feature amount of the viewer himself or a real-time audience group having attributes closest to the viewer, and perform highlight scene prediction using an action index determiner. .
 例えば、情報処理システム1は、盛り上がり度のみを使用し、盛り上がり度が閾値以上の期間をハイライトシーンに決定してもよい。例えば、情報処理システム1は、図13の例では、ホームファン属性(のユーザ)については、ホーム得点時のシーンをハイライトシーンに決定する。 For example, the information processing system 1 may use only the degree of excitement and determine a period in which the degree of excitement is equal to or greater than a threshold as a highlight scene. For example, in the example of FIG. 13, the information processing system 1 determines the scene at the time of home scoring as a highlight scene for (a user of) the home fan attribute.
 また、情報処理システム1は、盛り上がり度や集中度などのPositive指標（第1の指標）と、がっかり度や飽きている度等のNegative指標（第2の指標）との複数の指標を用いて、ハイライトシーンを決定してもよい。例えば、情報処理システム1は、式(2)を用いて、ハイライトシーンを決定してもよい。 The information processing system 1 may also determine highlight scenes using a plurality of indices, namely positive indices (first indices) such as the degree of excitement and the degree of concentration, and negative indices (second indices) such as the degree of disappointment and the degree of boredom. For example, the information processing system 1 may determine highlight scenes using Equation (2).
（式(2)：原文では数式画像として掲載 / Equation (2), presented as an equation image in the original publication）
 式(2)のkは各行動指標の重み係数を示す。重み係数kは、Positive指標は正の値として定義され、Negative指標は負の値として定義される。式(2)のBは、各行動指標の入力特徴量に対する行動指標判定器の判定スコアを示す。式(2)の以下のSは結合スコアを示し、情報処理システム1は、結合スコアSの時系列から、ハイライトシーンを予測してもよい。 k_i in Equation (2) indicates the weighting coefficient for each behavior index; the weighting coefficient k_i is defined as a positive value for a positive index and as a negative value for a negative index. B_i in Equation (2) indicates the determination score of the behavior index determiner for the input feature amounts of each behavior index. S_t in Equation (2) indicates the combined score, and the information processing system 1 may predict highlight scenes from the time series of the combined score S_t.
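 Equation (2), which likewise appears only as an image, can plausibly be read from these definitions as the signed weighted sum below; again, the exact form is an assumption.

    % Assumed reconstruction of Equation (2): signed combination of the
    % behavior index scores, with k_i > 0 for positive indices and k_i < 0
    % for negative indices.
    S_t = \sum_{i} k_i \, B_{i,t}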
 例えば、情報処理システム1は、上記のようなPositive指標（第1の指標）及びNegative指標（第2の指標）を符号重み付きで結合した結合スコアSを算出し、結合スコアSが閾値以上の期間をハイライトシーンに決定してもよい。 For example, the information processing system 1 may calculate a combined score S_t in which the positive indices (first indices) and negative indices (second indices) described above are combined with signed weights, and determine a period in which the combined score S_t is equal to or greater than a threshold as a highlight scene.
 なお、上述した処理は一例に過ぎず、情報処理システム1は、様々な情報を適宜用いてハイライトシーンを決定してもよい。また、例えば、情報処理システム1は、指標ごとにスコアが最も高い期間をシーンとして選択し、盛り上がり・集中・緊張・がっかり・怒りが多様に入って緩急・起伏のあるハイライトシーンを抽出してもよい。 Note that the above-described processing is merely an example, and the information processing system 1 may determine highlight scenes using various information as appropriate. Also, for example, the information processing system 1 may select, for each index, the period with the highest score as a scene, and thereby extract highlight scenes with variety and ebb and flow in which excitement, concentration, tension, disappointment, and anger are mixed.
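 Both selection styles can be sketched together as follows; the index names, weight values, and window length are illustrative assumptions.

    import numpy as np

    # Signed weights: positive for positive indices, negative for negative indices
    # (the values here are illustrative only).
    WEIGHTS = {"excitement": 1.0, "concentration": 0.8, "disappointment": -0.5, "boredom": -0.7}

    def combined_score(index_scores):
        """S_t as the signed weighted sum of the per-index determination scores B_i(t)."""
        T = len(next(iter(index_scores.values())))
        s = np.zeros(T)
        for name, b in index_scores.items():
            s += WEIGHTS.get(name, 0.0) * np.asarray(b, dtype=float)
        return s

    def varied_highlight_windows(index_scores, window=10):
        """Pick, for each index, the window with the highest total score, mixing ups and downs."""
        picks = {}
        for name, b in index_scores.items():
            b = np.asarray(b, dtype=float)
            sums = np.convolve(b, np.ones(window), mode="valid")  # sliding-window totals
            start = int(np.argmax(sums))
            picks[name] = (start, start + window)
        return picks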
[1-8-2.アングル推定例]
 ここから、アングル推定例について、図18を用いて説明する。図18は、ハイライトのアングル推定の一例を示す図である。図18では、情報処理システム1は、フォーカス位置CN1を中心として、フォーカス位置CN1から放射状に延びる8本の点線AG1~AG8で観客席を同じ角度となるエリアに8分割した場合を示す。例えば、情報処理システム1は、各エリアの盛り上がり度の平均値を算出し、最も盛り上がり度の高いエリアの方向を最適アングルとして推定する。例えば、情報処理システム1は、適切な位置やアングル等を推定するために用いる位置アングル推定器であるモデルM11を用いて、ハイライトシーンのアングル等を推定してもよい。
[1-8-2. Angle estimation example]
An example of angle estimation will now be described with reference to FIG. 18 . FIG. 18 is a diagram illustrating an example of highlight angle estimation. FIG. 18 shows a case where the information processing system 1 divides the audience seats into eight areas having the same angle with eight dotted lines AG1 to AG8 radially extending from the focus position CN1 centering on the focus position CN1. For example, the information processing system 1 calculates the average value of the degree of excitement in each area, and estimates the direction of the area with the highest degree of excitement as the optimum angle. For example, the information processing system 1 may estimate the angle and the like of the highlight scene using the model M11, which is a position and angle estimator used for estimating the appropriate position, angle and the like.
 例えば、情報処理システム1は、フォーカス位置を推定してもよい。例えば、情報処理システム1は、リアルタイム観客特徴量データの顔向きと視線方向の統計値より、ハイライトシーンのカメラのフォーカス位置を推定してもよい。また、例えば、情報処理システム1は、会場の観客（リアルタイム観客特徴量データ）をフォーカス位置に対する角度でエリア分割し、エリアごとの盛り上がり度から最適角度を推定してもよい。この場合、情報処理システム1は、最も盛り上がっているエリアからの角度を最適アングルと予測してもよい。 For example, the information processing system 1 may estimate the focus position. For example, the information processing system 1 may estimate the focus position of the camera for a highlight scene from the statistics of the face orientations and line-of-sight directions in the real-time spectator feature amount data. Also, for example, the information processing system 1 may divide the spectators at the venue (the real-time spectator feature amount data) into areas by angle with respect to the focus position, and estimate the optimum angle from the degree of excitement of each area. In this case, the information processing system 1 may predict the angle from the most excited area as the optimum angle.
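 A minimal sketch of the area division and angle selection is shown below; the seat coordinate format and the excitement field are assumptions, and the focus position is taken as given (in practice it would come from the gaze and face-direction statistics just described).

    import math
    import numpy as np

    def best_viewing_angle(spectators, focus_xy, n_areas=8):
        """Divide spectators into n_areas equal angular sectors around the focus position
        and return the direction (radians) of the sector with the highest mean excitement."""
        sums = np.zeros(n_areas)
        counts = np.zeros(n_areas)
        fx, fy = focus_xy
        for s in spectators:                      # s = {"pos": (x, y), "excitement": float}
            x, y = s["pos"]
            angle = math.atan2(y - fy, x - fx) % (2 * math.pi)
            area = int(angle / (2 * math.pi / n_areas)) % n_areas
            sums[area] += s["excitement"]
            counts[area] += 1
        means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
        best = int(np.argmax(means))              # sector with the highest average excitement
        return (best + 0.5) * (2 * math.pi / n_areas)   # centre direction of that sector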
 また、情報処理システム1は、アングル推定においては、ファン属性などのバイアスを除去してもよい。この場合、情報処理システム1は、例えばビギナー属性の観客の盛り上がり度からアングル等を推定してもよい。また、情報処理システム1は、ハイライトシーン抽出と同様に視聴者に最も近い属性を持つリアルタイム観客集合の盛り上がり度からアングル等を推定してもよい。 In addition, the information processing system 1 may remove biases such as fan attributes in angle estimation. In this case, the information processing system 1 may estimate the angle or the like from the excitement level of the spectators of the beginner attribute, for example. Further, the information processing system 1 may estimate the angle and the like from the degree of excitement of the real-time spectator group having the attribute closest to the viewer, similar to extracting the highlight scene.
 なお、上述した処理は一例に過ぎず、情報処理システム1は、様々な情報を適宜用いてアングル等を推定してもよい。例えば、情報処理システム1は、どのカメラからの映像を選択するかの判定に利用して、ハイライト動画生成のカメラワークを決定してもよい。例えば、情報処理システム1は、スポーツなどの自由視点映像を視聴する際の最適位置やアングルを提案してもよい。これにより、情報処理システム1では、位置とアングルを手動操作するのが難しい自由視点映像における最適なアングル等を提案することができる。 It should be noted that the above-described processing is merely an example, and the information processing system 1 may use various information as appropriate to estimate angles and the like. For example, the information processing system 1 may determine camerawork for generating a highlight video by using it to determine which camera to select the video from. For example, the information processing system 1 may propose the optimum position and angle for viewing free-viewpoint video such as sports. As a result, the information processing system 1 can propose the optimum angle and the like in the free-viewpoint video for which it is difficult to manually operate the position and angle.
[1-8-3.提示例]
 情報処理システム1は、上述した各種の処理により生成した情報を提示してもよい。以下では、情報の提示の一例について、図19を用いて説明する。図19は、観客に関する情報の提示の一例を示す図である。例えば、図19は、自由視点映像の観客映像の属性または盛り上がりのヒートマップとして提示するUI(ユーザインタフェイス)の一例を示す。
[1-8-3. Presentation example]
The information processing system 1 may present information generated by the various types of processing described above. An example of information presentation will be described below with reference to FIG. 19 . FIG. 19 is a diagram showing an example of presentation of information about spectators. For example, FIG. 19 shows an example of a UI (user interface) presented as a heat map of the attributes of spectator video of free viewpoint video or excitement.
 例えば、情報処理システム1のハイライト生成サーバ100は、コンテンツCT31を対象として、観客の盛り上がり度に応じて観客席にヒートマップを重畳表示したコンテンツCT32を生成する(ステップS31)。図19では、ハイライト生成サーバ100は、観客席のうち右側の観客席の部分が最も盛り上がっており、左に向かうにつれて盛り上がり度が低下することを示すヒートマップを重畳表示したコンテンツCT32を生成するそして、ハイライト生成サーバ100は、生成したコンテンツCT32をエッジ視聴端末10に提供する。また、ハイライト生成サーバ100は、ヒートマップのうち最も高い部分に任意の情報(図19では炎のアイコン)を配置してもよい。 For example, the highlight generation server 100 of the information processing system 1 generates a content CT32 in which a heat map is superimposed on the audience seats according to the degree of excitement of the audience, targeting the content CT31 (step S31). In FIG. 19, the highlight generation server 100 generates a content CT32 superimposed with a heat map indicating that the audience seats on the right side of the audience seats are the most exciting, and the degree of excitement decreases toward the left. Then, the highlight generation server 100 provides the edge viewing terminal 10 with the generated content CT32. Also, the highlight generation server 100 may place arbitrary information (a flame icon in FIG. 19) in the highest portion of the heat map.
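 How such an overlay could be assembled from per-seat excitement values is sketched below; the seat-to-pixel mapping, the red colouring, and the blending weights are illustrative assumptions rather than the disclosed UI.

    import numpy as np

    def excitement_heatmap(frame, seats, radius=15, alpha=0.5):
        """Blend a red heat map over a venue frame; brighter red = higher excitement.

        frame: (H, W, 3) uint8 image of the venue (BGR channel order assumed).
        seats: list of dicts like {"px": (x, y), "excitement": float in [0, 1]}.
        """
        h, w, _ = frame.shape
        heat = np.zeros((h, w), dtype=float)
        yy, xx = np.mgrid[0:h, 0:w]
        for seat in seats:
            x, y = seat["px"]
            mask = (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2
            heat[mask] = np.maximum(heat[mask], seat["excitement"])
        overlay = frame.astype(float)
        overlay[..., 2] = np.minimum(255, overlay[..., 2] + 255 * heat)   # boost the red channel
        out = (1 - alpha * heat[..., None]) * frame + (alpha * heat[..., None]) * overlay
        return out.astype(np.uint8)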
 例えば、情報処理システム1のエッジ視聴端末10は、自由視点映像の観客映像の属性または盛り上がりのヒートマップを表示する。図19では、エッジ視聴端末10は、観客席を観客の盛り上がり度に応じてヒートマップを表示する。 For example, the edge viewing terminal 10 of the information processing system 1 displays the attribute of the spectator video of the free viewpoint video or the heat map of the excitement. In FIG. 19, the edge viewing terminal 10 displays a heat map of the spectator seats according to the degree of excitement of the spectators.
 なお、上述した処理は一例に過ぎず、情報処理システム1は、様々な情報を提示してもよい。例えば、情報処理システム1は、ファン属性を色で示し、盛り上がり度をその透過度で示すヒートマップを提示してもよい。また、情報処理システム1は、例えば閾値を超えた箇所にアイコン等を配置することにより、コンテンツを演出表示してもよい。 It should be noted that the processing described above is merely an example, and the information processing system 1 may present various types of information. For example, the information processing system 1 may present a heat map that indicates fan attributes with colors and the degree of excitement with transparency. In addition, the information processing system 1 may perform presentation display of the content, for example, by arranging an icon or the like at a location exceeding the threshold value.
[1-9.応用例・変形例・効果等]
 ここで、上述した内容の応用例、変形例、効果等について記載する。情報処理システム1は、会場現地への来場者(フルタイムでのリアルタイム視聴)向けに個人化ハイライト動画を販売してもよい。例えば、情報処理システム1は、遅れて到着などによりユーザが退席している期間があれば、その期間は属性最適化ハイライトを用いてもよい。
[1-9. Application examples, modifications, effects, etc.]
Here, application examples, modifications, effects, etc. of the above contents will be described. The information processing system 1 may sell personalized highlight videos to visitors (full-time real-time viewing) at the site of the venue. For example, the information processing system 1 may use attribute optimization highlighting during a period in which the user is absent due to late arrival or the like.
 また、情報処理システム1は、カメラ搭載テレビ、PC等での視聴時に(リアルタイム視聴無し)属性最適化ハイライト動画を提供してもよい。情報処理システム1は、ハイライト視聴時にも個人属性判定を行って保存されている個人属性判定値の更新(修正または追加等)を行ってもよい。 In addition, the information processing system 1 may provide an attribute-optimized highlight video when viewed on a camera-equipped TV, PC, or the like (without real-time viewing). The information processing system 1 may also perform personal attribute determination during highlight viewing and update (correction, addition, or the like) the stored personal attribute determination value.
 また、情報処理システム1は、途中から(途中まで)リアルタイムで見た場合は、見てない期間を属性最適化ハイライト、見た期間を個人化ハイライトで生成し結合してもよい。情報処理システム1は、途中から見て追いかけ再生する時に、見てない前半部分を属性最適化ハイライトで提示してもよい。 In addition, when viewing from the middle (to the middle) in real time, the information processing system 1 may generate attribute optimization highlights for the periods not viewed and personalized highlights for the periods viewed, and combine them. The information processing system 1 may present the unwatched first half with attribute optimization highlights when playing back after watching from the middle.
 また、情報処理システム1は、自由視点コンテンツの付加機能として、推定アングルによる自動ハイライト再生を提供してもよい。情報処理システム1は、人手によるハイライト編集のサポートツールとして、抽出したハイライトシーンとアングル推定によるカメラワークの提案を行ってもよい。 In addition, the information processing system 1 may provide automatic highlight reproduction based on an estimated angle as an additional function of free-viewpoint content. As a support tool for manual highlight editing, the information processing system 1 may propose camera work based on extracted highlight scenes and angle estimation.
 上述した情報処理システム1により、スポーツやライブの種別によらない汎用化されたシーン抽出エンジンとして流用することが出来るため、多種多様なコンテンツに展開する事が可能となる。上述した情報処理システム1により、スポーツの試合自体やライブイベントのコンテンツ側のデータを使用せず、観客側の撮影データのみを入力とした解析アルゴリズムとなるため、特定のスポーツの試合やイベントの観客行動で学習したアルゴリズムを、他の種別のコンテンツにも適用可能となる。 The information processing system 1 described above can be reused as a generalized scene extraction engine that does not depend on the type of sport or live performance, and can therefore be deployed for a wide variety of content. Because the information processing system 1 described above is an analysis algorithm that takes only spectator-side captured data as input, without using content-side data of the sports game itself or of the live event, an algorithm trained on spectator behavior for a specific sports game or event can also be applied to other types of content.
 上述した情報処理システム1により、アルゴリズムの汎用性が高く、低コストで他の種別のコンテンツにも適用可能となる。上述した情報処理システム1により、歓声などを使った行動指標判定やハイライトシーン種別に対し、画像での個人識別により全体集合だけでなく個人や属性ごとの判定や予測が可能となる。上述した情報処理システム1により、個人の属性および好みに合ったハイライトシーンを抽出し動画を生成する事が可能となる。 With the information processing system 1 described above, the algorithm has high versatility and can be applied to other types of content at low cost. With the information processing system 1 described above, it is possible to perform determination and prediction not only for the entire group but also for each individual and attribute by individual identification in images for action index determination and highlight scene types using cheers and the like. With the information processing system 1 described above, it is possible to extract highlight scenes that match individual attributes and tastes and generate moving images.
 上述した情報処理システム1により、ハイライトの視聴者自身が実際にリアルタイム視聴時に盛り上がったシーンを自身のその時の映像と共に見ることができるようになり、イベントに参加した思い出の振り返り動画として視聴することができる。上述した情報処理システム1により、視聴者の属性にあったハイライト動画が生成されることで、視聴者自身の好みに合ったハイライト動画を視聴者に視聴させることができる。 With the information processing system 1 described above, the viewer of the highlight can actually see the scene that was exciting during the real-time viewing together with the video of the viewer at that time, and can watch it as a retrospective video of the memories of participating in the event. can be done. The information processing system 1 described above generates a highlight video that matches the attributes of the viewer, so that the viewer can be made to view the highlight video that matches the viewer's own taste.
 上述した情報処理システム1により、行動指標判定を用いたハイライトシーン予測(各指標のスコアが最も高いシーンを選出)により、ストーリー性のあるハイライト動画を視聴できる。また、上述した情報処理システム1により、Negative指標によるシーン抽出も行われることで、serendipityの効果を見込むことができる。 With the information processing system 1 described above, a highlight video with a storyline can be viewed by highlight scene prediction (selecting the scene with the highest score for each index) using action index determination. In addition, the above-described information processing system 1 also performs scene extraction based on the Negative index, so that the effect of serendipity can be expected.
 上述した情報処理システム1により、スポーツの会場などの現地のみならず、リモート視聴環境(カメラ搭載テレビやWebカメラ付きPCなど)においても、個人属性に最適化されたシーン抽出が適用できる。 With the information processing system 1 described above, scene extraction optimized for personal attributes can be applied not only at sports venues, but also in remote viewing environments (camera-equipped TVs, PCs with web cameras, etc.).
 上述した情報処理システム1により、アングル、位置推定により、ハイライトシーン抽出と同様に他の種別のコンテンツにも適用可能となり汎用性が高く、見たいアングルの個人差・属性差にも対応可能となる。上述した情報処理システム1により、自由視点映像コンテンツへの位置/アングル提案によりコンテンツとしての付加価値を向上させることができる。上述した情報処理システム1により、データ収集、学習、解析アルゴリズムの改善のサイクルで継続的に性能を向上させることができる。 The above-described information processing system 1 can be applied to other types of content by estimating angles and positions in the same way as highlight scene extraction, making it highly versatile and capable of responding to individual differences and attribute differences in viewing angles. Become. With the information processing system 1 described above, it is possible to improve the added value of the content by proposing a position/angle to the free-viewpoint video content. With the information processing system 1 described above, performance can be continuously improved through cycles of data collection, learning, and analysis algorithm improvement.
[2.その他の実施形態]
 上述した各実施形態に係る処理は、上記各実施形態や変形例以外にも種々の異なる形態(変形例)にて実施されてよい。
[2. Other embodiments]
The processes according to the above-described embodiments may be implemented in various different forms (modifications) other than the above-described embodiments and modifications.
[2-1.その他の構成例]
 上記の情報処理システム1の装置構成は一例に過ぎず、情報処理システム1における機能の分割は任意の態様が採用可能である。例えば、エッジ視聴端末10、コンテンツ配信サーバ50または観客映像収集サーバ60が、ハイライト生成サーバ100のようにハイライトを生成する情報処理装置であってもよい。すなわち、上述したハイライト生成サーバ100の機能を、エッジ視聴端末10、コンテンツ配信サーバ50または観客映像収集サーバ60のいずれかが有してもよい。この場合、情報処理システム1は、ハイライト生成サーバ100に代えて、エッジ視聴端末10、コンテンツ配信サーバ50または観客映像収集サーバ60等の各装置から情報を収集し、各装置へ情報を提供する情報提供サーバを含んでもよい。
[2-1. Other configuration examples]
The device configuration of the information processing system 1 described above is merely an example, and any form of division of functions in the information processing system 1 can be adopted. For example, the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60 may be an information processing device that generates highlights like the highlight generation server 100. FIG. That is, any one of the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60 may have the function of the highlight generation server 100 described above. In this case, the information processing system 1 collects information from each device such as the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60 instead of the highlight generation server 100, and provides the information to each device. An information providing server may be included.
 例えば、上述したハイライト生成サーバ100の機能を、エッジ視聴端末10が有する場合、エッジ視聴端末10は、記憶部120に記憶される情報を保有し、学習部132、画像処理部133、生成部134の機能を有してもよい。エッジ視聴端末10は、情報提供サーバ、コンテンツ配信サーバ50、及び観客映像収集サーバ60から各種情報を取得し、取得した情報を用いてハイライトを生成してもよい。 For example, when the edge viewing terminal 10 has the functions of the highlight generation server 100 described above, the edge viewing terminal 10 may hold the information stored in the storage unit 120 and have the functions of the learning unit 132, the image processing unit 133, and the generation unit 134. The edge viewing terminal 10 may acquire various types of information from the information providing server, the content distribution server 50, and the spectator video collection server 60, and generate highlights using the acquired information.
 なお、上述した構成は一例であり、上述したハイライトに関するサービスを提供可能であれば、情報処理システム1は、どのような機能の分割態様であってもよく、どのような装置構成であってもよい。 Note that the above-described configuration is an example, and the information processing system 1 may adopt any division of functions and any device configuration as long as the highlight-related service described above can be provided.
[2-2.その他]
 また、上記各実施形態において説明した各処理のうち、自動的に行われるものとして説明した処理の全部または一部を手動的に行うこともでき、あるいは、手動的に行われるものとして説明した処理の全部または一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。例えば、各図に示した各種情報は、図示した情報に限られない。
[2-2. others]
Further, among the processes described in each of the above embodiments, all or part of the processes described as being performed automatically can be performed manually, or the processes described as being performed manually can be performed manually. can also be performed automatically by known methods. In addition, information including processing procedures, specific names, various data and parameters shown in the above documents and drawings can be arbitrarily changed unless otherwise specified. For example, the various information shown in each drawing is not limited to the illustrated information.
 また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。 Also, each component of each device illustrated is functionally conceptual and does not necessarily need to be physically configured as illustrated. In other words, the specific form of distribution and integration of each device is not limited to the one shown in the figure, and all or part of them can be functionally or physically distributed and integrated in arbitrary units according to various loads and usage conditions. Can be integrated and configured.
 また、上述してきた各実施形態及び変形例は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。 In addition, the above-described embodiments and modifications can be appropriately combined within a range that does not contradict the processing content.
 また、本明細書に記載された効果はあくまで例示であって限定されるものでは無く、他の効果があってもよい。 In addition, the effects described in this specification are only examples and are not limited, and other effects may be provided.
[3.本開示に係る効果]
 上述のように、本開示に係る情報処理装置(例えば、実施形態ではハイライト生成サーバ100)は、取得部(実施形態では取得部131)と、生成部(実施形態では生成部134)とを備える。取得部は、イベントのリアルタイム視聴を行ったユーザである第1ユーザのイベントのリアルタイム視聴時の状態を示す状態情報と、イベントの映像であるイベントコンテンツとを取得する。生成部は、取得部により取得された第1ユーザの状態情報により決定されるイベントコンテンツの一部を用いて、第2ユーザに提供するイベントコンテンツのハイライトを生成する。
[3. Effects of the Present Disclosure]
As described above, the information processing apparatus according to the present disclosure (for example, the highlight generation server 100 in the embodiment) includes an acquisition unit (the acquisition unit 131 in the embodiment) and a generation unit (the generation unit 134 in the embodiment). Prepare. The acquisition unit acquires state information indicating a state of a first user, who has viewed the event in real time, when viewing the event in real time, and event content, which is video of the event. The generator generates highlights of the event content to be provided to the second user, using a portion of the event content determined by the state information of the first user acquired by the acquirer.
 このように、本開示に係る情報処理装置は、イベントのリアルタイム視聴を行ったユーザの状態を基に、イベントコンテンツの一部を用いて、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In this way, the information processing apparatus according to the present disclosure generates highlights of the event content using part of the event content based on the state of the user who has viewed the event in real time, so that You can generate special highlights.
 また、取得部は、イベントを撮影した映像であるイベントコンテンツを取得する。このように、情報処理装置は、イベントのリアルタイム視聴を行ったユーザの状態を基に、イベントを撮影した映像であるイベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires event content, which is video of the event. In this way, the information processing apparatus generates highlights of the event content, which is video of the event, based on the state of the user who watched the event in real time, thereby generating highlights corresponding to the user. be able to.
 また、取得部は、イベントの開催場所でリアルタイム視聴を行った第1ユーザの状態情報を取得する。このように、情報処理装置は、イベントの開催場所でリアルタイム視聴を行ったユーザの状態を基に、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In addition, the acquisition unit acquires the status information of the first user who viewed the event in real time at the venue of the event. In this way, the information processing apparatus can generate highlights according to the user by generating highlights of event content based on the user's state of real-time viewing at the venue of the event.
 また、取得部は、スポーツまたは芸術のイベントのリアルタイム視聴を行った第1ユーザの状態情報を取得する。このように、情報処理装置は、スポーツまたは芸術のイベントのリアルタイム視聴を行ったユーザの状態を基に、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user who watched the sports or art event in real time. In this way, the information processing apparatus can generate highlights according to the user by generating event content highlights based on the state of the user who has watched the sports or arts event in real time. .
 また、取得部は、イベントを現地で観た第1ユーザの状態情報を取得する。このように、情報処理装置は、スポーツまたは芸術のイベントを現地で観たユーザの状態を基に、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user who watched the event locally. In this way, the information processing apparatus can generate highlights according to the user by generating event content highlights based on the state of the user who watched the sports or arts event at the site.
 また、生成部は、状態情報に基づく入力データの入力に応じて、イベントの期間に対応するスコアを出力するモデルを用いて、イベントコンテンツのハイライトを生成する。このように、情報処理装置は、ユーザの状態に基づく入力データの入力に応じて、イベントの期間に対応するスコアを出力するモデルを用いて、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In addition, the generating unit generates event content highlights using a model that outputs a score corresponding to the period of the event in response to the input of input data based on the state information. In this way, the information processing apparatus uses a model that outputs a score corresponding to the period of the event in response to the input of input data based on the user's state, and generates highlights of the event content so that the user can You can generate corresponding highlights.
 また、生成部は、モデルを用いてイベントコンテンツの一部を決定し、決定したイベントコンテンツの一部を用いて、イベントコンテンツのハイライトを生成する。このように、情報処理装置は、モデルを用いて決定したイベントコンテンツの一部を用いて、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In addition, the generation unit uses the model to determine part of the event content, and uses the determined part of the event content to generate highlights of the event content. In this way, the information processing apparatus can generate highlights according to the user by generating highlights of the event content using part of the event content determined using the model.
 また、生成部は、イベントコンテンツのうち、閾値以上であるスコアに対応する期間に該当する部分を、イベントコンテンツの一部に決定し、決定したイベントコンテンツの一部を用いて、イベントコンテンツのハイライトを生成する。このように、情報処理装置は、モデルが出力したスコアが閾値以上の期間に該当する部分を、イベントコンテンツの一部として、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In addition, the generation unit determines a portion of the event content corresponding to a period corresponding to a score equal to or greater than the threshold as part of the event content, and uses the determined part of the event content to generate a high score of the event content. generate light. In this way, the information processing apparatus generates highlights of the event content as part of the event content corresponding to a period in which the score output by the model is equal to or greater than the threshold value, thereby providing highlights according to the user. can be generated.
 また、本開示に係る情報処理装置は、学習部(実施形態では学習部132)を備える。学習部は、過去のイベントのハイライトと、過去のイベントのリアルタイム視聴を行ったユーザの状態情報とを含む学習データを用いてモデルを学習する。このように、情報処理装置は、学習したモデルを用いて、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the information processing apparatus according to the present disclosure includes a learning unit (learning unit 132 in the embodiment). The learning unit learns the model using learning data including highlights of past events and status information of users who viewed the past events in real time. In this way, the information processing apparatus can generate highlights according to the user by generating highlights of event content using the learned model.
 また、生成部は、学習部により学習されたモデルを用いてイベントコンテンツのハイライトを生成する。このように、情報処理装置は、学習したモデルを用いて、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In addition, the generation unit generates highlights of event content using the model learned by the learning unit. In this way, the information processing apparatus can generate highlights according to the user by generating highlights of event content using the learned model.
 また、取得部は、第2ユーザがイベントのリアルタイム視聴を行っていた場合、第2ユーザのイベントのリアルタイム視聴時の状態を示す状態情報を、第1ユーザの状態情報として取得する。生成部は、第2ユーザの状態情報により決定されるイベントコンテンツの一部を用いて、第2ユーザに提供するイベントコンテンツのハイライトを生成する。このように、情報処理装置は、イベントのリアルタイム視聴を行ったユーザ自身の状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, if the second user is viewing the event in real time, the acquisition unit acquires state information indicating the state of the second user viewing the event in real time as the state information of the first user. The generator generates highlights of event content to be provided to the second user, using a portion of the event content determined by the second user's state information. In this way, the information processing apparatus can generate highlights according to the user by generating highlights to be provided to the user based on the state of the user who has watched the event in real time.
 また、取得部は、第2ユーザがイベントのリアルタイム視聴を行っていなかった場合、第2ユーザとは異なるユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザがイベントのリアルタイム視聴を行っていなかった場合、そのユーザとは異なるユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, if the second user is not viewing the event in real time, the acquisition unit acquires the state information of the first user who is different from the second user. In this way, when the user to whom the highlights are provided is not viewing the event in real time, the information processing apparatus generates the highlights to be provided to the user based on the state of the user different from that of the user. Thus, it is possible to generate highlights suitable for the user.
 また、取得部は、第2ユーザの属性に類似するユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザの属性に類似する類似ユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user who is a user similar to the attributes of the second user. In this way, the information processing apparatus generates highlights to be provided to the user based on the states of similar users who are similar to the attributes of the user to whom the highlights are provided, thereby generating highlights suitable for the user. can do.
 また、取得部は、第2ユーザのデモグラフィック属性に類似するユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザのデモグラフィック属性に類似する類似ユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user, who is a user similar to the demographic attributes of the second user. In this way, the information processing apparatus generates highlights to be provided to the user based on the states of similar users who are similar to the demographic attributes of the user to whom the highlights are provided, thereby providing highlights suitable for the user. can be generated.
 また、取得部は、第2ユーザの年齢及び性別のうち少なくとも1つが類似するユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザの年齢及び性別のうち少なくとも1つが類似する類似ユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user who is a user similar to the second user in at least one of age and sex. In this way, the information processing apparatus generates highlights to be provided to the user based on the states of similar users who are similar in at least one of age and gender to the user to whom the highlights are provided, thereby providing the user with You can generate corresponding highlights.
 また、取得部は、第2ユーザのサイコグラフィック属性に類似するユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザのサイコグラフィック属性に類似する類似ユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user, who is a user similar to the psychographic attributes of the second user. In this way, the information processing apparatus generates highlights to be provided to the user based on the states of similar users who are similar to the psychographic attributes of the user to whom the highlights are provided, thereby providing highlights suitable for the user. can be generated.
 また、取得部は、第2ユーザの嗜好性に類似するユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザの嗜好性に類似する類似ユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user, who is a user similar to the second user's preferences. In this way, the information processing apparatus generates highlights to be provided to the user based on the states of similar users who are similar to the preferences of the user to whom the highlights are provided, thereby providing the highlights according to the user. can be generated.
 また、取得部は、愛好する対象が第2ユーザと一致するユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザと好きな対象が一致する類似ユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In addition, the acquisition unit acquires the state information of the first user who is a user whose love target matches that of the second user. In this way, the information processing apparatus generates highlights to be provided to the user based on the states of similar users whose favorite objects match those of the user to whom the highlights are provided, thereby providing the highlights corresponding to the user. can be generated.
 また、本開示に係る情報処理装置は、送信部(実施形態では送信部135)を備える。送信部は、生成部により生成されたイベントコンテンツのハイライトを第2ユーザが利用する端末装置(実施形態ではエッジ視聴端末10)へ送信する。このように、情報処理装置は、生成したハイライトをユーザが利用する端末装置へ送信することで、ユーザに応じたハイライトを適切に提供することができる。 Further, the information processing device according to the present disclosure includes a transmission unit (transmission unit 135 in the embodiment). The transmission unit transmits the highlight of the event content generated by the generation unit to the terminal device (the edge viewing terminal 10 in the embodiment) used by the second user. In this way, the information processing apparatus can appropriately provide highlights according to the user by transmitting the generated highlights to the terminal device used by the user.
[4.ハードウェア構成]
 上述してきた各実施形態に係るハイライト生成サーバ100、エッジ視聴端末10、コンテンツ配信サーバ50及び観客映像収集サーバ60等の情報処理装置(情報機器)は、例えば図20に示すような構成のコンピュータ1000によって実現される。図20は、情報処理装置の機能を実現するコンピュータ1000の一例を示すハードウェア構成図である。以下、実施形態に係るハイライト生成サーバ100を例に挙げて説明する。コンピュータ1000は、CPU1100、RAM1200、ROM(Read Only Memory)1300、HDD(Hard Disk Drive)1400、通信インターフェイス1500、及び入出力インターフェイス1600を有する。コンピュータ1000の各部は、バス1050によって接続される。
[4. Hardware configuration]
Information processing devices (information devices) such as the highlight generation server 100, the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60 according to the above-described embodiments are realized by, for example, a computer 1000 having the configuration shown in FIG. 20. FIG. 20 is a hardware configuration diagram showing an example of a computer 1000 that implements the functions of the information processing apparatus. The highlight generation server 100 according to the embodiment will be described below as an example. The computer 1000 has a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600. The units of the computer 1000 are connected by a bus 1050.
 CPU1100は、ROM1300又はHDD1400に格納されたプログラムに基づいて動作し、各部の制御を行う。例えば、CPU1100は、ROM1300又はHDD1400に格納されたプログラムをRAM1200に展開し、各種プログラムに対応した処理を実行する。 The CPU 1100 operates based on programs stored in the ROM 1300 or HDD 1400 and controls each section. For example, the CPU 1100 loads programs stored in the ROM 1300 or HDD 1400 into the RAM 1200 and executes processes corresponding to various programs.
 ROM1300は、コンピュータ1000の起動時にCPU1100によって実行されるBIOS(Basic Input Output System)等のブートプログラムや、コンピュータ1000のハードウェアに依存するプログラム等を格納する。 The ROM 1300 stores a boot program such as BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 is started, and programs dependent on the hardware of the computer 1000.
 HDD1400は、CPU1100によって実行されるプログラム、及び、かかるプログラムによって使用されるデータ等を非一時的に記録する、コンピュータが読み取り可能な記録媒体である。具体的には、HDD1400は、プログラムデータ1450の一例である本開示に係る情報処理プログラムを記録する記録媒体である。 The HDD 1400 is a computer-readable recording medium that non-temporarily records programs executed by the CPU 1100 and data used by such programs. Specifically, HDD 1400 is a recording medium that records an information processing program according to the present disclosure, which is an example of program data 1450 .
 通信インターフェイス1500は、コンピュータ1000が外部ネットワーク1550(例えばインターネット)と接続するためのインターフェイスである。例えば、CPU1100は、通信インターフェイス1500を介して、他の機器からデータを受信したり、CPU1100が生成したデータを他の機器へ送信したりする。 A communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet). For example, CPU 1100 receives data from another device via communication interface 1500, and transmits data generated by CPU 1100 to another device.
 入出力インターフェイス1600は、入出力デバイス1650とコンピュータ1000とを接続するためのインターフェイスである。例えば、CPU1100は、入出力インターフェイス1600を介して、キーボードやマウス等の入力デバイスからデータを受信する。また、CPU1100は、入出力インターフェイス1600を介して、ディスプレイやスピーカーやプリンタ等の出力デバイスにデータを送信する。また、入出力インターフェイス1600は、所定の記録媒体(メディア)に記録されたプログラム等を読み取るメディアインターフェイスとして機能してもよい。メディアとは、例えばDVD(Digital Versatile Disc)、PD(Phase change rewritable Disk)等の光学記録媒体、MO(Magneto-Optical disk)等の光磁気記録媒体、テープ媒体、磁気記録媒体、または半導体メモリ等である。 The input/output interface 1600 is an interface for connecting the input/output device 1650 and the computer 1000 . For example, the CPU 1100 receives data from input devices such as a keyboard and mouse via the input/output interface 1600 . The CPU 1100 also transmits data to an output device such as a display, speaker, or printer via the input/output interface 1600 . Also, the input/output interface 1600 may function as a media interface for reading a program or the like recorded on a predetermined recording medium. Media include, for example, optical recording media such as DVD (Digital Versatile Disc) and PD (Phase change rewritable disk), magneto-optical recording media such as MO (Magneto-Optical disk), tape media, magnetic recording media, semiconductor memories, etc. is.
 例えば、コンピュータ1000が実施形態に係るハイライト生成サーバ100として機能する場合、コンピュータ1000のCPU1100は、RAM1200上にロードされた情報処理プログラムを実行することにより、制御部130等の機能を実現する。また、HDD1400には、本開示に係る情報処理プログラムや、記憶部120内のデータが格納される。なお、CPU1100は、プログラムデータ1450をHDD1400から読み取って実行するが、他の例として、外部ネットワーク1550を介して、他の装置からこれらのプログラムを取得してもよい。 For example, when the computer 1000 functions as the highlight generation server 100 according to the embodiment, the CPU 1100 of the computer 1000 implements the functions of the control unit 130 and the like by executing the information processing program loaded on the RAM 1200. The HDD 1400 also stores an information processing program according to the present disclosure and data in the storage unit 120 . Although CPU 1100 reads and executes program data 1450 from HDD 1400 , as another example, these programs may be obtained from another device via external network 1550 .
 なお、本技術は以下のような構成も取ることができる。
(1)
 イベントのリアルタイム視聴を行ったユーザである第1ユーザの前記イベントのリアルタイム視聴時の状態を示す状態情報と、前記イベントの映像であるイベントコンテンツとを取得する取得部と、
 前記取得部により取得された前記第1ユーザの前記状態情報により決定される前記イベントコンテンツの一部を用いて、第2ユーザに提供する前記イベントコンテンツのハイライトを生成する生成部と、
 を備える情報処理装置。
(2)
 前記取得部は、
 前記イベントを撮影した映像である前記イベントコンテンツを取得する
 (1)に記載の情報処理装置。
(3)
 前記取得部は、
 前記イベントの開催場所で前記リアルタイム視聴を行った前記第1ユーザの前記状態情報を取得する
 (1)または(2)に記載の情報処理装置。
(4)
 前記取得部は、
 スポーツまたは芸術の前記イベントの前記リアルタイム視聴を行った前記第1ユーザの前記状態情報を取得する
 (1)~(3)のいずれか1つに記載の情報処理装置。
(5)
 前記取得部は、
 前記第1ユーザを撮像した画像情報を含む前記状態情報を取得する
 (1)~(4)のいずれか1つに記載の情報処理装置。
(6)
 前記生成部は、
 前記状態情報に基づく入力データの入力に応じて、前記イベントの期間に対応するスコアを出力するモデルを用いて、前記イベントコンテンツのハイライトを生成する
 (1)~(5)のいずれか1つに記載の情報処理装置。
(7)
 前記生成部は、
 前記モデルを用いて前記イベントコンテンツの一部を決定し、決定した前記イベントコンテンツの一部を用いて、前記イベントコンテンツのハイライトを生成する
 (6)に記載の情報処理装置。
(8)
 前記生成部は、
 前記イベントコンテンツのうち、閾値以上である前記スコアに対応する期間に該当する部分を、前記イベントコンテンツの一部に決定し、決定した前記イベントコンテンツの一部を用いて、前記イベントコンテンツのハイライトを生成する
 (7)に記載の情報処理装置。
(9)
 過去のイベントのハイライトと、前記過去のイベントのリアルタイム視聴を行ったユーザの前記状態情報とを含む学習データを用いて前記モデルを学習する学習部、
 (6)~(8)のいずれか1つに記載の情報処理装置。
(10)
 前記生成部は、
 前記学習部により学習された前記モデルを用いて前記イベントコンテンツのハイライトを生成する
 (9)に記載の情報処理装置。
(11)
 前記取得部は、
 前記第2ユーザが前記イベントの前記リアルタイム視聴を行っていた場合、前記第2ユーザの前記イベントの前記リアルタイム視聴時の状態を示す前記状態情報を、前記第1ユーザの前記状態情報として取得し、
 前記生成部は、
 前記第2ユーザの前記状態情報により決定される前記イベントコンテンツの一部を用いて、前記第2ユーザに提供する前記イベントコンテンツのハイライトを生成する
 (1)~(10)のいずれか1つに記載の情報処理装置。
(12)
 前記取得部は、
 前記第2ユーザが前記イベントの前記リアルタイム視聴を行っていなかった場合、前記第2ユーザとは異なるユーザである前記第1ユーザの前記状態情報として取得する
 (1)~(11)のいずれか1つに記載の情報処理装置。
(13)
 前記取得部は、
 前記第2ユーザの属性に類似するユーザである前記第1ユーザの前記状態情報として取得する
 (12)に記載の情報処理装置。
(14)
 前記取得部は、
 前記第2ユーザのデモグラフィック属性に類似するユーザである前記第1ユーザの前記状態情報として取得する
 (13)に記載の情報処理装置。
(15)
 前記取得部は、
 前記第2ユーザの年齢及び性別のうち少なくとも1つが類似するユーザである前記第1ユーザの前記状態情報として取得する
 (14)に記載の情報処理装置。
(16)
 前記取得部は、
 前記第2ユーザのサイコグラフィック属性に類似するユーザである前記第1ユーザの前記状態情報として取得する
 (13)~(15)のいずれか1つに記載の情報処理装置。
(17)
 前記取得部は、
 前記第2ユーザの嗜好性に類似するユーザである前記第1ユーザの前記状態情報として取得する
 (13)~(16)のいずれか1つに記載の情報処理装置。
(18)
 前記取得部は、
 愛好する対象が前記第2ユーザと一致するユーザである前記第1ユーザの前記状態情報として取得する
 (13)~(17)のいずれか1つに記載の情報処理装置。
(19)
 前記生成部により生成された前記イベントコンテンツのハイライトを前記第2ユーザが利用する端末装置に送信する送信部、
 (1)~(18)のいずれか1つに記載の情報処理装置。
(20)
 イベントのリアルタイム視聴を行ったユーザである第1ユーザの前記イベントのリアルタイム視聴時の状態を示す状態情報と、前記イベントの映像であるイベントコンテンツとを取得し、取得した前記第1ユーザの前記状態情報により決定される前記イベントコンテンツの一部を用いて、第2ユーザに提供する前記イベントコンテンツのハイライトを生成する、
 処理を実行する情報処理方法。
Note that the present technology can also take the following configuration.
(1)
an acquisition unit that acquires state information indicating a state of a first user who has viewed the event in real time, and event content that is video of the event;
a generation unit that generates highlights of the event content to be provided to a second user using a portion of the event content determined by the state information of the first user acquired by the acquisition unit;
Information processing device.
(2)
The acquisition unit
The information processing apparatus according to (1), wherein the event content, which is a video image of the event, is acquired.
(3)
The acquisition unit
The information processing apparatus according to (1) or (2), wherein the state information of the first user who has performed the real-time viewing at the venue of the event is acquired.
(4)
The acquisition unit
The information processing apparatus according to any one of (1) to (3), wherein the state information of the first user who viewed the event of sports or art in real time is acquired.
(5)
The acquisition unit
The information processing apparatus according to any one of (1) to (4), wherein the state information including image information of the first user is acquired.
(6)
The generating unit
Any one of (1) to (5), generating highlights of the event content using a model that outputs a score corresponding to the duration of the event in response to input of input data based on the state information. The information processing device according to .
(7)
The generating unit
The information processing apparatus according to (6), wherein a portion of the event content is determined using the model, and a highlight of the event content is generated using the determined portion of the event content.
(8)
The generating unit
A portion of the event content corresponding to a period corresponding to the score equal to or greater than a threshold is determined as a portion of the event content, and the determined portion of the event content is used to highlight the event content. The information processing apparatus according to (7).
(9)
a learning unit that learns the model using learning data including highlights of past events and the state information of users who viewed the past events in real time;
The information processing device according to any one of (6) to (8).
(10)
The generating unit
The information processing apparatus according to (9), wherein the model learned by the learning unit is used to generate highlights of the event content.
(11)
The acquisition unit
when the second user is viewing the event in real time, acquiring the state information indicating the state of the event when the second user is viewing the event in real time as the state information of the first user;
The generating unit
generating highlights of the event content to be provided to the second user using a portion of the event content determined by the state information of the second user; The information processing device according to .
(12)
The acquisition unit
Any one of (1) to (11) obtained as the state information of the first user who is a user different from the second user when the second user is not viewing the event in real time The information processing device according to 1.
(13)
The acquisition unit
The information processing apparatus according to (12), wherein the state information of the first user who is a user similar to the attributes of the second user is obtained.
(14)
The acquisition unit
The information processing apparatus according to (13), wherein the state information of the first user who is a user similar to the demographic attributes of the second user is obtained.
(15)
The acquisition unit
(14) The information processing apparatus according to (14), wherein the status information is acquired as the state information of the first user who is a user similar in at least one of age and sex to the second user.
(16)
The acquisition unit
The information processing apparatus according to any one of (13) to (15), wherein the status information of the first user who is a user similar to the psychographic attributes of the second user is obtained.
(17)
The acquisition unit
The information processing apparatus according to any one of (13) to (16), wherein the state information of the first user who is a user similar to the preference of the second user is obtained.
(18)
The acquisition unit
The information processing apparatus according to any one of (13) to (17), wherein the state information is acquired as the state information of the first user whose love target matches that of the second user.
(19)
a transmission unit configured to transmit a highlight of the event content generated by the generation unit to a terminal device used by the second user;
The information processing device according to any one of (1) to (18).
(20)
State information indicating a state of a first user who has watched the event in real time, and event content, which is a video of the event, are acquired, and the state of the acquired first user is obtained. using a portion of the event content determined by the information to generate a highlight of the event content to be provided to a second user;
Information processing method that performs processing.
 1 情報処理システム
 100 ハイライト生成サーバ(情報処理装置)
 110 通信部
 120 記憶部
 121 データセット記憶部
 122 モデル情報記憶部
 123 閾値情報記憶部
 124 コンテンツ情報記憶部
 130 制御部
 131 取得部
 132 学習部
 133 画像処理部
 134 生成部
 135 送信部
 10 エッジ視聴端末(端末装置)
 11 通信部
 12 音声入力部
 13 音声出力部
 14 カメラ
 15 表示部
 16 操作部
 17 記憶部
 18 制御部
 181 取得部
 182 送信部
 183 受信部
 184 処理部
 50 コンテンツ配信サーバ
 60 観客映像収集サーバ
1 information processing system 100 highlight generation server (information processing device)
110 communication unit 120 storage unit 121 data set storage unit 122 model information storage unit 123 threshold information storage unit 124 content information storage unit 130 control unit 131 acquisition unit 132 learning unit 133 image processing unit 134 generation unit 135 transmission unit 10 edge viewing terminal ( terminal equipment)
11 communication unit 12 audio input unit 13 audio output unit 14 camera 15 display unit 16 operation unit 17 storage unit 18 control unit 181 acquisition unit 182 transmission unit 183 reception unit 184 processing unit 50 content distribution server 60 spectator video collection server

Claims (20)

1. An information processing device comprising:
    an acquisition unit that acquires state information indicating a state, at the time of real-time viewing of an event, of a first user who is a user who has viewed the event in real time, and event content that is a video of the event; and
    a generation unit that generates a highlight of the event content to be provided to a second user, using a portion of the event content determined by the state information of the first user acquired by the acquisition unit.
2. The information processing device according to claim 1, wherein the acquisition unit acquires the event content, which is a video obtained by capturing the event.
3. The information processing device according to claim 1, wherein the acquisition unit acquires the state information of the first user who has performed the real-time viewing at a venue where the event is held.
4. The information processing device according to claim 1, wherein the acquisition unit acquires the state information of the first user who has performed the real-time viewing of the event, the event being a sports or art event.
5. The information processing device according to claim 1, wherein the acquisition unit acquires the state information including image information obtained by imaging the first user.
6. The information processing device according to claim 1, wherein the generation unit generates the highlight of the event content using a model that outputs, in response to input of input data based on the state information, a score corresponding to a period of the event.
7. The information processing device according to claim 6, wherein the generation unit determines a portion of the event content using the model, and generates the highlight of the event content using the determined portion of the event content.
8. The information processing device according to claim 7, wherein the generation unit determines, as the portion of the event content, a part of the event content corresponding to a period whose score is equal to or greater than a threshold, and generates the highlight of the event content using the determined portion of the event content.
9. The information processing device according to claim 6, further comprising a learning unit that learns the model using learning data including a highlight of a past event and the state information of a user who has viewed the past event in real time.
10. The information processing device according to claim 9, wherein the generation unit generates the highlight of the event content using the model learned by the learning unit.
11. The information processing device according to claim 1, wherein, when the second user has viewed the event in real time, the acquisition unit acquires, as the state information of the first user, the state information indicating the state of the second user at the time of the real-time viewing of the event, and the generation unit generates the highlight of the event content to be provided to the second user using a portion of the event content determined by the state information of the second user.
12. The information processing device according to claim 1, wherein, when the second user has not viewed the event in real time, the acquisition unit acquires the state information as the state information of the first user who is a user different from the second user.
13. The information processing device according to claim 12, wherein the acquisition unit acquires the state information as the state information of the first user who is a user whose attributes are similar to those of the second user.
14. The information processing device according to claim 13, wherein the acquisition unit acquires the state information as the state information of the first user who is a user whose demographic attributes are similar to those of the second user.
15. The information processing device according to claim 14, wherein the acquisition unit acquires the state information as the state information of the first user who is a user similar to the second user in at least one of age and sex.
16. The information processing device according to claim 13, wherein the acquisition unit acquires the state information as the state information of the first user who is a user whose psychographic attributes are similar to those of the second user.
17. The information processing device according to claim 13, wherein the acquisition unit acquires the state information as the state information of the first user who is a user whose preferences are similar to those of the second user.
18. The information processing device according to claim 13, wherein the acquisition unit acquires the state information as the state information of the first user whose favorite object matches that of the second user.
19. The information processing device according to claim 1, further comprising a transmission unit that transmits the highlight of the event content generated by the generation unit to a terminal device used by the second user.
20. An information processing method for executing processing comprising:
    acquiring state information indicating a state, at the time of real-time viewing of an event, of a first user who is a user who has viewed the event in real time, and event content that is a video of the event; and
    generating a highlight of the event content to be provided to a second user, using a portion of the event content determined by the acquired state information of the first user.
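Claims 11 to 18 above determine whose state information drives the highlight: the second user's own reaction when that user viewed the event in real time, and otherwise the reaction of a first user whose demographic or psychographic attributes (age, sex, preferences, favorite object) resemble the second user's. The sketch below shows one hypothetical way such a selection could be made; the User fields, the similarity weights, and the helper names are assumptions introduced for illustration and are not taken from the claims.

```python
# Hypothetical selection of the state-information source; fields and weights are assumptions.
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class User:
    user_id: str
    age: int
    sex: str
    favorites: Set[str] = field(default_factory=set)  # e.g. a favorite team or performer (cf. claim 18)
    viewed_in_real_time: bool = False

def attribute_similarity(a: User, b: User) -> float:
    """Crude demographic/psychographic similarity (cf. claims 13 to 17)."""
    score = 0.0
    if abs(a.age - b.age) <= 5:      # similar age (cf. claim 15)
        score += 1.0
    if a.sex == b.sex:               # same sex (cf. claim 15)
        score += 1.0
    if a.favorites & b.favorites:    # matching favorite object (cf. claim 18)
        score += 2.0
    return score

def select_state_source(second_user: User, real_time_viewers: List[User]) -> Optional[User]:
    """Return the user whose state information should drive the second user's highlight."""
    if second_user.viewed_in_real_time:
        return second_user           # cf. claim 11: use the second user's own reaction
    # cf. claims 12 to 18: fall back to the most similar real-time viewer
    candidates = [u for u in real_time_viewers if u.user_id != second_user.user_id]
    return max(candidates, key=lambda u: attribute_similarity(second_user, u), default=None)
```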
PCT/JP2022/045588 2021-12-20 2022-12-12 Information processing device and information processing method WO2023120263A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-205919 2021-12-20
JP2021205919 2021-12-20

Publications (1)

Publication Number Publication Date
WO2023120263A1 true WO2023120263A1 (en) 2023-06-29

Family

ID=86902349

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/045588 WO2023120263A1 (en) 2021-12-20 2022-12-12 Information processing device and information processing method

Country Status (1)

Country Link
WO (1) WO2023120263A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012074773A (en) * 2010-09-27 2012-04-12 Nec Personal Computers Ltd Editing device, control method, and program
JP2015088203A (en) * 2013-10-30 2015-05-07 日本電信電話株式会社 Content creation method, content creation device, and content creation program
JP2017529778A (en) * 2014-08-29 2017-10-05 スリング メディア,インク. System and process for distributing digital video content based on excitement data
JP2018037784A (en) * 2016-08-30 2018-03-08 ソニー株式会社 Image transmission device, image transmitting method, and program
JP2020174971A (en) * 2019-04-19 2020-10-29 富士通株式会社 Highlight moving image generation program, highlight moving image generation method and information processing device
JP2021192484A (en) * 2020-06-05 2021-12-16 エヌ・ティ・ティ・コミュニケーションズ株式会社 Information provision system and information provision method

Similar Documents

Publication Publication Date Title
US11663827B2 (en) Generating a video segment of an action from a video
JP7470137B2 (en) Video tagging by correlating visual features with sound tags
CN104756514B (en) TV and video frequency program are shared by social networks
US9436875B2 (en) Method and apparatus for semantic extraction and video remix creation
TWI558186B (en) Video selection based on environmental sensing
US10939165B2 (en) Facilitating television based interaction with social networking tools
US20180302687A1 (en) Personalizing closed captions for video content
US10541000B1 (en) User input-based video summarization
WO2018084875A1 (en) Targeted content during media downtimes
US11343595B2 (en) User interface elements for content selection in media narrative presentation
KR20140045412A (en) Video highlight identification based on environmental sensing
JP6807389B2 (en) Methods and equipment for immediate prediction of media content performance
KR20150007936A (en) Systems and Method for Obtaining User Feedback to Media Content, and Computer-readable Recording Medium
CN103686344A (en) Enhanced video system and method
US10567844B2 (en) Camera with reaction integration
CN111723237A (en) Media content access control method
WO2020223009A1 (en) Mapping visual tags to sound tags using text similarity
JP2011164681A (en) Device, method and program for inputting character and computer-readable recording medium recording the same
TWI570639B (en) Systems and methods for building virtual communities
WO2023120263A1 (en) Information processing device and information processing method
US11736780B2 (en) Graphically animated audience
KR101481996B1 (en) Behavior-based Realistic Picture Environment Control System
EP3316204A1 (en) Targeted content during media downtimes

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22910971

Country of ref document: EP

Kind code of ref document: A1