WO2023120263A1 - Information processing device and information processing method

Information processing device and information processing method

Info

Publication number
WO2023120263A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
information processing
event
information
highlight
Application number
PCT/JP2022/045588
Other languages
French (fr)
Japanese (ja)
Inventor
広 岩瀬
悠 朽木
俊之 荒木
Original Assignee
ソニーグループ株式会社
Application filed by ソニーグループ株式会社
Publication of WO2023120263A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer

Definitions

  • the present disclosure relates to an information processing device and an information processing method.
  • Technology is provided that automatically generates highlights (digests) of content such as video. For example, a technology has been provided that identifies whether or not a visitor to an event such as a concert, a sports game, or a lecture is smiling and generates highlights (for example, Patent Document 1).
  • the present disclosure proposes an information processing device and an information processing method capable of generating user-specific highlights.
  • an information processing apparatus includes: an acquisition unit that acquires state information indicating the state of a first user who viewed an event in real time, at the time of real-time viewing of the event, and event content, which is a video of the event; and a generation unit that generates highlights of the event content to be provided to a second user, using a part of the event content determined by the state information of the first user acquired by the acquisition unit.
  • FIG. 1 is a diagram showing an example of information processing according to an embodiment of the present disclosure.
  • FIG. 2 is a diagram showing an example of information processing according to an embodiment of the present disclosure.
  • FIG. 3 is a diagram illustrating a configuration example of an information processing system according to an embodiment of the present disclosure.
  • FIG. 4 is a diagram showing an example of arrangement of spectator imaging devices.
  • FIG. 5 is a diagram illustrating a configuration example of a highlight generation server according to an embodiment of the present disclosure.
  • FIG. 6 is a diagram illustrating an example of a dataset storage unit according to an embodiment of the present disclosure.
  • FIG. 7 is a diagram illustrating an example of a model information storage unit according to an embodiment of the present disclosure.
  • FIG. 8 is a diagram illustrating an example of a threshold information storage unit according to an embodiment of the present disclosure.
  • FIG. 9 is a diagram illustrating a configuration example of an edge viewing terminal according to an embodiment of the present disclosure.
  • FIG. 10 is a flowchart showing a processing procedure of the information processing device according to the embodiment of the present disclosure.
  • FIG. 11 is a diagram showing a functional configuration example related to learning of the information processing system.
  • FIG. 12 is a diagram showing an example of learning and inference regarding the personal attribute determiner of the information processing system.
  • FIG. 13 is a diagram showing an example of learning and inference regarding the action index determiner of the information processing system.
  • FIG. 14 is a diagram showing an example of learning and inference regarding the highlight scene predictor of the information processing system.
  • FIG. 15 is a diagram showing a functional configuration example related to highlight generation of the information processing system.
  • FIG. 16 is a flowchart showing a processing procedure regarding highlight generation.
  • FIG. 17 is a diagram showing an example of superimposed display of a wipe on a highlight.
  • FIG. 18 is a diagram showing an example of angle estimation of a highlight.
  • FIG. 19 is a diagram showing an example of presentation of information about spectators.
  • FIG. 20 is a hardware configuration diagram showing an example of a computer that implements the functions of the information processing device.
  • FIGS. 1 and 2 are diagrams illustrating an example of information processing according to an embodiment of the present disclosure.
  • the information processing according to the embodiment of the present disclosure is realized by the information processing system 1 including the highlight generation server 100, the plurality of edge viewing terminals 10, and the like.
  • a sports event such as a basketball game will be described as an example of an event for which highlights are generated (hereinafter also referred to as a "target event").
  • the highlight is a video generated using video content (also referred to as “event content”) that captures the event, and is, for example, a digest video that is shorter than the event content.
  • the sports here also include e-sports (electronic sports) performed using electronic devices (computers).
  • the target event is not limited to sports, and may be various events in which there are users (also referred to as “spectators”) who watch the event.
  • a spectator is a user who watches or views an event.
  • the target events may be music events such as live performances and concerts, events involving the improvisation of creative works such as calligraphy and painting, and art-related events such as plays, musicals, vaudeville, and live comedy.
  • the target event may be a lecture, talk show, seminar, or the like.
  • the target event may be an event in which a large number of people view content, such as a movie screening event.
  • the target event is not limited to an event that takes place in a real space as described above, but may be an event that takes place in a virtual space.
  • the target event may be a virtual event such as a live music performance held within an online game.
  • the event (target event) targeted for highlight generation by the information processing system 1 may be any event that can be targeted for highlight generation.
  • the user who watched the event in real time may be referred to as the "first user”, and the user to whom the highlights are provided may be referred to as the "second user”. That is, when a user to whom highlights are provided has viewed an event in real time, the first user and the second user may be the same user.
  • real-time viewing of an event means, for example, viewing the event at the date and time (period) when the event is held.
  • FIGS. 1 and 2 show two cases, i.e., the case where the entire event is viewed in real time and the case where the event is not viewed in real time at all. In some cases, only part of the event is viewed in real time, and processing in that case will be described later.
  • image information, such as a video (moving image) of the first user captured while viewing the event in real time, is used as information indicating the state of the first user viewing the event in real time (also referred to as "state information").
  • state information is not limited to image information obtained by imaging the first user, and may be any information as long as it indicates the state of the first user.
  • the state information may be biometric information detected about the first user, such as information such as the first user's heartbeat, body temperature, and breathing.
  • FIGS. 1 and 2 are diagrams illustrating an example of information processing according to an embodiment of the present disclosure.
  • FIGS. 1 and 2 show an example of highlight generation processing executed by a highlight generation server 100, which is an example of an information processing apparatus.
  • FIGS. 1 and 2 show a case where event A, which is a sporting event, is the target event. That is, FIGS. 1 and 2 show a case where the video (event content) capturing event A is the content targeted for highlight generation (also referred to as "target content").
  • FIGS. 1 and 2 also show the case where the model M3 used for highlight scene prediction is used; the details of the model M3 will be described later.
  • FIG. 1 shows a case where a user to whom highlights are provided has viewed an event in real time, that is, a case where the first user and the second user are the same user.
  • FIG. 1 shows an example of highlight generation processing in the case where the user to whom highlights are provided is the user U1 and the target event is viewed in real time. That is, FIG. 1 shows a case where user U1 is a user who watched event A in real time (real time spectator).
  • the highlight generation server 100 inputs the input data IND1, which is the data of the user U1, to the model M3 (step S11).
  • the highlight generation server 100 inputs the input data IND1 based on the information (state information) indicating the state of the event A viewed by the user U1 in real time to the model M3.
  • the state information of the user U1 includes image information such as a video (moving image) of the user U1 when viewing the event A in real time.
  • FIG. 1 shows a case where user U1's state information is used as input data IND1. That is, in FIG. 1, the image of the user U1 when viewing the event A in real time is used as the input data IND1.
  • Information (input information) to be input to the model M3, such as the input data IND1, may be information relating to feature amounts generated based on the state information of the first user, which will be described later.
  • the model M3 to which the input data IND1 is input outputs the output data OD1 (step S12).
  • the model M3 outputs the score used for highlight generation of the event A as the output data OD1.
  • model M3 outputs a score corresponding to each point in time during event A.
  • model M3 outputs a score corresponding to each time point in the hour if event A is one hour in duration.
  • the model M3 may output a score corresponding to each time point (for example, 1 minute, 2 minutes, 3 minutes, etc.) at predetermined intervals (for example, 1 minute intervals) during the period of the event A.
  • alternatively, a continuous score (waveform) may be output for the period of the event A.
  • the highlight generation server 100 uses the output data OD1 to determine a period to be highlighted (also referred to as a "highlight target period") (step S13).
  • the highlight generation server 100 compares the score output by the model M3 with a predetermined threshold value to determine the highlight target period of the highlight to be provided to the user U1.
  • the highlight generation server 100 determines the highlight target period of the highlights to be provided to the user U1 based on the period during which the score is equal to or greater than a predetermined threshold.
  • the highlight generation server 100 determines a period during which the score is equal to or greater than a predetermined threshold as the highlight target period of the highlights to be provided to the user U1.
  • the highlight generation server 100 determines a period such as 1-3 minutes, 15-20 minutes, etc. as the highlight target period of the highlight to be provided to the user U1, as shown in the target period information PTD1.
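  • As a concrete sketch of steps S12 and S13, the following minimal Python example turns a per-minute score series output by a model such as model M3 into highlight target periods by thresholding. The function name, the fixed one-minute interval, and the threshold value of 0.6 are assumptions for illustration and are not taken from the disclosure.

```python
from typing import List, Tuple


def determine_highlight_periods(
    scores: List[float],
    threshold: float,
    interval_minutes: int = 1,
) -> List[Tuple[int, int]]:
    """Return (start, end) minute ranges in which the score is at or above the threshold.

    scores[i] is assumed to be the model output for the i-th interval of the event.
    """
    periods: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i * interval_minutes                        # a highlight period opens
        elif score < threshold and start is not None:
            periods.append((start, i * interval_minutes))       # the period closes
            start = None
    if start is not None:                                       # period runs to the end of the event
        periods.append((start, len(scores) * interval_minutes))
    return periods


# Example: scores for a 30-minute event A at 1-minute intervals.
scores = [0.2, 0.7, 0.8, 0.1] + [0.3] * 11 + [0.9] * 5 + [0.2] * 10
print(determine_highlight_periods(scores, threshold=0.6))  # [(1, 3), (15, 20)]
```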
  • the highlight generation server 100 generates highlights to be provided to the user U1 based on the determined highlight target period (step S14).
  • the highlight generation server 100 uses target period information PTD1 for user U1 and target content TCV1, which is video of event A, to generate highlight HLD1 for user U1.
  • the highlight generation server 100 generates highlights HLD1 for the user U1 by using portions of the target content TCV1 that correspond to periods such as 1-3 minutes and 15-20 minutes indicated in the target period information PTD1. That is, the highlight generating server 100 generates, as the highlight HLD1 for the user U1, the content extracted from the target content TCV1 corresponding to a period of 1-3 minutes, 15-20 minutes, or the like.
  • Highlight HLD1 for user U1 is a video of event A that includes only periods of 1-3 minutes, 15-20 minutes, etc. that are estimated to be appropriate highlights for user U1.
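  • Step S14 can be pictured as cutting the determined periods out of the target content and concatenating them. The sketch below, which reuses the periods computed in the previous example, treats the event content as a simple sequence of one-minute segments; this data representation is an assumption for illustration and not the actual format handled by the highlight generation server 100.

```python
from typing import List, Sequence, Tuple


def generate_highlight(
    event_content: Sequence[str],            # e.g. one video segment per minute of event A
    target_periods: List[Tuple[int, int]],   # e.g. the output of determine_highlight_periods()
) -> List[str]:
    """Concatenate only the segments that fall inside the highlight target periods."""
    highlight: List[str] = []
    for start, end in target_periods:
        highlight.extend(event_content[start:end])
    return highlight


# Example: 30 one-minute segments of target content TCV1.
target_content = [f"segment_{minute:02d}" for minute in range(30)]
highlight_hld1 = generate_highlight(target_content, [(1, 3), (15, 20)])
print(highlight_hld1)  # segments for minutes 1-2 and 15-19
```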
  • in this way, the highlight generation server 100 generates the highlights to be provided to the user U1 based on the state information of the user U1. Thereby, the highlight generation server 100 can generate appropriate highlights for the user U1.
  • the highlight generation server 100 transmits the generated highlight HLD1 for the user U1 to the edge viewing terminal 10 used by the user U1. Then, the edge viewing terminal 10 used by the user U1 outputs (reproduces) the highlight HLD1. Thereby, in the information processing system 1, the user U1 can browse the highlights customized for himself/herself.
  • FIG. 2 shows a case where a user to whom highlights are provided is not viewing the event in real time, that is, a case where the first user and the second user are different users.
  • FIG. 2 shows an example of highlight generation processing when the user to whom highlights are provided is the user U2 and the target event is not being viewed in real time. That is, FIG. 2 shows a case where the user U2 is not the user (real-time spectator) who watched the event A in real-time.
  • the user U2 is a user (also referred to as a “remote viewer”) who is located remotely from the venue of the event and views (views) content. Note that description of the same points as in FIG. 1 will be omitted as appropriate.
  • user U2 is not a real-time spectator of event A, so the highlight generation server 100 performs processing with a user similar to user U2 as the first user.
  • the highlight generating server 100 takes a user who is similar to the user U2 and who is a real-time spectator of the event A as the first user, and inputs the input data IND2, which is the first user's data, to the model M3 (step S21).
  • the highlight generation server 100 determines, among the real-time spectators of the event A, a user similar to the attributes of the user U2 as the first user.
  • the highlight generation server 100 determines a first user among the real-time spectators of event A whose demographic attribute or psychographic attribute is similar to that of user U2.
  • the highlight generation server 100 determines, as the first user, a user who is similar to user U2 in terms of age, gender, preferences such as favorite objects (teams to support, etc.), family structure, income, lifestyle, and the like.
  • in the following, a case where the highlight generation server 100 determines a user (referred to as "user U50") whose age and gender match those of user U2 as the first user will be described as an example.
  • the highlight generation server 100 determines (identifies) the real-time spectators of the event A using the information indicating the users who are associated with each event and have viewed the event in real time.
  • the highlight generation server 100 may determine (identify) users similar to user U2 by comparing attribute information associated with each user. Note that the highlight generation server 100 may use a model to determine attributes, but this point will be described later.
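  • One simple way to realize the similarity determination described above is exact matching on demographic attributes such as age group and gender, as in the minimal sketch below. The attribute field names and the sample data are hypothetical, and the disclosure also allows the attributes themselves to be determined by a model (model M1).

```python
from typing import Dict, List, Optional


def find_similar_realtime_spectator(
    second_user: Dict[str, str],
    realtime_spectators: List[Dict[str, str]],
    keys: List[str] = ["age_group", "gender"],
) -> Optional[Dict[str, str]]:
    """Return the first real-time spectator whose listed attributes match the second user."""
    for spectator in realtime_spectators:
        if all(spectator.get(key) == second_user.get(key) for key in keys):
            return spectator
    return None


# Example: user U2 is matched with real-time spectator U50 of event A.
user_u2 = {"user_id": "U2", "age_group": "20s", "gender": "female"}
spectators = [
    {"user_id": "U49", "age_group": "30s", "gender": "male"},
    {"user_id": "U50", "age_group": "20s", "gender": "female"},
]
print(find_similar_realtime_spectator(user_u2, spectators))  # user U50 is selected
```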
  • the highlight generation server 100 inputs input data IND2 based on information (state information) indicating the state of real-time viewing of event A by user U50, who is the first user corresponding to user U2, to model M3.
  • the state information of the user U50 includes image information such as a video (moving image) of the user U50 when viewing the event A in real time.
  • a video image of the user U50 when viewing the event A in real time is used as the input data IND2.
  • the model M3 to which the input data IND2 is input outputs the output data OD2 (step S22).
  • the model M3 outputs the score used for highlight generation of the event A as the output data OD2. For example, model M3 outputs a score corresponding to each point in time during event A.
  • the highlight generation server 100 determines a period to be highlighted (highlight target period) using the output data OD2 (step S23).
  • the highlight generation server 100 compares the score output by the model M3 with a predetermined threshold to determine the highlight target period of the highlight to be provided to the user U2.
  • the highlight generation server 100 determines periods such as 5-10 minutes, 25-30 minutes, etc., as the highlight target periods of the highlights to be provided to the user U2, as indicated by the target period information PTD2.
  • the highlight generation server 100 generates highlights to be provided to the user U2 based on the determined highlight target period (step S24).
  • the highlight generation server 100 uses target period information PTD2 for user U2 and target content TCV1, which is video of event A, to generate highlight HLD2 for user U2.
  • the highlight generation server 100 generates highlights HLD2 for the user U2 using portions of the target content TCV1 that correspond to periods such as 5-10 minutes and 25-30 minutes indicated in the target period information PTD2.
  • the highlight generation server 100 generates a video of the event A including only the period of 5-10 minutes, 25-30 minutes, etc. as the highlight HLD2 for the user U2.
  • in this way, the highlight generation server 100 generates the highlights to be provided to the user U2 based on the state information of a user similar to the user U2. As a result, the highlight generation server 100 can generate appropriate highlights for the user U2 even when the second user is not viewing in real time.
  • the highlight generation server 100 transmits the generated highlight HLD2 for the user U2 to the edge viewing terminal 10 used by the user U2. Then, the edge viewing terminal 10 used by the user U2 outputs (reproduces) the highlight HLD2. Thereby, in the information processing system 1, the user U2 can browse the highlights customized for himself/herself.
  • as described above, the information processing system 1 uses the state information of the users (spectators) watching the event in real time to generate the highlights, instead of analyzing the video of the event (event content) itself.
  • that is, the information processing system 1 selects scenes according to the attributes of viewers based on recognition and analysis of video of the spectators, and generates the highlights.
  • for example, based on video of spectators (or remote viewers) at an event venue such as a sports venue, the information processing system 1 learns (generates) a personal attribute determiner, a behavior index determiner for indices such as excitement or degree of concentration, and a highlight scene predictor.
  • for example, the information processing system 1 uses information obtained from the video, such as skeleton information, face recognition information, motion detection, and line of sight, as inputs to these determiners.
  • the information processing system 1 learns (generates) a personal attribute determiner, an action index determiner, and a highlight scene predictor by supervised learning or rule base.
  • the information processing system 1 generates highlights corresponding to the viewer himself/herself or to (a set of) real-time spectators with the same attributes, according to the real-time viewing time of the remote viewer (or the spectator at the venue) who views the highlights and the personal attributes determined by the personal attribute determiner.
  • the information processing system 1 provides a viewer (user) viewing highlights with highlights generated using a highlight scene predictor (or behavior index determiner).
  • the information processing system 1 may estimate the optimal gaze point position and angle from feature amounts obtained by image recognition of a group of spectators at the venue who have the same attributes as the viewer viewing the extracted highlight scenes on the time axis described above, and may determine the camera work of the highlight scenes. Note that these points will be described later.
  • the information processing system 1 shown in FIG. 3 will be described.
  • the information processing system 1 includes a highlight generation server 100 , a plurality of edge viewing terminals 10 , a content distribution server 50 and an audience video collection server 60 .
  • the highlight generation server 100, each of the plurality of edge viewing terminals 10, the content distribution server 50, and the spectator video collection server 60 are communicably connected by wire or wirelessly via a predetermined communication network (network N).
  • FIG. 3 is a diagram illustrating a configuration example of an information processing system according to the embodiment
  • the information processing system 1 may include four or more edge viewing terminals 10 .
  • each edge viewing terminal 10 may be described as an edge viewing terminal 10a, an edge viewing terminal 10b, and an edge viewing terminal 10c in order to distinguish and describe each edge viewing terminal 10.
  • the information processing system 1 shown in FIG. 3 may include a plurality of highlight generation servers 100, a plurality of content distribution servers 50, and a plurality of spectator video collection servers 60.
  • the highlight generation server 100 is a computer used to provide highlight services to users.
  • the highlight generation server 100 generates highlights of event content to be provided to the second user, using part of the event content determined by the state information indicating the state of the first user's event during real-time viewing.
  • the edge viewing terminal 10 is a computer used by the user.
  • edge viewing terminals 10 are utilized by remote viewers or spectators.
  • the edge viewing terminal 10 is used by a user who accesses content such as web pages displayed on a browser and content for applications.
  • the edge viewing terminal 10 is used by the user to browse content.
  • the edge viewing terminal 10 may be, for example, a notebook PC (Personal Computer), a tablet terminal, a desktop PC, a smartphone, a smart speaker, a television, a mobile phone, a PDA (Personal Digital Assistant), or other device.
  • the edge viewing terminal 10 may be referred to as a user hereinafter. That is, hereinafter, the user can also be read as the edge viewing terminal 10 .
  • the edge viewing terminal 10 outputs information about the event.
  • the edge viewing terminal 10 outputs information related to various contents such as videos and highlights of the event.
  • the edge viewing terminal 10 displays the image (video) of the highlight and outputs the audio of the highlight.
  • the edge viewing terminal 10 transmits user's utterances and images (video) to the highlight generation server 100 and receives highlight audio and images (video) from the highlight generation server 100 .
  • the edge viewing terminal 10 transmits the captured video of the user to the highlight generation server 100 .
  • the edge viewing terminal 10 accepts input from the user.
  • the edge viewing terminal 10 receives voice input by user's utterance and input by user's operation.
  • the edge viewing terminal 10 may be any device as long as it can implement the processing in the embodiments.
  • the edge viewing terminal 10 may be any device as long as it has a function of displaying content information and outputting audio.
  • each edge viewing terminal 10 has a camera 14 and a display unit 15 .
  • the content distribution server 50 is a server device (computer) that provides a service for distributing content of photographed events. Note that the content distribution server 50 is the same as a server having a function of distributing content, so detailed description thereof will be omitted.
  • the spectator video collection server 60 is a server device (computer) that collects videos of spectators watching the event in real time.
  • the spectator video collection server 60 transmits the collected video to the highlight generation server 100 .
  • the spectator video collection server 60 is the same as the content distribution server 50 except that the object to be photographed is the spectator, so a detailed description thereof will be omitted.
  • at a sports or live venue, an imaging device group FIA, which is content imaging equipment such as cameras for photographing the game, performers, and the like, and a sound collecting device group SCD, which is content sound collecting equipment such as microphones for collecting the sound of the game, performers, and the like, are arranged. Information collected by the imaging device group FIA and the sound collecting device group SCD is transmitted by the content distribution server 50 to the highlight generation server 100.
  • an imaging equipment group SIA which is spectator imaging equipment such as a camera for capturing an image of an audience at the venue, is arranged at a sports or live venue.
  • the spectator video collection server 60 transmits the information collected by the imaging device group SIA to the highlight generation server 100 .
  • the environment for remote viewing, in which content is viewed in real time or as highlights from a location other than the venue, is composed of the edge viewing terminal 10, which includes the display unit 15, a display device for viewing content, and the camera 14 or the like, a viewer imaging device such as a camera for capturing the remote viewer himself/herself.
  • the image of the remote viewer is sent to the highlight generation server 100 through the network N.
  • the edge viewing terminal 10 can receive and view highlights (moving images) individually distributed from the highlight generation server 100 .
  • the highlight generation server 100 collects the video and audio data of the content, the captured image data of the spectators at the venue, and the captured image data of the remote viewers who view the content in real time or as highlights.
  • the highlight generation server 100 generates a highlight video optimized for the individual highlight viewer (second user) and distributes it to the individual highlight viewer.
  • the device configuration of the information processing system 1 is not limited to the configuration described above, and any device configuration may be adopted. That is, the information processing system 1 may have a configuration other than that described above.
  • the highlight generation server 100 may be integrated with any one of the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60. That is, any one of the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60 may have the function of the highlight generation server 100.
  • FIG. 4 is a diagram showing an example of arrangement of spectator imaging devices.
  • the spectator imaging device may be a 4K video camera.
  • FIG. 4 shows a case where at least one spectator imaging device is arranged at each of four points PT1 to PT4 in a basketball game venue.
  • the spectator imaging device placed at point PT1 photographs spectators (users) positioned in area AR1.
  • the spectator imaging device placed at the point PT2 photographs the spectators (users) positioned in the area AR2.
  • the spectator imaging device placed at the point PT3 photographs the spectators (users) positioned in the area AR3.
  • the spectator imaging device placed at the point PT4 photographs the spectators (users) positioned in the area AR4.
  • Areas AR1 to AR4 in FIG. 4 show outlines of shooting areas from points PT1 to PT4.
  • areas AR1 to AR4 may cover all the spectators at the venue.
  • each of the areas AR1 to AR4 may partially overlap with another area.
  • FIG. 4 is merely an example, and the spectator imaging device may be arranged in any manner as long as it is possible to photograph a desired spectator.
  • FIG. 5 is a diagram illustrating a configuration example of a highlight generation server according to an embodiment of the present disclosure
  • the highlight generation server 100 has a communication unit 110, a storage unit 120, and a control unit 130.
  • the highlight generation server 100 may include an input unit (for example, a keyboard, a mouse, etc.) that receives various operations from the administrator of the highlight generation server 100, and a display unit (for example, a liquid crystal display, etc.) that displays various information.
  • the communication unit 110 is implemented by, for example, a NIC (Network Interface Card) or the like.
  • the communication unit 110 is connected to the network N (see FIG. 3) by wire or wirelessly, and exchanges information with other information processing devices such as the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60. Send and receive.
  • the communication unit 110 may transmit and receive information to and from a user terminal (not shown) used by the user.
  • the storage unit 120 is implemented by, for example, a semiconductor memory device such as RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or optical disk.
  • the storage unit 120 according to the embodiment has a dataset storage unit 121, a model information storage unit 122, a threshold information storage unit 123, and a content information storage unit 124, as shown in FIG.
  • the data set storage unit 121 stores various information related to data used for learning.
  • the dataset storage unit 121 stores datasets used for learning.
  • FIG. 6 is a diagram illustrating an example of a dataset storage unit according to an embodiment of the present disclosure; FIG. 6 shows an example of the dataset storage unit 121 according to the embodiment.
  • each table in the data set storage unit 121 includes items such as "target model ID”, "data ID”, “data”, “label”, and "date and time”.
  • the data set storage unit 121 stores data used for learning each of a plurality of models in association with the model to be learned, such as tables TB1, TB2, TB3, etc. in FIG. Although only three tables TB1, TB2, and TB3 are illustrated in FIG. 6, the data set storage unit 121 may include tables corresponding to the number of learned models.
  • Target model ID indicates identification information for identifying a model to be learned (target model).
  • Data ID indicates identification information for identifying data used in the learning process of the target model.
  • Data indicates data identified by a data ID.
  • Label indicates the label (correct label) attached to the corresponding data.
  • the “label” may be information (correct answer information) indicating the classification (category) of the corresponding data.
  • label is correct information (correct label) corresponding to the output of the target model.
  • “Date and time” indicates the time (date and time) related to the corresponding data.
  • in the example of FIG. 6, conceptual information such as "DA1" is shown, but the "date and time" may be a specific date and time such as "17:48:35 on December 15, 2021".
  • information indicating from which model's learning the data started to be used may be stored, such as "used from the learning of model version XX".
  • the example of FIG. 6 indicates that the data in the table TB1 is the data used for learning the target model (model M1) identified by the target model ID "M1".
  • the data used for learning the model M1 includes a plurality of data identified by data IDs "DID1", “DID2", “DID3”, and the like.
  • each data (data DT1, DT2, DT3, etc.) identified by data IDs "DID1", “DID2", “DID3”, etc. is information used for learning of the model M1 that performs personal attribute determination.
  • data DT1, DT2, DT3, etc. are input data for the model M1, and labels LB1, LB2, LB3, etc. corresponding to each data indicate the desired output of the model M1 when each data is input.
  • the data in the table TB2 indicates that it is the data used for learning the target model (model M2) identified by the target model ID "M2". Model M2 used for action index determination is learned using the data in table TB2.
  • the data in the table TB3 are data used for learning the target model (model M3) identified by the target model ID "M3”. Model M3 used for highlight scene prediction is learned using the data in table TB3.
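  • The dataset tables described above can be pictured as records that pair input data with a correct label for a target model. The sketch below is a hypothetical in-memory representation; the field names mirror the items of FIG. 6 but are not the actual storage format of the dataset storage unit 121.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class DatasetRecord:
    """One row of a dataset table: input data and its correct label for a target model."""
    target_model_id: str   # e.g. "M1", "M2", "M3"
    data_id: str           # e.g. "DID1"
    data: Any              # input data for the target model
    label: str             # correct label attached to the data
    date_time: datetime    # time related to the data


# Example row of table TB3 used for learning the highlight scene prediction model M3
# (the data values and label are illustrative only).
dataset: Dict[str, List[DatasetRecord]] = {
    "M3": [
        DatasetRecord("M3", "DID1", data=[0.1, 0.7, 0.9], label="highlight",
                      date_time=datetime(2021, 12, 15, 17, 48, 35)),
    ],
}
```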
  • the data set storage unit 121 may store various information not limited to the above, depending on the purpose.
  • the data set storage unit 121 may store data such as whether each data is learning data or evaluation data so as to be identifiable.
  • the data set storage unit 121 may store various information related to various data such as learning data used for learning and evaluation data used for accuracy evaluation (calculation).
  • the data set storage unit 121 stores learning data and evaluation data in a distinguishable manner.
  • the data set storage unit 121 may store information identifying whether each data is learning data or evaluation data.
  • the highlight generation server 100 learns a model based on each data used as learning data and the correct answer information.
  • the highlight generation server 100 calculates the accuracy of the model based on each data used as the evaluation data and the correct answer information.
  • the highlight generation server 100 calculates the accuracy of the model by collecting the result of comparing the output result output by the model when the evaluation data is input with the correct answer information.
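  • The accuracy calculation described here can be sketched as a simple exact-match comparison between model outputs and correct labels over the evaluation data. The model interface and the toy data below are hypothetical and only illustrate the idea of collecting comparison results.

```python
from typing import Callable, List, Tuple


def evaluate_accuracy(
    model: Callable[[float], str],
    evaluation_data: List[Tuple[float, str]],  # (input data, correct label) pairs
) -> float:
    """Compare the model output for each evaluation input with its correct label."""
    if not evaluation_data:
        return 0.0
    correct = sum(1 for data, label in evaluation_data if model(data) == label)
    return correct / len(evaluation_data)


# Example with a toy model that labels inputs greater than 0.5 as "positive".
toy_model = lambda x: "positive" if x > 0.5 else "negative"
evaluation_set = [(0.9, "positive"), (0.2, "negative"), (0.7, "negative")]
print(evaluate_accuracy(toy_model, evaluation_set))  # 0.666...
```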
  • the model information storage unit 122 stores information about models.
  • the model information storage unit 122 stores information (model data) indicating the structure of a model (network).
  • FIG. 7 is a diagram illustrating an example of a model information storage unit according to an embodiment of the present disclosure; FIG. 7 shows an example of the model information storage unit 122 according to the embodiment.
  • the model information storage unit 122 includes items such as "model ID", "usage", and "model data”.
  • Model ID indicates identification information for identifying a model.
  • "Usage" indicates the use of the corresponding model.
  • Model data indicates model data.
  • FIG. 7 shows an example in which conceptual information such as “MDT1" is stored in “model data”, but in reality, various types of information that make up the model, such as network information and functions included in the model, are stored. included.
  • model (model M1) identified by the model ID "M1" indicates that the application is "personal attribute determination”.
  • Model M1 indicates that it is a model used for personal attribute determination. It also indicates that the model data of the model M1 is the model data MDT1.
  • model (model M2) identified by the model ID "M2” indicates that the application is "behavior index determination”. Model M2 indicates that it is a model used for action index determination. It also indicates that the model data of the model M2 is the model data MDT2.
  • model (model M3) identified by the model ID "M3" indicates that the application is "highlight scene prediction”. Model M3 indicates that it is a model used for highlight scene prediction. It also indicates that the model data of the model M3 is the model data MDT3. Also, the model (model M11) identified by the model ID "M11” indicates that the application is "angle estimation”. Model M11 indicates that it is a model used for angle estimation. It also indicates that the model data of the model M11 is the model data MDT11.
  • model information storage unit 122 may store various types of information, not limited to the above, depending on the purpose.
  • the model information storage unit 122 stores parameter information of the model learned (generated) by the learning process.
  • the threshold information storage unit 123 stores various information regarding thresholds.
  • the threshold information storage unit 123 stores various information related to thresholds used for comparison with model outputs (scores, etc.).
  • FIG. 8 is a diagram illustrating an example of a threshold information storage unit according to an embodiment of the present disclosure;
  • the threshold information storage unit 123 shown in FIG. 8 includes items such as "threshold ID", "usage”, and "threshold”.
  • Threshold ID indicates identification information for identifying the threshold.
  • "Usage" indicates the use of the corresponding threshold.
  • Threshold indicates a specific value of the threshold identified by the corresponding threshold ID.
  • the threshold (threshold TH1) identified by the threshold ID "TH1" is stored in association with information indicating that it is used for viewing determination. For example, the threshold TH1 is used to determine whether the user is viewing in real time.
  • the value of the threshold TH1 indicates "VL1".
  • the value of the threshold TH1 is a specific numerical value (for example, 0.4, 0.6, etc.), although it is indicated by an abstract code such as "VL1".
  • the threshold TH2 is used to determine which part of the event content is used for highlighting.
  • the threshold TH2 is used to determine whether or not to include a video of event content at a certain point in time.
  • the value of the threshold TH2 indicates "VL2".
  • the value of the threshold TH2 is a specific numerical value (for example, 0.5, 0.8, etc.), although it is indicated here by an abstract code such as "VL2".
  • the threshold information storage unit 123 may store various types of information, not limited to the above, depending on the purpose.
  • the content information storage unit 124 stores various types of information regarding content displayed on the edge viewing terminal 10 .
  • the content information storage unit 124 stores information about content displayed by an application (also referred to as “app”) installed in the edge viewing terminal 10 .
  • the content information storage unit 124 stores event content, which is video of the event.
  • the content information storage unit 124 stores the event content of the event in association with the event. Note that the above is merely an example, and the content information storage unit 124 may store various types of information according to the content for which response candidates are displayed.
  • the content information storage unit 124 stores various kinds of information necessary for providing content to the edge viewing terminal 10, displaying response candidates on the edge viewing terminal 10, and the like.
  • the storage unit 120 may store various information other than the above.
  • the storage unit 120 stores various information regarding highlight generation.
  • the storage unit 120 stores various data for providing data to the edge viewing terminal 10 .
  • the storage unit 120 stores various information used to generate information displayed on the edge viewing terminal 10 .
  • the storage unit 120 stores information about content displayed by an application (content display application or the like) installed in the edge viewing terminal 10 .
  • the storage unit 120 stores information about content displayed by a content display application. Note that the above is merely an example, and the storage unit 120 may store various types of information used to provide the highlight service to the user.
  • the storage unit 120 stores attribute information and the like of each user.
  • the storage unit 120 stores user information corresponding to information identifying each user (user ID, etc.) in association with each other.
  • the storage unit 120 stores information indicating personal attributes determined by the model M1 in association with the user.
  • the control unit 130 is implemented by, for example, a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing a program stored inside the highlight generation server 100 (for example, an information processing program according to the present disclosure) using a RAM (Random Access Memory) or the like as a work area. The control unit 130 may also be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
  • the control unit 130 includes an acquisition unit 131, a learning unit 132, an image processing unit 133, a generation unit 134, and a transmission unit 135, and realizes or executes the functions and actions of the information processing described below.
  • the internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 5, and may be another configuration as long as it performs information processing described later.
  • the connection relationship between the processing units of the control unit 130 is not limited to the connection relationship shown in FIG. 5, and may be another connection relationship.
  • the acquisition unit 131 acquires various types of information. Acquisition unit 131 acquires various types of information from an external information processing device. The acquisition unit 131 acquires various types of information from the edge viewing terminal 10 , the content distribution server 50 and the spectator video collection server 60 . The acquisition unit 131 acquires information detected by the edge viewing terminal 10 from the edge viewing terminal 10 .
  • the acquisition unit 131 receives information from the content distribution server 50 or the spectator video collection server 60 .
  • the acquisition unit 131 acquires requested information from the content distribution server 50 or the spectator video collection server 60 .
  • Acquisition unit 131 acquires video from content distribution server 50 .
  • the acquisition unit 131 acquires the video imaged by the imaging device group FIA from the content distribution server 50 .
  • the acquisition unit 131 acquires the sound detected by the sound collection device group SCD from the content distribution server 50 .
  • the acquisition unit 131 acquires the video from the spectator video collection server 60 .
  • Acquisition unit 131 acquires various types of information from storage unit 120 .
  • the acquisition unit 131 acquires the video captured by the imaging device group SIA from the spectator video collection server 60 .
  • the acquisition unit 131 acquires state information indicating the state of the real-time viewing of the event by the first user who has viewed the event in real-time.
  • Acquisition unit 131 acquires event content, which is video of an event.
  • the acquisition unit 131 acquires the status information of the first user who watched in real time at the venue of the event. Acquisition unit 131 acquires the state information of the first user who watched the sports or art event in real time. Acquisition unit 131 acquires the state information of the first user who watched the event locally. Acquisition unit 131 acquires state information including image information of the first user. Acquisition unit 131 acquires state information including biometric information of the first user.
  • the acquiring unit 131 acquires the state information indicating the state of the second user viewing the event in real time as the state information of the first user. If the second user is not viewing the event in real time, the acquisition unit 131 acquires the state information of the first user who is different from the second user.
  • the acquisition unit 131 acquires the state information of the first user, who is a user similar to the attributes of the second user.
  • the acquisition unit 131 acquires the state information of the first user who is a user similar to the demographic attributes of the second user.
  • the acquisition unit 131 acquires the state information of the first user who is a user similar in at least one of age and sex to the second user.
  • the acquisition unit 131 acquires the state information of the first user, who is a user similar to the psychographic attributes of the second user.
  • the acquisition unit 131 acquires the state information of the first user who is a user similar to the second user's preference.
  • the acquisition unit 131 acquires the state information of the first user who is a user whose favorite object (for example, a team to support) matches that of the second user.
  • the learning unit 132 learns various types of information.
  • the learning unit 132 learns various types of information based on information from an external information processing device and information stored in the storage unit 120 .
  • the learning unit 132 learns various types of information based on the information stored in the data set storage unit 121 .
  • the learning unit 132 stores the model generated by learning in the model information storage unit 122 .
  • the learning unit 132 stores the model updated by learning in the model information storage unit 122 .
  • the learning unit 132 performs learning processing.
  • the learning unit 132 performs various types of learning.
  • the learning unit 132 learns various types of information based on the information acquired by the acquisition unit 131 .
  • the learning unit 132 learns (generates) a model.
  • the learning unit 132 learns various types of information such as models.
  • the learning unit 132 generates a model through learning.
  • the learning unit 132 learns the model using various machine learning techniques. For example, the learning unit 132 learns model (network) parameters.
  • the learning unit 132 learns the model using learning data including highlights of past events and status information of users who watched the past events in real time.
  • the learning unit 132 generates various models such as models M1, M2, M3, and M11.
  • the learning unit 132 generates a model M3 for determining highlight scenes.
  • the learning unit 132 learns network parameters.
  • the learning unit 132 learns network parameters of various models such as models M1, M2, M3, and M11.
  • the learning unit 132 learns network parameters of the model M3 for determining highlight scenes.
  • the learning unit 132 performs learning processing based on the learning data (teacher data) stored in the data set storage unit 121.
  • the learning unit 132 generates various models such as models M1, M2, M3, and M11 by performing learning processing using the learning data stored in the data set storage unit 121 .
  • the learning unit 132 may generate a model used for image recognition.
  • the learning unit 132 generates the model M1 by learning parameters of the network of the model M1.
  • the learning unit 132 generates the model M3 by learning parameters of the network of the model M3.
  • the method of learning by the learning unit 132 is not particularly limited, and the learning unit 132 may learn using any machine learning technique. For example, a technique based on a DNN (Deep Neural Network), such as a CNN (Convolutional Neural Network) or a 3D-CNN, may be used.
  • when targeting time-series data such as moving images (video), the learning unit 132 may use a method based on a recurrent neural network (RNN) or an LSTM (Long Short-Term Memory), which is an extension of the RNN.
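  • As one illustration of the DNN- and RNN-based approaches mentioned above, the following PyTorch sketch shows a highlight-scene predictor that encodes per-time-step feature vectors and applies an LSTM over time, outputting a score for each time point. The layer sizes and the overall architecture are assumptions for illustration and do not describe the actual configuration of model M3.

```python
import torch
import torch.nn as nn


class HighlightScenePredictor(nn.Module):
    """Outputs a per-time-step highlight score from a sequence of spectator-video features."""

    def __init__(self, feature_dim: int = 128, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(            # per-time-step feature encoder
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)     # one score per time step

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, feature_dim) -> scores: (batch, time)
        encoded = self.encoder(features)
        temporal, _ = self.lstm(encoded)
        return torch.sigmoid(self.head(temporal)).squeeze(-1)


# Example: one 30-step sequence of 128-dimensional features.
model = HighlightScenePredictor()
scores = model(torch.randn(1, 30, 128))
print(scores.shape)  # torch.Size([1, 30])
```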
  • the image processing unit 133 executes various processes related to image processing.
  • the image processing unit 133 executes processing on an image (video) of the user.
  • the image processing unit 133 performs processing on a video image of the spectators of the event.
  • the image processing unit 133 recognizes a person (user) in the video by image recognition processing. For example, the image processing unit 133 detects the orientation of the user's face and the line of sight in the video through image recognition processing.
  • the image processing unit 133 generates information to be input to the model from the video. For example, the image processing unit 133 generates feature amounts to be input to the model from the video. For example, the image processing unit 133 extracts feature amounts from video by image processing. The image processing unit 133 may extract the feature amount from the video using a model (feature amount extraction model) that outputs the feature amount of the video as input. Note that the above-described processing is an example, and the image processing unit 133 may appropriately use various techniques related to image processing to extract feature amounts from video. Also, when each model receives video (image) itself instead of a feature amount, the highlight generation server 100 does not need to have the image processing unit 133 .
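  • The concrete feature extraction performed by the image processing unit 133 is not fixed by the disclosure; as one hedged illustration, the sketch below uses OpenCV face detection to turn sampled frames of a spectator video into a crude per-frame feature (the number of detected faces). OpenCV, the sampling rate, and the file name are assumptions standing in for the skeleton, face recognition, motion, and line-of-sight features mentioned above.

```python
from typing import List

import cv2  # OpenCV is an assumed dependency for this illustration

# Haar cascade face detector bundled with OpenCV.
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)


def extract_frame_features(video_path: str, sample_every: int = 30) -> List[int]:
    """Return a feature per sampled frame: here, simply the number of detected faces."""
    capture = cv2.VideoCapture(video_path)
    features: List[int] = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % sample_every == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = face_detector.detectMultiScale(gray)
            features.append(len(faces))
        index += 1
    capture.release()
    return features


# features = extract_frame_features("spectators_event_a.mp4")  # hypothetical file name
```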
  • the generation unit 134 generates various types of information.
  • the generation unit 134 generates various types of information based on information from an external information processing device and information stored in the storage unit 120 .
  • the generation unit 134 generates various types of information based on information from other information processing devices such as the edge viewing terminal 10, the content distribution server 50, the spectator video collection server 60, and the like.
  • the generation unit 134 generates various types of information based on the information stored in the data set storage unit 121, the model information storage unit 122, the threshold information storage unit 123, and the content information storage unit 124.
  • the generation unit 134 generates various information to be displayed on the edge viewing terminal 10 based on the model learned by the learning unit 132 .
  • the generation unit 134 uses part of the event content determined by the first user's state information acquired by the acquisition unit 131 to generate highlights of the event content to be provided to the second user.
  • the generation unit 134 generates highlights of event content using a model that outputs a score corresponding to the period of the event in response to input of input data based on state information.
  • the generator 134 uses the model to determine part of the event content.
  • the generation unit 134 generates highlights of event content using a model with state information as input.
  • the generation unit 134 generates highlights of event content using a model that receives an image of a user as an input.
  • the generation unit 134 generates highlights of event content using a model whose input is the feature amount extracted from the state information.
  • the generation unit 134 generates highlights of event content using a model whose input is a feature amount extracted from a video of a user.
  • the generation unit 134 generates highlights of the event content using part of the determined event content.
  • the generation unit 134 determines, as the part of the event content, the portion of the event content corresponding to the periods in which the score is equal to or greater than the threshold, and uses the determined portion of the event content to generate the highlights of the event content.
  • the generating unit 134 uses the model learned by the learning unit 132 to generate event content highlights.
  • when the second user is viewing the event in real time, the generation unit 134 generates the highlights of the event content to be provided to the second user using a part of the event content determined by the state information of the second user. When the second user is not viewing the event in real time, the generation unit 134 generates the highlights of the event content to be provided to the second user using a part of the event content determined by the state information of a first user who is a user different from the second user.
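  • Putting the pieces together, the branching between the two cases handled by the generation unit 134 can be sketched as follows. The sketch reuses the hypothetical helpers defined in the earlier examples (find_similar_realtime_spectator, determine_highlight_periods, generate_highlight), and the data layout is an assumption for illustration only.

```python
from typing import Dict, List, Sequence


def generate_highlight_for_second_user(
    second_user: Dict[str, str],
    viewed_in_realtime: bool,
    realtime_spectators: List[Dict[str, str]],
    scores_by_user: Dict[str, List[float]],   # stand-in for per-user model M3 output
    event_content: Sequence[str],
    threshold: float = 0.6,
) -> Sequence[str]:
    """Choose whose state information drives the highlight, then threshold and clip."""
    if viewed_in_realtime:
        first_user = second_user                           # FIG. 1 case: same user
    else:
        first_user = find_similar_realtime_spectator(      # FIG. 2 case: similar spectator
            second_user, realtime_spectators
        )                                                  # assumes a match exists
    scores = scores_by_user[first_user["user_id"]]
    periods = determine_highlight_periods(scores, threshold)
    return generate_highlight(event_content, periods)
```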
  • the generation unit 134 executes processing for generating information to be provided to the edge viewing terminal 10 .
  • the generation unit 134 may generate a display screen (content) to be displayed on the edge viewing terminal 10 as data.
  • the generation unit 134 may generate a screen (content) to be provided to the edge viewing terminal 10 by appropriately using various techniques such as Java (registered trademark).
  • the generation unit 134 may generate a screen (content) to be provided to the edge viewing terminal 10 based on CSS, JavaScript (registered trademark), or HTML format.
  • the generation unit 134 may generate screens (contents) in various formats such as JPEG (Joint Photographic Experts Group), GIF (Graphics Interchange Format), and PNG (Portable Network Graphics).
  • the transmission unit 135 transmits information to the edge viewing terminal 10.
  • the transmission unit 135 transmits the information generated by the generation unit 134 to the edge viewing terminal 10 .
  • the transmission unit 135 transmits the data generated by the generation unit 134 to the edge viewing terminal 10 .
  • the transmission unit 135 transmits the highlight of the event content generated by the generation unit 134 to the edge viewing terminal 10 used by the second user.
  • the transmission unit 135 transmits information requesting information to the content distribution server 50 .
  • the transmission unit 135 transmits information indicating information requested to be acquired to the content distribution server 50 .
  • the transmission unit 135 transmits information requesting information to the spectator video collection server 60 .
  • the transmission unit 135 transmits information indicating information requested to be acquired to the spectator video collection server 60 .
  • FIG. 9 is a diagram showing a configuration example of an edge viewing terminal according to an embodiment of the present disclosure.
  • the edge viewing terminal 10 includes a communication unit 11, an audio input unit 12, an audio output unit 13, a camera 14, a display unit 15, an operation unit 16, a storage unit 17, and a control unit 18.
  • the communication unit 11 is implemented by, for example, a NIC, a communication circuit, or the like.
  • the communication unit 11 is connected to a predetermined communication network (network) by wire or wirelessly, and transmits and receives information to and from an external information processing device.
  • the communication unit 11 is connected to a predetermined communication network by wire or wirelessly, and transmits and receives information to and from the highlight generation server 100 .
  • the voice input unit 12 functions as an input unit that receives operations by voice (utterance) of the user.
  • the voice input unit 12 is, for example, a microphone or the like, and detects voice.
  • the voice input unit 12 detects user's speech.
  • the voice input unit 12 may have any configuration as long as it can detect the user's speech information necessary for processing.
  • the audio output unit 13 is realized by a speaker that outputs audio, and is an output device for outputting various types of information as audio.
  • the audio output unit 13 audio-outputs the content provided from the highlight generation server 100 .
  • the audio output unit 13 outputs audio corresponding to information displayed on the display unit 15 .
  • the edge viewing terminal 10 inputs and outputs audio through the audio input section 12 and the audio output section 13 .
  • the camera 14 has an image sensor that detects images, and photographs the user.
  • when the edge viewing terminal 10 is a desktop personal computer (desktop PC), the camera 14 may be separate from the device (main device) in which the control unit 18 is mounted, for example provided on a display device or the like.
  • the camera 14 may be integrated with the display (display device) or may be arranged above the display section 15 .
  • the camera 14 may be built in the edge viewing terminal 10 and arranged above the display section 15 .
  • the camera 14 may be an in-camera built into the edge viewing terminal 10 .
  • the display unit 15 is a display screen of a tablet terminal realized by, for example, a liquid crystal display or an organic EL (Electro-Luminescence) display, and is a display device for displaying various information.
  • the display unit 15 may be separate (separate device) from a device (main device) in which the control unit 18 is mounted.
  • the display unit 15 may be integrated with a device (main device) in which the control unit 18 is mounted.
  • the display unit 15 displays various information related to the event.
  • the display unit 15 displays content.
  • the display unit 15 displays various information received from the highlight generation server 100 .
  • the display unit 15 displays highlights of events received from the highlight generation server 100 .
  • the display unit 15 displays the video of the event.
  • the display unit 15 displays the highlight of the event.
  • the operation unit 16 functions as an input unit that receives various user operations.
  • the operation unit 16 is a keyboard, mouse, or the like.
  • the operation unit 16 may have a touch panel capable of realizing functions equivalent to those of a keyboard and a mouse.
  • the operation unit 16 receives various operations from the user through the display screen by the function of a touch panel realized by various sensors.
  • the operation unit 16 receives various operations from the user via the display unit 15 .
  • the operation unit 16 may receive an operation such as a user's designation operation via the display unit 15 of the edge viewing terminal 10 .
  • although the tablet terminal mainly adopts the capacitive method, other detection methods such as the resistive film method, the surface acoustic wave method, the infrared method, and the electromagnetic induction method may be adopted, as long as the user's operation can be detected and the touch panel function can be realized.
  • the edge viewing terminal 10 may have a configuration that accepts (detects) various types of information as input, not limited to the above.
  • the edge viewing terminal 10 may have a line-of-sight sensor that detects the user's line of sight.
  • the line-of-sight sensor detects the user's line-of-sight direction using eye-tracking technology based on detection results from, for example, the camera 14 mounted on the edge viewing terminal 10, an optical sensor, and a motion sensor (all of which are not shown).
  • the line-of-sight sensor determines a region of the screen that the user is gazing at based on the detected line-of-sight direction.
  • the line-of-sight sensor may transmit line-of-sight information including the determined gaze area to the highlight generation server 100 .
  • the edge viewing terminal 10 may have a motion sensor that detects user gestures and the like.
  • the edge viewing terminal 10 may receive an operation by a user's gesture using a motion sensor.
  • the storage unit 17 is implemented by, for example, a semiconductor memory device such as RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or optical disk.
  • the storage unit 17 stores, for example, various information received from the highlight generation server 100 .
  • the storage unit 17 stores, for example, information about an application (for example, a content display application or the like) installed in the edge viewing terminal 10, such as a program.
  • the storage unit 17 stores user information.
  • the storage unit 17 stores the user's utterance history (speech recognition result history) and action history.
  • the control unit 18 is a controller, and is realized by, for example, a CPU or MPU executing various programs stored in a storage device such as the storage unit 17 inside the edge viewing terminal 10, using the RAM as a work area. These various programs include, for example, programs of applications (for example, a content display application) that perform information processing. The control unit 18 may also be realized by an integrated circuit such as an ASIC or FPGA.
  • control unit 18 includes an acquisition unit 181, a transmission unit 182, a reception unit 183, and a processing unit 184, and implements or executes the information processing functions and actions described below.
  • the internal configuration of the control unit 18 is not limited to the configuration shown in FIG. 9, and may be another configuration as long as it performs the information processing described later.
  • the connection relationship between the processing units of the control unit 18 is not limited to the connection relationship shown in FIG. 9, and may be another connection relationship.
  • the acquisition unit 181 acquires various types of information. For example, the acquisition unit 181 acquires various information from an external information processing device. For example, the acquisition unit 181 stores the acquired various information in the storage unit 17 . The acquisition unit 181 acquires user operation information accepted by the operation unit 16 .
  • the acquisition unit 181 acquires state information indicating the state of the user.
  • the acquisition unit 181 acquires state information including user image information captured by the camera 14 .
  • Acquisition unit 181 acquires utterance information of a user.
  • the acquisition unit 181 acquires user utterance information detected by the voice input unit 12 .
  • the transmission unit 182 transmits information to the highlight generation server 100 via the communication unit 11 .
  • the transmission unit 182 transmits information about the user to the highlight generation server 100 .
  • the transmission unit 182 transmits information about the user's video captured by the camera 14 to the highlight generation server 100 .
  • the transmitting unit 182 transmits state information indicating the state of the user.
  • the transmission unit 182 transmits state information including user image information captured by the camera 14 .
  • the transmission unit 182 transmits information input by user's speech or operation.
  • the receiving section 183 receives information from the highlight generation server 100 via the communication section 11 .
  • the receiving unit 183 receives information provided by the highlight generation server 100 .
  • the receiving unit 183 receives content from the highlight generation server 100 .
  • the receiving unit 183 receives highlights from the highlight generation server 100 .
  • the processing unit 184 executes various types of processing.
  • the processing unit 184 executes processing according to the user's operation accepted by the voice input unit 12 or the operation unit 16 .
  • the processing unit 184 displays various information via the display unit 15.
  • the processing unit 184 functions as a display control unit that controls display on the display unit 15 .
  • the processing unit 184 outputs various kinds of information as voice through the voice output unit 13 .
  • the processing unit 184 functions as an audio output control unit that controls audio output of the audio output unit 13 .
  • the processing unit 184 outputs the information received by the acquisition unit 181.
  • the processing unit 184 outputs the content provided by the highlight generation server 100 .
  • Processing unit 184 outputs the content received by acquisition unit 181 via audio output unit 13 or display unit 15 .
  • the processing unit 184 displays content via the display unit 15 .
  • the processing unit 184 outputs the contents as audio through the audio output unit 13 .
  • the processing unit 184 transmits various information to an external information processing device via the communication unit 11 .
  • the processing unit 184 transmits various information to the highlight generation server 100 .
  • the processing unit 184 transmits various information stored in the storage unit 17 to an external information processing device.
  • the processing unit 184 transmits various information acquired by the acquisition unit 181 to the highlight generation server 100 .
  • the processing unit 184 transmits the sensor information acquired by the acquisition unit 181 to the highlight generation server 100 .
  • the processing unit 184 transmits the user operation information received by the operation unit 16 to the highlight generation server 100 .
  • the processing unit 184 transmits information such as an utterance and an image of the user using the edge viewing terminal 10 to the highlight generation server 100 .
  • each process performed by the control unit 18 described above may be implemented by, for example, JavaScript (registered trademark).
  • each unit of the control unit 18 may be realized by the predetermined application, for example.
  • processing such as information processing by the control unit 18 may be realized by control information received from an external information processing device.
  • the control unit 18 may have an application control unit that controls the predetermined application or a dedicated application, for example.
  • FIG. 10 is a flow chart showing the processing procedure of the information processing device according to the embodiment of the present disclosure. Specifically, FIG. 10 is a flowchart showing the procedure of information processing by the highlight generation server 100, which is an example of an information processing apparatus.
  • the highlight generation server 100 acquires state information indicating the state of real-time viewing of the event by the first user who viewed the event in real time, and event content, which is a video of the event (step S101).
  • the highlight generation server 100 generates highlights of event content to be provided to the second user, using part of the event content determined by the state information of the first user (step S102).
  • FIG. 11 is a diagram illustrating a functional configuration example regarding learning of the information processing system.
  • the dashed line BS indicates a functional interface in the system; the left side of the dashed line BS corresponds to the equipment at the site venue (corresponding to the event site in FIG. 3) or the edge viewing terminal 10 side, and the right side of the dashed line BS corresponds to the highlight generation server 100 side.
  • a dashed line BS indicates an example of allocation of functions in the information processing system 1 .
  • each component shown on the left side of the dashed line BS is implemented by the equipment at the site or the edge viewing terminal 10 .
  • each component shown on the right side of the dashed line BS is implemented by the highlight generation server 100 .
  • the boundary (interface) of the device configuration in the information processing system 1 is not limited to the dashed line BS, and the functions assigned to the equipment at the venue, the edge viewing terminal 10, the highlight generation server 100, and the like may be combined in any combination. For example, if the highlight generation server 100 is integrated with any of the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60, the interface indicated by the dashed line BS may be omitted.
  • FIG. 11 shows a learning flow for machine learning a behavior analysis algorithm based on data analysis in advance.
  • in the information processing system 1, local spectators (real-time spectators) watching an event in real time at a sports game venue or a live venue are photographed by an imaging device such as a camera, and the footage is stored as spectator video data.
  • the imaging device in FIG. 11 corresponds to the audience imaging equipment or the camera 14 of the edge viewing terminal 10 or the like.
  • the information processing system 1 may set viewers watching in real time in a remote environment as learning target real time spectators.
  • the information processing system 1 accumulates feature amounts obtained by performing image recognition processing on spectator photographed data as spectator feature amount data.
  • Image recognition processing in FIG. 11 is executed by the image processing unit 133 of the highlight generation server 100 .
  • the spectator feature amount data is time-series data, covering the entire duration of the match, of the feature amounts obtained by image recognition processing.
  • the spectator feature amount data is accumulated as time-series data of individual feature amounts for all the spectators who are photographed.
  • feature amounts include body part points obtained by skeleton recognition, information indicating facial orientation and the like, facial part points obtained by face recognition, information indicating facial attributes such as smiles and emotions, information obtained by motion detection (motion information), and information obtained by line-of-sight detection (line-of-sight information).
  • the highlight generation server 100 applies audience feature data and teacher data to each learning device to generate a behavior analysis algorithm model. Processing (learning processing) corresponding to each learning device in FIG. 11 is executed by the learning unit 132 of the highlight generation server 100 .
  • the highlight generation server 100 generates a model M1, which is a personal attribute determiner.
  • the highlight generation server 100 generates a model M2, which is an action index determiner.
  • the highlight generation server 100 generates model M3, which is a highlight scene predictor. Details of each of the models M1 to M3 will be described later.
  • the teacher data is a label (correct answer information) that indicates the correct answer that the model is expected to output when the corresponding audience feature data is input to the model.
  • Teacher data may be generated by a content analysis process that analyzes event content. For example, training data is automatically generated from the results of image recognition processing of content video data captured by an imaging device such as a camera, and the acoustic processing results of content audio data recorded with a sound pickup device such as a microphone.
  • when training data is created manually, the manager of the information processing system 1 may use an analysis tool for the development of the match. Since teacher data can also be created entirely by hand in this way, teacher data does not have to be generated from the video and audio of the content, as indicated by the dotted line in FIG. 11.
  • FIG. 12 is a diagram showing an example of learning and inference regarding the personal attribute determiner of the information processing system.
  • the part above the dotted line in FIG. 12 shows the process related to model generation in the learning phase, and the part below the dotted line in FIG. 12 shows the process in the inference phase using the model generated by learning. It should be noted that descriptions of the same points as those described above will be omitted as appropriate.
  • FIG. 12 shows a case of determining which of three fan attributes each individual (user) has: home fan (first fan type), away fan (second fan type), or beginner (third fan type).
  • data in which a fan attribute label is assigned (manually) to each individual appearing in the spectator video data is prepared as training data TD1.
  • the highlight generation server 100 generates the model M1 by performing supervised learning on the audience feature amount data of each individual using the teacher data TD1. Any algorithm such as DNN, XGBoost (eXtreme Gradient Boosting), or the like can be adopted as the learning algorithm of the highlight generation server 100 .
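  • As an illustration only (not part of this disclosure), a minimal Python sketch of such supervised learning of the model M1 could look as follows, assuming per-spectator feature vectors have already been aggregated from the spectator feature amount data; the data shapes, the use of scikit-learn's GradientBoostingClassifier, and all names are assumptions.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical data: one aggregated feature vector per spectator, with
# teacher data TD1 labels 0 = home fan, 1 = away fan, 2 = beginner.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 32))           # 300 spectators, 32-dimensional features
y = rng.integers(0, 3, size=300)         # manually assigned fan-attribute labels

model_m1 = GradientBoostingClassifier()  # any algorithm (DNN, XGBoost, ...) could be used
model_m1.fit(X, y)

# Inference phase: determine the fan attribute of a target user.
target = rng.normal(size=(1, 32))
print(model_m1.predict(target))          # e.g. [0] -> home fan
print(model_m1.predict_proba(target))    # plausibility scores for each attribute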
  • the teacher data TD1 shown in FIG. 12 is data in which each individual appearing in the audience video data is labeled with a fan attribute.
  • the teacher data TD1 assigns each individual (user) a label indicating one of the three types: home fan (first fan type), away fan (second fan type), or beginner (third fan type). In FIG. 12, hatching attached to each individual's face is shown as an example of such a label and indicates one of the three types. A mode in which the label (correct answer information) is associated with information (a user ID) identifying each individual may also be used.
  • through learning processing using the above-described information, the highlight generation server 100 generates the model M1, which is a determiner that takes personal feature amount data as input and identifies which of the three types the attribute of the user corresponding to the input data belongs to.
  • the highlight generation server 100 performs inference processing using the model M1 generated in the learning phase. For example, the highlight generation server 100 inputs feature amount data as input data to the model M1, thereby causing the model M1 to output information (output data OD11) indicating the personal attribute of the user of the input data.
  • the highlight generation server 100 inputs the feature amount data of a user whose attributes are unknown (to be inferred) (also referred to as a "target user") to the model M1, thereby causing the model M1 to output information indicating the fan attribute of the target user.
  • the personal attribute determiner shown in FIG. 12 is merely an example, and the information processing system 1 may use various personal attribute determiners.
  • the information processing system 1 may use the attribute recognition function of an image recognizer as a personal attribute determiner.
  • the information processing system 1 may determine age and gender by attribute recognition of a face recognizer.
  • the information processing system 1 may use a personal attribute determiner that determines enjoyable attributes.
  • the information processing system 1 may determine the enjoyment attribute from the average value of smile levels over the entire period of a sports game or live event.
  • the information processing system 1 may store personal attribute determination values of the viewer's past viewing (such as another game of the same sport), and use the stored personal attribute determination values at the time of determination.
  • the information processing system 1 may use the saved value from past viewing to determine attributes that are enjoyable or attributes that cannot be instantly determined by image recognition based on behavior during viewing, which will be described later. Further, the information processing system 1 may determine attributes related to situations in which scenes desired to be viewed in highlights are different.
  • the information processing system 1 may use a personal attribute determiner that determines attributes that can be entered.
  • the information processing system 1 may make the determination from the average value of the amount of movement over the entire period of the sports game or live event.
  • the information processing system 1 may use a personal attribute determiner that determines the ticket price attribute.
  • the information processing system 1 may determine expenditure costs such as the ticket price from the photographed seat position of the individual. Note that the information processing system 1 may determine the expenditure cost by referring to the purchase history when the target user is not a local spectator.
  • the information processing system 1 may use a personal attribute determiner that determines vocalization attributes.
  • the information processing system 1 may make the determination based on the average value of the opening degree of the mouth over the entire period of the game/event.
  • the information processing system 1 may use a personal attribute determiner that determines a "while viewing" attribute.
  • the information processing system 1 may generate a personal attribute determiner that determines the "while viewing" attribute by assigning a label indicating whether the user is watching while doing something else, such as watching a game while looking at a smartphone, and learning from it.
  • the information processing system 1 may use a personal attribute determiner that determines a support ("endorsement") attribute.
  • the information processing system 1 may generate a personal attribute determiner that determines the endorsement attribute by assigning a label indicating whether or not the user supports a specific performer at a live event or the like and learning from it.
  • the information processing system 1 may use a personal attribute determiner that determines empathy attributes.
  • for lectures, speeches, and the like, the content the audience wants to hear differs depending on whether they sympathize with the speaker; the information processing system 1 may therefore generate a personal attribute determiner that determines a sympathy attribute by assigning a label indicating whether or not a listener sympathizes with (believes in) the principles and assertions of the speaker and learning from it.
  • the information processing system 1 may use a personal attribute determiner that determines a sense of togetherness attribute.
  • the information processing system 1 may generate a personal attribute determiner that determines a sense-of-togetherness attribute by assigning a label indicating whether the spectator cheers together with other spectators (routines and the like) at a sports game or a live performance and learning from it.
  • the information processing system 1 may use a personal attribute determiner that determines party relationship attributes.
  • the information processing system 1 may determine the relationship (relatives, friends, co-workers, etc.) with the main characters such as the bride and groom at a party such as a wedding ceremony from the seat position of the photographed individual.
  • the information processing system 1 may input non-attendance viewers as attributes.
  • the information processing system 1 may use a personal attribute determiner that determines concentration attributes.
  • the information processing system 1 may generate a personal attribute determiner that determines a concentration attribute by assigning a label indicating whether or not the spectator can concentrate on watching the entire event, such as a classical music concert where there is little reaction from the audience, and learning from it. When the personal attribute determiner generated in this way is used, for audience members (viewers) who cannot concentrate, the number of scenes to be extracted can be reduced and the duration of the highlight moving image can be shortened.
  • the information processing system 1 may use a personal attribute determiner that determines the emotional expression behavior attribute.
  • the information processing system 1 may create a personal attribute determiner that determines an emotional expression behavior attribute by assigning and learning labels of types of emotional expression behavior (crying, laughing, being scared, excitement, and the like) when watching a movie or a play.
  • with the personal attribute determiner generated in this way, it is possible to handle cases where the scenes desired to be seen in the highlights differ, such as excitement, comedy, horror, or action, owing to differences in the amount of emotional expression behavior.
  • FIG. 13 is a diagram illustrating an example of learning and inference regarding the action index determiner of the information processing system. It should be noted that descriptions of the same points as those described above will be omitted as appropriate.
  • the highlight generation server 100 defines objective variables from the development of the game for various indicators related to user behavior, and performs regression analysis using spectator feature amount data as explanatory variables, thereby generating a model M2, which is an action index determiner.
  • the model M2 may be a regression equation for inputting user feature amount data corresponding to each point in time during an event and outputting (calculating) a value indicating the degree of excitement at each point in time.
  • as an example of action index determination, a case of determining the degree of excitement of each individual (user) will be described.
  • the highlight generation server 100 uses the feature amounts of users with the home fan attribute as explanatory variables, defines an objective variable whose value changes from 0 to 1 when the home team scores, and executes learning processing to generate the model M2.
  • the teacher data TD2 shown in FIG. 13 corresponds to the case where the home team scores at time t1, and shows an example of the teacher data for changing the value of the objective variable from 0 to 1 at time t1.
  • the highlight generation server 100 generates a model M2 that receives individual feature amount data as input and outputs information indicating the excitement during the event period of the user corresponding to the input data through learning processing using the above-described information. .
  • the highlight generation server 100 generates a model M2 that outputs information indicating changes in the user's degree of excitement (score) during the period of the event.
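  • As an illustration only, a minimal Python sketch of such a regression-based action index determiner (model M2) could look as follows; the feature shapes, the scoring time, and the use of ridge regression are assumptions.

import numpy as np
from sklearn.linear_model import Ridge

T, D = 600, 16                        # 600 time steps, 16-dimensional features per step
rng = np.random.default_rng(0)
features = rng.normal(size=(T, D))    # spectator feature amount data (time series)

t1 = 400                              # hypothetical time index of the home team's goal
excitement_target = np.zeros(T)       # objective variable for the excitement index
excitement_target[t1:] = 1.0          # teacher data TD2: changes from 0 to 1 at time t1

model_m2 = Ridge().fit(features, excitement_target)   # regression analysis

# Inference: time-series excitement score for a target user's feature data.
score_series = model_m2.predict(features)
print(score_series.shape)             # (600,) score at each point in time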
  • the highlight generation server 100 generates the regression equation of each index obtained as a result of the regression analysis as an action index determiner (for example, the model M2). In this way, an action index determiner is obtained that outputs time-series data of scores (results of the regression equation) representing the degree of each index.
  • the action index determiner receives individual feature amount data as input and outputs a score for each individual index. Also, the action index determination device may input the feature amount data for each attribute or for the entire set and output the score of each index as an average value.
  • the highlight generation server 100 performs inference processing using the model M2 generated in the learning phase. For example, the highlight generation server 100 inputs feature amount data as input data to the model M2, thereby causing the model M2 to output information indicating the degree of excitement of the user corresponding to the input data.
  • the highlight generation server 100 inputs the feature amount data of a user (target user) whose index (degree of excitement or the like) is unknown (to be inferred) into the model M2, thereby causing the model M2 to output information indicating the index for the target user.
  • the output data OD21 in FIG. 13 shows an output example of the model M2 corresponding to the users of the home team, and shows the case where the degree of excitement rises sharply at time t1 when the home team scores.
  • Output data OD22 in FIG. 13 shows an output example of the model M2 corresponding to the users of the away team.
  • a further output example in FIG. 13 shows the output of the model M2 corresponding to a beginner user (for example, a user who is neither a home fan nor an away fan), and shows a case where the degree of excitement rises to an intermediate degree at time t1 when the home team scores.
  • the highlight generation server 100 may combine with attribute determination to generate score time-series data obtained by determining the degree of excitement for each attribute.
  • the highlight generation server 100 may use an average score value of sets for each attribute.
  • the highlight generation server 100 may generate a model M2 in which the score of home fans increases when a home score is scored, the score of away fans does not increase, and the score of beginners is intermediate.
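  • As an illustration only, averaging the per-user scores within each attribute group, as described above, could be sketched in Python as follows; the arrays and attribute codes are assumptions.

import numpy as np

def attribute_average_scores(score_per_user, attribute_per_user, n_attrs=3):
    # score_per_user: (n_users, T) score time series, attribute_per_user: (n_users,)
    return {a: score_per_user[attribute_per_user == a].mean(axis=0)
            for a in range(n_attrs) if np.any(attribute_per_user == a)}

scores = np.random.default_rng(0).random((6, 4))   # 6 users, 4 time steps
attrs = np.array([0, 0, 1, 1, 2, 2])               # e.g. home / away / beginner
print(attribute_average_scores(scores, attrs))     # one averaged series per attribute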
  • the action index determiner shown in FIG. 13 is merely an example, and the information processing system 1 may use action index determiners for various indicators.
  • the information processing system 1 may use an action index determiner that determines the degree of concentration of each individual (user).
  • the information processing system 1 may generate an action index determiner that determines the degree of concentration by using the feature amounts of the user as explanatory variables, defining an objective variable that changes from 1 to 0 when the game is interrupted, and executing the learning process.
  • the information processing system 1 may use an action index determiner that determines the degree of disappointment of each individual (user).
  • the information processing system 1 may generate an action index determiner that determines the degree of disappointment by using the feature amounts of the user as explanatory variables, defining an objective variable that changes from 0 to 1 when a shot is missed, and executing the learning process.
  • the information processing system 1 may also use an action index determiner that determines the degree of tension of each individual (user).
  • the information processing system 1 uses the feature amount of the user as an explanatory variable, defines an objective variable that becomes 1 when the score difference is balanced (for example, 1 point in the case of soccer), and executes the learning process. By doing so, an action index determiner for determining the degree of tension may be generated.
  • the information processing system 1 may also use an action index determiner that determines the degree of anger of each individual (user).
  • the information processing system 1 may generate an action index determiner that determines the degree of anger by using the feature amounts of the user as explanatory variables, defining an objective variable that changes from 0 to 1 when a mistake is made, and executing the learning process.
  • the information processing system 1 may also use an action index determiner that determines the degree of boredom of each individual (user).
  • the information processing system 1 may generate an action index determiner that determines the degree of boredom by using the feature amounts of the user as explanatory variables, defining an objective variable that becomes 1 in time periods during which the game is delayed, and executing the learning process.
  • FIG. 14 is a diagram illustrating an example of learning and reasoning for a highlight scene predictor of an information processing system. It should be noted that descriptions of the same points as those described above will be omitted as appropriate.
  • FIG. 14 shows a case where a highlight scene predictor is used to predict which period of the event period should be used as a highlight.
  • data indicating periods used as highlight scenes and periods not used as highlight scenes are prepared as teacher data TD3.
  • the teacher data TD3 is data in which the period used as the highlight scene is 1 and the non-highlight period is 0 in the period of the event.
  • the training data TD3 may be manually extracted or automatically generated by content analysis.
  • the teacher data TD3 is information indicating the period during which the event for which the feature amount as the input data was collected was used as the highlight scene in the actually generated highlight.
  • the highlight generation server 100 uses the manually extracted highlight scenes as the teacher data TD3, and generates data in which the audience feature amount data within the highlight scene periods is labeled True and the audience feature amount data outside the highlight scene periods is labeled False.
  • the highlight generation server 100 generates the model M3 by performing supervised learning on the audience feature amount data of each individual using the teacher data TD3.
  • learning algorithm of the highlight generation server 100 arbitrary algorithms such as DNN and XGBoost can be adopted.
  • time-series data of individual feature amounts for all spectators photographed at an event corresponding to the teacher data TD3 may be used as input data for learning.
  • the highlight scene automatically extracted by analyzing the video and audio of the content shown by the dotted line in the lower left of FIG. 11 may be used as the training data TD3.
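  • As an illustration only, a minimal Python sketch of training the highlight scene predictor (model M3) with such True/False period labels could look as follows; the shapes, the labeled interval, and the classifier choice are assumptions.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

T, D = 600, 16
rng = np.random.default_rng(0)
features = rng.normal(size=(T, D))    # audience feature amount data (time series)

labels = np.zeros(T, dtype=int)       # teacher data TD3: 1 = used as a highlight scene
labels[380:420] = 1                   # hypothetical period used as a highlight scene

model_m3 = GradientBoostingClassifier().fit(features, labels)

# The predicted probability of the True class can serve as a highlight-likeness score.
highlight_score = model_m3.predict_proba(features)[:, 1]
print(highlight_score.shape)          # (600,) time series of highlight-likeness scores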
  • the highlight generation server 100 may generate a highlight scene predictor for each personal attribute by classifying the learning target input audience feature amount data by personal attribute and learning.
  • through learning processing using the above-described information, the highlight generation server 100 generates the model M3, which takes feature amount data (for example, time-series data) as input and outputs the degree of highlight-likeness, such as a likelihood, a reliability, or a moving average of judgment values.
  • the model M3 receives individual feature amount data as an input and outputs an individual highlight-likeness score.
  • the model M3 may receive feature amount data for each attribute or for the entire set as input, and output a score of highlight-likeness for each attribute or for the entire set.
  • the highlight generation server 100 performs inference processing using the model M3 generated in the learning phase. For example, the highlight generation server 100 inputs feature amount data as input data to the model M3, thereby causing the model M3 to output time-series data (output data OD31) of the highlight-likeness score corresponding to the user of the input data.
  • highlight scene predictor shown in FIG. 14 is merely an example, and the information processing system 1 may use various highlight scene predictors.
  • FIG. 15 is a diagram illustrating a functional configuration example related to highlight generation of the information processing system. It should be noted that descriptions of the same points as those described above will be omitted as appropriate.
  • the dashed line BS in FIG. 15 indicates a functional interface in the system like the dashed line BS in FIG.
  • the viewer is constantly photographed by an imaging device such as a camera (for example, the camera 14 of the edge viewing terminal 10), and the footage is stored as viewer video data.
  • image recognition results are accumulated as viewer feature amount data.
  • the information is stored and accumulated as information associated with the content viewed in real time.
  • feature amounts obtained by photographing and image recognition of a group of local spectators (or remote viewers) watching the content in real time are accumulated as real-time spectator feature amount data.
  • real-time spectator feature amount data is stored and accumulated as information associated with content.
  • the configuration for accumulating the real-time spectator feature amount data is the same as the configuration for accumulating the spectator feature amount data regarding the real-time spectator users in FIG.
  • the highlight generation server 100 determines personal attributes using the model M1, which is a personal attribute determiner. Further, in the information processing system 1, the personal attribute result determined by the personal attribute determiner and the real-time viewing determination result determined by the real-time viewing determiner (described later) are input to the highlight generation control unit (for example, the generation unit 134 of the highlight generation server 100).
  • the real-time viewing determination device may be replaced with a real-time viewing determination by a behavior index determination device.
  • the highlight generation server 100 performs real-time viewing determination using, for example, the model M2, which is a behavior index determination device. For example, the highlight generation server 100 determines that the period during which the score output by the model M2 is equal to or greater than a predetermined threshold is the period during which the user is viewing in real time.
  • the highlight generation control unit instructs which feature amount data is to be input to the highlight scene predictor, based on the personal attribute result of the highlight viewing user input at the start of highlight viewing and on the real-time viewing determination result.
  • the highlight generation server 100 determines feature amount data to be input to the model M3, which is a highlight scene predictor.
  • a highlight scene predictor predicts a highlight scene from the input feature amount data instructed by the highlight generation controller.
  • the highlight generation server 100 inputs input feature amount data to the model M3 and predicts highlight scenes based on the output of the model M3.
  • the highlight generation server 100 determines that a scene corresponding to a period in which the score output by the model M3 is equal to or greater than a predetermined threshold is to be a highlight scene.
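  • As an illustration only, turning a highlight-likeness score time series into highlight periods by such thresholding could be sketched in Python as follows; the score values and the threshold are assumptions.

import numpy as np

def extract_highlight_periods(score, threshold=0.7):
    # Return (start, end) index pairs of periods in which score >= threshold.
    above = score >= threshold
    periods, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            periods.append((start, i))
            start = None
    if start is not None:
        periods.append((start, len(score)))
    return periods

score = np.array([0.1, 0.2, 0.8, 0.9, 0.85, 0.3, 0.75, 0.8, 0.2])
print(extract_highlight_periods(score))   # [(2, 5), (6, 8)]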
  • the highlight scene predictor may be replaced with scene prediction by the action index determiner.
  • the highlight generation server 100 performs scene prediction using, for example, the model M2, which is the action index determiner.
  • the highlight generation server 100 may determine that a scene corresponding to a period in which the score output by the model M2 is equal to or greater than a predetermined threshold is set as a highlight scene.
  • the position angle estimation unit (for example, corresponding to the generation unit 134 of the highlight generation server 100) estimates the position, angle, and the like used for the highlight. The highlight video generation unit (for example, likewise corresponding to the generation unit 134 of the highlight generation server 100) generates highlight video data from the content video (audio) data and the viewer video data. The highlight video data is presented to the highlight viewing user through an image output device such as a monitor (for example, the display unit 15) and an audio output device such as a speaker (for example, the audio output unit 13) of the edge viewing terminal 10.
  • the real-time viewing determiner determines whether the highlight viewer has viewed the target content in real time.
  • the highlight generation server 100 uses a real-time viewing determiner (model) stored in the storage unit 120 to determine real-time viewing.
  • the real-time viewing determination is merely an example, and if real-time viewing can be determined, real-time viewing may be determined by any process such as the process using the behavior index determiner described above.
  • if the storage unit 120 contains viewer feature amount data for real-time viewing of the highlight viewing target content, the highlight generation server 100 determines that the user was viewing the content in real time during the period in which that data exists. If the storage unit 120 does not contain such data for the highlight viewing target content, the highlight generation server 100 determines that the user did not view the content in real time during that period.
  • the behavior index determiner may be used as a real-time viewing determiner.
  • the highlight generating server 100 determines that there is real-time viewing during the period when the value of the index such as the degree of concentration during real-time viewing of the highlight viewing target content is equal to or greater than a threshold value. Also, the highlight generation server 100 determines that there is no real-time viewing if the value of the index such as the degree of concentration during real-time viewing of the highlight viewing target content is less than a threshold value.
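  • As an illustration only, the real-time viewing determination described above could be sketched in Python as follows, assuming stored per-period feature data (NaN where no data was recorded) and an optional concentration-index criterion; the data layout and threshold are assumptions.

import numpy as np

def realtime_viewing_mask(stored_features, concentration=None, threshold=0.5):
    # Return a boolean mask over the event timeline marking real-time viewing.
    if stored_features is None:                        # nothing accumulated for this content
        return None                                    # treated as "no real-time viewing"
    viewed = ~np.isnan(stored_features).any(axis=1)    # periods with recorded data
    if concentration is not None:                      # optional behavior-index criterion
        viewed &= concentration >= threshold
    return viewed

features = np.full((10, 4), np.nan)
features[3:8] = 0.0                                    # data exists only for periods 3..7
print(realtime_viewing_mask(features))                 # True only where data was stored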
  • the information processing system 1 may determine that real-time viewing is present even if there is no real-time viewing data for the highlight viewing target content in the accumulated viewer feature amount data.
  • for example, if a person identified (by face recognition) as the same person as the highlight viewer appears in the real-time spectator feature amount data, the highlight generation server 100 may determine that real-time viewing as a local spectator occurred during the period in which that person was present.
  • the information processing system 1 can cope with a usage scene in which viewing is performed on the edge viewing terminal after watching the game on site.
  • in this case, the video data and the image recognition feature amount data of the person photographed at the venue may be used as the viewer video data and the viewer feature amount data.
  • FIG. 16 is a flowchart showing a processing procedure regarding highlight generation.
  • the case where the information processing system 1 performs the processing is described below as an example, but any device such as the content distribution server 50 or the spectator video collection server 60 may perform this processing.
  • the information processing system 1 branches the process according to the viewing period in real time (step S201).
  • when the user partially watched the game in real time, for example watching in real time from the middle of the game or only until the middle of the game (the middle case in FIG. 16), the information processing system 1 separates the period into the real-time viewing period and the other periods (step S202). For example, when the game was partially watched in real time, the information processing system 1 uses the user's data to specify the period during which the event was viewed in real time and separates it from the remaining periods. Then, the information processing system 1 performs the processing shown in step S203 for the real-time viewing period, and performs the processing of steps S204 and S205 for the non-real-time viewing period.
  • for the real-time viewing period, the information processing system 1 uses the viewer's own feature amounts at the time of real-time viewing as the input feature amount data of the highlight scene predictor for that user (step S203). For example, when a user watched the whole game in real time, including as a local spectator at the venue, the information processing system 1 uses the viewer's own feature amounts at the time of real-time viewing as the input feature amount data of the model M3 for that user.
  • for the periods not viewed in real time, the information processing system 1 refers to the viewer's personal attribute determination result for that user (step S204), and uses, as the input feature amount data of the highlight scene predictor, the feature amounts of a set of real-time spectators similar to the viewer in the real-time spectator feature amount data (step S205). For example, when the user did not watch in real time at all, including when the user only watches the highlights afterwards, the information processing system 1 uses, as the input feature amount data of the model M3, the feature amounts of a set of real-time spectators similar to that user in the real-time spectator feature amount data.
  • the information processing system 1 designates the input feature amount data to the highlight scene predictor and instructs scene extraction (step S206). For example, the information processing system 1 inputs the feature amount data determined in steps S204 and S205 to the model M3, and generates highlights based on the output of the model M3.
  • the information processing system 1 then generates the highlights. For example, when the viewer's own feature amounts at the time of real-time viewing are used as the input feature amount data for the highlight scene predictor (in the case of personalization), the information processing system 1 may select scenes whose highlight scene prediction score or duration is high, by threshold or by ranking, to generate the highlight video. For example, the information processing system 1 may display, as a wipe in the generated highlight video, the viewer's own video of the corresponding scene at the time of real-time viewing.
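  • As an illustration only, the branching above (full, partial, or no real-time viewing) could be sketched in Python as follows; the inputs are hypothetical stand-ins for the feature data described above.

import numpy as np

def choose_predictor_input(own_features, similar_group_features, viewed_mask):
    # Select the input feature data for the highlight scene predictor per period.
    if viewed_mask is None:                     # no real-time viewing at all
        return similar_group_features           # attribute optimization
    if viewed_mask.all():                       # full-time real-time viewing
        return own_features                     # personalization
    # Partial real-time viewing: mix the two sources per period.
    return [own_features[t] if viewed_mask[t] else similar_group_features[t]
            for t in range(len(viewed_mask))]

own = np.arange(5)            # stand-in for the viewer's own per-period features
group = np.arange(5) * 10     # stand-in for a similar real-time spectator group
mask = np.array([True, True, False, False, True])
print(choose_predictor_input(own, group, mask))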
  • FIG. 17 is a diagram illustrating an example of superimposed display of wipes on highlights.
  • FIG. 17 shows an example of a wipe display of the video of the viewer's own corresponding scene in the highlight video when watching the game in real time.
  • FIG. 17 shows a case where a wipe WP1, in which the viewer's own video of the corresponding scene in real time watching the game is superimposed on the content CT21, which is a highlight video of a basketball game, is displayed.
  • the highlight generation server 100 provides the edge viewing terminal 10 with the content CT21 on which the wipe WP1 is superimposed.
  • the edge viewing terminal 10 provided by the highlight generation server 100 displays the content CT21 on which the wipe WP1 is superimposed.
  • when the feature amounts of a set of real-time spectators having attributes closest to the viewer are used as the input feature amount data for the highlight scene predictor (in the case of attribute optimization), the information processing system 1 determines the viewer's personal attributes by analyzing the camera image at the start of highlight viewing. Note that the attributes that can be determined may differ depending on the camera imaging time.
  • the information processing system 1 predicts highlight scenes from the time series of the highlight scores of the set of real-time spectators having those personal attributes in the real-time spectator feature amount data.
  • the information processing system 1 may select scenes with high scores/durations based on a threshold or ranking to generate a highlight video.
  • the information processing system 1 may use various information as appropriate to generate a highlight video. An example of this point will be described below.
  • the information processing system 1 may take the AND (logical product) of personal attributes and predict highlight scenes from the time series of highlight scores of a set of real-time spectators who have all the determined personal attributes. For example, if the attributes that could be determined for the viewer are the two attributes age "30s" and gender "male", the information processing system 1 may predict highlight scenes from the time series of the highlight scores of the set of real-time spectators whose age is "30s" and whose gender is "male".
  • the information processing system 1 may use the following formula (1).
  • a_i in Equation (1) indicates the plausibility score of each of the viewer's own personal attribute determination results, and H_i indicates the highlight score of the set of real-time spectators having the corresponding attribute. S_j in Equation (1) indicates the combined score, and the information processing system 1 may predict highlight scenes from the time series of the combined scores S_j.
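  • The formula of Equation (1) is not reproduced in this text (it appears only as an image in the original publication); a plausible form consistent with the definitions above, given here only as an assumption, is a plausibility-weighted sum over the determined personal attributes i at each point in time j:

S_j = \sum_i a_i \, H_i(j)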
  • when a plurality of viewers watch the highlights together, the information processing system 1 may predict highlight scenes from a set of real-time spectators who have personal attributes common to the multiple viewers. For example, the information processing system 1 may calculate the time series of the combined score S_j using Equation (1), with a_i as the plausibility scores of the personal attribute results of the multiple viewers, and predict highlight scenes from it.
  • in this case, the scene prediction range extends to personal attributes different from those of the viewer themselves (that is, to the personal attributes of the people watching together), which is expected to be effective.
  • the information processing system 1 may also take as input the feature amounts of the viewer themselves, or of a real-time spectator group having the attributes closest to the viewer, and perform highlight scene prediction using the action index determiner.
  • the information processing system 1 may use only the degree of excitement and determine a period in which the degree of excitement is equal to or greater than a threshold as a highlight scene. For example, in the example of FIG. 13, the information processing system 1 determines the scene at the time of home scoring as a highlight scene for (a user of) the home fan attribute.
  • the information processing system 1 may determine highlight scenes using a plurality of indices: Positive indices (first indices) such as the degree of excitement and the degree of concentration, and Negative indices (second indices) such as the degree of disappointment and the degree of boredom.
  • the information processing system 1 may determine the highlight scene using Equation (2).
  • k_i in Equation (2) indicates a weighting factor for each behavior index.
  • the weighting factor k i is defined as a positive value for a positive index and a negative value for a negative index.
  • B i in Equation (2) indicates the determination score of the action index determiner for the input feature amount of each action index.
  • S t in Equation (2) indicates the combined score, and the information processing system 1 may predict the highlight scene from the time series of the combined score S t .
  • for example, the information processing system 1 may calculate a combined score S_t that combines the above-described Positive indices (first indices) and Negative indices (second indices) with signed weighting, and determine periods in which the combined score S_t is equal to or greater than a threshold as highlight scenes.
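  • The formula of Equation (2) is likewise not reproduced in this text; a plausible form consistent with the definitions above, given here only as an assumption, is a signed weighted sum of the behavior-index scores at each time t, with k_i positive for Positive indices and negative for Negative indices:

S_t = \sum_i k_i \, B_i(t)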
  • the information processing system 1 may appropriately use various information to determine a highlight scene.
  • the information processing system 1 may also select, for each index, the period with the highest score as a scene, thereby extracting highlight scenes that include variety, such as excitement, concentration, tension, disappointment, and anger, and give the highlight a sense of ebb and flow.
  • FIG. 18 is a diagram illustrating an example of highlight angle estimation.
  • FIG. 18 shows a case where the information processing system 1 divides the audience seats into eight areas of equal angle using eight dotted lines AG1 to AG8 extending radially from the focus position CN1.
  • the information processing system 1 calculates the average value of the degree of excitement in each area, and estimates the direction of the area with the highest degree of excitement as the optimum angle.
  • the information processing system 1 may estimate the angle and the like of the highlight scene using the model M11, which is a position and angle estimator used for estimating the appropriate position, angle and the like.
  • the information processing system 1 may estimate the focus position. For example, the information processing system 1 may estimate the focus position of the camera in the highlight scene from the statistical values of the face orientation and line-of-sight direction of the real-time audience feature amount data. Further, for example, the information processing system 1 may divide the audience (real-time audience feature amount data) in the venue into areas based on the angle with respect to the focus position, and estimate the optimum angle from the degree of excitement for each area. In this case, the information processing system 1 may predict the angle from the area with the highest swelling as the optimum angle.
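  • As an illustration only, the area-based angle estimation described above could be sketched in Python as follows; the seat coordinates, the focus position, and the excitement scores are assumptions.

import numpy as np

def estimate_best_angle(seat_xy, focus_xy, excitement, n_areas=8):
    # Bin spectators into n_areas equal angular areas around the focus position
    # and return the central direction (radians) of the most excited area.
    angles = np.arctan2(seat_xy[:, 1] - focus_xy[1], seat_xy[:, 0] - focus_xy[0])
    area = ((angles + np.pi) / (2 * np.pi) * n_areas).astype(int) % n_areas
    means = np.array([excitement[area == a].mean() if np.any(area == a) else -np.inf
                      for a in range(n_areas)])
    best = int(means.argmax())
    return (best + 0.5) * 2 * np.pi / n_areas - np.pi

rng = np.random.default_rng(0)
seats = rng.uniform(-10.0, 10.0, size=(200, 2))     # hypothetical seat coordinates
print(estimate_best_angle(seats, np.array([0.0, 0.0]), rng.random(200)))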
  • the information processing system 1 may remove biases such as fan attributes in angle estimation.
  • the information processing system 1 may estimate the angle or the like from the excitement level of the spectators of the beginner attribute, for example. Further, the information processing system 1 may estimate the angle and the like from the degree of excitement of the real-time spectator group having the attribute closest to the viewer, similar to extracting the highlight scene.
  • the information processing system 1 may use various information as appropriate to estimate angles and the like.
  • the information processing system 1 may determine the camerawork for generating a highlight video by using the estimated angle to decide which camera's video to select.
  • the information processing system 1 may propose the optimum position and angle for viewing free-viewpoint video such as sports.
  • the information processing system 1 can propose the optimum angle and the like in the free-viewpoint video for which it is difficult to manually operate the position and angle.
  • FIG. 19 is a diagram showing an example of presentation of information about spectators.
  • FIG. 19 shows an example of a UI (user interface) presented as a heat map of the attributes of spectator video of free viewpoint video or excitement.
  • the highlight generation server 100 of the information processing system 1 generates a content CT32 in which a heat map is superimposed on the audience seats according to the degree of excitement of the audience, targeting the content CT31 (step S31).
  • the highlight generation server 100 generates the content CT32 superimposed with a heat map indicating that the seats on the right side of the audience area are the most exciting and that the degree of excitement decreases toward the left.
  • the highlight generation server 100 provides the edge viewing terminal 10 with the generated content CT32.
  • the highlight generation server 100 may place arbitrary information (a flame icon in FIG. 19) in the portion of the heat map with the highest degree of excitement.
  • the edge viewing terminal 10 of the information processing system 1 displays the attribute of the spectator video of the free viewpoint video or the heat map of the excitement.
  • the edge viewing terminal 10 displays a heat map of the spectator seats according to the degree of excitement of the spectators.
  • the processing described above is merely an example, and the information processing system 1 may present various types of information.
  • the information processing system 1 may present a heat map that indicates fan attributes with colors and the degree of excitement with transparency.
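  • As an illustration only, encoding the fan attribute as a color and the degree of excitement as transparency could be sketched in Python as follows; the attribute codes and color assignments are assumptions.

import numpy as np

ATTRIBUTE_COLORS = {0: (1.0, 0.0, 0.0),   # e.g. home fan -> red
                    1: (0.0, 0.0, 1.0),   # e.g. away fan -> blue
                    2: (0.0, 1.0, 0.0)}   # e.g. beginner -> green

def seat_heatmap(attributes, excitement):
    # attributes: int code per seat, excitement: float in [0, 1] per seat.
    rgba = np.zeros((len(attributes), 4))
    for i, (attr, exc) in enumerate(zip(attributes, excitement)):
        rgba[i, :3] = ATTRIBUTE_COLORS[int(attr)]
        rgba[i, 3] = float(exc)            # transparency encodes the degree of excitement
    return rgba

print(seat_heatmap(np.array([0, 1, 2]), np.array([0.9, 0.2, 0.5])))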
  • the information processing system 1 may perform presentation display of the content, for example, by arranging an icon or the like at a location exceeding the threshold value.
  • the information processing system 1 may sell personalized highlight videos to visitors (full-time real-time viewing) at the site of the venue. For example, the information processing system 1 may use attribute optimization highlighting during a period in which the user is absent due to late arrival or the like.
  • the information processing system 1 may provide an attribute-optimized highlight video when viewed on a camera-equipped TV, PC, or the like (without real-time viewing).
  • the information processing system 1 may also perform personal attribute determination during highlight viewing and update (correction, addition, or the like) the stored personal attribute determination value.
  • the information processing system 1 may generate attribute optimization highlights for the periods not viewed and personalized highlights for the periods viewed, and combine them.
  • the information processing system 1 may present the unwatched first half with attribute optimization highlights when playing back after watching from the middle.
  • the information processing system 1 may provide automatic highlight reproduction based on an estimated angle as an additional function of free-viewpoint content.
  • the information processing system 1 may propose camera work based on extracted highlight scenes and angle estimation.
  • the information processing system 1 described above can be used as a generalized scene extraction engine that does not depend on the type of sport or live performance, so it can be deployed to a wide variety of content.
  • the above-described information processing system 1 does not use data on the content side, such as the sports game itself or the live event; it uses an analysis algorithm whose input is only data captured on the spectator side. An algorithm learned from spectator behavior can therefore be applied to other types of content.
  • the algorithm has high versatility and can be applied to other types of content at low cost.
  • for action index determination and highlight scene prediction, individual identification in images makes it possible to perform determination and prediction not only for the entire group, as with cheers and the like, but also for each individual and each attribute.
  • it is possible to extract highlight scenes that match individual attributes and tastes and generate moving images.
  • the viewer of the highlight can see the scenes that were exciting during real-time viewing together with the video of the viewer at that time, and can watch the highlight as a retrospective video of the memories of participating in the event.
  • the information processing system 1 described above generates a highlight video that matches the attributes of the viewer, so the viewer can view a highlight video that matches his or her own tastes.
  • a highlight video with a storyline can be viewed by highlight scene prediction (selecting the scene with the highest score for each index) using action index determination.
  • the above-described information processing system 1 also performs scene extraction based on the Negative index, so that the effect of serendipity can be expected.
  • scene extraction optimized for personal attributes can be applied not only at sports venues, but also in remote viewing environments (camera-equipped TVs, PCs with web cameras, etc.).
  • the above-described information processing system 1 estimates angles and positions in the same way as it extracts highlight scenes, so it can be applied to other types of content; it is highly versatile and can respond to individual and attribute differences in viewing angles. With the information processing system 1 described above, it is possible to improve the added value of content by proposing positions and angles for free-viewpoint video content. With the information processing system 1 described above, performance can be continuously improved through cycles of data collection, learning, and analysis algorithm improvement.
  • the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60 may be an information processing device that generates highlights like the highlight generation server 100.
  • instead of the highlight generation server 100, the information processing system 1 may include an information providing server that collects information from each device, such as the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60, and provides the information to each device.
  • when the edge viewing terminal 10 has the functions of the highlight generation server 100 described above, the edge viewing terminal 10 stores the information held in the storage unit 120 and has the functions of the learning unit 132, the image processing unit 133, and the generation unit 134.
  • the edge viewing terminal 10 may acquire various types of information from the information providing server, the content distribution server 50, and the spectator video collection server 60, and generate highlights using the acquired information.
  • the information processing system 1 may have any division of functions and any device configuration as long as it can provide the above-described highlight service.
  • each component of each device illustrated is functionally conceptual and does not necessarily need to be physically configured as illustrated.
  • the specific form of distribution and integration of each device is not limited to the one shown in the figure, and all or part of the devices can be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.
  • the information processing apparatus includes an acquisition unit (the acquisition unit 131 in the embodiment) and a generation unit (the generation unit 134 in the embodiment).
  • the acquisition unit acquires state information indicating the state, at the time of real-time viewing of the event, of a first user who has viewed the event in real time, and event content, which is video of the event.
  • the generator generates highlights of the event content to be provided to the second user, using a portion of the event content determined by the state information of the first user acquired by the acquirer.
  • in this way, the information processing apparatus generates highlights of the event content using part of the event content determined based on the state of the user who viewed the event in real time, and can therefore generate highlights suited to the user.
  • the acquisition unit acquires event content, which is video of the event.
  • the information processing apparatus can generate highlights according to the user by generating highlights of the event content, which is video of the event, based on the state of the user who watched the event in real time.
  • the acquisition unit acquires the status information of the first user who viewed the event in real time at the venue of the event.
  • the information processing apparatus can generate highlights according to the user by generating highlights of event content based on the user's state of real-time viewing at the venue of the event.
  • the acquisition unit acquires the state information of the first user who watched the sports or art event in real time.
  • the information processing apparatus can generate highlights according to the user by generating event content highlights based on the state of the user who watched the sports or arts event in real time.
  • the acquisition unit acquires the state information of the first user who watched the event locally.
  • the information processing apparatus can generate highlights according to the user by generating event content highlights based on the state of the user who watched the sports or arts event at the site.
  • the generating unit generates event content highlights using a model that outputs a score corresponding to the period of the event in response to the input of input data based on the state information.
  • in this way, the information processing apparatus generates highlights of the event content using a model that outputs a score corresponding to the period of the event in response to input data based on the user's state, and can therefore generate highlights according to the user.
  • the generation unit uses the model to determine part of the event content, and uses the determined part of the event content to generate highlights of the event content. In this way, the information processing apparatus can generate highlights according to the user by generating highlights of the event content using part of the event content determined using the model.
  • the generation unit determines, as the part of the event content, a portion of the event content corresponding to a period whose score is equal to or greater than the threshold, and uses the determined part of the event content to generate highlights of the event content.
  • in this way, the information processing apparatus generates highlights of the event content from the part corresponding to periods in which the score output by the model is equal to or greater than the threshold value, and can thereby provide highlights according to the user; a minimal sketch of this thresholding step follows below.
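  • A minimal sketch of the thresholding step described above is shown below; it groups the model's per-minute scores into contiguous highlight periods. The score format and threshold value are assumptions for illustration, not part of the present disclosure.

```python
def determine_highlight_periods(scores: list[float], threshold: float) -> list[tuple[int, int]]:
    """Group minutes whose score is >= threshold into contiguous (start, end) periods."""
    periods, start = [], None
    for minute, score in enumerate(scores):
        if score >= threshold and start is None:
            start = minute                       # a highlight period begins
        elif score < threshold and start is not None:
            periods.append((start, minute))      # the period ends just before this minute
            start = None
    if start is not None:
        periods.append((start, len(scores)))
    return periods

# Example: scores for an 8-minute event, threshold 0.7 -> periods [(1, 3), (5, 7)]
scores = [0.2, 0.8, 0.9, 0.1, 0.3, 0.75, 0.8, 0.4]
print(determine_highlight_periods(scores, 0.7))
```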
  • the information processing apparatus includes a learning unit (learning unit 132 in the embodiment).
  • the learning unit learns the model using learning data including highlights of past events and status information of users who viewed the past events in real time. In this way, the information processing apparatus can generate highlights according to the user by generating highlights of event content using the learned model.
  • the generation unit generates highlights of event content using the model learned by the learning unit.
  • the information processing apparatus can generate highlights according to the user by generating highlights of event content using the learned model.
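  • As a rough illustration of how such a model could be trained, the sketch below pairs spectator-state features for each period of past events with labels indicating whether that period was included in the past event's highlights, and fits a simple supervised classifier that outputs a score per period. scikit-learn is used purely for illustration; the actual feature set and model are not specified by the present disclosure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one row per (past event, period),
# features derived from spectator state (e.g. mean excitement, motion, cheering level)
X_train = np.array([
    [0.9, 0.8, 0.7],   # period that was in the past highlight
    [0.2, 0.1, 0.3],   # period that was not
    [0.8, 0.9, 0.6],
    [0.3, 0.2, 0.1],
])
y_train = np.array([1, 0, 1, 0])  # 1 = included in the past event's highlights

model = LogisticRegression()
model.fit(X_train, y_train)

# Inference: score each period of a new event from its spectator-state features
X_new = np.array([[0.85, 0.7, 0.75], [0.25, 0.3, 0.2]])
scores = model.predict_proba(X_new)[:, 1]   # probability of being a highlight period
print(scores.round(2))
```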
  • the acquisition unit acquires state information indicating the state of the second user viewing the event in real time as the state information of the first user.
  • the generator generates highlights of event content to be provided to the second user, using a portion of the event content determined by the second user's state information. In this way, the information processing apparatus can generate highlights according to the user by generating highlights to be provided to the user based on the state of the user who has watched the event in real time.
  • the acquisition unit acquires the state information of the first user who is different from the second user.
  • the information processing apparatus generates the highlights to be provided to a user based on the state of a user different from that user, and can thereby generate highlights suitable for the user even when that user did not view the event in real time.
  • the acquisition unit acquires the state information of the first user who is a user similar to the attributes of the second user.
  • the information processing apparatus can generate highlights suitable for the user by generating the highlights to be provided to the user based on the states of similar users whose attributes are similar to those of the user to whom the highlights are provided.
  • the acquisition unit acquires the state information of the first user, who is a user similar to the demographic attributes of the second user.
  • the information processing apparatus can generate highlights suitable for the user by generating the highlights to be provided to the user based on the states of similar users whose demographic attributes are similar to those of the user to whom the highlights are provided.
  • the acquisition unit acquires the state information of the first user who is a user similar to the second user in at least one of age and sex.
  • the information processing apparatus can generate highlights according to the user by generating the highlights to be provided to the user based on the states of similar users who are similar in at least one of age and gender to the user to whom the highlights are provided.
  • the acquisition unit acquires the state information of the first user, who is a user similar to the psychographic attributes of the second user.
  • the information processing apparatus can generate highlights suitable for the user by generating the highlights to be provided to the user based on the states of similar users whose psychographic attributes are similar to those of the user to whom the highlights are provided.
  • the acquisition unit acquires the state information of the first user, who is a user similar to the second user's preferences.
  • the information processing apparatus can generate highlights according to the user by generating the highlights to be provided to the user based on the states of similar users whose preferences are similar to those of the user to whom the highlights are provided.
  • the acquisition unit acquires the state information of the first user, who is a user whose favorite object (for example, a team being supported) matches that of the second user.
  • the information processing apparatus can generate highlights according to the user by generating the highlights to be provided to the user based on the states of similar users whose favorite objects match that of the user to whom the highlights are provided; a minimal sketch of such similar-user selection follows below.
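  • Selecting a similar first user, as described in the items above, might look like the following minimal sketch, which scores candidate real-time spectators by matching demographic and psychographic attributes against the second user. The attribute fields and weights are hypothetical, not part of the present disclosure.

```python
from dataclasses import dataclass

@dataclass
class UserProfile:
    user_id: str
    age_group: str        # demographic attribute, e.g. "20s"
    gender: str           # demographic attribute
    favorite_object: str  # psychographic attribute, e.g. the team or performer being supported

def similarity(a: UserProfile, b: UserProfile) -> int:
    """Count matching attributes; a matching favorite object is weighted more heavily."""
    score = 0
    score += 1 if a.age_group == b.age_group else 0
    score += 1 if a.gender == b.gender else 0
    score += 2 if a.favorite_object == b.favorite_object else 0
    return score

def select_first_user(second_user: UserProfile, spectators: list[UserProfile]) -> UserProfile:
    """Choose the real-time spectator most similar to the user receiving the highlights."""
    return max(spectators, key=lambda s: similarity(second_user, s))

viewer = UserProfile("U2", "20s", "female", "team_A")
spectators = [
    UserProfile("U40", "30s", "male", "team_B"),
    UserProfile("U50", "20s", "female", "team_A"),
]
print(select_first_user(viewer, spectators).user_id)  # -> "U50"
```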
  • the information processing device includes a transmission unit (transmission unit 135 in the embodiment).
  • the transmission unit transmits the highlight of the event content generated by the generation unit to the terminal device (the edge viewing terminal 10 in the embodiment) used by the second user.
  • the information processing apparatus can appropriately provide highlights according to the user by transmitting the generated highlights to the terminal device used by the user.
  • FIG. 20 is a hardware configuration diagram showing an example of a computer 1000 that implements the functions of the information processing apparatus.
  • the highlight generation server 100 according to the embodiment will be described below as an example.
  • the computer 1000 has a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600.
  • Each part of the computer 1000 is connected by a bus 1050.
  • the CPU 1100 operates based on programs stored in the ROM 1300 or HDD 1400 and controls each section. For example, the CPU 1100 loads programs stored in the ROM 1300 or HDD 1400 into the RAM 1200 and executes processes corresponding to various programs.
  • the ROM 1300 stores a boot program such as BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 is started, and programs dependent on the hardware of the computer 1000.
  • the HDD 1400 is a computer-readable recording medium that non-temporarily records programs executed by the CPU 1100 and data used by such programs.
  • the HDD 1400 is a recording medium that records an information processing program according to the present disclosure, which is an example of the program data 1450.
  • a communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet).
  • CPU 1100 receives data from another device via communication interface 1500, and transmits data generated by CPU 1100 to another device.
  • the input/output interface 1600 is an interface for connecting the input/output device 1650 and the computer 1000.
  • the CPU 1100 receives data from input devices such as a keyboard and mouse via the input/output interface 1600.
  • the CPU 1100 also transmits data to output devices such as a display, a speaker, or a printer via the input/output interface 1600.
  • the input/output interface 1600 may function as a media interface for reading a program or the like recorded on a predetermined recording medium.
  • Media include, for example, optical recording media such as a DVD (Digital Versatile Disc) or PD (Phase change rewritable Disk), magneto-optical recording media such as an MO (Magneto-Optical disk), tape media, magnetic recording media, and semiconductor memories.
  • the CPU 1100 of the computer 1000 implements the functions of the control unit 130 and the like by executing the information processing program loaded on the RAM 1200.
  • the HDD 1400 also stores an information processing program according to the present disclosure and the data in the storage unit 120.
  • the CPU 1100 reads and executes the program data 1450 from the HDD 1400; as another example, these programs may be obtained from another device via the external network 1550.
  • the present technology can also take the following configuration.
  • (1) An information processing device comprising: an acquisition unit that acquires state information indicating a state of a first user who has viewed an event in real time, and event content that is video of the event; and a generation unit that generates highlights of the event content to be provided to a second user using a portion of the event content determined by the state information of the first user acquired by the acquisition unit.
  • (2) The information processing apparatus according to (1), wherein the acquisition unit acquires the event content, which is a video image of the event.
  • (3) The information processing apparatus according to (1) or (2), wherein the acquisition unit acquires the state information of the first user who has performed the real-time viewing at the venue of the event.
  • (4) The information processing apparatus according to any one of (1) to (3), wherein the acquisition unit acquires the state information of the first user who viewed the event of sports or art in real time.
  • (5) The information processing apparatus according to any one of (1) to (4), wherein the acquisition unit acquires the state information including image information of the first user.
  • (6) The information processing apparatus according to any one of (1) to (5), wherein the generation unit generates highlights of the event content using a model that outputs a score corresponding to the period of the event in response to input of input data based on the state information.
  • (7) The information processing apparatus according to (6), wherein the generation unit determines a portion of the event content using the model, and generates highlights of the event content using the determined portion of the event content.
  • (8) The information processing apparatus according to (7), wherein the generation unit determines, as the portion of the event content, a portion of the event content corresponding to a period corresponding to the score equal to or greater than a threshold, and generates highlights of the event content using the determined portion of the event content.
  • (9) The information processing apparatus according to any one of (6) to (8), further comprising a learning unit that learns the model using learning data including highlights of past events and the state information of users who viewed the past events in real time.
  • (10) The information processing apparatus according to (9), wherein the generation unit generates highlights of the event content using the model learned by the learning unit.
  • (11) The information processing apparatus according to any one of (1) to (10), wherein, when the second user is viewing the event in real time, the acquisition unit acquires, as the state information of the first user, the state information indicating the state of the second user when the second user is viewing the event in real time, and the generation unit generates highlights of the event content to be provided to the second user using a portion of the event content determined by the state information of the second user.
  • (12) The information processing apparatus according to any one of (1) to (11), wherein, when the second user is not viewing the event in real time, the acquisition unit acquires the state information of the first user who is a user different from the second user.
  • (13) The information processing apparatus according to (12), wherein the acquisition unit acquires the state information of the first user who is a user similar to the attributes of the second user.
  • (14) The information processing apparatus according to (13), wherein the acquisition unit acquires the state information of the first user who is a user similar to the demographic attributes of the second user.
  • (15) The information processing apparatus according to (14), wherein the acquisition unit acquires the state information of the first user who is a user similar to the second user in at least one of age and sex.
  • (16) The information processing apparatus according to any one of (13) to (15), wherein the acquisition unit acquires the state information of the first user who is a user similar to the psychographic attributes of the second user.
  • (17) The information processing apparatus according to any one of (13) to (16), wherein the acquisition unit acquires the state information of the first user who is a user similar to the preference of the second user.
  • (18) The information processing apparatus according to any one of (13) to (17), wherein the acquisition unit acquires the state information of the first user whose favorite object matches that of the second user.
  • (19) The information processing apparatus according to any one of (1) to (18), further comprising a transmission unit configured to transmit a highlight of the event content generated by the generation unit to a terminal device used by the second user.
  • 1 information processing system; 100 highlight generation server (information processing device); 110 communication unit; 120 storage unit; 121 data set storage unit; 122 model information storage unit; 123 threshold information storage unit; 124 content information storage unit; 130 control unit; 131 acquisition unit; 132 learning unit; 133 image processing unit; 134 generation unit; 135 transmission unit; 10 edge viewing terminal (terminal device); 11 communication unit; 12 audio input unit; 13 audio output unit; 14 camera; 15 display unit; 16 operation unit; 17 storage unit; 18 control unit; 181 acquisition unit; 182 transmission unit; 183 reception unit; 184 processing unit; 50 content distribution server; 60 spectator video collection server

Abstract

An information processing device according to the present disclosure includes: an acquisition unit for acquiring state information indicating the state of a first user in real-time viewing of an event and an event content, which is a video of the event, said first user being a user having viewed the event in real time; and a generation unit for generating highlights of the event content to be provided to a second user using a part of the event content determined by the state information of the first user acquired by the acquisition unit.

Description

Information processing device and information processing method
 The present disclosure relates to an information processing device and an information processing method.
 Technology is provided that automatically generates highlights (digests) of content such as video. For example, a technology has been provided that identifies whether or not a visitor to an event such as a concert, a sports game, or a lecture is smiling and generates highlights (for example, Patent Document 1).
JP 2007-104091 A
 However, with the conventional technology, it is not always possible to generate highlights according to the user. In the conventional technology, highlights are generated according to the number of smiling visitors, and it may be difficult to generate highlights appropriately, for example when the content of the event does not make people smile, so there is room for improvement. Therefore, it is desired to generate highlights according to the user.
 Therefore, the present disclosure proposes an information processing device and an information processing method capable of generating highlights according to the user.
 In order to solve the above problems, an information processing apparatus according to one embodiment of the present disclosure includes: an acquisition unit that acquires state information indicating the state, at the time of real-time viewing of an event, of a first user who has viewed the event in real time, and event content, which is video of the event; and a generation unit that generates, using a portion of the event content determined by the state information of the first user acquired by the acquisition unit, highlights of the event content to be provided to a second user.
FIG. 1 is a diagram showing an example of information processing according to an embodiment of the present disclosure.
FIG. 2 is a diagram showing an example of information processing according to an embodiment of the present disclosure.
FIG. 3 is a diagram showing a configuration example of an information processing system according to an embodiment of the present disclosure.
FIG. 4 is a diagram showing an arrangement example of spectator imaging devices.
FIG. 5 is a diagram showing a configuration example of a highlight generation server according to an embodiment of the present disclosure.
FIG. 6 is a diagram showing an example of a data set storage unit according to an embodiment of the present disclosure.
FIG. 7 is a diagram showing an example of a model information storage unit according to an embodiment of the present disclosure.
FIG. 8 is a diagram showing an example of a threshold information storage unit according to an embodiment of the present disclosure.
FIG. 9 is a diagram showing a configuration example of an edge viewing terminal according to an embodiment of the present disclosure.
FIG. 10 is a flowchart showing a processing procedure of the information processing device according to an embodiment of the present disclosure.
FIG. 11 is a diagram showing a functional configuration example related to learning of the information processing system.
FIG. 12 is a diagram showing an example of learning and inference regarding the personal attribute determiner of the information processing system.
FIG. 13 is a diagram showing an example of learning and inference regarding the action index determiner of the information processing system.
FIG. 14 is a diagram showing an example of learning and inference regarding the highlight scene predictor of the information processing system.
FIG. 15 is a diagram showing a functional configuration example related to highlight generation of the information processing system.
FIG. 16 is a flowchart showing a processing procedure regarding highlight generation.
FIG. 17 is a diagram showing an example of superimposed display of a wipe on a highlight.
FIG. 18 is a diagram showing an example of angle estimation for a highlight.
FIG. 19 is a diagram showing an example of presentation of information about spectators.
FIG. 20 is a hardware configuration diagram showing an example of a computer that implements the functions of the information processing device.
 Below, embodiments of the present disclosure will be described in detail based on the drawings. The information processing apparatus and information processing method according to the present application are not limited to this embodiment. Further, in each of the following embodiments, the same parts are denoted by the same reference numerals, thereby omitting redundant explanations.
 The present disclosure will be described according to the order of items shown below.
  1. Embodiment
   1-1. Outline of information processing according to embodiment of present disclosure
    1-1-1. First example (first user = second user)
    1-1-2. Second example (first user ≠ second user)
    1-1-3. Background and effects
   1-2. Configuration of information processing system according to embodiment
    1-2-1. Arrangement example of spectator imaging devices
   1-3. Configuration of information processing apparatus according to embodiment
   1-4. Configuration of terminal device according to embodiment
   1-5. Information processing procedure according to the embodiment
   1-6. Configuration and processing of information processing system
    1-6-1. Functional configuration example related to learning of information processing system
    1-6-2. Learning and reasoning about personal attribute determiner
    1-6-3. Learning and reasoning about action index determiner
    1-6-4. Learning and reasoning about highlight scene predictor
    1-6-5. Functional configuration example related to highlight generation of information processing system
   1-7. Example of highlight generation process flow
   1-8. Processing examples
    1-8-1. Highlight scene prediction by action index determiner
    1-8-2. Angle estimation example
    1-8-3. Presentation example
   1-9. Examples of applications, modifications, effects, etc.
  2. Other embodiments
   2-1. Other configuration examples
   2-2. Others
  3. Effects of the present disclosure
  4. Hardware configuration
[1. Embodiment]
[1-1. Overview of information processing according to the embodiment of the present disclosure]
 FIGS. 1 and 2 are diagrams illustrating an example of information processing according to an embodiment of the present disclosure. Although the system configuration and the like will be described later, the information processing according to the embodiment of the present disclosure is realized by the information processing system 1 including the highlight generation server 100, the plurality of edge viewing terminals 10, and the like. In the following description, a sports event such as a basketball game will be described as an example of an event for which highlights are generated (hereinafter also referred to as a "target event"). Here, a highlight is a video generated using video content (also referred to as "event content") that captures the event, and is, for example, a digest video that is shorter than the event content. The sports here also include e-sports (electronic sports) played using electronic devices (computers). Note that the target event is not limited to sports and may be any of various events in which there are users (also referred to as "spectators") who watch the event. A spectator is a user who watches or views an event.
 For example, the target event may be a music event such as a live performance or concert, an event in which creative works such as calligraphy or paintings are produced by improvisation, or an art-related event such as a play, musical, vaudeville, or live comedy. The target event may also be a lecture, talk show, seminar, or the like. The target event may also be an event in which a large number of people view content, such as a movie screening event.
 The target event is not limited to an event that takes place in a real space as described above, and may be an event that takes place in a virtual space. For example, the target event may be a virtual event such as a live music performance held within an online game. As described above, the event targeted for highlight generation by the information processing system 1 (target event) may be any event that can be targeted for highlight generation.
 In the following, the user who viewed the event in real time may be referred to as the "first user", and the user to whom the highlights are provided may be referred to as the "second user". That is, when the user to whom the highlights are provided has viewed the event in real time, the first user and the second user may be the same user. Here, real-time viewing of an event means, for example, viewing the event at the date and time (period) when the event is held. FIGS. 1 and 2 show two cases as examples: the case where the entire event is viewed in real time and the case where the event is not viewed in real time at all. In some cases, only part of the event is viewed in real time; processing in that case will be described later.
 In the following, a case will be described as an example in which image information, such as video (moving images) of the first user captured during real-time viewing of the event, is used as the information indicating the state of the first user at the time of real-time viewing (also referred to as "state information"). Note that the state information is not limited to image information obtained by imaging the first user and may be any information that indicates the state of the first user. For example, the state information may be biometric information detected from the first user, such as the first user's heartbeat, body temperature, or breathing.
 From here, an overview of the highlight-related services provided by the information processing system 1 will be described with reference to FIGS. 1 and 2. FIGS. 1 and 2 are diagrams illustrating an example of information processing according to an embodiment of the present disclosure. Specifically, FIGS. 1 and 2 show an example of highlight generation processing executed by the highlight generation server 100, which is an example of an information processing apparatus. FIGS. 1 and 2 show a case where event A, which is a sporting event, is the target event. That is, FIGS. 1 and 2 show a case where the video obtained by shooting event A (event content) is the content targeted for highlight generation (also referred to as "target content"). FIGS. 1 and 2 also show a case where the model M3 used for highlight scene prediction is used; the details of the model M3 will be described later.
[1-1-1. First example (first user = second user)]
 First, the processing example (first example) shown in FIG. 1 will be described. FIG. 1 shows a case where the user to whom the highlights are provided has viewed the event in real time, that is, a case where the first user and the second user are the same user. Specifically, FIG. 1 shows an example of highlight generation processing in the case where the user to whom the highlights are provided is the user U1 and the user U1 has viewed the target event in real time. That is, FIG. 1 shows a case where the user U1 is a user who viewed event A in real time (a real-time spectator).
 The highlight generation server 100 inputs the input data IND1, which is data of the user U1, to the model M3 (step S11). For example, the highlight generation server 100 inputs to the model M3 the input data IND1 based on the information (state information) indicating the state of the user U1 at the time of real-time viewing of event A. For example, the state information of the user U1 includes image information such as video (moving images) of the user U1 captured during real-time viewing of event A. To simplify the explanation, FIG. 1 shows a case where the state information of the user U1 is used as the input data IND1. That is, in FIG. 1, video of the user U1 captured during real-time viewing of event A is used as the input data IND1. Note that the information (input information) input to the model M3, such as the input data IND1, may be information on feature amounts generated based on the state information of the first user; this point will be described later.
 The model M3 to which the input data IND1 is input outputs the output data OD1 (step S12). The model M3 outputs, as the output data OD1, the scores used for generating the highlights of event A. For example, the model M3 outputs a score corresponding to each point in time during the period of event A. For example, if the period of event A is one hour, the model M3 outputs a score corresponding to each point in that hour. Note that the model M3 may output a score corresponding to each time point at predetermined intervals (for example, every minute: 1 minute, 2 minutes, 3 minutes, and so on) during the period of event A, or may output a continuous score (waveform) over the period.
 The highlight generation server 100 uses the output data OD1 to determine the periods to be highlighted (also referred to as "highlight target periods") (step S13). The highlight generation server 100 determines the highlight target periods of the highlights to be provided to the user U1 by comparing the scores output by the model M3 with a predetermined threshold. For example, the highlight generation server 100 determines the highlight target periods of the highlights to be provided to the user U1 based on the periods during which the score is equal to or greater than the predetermined threshold. For example, the highlight generation server 100 determines the periods during which the score is equal to or greater than the predetermined threshold as the highlight target periods of the highlights to be provided to the user U1. In FIG. 1, as shown in the target period information PTD1, the highlight generation server 100 determines periods such as minutes 1-3 and minutes 15-20 as the highlight target periods of the highlights to be provided to the user U1.
 The highlight generation server 100 generates the highlights to be provided to the user U1 based on the determined highlight target periods (step S14). In FIG. 1, the highlight generation server 100 generates the highlight HLD1 for the user U1 using the target period information PTD1 for the user U1 and the target content TCV1, which is the video of event A. The highlight generation server 100 generates the highlight HLD1 for the user U1 using the portions of the target content TCV1 that correspond to the periods indicated in the target period information PTD1, such as minutes 1-3 and minutes 15-20. That is, the highlight generation server 100 generates, as the highlight HLD1 for the user U1, content extracted from the target content TCV1 that corresponds to periods such as minutes 1-3 and minutes 15-20. The highlight HLD1 for the user U1 is a video of event A that includes only the periods, such as minutes 1-3 and minutes 15-20, that are estimated to be appropriate as highlights for the user U1.
 In this way, when the user U1 (second user) is a user (first user) who performed real-time viewing, the highlight generation server 100 generates the highlights to be provided to the user U1 based on the state information of the user U1. As a result, the highlight generation server 100 can generate highlights appropriate for the user U1. The highlight generation server 100 transmits the generated highlight HLD1 for the user U1 to the edge viewing terminal 10 used by the user U1. The edge viewing terminal 10 used by the user U1 then outputs (reproduces) the highlight HLD1. Thus, in the information processing system 1, the user U1 can view highlights customized for himself or herself. A minimal code sketch of this overall flow is shown below.
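 The flow of steps S11 to S14 above can be pictured with the following minimal sketch; the model is replaced by a stub and the clip handling is schematic, and none of this code is part of the present disclosure. State information is passed to a scoring model, the per-minute scores are thresholded into highlight target periods, and the corresponding parts of the target content are concatenated into the highlight.

```python
def model_m3_stub(state_frames: list[float]) -> list[float]:
    """Stand-in for model M3: here the viewer's per-minute excitement is used directly."""
    return state_frames

def generate_highlight(state_frames: list[float], content_minutes: list[str],
                       threshold: float = 0.7) -> list[str]:
    # Step S11/S12: input the state information and obtain per-minute scores
    scores = model_m3_stub(state_frames)
    # Step S13: determine the highlight target periods (minutes with score >= threshold)
    target_minutes = [m for m, s in enumerate(scores) if s >= threshold]
    # Step S14: extract the corresponding parts of the target content and join them
    return [content_minutes[m] for m in target_minutes]

# Example with a 6-minute event: minutes 1, 2, and 4 become the highlight
excitement = [0.1, 0.9, 0.8, 0.2, 0.75, 0.3]
content = [f"clip_minute_{m}" for m in range(6)]
print(generate_highlight(excitement, content))
# -> ['clip_minute_1', 'clip_minute_2', 'clip_minute_4']
```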
[1-1-2. Second example (first user ≠ second user)]
 Next, the processing example (second example) shown in FIG. 2 will be described. FIG. 2 shows a case where the user to whom the highlights are provided has not viewed the event in real time, that is, a case where the first user and the second user are different users. Specifically, FIG. 2 shows an example of highlight generation processing in the case where the user to whom the highlights are provided is the user U2 and the user U2 has not viewed the target event in real time. That is, FIG. 2 shows a case where the user U2 is not a user who viewed event A in real time (not a real-time spectator). For example, the user U2 is a user who is located remotely from the venue of the event and views content (also referred to as a "remote viewer"). Description of the points that are the same as in FIG. 1 will be omitted as appropriate.
 In FIG. 2, since the user U2 is not a real-time spectator of event A, the highlight generation server 100 performs the processing with a user similar to the user U2 as the first user. The highlight generation server 100 takes, as the first user, a user who is similar to the user U2 and who is a real-time spectator of event A, and inputs the input data IND2, which is data of that first user, to the model M3 (step S21). For example, the highlight generation server 100 determines, as the first user, a user whose attributes are similar to those of the user U2 from among the real-time spectators of event A. For example, the highlight generation server 100 determines, as the first user, a user whose demographic or psychographic attributes are similar to those of the user U2 from among the real-time spectators of event A. For example, the highlight generation server 100 determines, as the first user, a user who is similar to the user U2 in terms of age, gender, preferences such as favorite objects (a team being supported, etc.), family structure, income, lifestyle, and the like from among the real-time spectators of event A.
 To simplify the explanation, FIG. 2 illustrates a case where the highlight generation server 100 determines, as the first user, a user whose age and gender match those of the user U2 (referred to as "user U50"). For example, the highlight generation server 100 determines (identifies) the real-time spectators of event A using information, associated with each event, that indicates the users who viewed that event in real time. For example, the highlight generation server 100 may determine (identify) users similar to the user U2 by comparing the attribute information associated with each user. Note that the highlight generation server 100 may determine attributes using a model; this point will be described later.
 For example, the highlight generation server 100 inputs input data IND2 based on information (state information) indicating the state of real-time viewing of event A by user U50, who is the first user corresponding to user U2, to model M3. For example, the state information of the user U50 includes image information such as a video (moving image) of the user U50 when viewing the event A in real time. In FIG. 2, a video image of the user U50 when viewing the event A in real time is used as the input data IND2.
 The model M3 to which the input data IND2 is input outputs the output data OD2 (step S22). The model M3 outputs the score used for highlight generation of the event A as the output data OD2. For example, the model M3 outputs a score corresponding to each point in time during event A.
 The highlight generation server 100 determines a period to be highlighted (highlight target period) using the output data OD2 (step S23). The highlight generation server 100 compares the score output by the model M3 with a predetermined threshold to determine the highlight target period of the highlight to be provided to the user U2. In FIG. 2, the highlight generation server 100 determines periods such as 5-10 minutes and 25-30 minutes as the highlight target periods of the highlights to be provided to the user U2, as indicated by the target period information PTD2.
 The highlight generation server 100 generates highlights to be provided to the user U2 based on the determined highlight target period (step S24). In FIG. 2, the highlight generation server 100 uses target period information PTD2 for user U2 and target content TCV1, which is video of event A, to generate highlight HLD2 for user U2. The highlight generation server 100 generates highlights HLD2 for the user U2 using portions of the target content TCV1 that correspond to periods such as 5-10 minutes and 25-30 minutes indicated in the target period information PTD2. The highlight generation server 100 generates a video of the event A including only the periods of 5-10 minutes, 25-30 minutes, etc. as the highlight HLD2 for the user U2.
 In this way, when the user U2 (second user) is not a user (first user) who performed real-time viewing, the highlight generation server 100 generates the highlights to be provided to the user U2 based on the state information of a user similar to the user U2. As a result, the highlight generation server 100 can generate highlights appropriate for the user U2 even when the second user has not performed real-time viewing. The highlight generation server 100 transmits the generated highlight HLD2 for the user U2 to the edge viewing terminal 10 used by the user U2. The edge viewing terminal 10 used by the user U2 then outputs (reproduces) the highlight HLD2. Thus, in the information processing system 1, the user U2 can view highlights customized for himself or herself.
[1-1-3. Background and effects, etc.]
 There is a growing need to automatically generate highlight videos of sports, live performances, and the like, to reduce production costs, and to provide the videos to users more quickly. Conventionally, there is a method of analyzing video of the game itself and video of the players with AI (artificial intelligence) and extracting highlight scenes, but this method requires developing a different AI for each type of sports competition or live performance, making it difficult to suppress the increase in cost. For example, in scene extraction by image analysis of game video, the actions or states of players and the like that should be highlighted (that cause excitement) differ from sport to sport and must be individually defined as rules or learned, so deployment to another sport requires separate data collection, analysis, model learning, and so on.
 There is also a method of extracting highlight scenes from the cheering of the audience, but this method cannot respond to individual differences and attribute differences in the scenes that highlight viewers want to see. For example, scene extraction based on cheers can only capture the excitement of the entire venue, or at best of each area depending on the microphone arrangement, and cannot meet the needs of individual viewers and attribute groups who want to see different scenes in the highlights.
 Therefore, the information processing system 1 generates highlights using the state information of users (spectators) who are viewing the event in real time, and can thus generate highlights without analyzing the video of the event (event content) itself. That is, the information processing system 1 generates highlights from information about the users (spectators) viewing the event in real time rather than from the video of the event, and can therefore generate highlights appropriately regardless of the type of event.
 For example, in automatically generating highlights (videos) of events such as sports, the information processing system 1 selects scenes according to the attributes of the viewer based on recognition and analysis of video (images) of the spectators, and generates the highlights. For example, the information processing system 1 learns (generates) a personal attribute determiner, an action index determiner for indices such as excitement or degree of concentration, and a highlight scene predictor, based on video of the spectators (or remote viewers) at an event venue such as a sports venue. For example, the information processing system 1 learns (generates) the personal attribute determiner, the action index determiner, and the highlight scene predictor using, as feature amounts, the skeleton, face recognition information, motion detection, line of sight, and the like obtained by image recognition of video of the spectators (or remote viewers) at the event venue. For example, the information processing system 1 learns (generates) the personal attribute determiner, the action index determiner, and the highlight scene predictor by supervised learning or on a rule basis. A rough sketch of such feature extraction is shown below.
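 As a rough illustration of the feature extraction described above, the following sketch turns per-frame recognition results (pose, facial expression, gaze) into a per-minute feature vector that could be fed to the determiners. The recognition results themselves are assumed to come from an external image-recognition step; the field names and the aggregation are hypothetical, not part of the present disclosure.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class FrameRecognition:
    minute: int
    arm_raise: float      # from skeleton/pose estimation, 0-1
    smile: float          # from face recognition, 0-1
    gaze_on_field: float  # fraction of the frame in which the gaze stays on the field, 0-1

def features_per_minute(frames: list[FrameRecognition]) -> dict[int, list[float]]:
    """Aggregate per-frame recognition results into one feature vector per minute."""
    grouped: dict[int, list[FrameRecognition]] = {}
    for f in frames:
        grouped.setdefault(f.minute, []).append(f)
    return {
        minute: [mean(f.arm_raise for f in fs),
                 mean(f.smile for f in fs),
                 mean(f.gaze_on_field for f in fs)]
        for minute, fs in grouped.items()
    }

frames = [
    FrameRecognition(0, 0.1, 0.2, 0.9),
    FrameRecognition(0, 0.2, 0.3, 0.8),
    FrameRecognition(1, 0.9, 0.8, 1.0),
]
print(features_per_minute(frames))  # approximately {0: [0.15, 0.25, 0.85], 1: [0.9, 0.8, 1.0]}
```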
 For example, according to the real-time viewing time of the remote viewer (or venue spectator) who will view the highlights and the personal attributes determined by the personal attribute determiner, the information processing system 1 generates highlights corresponding to the viewer himself or herself, or to (the set of) spectators of the same attributes who viewed the event in real time. For example, the information processing system 1 provides the viewer (user) with highlights generated using the highlight scene predictor (or the action index determiner). In addition, for the highlight scenes extracted on the time axis as described above, the information processing system 1 may estimate the optimum gaze point position and angle from feature amounts obtained by image recognition of the set of venue spectators having the same attributes as the viewer of the highlights, and may determine the camerawork of the highlight scenes. These points will be described later.
[1-2. Configuration of the information processing system according to the embodiment]
 The information processing system 1 shown in FIG. 3 will be described. FIG. 3 is a diagram illustrating a configuration example of the information processing system according to the embodiment. As shown in FIG. 3, the information processing system 1 includes the highlight generation server 100, the plurality of edge viewing terminals 10, the content distribution server 50, and the spectator video collection server 60. The highlight generation server 100, each of the plurality of edge viewing terminals 10, the content distribution server 50, and the spectator video collection server 60 are communicably connected by wire or wirelessly via a predetermined communication network (network N).
Although only three edge viewing terminals 10 are illustrated in FIG. 3, the information processing system 1 may include four or more edge viewing terminals 10. In FIG. 3, in order to distinguish and describe the individual edge viewing terminals 10, they may be described as an edge viewing terminal 10a, an edge viewing terminal 10b, and an edge viewing terminal 10c. When the edge viewing terminal 10a, the edge viewing terminal 10b, the edge viewing terminal 10c, and the like are described without particular distinction, they are referred to as the "edge viewing terminal 10". Further, the information processing system 1 shown in FIG. 3 may include a plurality of highlight generation servers 100, a plurality of content distribution servers 50, and a plurality of spectator video collection servers 60.
The highlight generation server 100 is a computer used to provide a highlight service to users. The highlight generation server 100 generates a highlight of event content to be provided to a second user, using a part of the event content determined from state information indicating the state of a first user while viewing the event in real time.
The edge viewing terminal 10 is a computer used by a user. For example, the edge viewing terminal 10 is used by a remote viewer or a spectator. For example, the edge viewing terminal 10 is used by a user who accesses content such as a web page displayed on a browser or content for an application. For example, the edge viewing terminal 10 is used by the user to view content.
The edge viewing terminal 10 may be, for example, a device such as a notebook PC (Personal Computer), a tablet terminal, a desktop PC, a smartphone, a smart speaker, a television, a mobile phone, or a PDA (Personal Digital Assistant). In the following, the edge viewing terminal 10 may be referred to as a user. That is, in the following, the user can also be read as the edge viewing terminal 10.
The edge viewing terminal 10 outputs information about the event. The edge viewing terminal 10 outputs information about various kinds of content such as video of the event and highlights. The edge viewing terminal 10 displays the highlight images (video) and outputs the highlight audio. For example, the edge viewing terminal 10 transmits the user's utterances and images (video) to the highlight generation server 100, and receives highlight audio and images (video) from the highlight generation server 100. The edge viewing terminal 10 transmits the captured video of the user to the highlight generation server 100.
The edge viewing terminal 10 accepts input from the user. The edge viewing terminal 10 accepts voice input by the user's utterances and input by the user's operations. The edge viewing terminal 10 may be any device as long as it can implement the processing in the embodiment. The edge viewing terminal 10 may be any device as long as it has functions such as displaying content information and outputting audio.
In FIG. 3, reference numerals for the camera 14 and the display unit 15 are given only to the edge viewing terminal 10a and are omitted for the other edge viewing terminals 10b and 10c, but each edge viewing terminal 10 has a camera 14 and a display unit 15.
The content distribution server 50 is a server device (computer) that provides a service for distributing content in which the event is captured. Since the content distribution server 50 is similar to a server having a function of distributing content, a detailed description thereof is omitted.
The spectator video collection server 60 is a server device (computer) that collects video of spectators viewing the event in real time. The spectator video collection server 60 transmits the collected video to the highlight generation server 100. The spectator video collection server 60 is similar to the content distribution server 50 except that the objects to be captured are the spectators, so a detailed description thereof is omitted.
In the information processing system 1 shown in FIG. 3, an imaging device group FIA, which is imaging equipment such as cameras that capture the game, performers, and the like, and a sound collection device group SCD, which is content sound collection equipment such as microphones that pick up the sound of the game, performers, and the like, are arranged at the sports or live venue. The content distribution server 50 transmits the information collected by the imaging device group FIA and the sound collection device group SCD to the highlight generation server 100. In the information processing system 1 shown in FIG. 3, an imaging device group SIA, which is spectator imaging equipment such as cameras that capture the spectators at the venue, is also arranged at the sports or live venue. The spectator video collection server 60 transmits the information collected by the imaging device group SIA to the highlight generation server 100.
In the information processing system 1 shown in FIG. 3, the remote viewing environment, in which content is viewed in real time or as highlights from a location other than the venue, is composed of an edge viewing terminal 10 including the display unit 15, which is a display device for viewing the content, and the camera 14 or the like, which is viewer imaging equipment such as a camera that captures the remote viewer himself or herself. In the information processing system 1 shown in FIG. 3, the video of the remote viewer is sent to the highlight generation server 100 through the network N. When viewing highlights, the highlights (videos) individually distributed from the highlight generation server 100 can be received and viewed on the edge viewing terminal 10.
In the information processing system 1 shown in FIG. 3, the highlight generation server 100 collects the video and audio data of the content, captured image data of the spectators at the venue, and captured image data of remote viewers taken while they view the content in real time or as highlights. The highlight generation server 100 generates a highlight video optimized for an individual highlight viewer (the second user) and distributes it to that individual highlight viewer.
The device configuration of the information processing system 1 is not limited to the configuration described above, and any device configuration can be adopted. That is, the information processing system 1 may have a configuration other than the above; for example, the highlight generation server 100 may be integrated with any of the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60. That is, any of the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60 may have the functions of the highlight generation server 100.
[1-2-1. Arrangement example of spectator imaging equipment]
Here, an example of the arrangement of the spectator imaging equipment (the imaging device group SIA in FIG. 3) in the information processing system 1 will be described with reference to FIG. 4. FIG. 4 is a diagram showing an example of the arrangement of the spectator imaging equipment. The spectator imaging equipment may be, for example, 4K video cameras.
FIG. 4 shows a case in which at least one spectator imaging device is arranged at each of four points PT1 to PT4 in a basketball game venue. For example, the spectator imaging device arranged at point PT1 captures spectators (users) located in area AR1. The spectator imaging device arranged at point PT2 captures spectators (users) located in area AR2. The spectator imaging device arranged at point PT3 captures spectators (users) located in area AR3. The spectator imaging device arranged at point PT4 captures spectators (users) located in area AR4. The areas AR1 to AR4 in FIG. 4 show the outlines of the imaging areas covered from the points PT1 to PT4; for example, all spectators at the venue may be covered by the areas AR1 to AR4. Each of the areas AR1 to AR4 may partially overlap with another area. FIG. 4 is merely an example, and the spectator imaging devices may be arranged in any manner as long as the desired spectators can be captured.
[1-3. Configuration of information processing device according to embodiment]
Next, the configuration of the highlight generation server 100, which is an example of an information processing apparatus that executes the information processing according to the embodiment, will be described. FIG. 5 is a diagram illustrating a configuration example of the highlight generation server according to the embodiment of the present disclosure.
As shown in FIG. 5, the highlight generation server 100 has a communication unit 110, a storage unit 120, and a control unit 130. The highlight generation server 100 may also have an input unit (for example, a keyboard, a mouse, or the like) that receives various operations from an administrator or the like of the highlight generation server 100, and a display unit (for example, a liquid crystal display or the like) for displaying various kinds of information.
The communication unit 110 is implemented by, for example, a NIC (Network Interface Card) or the like. The communication unit 110 is connected to the network N (see FIG. 3) by wire or wirelessly, and transmits and receives information to and from other information processing devices such as the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60. The communication unit 110 may also transmit and receive information to and from a user terminal (not shown) used by a user.
The storage unit 120 is implemented by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. As shown in FIG. 5, the storage unit 120 according to the embodiment has a dataset storage unit 121, a model information storage unit 122, a threshold information storage unit 123, and a content information storage unit 124.
The dataset storage unit 121 according to the embodiment stores various kinds of information about the data used for learning. The dataset storage unit 121 stores the datasets used for learning. FIG. 6 is a diagram illustrating an example of the dataset storage unit according to the embodiment of the present disclosure. FIG. 6 shows an example of the dataset storage unit 121 according to the embodiment. In the example of FIG. 6, each table in the dataset storage unit 121 includes items such as "target model ID", "data ID", "data", "label", and "date and time".
The dataset storage unit 121 stores the data used for learning each of a plurality of models in association with the model to be learned, as in the tables TB1, TB2, TB3, and the like in FIG. 6. Although only the three tables TB1, TB2, and TB3 are illustrated in FIG. 6, the dataset storage unit 121 may include tables corresponding to the number of models to be learned.
"Target model ID" indicates identification information for identifying the model to be learned (target model). "Data ID" indicates identification information for identifying the data used in the learning process of the target model. "Data" indicates the data identified by the data ID.
"Label" indicates the label (correct label) attached to the corresponding data. For example, the "label" may be information (correct answer information) indicating the classification (category) of the corresponding data. For example, the "label" is the correct answer information (correct label) corresponding to the output of the target model.
"Date and time" indicates the time (date and time) related to the corresponding data. In the example of FIG. 6, this is illustrated as "DA1" or the like, but the "date and time" may be a specific date and time such as "17:48:35 on December 15, 2021", or information indicating from which model learning the data started to be used, such as "used since learning of model version XX", may be stored.
The example of FIG. 6 indicates that the data in the table TB1 is the data used for learning the target model (model M1) identified by the target model ID "M1". The data used for learning the model M1 includes a plurality of pieces of data identified by the data IDs "DID1", "DID2", "DID3", and so on. For example, the pieces of data identified by the data IDs "DID1", "DID2", "DID3", and so on (data DT1, DT2, DT3, and so on) are information used for learning the model M1, which performs personal attribute determination. For example, the data DT1, DT2, DT3, and so on are input data for the model M1, and the labels LB1, LB2, LB3, and so on corresponding to each piece of data indicate the desired output of the model M1 when that data is input.
The data in the table TB2 is the data used for learning the target model (model M2) identified by the target model ID "M2"; that is, the model M2 used for behavior index determination has been learned using the data in the table TB2. The data in the table TB3 is the data used for learning the target model (model M3) identified by the target model ID "M3"; that is, the model M3 used for highlight scene prediction has been learned using the data in the table TB3.
The dataset storage unit 121 is not limited to the above, and may store various kinds of information depending on the purpose. For example, the dataset storage unit 121 may store, in an identifiable manner, whether each piece of data is learning data or evaluation data. For example, the dataset storage unit 121 may store various kinds of information about data such as learning data used for learning and evaluation data used for accuracy evaluation (calculation). For example, the dataset storage unit 121 stores the learning data and the evaluation data in a distinguishable manner. The dataset storage unit 121 may store information identifying whether each piece of data is learning data or evaluation data. The highlight generation server 100 learns a model based on each piece of data used as learning data and the correct answer information. The highlight generation server 100 calculates the accuracy of the model based on each piece of data used as evaluation data and the correct answer information. The highlight generation server 100 calculates the accuracy of the model by collecting the results of comparing the output that the model produces when the evaluation data is input with the correct answer information.
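As a rough illustration of how such a dataset record and the accuracy calculation described above could be represented, the following Python sketch uses plain data structures. The field names mirror the items of FIG. 6, while the record layout and the train/eval split flag are assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Any

# One row of a table such as TB1 in the dataset storage unit 121 (illustrative).
@dataclass
class DatasetRecord:
    target_model_id: str   # e.g. "M1"
    data_id: str           # e.g. "DID1"
    data: Any              # input data (e.g. spectator-image features)
    label: Any             # correct label corresponding to the model output
    datetime: str          # e.g. "2021-12-15 17:48:35"
    split: str = "train"   # "train" or "eval" (assumed flag for the split)

def accuracy(model, records):
    """Compare model outputs with correct labels over the evaluation data."""
    eval_records = [r for r in records if r.split == "eval"]
    if not eval_records:
        return None
    correct = sum(1 for r in eval_records if model(r.data) == r.label)
    return correct / len(eval_records)
```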
The model information storage unit 122 according to the embodiment stores information about the models. For example, the model information storage unit 122 stores information (model data) indicating the structure of each model (network). FIG. 7 is a diagram illustrating an example of the model information storage unit according to the embodiment of the present disclosure. FIG. 7 shows an example of the model information storage unit 122 according to the embodiment. In the example shown in FIG. 7, the model information storage unit 122 includes items such as "model ID", "usage", and "model data".
"Model ID" indicates identification information for identifying a model. "Usage" indicates the usage of the corresponding model. "Model data" indicates the data of the model. FIG. 7 shows an example in which conceptual information such as "MDT1" is stored in "model data", but in reality it includes various kinds of information constituting the model, such as information about the network included in the model and its functions.
In the example shown in FIG. 7, the model identified by the model ID "M1" (model M1) has the usage "personal attribute determination". This indicates that the model M1 is a model used for personal attribute determination. It also indicates that the model data of the model M1 is the model data MDT1. The model identified by the model ID "M2" (model M2) has the usage "behavior index determination". This indicates that the model M2 is a model used for behavior index determination. It also indicates that the model data of the model M2 is the model data MDT2.
The model identified by the model ID "M3" (model M3) has the usage "highlight scene prediction". This indicates that the model M3 is a model used for highlight scene prediction. It also indicates that the model data of the model M3 is the model data MDT3. The model identified by the model ID "M11" (model M11) has the usage "angle estimation". This indicates that the model M11 is a model used for angle estimation. It also indicates that the model data of the model M11 is the model data MDT11.
The model information storage unit 122 is not limited to the above, and may store various kinds of information depending on the purpose. For example, the model information storage unit 122 stores parameter information of the models learned (generated) by the learning process.
The threshold information storage unit 123 according to the embodiment stores various kinds of information about thresholds. For example, the threshold information storage unit 123 stores various kinds of information about thresholds used for comparison with model outputs (scores and the like). FIG. 8 is a diagram illustrating an example of the threshold information storage unit according to the embodiment of the present disclosure. The threshold information storage unit 123 shown in FIG. 8 includes items such as "threshold ID", "usage", and "threshold".
"Threshold ID" indicates identification information for identifying a threshold. "Usage" indicates the usage of the threshold. "Threshold" indicates the specific value of the threshold identified by the corresponding threshold ID.
In the example of FIG. 8, the threshold identified by the threshold ID "TH1" (threshold TH1) is stored in association with information indicating that it is used for viewing determination. For example, the threshold TH1 is used to determine whether a user is viewing in real time. The value of the threshold TH1 is "VL1". In the example of FIG. 8, it is shown with an abstract symbol such as "VL1", but the value of the threshold TH1 is a specific numerical value (for example, 0.4, 0.6, or the like).
The threshold identified by the threshold ID "TH2" (threshold TH2) is stored in association with information indicating that it is used for highlight generation. For example, the threshold TH2 is used to determine which part of the event content is used for the highlight. For example, the threshold TH2 is used to determine whether to include the video of the event content at a certain point in time in the highlight. The value of the threshold TH2 is "VL2". In the example of FIG. 8, it is shown with an abstract symbol such as "VL2", but the value of the threshold TH2 is a specific numerical value (for example, 0.5, 0.8, or the like).
The threshold information storage unit 123 is not limited to the above, and may store various kinds of information depending on the purpose.
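A minimal sketch of how the thresholds of FIG. 8 could be looked up and applied is shown below. The dictionary layout and the numeric values are illustrative assumptions, not values taken from the disclosure.

```python
# Illustrative threshold table mirroring FIG. 8 (values are placeholders).
THRESHOLDS = {
    "TH1": {"usage": "viewing determination", "value": 0.6},
    "TH2": {"usage": "highlight generation",  "value": 0.8},
}

def is_viewing_in_real_time(viewing_score):
    """TH1: decide whether the user is viewing the event in real time."""
    return viewing_score >= THRESHOLDS["TH1"]["value"]

def include_in_highlight(scene_score):
    """TH2: decide whether a point of the event content enters the highlight."""
    return scene_score >= THRESHOLDS["TH2"]["value"]
```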
The content information storage unit 124 according to the embodiment stores various kinds of information about the content displayed on the edge viewing terminal 10. For example, the content information storage unit 124 stores information about content displayed by an application (also referred to as an "app") installed on the edge viewing terminal 10.
The content information storage unit 124 stores the event content, which is the video of the event. The content information storage unit 124 stores the event content of each event in association with that event. The above is merely an example, and the content information storage unit 124 may store various kinds of information according to the content on which response candidates are displayed and the like. The content information storage unit 124 stores various kinds of information necessary for providing content to the edge viewing terminal 10, displaying response candidates on the edge viewing terminal 10, and the like.
The storage unit 120 may also store various kinds of information other than the above. For example, the storage unit 120 stores various kinds of information about highlight generation. The storage unit 120 stores various kinds of data for providing data to the edge viewing terminal 10. For example, the storage unit 120 stores various kinds of information used to generate the information displayed on the edge viewing terminal 10. For example, the storage unit 120 stores information about content displayed by an application (a content display app or the like) installed on the edge viewing terminal 10. For example, the storage unit 120 stores information about content displayed by the content display app. The above is merely an example, and the storage unit 120 may store various kinds of information used to provide the highlight service to users.
The storage unit 120 stores attribute information and the like of each user. The storage unit 120 stores user information in association with information identifying each user (a user ID or the like). For example, the storage unit 120 stores information indicating the personal attributes determined by the model M1 in association with the user.
Returning to FIG. 5, the description will be continued. The control unit 130 is implemented, for example, by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing a program stored inside the highlight generation server 100 (for example, an information processing program according to the present disclosure) using a RAM (Random Access Memory) or the like as a work area. The control unit 130 may also be implemented by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
As shown in FIG. 5, the control unit 130 has an acquisition unit 131, a learning unit 132, an image processing unit 133, a generation unit 134, and a transmission unit 135, and implements or executes the functions and actions of the information processing described below. The internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 5, and may be another configuration as long as it performs the information processing described later. The connection relationship between the processing units of the control unit 130 is not limited to the connection relationship shown in FIG. 5, and may be another connection relationship.
The acquisition unit 131 acquires various kinds of information. The acquisition unit 131 acquires various kinds of information from external information processing devices. The acquisition unit 131 acquires various kinds of information from the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60. The acquisition unit 131 acquires, from the edge viewing terminal 10, information detected by the edge viewing terminal 10.
The acquisition unit 131 receives information from the content distribution server 50 or the spectator video collection server 60. The acquisition unit 131 acquires requested information from the content distribution server 50 or the spectator video collection server 60. The acquisition unit 131 acquires video from the content distribution server 50. The acquisition unit 131 acquires, from the content distribution server 50, the video captured by the imaging device group FIA. The acquisition unit 131 acquires, from the content distribution server 50, the audio detected by the sound collection device group SCD. The acquisition unit 131 acquires video from the spectator video collection server 60. The acquisition unit 131 acquires various kinds of information from the storage unit 120. The acquisition unit 131 acquires, from the spectator video collection server 60, the video captured by the imaging device group SIA.
The acquisition unit 131 acquires state information indicating the state of the first user, who is a user who viewed the event in real time, at the time of the real-time viewing of the event. The acquisition unit 131 acquires the event content, which is video of the event. The acquisition unit 131 acquires the event content, which is video in which the event is captured.
The acquisition unit 131 acquires the state information of the first user who performed the real-time viewing at the venue of the event. The acquisition unit 131 acquires the state information of the first user who viewed a sports or artistic event in real time. The acquisition unit 131 acquires the state information of the first user who watched the event at the venue. The acquisition unit 131 acquires state information including image information in which the first user is captured. The acquisition unit 131 acquires state information including biometric information of the first user.
When the second user was viewing the event in real time, the acquisition unit 131 acquires state information indicating the state of the second user at the time of the real-time viewing of the event as the state information of the first user. When the second user was not viewing the event in real time, the acquisition unit 131 acquires the state information of a first user who is a user different from the second user.
The acquisition unit 131 acquires the state information of a first user who is a user whose attributes are similar to those of the second user. The acquisition unit 131 acquires the state information of a first user who is a user whose demographic attributes are similar to those of the second user. The acquisition unit 131 acquires the state information of a first user who is a user similar to the second user in at least one of age and gender.
The acquisition unit 131 acquires the state information of a first user who is a user whose psychographic attributes are similar to those of the second user. The acquisition unit 131 acquires the state information of a first user who is a user whose preferences are similar to those of the second user. The acquisition unit 131 acquires the state information of a first user who is a user whose favorite target matches that of the second user.
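The selection of whose state information to use, as described above, can be sketched as follows. This Python snippet is purely illustrative: the attribute keys ("age", "gender", "favorite"), the ten-year age band, and the order of the similarity checks are assumptions, not criteria stated in the disclosure.

```python
# Illustrative selection of "first users" whose state information is used
# when generating a highlight for a given second user.

def select_first_users(second_user, realtime_viewers):
    """Return the users whose state information should drive the highlight.

    second_user: dict with keys such as "id", "age", "gender", "favorite".
    realtime_viewers: list of dicts for users who watched the event in real time.
    """
    # If the second user watched in real time, use their own state information.
    own = [v for v in realtime_viewers if v["id"] == second_user["id"]]
    if own:
        return own

    # Otherwise, fall back to real-time viewers with similar attributes:
    # demographic (age band, gender) or psychographic (favorite team/player).
    def similar(v):
        same_age_band = abs(v["age"] - second_user["age"]) <= 10
        same_gender = v["gender"] == second_user["gender"]
        same_favorite = v["favorite"] == second_user["favorite"]
        return same_favorite or same_age_band or same_gender

    return [v for v in realtime_viewers if similar(v)]
```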
The learning unit 132 learns various kinds of information. The learning unit 132 learns various kinds of information based on information from external information processing devices and information stored in the storage unit 120. The learning unit 132 learns various kinds of information based on the information stored in the dataset storage unit 121. The learning unit 132 stores the models generated by learning in the model information storage unit 122. The learning unit 132 stores the models updated by learning in the model information storage unit 122.
The learning unit 132 performs the learning process. The learning unit 132 performs various kinds of learning. The learning unit 132 learns various kinds of information based on the information acquired by the acquisition unit 131. The learning unit 132 learns (generates) models. The learning unit 132 learns various kinds of information such as models. The learning unit 132 generates a model by learning. The learning unit 132 learns the models using various machine learning techniques. For example, the learning unit 132 learns the parameters of a model (network).
The learning unit 132 learns a model using learning data that includes highlights of past events and the state information of users who viewed those past events in real time. The learning unit 132 generates various models such as the models M1, M2, M3, and M11. For example, the learning unit 132 generates the model M3 for determining highlight scenes. The learning unit 132 learns the parameters of the networks. For example, the learning unit 132 learns the network parameters of various models such as the models M1, M2, M3, and M11. The learning unit 132 also learns the network parameters of the model M3 for determining highlight scenes.
The learning unit 132 performs the learning process based on the learning data (teacher data) stored in the dataset storage unit 121. The learning unit 132 generates various models such as the models M1, M2, M3, and M11 by performing the learning process using the learning data stored in the dataset storage unit 121. For example, the learning unit 132 may generate a model used for image recognition. For example, the learning unit 132 generates the model M1 by learning the network parameters of the model M1. For example, the learning unit 132 generates the model M3 by learning the network parameters of the model M3.
The learning method used by the learning unit 132 is not particularly limited; for example, learning data in which labels and data (images) are linked may be prepared, and that learning data may be input into a calculation model based on a multilayer neural network for learning. In addition, for example, a technique based on a DNN (Deep Neural Network) such as a CNN (Convolutional Neural Network) or a 3D-CNN may be used. When targeting time-series data such as moving images (video), the learning unit 132 may use a technique based on a recurrent neural network (RNN) or LSTM (Long Short-Term Memory units), which extends the RNN.
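As one concrete, purely illustrative instance of the RNN/LSTM option mentioned above, the following PyTorch sketch scores each time step of a sequence of spectator features and is trained in a supervised manner on labeled past events. The use of PyTorch, the layer sizes, and the dummy data are assumptions, not choices stated in the disclosure.

```python
import torch
import torch.nn as nn

class HighlightScenePredictor(nn.Module):
    """Illustrative LSTM that outputs a per-time-step highlight score in [0, 1]."""

    def __init__(self, feature_dim=64, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        # x: (batch, time_steps, feature_dim) spectator features per period
        out, _ = self.lstm(x)                    # (batch, time_steps, hidden_dim)
        scores = torch.sigmoid(self.head(out))   # (batch, time_steps, 1)
        return scores.squeeze(-1)                # (batch, time_steps)

# Supervised training on labeled past events (label 1 = highlight scene).
model = HighlightScenePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

features = torch.randn(8, 120, 64)               # dummy batch: 8 events x 120 periods
labels = torch.randint(0, 2, (8, 120)).float()   # dummy per-period labels

for _ in range(3):                               # a few illustrative epochs
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()
```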
The image processing unit 133 executes various kinds of processing related to image processing. The image processing unit 133 executes processing on images (video) in which users are captured. The image processing unit 133 executes processing on video in which the spectators of the event are captured. The image processing unit 133 recognizes people (users) in the video by image recognition processing. For example, the image processing unit 133 detects the orientation of a user's face and the user's line of sight in the video by image recognition processing.
The image processing unit 133 generates, from the video, information to be input to the models. For example, the image processing unit 133 generates, from the video, feature values to be input to the models. For example, the image processing unit 133 extracts feature values from the video by image processing. The image processing unit 133 may extract the feature values from the video using a model (feature extraction model) that receives video as input and outputs the feature values of that video. The above processing is an example, and the image processing unit 133 may extract feature values from the video by appropriately using various techniques related to image processing. When each model receives the video (images) themselves as input instead of feature values, the highlight generation server 100 does not need to have the image processing unit 133.
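A small sketch of the kind of interface the image processing unit 133 could expose is shown below. The feature fields and the per-person aggregation are illustrative assumptions; an actual implementation might instead rely on an existing pose-estimation or face-analysis component, which is left here as a placeholder callable.

```python
from dataclasses import dataclass

# Illustrative per-person features obtained by image recognition of one frame.
@dataclass
class SpectatorFeatures:
    skeleton: list         # joint coordinates from pose estimation
    face_direction: tuple  # (yaw, pitch) of the face
    gaze: tuple            # estimated gaze direction
    motion: float          # amount of motion relative to the previous frame

def extract_features_per_frame(frame, person_detector, feature_estimator):
    """Detect each spectator in the frame and compute their feature vector.

    person_detector and feature_estimator stand in for concrete image
    recognition components (e.g. a pose/face model); they are placeholders.
    """
    features = []
    for person_region in person_detector(frame):
        features.append(feature_estimator(person_region))
    return features
```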
The generation unit 134 generates various kinds of information. The generation unit 134 generates various kinds of information based on information from external information processing devices and information stored in the storage unit 120. The generation unit 134 generates various kinds of information based on information from other information processing devices such as the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60. The generation unit 134 generates various kinds of information based on the information stored in the dataset storage unit 121, the model information storage unit 122, the threshold information storage unit 123, and the content information storage unit 124. The generation unit 134 generates various kinds of information to be displayed on the edge viewing terminal 10 based on the models learned by the learning unit 132.
The generation unit 134 generates a highlight of the event content to be provided to the second user, using a part of the event content determined from the state information of the first user acquired by the acquisition unit 131. The generation unit 134 generates the highlight of the event content using a model that outputs scores corresponding to periods of the event in response to the input of input data based on the state information. The generation unit 134 determines the part of the event content using the model.
For example, the generation unit 134 generates the highlight of the event content using a model that takes the state information as input. The generation unit 134 generates the highlight of the event content using a model that takes video in which users are captured as input. For example, the generation unit 134 generates the highlight of the event content using a model that takes, as input, feature values extracted from the state information. The generation unit 134 generates the highlight of the event content using a model that takes, as input, feature values extracted from video in which users are captured.
The generation unit 134 generates the highlight of the event content using the determined part of the event content. The generation unit 134 determines, as the part of the event content, the portion of the event content corresponding to the periods whose scores are equal to or greater than the threshold, and generates the highlight of the event content using the determined part of the event content. The generation unit 134 generates the highlight of the event content using the models learned by the learning unit 132.
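The selection of the part of the event content whose scores reach the threshold can be sketched as follows. This is a minimal illustration, assuming fixed-length periods and the merging of adjacent selected periods into contiguous time ranges; neither assumption is stated in the disclosure.

```python
# Illustrative conversion of per-period scores into highlight time ranges.

def scores_to_highlight_ranges(scores, threshold, period_seconds=10):
    """Return (start_sec, end_sec) ranges whose scores reach the threshold.

    scores: list of model scores, one per consecutive period of the event.
    threshold: e.g. the value of threshold TH2.
    """
    ranges = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i * period_seconds
        elif score < threshold and start is not None:
            ranges.append((start, i * period_seconds))
            start = None
    if start is not None:
        ranges.append((start, len(scores) * period_seconds))
    return ranges

# Example: with threshold 0.8, only the runs of high-scoring periods are kept.
print(scores_to_highlight_ranges([0.2, 0.9, 0.95, 0.4, 0.85], threshold=0.8))
# -> [(10, 30), (40, 50)]
```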
When the second user was viewing the event in real time, the generation unit 134 generates the highlight of the event content to be provided to the second user using a part of the event content determined from the state information of the second user. When the second user was not viewing the event in real time, the generation unit 134 generates the highlight of the event content to be provided to the second user using a part of the event content determined from the state information of a first user who is a user different from the second user.
The generation unit 134 executes processing for generating information to be provided to the edge viewing terminal 10. The generation unit 134 may also generate, as data, the display screens (content) to be displayed on the edge viewing terminal 10. For example, the generation unit 134 may generate the screens (content) to be provided to the edge viewing terminal 10 by appropriately using various techniques such as Java (registered trademark). The generation unit 134 may also generate the screens (content) to be provided to the edge viewing terminal 10 based on formats such as CSS, JavaScript (registered trademark), and HTML. Further, for example, the generation unit 134 may generate the screens (content) in various formats such as JPEG (Joint Photographic Experts Group), GIF (Graphics Interchange Format), and PNG (Portable Network Graphics).
The transmission unit 135 transmits information to the edge viewing terminal 10. The transmission unit 135 transmits the information generated by the generation unit 134 to the edge viewing terminal 10. The transmission unit 135 transmits the data generated by the generation unit 134 to the edge viewing terminal 10. The transmission unit 135 transmits the highlight of the event content generated by the generation unit 134 to the edge viewing terminal 10 used by the second user.
The transmission unit 135 transmits information requesting information to the content distribution server 50. The transmission unit 135 transmits, to the content distribution server 50, information indicating the information it requests to acquire. The transmission unit 135 transmits information requesting information to the spectator video collection server 60. The transmission unit 135 transmits, to the spectator video collection server 60, information indicating the information it requests to acquire.
[1-4. Configuration of terminal device according to embodiment]
Next, the configuration of the edge viewing terminal 10, which is an example of a terminal device that executes the information processing according to the embodiment, will be described. FIG. 9 is a diagram showing a configuration example of the edge viewing terminal according to the embodiment of the present disclosure.
As shown in FIG. 9, the edge viewing terminal 10 has a communication unit 11, an audio input unit 12, an audio output unit 13, a camera 14, a display unit 15, an operation unit 16, a storage unit 17, and a control unit 18.
The communication unit 11 is implemented by, for example, a NIC, a communication circuit, or the like. The communication unit 11 is connected to a predetermined communication network by wire or wirelessly, and transmits and receives information to and from external information processing devices. For example, the communication unit 11 is connected to a predetermined communication network by wire or wirelessly, and transmits and receives information to and from the highlight generation server 100.
The audio input unit 12 functions as an input unit that receives operations by the user's voice (utterances). The audio input unit 12 is, for example, a microphone or the like, and detects sound. For example, the audio input unit 12 detects the user's utterances. The audio input unit 12 may have any configuration as long as it can detect the user's utterance information necessary for the processing.
The audio output unit 13 is realized by a speaker that outputs sound, and is an output device for outputting various kinds of information as audio. The audio output unit 13 outputs, as audio, the content provided from the highlight generation server 100. For example, the audio output unit 13 outputs audio corresponding to the information displayed on the display unit 15. The edge viewing terminal 10 inputs and outputs audio through the audio input unit 12 and the audio output unit 13.
The camera 14 has an image sensor that detects images. The camera 14 captures the user. For example, when the edge viewing terminal 10 is a desktop personal computer (desktop PC), the camera 14 may be a separate device from the device (main unit) in which the control unit 18 is mounted, the display device, and the like.
When the edge viewing terminal 10 is a desktop personal computer, the camera 14 may be integrated with the display (display device) and may be arranged at the top of the display unit 15. For example, when the edge viewing terminal 10 is a notebook computer (laptop PC), the camera 14 may be built into the edge viewing terminal 10 and arranged at the top of the display unit 15. For example, in the case of a smartphone, the camera 14 may be an in-camera built into the edge viewing terminal 10.
The display unit 15 is a display screen such as that of a tablet terminal, realized by, for example, a liquid crystal display or an organic EL (Electro-Luminescence) display, and is a display device for displaying various kinds of information. For example, when the edge viewing terminal 10 is a desktop personal computer, the display unit 15 may be separate from the device (main unit) in which the control unit 18 is mounted. For example, when the edge viewing terminal 10 is a notebook computer or a smartphone, the display unit 15 may be integrated with the device (main unit) in which the control unit 18 is mounted.
 表示部15は、イベントに関する各種情報を表示する。表示部15は、コンテンツを表示する。表示部15は、ハイライト生成サーバ100から受信した各種情報を表示する。表示部15は、ハイライト生成サーバ100から受信したイベントのハイライトを表示する。表示部15は、コンテンツを表示する。表示部15は、イベントを撮影した映像を表示する。表示部15は、イベントのハイライトを表示する。 The display unit 15 displays various information related to the event. The display unit 15 displays content. The display unit 15 displays various information received from the highlight generation server 100 . The display unit 15 displays highlights of events received from the highlight generation server 100 . The display unit 15 displays content. The display unit 15 displays the video of the event. The display unit 15 displays the highlight of the event.
 操作部16は、様々なユーザの操作を受け付ける入力部として機能する。図9の例では、操作部16は、キーボード、マウス等である。また、操作部16は、キーボードやマウスと同等の機能を実現できるタッチパネルを有してもよい。この場合、操作部16は、各種センサにより実現されるタッチパネルの機能により、表示画面を介してユーザから各種操作を受け付ける。例えば、操作部16は、表示部15を介してユーザから各種操作を受け付ける。 The operation unit 16 functions as an input unit that receives various user operations. In the example of FIG. 9, the operation unit 16 is a keyboard, mouse, or the like. Also, the operation unit 16 may have a touch panel capable of realizing functions equivalent to those of a keyboard and a mouse. In this case, the operation unit 16 receives various operations from the user through the display screen by the function of a touch panel realized by various sensors. For example, the operation unit 16 receives various operations from the user via the display unit 15 .
 例えば、操作部16は、エッジ視聴端末10の表示部15を介してユーザの指定操作等の操作を受け付けてもよい。なお、操作部16によるユーザの操作の検知方式には、タブレット端末では主に静電容量方式が採用されるが、他の検知方式である抵抗膜方式、表面弾性波方式、赤外線方式、電磁誘導方式など、ユーザの操作を検知できタッチパネルの機能が実現できればどのような方式を採用してもよい。 For example, the operation unit 16 may receive an operation such as a user's designation operation via the display unit 15 of the edge viewing terminal 10 . As for the detection method of the user's operation by the operation unit 16, the tablet terminal mainly adopts the capacitance method, but there are other detection methods such as the resistive film method, the surface acoustic wave method, the infrared method, and the electromagnetic induction method. Any method may be adopted as long as the user's operation can be detected and the function of the touch panel can be realized.
 上記のキーボード、マウス、タッチパネル等は一例に過ぎず、エッジ視聴端末10は、上記に限らず様々な情報を入力として受け付ける(検知する)構成を有してもよい。例えば、エッジ視聴端末10は、ユーザの視線を検知する視線センサを有してもよい。視線センサは、例えば、エッジ視聴端末10に搭載されたカメラ14や光センサ、動きセンサ(いずれも図示省略)等の検出結果に基づき、アイトラッキング技術を利用して、ユーザの視線方向を検出する。視線センサは、検出した視線方向に基づき、画面のうち、ユーザが注視している注視領域を決定する。視線センサは、決定した注視領域を含む視線情報をハイライト生成サーバ100に送信してもよい。例えば、エッジ視聴端末10は、ユーザのジェスチャ等を検知するモーションセンサを有してもよい。エッジ視聴端末10は、モーションセンサにより、ユーザのジェスチャによる操作を受け付けてもよい。 The keyboard, mouse, touch panel, etc. described above are merely examples, and the edge viewing terminal 10 may have a configuration that accepts (detects) various types of information as input, not limited to the above. For example, the edge viewing terminal 10 may have a line-of-sight sensor that detects the user's line of sight. The line-of-sight sensor detects the user's line-of-sight direction using eye-tracking technology based on detection results from, for example, the camera 14 mounted on the edge viewing terminal 10, an optical sensor, and a motion sensor (all of which are not shown). . The line-of-sight sensor determines a region of the screen that the user is gazing at based on the detected line-of-sight direction. The line-of-sight sensor may transmit line-of-sight information including the determined gaze area to the highlight generation server 100 . For example, the edge viewing terminal 10 may have a motion sensor that detects user gestures and the like. The edge viewing terminal 10 may receive an operation by a user's gesture using a motion sensor.
The storage unit 17 is implemented by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 17 stores, for example, various kinds of information received from the highlight generation server 100. The storage unit 17 also stores information, such as programs, related to applications installed in the edge viewing terminal 10 (for example, a content display application).
The storage unit 17 stores information about the user. The storage unit 17 stores the user's utterance history (history of speech recognition results) and action history.
The control unit 18 is a controller, and is realized, for example, by a CPU or an MPU executing various programs stored in a storage device such as the storage unit 17 inside the edge viewing terminal 10, using a RAM as a work area. These various programs include, for example, programs of applications that perform information processing (for example, a content display application). The control unit 18 may also be a controller realized by an integrated circuit such as an ASIC or an FPGA.
As shown in FIG. 9, the control unit 18 includes an acquisition unit 181, a transmission unit 182, a reception unit 183, and a processing unit 184, and implements or executes the information processing functions and operations described below. Note that the internal configuration of the control unit 18 is not limited to the configuration shown in FIG. 9, and may be any other configuration as long as it performs the information processing described later. In addition, the connection relationship among the processing units of the control unit 18 is not limited to the connection relationship shown in FIG. 9, and may be another connection relationship.
The acquisition unit 181 acquires various kinds of information. For example, the acquisition unit 181 acquires various kinds of information from an external information processing device and stores the acquired information in the storage unit 17. The acquisition unit 181 acquires the user's operation information accepted by the operation unit 16.
The acquisition unit 181 acquires state information indicating the state of the user, including image information of the user captured by the camera 14. The acquisition unit 181 also acquires the user's utterance information detected by the voice input unit 12.
The transmission unit 182 transmits information to the highlight generation server 100 via the communication unit 11. The transmission unit 182 transmits information about the user to the highlight generation server 100, such as information about the user's video captured by the camera 14. The transmission unit 182 transmits state information indicating the state of the user, including image information of the user captured by the camera 14. The transmission unit 182 also transmits information input by the user's utterances or operations.
The reception unit 183 receives information from the highlight generation server 100 via the communication unit 11. The reception unit 183 receives information provided by the highlight generation server 100, such as content and highlights.
The processing unit 184 executes various kinds of processing. The processing unit 184 executes processing according to user operations accepted by the voice input unit 12 or the operation unit 16.
The processing unit 184 displays various kinds of information via the display unit 15; for example, the processing unit 184 functions as a display control unit that controls display on the display unit 15. The processing unit 184 also outputs various kinds of information as audio via the audio output unit 13; for example, the processing unit 184 functions as an audio output control unit that controls audio output by the audio output unit 13.
The processing unit 184 outputs the information received by the acquisition unit 181. The processing unit 184 outputs content provided by the highlight generation server 100 via the audio output unit 13 or the display unit 15: it displays the content via the display unit 15 and outputs the content as audio via the audio output unit 13.
The processing unit 184 transmits various kinds of information to external information processing devices via the communication unit 11. For example, the processing unit 184 transmits to the highlight generation server 100 various kinds of information stored in the storage unit 17, various kinds of information acquired by the acquisition unit 181 such as sensor information, the user's operation information accepted by the operation unit 16, and information such as utterances and images of the user using the edge viewing terminal 10.
Each process performed by the control unit 18 described above may be realized by, for example, JavaScript (registered trademark). When processing such as the information processing by the control unit 18 described above is performed by a predetermined application, each unit of the control unit 18 may be realized by that predetermined application. For example, processing such as the information processing by the control unit 18 may be realized by control information received from an external information processing device. When the display processing described above is performed by a predetermined application (for example, a content display application), the control unit 18 may have an application control unit that controls the predetermined application or a dedicated application.
[1-5. Information processing procedure according to the embodiment]
Next, various information processing procedures according to the embodiment will be described with reference to FIG. 10. FIG. 10 is a flowchart showing the processing procedure of the information processing device according to the embodiment of the present disclosure. Specifically, FIG. 10 is a flowchart showing the procedure of information processing performed by the highlight generation server 100, which is an example of the information processing device.
As shown in FIG. 10, the highlight generation server 100 acquires state information indicating the state, during real-time viewing of the event, of a first user who viewed the event in real time, and event content, which is video of the event (step S101). The highlight generation server 100 then generates a highlight of the event content to be provided to a second user, using a portion of the event content determined from the state information of the first user (step S102).
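The two steps above can be pictured with a small, runnable sketch. It assumes, purely for illustration, that the first user's state information has already been reduced to a per-second engagement score and that the event content is a list of one-second segments; the function and variable names are hypothetical and do not appear in the disclosure.

```python
def generate_highlight(state_scores, event_segments, threshold=0.8):
    """Step S102: keep the segments of the event whose timestamps coincide with
    high engagement in the first user's state information (step S101 is assumed
    to have produced state_scores and event_segments already)."""
    return [seg for score, seg in zip(state_scores, event_segments) if score >= threshold]

# Example: the highlight provided to the second user keeps only the segments at t=2s and t=5s.
state_scores = [0.1, 0.2, 0.9, 0.3, 0.4, 0.95, 0.2]
event_segments = [f"segment_{t}s" for t in range(7)]
print(generate_highlight(state_scores, event_segments))  # ['segment_2s', 'segment_5s']
```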
[1-6. Configuration and processing of the information processing system]
The configuration and processing of the information processing system will now be described with reference to FIGS. 11 to 15. Note that the points described below may be applied to either the information processing system 1 according to the first embodiment or the information processing system 1 according to the second embodiment.
[1-6-1. Example of functional configuration related to learning in the information processing system]
FIG. 11 will now be described. FIG. 11 is a diagram showing an example of the functional configuration related to learning in the information processing system. In FIG. 11, the dashed line BS indicates a functional interface in the system: the left side of the dashed line BS corresponds to the equipment at the local venue (corresponding to the event site in FIG. 3) or the edge viewing terminal 10 side, and the right side of the dashed line BS corresponds to the highlight generation server 100 side. The dashed line BS indicates one example of how functions are allocated in the information processing system 1. In FIG. 11, each component shown on the left side of the dashed line BS is realized by the equipment at the local venue or the edge viewing terminal 10, and each component shown on the right side of the dashed line BS is realized by the highlight generation server 100.
Note that the boundary (interface) of the device configuration in the information processing system 1 is not limited to the dashed line BS, and the functions may be allocated to the equipment at the local venue, the edge viewing terminal 10, the highlight generation server 100, and so on in any combination. For example, when the highlight generation server 100 is integrated with any of the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60, the interface indicated by the dashed line BS may be absent.
For example, FIG. 11 shows a learning flow in which a behavior analysis algorithm is machine-learned in advance through data analysis. In the information processing system 1, local spectators (real-time spectators) watching an event in real time at a sports match venue, a live venue, or the like are photographed by imaging devices such as cameras, and the footage is accumulated as spectator video data. The imaging devices in FIG. 11 correspond, for example, to the spectator imaging equipment or the camera 14 of the edge viewing terminal 10. Note that the information processing system 1 may also treat viewers watching in real time in a remote environment as real-time spectators to be learned from. The information processing system 1 accumulates feature quantities obtained by performing image recognition processing on the spectator video data as spectator feature data. The image recognition processing in FIG. 11 is executed by the image processing unit 133 of the highlight generation server 100.
For example, the spectator feature data is time-series data, covering the entire match duration, of the feature quantities obtained by the image recognition processing. For example, the spectator feature data accumulates the individual feature quantities of all photographed spectators as time-series data. Examples of feature quantities include body part points obtained by skeleton recognition, information indicating face orientation and the like, face part points obtained by face recognition, information indicating face attributes such as smiles and emotions, information obtained by motion detection (motion information), and information obtained by line-of-sight detection (line-of-sight information).
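As a rough illustration of how such per-spectator, per-frame feature quantities might be organized as time-series data, the following sketch uses a simple Python data class; the field names and value ranges are assumptions for illustration, not the actual schema used by the system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SpectatorFrameFeatures:
    person_id: str                                    # identifies one spectator across the event
    timestamp: float                                  # seconds from the start of the event
    body_keypoints: List[Tuple[float, float]] = field(default_factory=list)  # skeleton recognition
    face_keypoints: List[Tuple[float, float]] = field(default_factory=list)  # face recognition
    face_yaw_pitch: Tuple[float, float] = (0.0, 0.0)  # face orientation
    smile_score: float = 0.0                          # face attribute: smile intensity (0..1)
    emotion: str = "neutral"                          # face attribute: categorical emotion
    motion_magnitude: float = 0.0                     # motion detection
    gaze_point: Tuple[float, float] = (0.0, 0.0)      # line-of-sight detection

# The accumulated spectator feature data is then a time series of such records per person.
spectator_feature_data = [
    SpectatorFrameFeatures("person_001", 0.0, smile_score=0.2),
    SpectatorFrameFeatures("person_001", 1.0, smile_score=0.8, motion_magnitude=0.5),
]
print(len(spectator_feature_data))
```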
For example, the highlight generation server 100 feeds the spectator feature data and teacher data into the respective learning devices to generate the models of the behavior analysis algorithm. The processing (learning processing) corresponding to each learning device in FIG. 11 is executed by the learning unit 132 of the highlight generation server 100. In FIG. 11, the highlight generation server 100 generates a model M1, which is a personal attribute determiner, a model M2, which is an action index determiner, and a model M3, which is a highlight scene predictor. Details of the models M1 to M3 will be described later.
For example, the teacher data is a label (correct answer information) indicating the correct answer that a model is expected to output when the corresponding spectator feature data is input to the model. The teacher data may be generated by content analysis processing that analyzes the event content. For example, the teacher data may be automatically generated from the results of image recognition processing on content video data obtained by photographing the match, the performers, or other content with an imaging device such as a camera, and from the results of acoustic processing on content audio data recorded with a sound pickup device such as a microphone; alternatively, such analysis may be used by an administrator of the information processing system 1 or the like as a tool for analyzing the development of the match when creating teacher data by hand. Since teacher data can thus also be created entirely by hand, the generation of teacher data from the video and audio of the content, indicated by the dotted lines in FIG. 11, does not have to be performed.
[1-6-2. Learning and inference related to the personal attribute determiner]
Learning and inference related to each determiner (model) will now be described, starting with an example of learning and inference related to the personal attribute determiner. FIG. 12 is a diagram showing an example of learning and inference related to the personal attribute determiner of the information processing system. The part above the dotted line in FIG. 12 shows the processing related to model generation in the learning phase, and the part below the dotted line shows the processing in the inference phase using the model generated by learning. Descriptions of points similar to those described above will be omitted as appropriate.
FIG. 12 shows, as an example of personal attribute determination, a case in which each individual (user) is judged to have one of three fan attributes: home fan (first fan type), away fan (second fan type), or beginner (third fan type).
In FIG. 12, data in which a fan attribute label has been (manually) assigned to each individual appearing in the spectator video data is prepared as the teacher data TD1. For example, the highlight generation server 100 generates the model M1 by performing supervised learning on the spectator feature data of each individual using the teacher data TD1. Any learning algorithm, such as a DNN or XGBoost (eXtreme Gradient Boosting), can be adopted as the learning algorithm of the highlight generation server 100.
The teacher data TD1 shown in FIG. 12 is data in which a fan attribute label is assigned to each individual appearing in the spectator video data: each individual (user) is given a label indicating one of the three types, home fan (first fan type), away fan (second fan type), or beginner (third fan type). In FIG. 12, hatching applied to each individual's face indicating one of the three types is shown as an example of the label, but the label may also take the form of a label (correct answer information) indicating one of the three types attached to information identifying each individual (a user ID).
Through the learning processing using the information described above, the highlight generation server 100 generates the model M1, a personal attribute determiner that takes an individual's feature data as input and identifies which of the three types the attribute of the user corresponding to the input data is.
Then, in the inference phase, the highlight generation server 100 performs inference processing using the model M1 generated in the learning phase. For example, the highlight generation server 100 inputs feature data as input data into the model M1, thereby causing the model M1 to output information indicating the personal attribute of the user corresponding to the input data (output data OD11). In FIG. 12, the highlight generation server 100 inputs the feature data of a user whose attribute is unknown (the user to be inferred, also referred to as the "target user") into the model M1, thereby causing the model M1 to output information indicating the fan attribute of the target user.
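A minimal sketch of this supervised learning and inference is shown below. It assumes that each spectator's feature time series has been flattened into a fixed-length vector, and it uses scikit-learn's gradient boosting classifier as a stand-in for the DNN or XGBoost learners mentioned above; the data here is random placeholder data and the names are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

HOME_FAN, AWAY_FAN, BEGINNER = 0, 1, 2

# Learning phase: per-spectator feature vectors with manually assigned fan-attribute labels (TD1).
X_train = np.random.rand(300, 64)        # 300 spectators, 64-dimensional flattened features
y_train = np.random.choice([HOME_FAN, AWAY_FAN, BEGINNER], size=300)
model_m1 = GradientBoostingClassifier().fit(X_train, y_train)

# Inference phase: classify a target user whose fan attribute is unknown.
target_features = np.random.rand(1, 64)
print(model_m1.predict(target_features))  # e.g. [0] -> judged to be a home fan
```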
Note that the personal attribute determiner shown in FIG. 12 is merely one example, and the information processing system 1 may use various personal attribute determiners. For example, the information processing system 1 may use the attribute recognition function of an image recognizer as a personal attribute determiner; the information processing system 1 may determine age group and gender by attribute recognition of a face recognizer.
For example, the information processing system 1 may use a personal attribute determiner that determines an "enjoyment" attribute. The information processing system 1 may use the average smile level over the entire period of a sports match or live event as the enjoyment attribute. The information processing system 1 may also store the viewer's personal attribute determination values from past viewing (such as another match of the same sport) and use the stored personal attribute determination values at determination time.
The information processing system 1 may use values stored from past viewing to determine attributes, such as the enjoyment attribute or the attributes based on viewing behavior described later, that cannot be determined instantaneously by image recognition. The information processing system 1 may also determine attributes related to situations in which the scenes a user wants to watch in a highlight differ.
For example, the information processing system 1 may use a personal attribute determiner that determines an "immersion" attribute. The information processing system 1 may make this determination from the average amount of movement over the entire period of the sports match or live event.
For example, the information processing system 1 may use a personal attribute determiner that determines a ticket price attribute. The information processing system 1 may determine the expenditure, such as the ticket price, from the seat position of the photographed individual. Note that if the target user is not a local spectator, the information processing system 1 may determine the expenditure by referring to the purchase history.
For example, the information processing system 1 may use a personal attribute determiner that determines a vocalization attribute. The information processing system 1 may make this determination from the average degree of mouth opening over the entire period of the match or event.
For example, the information processing system 1 may use a personal attribute determiner that determines a "distracted viewing" attribute. The information processing system 1 may generate a personal attribute determiner that determines the distracted viewing attribute by assigning and learning labels indicating whether the user is watching while doing something else, such as watching the match while looking at a smartphone.
For example, the information processing system 1 may use a personal attribute determiner that determines a "favorite performer" attribute. The information processing system 1 may generate a personal attribute determiner that determines the favorite performer attribute by assigning and learning labels indicating whether the user supports a specific performer at a live event or the like.
For example, the information processing system 1 may use a personal attribute determiner that determines an empathy attribute. Since the content a listener wants to hear differs at, for example, lectures and speeches, the information processing system 1 may generate a personal attribute determiner that determines the empathy attribute by assigning and learning labels indicating whether the listener sympathizes with (believes in) the speaker's views.
For example, the information processing system 1 may use a personal attribute determiner that determines a sense-of-unity attribute. The information processing system 1 may generate a personal attribute determiner that determines the sense-of-unity attribute by assigning and learning labels indicating whether the spectator cheers together with other spectators (with routines or the like) at sports or live events.
For example, the information processing system 1 may use a personal attribute determiner that determines a party relationship attribute. The information processing system 1 may determine the relationship with the main figures, such as the bride and groom at a party such as a wedding reception (relative, friend, work colleague, etc.), from the seat position of the photographed individual. For viewers who did not attend, the information processing system 1 may accept the attribute as an input.
For example, the information processing system 1 may use a personal attribute determiner that determines a concentration attribute. For events with little audience reaction, such as classical music concerts, the information processing system 1 may generate a personal attribute determiner that determines the concentration attribute by assigning and learning labels indicating whether the spectator was able to watch with concentration throughout the event. When the personal attribute determiner generated in this way is used, fewer scenes are extracted for spectators (viewers) who cannot concentrate, and the duration of the highlight video can consequently be shortened.
For example, the information processing system 1 may use a personal attribute determiner that determines an emotional expression behavior attribute. For watching movies or plays, the information processing system 1 may generate a personal attribute determiner that determines the emotional expression behavior attribute by assigning and learning labels of the type of emotional expression behavior (crying, laughing, being scared, excitement, etc.). When the personal attribute determiner generated in this way is used, it becomes possible to handle cases in which, owing to differences in the amount of emotional expression behavior, the scenes a user wants to see in a highlight differ, such as moving scenes, comedy, horror, or action.
[1-6-3. Learning and inference related to the action index determiner]
Next, an example of learning and inference related to the action index determiner will be described. FIG. 13 is a diagram showing an example of learning and inference related to the action index determiner of the information processing system. Descriptions of points similar to those described above will be omitted as appropriate.
For example, the highlight generation server 100 defines objective variables from the development of the match for various indices related to user behavior, and performs regression analysis with the spectator feature data as explanatory variables, thereby generating the model M2, which is the action index determiner. The model M2 may be a regression equation that takes as input the user's feature data corresponding to each point in time during the event and outputs (calculates) a value indicating the degree of excitement at each point in time. In the following, determining the degree of excitement of each individual (user) is described as an example of action index determination.
For example, the highlight generation server 100 generates the model M2 by using the feature quantities of users with the home fan attribute as explanatory variables, defining an objective variable whose value changes from 0 to 1 when the home team scores, and executing the learning processing. The teacher data TD2 shown in FIG. 13 corresponds to a case in which the home team scores at time t1, and shows an example of teacher data in which the value of the objective variable changes from 0 to 1 at time t1.
Through the learning processing using the information described above, the highlight generation server 100 generates the model M2, which takes an individual's feature data as input and outputs information indicating the excitement of the user corresponding to the input data over the period of the event. For example, the highlight generation server 100 generates a model M2 that outputs information indicating how the user's degree of excitement (score) changes over the period of the event.
The highlight generation server 100 generates the regression equation of the index obtained as a result of the regression analysis as the action index determiner (for example, the model M2). As a result, an action index determiner is obtained that, when feature data in the form of time-series data is input, outputs time-series data of a score (the result of the regression equation) representing the degree of each index. The action index determiner takes an individual's feature data as input and outputs the individual's score for each index. The action index determiner may also take the feature data of a group, per attribute or as a whole, as input and output the score of each index as an average value.
Then, in the inference phase, the highlight generation server 100 performs inference processing using the model M2 generated in the learning phase. For example, the highlight generation server 100 inputs feature data as input data into the model M2, thereby causing the model M2 to output information indicating the degree of excitement of the user corresponding to the input data. In FIG. 13, the highlight generation server 100 inputs into the model M2 the feature data of a user (target user) whose index (degree of excitement, etc.) is unknown (the user to be inferred), thereby causing the model M2 to output time-series data of the index score for the target user.
The output data OD21 in FIG. 13 shows an example of the output of the model M2 for a home team user, in which the degree of excitement rises sharply at time t1 when the home team scores. The output data OD22 in FIG. 13 shows an example of the output of the model M2 for an away team user, in which the degree of excitement drops slightly at time t1, when the home team scores, that is, when the away team concedes. The output data OD23 in FIG. 13 shows an example of the output of the model M2 for a beginner user (for example, a user who is neither a home fan nor an away fan), in which the degree of excitement rises slightly at time t1 when the home team scores.
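The regression underlying the action index determiner can be sketched as follows. The sketch assumes per-frame feature vectors as explanatory variables and an objective variable that switches from 0 to 1 at the home-team scoring time t1, as in the teacher data TD2; ridge regression from scikit-learn is used as one possible regression method, and all sizes and values are illustrative placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge

T = 600                                  # frames covering the match period
t1 = 400                                 # the home team scores at frame t1

# Explanatory variables: feature time series of a home-fan spectator.
X_home_fan = np.random.rand(T, 16)
X_home_fan[t1:, 0] += 1.0                # e.g. a motion feature jumps after the goal

# Objective variable (teacher data TD2): 0 before the goal, 1 from the goal onward.
y = np.zeros(T)
y[t1:] = 1.0

model_m2 = Ridge().fit(X_home_fan, y)    # the fitted regression acts as model M2

# Inference: the regression yields a per-frame "excitement" score for a target user.
X_target = np.random.rand(T, 16)
excitement_scores = model_m2.predict(X_target)
print(excitement_scores.shape)           # (600,) -> a score time series like OD21-OD23
```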
As described above, for example, the highlight generation server 100 may combine this with attribute determination and generate time-series data of scores in which the degree of excitement is determined for each attribute. For example, the highlight generation server 100 may use the average score of the group for each attribute. For example, the highlight generation server 100 may generate a model M2 such that, when the home team scores, the score of home fans rises, the score of away fans does not rise, and beginners have an intermediate score.
Note that the action index determiner shown in FIG. 13 is merely one example, and the information processing system 1 may use action index determiners for various indices. For example, the information processing system 1 may use an action index determiner that determines the degree of concentration of each individual (user). In this case, the information processing system 1 may generate an action index determiner that determines the degree of concentration by using the user's feature quantities as explanatory variables, defining an objective variable that changes from 1 to 0 when the match is interrupted, and executing the learning processing.
The information processing system 1 may also use an action index determiner that determines the degree of disappointment of each individual (user). In this case, the information processing system 1 may generate an action index determiner that determines the degree of disappointment by using the user's feature quantities as explanatory variables, defining an objective variable that changes from 0 to 1 when a shot is missed, and executing the learning processing.
The information processing system 1 may also use an action index determiner that determines the degree of tension of each individual (user). In this case, the information processing system 1 may generate an action index determiner that determines the degree of tension by using the user's feature quantities as explanatory variables, defining an objective variable that becomes 1 when the score difference is close (for example, one point in the case of soccer), and executing the learning processing.
The information processing system 1 may also use an action index determiner that determines the degree of anger of each individual (user). In this case, the information processing system 1 may generate an action index determiner that determines the degree of anger by using the user's feature quantities as explanatory variables, defining an objective variable that changes from 0 to 1 when a mistake is made, and executing the learning processing.
The information processing system 1 may also use an action index determiner that determines the degree of boredom of each individual (user). In this case, the information processing system 1 may generate an action index determiner that determines the degree of boredom by using the user's feature quantities as explanatory variables, defining an objective variable that becomes 1 during periods when the match drags on, and executing the learning processing.
[1-6-4. Learning and inference related to the highlight scene predictor]
Next, an example of learning and inference related to the highlight scene predictor will be described. FIG. 14 is a diagram showing an example of learning and inference related to the highlight scene predictor of the information processing system. Descriptions of points similar to those described above will be omitted as appropriate.
FIG. 14 shows a case in which a highlight scene predictor is used to predict which periods of the event should be used as highlights.
In FIG. 14, data indicating the periods used as highlight scenes and the periods not used as highlight scenes (non-highlight periods) is prepared as the teacher data TD3. For example, the teacher data TD3 is data in which, over the period of the event, the periods used as highlight scenes take the value 1 and the non-highlight periods take the value 0. The teacher data TD3 may be extracted by hand or automatically generated by content analysis. For example, the teacher data TD3 is information indicating, for the event from which the feature quantities serving as the input data were collected, the periods that were used as highlight scenes in an actually produced highlight.
For example, the highlight generation server 100 uses manually extracted highlight scenes as the teacher data TD3 and generates labeled data in which the spectator feature data during highlight scene periods is labeled True and the spectator feature data outside highlight scene periods is labeled False. For example, the highlight generation server 100 generates the model M3 by performing supervised learning on the spectator feature data of each individual using the teacher data TD3. Any learning algorithm, such as a DNN or XGBoost, can be adopted as the learning algorithm of the highlight generation server 100. For example, the time-series data of the individual feature quantities of all spectators photographed at the event corresponding to the teacher data TD3 may be used as the input data for learning.
Note that the above is merely one example, and highlight scenes automatically extracted by analyzing the video and audio of the content, using the configuration indicated by the dotted lines at the lower left of FIG. 11, may be used as the teacher data TD3. The highlight generation server 100 may also generate a highlight scene predictor for each personal attribute by dividing the input spectator feature data to be learned by personal attribute and learning separately.
Through the learning processing using the information described above, the highlight generation server 100 generates the model M3, a highlight scene predictor that takes feature data, for example time-series data, as input and outputs time-series data of a score representing the degree of highlight-likeness, such as a likelihood, a confidence, or a moving average of determination values. For example, the model M3 takes an individual's feature data as input and outputs the individual's highlight-likeness score. The model M3 may also take the feature data of a group, per attribute or as a whole, as input and output the highlight-likeness score per attribute or for the whole group.
Then, in the inference phase, the highlight generation server 100 performs inference processing using the model M3 generated in the learning phase. For example, the highlight generation server 100 inputs feature data as input data into the model M3, thereby causing the model M3 to output time-series data of the highlight-likeness score corresponding to the user of the input data (output data OD31).
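A minimal sketch of the highlight scene predictor follows. It assumes per-frame spectator feature vectors labeled 1 inside the highlight periods of the teacher data TD3 and 0 elsewhere, and uses scikit-learn's gradient boosting classifier as a stand-in for the DNN or XGBoost learners; predict_proba then yields the per-frame highlight-likeness score time series, optionally smoothed with a moving average as mentioned above. All data and sizes are illustrative placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

T = 600
X = np.random.rand(T, 32)                # per-frame spectator feature vectors

# Teacher data TD3: frames 200-259 were used as a highlight scene (1), the rest were not (0).
y = np.zeros(T, dtype=int)
y[200:260] = 1

model_m3 = GradientBoostingClassifier().fit(X, y)

# Inference: a highlight-likeness score per frame, then smoothed with a moving average.
scores = model_m3.predict_proba(np.random.rand(T, 32))[:, 1]
smoothed = np.convolve(scores, np.ones(10) / 10, mode="same")
print(smoothed.shape)                    # (600,) -> a score time series like OD31
```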
Note that the highlight scene predictor shown in FIG. 14 is merely one example, and the information processing system 1 may use various highlight scene predictors.
[1-6-5. Example of functional configuration related to highlight generation in the information processing system]
The functions related to highlight generation in the information processing system 1 will now be described with reference to FIG. 15. FIG. 15 is a diagram showing an example of the functional configuration related to highlight generation in the information processing system. Descriptions of points similar to those described above will be omitted as appropriate. For example, the dashed line BS in FIG. 15 indicates a functional interface in the system in the same manner as the dashed line BS in FIG. 11, so its description is omitted as appropriate.
For example, at the edge viewing terminal 10 used by the highlight viewing user in FIG. 15, while the remote viewer is viewing the content in real time, the viewer is continuously photographed by an imaging device such as a camera (for example, the camera 14), and the footage is accumulated as viewer video data. In the information processing system 1, the image recognition results are also accumulated as viewer feature data. For example, in the information processing system 1, these are stored and accumulated as information linked to the content viewed in real time.
In the information processing system 1, the feature quantities obtained by photographing and performing image recognition on the set of local spectators (or remote viewers) watching the content in real time are accumulated as real-time spectator feature data. For example, in the information processing system 1, the real-time spectator feature data is stored and accumulated as information linked to the content. The configuration for accumulating the real-time spectator feature data is, for example, the same as the configuration for accumulating the spectator feature data of the real-time spectating users in FIG. 11.
In the information processing system 1, when highlight viewing starts, the individual highlight viewing user is photographed, and the personal attribute is determined by the personal attribute determiner from the viewer feature data obtained by image recognition. For example, the highlight generation server 100 determines the personal attribute using the model M1, which is the personal attribute determiner. In the information processing system 1, the personal attribute result determined by the personal attribute determiner and the real-time viewing determination result determined by the real-time viewing determiner described later are sent to the highlight generation control unit (corresponding, for example, to the generation unit 134 of the highlight generation server 100).
Note that the real-time viewing determiner may be replaced by real-time viewing determination using the action index determiner. In this case, the highlight generation server 100 performs the real-time viewing determination using, for example, the model M2, which is the action index determiner. For example, the highlight generation server 100 determines that a period in which the score output by the model M2 is equal to or greater than a predetermined threshold is a period in which the user was viewing in real time.
In the information processing system 1, the highlight generation control unit specifies the feature data to be input to the highlight scene predictor based on the personal attribute result of the highlight viewing user and the real-time viewing determination result, which are input when highlight viewing starts. Although the details of the control flow will be described later, for example, the highlight generation server 100 determines the feature data to be input to the model M3, which is the highlight scene predictor. The highlight scene predictor predicts highlight scenes from the input feature data specified by the highlight generation control unit. For example, the highlight generation server 100 inputs the specified feature data into the model M3 and predicts highlight scenes based on the output of the model M3; for example, the highlight generation server 100 determines that scenes corresponding to periods in which the score output by the model M3 is equal to or greater than a predetermined threshold are to be highlight scenes.
Note that the highlight scene predictor may be replaced by scene prediction using the action index determiner. In this case, the highlight generation server 100 performs the scene prediction using, for example, the model M2, which is the action index determiner. For example, the highlight generation server 100 may determine that scenes corresponding to periods in which the score output by the model M2 is equal to or greater than a predetermined threshold are to be highlight scenes.
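Turning such a score time series (from the model M3, or from the model M2 when it substitutes for the predictor) into highlight scenes by thresholding might look like the following sketch; the frame rate, the threshold value, and the helper name are assumptions for illustration.

```python
import numpy as np

def scenes_from_scores(scores, threshold=0.7, fps=1.0):
    """Return (start_sec, end_sec) intervals in which the score stays at or above the threshold."""
    above = scores >= threshold
    scenes, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i                                   # a candidate highlight scene begins
        elif not flag and start is not None:
            scenes.append((start / fps, i / fps))       # the scene ends when the score drops
            start = None
    if start is not None:                               # the score stayed high until the end
        scenes.append((start / fps, len(scores) / fps))
    return scenes

scores = np.array([0.1, 0.2, 0.9, 0.95, 0.8, 0.3, 0.75, 0.76, 0.2])
print(scenes_from_scores(scores))                       # [(2.0, 5.0), (6.0, 8.0)]
```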
Although details will be described later, in the information processing system 1, the position/angle estimation unit (corresponding, for example, to the generation unit 134 of the highlight generation server 100) estimates the optimal camera position and angle from the real-time spectator feature data for the highlight scenes predicted by the highlight scene predictor. In the information processing system 1, the highlight video generation unit (corresponding, for example, to the generation unit 134 of the highlight generation server 100) generates highlight video data from the content video (audio) data and the viewer video data, based on the scene prediction results and the camera position and angle estimation results. The highlight video data is presented to the highlight viewing user through an image output device such as a monitor (for example, the display unit 15) and an audio output device such as a speaker (for example, the audio output unit 13) of the edge viewing terminal 10.
Here, an example of real-time viewing determination will be described, taking processing with a real-time viewing determiner as an example. The real-time viewing determiner determines whether the highlight viewer viewed the target content in real time. For example, the highlight generation server 100 performs the real-time viewing determination using a real-time viewing determiner (model) stored in the storage unit 120. Note that this real-time viewing determination is merely one example; as long as real-time viewing can be determined, it may be determined by any processing, such as the processing using the action index determiner described above.
For example, in the information processing system 1, if data from real-time viewing of the content targeted for highlight viewing exists in the accumulated viewer feature data, the period in which that data exists is determined to be a period of real-time viewing, and if no such data exists, that period is determined to be a period without real-time viewing. In this case, if data from real-time viewing of the content targeted for highlight viewing exists in the storage unit 120, the highlight generation server 100 determines that the user was viewing in real time during the period in which that data exists; if no such data exists in the storage unit 120, the highlight generation server 100 determines that the user was not viewing in real time during that period.
As described above, in the information processing system 1, the action index determiner may also be used as the real-time viewing determiner. In this case, the highlight generation server 100 determines, for example, that a period is one of real-time viewing if the value of an index such as the degree of concentration during real-time viewing of the content targeted for highlight viewing is equal to or greater than a threshold, and that there is no real-time viewing if the value of the index is less than the threshold.
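The two variants of real-time viewing determination described above (presence of stored viewer feature data, or a threshold on an index such as the degree of concentration) can be sketched as follows; the data shapes, names, and threshold are hypothetical and chosen only for illustration.

```python
def viewed_periods_by_data(viewer_feature_data, content_id):
    """Variant 1: a period counts as real-time viewing if viewer feature data exists for it."""
    return [period for period, data in viewer_feature_data.get(content_id, {}).items() if data]

def viewed_periods_by_concentration(concentration_scores, threshold=0.5):
    """Variant 2: a period counts as real-time viewing if a concentration-type index
    from the action index determiner is at or above a threshold."""
    return [t for t, score in enumerate(concentration_scores) if score >= threshold]

stored = {"match_42": {"first_half": [0.3, 0.4], "second_half": []}}
print(viewed_periods_by_data(stored, "match_42"))        # ['first_half']
print(viewed_periods_by_concentration([0.8, 0.2, 0.6]))  # [0, 2]
```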
For example, the information processing system 1 may determine that real-time viewing took place even when no data from real-time viewing of the content targeted for highlight viewing exists in the accumulated viewer feature data. In this case, for example, if a person identified (by face recognition) as the same individual as the highlight viewer exists in the real-time spectator feature data, the highlight generation server 100 may determine that the period of that person's presence was real-time viewing as a local spectator. This allows the information processing system 1 to handle usage scenarios in which a user watches on the edge viewing terminal after having watched the event at the venue.
For example, in the information processing system 1, when it is determined that real-time viewing took place as a local spectator, the video data and image recognition feature data of the corresponding person captured at the venue may be used as the viewer video data and the viewer feature data.
[1-7. Example of the processing flow for highlight generation]
Next, the processing related to highlight generation in the information processing system 1 will be described with reference to FIG. 16. FIG. 16 is a flowchart showing the processing procedure related to highlight generation. In the following, a case in which the information processing system 1 performs the processing is described as an example, but the processing shown in FIG. 16 may be performed by any of the devices included in the information processing system 1, such as the highlight generation server 100, the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60, depending on the device configuration.
 以下では、図16に示す処理フローの流れの概要を説明した後、各処理についての詳細を説明する。図16では、情報処理システム1は、リアルタイムに視聴した期間に応じて処理を分岐させる(ステップS201)。 Below, after explaining the outline of the processing flow shown in FIG. 16, the details of each process will be explained. In FIG. 16, the information processing system 1 branches the process according to the viewing period in real time (step S201).
 情報処理システム1は、ユーザが途中からリアルタイムに視聴したり、途中までリアルタイムに視聴したりする等、部分的にリアルタイム観戦を行っている場合（図16中の中央）、そのユーザについてはコンテンツをリアルタイム視聴した期間とそれ以外の期間とに分離する（ステップS202）。例えば、情報処理システム1は、部分的にリアルタイム観戦を行っている場合、ユーザのデータを用いて、イベントをリアルタイム視聴した期間を特定することにより、リアルタイム視聴した期間と、それ以外の期間とに分離する。そして、情報処理システム1は、リアルタイム観戦した期間についてはステップS203に示す処理を行い、リアルタイム観戦してない期間についてはステップS204~ステップS205に示す処理を行う。 When a user watches the game in real time only partially, for example starting partway through or stopping partway through (the middle branch in FIG. 16), the information processing system 1 separates, for that user, the period during which the content was viewed in real time from the other periods (step S202). For example, in the case of partial real-time watching, the information processing system 1 uses the user's data to identify the period during which the event was viewed in real time, thereby separating the real-time viewing period from the other periods. The information processing system 1 then performs the processing shown in step S203 for the period watched in real time, and the processing shown in steps S204 to S205 for the periods not watched in real time.
 情報処理システム1は、ユーザがフルタイム観戦を行っている場合、そのユーザについてはハイライトシーン予測器の入力特徴量データを視聴者自身のリアルタイム視聴時の特徴量とする（ステップS203）。例えば、情報処理システム1は、ユーザが会場での現地観客を含むフルタイム観戦を行っている場合、そのユーザについてはモデルM3の入力特徴量データを視聴者自身のリアルタイム視聴時の特徴量とする。 When the user has watched the game full-time, the information processing system 1 uses, as the input feature amount data for the highlight scene predictor for that user, the viewer's own feature amounts from real-time viewing (step S203). For example, when the user has watched full-time, including as an on-site spectator at the venue, the information processing system 1 uses the viewer's own feature amounts from real-time viewing as the input feature amount data of the model M3 for that user.
 また、情報処理システム1は、ユーザがリアルタイムで全く見ていない場合、そのユーザについては、視聴者個人の属性判定結果を参照し（ステップS204）、ハイライトシーン予測器の入力特徴量データをリアルタイム観客特徴量データ内の視聴者に類似するリアルタイム観客の集合の特徴量とする（ステップS205）。例えば、情報処理システム1は、ユーザが後からハイライトのみ視聴する場合を含めリアルタイムで全く見ていない場合、そのユーザについてはモデルM3の入力特徴量データをリアルタイム観客特徴量データ内のそのユーザに類似するリアルタイム観客の集合の特徴量とする。 When the user has not viewed the event in real time at all, the information processing system 1 refers to the viewer's personal attribute determination result for that user (step S204), and uses, as the input feature amount data for the highlight scene predictor, the feature amounts of the set of real-time spectators in the real-time spectator feature amount data who are similar to the viewer (step S205). For example, when the user has not viewed the event in real time at all, including the case where the user later watches only the highlights, the information processing system 1 uses, as the input feature amount data of the model M3 for that user, the feature amounts of the set of real-time spectators in the real-time spectator feature amount data who are similar to that user.
 そして、情報処理システム1は、ハイライトシーン予測器へ入力特徴量データを指定してシーン抽出を指示する(ステップS206)。例えば、情報処理システム1は、ステップS204、ステップS205により決定した特徴量データをモデルM3へ入力し、モデルM3の出力に基づいてハイライトを生成する。 Then, the information processing system 1 designates the input feature amount data to the highlight scene predictor and instructs scene extraction (step S206). For example, the information processing system 1 inputs the feature amount data determined in steps S204 and S205 to the model M3, and generates highlights based on the output of the model M3.
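 The branching of steps S201 to S206 can be illustrated with the following sketch, which only shapes the input handed to the highlight scene predictor (model M3); the array layouts and the simple mean aggregation over similar spectators are assumptions made for illustration.

    import numpy as np

    def build_predictor_input(viewed_periods, own_features, similar_audience_features):
        """Choose, per second, the feature vector fed to the highlight scene predictor (model M3).

        viewed_periods:            list of (start, end) seconds judged as real-time viewing.
        own_features:              (T, D) array of the viewer's own real-time viewing features.
        similar_audience_features: (T, N, D) array of features of real-time spectators whose
                                   attributes are similar to the viewer.
        """
        T = similar_audience_features.shape[0]
        viewed = np.zeros(T, dtype=bool)
        for start, end in viewed_periods:
            viewed[start:end] = True
        features = np.empty_like(own_features)
        # Step S203: personalization, using the viewer's own features for viewed periods.
        features[viewed] = own_features[viewed]
        # Steps S204-S205: attribute optimization for the remaining periods, here a simple
        # mean over the similar real-time spectators.
        features[~viewed] = similar_audience_features[~viewed].mean(axis=1)
        return features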
 上述した処理により、情報処理システム1は、ハイライト生成を行う。例えば、情報処理システム1は、視聴者自身のリアルタイム視聴時の特徴量をハイライトシーン予測器の入力特徴量データとする場合（個人化の場合）、ハイライトシーン予測のスコア・継続時間の高いシーンを閾値もしくはランキングで選択し、ハイライト動画を生成してもよい。例えば、情報処理システム1は、生成するハイライト動画には視聴者自身の該当シーンのリアルタイム観戦時の映像をワイプ表示してもよい。 Through the above-described processing, the information processing system 1 generates highlights. For example, when the viewer's own feature amounts at the time of real-time viewing are used as the input feature amount data for the highlight scene predictor (the personalization case), the information processing system 1 may select scenes whose highlight scene prediction scores and durations are high, by a threshold or by ranking, and generate a highlight video. For example, the information processing system 1 may wipe-display, in the generated highlight video, the viewer's own video captured while watching the corresponding scene in real time.
 この点について、図17を用いて説明する。図17は、ハイライトへのワイプの重畳表示の一例を示す図である。例えば、図17は、ハイライト動画への視聴者自身の該当シーンのリアルタイム観戦時の映像をワイプ表示の一例を示す。図17では、バスケットボールの試合のハイライト動画であるコンテンツCT21に、視聴者自身の該当シーンのリアルタイム観戦時の映像が配置されるワイプWP1を重畳表示した場合を示す。例えば、ハイライト生成サーバ100は、ワイプWP1が重畳表示されるコンテンツCT21をエッジ視聴端末10に提供する。ハイライト生成サーバ100から提供を受けたエッジ視聴端末10は、ワイプWP1が重畳表示されるコンテンツCT21を表示する。 This point will be explained using FIG. FIG. 17 is a diagram illustrating an example of superimposed display of wipes on highlights. For example, FIG. 17 shows an example of a wipe display of the video of the viewer's own corresponding scene in the highlight video when watching the game in real time. FIG. 17 shows a case where a wipe WP1, in which the viewer's own video of the corresponding scene in real time watching the game is superimposed on the content CT21, which is a highlight video of a basketball game, is displayed. For example, the highlight generation server 100 provides the edge viewing terminal 10 with the content CT21 on which the wipe WP1 is superimposed. The edge viewing terminal 10 provided by the highlight generation server 100 displays the content CT21 on which the wipe WP1 is superimposed.
 また、情報処理システム1は、視聴者に最も近い属性を持つリアルタイム観客の集合の特徴量をハイライトシーン予測器の入力特徴量データとする場合（属性最適化の場合）、ハイライト視聴開始時のカメラ画像解析により視聴者の個人属性を判定する。この場合、カメラ撮像時間により判定できる属性が異なってもよい。例えば、情報処理システム1は、判定できた個人属性が一つの場合は、リアルタイム観客特徴量データ内のその個人属性を持つリアルタイム観客の集合のハイライトスコアの時系列から予測する。例えば、情報処理システム1は、スコア・継続時間の高いシーンを閾値もしくはランキングで選択し、ハイライト動画を生成してもよい。 When the feature amounts of the set of real-time spectators having attributes closest to the viewer are used as the input feature amount data for the highlight scene predictor (the attribute optimization case), the information processing system 1 determines the viewer's personal attributes by analyzing the camera image at the start of highlight viewing. In this case, the attributes that can be determined may differ depending on the camera imaging time. For example, when only one personal attribute could be determined, the information processing system 1 makes the prediction from the time series of highlight scores of the set of real-time spectators having that personal attribute in the real-time spectator feature amount data. For example, the information processing system 1 may select scenes with high scores and long durations by a threshold or by ranking to generate a highlight video.
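 A sketch of this single-attribute, attribute-optimized selection is given below; the record layout, the threshold value, and the minimum scene length are illustrative assumptions.

    import numpy as np

    def attribute_optimized_scenes(spectators, attribute, threshold=0.7, min_len=5):
        """Select highlight scenes from the highlight-score time series of spectators
        who share a single determined attribute with the viewer.

        spectators: list of dicts like {"attrs": {"age": "30s"}, "scores": (T,) array}.
        attribute:  (key, value) pair determined for the viewer, e.g. ("age", "30s").
        """
        key, value = attribute
        matching = [np.asarray(s["scores"], dtype=float)
                    for s in spectators if s["attrs"].get(key) == value]
        if not matching:
            return []
        series = np.mean(matching, axis=0)      # score time series of the matching set
        above = series >= threshold             # threshold-based selection
        scenes, start = [], None
        for t, flag in enumerate(above):
            if flag and start is None:
                start = t
            elif not flag and start is not None:
                if t - start >= min_len:        # keep only scenes with sufficient duration
                    scenes.append((start, t))
                start = None
        if start is not None and len(series) - start >= min_len:
            scenes.append((start, len(series)))
        return scenes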
 また、判定できた個人属性が複数の場合、情報処理システム1は、様々な情報を適宜用いて、ハイライト動画を生成してもよい。この点について以下例示を記載する。 In addition, when there are a plurality of determined personal attributes, the information processing system 1 may use various information as appropriate to generate a highlight video. An example of this point will be described below.
 例えば、情報処理システム1は、個人属性のAND(論理積)を取り、判定した全ての個人属性を持つリアルタイム観客の集合のハイライトスコアの時系列から、ハイライトシーンを予測してもよい。例えば、情報処理システム1は、視聴者の判別できた属性が年代「30代」及び性別「男性」の2つの属性である場合、リアルタイム観客の中から年代「30代」かつ性別「男性」の集合のハイライトスコアの時系列から、ハイライトシーンを予測してもよい。 For example, the information processing system 1 may take AND (logical product) of personal attributes and predict highlight scenes from the time series of highlight scores of a set of real-time spectators who have all the determined personal attributes. For example, if the attributes that the information processing system 1 has been able to distinguish from the viewer are two attributes of the age “30s” and the gender “male”, the information processing system 1 selects the age “30s” and the gender “male” from the real-time audience. A highlight scene may be predicted from the time series of the highlight scores of the set.
 例えば、情報処理システム1は、以下の式(1)を用いてもよい。 For example, the information processing system 1 may use the following formula (1).
（式(1)：原文では数式画像として掲載 / Equation (1), presented as an equation image in the original publication）
 式(1)のAは、視聴者自身の個人属性結果のもっともらしさのスコアを示す。また、式(1)のHは、該当する属性を持つリアルタイム観客の集合のハイライトスコアを示す。式(1)の以下のSは結合スコアを示し、情報処理システム1は、結合スコアSの時系列から、ハイライトシーンを予測してもよい。 A_i in Equation (1) indicates the plausibility score of the viewer's own personal attribute determination result. H_i in Equation (1) indicates the highlight score of the set of real-time spectators having the corresponding attribute. S_j in Equation (1) indicates the combined score, and the information processing system 1 may predict highlight scenes from the time series of the combined score S_j.
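 From these definitions, a plausible reconstruction of Equation (1), whose body appears only as an image in the original publication, is the plausibility-weighted sum below; the exact form is an assumption made here for illustration.

    % Assumed reconstruction of Equation (1): combined score at time j,
    % weighting each attribute's highlight score H_{i,j} by the plausibility A_i
    % that the viewer has attribute i.
    S_j = \sum_{i} A_i \, H_{i,j}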
 例えば、情報処理システム1は、家族でテレビ視聴している場合等、視聴者が複数人いる場合、複数人の視聴者に共通する個人属性を持つリアルタイム観客の集合から、ハイライトシーンを予測してもよい。例えば、情報処理システム1は、共通する個人属性が無い場合は、式(1)Aを複数人の視聴者の個人属性結果のもっともらしさのスコアとして結合スコアSの時系列を算出し、ハイライトシーンを予測してもよい。このように、情報処理システム1においては、複数人で視聴することにより、シーン予測範囲が視聴者自身と異なる個人属性に広がる（一緒に視聴している人の個人属性まで広がる）ため、serendipityの効果が見込める。 For example, when there are multiple viewers, such as when a family is watching television together, the information processing system 1 may predict highlight scenes from the set of real-time spectators having the personal attributes common to the multiple viewers. For example, when there is no common personal attribute, the information processing system 1 may calculate the time series of the combined score S_j by using A_i in Equation (1) as the plausibility scores of the personal attribute results of the multiple viewers, and predict highlight scenes from it. In this way, viewing by multiple people widens the scene prediction range to personal attributes different from the viewer's own (it extends to the personal attributes of the people watching together), so a serendipity effect can be expected in the information processing system 1.
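 The multi-viewer case can be sketched as follows; the attribute records, the fallback rule used when no attribute is shared, and the max-based weighting are assumptions used only to illustrate applying the combination of Equation (1) across viewers.

    import numpy as np

    def multi_viewer_combined_scores(viewers, attr_highlight_series):
        """Combine attribute-wise highlight score series for one or more viewers.

        viewers:               list of dicts mapping attribute -> plausibility score A_i,
                               e.g. [{"30s": 0.8, "male": 0.9}, {"10s": 0.7, "female": 0.95}]
        attr_highlight_series: dict mapping attribute -> (T,) array of highlight scores H_i.
        """
        common = set.intersection(*(set(v) for v in viewers))
        if common:
            # Attributes shared by all viewers: use only those spectator sets.
            weights = {a: float(np.mean([v[a] for v in viewers])) for a in common}
        else:
            # No common attribute: fall back to a plausibility-weighted combination over
            # every attribute seen for any viewer.
            weights = {}
            for v in viewers:
                for a, score in v.items():
                    weights[a] = max(weights.get(a, 0.0), score)
        T = len(next(iter(attr_highlight_series.values())))
        combined = np.zeros(T)
        for a, w in weights.items():
            combined += w * np.asarray(attr_highlight_series.get(a, np.zeros(T)))
        return combined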
[1-8.処理例]
 ここから、その他の処理例について説明する。なお、上述した内容と同様の点については適宜説明を省略する。
[1-8. Processing example]
Other processing examples will now be described. It should be noted that descriptions of the same points as those described above will be omitted as appropriate.
[1-8-1.行動指標判定器によるハイライトシーン予測]
 ここから、上述した行動指標判定器によるハイライトシーン予測の一例について説明する。以下に例示を示すように、情報処理システム1は、視聴者自身もしくは視聴者に最も近い属性を持つリアルタイム観客集合の特徴量を入力とし、行動指標判定器によりハイライトシーン予測を行ってもよい。
[1-8-1. Highlight scene prediction by action index determiner]
From here, an example of highlight scene prediction by the action index determination device described above will be described. As exemplified below, the information processing system 1 may receive as input the feature amount of the viewer himself or a real-time audience group having attributes closest to the viewer, and perform highlight scene prediction using an action index determiner. .
 例えば、情報処理システム1は、盛り上がり度のみを使用し、盛り上がり度が閾値以上の期間をハイライトシーンに決定してもよい。例えば、情報処理システム1は、図13の例では、ホームファン属性(のユーザ)については、ホーム得点時のシーンをハイライトシーンに決定する。 For example, the information processing system 1 may use only the degree of excitement and determine a period in which the degree of excitement is equal to or greater than a threshold as a highlight scene. For example, in the example of FIG. 13, the information processing system 1 determines the scene at the time of home scoring as a highlight scene for (a user of) the home fan attribute.
 また、情報処理システム1は、盛り上がり度や集中度などのPositive指標（第1の指標）と、がっかり度や飽きている度等のNegative指標（第2の指標）との複数の指標を用いて、ハイライトシーンを決定してもよい。例えば、情報処理システム1は、式(2)を用いて、ハイライトシーンを決定してもよい。 The information processing system 1 may also determine highlight scenes using a plurality of indices, namely positive indices (first indices) such as the degree of excitement and the degree of concentration, and negative indices (second indices) such as the degree of disappointment and the degree of boredom. For example, the information processing system 1 may determine highlight scenes using Equation (2).
（式(2)：原文では数式画像として掲載 / Equation (2), presented as an equation image in the original publication）
 式(2)のkは各行動指標の重み係数を示す。重み係数kは、Positive指標は正の値として定義され、Negative指標は負の値として定義される。式(2)のBは、各行動指標の入力特徴量に対する行動指標判定器の判定スコアを示す。式(2)の以下のSは結合スコアを示し、情報処理システム1は、結合スコアSの時系列から、ハイライトシーンを予測してもよい。 k_i in Equation (2) indicates the weighting coefficient for each behavior index; the weighting coefficient k_i is defined as a positive value for a positive index and as a negative value for a negative index. B_i in Equation (2) indicates the determination score of the behavior index determiner for the input feature amounts of each behavior index. S_t in Equation (2) indicates the combined score, and the information processing system 1 may predict highlight scenes from the time series of the combined score S_t.
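 Equation (2), which likewise appears only as an image, can plausibly be read from these definitions as the signed weighted sum below; again, the exact form is an assumption.

    % Assumed reconstruction of Equation (2): signed combination of the
    % behavior index scores, with k_i > 0 for positive indices and k_i < 0
    % for negative indices.
    S_t = \sum_{i} k_i \, B_{i,t}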
 例えば、情報処理システム1は、上記のようなPositive指標（第1の指標）及びNegative指標（第2の指標）を符号重み付きで結合した結合スコアSを算出し、結合スコアSが閾値以上の期間をハイライトシーンに決定してもよい。 For example, the information processing system 1 may calculate a combined score S_t in which the positive indices (first indices) and negative indices (second indices) described above are combined with signed weights, and determine a period in which the combined score S_t is equal to or greater than a threshold as a highlight scene.
 なお、上述した処理は一例に過ぎず、情報処理システム1は、様々な情報を適宜用いてハイライトシーンを決定してもよい。また、例えば、情報処理システム1は、指標ごとにスコアが最も高い期間をシーンとして選択し、盛り上がり・集中・緊張・がっかり・怒りが多様に入って緩急・起伏のあるハイライトシーンを抽出してもよい。 Note that the above-described processing is merely an example, and the information processing system 1 may determine highlight scenes using various information as appropriate. Also, for example, the information processing system 1 may select, for each index, the period with the highest score as a scene, and thereby extract highlight scenes with variety and ebb and flow in which excitement, concentration, tension, disappointment, and anger are mixed.
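 Both selection styles can be sketched together as follows; the index names, weight values, and window length are illustrative assumptions.

    import numpy as np

    # Signed weights: positive for positive indices, negative for negative indices
    # (the values here are illustrative only).
    WEIGHTS = {"excitement": 1.0, "concentration": 0.8, "disappointment": -0.5, "boredom": -0.7}

    def combined_score(index_scores):
        """S_t as the signed weighted sum of the per-index determination scores B_i(t)."""
        T = len(next(iter(index_scores.values())))
        s = np.zeros(T)
        for name, b in index_scores.items():
            s += WEIGHTS.get(name, 0.0) * np.asarray(b, dtype=float)
        return s

    def varied_highlight_windows(index_scores, window=10):
        """Pick, for each index, the window with the highest total score, mixing ups and downs."""
        picks = {}
        for name, b in index_scores.items():
            b = np.asarray(b, dtype=float)
            sums = np.convolve(b, np.ones(window), mode="valid")  # sliding-window totals
            start = int(np.argmax(sums))
            picks[name] = (start, start + window)
        return picks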
[1-8-2.アングル推定例]
 ここから、アングル推定例について、図18を用いて説明する。図18は、ハイライトのアングル推定の一例を示す図である。図18では、情報処理システム1は、フォーカス位置CN1を中心として、フォーカス位置CN1から放射状に延びる8本の点線AG1~AG8で観客席を同じ角度となるエリアに8分割した場合を示す。例えば、情報処理システム1は、各エリアの盛り上がり度の平均値を算出し、最も盛り上がり度の高いエリアの方向を最適アングルとして推定する。例えば、情報処理システム1は、適切な位置やアングル等を推定するために用いる位置アングル推定器であるモデルM11を用いて、ハイライトシーンのアングル等を推定してもよい。
[1-8-2. Angle estimation example]
An example of angle estimation will now be described with reference to FIG. 18 . FIG. 18 is a diagram illustrating an example of highlight angle estimation. FIG. 18 shows a case where the information processing system 1 divides the audience seats into eight areas having the same angle with eight dotted lines AG1 to AG8 radially extending from the focus position CN1 centering on the focus position CN1. For example, the information processing system 1 calculates the average value of the degree of excitement in each area, and estimates the direction of the area with the highest degree of excitement as the optimum angle. For example, the information processing system 1 may estimate the angle and the like of the highlight scene using the model M11, which is a position and angle estimator used for estimating the appropriate position, angle and the like.
 例えば、情報処理システム1は、フォーカス位置を推定してもよい。例えば、情報処理システム1は、リアルタイム観客特徴量データの顔向きと視線方向の統計値より、ハイライトシーンのカメラのフォーカス位置を推定してもよい。また、例えば、情報処理システム1は、会場の観客（リアルタイム観客特徴量データ）をフォーカス位置に対する角度でエリア分割し、エリアごとの盛り上がり度から最適角度を推定してもよい。この場合、情報処理システム1は、最も盛り上がっているエリアからの角度を最適アングルと予測してもよい。 For example, the information processing system 1 may estimate the focus position. For example, the information processing system 1 may estimate the focus position of the camera for a highlight scene from the statistics of the face orientations and line-of-sight directions in the real-time spectator feature amount data. Also, for example, the information processing system 1 may divide the spectators at the venue (the real-time spectator feature amount data) into areas by angle with respect to the focus position, and estimate the optimum angle from the degree of excitement of each area. In this case, the information processing system 1 may predict the angle from the most excited area as the optimum angle.
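 A minimal sketch of the area division and angle selection is shown below; the seat coordinate format and the excitement field are assumptions, and the focus position is taken as given (in practice it would come from the gaze and face-direction statistics just described).

    import math
    import numpy as np

    def best_viewing_angle(spectators, focus_xy, n_areas=8):
        """Divide spectators into n_areas equal angular sectors around the focus position
        and return the direction (radians) of the sector with the highest mean excitement."""
        sums = np.zeros(n_areas)
        counts = np.zeros(n_areas)
        fx, fy = focus_xy
        for s in spectators:                      # s = {"pos": (x, y), "excitement": float}
            x, y = s["pos"]
            angle = math.atan2(y - fy, x - fx) % (2 * math.pi)
            area = int(angle / (2 * math.pi / n_areas)) % n_areas
            sums[area] += s["excitement"]
            counts[area] += 1
        means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
        best = int(np.argmax(means))              # sector with the highest average excitement
        return (best + 0.5) * (2 * math.pi / n_areas)   # centre direction of that sector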
 また、情報処理システム1は、アングル推定においては、ファン属性などのバイアスを除去してもよい。この場合、情報処理システム1は、例えばビギナー属性の観客の盛り上がり度からアングル等を推定してもよい。また、情報処理システム1は、ハイライトシーン抽出と同様に視聴者に最も近い属性を持つリアルタイム観客集合の盛り上がり度からアングル等を推定してもよい。 In addition, the information processing system 1 may remove biases such as fan attributes in angle estimation. In this case, the information processing system 1 may estimate the angle or the like from the excitement level of the spectators of the beginner attribute, for example. Further, the information processing system 1 may estimate the angle and the like from the degree of excitement of the real-time spectator group having the attribute closest to the viewer, similar to extracting the highlight scene.
 なお、上述した処理は一例に過ぎず、情報処理システム1は、様々な情報を適宜用いてアングル等を推定してもよい。例えば、情報処理システム1は、どのカメラからの映像を選択するかの判定に利用して、ハイライト動画生成のカメラワークを決定してもよい。例えば、情報処理システム1は、スポーツなどの自由視点映像を視聴する際の最適位置やアングルを提案してもよい。これにより、情報処理システム1では、位置とアングルを手動操作するのが難しい自由視点映像における最適なアングル等を提案することができる。 It should be noted that the above-described processing is merely an example, and the information processing system 1 may use various information as appropriate to estimate angles and the like. For example, the information processing system 1 may determine camerawork for generating a highlight video by using it to determine which camera to select the video from. For example, the information processing system 1 may propose the optimum position and angle for viewing free-viewpoint video such as sports. As a result, the information processing system 1 can propose the optimum angle and the like in the free-viewpoint video for which it is difficult to manually operate the position and angle.
[1-8-3.提示例]
 情報処理システム1は、上述した各種の処理により生成した情報を提示してもよい。以下では、情報の提示の一例について、図19を用いて説明する。図19は、観客に関する情報の提示の一例を示す図である。例えば、図19は、自由視点映像の観客映像の属性または盛り上がりのヒートマップとして提示するUI(ユーザインタフェイス)の一例を示す。
[1-8-3. Presentation example]
The information processing system 1 may present information generated by the various types of processing described above. An example of information presentation will be described below with reference to FIG. 19 . FIG. 19 is a diagram showing an example of presentation of information about spectators. For example, FIG. 19 shows an example of a UI (user interface) presented as a heat map of the attributes of spectator video of free viewpoint video or excitement.
 例えば、情報処理システム1のハイライト生成サーバ100は、コンテンツCT31を対象として、観客の盛り上がり度に応じて観客席にヒートマップを重畳表示したコンテンツCT32を生成する(ステップS31)。図19では、ハイライト生成サーバ100は、観客席のうち右側の観客席の部分が最も盛り上がっており、左に向かうにつれて盛り上がり度が低下することを示すヒートマップを重畳表示したコンテンツCT32を生成するそして、ハイライト生成サーバ100は、生成したコンテンツCT32をエッジ視聴端末10に提供する。また、ハイライト生成サーバ100は、ヒートマップのうち最も高い部分に任意の情報(図19では炎のアイコン)を配置してもよい。 For example, the highlight generation server 100 of the information processing system 1 generates a content CT32 in which a heat map is superimposed on the audience seats according to the degree of excitement of the audience, targeting the content CT31 (step S31). In FIG. 19, the highlight generation server 100 generates a content CT32 superimposed with a heat map indicating that the audience seats on the right side of the audience seats are the most exciting, and the degree of excitement decreases toward the left. Then, the highlight generation server 100 provides the edge viewing terminal 10 with the generated content CT32. Also, the highlight generation server 100 may place arbitrary information (a flame icon in FIG. 19) in the highest portion of the heat map.
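 How such an overlay could be assembled from per-seat excitement values is sketched below; the seat-to-pixel mapping, the red colouring, and the blending weights are illustrative assumptions rather than the disclosed UI.

    import numpy as np

    def excitement_heatmap(frame, seats, radius=15, alpha=0.5):
        """Blend a red heat map over a venue frame; brighter red = higher excitement.

        frame: (H, W, 3) uint8 image of the venue (BGR channel order assumed).
        seats: list of dicts like {"px": (x, y), "excitement": float in [0, 1]}.
        """
        h, w, _ = frame.shape
        heat = np.zeros((h, w), dtype=float)
        yy, xx = np.mgrid[0:h, 0:w]
        for seat in seats:
            x, y = seat["px"]
            mask = (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2
            heat[mask] = np.maximum(heat[mask], seat["excitement"])
        overlay = frame.astype(float)
        overlay[..., 2] = np.minimum(255, overlay[..., 2] + 255 * heat)   # boost the red channel
        out = (1 - alpha * heat[..., None]) * frame + (alpha * heat[..., None]) * overlay
        return out.astype(np.uint8)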
 例えば、情報処理システム1のエッジ視聴端末10は、自由視点映像の観客映像の属性または盛り上がりのヒートマップを表示する。図19では、エッジ視聴端末10は、観客席を観客の盛り上がり度に応じてヒートマップを表示する。 For example, the edge viewing terminal 10 of the information processing system 1 displays the attribute of the spectator video of the free viewpoint video or the heat map of the excitement. In FIG. 19, the edge viewing terminal 10 displays a heat map of the spectator seats according to the degree of excitement of the spectators.
 なお、上述した処理は一例に過ぎず、情報処理システム1は、様々な情報を提示してもよい。例えば、情報処理システム1は、ファン属性を色で示し、盛り上がり度をその透過度で示すヒートマップを提示してもよい。また、情報処理システム1は、例えば閾値を超えた箇所にアイコン等を配置することにより、コンテンツを演出表示してもよい。 It should be noted that the processing described above is merely an example, and the information processing system 1 may present various types of information. For example, the information processing system 1 may present a heat map that indicates fan attributes with colors and the degree of excitement with transparency. In addition, the information processing system 1 may perform presentation display of the content, for example, by arranging an icon or the like at a location exceeding the threshold value.
[1-9.応用例・変形例・効果等]
 ここで、上述した内容の応用例、変形例、効果等について記載する。情報処理システム1は、会場現地への来場者(フルタイムでのリアルタイム視聴)向けに個人化ハイライト動画を販売してもよい。例えば、情報処理システム1は、遅れて到着などによりユーザが退席している期間があれば、その期間は属性最適化ハイライトを用いてもよい。
[1-9. Application examples, modifications, effects, etc.]
Here, application examples, modifications, effects, etc. of the above contents will be described. The information processing system 1 may sell personalized highlight videos to visitors (full-time real-time viewing) at the site of the venue. For example, the information processing system 1 may use attribute optimization highlighting during a period in which the user is absent due to late arrival or the like.
 また、情報処理システム1は、カメラ搭載テレビ、PC等での視聴時に(リアルタイム視聴無し)属性最適化ハイライト動画を提供してもよい。情報処理システム1は、ハイライト視聴時にも個人属性判定を行って保存されている個人属性判定値の更新(修正または追加等)を行ってもよい。 In addition, the information processing system 1 may provide an attribute-optimized highlight video when viewed on a camera-equipped TV, PC, or the like (without real-time viewing). The information processing system 1 may also perform personal attribute determination during highlight viewing and update (correction, addition, or the like) the stored personal attribute determination value.
 また、情報処理システム1は、途中から(途中まで)リアルタイムで見た場合は、見てない期間を属性最適化ハイライト、見た期間を個人化ハイライトで生成し結合してもよい。情報処理システム1は、途中から見て追いかけ再生する時に、見てない前半部分を属性最適化ハイライトで提示してもよい。 In addition, when viewing from the middle (to the middle) in real time, the information processing system 1 may generate attribute optimization highlights for the periods not viewed and personalized highlights for the periods viewed, and combine them. The information processing system 1 may present the unwatched first half with attribute optimization highlights when playing back after watching from the middle.
 また、情報処理システム1は、自由視点コンテンツの付加機能として、推定アングルによる自動ハイライト再生を提供してもよい。情報処理システム1は、人手によるハイライト編集のサポートツールとして、抽出したハイライトシーンとアングル推定によるカメラワークの提案を行ってもよい。 In addition, the information processing system 1 may provide automatic highlight reproduction based on an estimated angle as an additional function of free-viewpoint content. As a support tool for manual highlight editing, the information processing system 1 may propose camera work based on extracted highlight scenes and angle estimation.
 上述した情報処理システム1により、スポーツやライブの種別によらない汎用化されたシーン抽出エンジンとして流用することが出来るため、多種多様なコンテンツに展開する事が可能となる。上述した情報処理システム1により、スポーツの試合自体やライブイベントのコンテンツ側のデータを使用せず、観客側の撮影データのみを入力とした解析アルゴリズムとなるため、特定のスポーツの試合やイベントの観客行動で学習したアルゴリズムを、他の種別のコンテンツにも適用可能となる。 The information processing system 1 described above can be reused as a generalized scene extraction engine that does not depend on the type of sport or live performance, and can therefore be deployed for a wide variety of content. Because the information processing system 1 described above is an analysis algorithm that takes only spectator-side captured data as input, without using content-side data of the sports game itself or of the live event, an algorithm trained on spectator behavior for a specific sports game or event can also be applied to other types of content.
 上述した情報処理システム1により、アルゴリズムの汎用性が高く、低コストで他の種別のコンテンツにも適用可能となる。上述した情報処理システム1により、歓声などを使った行動指標判定やハイライトシーン種別に対し、画像での個人識別により全体集合だけでなく個人や属性ごとの判定や予測が可能となる。上述した情報処理システム1により、個人の属性および好みに合ったハイライトシーンを抽出し動画を生成する事が可能となる。 With the information processing system 1 described above, the algorithm has high versatility and can be applied to other types of content at low cost. With the information processing system 1 described above, it is possible to perform determination and prediction not only for the entire group but also for each individual and attribute by individual identification in images for action index determination and highlight scene types using cheers and the like. With the information processing system 1 described above, it is possible to extract highlight scenes that match individual attributes and tastes and generate moving images.
 上述した情報処理システム1により、ハイライトの視聴者自身が実際にリアルタイム視聴時に盛り上がったシーンを自身のその時の映像と共に見ることができるようになり、イベントに参加した思い出の振り返り動画として視聴することができる。上述した情報処理システム1により、視聴者の属性にあったハイライト動画が生成されることで、視聴者自身の好みに合ったハイライト動画を視聴者に視聴させることができる。 With the information processing system 1 described above, the viewer of the highlight can actually see the scene that was exciting during the real-time viewing together with the video of the viewer at that time, and can watch it as a retrospective video of the memories of participating in the event. can be done. The information processing system 1 described above generates a highlight video that matches the attributes of the viewer, so that the viewer can be made to view the highlight video that matches the viewer's own taste.
 上述した情報処理システム1により、行動指標判定を用いたハイライトシーン予測(各指標のスコアが最も高いシーンを選出)により、ストーリー性のあるハイライト動画を視聴できる。また、上述した情報処理システム1により、Negative指標によるシーン抽出も行われることで、serendipityの効果を見込むことができる。 With the information processing system 1 described above, a highlight video with a storyline can be viewed by highlight scene prediction (selecting the scene with the highest score for each index) using action index determination. In addition, the above-described information processing system 1 also performs scene extraction based on the Negative index, so that the effect of serendipity can be expected.
 上述した情報処理システム1により、スポーツの会場などの現地のみならず、リモート視聴環境(カメラ搭載テレビやWebカメラ付きPCなど)においても、個人属性に最適化されたシーン抽出が適用できる。 With the information processing system 1 described above, scene extraction optimized for personal attributes can be applied not only at sports venues, but also in remote viewing environments (camera-equipped TVs, PCs with web cameras, etc.).
 上述した情報処理システム1により、アングル、位置推定により、ハイライトシーン抽出と同様に他の種別のコンテンツにも適用可能となり汎用性が高く、見たいアングルの個人差・属性差にも対応可能となる。上述した情報処理システム1により、自由視点映像コンテンツへの位置/アングル提案によりコンテンツとしての付加価値を向上させることができる。上述した情報処理システム1により、データ収集、学習、解析アルゴリズムの改善のサイクルで継続的に性能を向上させることができる。 The above-described information processing system 1 can be applied to other types of content by estimating angles and positions in the same way as highlight scene extraction, making it highly versatile and capable of responding to individual differences and attribute differences in viewing angles. Become. With the information processing system 1 described above, it is possible to improve the added value of the content by proposing a position/angle to the free-viewpoint video content. With the information processing system 1 described above, performance can be continuously improved through cycles of data collection, learning, and analysis algorithm improvement.
[2.その他の実施形態]
 上述した各実施形態に係る処理は、上記各実施形態や変形例以外にも種々の異なる形態(変形例)にて実施されてよい。
[2. Other embodiments]
The processes according to the above-described embodiments may be implemented in various different forms (modifications) other than the above-described embodiments and modifications.
[2-1.その他の構成例]
 上記の情報処理システム1の装置構成は一例に過ぎず、情報処理システム1における機能の分割は任意の態様が採用可能である。例えば、エッジ視聴端末10、コンテンツ配信サーバ50または観客映像収集サーバ60が、ハイライト生成サーバ100のようにハイライトを生成する情報処理装置であってもよい。すなわち、上述したハイライト生成サーバ100の機能を、エッジ視聴端末10、コンテンツ配信サーバ50または観客映像収集サーバ60のいずれかが有してもよい。この場合、情報処理システム1は、ハイライト生成サーバ100に代えて、エッジ視聴端末10、コンテンツ配信サーバ50または観客映像収集サーバ60等の各装置から情報を収集し、各装置へ情報を提供する情報提供サーバを含んでもよい。
[2-1. Other configuration examples]
The device configuration of the information processing system 1 described above is merely an example, and any form of division of functions in the information processing system 1 can be adopted. For example, the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60 may be an information processing device that generates highlights like the highlight generation server 100. FIG. That is, any one of the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60 may have the function of the highlight generation server 100 described above. In this case, the information processing system 1 collects information from each device such as the edge viewing terminal 10, the content distribution server 50, or the spectator video collection server 60 instead of the highlight generation server 100, and provides the information to each device. An information providing server may be included.
 例えば、上述したハイライト生成サーバ100の機能を、エッジ視聴端末10が有する場合、エッジ視聴端末10は、記憶部120に記憶される情報を保有し、学習部132、画像処理部133、生成部134の機能を有してもよい。エッジ視聴端末10は、情報提供サーバ、コンテンツ配信サーバ50、及び観客映像収集サーバ60から各種情報を取得し、取得した情報を用いてハイライトを生成してもよい。 For example, when the edge viewing terminal 10 has the functions of the highlight generation server 100 described above, the edge viewing terminal 10 may hold the information stored in the storage unit 120 and have the functions of the learning unit 132, the image processing unit 133, and the generation unit 134. The edge viewing terminal 10 may acquire various types of information from the information providing server, the content distribution server 50, and the spectator video collection server 60, and generate highlights using the acquired information.
 なお、上述した構成は一例であり、上述したハイライトに関するサービスを提供可能であれば、情報処理システム1は、どのような機能の分割態様であってもよく、どのような装置構成であってもよい。 Note that the above-described configuration is an example, and the information processing system 1 may adopt any division of functions and any device configuration as long as the highlight-related service described above can be provided.
[2-2.その他]
 また、上記各実施形態において説明した各処理のうち、自動的に行われるものとして説明した処理の全部または一部を手動的に行うこともでき、あるいは、手動的に行われるものとして説明した処理の全部または一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。例えば、各図に示した各種情報は、図示した情報に限られない。
[2-2. others]
Further, among the processes described in each of the above embodiments, all or part of the processes described as being performed automatically can be performed manually, or the processes described as being performed manually can be performed manually. can also be performed automatically by known methods. In addition, information including processing procedures, specific names, various data and parameters shown in the above documents and drawings can be arbitrarily changed unless otherwise specified. For example, the various information shown in each drawing is not limited to the illustrated information.
 また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。 Also, each component of each device illustrated is functionally conceptual and does not necessarily need to be physically configured as illustrated. In other words, the specific form of distribution and integration of each device is not limited to the one shown in the figure, and all or part of them can be functionally or physically distributed and integrated in arbitrary units according to various loads and usage conditions. Can be integrated and configured.
 また、上述してきた各実施形態及び変形例は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。 In addition, the above-described embodiments and modifications can be appropriately combined within a range that does not contradict the processing content.
 また、本明細書に記載された効果はあくまで例示であって限定されるものでは無く、他の効果があってもよい。 In addition, the effects described in this specification are only examples and are not limited, and other effects may be provided.
[3.本開示に係る効果]
 上述のように、本開示に係る情報処理装置(例えば、実施形態ではハイライト生成サーバ100)は、取得部(実施形態では取得部131)と、生成部(実施形態では生成部134)とを備える。取得部は、イベントのリアルタイム視聴を行ったユーザである第1ユーザのイベントのリアルタイム視聴時の状態を示す状態情報と、イベントの映像であるイベントコンテンツとを取得する。生成部は、取得部により取得された第1ユーザの状態情報により決定されるイベントコンテンツの一部を用いて、第2ユーザに提供するイベントコンテンツのハイライトを生成する。
[3. Effects of the Present Disclosure]
As described above, the information processing apparatus according to the present disclosure (for example, the highlight generation server 100 in the embodiment) includes an acquisition unit (the acquisition unit 131 in the embodiment) and a generation unit (the generation unit 134 in the embodiment). Prepare. The acquisition unit acquires state information indicating a state of a first user, who has viewed the event in real time, when viewing the event in real time, and event content, which is video of the event. The generator generates highlights of the event content to be provided to the second user, using a portion of the event content determined by the state information of the first user acquired by the acquirer.
 このように、本開示に係る情報処理装置は、イベントのリアルタイム視聴を行ったユーザの状態を基に、イベントコンテンツの一部を用いて、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In this way, the information processing apparatus according to the present disclosure generates highlights of the event content using part of the event content based on the state of the user who has viewed the event in real time, so that You can generate special highlights.
 また、取得部は、イベントを撮影した映像であるイベントコンテンツを取得する。このように、情報処理装置は、イベントのリアルタイム視聴を行ったユーザの状態を基に、イベントを撮影した映像であるイベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires event content, which is video of the event. In this way, the information processing apparatus generates highlights of the event content, which is video of the event, based on the state of the user who watched the event in real time, thereby generating highlights corresponding to the user. be able to.
 また、取得部は、イベントの開催場所でリアルタイム視聴を行った第1ユーザの状態情報を取得する。このように、情報処理装置は、イベントの開催場所でリアルタイム視聴を行ったユーザの状態を基に、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In addition, the acquisition unit acquires the status information of the first user who viewed the event in real time at the venue of the event. In this way, the information processing apparatus can generate highlights according to the user by generating highlights of event content based on the user's state of real-time viewing at the venue of the event.
 また、取得部は、スポーツまたは芸術のイベントのリアルタイム視聴を行った第1ユーザの状態情報を取得する。このように、情報処理装置は、スポーツまたは芸術のイベントのリアルタイム視聴を行ったユーザの状態を基に、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user who watched the sports or art event in real time. In this way, the information processing apparatus can generate highlights according to the user by generating event content highlights based on the state of the user who has watched the sports or arts event in real time. .
 また、取得部は、イベントを現地で観た第1ユーザの状態情報を取得する。このように、情報処理装置は、スポーツまたは芸術のイベントを現地で観たユーザの状態を基に、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user who watched the event locally. In this way, the information processing apparatus can generate highlights according to the user by generating event content highlights based on the state of the user who watched the sports or arts event at the site.
 また、生成部は、状態情報に基づく入力データの入力に応じて、イベントの期間に対応するスコアを出力するモデルを用いて、イベントコンテンツのハイライトを生成する。このように、情報処理装置は、ユーザの状態に基づく入力データの入力に応じて、イベントの期間に対応するスコアを出力するモデルを用いて、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In addition, the generating unit generates event content highlights using a model that outputs a score corresponding to the period of the event in response to the input of input data based on the state information. In this way, the information processing apparatus uses a model that outputs a score corresponding to the period of the event in response to the input of input data based on the user's state, and generates highlights of the event content so that the user can You can generate corresponding highlights.
 また、生成部は、モデルを用いてイベントコンテンツの一部を決定し、決定したイベントコンテンツの一部を用いて、イベントコンテンツのハイライトを生成する。このように、情報処理装置は、モデルを用いて決定したイベントコンテンツの一部を用いて、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In addition, the generation unit uses the model to determine part of the event content, and uses the determined part of the event content to generate highlights of the event content. In this way, the information processing apparatus can generate highlights according to the user by generating highlights of the event content using part of the event content determined using the model.
 また、生成部は、イベントコンテンツのうち、閾値以上であるスコアに対応する期間に該当する部分を、イベントコンテンツの一部に決定し、決定したイベントコンテンツの一部を用いて、イベントコンテンツのハイライトを生成する。このように、情報処理装置は、モデルが出力したスコアが閾値以上の期間に該当する部分を、イベントコンテンツの一部として、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In addition, the generation unit determines a portion of the event content corresponding to a period corresponding to a score equal to or greater than the threshold as part of the event content, and uses the determined part of the event content to generate a high score of the event content. generate light. In this way, the information processing apparatus generates highlights of the event content as part of the event content corresponding to a period in which the score output by the model is equal to or greater than the threshold value, thereby providing highlights according to the user. can be generated.
 また、本開示に係る情報処理装置は、学習部(実施形態では学習部132)を備える。学習部は、過去のイベントのハイライトと、過去のイベントのリアルタイム視聴を行ったユーザの状態情報とを含む学習データを用いてモデルを学習する。このように、情報処理装置は、学習したモデルを用いて、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the information processing apparatus according to the present disclosure includes a learning unit (learning unit 132 in the embodiment). The learning unit learns the model using learning data including highlights of past events and status information of users who viewed the past events in real time. In this way, the information processing apparatus can generate highlights according to the user by generating highlights of event content using the learned model.
 また、生成部は、学習部により学習されたモデルを用いてイベントコンテンツのハイライトを生成する。このように、情報処理装置は、学習したモデルを用いて、イベントコンテンツのハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In addition, the generation unit generates highlights of event content using the model learned by the learning unit. In this way, the information processing apparatus can generate highlights according to the user by generating highlights of event content using the learned model.
 また、取得部は、第2ユーザがイベントのリアルタイム視聴を行っていた場合、第2ユーザのイベントのリアルタイム視聴時の状態を示す状態情報を、第1ユーザの状態情報として取得する。生成部は、第2ユーザの状態情報により決定されるイベントコンテンツの一部を用いて、第2ユーザに提供するイベントコンテンツのハイライトを生成する。このように、情報処理装置は、イベントのリアルタイム視聴を行ったユーザ自身の状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, if the second user is viewing the event in real time, the acquisition unit acquires state information indicating the state of the second user viewing the event in real time as the state information of the first user. The generator generates highlights of event content to be provided to the second user, using a portion of the event content determined by the second user's state information. In this way, the information processing apparatus can generate highlights according to the user by generating highlights to be provided to the user based on the state of the user who has watched the event in real time.
 また、取得部は、第2ユーザがイベントのリアルタイム視聴を行っていなかった場合、第2ユーザとは異なるユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザがイベントのリアルタイム視聴を行っていなかった場合、そのユーザとは異なるユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, if the second user is not viewing the event in real time, the acquisition unit acquires the state information of the first user who is different from the second user. In this way, when the user to whom the highlights are provided is not viewing the event in real time, the information processing apparatus generates the highlights to be provided to the user based on the state of the user different from that of the user. Thus, it is possible to generate highlights suitable for the user.
 また、取得部は、第2ユーザの属性に類似するユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザの属性に類似する類似ユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user who is a user similar to the attributes of the second user. In this way, the information processing apparatus generates highlights to be provided to the user based on the states of similar users who are similar to the attributes of the user to whom the highlights are provided, thereby generating highlights suitable for the user. can do.
 また、取得部は、第2ユーザのデモグラフィック属性に類似するユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザのデモグラフィック属性に類似する類似ユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user, who is a user similar to the demographic attributes of the second user. In this way, the information processing apparatus generates highlights to be provided to the user based on the states of similar users who are similar to the demographic attributes of the user to whom the highlights are provided, thereby providing highlights suitable for the user. can be generated.
 また、取得部は、第2ユーザの年齢及び性別のうち少なくとも1つが類似するユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザの年齢及び性別のうち少なくとも1つが類似する類似ユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user who is a user similar to the second user in at least one of age and sex. In this way, the information processing apparatus generates highlights to be provided to the user based on the states of similar users who are similar in at least one of age and gender to the user to whom the highlights are provided, thereby providing the user with You can generate corresponding highlights.
 また、取得部は、第2ユーザのサイコグラフィック属性に類似するユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザのサイコグラフィック属性に類似する類似ユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user, who is a user similar to the psychographic attributes of the second user. In this way, the information processing apparatus generates highlights to be provided to the user based on the states of similar users who are similar to the psychographic attributes of the user to whom the highlights are provided, thereby providing highlights suitable for the user. can be generated.
 また、取得部は、第2ユーザの嗜好性に類似するユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザの嗜好性に類似する類似ユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 Also, the acquisition unit acquires the state information of the first user, who is a user similar to the second user's preferences. In this way, the information processing apparatus generates highlights to be provided to the user based on the states of similar users who are similar to the preferences of the user to whom the highlights are provided, thereby providing the highlights according to the user. can be generated.
 また、取得部は、愛好する対象が第2ユーザと一致するユーザである第1ユーザの状態情報として取得する。このように、情報処理装置は、ハイライトの提供先のユーザと好きな対象が一致する類似ユーザの状態を基に、そのユーザに提供するハイライトを生成することにより、ユーザに応じたハイライトを生成することができる。 In addition, the acquisition unit acquires the state information of the first user who is a user whose love target matches that of the second user. In this way, the information processing apparatus generates highlights to be provided to the user based on the states of similar users whose favorite objects match those of the user to whom the highlights are provided, thereby providing the highlights corresponding to the user. can be generated.
 また、本開示に係る情報処理装置は、送信部(実施形態では送信部135)を備える。送信部は、生成部により生成されたイベントコンテンツのハイライトを第2ユーザが利用する端末装置(実施形態ではエッジ視聴端末10)へ送信する。このように、情報処理装置は、生成したハイライトをユーザが利用する端末装置へ送信することで、ユーザに応じたハイライトを適切に提供することができる。 Further, the information processing device according to the present disclosure includes a transmission unit (transmission unit 135 in the embodiment). The transmission unit transmits the highlight of the event content generated by the generation unit to the terminal device (the edge viewing terminal 10 in the embodiment) used by the second user. In this way, the information processing apparatus can appropriately provide highlights according to the user by transmitting the generated highlights to the terminal device used by the user.
[4.ハードウェア構成]
 上述してきた各実施形態に係るハイライト生成サーバ100、エッジ視聴端末10、コンテンツ配信サーバ50及び観客映像収集サーバ60等の情報処理装置(情報機器)は、例えば図20に示すような構成のコンピュータ1000によって実現される。図20は、情報処理装置の機能を実現するコンピュータ1000の一例を示すハードウェア構成図である。以下、実施形態に係るハイライト生成サーバ100を例に挙げて説明する。コンピュータ1000は、CPU1100、RAM1200、ROM(Read Only Memory)1300、HDD(Hard Disk Drive)1400、通信インターフェイス1500、及び入出力インターフェイス1600を有する。コンピュータ1000の各部は、バス1050によって接続される。
[4. Hardware configuration]
Information processing devices (information devices) such as the highlight generation server 100, the edge viewing terminal 10, the content distribution server 50, and the spectator video collection server 60 according to the above-described embodiments are realized by, for example, a computer 1000 having the configuration shown in FIG. 20. FIG. 20 is a hardware configuration diagram showing an example of a computer 1000 that implements the functions of the information processing apparatus. The highlight generation server 100 according to the embodiment will be described below as an example. The computer 1000 has a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600. The units of the computer 1000 are connected by a bus 1050.
 CPU1100は、ROM1300又はHDD1400に格納されたプログラムに基づいて動作し、各部の制御を行う。例えば、CPU1100は、ROM1300又はHDD1400に格納されたプログラムをRAM1200に展開し、各種プログラムに対応した処理を実行する。 The CPU 1100 operates based on programs stored in the ROM 1300 or HDD 1400 and controls each section. For example, the CPU 1100 loads programs stored in the ROM 1300 or HDD 1400 into the RAM 1200 and executes processes corresponding to various programs.
 ROM1300は、コンピュータ1000の起動時にCPU1100によって実行されるBIOS(Basic Input Output System)等のブートプログラムや、コンピュータ1000のハードウェアに依存するプログラム等を格納する。 The ROM 1300 stores a boot program such as BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 is started, and programs dependent on the hardware of the computer 1000.
 HDD1400は、CPU1100によって実行されるプログラム、及び、かかるプログラムによって使用されるデータ等を非一時的に記録する、コンピュータが読み取り可能な記録媒体である。具体的には、HDD1400は、プログラムデータ1450の一例である本開示に係る情報処理プログラムを記録する記録媒体である。 The HDD 1400 is a computer-readable recording medium that non-temporarily records programs executed by the CPU 1100 and data used by such programs. Specifically, HDD 1400 is a recording medium that records an information processing program according to the present disclosure, which is an example of program data 1450 .
 通信インターフェイス1500は、コンピュータ1000が外部ネットワーク1550(例えばインターネット)と接続するためのインターフェイスである。例えば、CPU1100は、通信インターフェイス1500を介して、他の機器からデータを受信したり、CPU1100が生成したデータを他の機器へ送信したりする。 A communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet). For example, CPU 1100 receives data from another device via communication interface 1500, and transmits data generated by CPU 1100 to another device.
 入出力インターフェイス1600は、入出力デバイス1650とコンピュータ1000とを接続するためのインターフェイスである。例えば、CPU1100は、入出力インターフェイス1600を介して、キーボードやマウス等の入力デバイスからデータを受信する。また、CPU1100は、入出力インターフェイス1600を介して、ディスプレイやスピーカーやプリンタ等の出力デバイスにデータを送信する。また、入出力インターフェイス1600は、所定の記録媒体(メディア)に記録されたプログラム等を読み取るメディアインターフェイスとして機能してもよい。メディアとは、例えばDVD(Digital Versatile Disc)、PD(Phase change rewritable Disk)等の光学記録媒体、MO(Magneto-Optical disk)等の光磁気記録媒体、テープ媒体、磁気記録媒体、または半導体メモリ等である。 The input/output interface 1600 is an interface for connecting the input/output device 1650 and the computer 1000 . For example, the CPU 1100 receives data from input devices such as a keyboard and mouse via the input/output interface 1600 . The CPU 1100 also transmits data to an output device such as a display, speaker, or printer via the input/output interface 1600 . Also, the input/output interface 1600 may function as a media interface for reading a program or the like recorded on a predetermined recording medium. Media include, for example, optical recording media such as DVD (Digital Versatile Disc) and PD (Phase change rewritable disk), magneto-optical recording media such as MO (Magneto-Optical disk), tape media, magnetic recording media, semiconductor memories, etc. is.
 例えば、コンピュータ1000が実施形態に係るハイライト生成サーバ100として機能する場合、コンピュータ1000のCPU1100は、RAM1200上にロードされた情報処理プログラムを実行することにより、制御部130等の機能を実現する。また、HDD1400には、本開示に係る情報処理プログラムや、記憶部120内のデータが格納される。なお、CPU1100は、プログラムデータ1450をHDD1400から読み取って実行するが、他の例として、外部ネットワーク1550を介して、他の装置からこれらのプログラムを取得してもよい。 For example, when the computer 1000 functions as the highlight generation server 100 according to the embodiment, the CPU 1100 of the computer 1000 implements the functions of the control unit 130 and the like by executing the information processing program loaded on the RAM 1200. The HDD 1400 also stores an information processing program according to the present disclosure and data in the storage unit 120 . Although CPU 1100 reads and executes program data 1450 from HDD 1400 , as another example, these programs may be obtained from another device via external network 1550 .
 なお、本技術は以下のような構成も取ることができる。
(1)
 イベントのリアルタイム視聴を行ったユーザである第1ユーザの前記イベントのリアルタイム視聴時の状態を示す状態情報と、前記イベントの映像であるイベントコンテンツとを取得する取得部と、
 前記取得部により取得された前記第1ユーザの前記状態情報により決定される前記イベントコンテンツの一部を用いて、第2ユーザに提供する前記イベントコンテンツのハイライトを生成する生成部と、
 を備える情報処理装置。
(2)
 前記取得部は、
 前記イベントを撮影した映像である前記イベントコンテンツを取得する
 (1)に記載の情報処理装置。
(3)
 前記取得部は、
 前記イベントの開催場所で前記リアルタイム視聴を行った前記第1ユーザの前記状態情報を取得する
 (1)または(2)に記載の情報処理装置。
(4)
 前記取得部は、
 スポーツまたは芸術の前記イベントの前記リアルタイム視聴を行った前記第1ユーザの前記状態情報を取得する
 (1)~(3)のいずれか1つに記載の情報処理装置。
(5)
 前記取得部は、
 前記第1ユーザを撮像した画像情報を含む前記状態情報を取得する
 (1)~(4)のいずれか1つに記載の情報処理装置。
(6)
 前記生成部は、
 前記状態情報に基づく入力データの入力に応じて、前記イベントの期間に対応するスコアを出力するモデルを用いて、前記イベントコンテンツのハイライトを生成する
 (1)~(5)のいずれか1つに記載の情報処理装置。
(7)
 前記生成部は、
 前記モデルを用いて前記イベントコンテンツの一部を決定し、決定した前記イベントコンテンツの一部を用いて、前記イベントコンテンツのハイライトを生成する
 (6)に記載の情報処理装置。
(8)
 前記生成部は、
 前記イベントコンテンツのうち、閾値以上である前記スコアに対応する期間に該当する部分を、前記イベントコンテンツの一部に決定し、決定した前記イベントコンテンツの一部を用いて、前記イベントコンテンツのハイライトを生成する
 (7)に記載の情報処理装置。
(9)
 過去のイベントのハイライトと、前記過去のイベントのリアルタイム視聴を行ったユーザの前記状態情報とを含む学習データを用いて前記モデルを学習する学習部、
 (6)~(8)のいずれか1つに記載の情報処理装置。
(10)
 前記生成部は、
 前記学習部により学習された前記モデルを用いて前記イベントコンテンツのハイライトを生成する
 (9)に記載の情報処理装置。
(11)
 前記取得部は、
 前記第2ユーザが前記イベントの前記リアルタイム視聴を行っていた場合、前記第2ユーザの前記イベントの前記リアルタイム視聴時の状態を示す前記状態情報を、前記第1ユーザの前記状態情報として取得し、
 前記生成部は、
 前記第2ユーザの前記状態情報により決定される前記イベントコンテンツの一部を用いて、前記第2ユーザに提供する前記イベントコンテンツのハイライトを生成する
 (1)~(10)のいずれか1つに記載の情報処理装置。
(12)
 前記取得部は、
 前記第2ユーザが前記イベントの前記リアルタイム視聴を行っていなかった場合、前記第2ユーザとは異なるユーザである前記第1ユーザの前記状態情報として取得する
 (1)~(11)のいずれか1つに記載の情報処理装置。
(13)
 前記取得部は、
 前記第2ユーザの属性に類似するユーザである前記第1ユーザの前記状態情報として取得する
 (12)に記載の情報処理装置。
(14)
 前記取得部は、
 前記第2ユーザのデモグラフィック属性に類似するユーザである前記第1ユーザの前記状態情報として取得する
 (13)に記載の情報処理装置。
(15)
 前記取得部は、
 前記第2ユーザの年齢及び性別のうち少なくとも1つが類似するユーザである前記第1ユーザの前記状態情報として取得する
 (14)に記載の情報処理装置。
(16)
 前記取得部は、
 前記第2ユーザのサイコグラフィック属性に類似するユーザである前記第1ユーザの前記状態情報として取得する
 (13)~(15)のいずれか1つに記載の情報処理装置。
(17)
 前記取得部は、
 前記第2ユーザの嗜好性に類似するユーザである前記第1ユーザの前記状態情報として取得する
 (13)~(16)のいずれか1つに記載の情報処理装置。
(18)
 前記取得部は、
 愛好する対象が前記第2ユーザと一致するユーザである前記第1ユーザの前記状態情報として取得する
 (13)~(17)のいずれか1つに記載の情報処理装置。
(19)
 前記生成部により生成された前記イベントコンテンツのハイライトを前記第2ユーザが利用する端末装置に送信する送信部、
 (1)~(18)のいずれか1つに記載の情報処理装置。
(20)
 イベントのリアルタイム視聴を行ったユーザである第1ユーザの前記イベントのリアルタイム視聴時の状態を示す状態情報と、前記イベントの映像であるイベントコンテンツとを取得し、取得した前記第1ユーザの前記状態情報により決定される前記イベントコンテンツの一部を用いて、第2ユーザに提供する前記イベントコンテンツのハイライトを生成する、
 処理を実行する情報処理方法。
Note that the present technology can also take the following configuration.
(1)
an acquisition unit that acquires state information indicating a state of a first user who has viewed the event in real time, and event content that is video of the event;
a generation unit that generates highlights of the event content to be provided to a second user using a portion of the event content determined by the state information of the first user acquired by the acquisition unit;
Information processing device.
(2)
The acquisition unit
The information processing apparatus according to (1), wherein the event content, which is a video image of the event, is acquired.
(3)
The acquisition unit
The information processing apparatus according to (1) or (2), wherein the state information of the first user who has performed the real-time viewing at the venue of the event is acquired.
(4)
The acquisition unit
The information processing apparatus according to any one of (1) to (3), wherein the state information of the first user who viewed the event of sports or art in real time is acquired.
(5)
The acquisition unit
The information processing apparatus according to any one of (1) to (4), wherein the state information including image information of the first user is acquired.
(6)
The generating unit
Any one of (1) to (5), generating highlights of the event content using a model that outputs a score corresponding to the duration of the event in response to input of input data based on the state information. The information processing device according to .
(7)
The generating unit
The information processing apparatus according to (6), wherein a portion of the event content is determined using the model, and a highlight of the event content is generated using the determined portion of the event content.
(8)
The generating unit
A portion of the event content corresponding to a period corresponding to the score equal to or greater than a threshold is determined as a portion of the event content, and the determined portion of the event content is used to highlight the event content. The information processing apparatus according to (7).
(9)
a learning unit that learns the model using learning data including highlights of past events and the state information of users who viewed the past events in real time;
The information processing device according to any one of (6) to (8).
(10)
The generating unit
The information processing apparatus according to (9), wherein the model learned by the learning unit is used to generate highlights of the event content.
(11)
The acquisition unit
when the second user is viewing the event in real time, acquiring the state information indicating the state of the event when the second user is viewing the event in real time as the state information of the first user;
The generating unit
generating highlights of the event content to be provided to the second user using a portion of the event content determined by the state information of the second user; The information processing device according to .
(12)
The acquisition unit
Any one of (1) to (11) obtained as the state information of the first user who is a user different from the second user when the second user is not viewing the event in real time The information processing device according to 1.
(13)
The acquisition unit
The information processing apparatus according to (12), wherein the state information of the first user who is a user similar to the attributes of the second user is obtained.
(14)
The acquisition unit
The information processing apparatus according to (13), wherein the state information of the first user who is a user similar to the demographic attributes of the second user is obtained.
(15)
The acquisition unit
(14) The information processing apparatus according to (14), wherein the status information is acquired as the state information of the first user who is a user similar in at least one of age and sex to the second user.
(16)
The acquisition unit
The information processing apparatus according to any one of (13) to (15), wherein the status information of the first user who is a user similar to the psychographic attributes of the second user is obtained.
(17)
The acquisition unit
The information processing apparatus according to any one of (13) to (16), wherein the state information of the first user who is a user similar to the preference of the second user is obtained.
(18)
The acquisition unit
The information processing apparatus according to any one of (13) to (17), wherein the state information is acquired as the state information of the first user whose love target matches that of the second user.
(19)
a transmission unit configured to transmit a highlight of the event content generated by the generation unit to a terminal device used by the second user;
The information processing device according to any one of (1) to (18).
(20)
State information indicating a state of a first user who has watched the event in real time, and event content, which is a video of the event, are acquired, and the state of the acquired first user is obtained. using a portion of the event content determined by the information to generate a highlight of the event content to be provided to a second user;
Information processing method that performs processing.
 1 情報処理システム
 100 ハイライト生成サーバ(情報処理装置)
 110 通信部
 120 記憶部
 121 データセット記憶部
 122 モデル情報記憶部
 123 閾値情報記憶部
 124 コンテンツ情報記憶部
 130 制御部
 131 取得部
 132 学習部
 133 画像処理部
 134 生成部
 135 送信部
 10 エッジ視聴端末(端末装置)
 11 通信部
 12 音声入力部
 13 音声出力部
 14 カメラ
 15 表示部
 16 操作部
 17 記憶部
 18 制御部
 181 取得部
 182 送信部
 183 受信部
 184 処理部
 50 コンテンツ配信サーバ
 60 観客映像収集サーバ
1 information processing system 100 highlight generation server (information processing device)
110 communication unit 120 storage unit 121 data set storage unit 122 model information storage unit 123 threshold information storage unit 124 content information storage unit 130 control unit 131 acquisition unit 132 learning unit 133 image processing unit 134 generation unit 135 transmission unit 10 edge viewing terminal ( terminal equipment)
11 communication unit 12 audio input unit 13 audio output unit 14 camera 15 display unit 16 operation unit 17 storage unit 18 control unit 181 acquisition unit 182 transmission unit 183 reception unit 184 processing unit 50 content distribution server 60 spectator video collection server

Claims (20)

1. An information processing device comprising:
    an acquisition unit that acquires state information indicating a state, at the time of real-time viewing of an event, of a first user who is a user who has viewed the event in real time, and event content that is a video of the event; and
    a generation unit that generates a highlight of the event content to be provided to a second user, using a portion of the event content determined by the state information of the first user acquired by the acquisition unit.
2. The information processing device according to claim 1, wherein the acquisition unit acquires the event content, which is a video obtained by capturing the event.
3. The information processing device according to claim 1, wherein the acquisition unit acquires the state information of the first user who has performed the real-time viewing at a venue where the event is held.
4. The information processing device according to claim 1, wherein the acquisition unit acquires the state information of the first user who has performed the real-time viewing of the event, the event being a sports or art event.
5. The information processing device according to claim 1, wherein the acquisition unit acquires the state information including image information obtained by imaging the first user.
6. The information processing device according to claim 1, wherein the generation unit generates the highlight of the event content using a model that outputs, in response to input of input data based on the state information, a score corresponding to a period of the event.
7. The information processing device according to claim 6, wherein the generation unit determines a portion of the event content using the model, and generates the highlight of the event content using the determined portion of the event content.
8. The information processing device according to claim 7, wherein the generation unit determines, as the portion of the event content, a part of the event content corresponding to a period whose score is equal to or greater than a threshold, and generates the highlight of the event content using the determined portion of the event content.
9. The information processing device according to claim 6, further comprising a learning unit that learns the model using learning data including a highlight of a past event and the state information of a user who has viewed the past event in real time.
10. The information processing device according to claim 9, wherein the generation unit generates the highlight of the event content using the model learned by the learning unit.
11. The information processing device according to claim 1, wherein, when the second user has viewed the event in real time, the acquisition unit acquires, as the state information of the first user, the state information indicating the state of the second user at the time of the real-time viewing of the event, and the generation unit generates the highlight of the event content to be provided to the second user using a portion of the event content determined by the state information of the second user.
12. The information processing device according to claim 1, wherein, when the second user has not viewed the event in real time, the acquisition unit acquires the state information as the state information of the first user who is a user different from the second user.
13. The information processing device according to claim 12, wherein the acquisition unit acquires the state information as the state information of the first user who is a user whose attributes are similar to those of the second user.
14. The information processing device according to claim 13, wherein the acquisition unit acquires the state information as the state information of the first user who is a user whose demographic attributes are similar to those of the second user.
15. The information processing device according to claim 14, wherein the acquisition unit acquires the state information as the state information of the first user who is a user similar to the second user in at least one of age and sex.
16. The information processing device according to claim 13, wherein the acquisition unit acquires the state information as the state information of the first user who is a user whose psychographic attributes are similar to those of the second user.
17. The information processing device according to claim 13, wherein the acquisition unit acquires the state information as the state information of the first user who is a user whose preferences are similar to those of the second user.
18. The information processing device according to claim 13, wherein the acquisition unit acquires the state information as the state information of the first user whose favorite object matches that of the second user.
19. The information processing device according to claim 1, further comprising a transmission unit that transmits the highlight of the event content generated by the generation unit to a terminal device used by the second user.
20. An information processing method for executing processing comprising:
    acquiring state information indicating a state, at the time of real-time viewing of an event, of a first user who is a user who has viewed the event in real time, and event content that is a video of the event; and
    generating a highlight of the event content to be provided to a second user, using a portion of the event content determined by the acquired state information of the first user.
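Claims 11 to 18 above determine whose state information drives the highlight: the second user's own reaction when that user viewed the event in real time, and otherwise the reaction of a first user whose demographic or psychographic attributes (age, sex, preferences, favorite object) resemble the second user's. The sketch below shows one hypothetical way such a selection could be made; the User fields, the similarity weights, and the helper names are assumptions introduced for illustration and are not taken from the claims.

```python
# Hypothetical selection of the state-information source; fields and weights are assumptions.
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class User:
    user_id: str
    age: int
    sex: str
    favorites: Set[str] = field(default_factory=set)  # e.g. a favorite team or performer (cf. claim 18)
    viewed_in_real_time: bool = False

def attribute_similarity(a: User, b: User) -> float:
    """Crude demographic/psychographic similarity (cf. claims 13 to 17)."""
    score = 0.0
    if abs(a.age - b.age) <= 5:      # similar age (cf. claim 15)
        score += 1.0
    if a.sex == b.sex:               # same sex (cf. claim 15)
        score += 1.0
    if a.favorites & b.favorites:    # matching favorite object (cf. claim 18)
        score += 2.0
    return score

def select_state_source(second_user: User, real_time_viewers: List[User]) -> Optional[User]:
    """Return the user whose state information should drive the second user's highlight."""
    if second_user.viewed_in_real_time:
        return second_user           # cf. claim 11: use the second user's own reaction
    # cf. claims 12 to 18: fall back to the most similar real-time viewer
    candidates = [u for u in real_time_viewers if u.user_id != second_user.user_id]
    return max(candidates, key=lambda u: attribute_similarity(second_user, u), default=None)
```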
PCT/JP2022/045588 2021-12-20 2022-12-12 Information processing device and information processing method WO2023120263A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-205919 2021-12-20
JP2021205919 2021-12-20

Publications (1)

Publication Number Publication Date
WO2023120263A1 true WO2023120263A1 (en) 2023-06-29

Family

ID=86902349

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/045588 WO2023120263A1 (en) 2021-12-20 2022-12-12 Information processing device and information processing method

Country Status (1)

Country Link
WO (1) WO2023120263A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012074773A (en) * 2010-09-27 2012-04-12 Nec Personal Computers Ltd Editing device, control method, and program
JP2015088203A (en) * 2013-10-30 2015-05-07 日本電信電話株式会社 Content creation method, content creation device, and content creation program
JP2017529778A (en) * 2014-08-29 2017-10-05 スリング メディア,インク. System and process for distributing digital video content based on excitement data
JP2018037784A (en) * 2016-08-30 2018-03-08 ソニー株式会社 Image transmission device, image transmitting method, and program
JP2020174971A (en) * 2019-04-19 2020-10-29 富士通株式会社 Highlight moving image generation program, highlight moving image generation method and information processing device
JP2021192484A (en) * 2020-06-05 2021-12-16 エヌ・ティ・ティ・コミュニケーションズ株式会社 Information provision system and information provision method

Similar Documents

Publication Publication Date Title
US11663827B2 (en) Generating a video segment of an action from a video
JP7470137B2 (en) Video tagging by correlating visual features with sound tags
CN104756514B (en) TV and video frequency program are shared by social networks
US9436875B2 (en) Method and apparatus for semantic extraction and video remix creation
TWI558186B (en) Video selection based on environmental sensing
US10939165B2 (en) Facilitating television based interaction with social networking tools
US20180302687A1 (en) Personalizing closed captions for video content
US10541000B1 (en) User input-based video summarization
WO2018084875A1 (en) Targeted content during media downtimes
US11343595B2 (en) User interface elements for content selection in media narrative presentation
KR20140045412A (en) Video highlight identification based on environmental sensing
JP6807389B2 (en) Methods and equipment for immediate prediction of media content performance
KR20150007936A (en) Systems and Method for Obtaining User Feedback to Media Content, and Computer-readable Recording Medium
CN103686344A (en) Enhanced video system and method
US10567844B2 (en) Camera with reaction integration
CN111723237A (en) Media content access control method
WO2020223009A1 (en) Mapping visual tags to sound tags using text similarity
JP2011164681A (en) Device, method and program for inputting character and computer-readable recording medium recording the same
TWI570639B (en) Systems and methods for building virtual communities
WO2023120263A1 (en) Information processing device and information processing method
US11736780B2 (en) Graphically animated audience
KR101481996B1 (en) Behavior-based Realistic Picture Environment Control System
EP3316204A1 (en) Targeted content during media downtimes

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22910971

Country of ref document: EP

Kind code of ref document: A1