WO2022264203A1 - Signal generation device, signal processing system, and signal generation method - Google Patents

Signal generation device, signal processing system, and signal generation method

Info

Publication number
WO2022264203A1
Authority
WO
WIPO (PCT)
Prior art keywords
emotion
unit
processing system
corresponding signal
signal
Prior art date
Application number
PCT/JP2021/022489
Other languages
French (fr)
Japanese (ja)
Inventor
智明 龍
孝幸 永井
貴文 甲斐
Original Assignee
三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority date
Filing date
Publication date
Application filed by 三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority to PCT/JP2021/022489
Publication of WO2022264203A1

Classifications

    • A — HUMAN NECESSITIES
    • A61 — MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B — DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 — Measuring for diagnostic purposes; Identification of persons
    • A61B 5/16 — Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state

Definitions

  • The present disclosure relates to a signal generation device, a signal processing system, and a signal generation method.
  • Patent Literature 1 discloses a technique in which, in a commercial facility or the like, a management server analyzes a user's physical condition or emotion based on vital values acquired by a wearable device and transmits the analysis result to a terminal of the staff of the commercial facility. This allows the staff to change their response to the user according to changes in the user's physical condition or emotions.
  • However, Patent Literature 1 does not describe the provision of video of an event, nor does it describe what kind of service should be provided to increase the added value of video distribution.
  • The present disclosure has been made in view of the above, and an object thereof is to obtain a signal generation device that can enhance the sense of unity, with a player or performer in an event, of a user who is watching video of the event.
  • The signal generation device according to the present disclosure includes an emotion information acquisition unit that acquires, from a sensor that acquires information indicating the emotion of an estimation target person who is a player or performer in an event, sensor information acquired by the sensor. The signal generation device further includes an emotion estimation unit that estimates the emotion of the estimation target person using the sensor information, an emotion-corresponding signal generation unit that generates an emotion-corresponding signal for causing an output unit in a user's information processing system to perform output according to the estimation result of the emotion estimation unit, and an emotion-corresponding signal transmission unit that transmits the emotion-corresponding signal to the information processing system either directly or via another device.
  • The signal generation device according to the present disclosure has the effect of enhancing the sense of unity, with the athletes or performers in an event, of a user who is watching video of the event.
  • FIG. 1 is a diagram showing a configuration example of a signal processing system according to a first embodiment.
  • FIG. 2 is a flowchart showing an example of operations related to generation of an emotion-corresponding signal in the image providing device of the first embodiment.
  • FIG. 3 is a diagram showing an example of an emotion correspondence table according to the first embodiment.
  • FIG. 4 is a diagram showing a configuration example of the emotion estimation unit of the first embodiment when emotions are estimated by machine learning.
  • FIG. 5 is a schematic diagram showing an example of a neural network.
  • FIG. 6 is a diagram showing an example of an output correspondence table according to the first embodiment.
  • FIG. 7 is a diagram showing a device configuration example of an information processing system according to the first embodiment.
  • FIG. 8 is a diagram showing a configuration example of a computer system that realizes the image providing device of the first embodiment.
  • FIG. 9 is a diagram showing a configuration example of a signal processing system according to a second embodiment.
  • FIG. 10 is a diagram showing an example of an output correspondence table according to the second embodiment.
  • FIG. 11 is a diagram showing a configuration example of a signal processing system according to a modification of the second embodiment.
  • FIG. 1 is a diagram illustrating a configuration example of the signal processing system according to the first embodiment.
  • As shown in FIG. 1, the signal processing system 100 includes a video providing device 1 and an information processing system 5.
  • The image providing device 1 is a signal generation device that generates an emotion-corresponding signal, described later, corresponding to the emotion of a player or performer in an event, and transmits the generated emotion-corresponding signal to the information processing system 5 directly or via another device.
  • The image providing device 1 thus has a function as a signal generation device and also has a function of providing video data obtained by photographing the event to the user's information processing system 5.
  • Specifically, the image providing device 1 provides the user's information processing system 5 with the video data captured by the imaging device 2 that photographs the event, via the distributor device 4 operated by a distributor.
  • The video data provided by the video providing device 1 may be video data obtained by processing the video data captured by the imaging device 2 into, for example, free-viewpoint video data.
  • The image providing device 1 further estimates the emotion of the athlete or performer using at least one of the imaging device 2 and the wearable device 3 worn by the athlete or performer in the event, generates an emotion-corresponding signal corresponding to the estimation result, and provides the generated emotion-corresponding signal to the information processing system 5 via the distributor device 4.
  • The emotion-corresponding signal is a signal that instructs the information processing system 5 to output vibration, sound effects, music, or the like according to the emotion of the player or performer. Details of the emotion-corresponding signal will be described later.
  • In this way, the video providing device 1 provides the video of the event and the emotion-corresponding signal to the information processing system 5 via the distributor device 4.
  • The signal processing system 100 may include not only the image providing device 1 and the information processing system 5 but also at least one of the imaging device 2 and the wearable device 3.
  • The imaging device 2 may photograph all or part of the event, or, when a plurality of imaging devices 2 are provided, an imaging device 2 that tracks and photographs a specific player or performer may be included.
  • A plurality of imaging devices 2 for generating free-viewpoint video may be included, or an imaging device 2 for performing aerial photography using a drone or the like may be included.
  • Events to be photographed by the imaging device 2 of the present embodiment are, for example, sports, concerts, plays, and the like; specific examples include baseball, soccer, volleyball, basketball, martial arts, boat racing, horse racing, bicycle racing (keirin), concerts, and theater, but the events are not limited to these.
  • Venues where events are held include stadiums, multipurpose halls, concert halls, boat racecourses, racetracks, and gymnasiums.
  • The wearable device 3 and the imaging device 2 are examples of sensors that acquire information indicating the emotions of athletes or performers in an event.
  • The wearable device 3 is a sensor capable of detecting biological information, movement, and the like of a player or performer, and is, for example, a watch-type, wristband-type, clothing-type, eyeglass-type, or ring-type sensor, but is not limited to these.
  • The biological information is, for example, at least one of blood pressure, heart rate, body temperature, electroencephalogram, eye movement, and the like, but is not limited to these.
  • The information indicating movement is, for example, at least one of acceleration, biopotential information indicating muscle movement, and the like, but is not limited to these.
  • One player or performer may wear a plurality of different types of wearable devices 3. Further, when there are a plurality of players or performers whose emotions are to be estimated, each of these players or performers wears a wearable device 3.
  • The wearable device 3 may be an imaging device worn by a player or performer.
  • For example, the wearable device 3 may be a wearable camera capable of shooting from the player's or performer's line of sight.
  • The imaging devices 2 that capture the video data to be provided may include an imaging device worn by an athlete or performer. In this case, the imaging device worn by the athlete or performer serves as both the imaging device 2 and the wearable device 3.
  • When the wearable device 3 capable of detecting biological information and movement is used as the sensor, the sensor information acquired by the sensor is information indicating the biological information and movement. When the imaging device 2 is used as the sensor, the acquired sensor information is video data.
  • Communication between the wearable device 3 or the imaging device 2 and the image providing device 1 includes wireless communication, but wireless communication and wired communication may be mixed.
  • For the wireless communication, dedicated lines such as local 5G (5th generation mobile communication system) and private LTE (Long Term Evolution) installed at the venue of the event may be used.
  • For example, local 5G may be used for communication between the imaging device 2 and the image providing device 1, and private LTE may be used for communication between the wearable device 3 and the image providing device 1. Note that the communication methods are not limited to these.
  • As shown in FIG. 1, the image providing device 1 includes an image providing unit 11, an emotion information acquisition unit 12, an emotion estimation unit 13, an emotion-corresponding signal generation unit 14, and an emotion-corresponding signal transmission unit 15.
  • The video providing device 1 acquires video data from the imaging device 2 and transmits the acquired video data to the distributor device 4.
  • The image providing device 1 may process the video data before transmitting it to the distributor device 4, as described above. In this case, a video processing unit (not shown) processes the video data, and the image providing unit 11 transmits the processed video data.
  • Processing by the video processing unit includes, for example, processing for generating free-viewpoint video, processing for converting the resolution, the codec, and the like.
  • The video data may include the sound of the event venue, collected by a sound collector at the event venue and included in the video data as sound data.
  • The signal generation device of the present embodiment only needs to include the emotion information acquisition unit 12, the emotion estimation unit 13, the emotion-corresponding signal generation unit 14, and the emotion-corresponding signal transmission unit 15. Therefore, the image providing device 1 does not have to include the image providing unit 11; in this case, a separate device for providing video data to the user may be provided, or the video data may be provided to the user from the imaging device 2. Further, in this case, the device that provides video data to the user or the imaging device 2 may transmit the video data to the information processing system 5 via the distributor device 4, or may transmit the video data directly to the information processing system 5.
  • The emotion information acquisition unit 12 acquires, from the sensor that acquires information indicating the emotion of the estimation target person who is a player or performer in the event, the sensor information acquired by the sensor.
  • The sensor includes at least one of the wearable device 3 worn by the estimation target person and the imaging device 2 capable of capturing video including the estimation target person.
  • The emotion information acquisition unit 12 outputs the acquired sensor information to the emotion estimation unit 13.
  • The emotion estimation unit 13 estimates the emotion of the estimation target person using the sensor information, and outputs the estimation result to the emotion-corresponding signal generation unit 14.
  • The emotion estimation unit 13 may estimate the emotion using a table that holds correspondence information indicating the correspondence between the numerical range of each item in the sensor information and an emotion, or may estimate the emotion using machine learning. The method of estimating the emotion will be described later.
  • Using the estimation result received from the emotion estimation unit 13, the emotion-corresponding signal generation unit 14 generates an emotion-corresponding signal for causing the output unit 52 in the information processing system 5 to perform output according to the estimation result.
  • The generated emotion-corresponding signal is output to the emotion-corresponding signal transmission unit 15.
  • The emotion-corresponding signal transmission unit 15 transmits the emotion-corresponding signal to the information processing system 5 via the distributor device 4.
  • The emotion-corresponding signal transmission unit 15 may also transmit the emotion-corresponding signal to the information processing system 5 without going through the distributor device 4. In FIG. 1, the image providing unit 11 and the emotion-corresponding signal transmission unit 15 are shown separately, but the image providing unit 11 may also function as the emotion-corresponding signal transmission unit 15, in which case the video data and the emotion-corresponding signal may be transmitted together to the distributor device 4. Although one distributor device 4 is illustrated in FIG. 1, a plurality of distributor devices 4 may be provided, and the plurality of distributor devices 4 may be operated by different distributors. A sketch of this overall flow is shown below.
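  • As an informal illustration only (not part of the disclosure), the division of roles among the units described above can be sketched in Python as follows; all class, field, and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SensorInfo:
    heart_rate: float      # example biological information
    acceleration: float    # example information indicating movement

@dataclass
class EmotionSignal:
    vibration: Optional[str]   # e.g. "high_frequency_large_amplitude"
    sound: Optional[str]       # e.g. "fanfare"

class SignalGenerationDevice:
    """Hypothetical sketch of units 12-15 of the image providing device 1."""

    def __init__(self,
                 estimate: Callable[[SensorInfo], str],
                 to_signal: Callable[[str], EmotionSignal],
                 send: Callable[[EmotionSignal], None]):
        self.estimate = estimate      # emotion estimation unit 13
        self.to_signal = to_signal    # emotion-corresponding signal generation unit 14
        self.send = send              # emotion-corresponding signal transmission unit 15

    def on_sensor_info(self, info: SensorInfo) -> None:
        # The emotion information acquisition unit 12 hands sensor info to unit 13.
        emotion = self.estimate(info)
        signal = self.to_signal(emotion)
        # Unit 15 sends the signal directly or via the distributor device 4.
        self.send(signal)
```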
  • Upon receiving the video data and the emotion-corresponding signal from the video providing device 1, the distributor device 4 transmits the video data and the emotion-corresponding signal to the information processing system 5 to which the video data is to be distributed.
  • The information processing system 5 can receive the video data and display the received video data.
  • As shown in FIG. 1, the information processing system 5 includes a video receiving unit 51, an output unit 52, and an emotion-corresponding signal processing unit 53.
  • The information processing system 5 may include a television (hereinafter abbreviated as TV) on which television broadcasting can be viewed, may include a TV and a game machine, or may include a terminal such as a smartphone.
  • The video receiving unit 51 receives the video data and the emotion-corresponding signal from the distributor device 4, outputs the received video data to the output unit 52, and outputs the received emotion-corresponding signal to the emotion-corresponding signal processing unit 53.
  • the output unit 52 executes output based on instructions from the emotion-corresponding signal processing unit 53.
  • the output unit 52 includes a vibration generation unit 521 , a display unit 522 and a speaker 523 .
  • FIG. 1 shows an example in which the output unit 52 includes the vibration generation unit 521 , but the information processing system 5 may not include the vibration generation unit 521 .
  • the information processing system 5 including the vibration generating section 521 and the information processing system 5 not including the vibration generating section 521 may be mixed.
  • the vibration generator 521 can transmit vibration to the user.
  • the display unit 522 can display video data.
  • the speaker 523 can output sound data included in the video data, and can output sound effects and music based on the emotion corresponding signal.
  • The emotion-corresponding signal processing unit 53 selects, from among the vibration generation unit 521 and the speaker 523 of the output unit 52, the device that performs the operation according to the emotion-corresponding signal, and instructs the selected device to perform the output indicated by the emotion-corresponding signal. As a result, the output unit 52 executes output based on the emotion-corresponding signal. If the emotion-corresponding signal includes an instruction regarding vibration, the emotion-corresponding signal processing unit 53 selects the vibration generation unit 521; if it includes an instruction regarding sound, the emotion-corresponding signal processing unit 53 selects the speaker 523. When the emotion-corresponding signal includes both an instruction regarding vibration and an instruction regarding sound, the emotion-corresponding signal processing unit 53 selects both the vibration generation unit 521 and the speaker 523. A sketch of this selection follows below.
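  • The selection logic above can be sketched as follows; this is a hedged illustration only, and the signal field names ("vibration", "sound") are assumptions, since the actual signal format is not specified.

```python
def dispatch_emotion_signal(signal: dict, vibration_unit, speaker) -> None:
    """Route an emotion-corresponding signal to the matching output devices.

    `signal` is assumed to look like
    {"vibration": "peaky_intermittent", "sound": "tension_sound_effect"},
    with a key present only when the corresponding instruction is included.
    """
    if signal.get("vibration") is not None:   # instruction regarding vibration
        vibration_unit.vibrate(pattern=signal["vibration"])
    if signal.get("sound") is not None:       # instruction regarding sound
        speaker.play(signal["sound"])
    # If both instructions are present, both devices are selected.
```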
  • FIG. 2 is a flow chart showing an example of operations related to generation of an emotion corresponding signal in the image providing device 1 of the present embodiment.
  • The image providing device 1 acquires sensor information (step S1). Acquisition of sensor information is started, for example, when provision of the video of the event is started.
  • Specifically, the emotion information acquisition unit 12 acquires sensor information from at least one of the imaging device 2 and the wearable device 3 worn by the athlete or performer in the event, and outputs the acquired sensor information to the emotion estimation unit 13.
  • the image providing device 1 uses the sensor information to estimate the emotion of the player or performer (step S2).
  • the emotion estimation unit 13 estimates the emotion of the player or performer using the sensor information, and outputs the estimation result to the emotion corresponding signal generation unit 14 .
  • The emotion estimation unit 13 may estimate the emotion of the player or performer from the sensor information using, for example, an emotion correspondence table, which is a table indicating the correspondence between the numerical range of each item in the sensor information and an emotion, or may estimate the emotion of the player or performer from the sensor information using machine learning.
  • FIG. 3 is a diagram showing an example of an emotion correspondence table according to this embodiment.
  • In the emotion correspondence table, numerical ranges for each item of information, such as blood flow, heart rate, brain waves (brain wave amplitude, frequency, etc.), body movement (acceleration, etc.), and muscle movement (biopotential value, etc.), are stored in association with emotions such as excitement, tension, anger, and relaxation. If an item of the sensor information does not correspond to any of these ranges, the emotion may be determined to be "other".
  • The emotion estimation unit 13 may estimate that the emotion of the player or performer is the corresponding emotion, such as excitement, tension, anger, or relaxation, when the values indicated by the sensor information fall within the corresponding ranges for all of the items of blood flow, heart rate, electroencephalogram, body motion, and muscle motion shown in FIG. 3, or when the values fall within the corresponding range for any one of the items. A method may also be used in which a priority is set for each item of information, and when the emotions corresponding to the items differ, the determination of the item with the higher priority is given precedence.
  • For example, when brain waves are given a higher priority than muscle movement, even if the numerical values of the muscle movement indicated by the sensor information are within the range corresponding to anger, the emotion estimation unit 13 may estimate relaxation if the brain-wave values indicated by the sensor information are within the range corresponding to relaxation. Alternatively, when the emotions corresponding to the items differ in this way, the emotion estimation unit 13 may determine the emotion to be "other". A small sketch of such table-based estimation is shown below.
  • FIG. 3 is an example, and the items stored in the emotion correspondence table are not limited to blood flow, heart rate, electroencephalogram, body movement, and muscle movement; the table may include only some of them, or may include items other than these. Also, the types of emotions stored in the emotion correspondence table are not limited to the example shown in FIG. 3 and may include other types.
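  • A minimal sketch of table-based estimation with per-item priority follows; the numeric ranges and priority values are purely illustrative assumptions, not values taken from the disclosure.

```python
# Hypothetical emotion correspondence table: item -> list of ((low, high), emotion).
EMOTION_TABLE = {
    "heart_rate": [((100, 160), "excitement"), ((85, 100), "tension"), ((40, 70), "relaxation")],
    "muscle_bio": [((0.8, 1.0), "anger"), ((0.0, 0.2), "relaxation")],
    "brain_wave": [((8, 13), "relaxation"), ((13, 30), "tension")],
}
# Higher number = higher priority when the items indicate different emotions.
PRIORITY = {"brain_wave": 3, "heart_rate": 2, "muscle_bio": 1}

def estimate_emotion(sensor_info: dict) -> str:
    votes = []  # (priority, emotion) for every item whose value falls in a range
    for item, value in sensor_info.items():
        for (low, high), emotion in EMOTION_TABLE.get(item, []):
            if low <= value < high:
                votes.append((PRIORITY.get(item, 0), emotion))
                break
    if not votes:
        return "other"  # no item matched any stored range
    # Give precedence to the determination of the item with the highest priority.
    return max(votes, key=lambda v: v[0])[1]

# Brain waves (priority 3) override the heart-rate match, as in the text above.
print(estimate_emotion({"heart_rate": 120, "brain_wave": 10}))  # -> "relaxation"
```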
  • FIG. 4 is a diagram showing a configuration example of the emotion estimating section 13 of the present embodiment when estimating an emotion by machine learning.
  • the emotion estimation unit 13 includes a learned model generation unit 131 , a learned model storage unit 132 and an estimation unit 133 .
  • The estimation unit 133 reads the trained model stored in the trained model storage unit 132 and inputs the sensor information received from the emotion information acquisition unit 12 into the read trained model, thereby estimating the emotion of the estimation target person. That is, the output obtained by inputting the sensor information received from the emotion information acquisition unit 12 into the trained model is used as the estimation result of the emotion of the estimation target person.
  • The trained model is a model for estimating, from the sensor information, the emotion of the estimation target person who is a player or performer, and is generated by the trained model generation unit 131 as follows.
  • The trained model generation unit 131 generates the trained model using a plurality of learning data sets each including sensor information input from the emotion information acquisition unit 12 and correct data corresponding to the sensor information.
  • The generated trained model is stored in the trained model storage unit 132.
  • a trained model is generated before the video of the event is provided.
  • the sensor information input to the learned model generation unit 131 is not limited to that input from the emotion information acquisition unit 12, and may be learning sensor information acquired for learning.
  • the sensor information for learning includes information of similar items in the same format as the sensor information.
  • the sensor information for learning may be input to the image providing apparatus 1 by input means (not shown) and input to the trained model generation unit 131 from the input means, or may be transmitted from another device and received by receiving means (not shown). It may be input to the learned model generation unit 131 from the means.
  • The correct data is data indicating which of the above-described emotions, such as excitement, tension, anger, and relaxation, is the correct answer for the emotion corresponding to the sensor information.
  • The correct data may be determined, for example, by asking the subject from whom the sensor information was acquired which emotion corresponds to the sensor information, or an expert or the like may examine the sensor information and determine the correct data.
  • The correct data may be input to the image providing device 1 by an input means (not shown) and passed from the input means to the trained model generation unit 131, or may be transmitted from another device, received by a receiving means (not shown), and passed from the receiving means to the trained model generation unit 131.
  • the generation of a trained model in the trained model generation unit 131 is performed, for example, by supervised learning. Any supervised learning algorithm may be used, and for example, a neural network model may also be used.
  • a neural network consists of an input layer made up of multiple neurons, an intermediate layer (hidden layer) made up of multiple neurons, and an output layer made up of multiple neurons.
  • the intermediate layer may be one layer, or two or more layers.
  • FIG. 5 is a schematic diagram showing an example of a neural network.
  • For example, in a three-layer neural network as shown in FIG. 5, when a plurality of inputs are supplied to the input layer (X1-X3), their values are multiplied by the weights W1 (w11-w16) and input to the intermediate layer (Y1-Y2), and those results are multiplied by the weights W2 (w21-w26) and output from the output layer (Z1-Z3). The output results change depending on the values of the weights W1 and W2.
  • The relationship between the sensor information and the correct data is learned by adjusting the weights W1 and W2 so that the output from the output layer when the sensor information is input approaches the correct data. A worked forward pass for this structure is sketched below.
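  • The following NumPy sketch mirrors the three-layer forward pass just described (three inputs X1-X3, two intermediate neurons Y1-Y2, three outputs Z1-Z3); the weight values and the sigmoid activation are illustrative assumptions only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative weights: W1 holds w11..w16, W2 holds w21..w26.
W1 = np.array([[ 0.2, -0.5],
               [ 0.7,  0.1],
               [-0.3,  0.4]])        # shape (3 inputs, 2 intermediate neurons)
W2 = np.array([[ 0.6, -0.2,  0.3],
               [ 0.1,  0.8, -0.4]])  # shape (2 intermediate neurons, 3 outputs)

def forward(x: np.ndarray) -> np.ndarray:
    """x: sensor features (X1-X3) -> scores for emotion classes (Z1-Z3)."""
    y = sigmoid(x @ W1)  # inputs multiplied by W1, fed to the intermediate layer
    z = y @ W2           # intermediate values multiplied by W2, output from the output layer
    return z

print(forward(np.array([0.9, 0.1, 0.4])))
# Training adjusts W1 and W2 so that this output approaches the correct data.
```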
  • machine learning algorithms are not limited to neural networks. Reinforcement learning or the like may also be used as machine learning.
  • When numerical values such as biological information are used as the sensor information, the numerical values are input to the input layer of the trained model generation unit 131. When the sensor information includes information on a plurality of items, the information on each item is input to the input layer as X1 to X3, respectively.
  • Although FIG. 5 shows an example with three inputs and three outputs, the number of inputs and outputs is not limited to this example.
  • When video data is used as the sensor information, still image data extracted from the video data at fixed time intervals may be used as the input data for machine learning, or all of the video data within a fixed period may be used as the input data. When both numerical values such as biological information and image data are used as the sensor information, all of these are used as the input data for machine learning. One hedged illustration of the supervised training step follows.
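  • The supervised learning itself could be realized with any framework; as one hedged illustration only (not the method prescribed by the disclosure), a small scikit-learn classifier trained on numeric sensor features might look like this. The feature layout and values are assumptions.

```python
from sklearn.neural_network import MLPClassifier

# Hypothetical learning data sets: each row is one sensor-information sample
# (e.g. heart rate, acceleration, muscle biopotential); y holds the correct data.
X_train = [[112.0, 3.1, 0.50],   # labeled "excitement"
           [ 95.0, 0.8, 0.40],   # labeled "tension"
           [ 60.0, 0.1, 0.10],   # labeled "relaxation"
           [ 88.0, 2.5, 0.95]]   # labeled "anger"
y_train = ["excitement", "tension", "relaxation", "anger"]

# Generate the "trained model" (trained model generation unit 131).
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(X_train, y_train)

# Estimation (estimation unit 133): feed new sensor information to the model.
print(model.predict([[105.0, 2.7, 0.45]]))
```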
  • a trained model may be generated for each estimated target, or a common trained model may be generated without distinguishing the estimation targets.
  • a trained model may be generated for each type of sport, each venue for an event, or the like.
  • At least one of the information acquired by the wearable device 3 and the video data acquired by the imaging device 2 may be used as the sensor information.
  • In the example described above, the emotion estimation unit 13 includes the trained model generation unit 131.
  • However, a learning device that generates the trained model may be provided separately from the image providing device 1, and the learning device may include the trained model generation unit 131.
  • In this case, the emotion estimation unit 13 does not need to include the trained model generation unit 131; the trained model generation unit 131 of the learning device generates the trained model in the same manner as described above, and the trained model generated by the learning device is stored in the trained model storage unit 132 of the emotion estimation unit 13.
  • Position information indicating the position of the estimation target person acquired by the wearable device 3 may also be used for estimating the emotion.
  • For example, when the event is a soccer match, even if the biometric information and the like are the same, the emotion may differ between the case where the estimation target person is near the opponent's goal, the case where the person is near their own team's goal, and other cases. Therefore, when the emotion is estimated by machine learning, the position within the soccer field may be used as one piece of sensor information, for example.
  • When the emotion correspondence table is used, the positions within the soccer field may be divided into a plurality of regions in advance, and the range of each item of information defined in the emotion correspondence table may be corrected according to the region in which the estimation target person is located, as sketched below.
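  • One possible way to sketch the position-dependent correction is shown here; the region names and offset values are purely illustrative assumptions.

```python
# Hypothetical correction offsets applied to a heart-rate range depending on
# the region of the field in which the estimation target person is located.
REGION_OFFSET = {
    "near_opponent_goal": 10.0,
    "near_own_goal": 5.0,
    "other": 0.0,
}

def corrected_range(base_range, region):
    """Shift a (low, high) range from the emotion correspondence table
    according to the region in which the estimation target person is located."""
    low, high = base_range
    offset = REGION_OFFSET.get(region, 0.0)
    return (low + offset, high + offset)

print(corrected_range((100, 160), "near_opponent_goal"))  # -> (110.0, 170.0)
```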
  • The image providing device 1 then uses the estimation result to generate an emotion-corresponding signal (step S3).
  • Specifically, the emotion-corresponding signal generation unit 14 uses the estimation result received from the emotion estimation unit 13 to generate an emotion-corresponding signal corresponding to the emotion indicated by the estimation result, and outputs the generated emotion-corresponding signal to the emotion-corresponding signal transmission unit 15.
  • For example, the emotion-corresponding signal generation unit 14 holds, as an output correspondence table, output correspondence information indicating the correspondence between emotions and output contents in the information processing system 5, determines the output contents using the held output correspondence table, and generates an emotion-corresponding signal corresponding to the determined output contents.
  • FIG. 6 is a diagram showing an example of the output correspondence table of this embodiment.
  • In the output correspondence table, output contents are defined for each type of emotion with respect to each of the vibration function, sound effects, and music.
  • For example, in the example shown in FIG. 6, when the emotion is excitement, the output content of the vibration function is high-frequency vibration with a large amplitude, the output content of the sound effect is a fanfare or a sound effect expressing excitement as in comics, and the output content of the music is movie music or game music expressing excitement.
  • When the emotion is tension, the output content of the vibration function is peaky, intermittent vibration, the output content of the sound effect is a sound effect expressing tension as in comics or movies, and the output content of the music is movie music or game music expressing tension.
  • When the emotion is relaxation, the output content of the vibration function is low-frequency vibration with fluctuation, the output content of the sound effect is the sound of waves, the chirping of birds, or the babbling of a stream, and the output content of the music is classical music with a relaxing effect or music with a slow tempo.
  • the emotion corresponding signal generating section 14 may generate an emotion corresponding signal that causes the vibration generating section 521 of the information processing system 5 to vibrate according to the estimation result of the emotion estimating section 13.
  • the emotion corresponding signal may be generated to cause the speaker 523 to output a sound effect or music corresponding to the result of estimation by the emotion estimation unit 13 .
  • The output contents may also be changed according to the position information indicating the position of the estimation target person acquired by the wearable device 3.
  • For example, the positions within a soccer field may be divided into a plurality of areas in advance, and the output contents may be determined according to the area in which the estimation target person is located. For example, even if the estimated emotion is excitement, the output contents may be defined for each area so that they differ between the case where the person is near the opponent's goal, the case where the person is near their own team's goal, and other cases.
  • Similarly, in a concert or the like, the stage may be divided into a plurality of areas, and even for the same emotion estimation result, the output contents may be defined for each area so that they change depending on the area in which the estimation target person is located.
  • FIG. 6 is an example, and the specific output contents are not limited to the example shown in FIG. 6. Further, when determining the output contents, the emotion-corresponding signal generation unit 14 does not need to select all of the vibration function, the sound effect, and the music as outputs to be executed by the information processing system 5, and may select at least one of them. When the output contents are determined using the output correspondence table, the emotion-corresponding signal generation unit 14 generates an emotion-corresponding signal indicating an instruction to cause the information processing system 5 to execute the determined output contents, as sketched below.
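  • A hedged sketch of how the output correspondence table might drive generation of the emotion-corresponding signal follows; the output descriptors are illustrative strings, not the actual signal format of the disclosure.

```python
# Hypothetical output correspondence table (cf. FIG. 6).
OUTPUT_TABLE = {
    "excitement": {"vibration": "high_frequency_large_amplitude",
                   "sound_effect": "fanfare",
                   "music": "exciting_game_music"},
    "tension":    {"vibration": "peaky_intermittent",
                   "sound_effect": "tension_sound_effect",
                   "music": "tense_movie_music"},
    "relaxation": {"vibration": "low_frequency_with_fluctuation",
                   "sound_effect": "sound_of_waves",
                   "music": "slow_tempo_music"},
}

def generate_emotion_signal(estimated_emotion: str,
                            outputs=("vibration", "sound_effect")) -> dict:
    """Build an emotion-corresponding signal containing only the selected outputs;
    as noted above, not all of vibration, sound effect, and music need to be used."""
    contents = OUTPUT_TABLE.get(estimated_emotion, {})
    return {key: contents[key] for key in outputs if key in contents}

print(generate_emotion_signal("tension"))
# {'vibration': 'peaky_intermittent', 'sound_effect': 'tension_sound_effect'}
```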
  • the image providing device 1 transmits an emotion corresponding signal (step S4).
  • the emotion-responsive signal transmitting unit 15 transmits the emotional-responsive signal received from the emotion-responsive signal generating unit 14 to the distributor device 4 .
  • the emotion corresponding signal arrives at the information processing system 5 via the distributor device 4 .
  • the emotion corresponding signal may be transmitted to the distributor device 4 together with the video data.
  • When there are a plurality of players or performers whose emotions can be estimated, the emotion estimation unit 13 may select a specific estimation target person, the target person may be designated by the operator via an input means (not shown), or the user may select the person whose emotion-corresponding signal is to be transmitted. For example, when the imaging device 2 tracks and photographs a specific player or performer, the tracked player or performer becomes the estimation target person for the video data captured by that imaging device 2. When a plurality of performers are included in the captured data, as in an idol group concert, the image providing device 1 may generate the emotion-corresponding signal using an average value of the estimation results of the emotions of the photographed performers.
  • Alternatively, the video providing device 1 may transmit an emotion-corresponding signal for each of the plurality of performers to the distributor device 4, and the distributor device 4 may acquire the user's selection result and transmit the emotion-corresponding signal corresponding to the selection result to the information processing system 5.
  • Similarly, for a team sport, the video providing device 1 may estimate the emotion of an entire team by using the average value of the emotion estimation results of its players to generate an emotion-corresponding signal, and the emotion-corresponding signal corresponding to the team selected by the user may be transmitted to the information processing system 5. One way such averaging could be interpreted is sketched below.
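  • Because the individual estimation results are categorical, "averaging" them could be interpreted, for instance, as averaging per-emotion confidence scores across the members; the following is one hedged interpretation, not a method fixed by the disclosure.

```python
from collections import defaultdict

def aggregate_group_emotion(member_scores):
    """member_scores: one dict per player/performer mapping emotion -> confidence,
    e.g. [{"excitement": 0.7, "tension": 0.3}, {"excitement": 0.4, "tension": 0.6}]."""
    totals = defaultdict(float)
    for scores in member_scores:
        for emotion, score in scores.items():
            totals[emotion] += score
    n = len(member_scores)
    averaged = {emotion: total / n for emotion, total in totals.items()}
    # The group emotion is the one with the highest averaged confidence.
    return max(averaged, key=averaged.get), averaged

emotion, avg = aggregate_group_emotion([
    {"excitement": 0.7, "tension": 0.3},
    {"excitement": 0.4, "tension": 0.6},
])
print(emotion)  # -> excitement (averaged confidences of roughly 0.55 vs 0.45)
```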
  • When video data captured by a wearable camera is provided, the video providing device 1 estimates the emotion using sensor information acquired from the wearable device 3 worn by the player or performer corresponding to that wearable camera, and uses the estimation result to generate the emotion-corresponding signal.
  • FIG. 7 is a diagram showing a device configuration example of the information processing system 5 of this embodiment.
  • FIG. 7 shows device configuration examples of information processing systems 5-1 to 5-4, each of which is an example of the information processing system 5.
  • The information processing system 5-1 includes a TV 501 and a speaker 502, the information processing system 5-2 includes the TV 501, a game machine body 503, and a controller 504, the information processing system 5-3 includes the TV 501 alone, and the information processing system 5-4 includes a terminal 505 such as a smartphone.
  • The TV 501 generally incorporates a display unit and a speaker, and can display the video data of the event. The TV 501 can also output sound when the video data includes sound data. Furthermore, the TV 501 can perform output corresponding to the emotion-corresponding signal when the emotion-corresponding signal is an instruction regarding sound. Therefore, like the information processing system 5-3 in FIG. 7, the information processing system 5 may consist of the TV 501 alone.
  • In such a configuration, the output unit 52 does not include the vibration generation unit 521.
  • When an external speaker 502 is connected to the TV 501, as in the information processing system 5-1, sound data included in the video data is input to the speaker 502 via the TV 501, and the speaker 502 performs output corresponding to the sound data.
  • When the emotion-corresponding signal is an instruction regarding sound, the TV 501 instructs the speaker 502 to perform the output indicated by the emotion-corresponding signal, whereby the speaker 502 outputs sound effects, music, and the like based on the emotion-corresponding signal.
  • That is, when the information processing system 5 consists of the TV 501 and the speaker 502, the TV 501 is equipped with the emotion-corresponding signal processing unit 53, and the speaker 523 of the output unit 52 corresponds to the speaker 502.
  • In the information processing system 5-2, the game machine body 503 may receive the video data and the emotion-corresponding signal and cause the TV 501 to display the video data.
  • The game machine body 503 is a game machine capable of running games called video games, computer games, and the like.
  • The controller 504 is a game controller corresponding to the game machine body 503; it can receive input regarding application software executed on the game machine body 503 and can vibrate itself.
  • The game machine body 503 causes the TV 501 to output sound by outputting sound data to the TV 501, and when the emotion-corresponding signal indicates an instruction regarding sound, the game machine body 503 instructs the TV 501 to perform the output indicated by the emotion-corresponding signal, whereby the TV 501 performs output based on the emotion-corresponding signal. Further, when the emotion-corresponding signal indicates an instruction regarding vibration, the game machine body 503 instructs the controller 504 to perform the output indicated by the emotion-corresponding signal, whereby the controller 504 vibrates based on the emotion-corresponding signal.
  • That is, when the information processing system 5 includes the TV 501, the game machine body 503, and the controller 504, as in the information processing system 5-2, the game machine body 503 is provided with the video receiving unit 51 and the emotion-corresponding signal processing unit 53 of the information processing system 5 shown in FIG. 1, the TV 501 is provided with the speaker 523 and the display unit 522 of the output unit 52, and the controller 504 is provided with the vibration generation unit 521 of the output unit 52.
  • Alternatively, the game machine body 503 may include the speaker 523 and the display unit 522 of the output unit 52, and the game machine body 503 may display the video data and output the sound effects and music based on the emotion-corresponding signal.
  • When the information processing system 5 is a terminal 505 such as a smartphone, the terminal 505 is provided with the video receiving unit 51, the output unit 52, and the emotion-corresponding signal processing unit 53 of the information processing system 5 shown in FIG. 1. Since the terminal 505 generally has functions for outputting vibration, display, and sound, the terminal 505 can display the video data and can also output the vibration, sound effects, music, and the like indicated by the emotion-corresponding signal.
  • A device that receives the video data and the emotion-corresponding signal, such as the TV 501, the game machine body 503, or the terminal 505 described above, can receive the video data and the emotion-corresponding signal and perform the operations corresponding to them, for example, by installing application software.
  • The information processing system 5 may be realized by a single device, or may be realized by a combination of a plurality of devices.
  • The configuration of the information processing system 5 described above is an example; the display of the video data and the output of the sound effects and music based on the emotion-corresponding signal may also be performed by a personal computer or the like, and the configuration is not limited to the examples described above.
  • FIG. 8 is a diagram showing a configuration example of a computer system that implements the image providing device 1 of this embodiment. As shown in FIG. 8, this computer system comprises a control unit 101, an input unit 102, a storage unit 103, a display unit 104, a communication unit 105, and an output unit 106, which are connected via a system bus 107.
  • control unit 101 is, for example, a processor such as a CPU (Central Processing Unit), and executes a program describing the processing in the image providing device 1 of this embodiment.
  • part of the control unit 101 may be realized by dedicated hardware such as a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array).
  • the input unit 102 is composed of, for example, a keyboard and a mouse, and is used by the user of the computer system to input various information.
  • the storage unit 103 includes various memories such as RAM (Random Access Memory) and ROM (Read Only Memory) and storage devices such as hard disks, and stores programs to be executed by the control unit 101 and necessary information obtained in the process of processing. store data, etc.
  • the storage unit 103 is also used as a temporary storage area for programs.
  • the display unit 104 includes a display, LCD (liquid crystal display panel), etc., and displays various screens to the user of the computer system.
  • a communication unit 105 is a receiver and a transmitter that perform communication processing.
  • the output unit 106 is a printer, speaker, or the like. Note that FIG. 8 is an example, and the configuration of the computer system is not limited to the example in FIG.
  • A computer program is installed in the storage unit 103 from, for example, a CD-ROM or DVD-ROM set in a CD (Compact Disc)-ROM drive or a DVD (Digital Versatile Disc)-ROM drive (not shown).
  • Then, when the program is executed, the program read from the storage unit 103 is stored in the main storage area of the storage unit 103. In this state, the control unit 101 executes the processing of the image providing device 1 of this embodiment according to the program stored in the storage unit 103.
  • a program describing the processing in the image providing apparatus 1 is provided using a CD-ROM or DVD-ROM as a recording medium, but the configuration of the computer system and the provided program are not limited to this.
  • a program provided by a transmission medium such as the Internet via the communication unit 105 may be used depending on the capacity of the computer.
  • The emotion estimation unit 13 and the emotion-corresponding signal generation unit 14 shown in FIG. 1 are realized by the control unit 101 shown in FIG. 8 executing a computer program stored in the storage unit 103 shown in FIG. 8.
  • The storage unit 103 is also used to implement the emotion estimation unit 13 and the emotion-corresponding signal generation unit 14. The image providing unit 11, the emotion information acquisition unit 12, and the emotion-corresponding signal transmission unit 15 shown in FIG. 1 are implemented by the communication unit 105 shown in FIG. 8.
  • The control unit 101 is also used to realize the image providing unit 11, the emotion information acquisition unit 12, and the emotion-corresponding signal transmission unit 15 shown in FIG. 1.
  • the image providing device 1 may be realized by a plurality of computer systems.
  • the image providing device 1 may be realized by a cloud computer system.
  • The distributor device 4 is realized, for example, by a computer system having the configuration shown in FIG. 8.
  • The information processing system 5 is similarly realized, for example, by a computer system having the configuration shown in FIG. 8.
  • The emotion-corresponding signal processing unit 53 shown in FIG. 1 is realized by the control unit 101 shown in FIG. 8 executing a computer program stored in the storage unit 103 shown in FIG. 8.
  • The video receiving unit 51 is realized by the communication unit 105 shown in FIG. 8.
  • The control unit 101 is also used to realize the video receiving unit 51.
  • The output unit 52 is implemented by the display unit 104 and the output unit 106 shown in FIG. 8. Note that, as described above, the functions of the video receiving unit 51, the output unit 52, and the emotion-corresponding signal processing unit 53 may be divided and realized by a plurality of devices.
  • As described above, the image providing device 1 of the present embodiment estimates the emotion of the estimation target person, who is a player or performer in the event, using the sensor information acquired by the sensor that acquires information indicating the emotion of the estimation target person, and generates an emotion-corresponding signal for outputting at least one of vibration and sound according to the estimation result. The image providing device 1 then transmits the emotion-corresponding signal to the information processing system 5 that receives the video data of the event. Therefore, it is possible to enhance the sense of unity, with the players or performers in the event, of the user viewing the video of the event.
  • FIG. 9 is a diagram illustrating a configuration example of a signal processing system according to a second embodiment
  • a signal processing system 100a of the present embodiment includes a video providing device 1a and an information processing system 5a.
  • the signal processing system 100a may include at least one of the wearable device 3 and the imaging device 2 in addition to the image providing device 1a and the information processing system 5a.
  • The image providing device 1a is the same as the image providing device 1 of the first embodiment except that it does not include the emotion-corresponding signal transmission unit 15 of the first embodiment and includes an emotion-corresponding signal generation unit 14a instead of the emotion-corresponding signal generation unit 14.
  • The information processing system 5a is the same as the information processing system 5 of the first embodiment except that it does not include the emotion-corresponding signal processing unit 53 of the first embodiment and includes an output unit 52a instead of the output unit 52.
  • Components having functions similar to those of the first embodiment are denoted by the same reference numerals as in the first embodiment, and duplicate descriptions are omitted. Differences from the first embodiment will be mainly described below.
  • the emotion estimation unit 13 uses sensor information to estimate the emotion of the target player or performer, as in the first embodiment.
  • the emotion estimator 13 outputs the estimation result to the emotion corresponding signal generator 14a.
  • the estimation result output from the emotion estimation unit 13 is input to the emotion corresponding signal generation unit 14a, and video data is input from the imaging device 2 as well.
  • the image data processed by the image processing unit (not shown) is input to the emotion corresponding signal generation unit 14a. Processing in the video processing unit is the same as in the first embodiment.
  • the video data may also include sound data.
  • the emotion corresponding signal generation unit 14a determines the content of output in the information processing system 5a.
  • the emotion-corresponding signal generator 14a superimposes an emotion-corresponding signal indicating the content of output in the information processing system 5a on the video data.
  • the emotion corresponding signal is superimposed on at least one of the video portion of the video data and the sound data included in the video data.
  • the emotion-corresponding signal generator 14a holds, as an output-correspondence table, output-correspondence information indicating correspondence between emotions and output contents in the information processing system 5a, and uses the output-correspondence table to generate output contents. to decide.
  • FIG. 10 is a diagram showing an example of the output correspondence table of this embodiment.
  • In this example, according to the emotion estimation result, the emotion-corresponding signal generation unit 14a changes the image quality of the video data (described as "image quality" in FIG. 10), the volume of the sound data (described as "volume" in FIG. 10), and the sound quality of the sound data (described as "sound quality" in FIG. 10), superimposes an animation video or an icon image on the video data (described as "animation/icon" in FIG. 10), and superimposes text, that is, character information, on the video data (described as "text" in FIG. 10). It is not necessary to perform all of these; one or more of them may be performed.
  • By performing these, the emotion-corresponding signal is superimposed on the video data.
  • For example, when the emotion estimation result indicates excitement, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to a high-luminance, edge-enhanced image quality, may superimpose text such as "Assault!!!" on the video data, or may combine two or more of these.
  • For example, when the emotion estimation result indicates tension, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one set to a high color temperature, may lower the volume, may change the sound quality to one that emphasizes the high-frequency range, may superimpose an animation of a heart changing in size, may superimpose text such as "Yabai!" on the video data, or may combine two or more of these. Further, for example, when the emotion estimation result indicates relaxation, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one set to a lower color temperature, may raise the volume, may change the sound quality to a flat sound quality, may superimpose an animation of an animal slowly floating across the screen on the video data, may superimpose text such as "Mattari~" on the video data, or may combine two or more of these.
  • For example, when the emotion estimation result indicates anger, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one that emphasizes red, may lower the volume, may superimpose an icon indicating anger on the video data so that the icon is displayed semi-transparently over the entire screen, may superimpose text such as "Hmm!" on the video data, or may combine two or more of these.
  • the emotion-corresponding signal generation unit 14a may superimpose the emotion-corresponding signal on the image data by changing the image quality of the image data according to the estimation result of the emotion estimation unit 13.
  • the emotion corresponding signal may be superimposed on the video data by changing at least one of the volume and sound quality of the included sound data according to the estimation result of the emotion estimation unit 13 .
  • the emotion-corresponding signal generation unit 14a may superimpose the emotion-corresponding signal on the video data by superimposing an animation image or an icon image corresponding to the estimation result of the emotion estimating unit 13 on the video data.
  • the emotion corresponding signal may be superimposed on the video data by superimposing character information corresponding to the estimation result by the emotion estimation unit 13 on the image data.
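  • As a hedged OpenCV/NumPy sketch only, superimposing the emotion-corresponding signal on a video frame (an image-quality change plus a text overlay) might look like this; the per-emotion adjustments and text are illustrative assumptions.

```python
import cv2
import numpy as np

# Illustrative per-emotion presentation parameters (cf. FIG. 10).
PRESETS = {
    "excitement": {"gain": 1.3, "text": "Assault!!!"},
    "relaxation": {"gain": 0.9, "text": "Mattari~"},
}

def superimpose(frame: np.ndarray, emotion: str) -> np.ndarray:
    """Return a frame whose image quality is changed and on which character
    information is superimposed according to the estimated emotion."""
    preset = PRESETS.get(emotion)
    if preset is None:
        return frame
    # Change the image quality (here: a simple luminance gain).
    out = cv2.convertScaleAbs(frame, alpha=preset["gain"], beta=0)
    # Superimpose character information on the video.
    cv2.putText(out, preset["text"], (30, 60),
                cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 255), 3)
    return out
```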
  • the emotion-corresponding signal generation unit 14a outputs the image data on which the emotion-corresponding signal is superimposed to the image providing unit 11.
  • the image providing unit 11 transmits the image data on which the emotion corresponding signal is superimposed to the distributor device 4 .
  • When the video data input to the emotion-corresponding signal generation unit 14a includes sound data, the video data on which the emotion-corresponding signal is superimposed also includes sound data.
  • This sound data is the changed sound data when the volume or sound quality has been changed by the emotion-corresponding signal generation unit 14a, and is the same as the input sound data when the emotion-corresponding signal generation unit 14a has not changed the sound data.
  • the distributor device 4 transmits the video data superimposed with the emotion corresponding signal to the information processing system 5a.
  • the image receiving unit 51 of the information processing system 5a outputs the image data superimposed with the emotion corresponding signal to the output unit 52a.
  • the output unit 52a includes a display unit 522 and a speaker 523, and the display unit 522 displays video data superimposed with the emotion corresponding signal.
  • the speaker 523 outputs sound based on the sound data when the video data on which the emotion corresponding signal is superimposed includes the sound data.
  • the image providing device 1a of the present embodiment is realized by a computer system, like the image providing device 1 of the first embodiment.
  • the image providing device 1a may be realized by a cloud computer system.
  • The information processing system 5a of the present embodiment is also realized by a computer system in the same manner as the information processing system 5 of the first embodiment, and may be, for example, the terminal 505, or the game machine body 503, the controller 504, and the TV 501, or any other configuration.
  • FIG. 9 shows an example in which the emotion-corresponding signal is superimposed on the video data, but the emotion-corresponding signal for vibration may also be transmitted separately, as in the first embodiment, in addition to superimposing the emotion-corresponding signal on the video data.
  • FIG. 11 is a diagram showing a configuration example of a signal processing system according to a modification of this embodiment.
  • The signal processing system 100b shown in FIG. 11 is the same as the signal processing system 100 of the first embodiment except that an image providing device 1b is provided instead of the image providing device 1.
  • The image providing device 1b includes an emotion-corresponding signal generation unit 14b instead of the emotion-corresponding signal generation unit 14, and the video data on which the emotion-corresponding signal is superimposed is input to the image providing unit 11 from the emotion-corresponding signal generation unit 14b; other than that, it is the same as the image providing device 1 of the first embodiment.
  • the image providing device 1b of the present embodiment is realized by a computer system, like the image providing device 1 of the first embodiment.
  • the image providing device 1b may be realized by a cloud computer system.
  • The emotion-corresponding signal generation unit 14b shown in FIG. 11 generates an emotion-corresponding signal for vibration, which is an emotion-corresponding signal related to vibration, in the same manner as in the first embodiment, and outputs the generated emotion-corresponding signal to the emotion-corresponding signal transmission unit 15.
  • The emotion-corresponding signal transmission unit 15 transmits this emotion-corresponding signal to the information processing system 5 via the distributor device 4 in the same manner as in the first embodiment.
  • the emotion corresponding signal generation unit 14b superimposes the emotion corresponding signal on the video data, and outputs the video data superimposed with the emotion corresponding signal to the video providing unit 11, similarly to the emotion corresponding signal generation unit 14a shown in FIG. do.
  • the image providing unit 11 transmits the image data superimposed with the emotion corresponding signal to the information processing system 5 via the distributor device 4 .
  • Thus, the user can perceive the emotions of the athletes or performers in the event by viewing the video data on which the emotion-corresponding signal is superimposed, and can also feel the emotions of the athletes or performers through vibration as in the first embodiment. This makes it possible to enhance the sense of unity, with the players or performers in the event, of the user who is watching the video of the event.

Abstract

The signal generation device according to the present disclosure is a video providing device (1) comprising: an emotional information acquisition unit (12) for acquiring sensor information from at least one of a wearable device (3) and imaging device (2) that acquire sensor information showing an emotion of an estimation target who is a player or performer in an event; an emotion estimation unit (13) for estimating the emotion of the estimation target using the sensor information; an emotion-associated signal generation unit (14) for generating an emotion-associated signal that causes an output unit (52) in an information processing system (5) for a user to execute output according to the estimation result from the emotion estimation unit (13); and an emotion-associated signal transmission unit (15) for transmitting the emotion-associated signal to the information processing system (5) directly or via another device.

Description

信号生成装置、信号処理システムおよび信号生成方法SIGNAL GENERATOR, SIGNAL PROCESSING SYSTEM AND SIGNAL GENERATING METHOD
 本開示は、信号生成装置、信号処理システムおよび信号生成方法に関する。 The present disclosure relates to a signal generation device, a signal processing system, and a signal generation method.
 近年、ウェアラブル装置により取得されたバイタル値、撮影装置によって撮影された映像を解析することで、人の体調、感情などを推定する技術への注目が高まっている。例えば、特許文献1には、商業施設などにおいて、管理サーバが、ウェアラブル装置により取得されたバイタル値をもとにユーザの体調または感情を分析し、分析結果を、商業施設のスタッフの端末へ送信する技術が開示されている。これにより、スタッフが、ユーザの体調または感情の変化に合わせて、当該ユーザへの対応を変えることができる。 In recent years, there has been increasing interest in technologies that estimate a person's physical condition and emotions by analyzing vital values acquired by wearable devices and images captured by imaging devices. For example, in Patent Literature 1, in a commercial facility, etc., a management server analyzes the user's physical condition or emotion based on vital values acquired by a wearable device, and transmits the analysis result to the terminal of the staff of the commercial facility. A technique for doing so is disclosed. This allows the staff to change their response to the user according to changes in the user's physical condition or emotions.
JP 2018-207173 A
On the other hand, there is an increasing demand not only for watching or appreciating events such as sports and concerts directly at the event venue, but also for services that distribute video of these events. In video distribution, various ideas are being considered in order to increase added value. For example, if a user viewing the video can share the emotions of a player or a performer, the sense of unity with the player or the performer can be enhanced, and the user's satisfaction is expected to improve.
However, even if the technique described in Patent Literature 1 is applied to an event, it is only possible to grasp the emotions of the spectators from the wearable devices of the spectators at the event. Moreover, Patent Literature 1 does not describe the provision of video of an event, nor does it describe what kind of service should be provided in order to increase the added value of video distribution.
The present disclosure has been made in view of the above, and an object thereof is to obtain a signal generation device capable of enhancing the sense of unity, with a player or a performer in an event, of a user who is watching a video of the event.
In order to solve the above-described problems and achieve the object, the signal generation device according to the present disclosure includes an emotion information acquisition unit that acquires, from a sensor that acquires information indicating the emotion of an estimation target who is a player or a performer in an event, the sensor information acquired by the sensor. The signal generation device further includes an emotion estimation unit that estimates the emotion of the estimation target using the sensor information, an emotion corresponding signal generation unit that generates an emotion corresponding signal for causing an output unit in a user's information processing system to execute output according to the estimation result of the emotion estimation unit, and an emotion corresponding signal transmission unit that transmits the emotion corresponding signal to the information processing system directly or via another device.
The signal generation device according to the present disclosure has the effect of enhancing the sense of unity, with a player or a performer in an event, of a user who is watching a video of the event.
FIG. 1 is a diagram showing a configuration example of a signal processing system according to a first embodiment.
FIG. 2 is a flowchart showing an example of operations related to generation of an emotion corresponding signal in a video providing device of the first embodiment.
FIG. 3 is a diagram showing an example of an emotion correspondence table according to the first embodiment.
FIG. 4 is a diagram showing a configuration example of an emotion estimation unit of the first embodiment when estimating an emotion by machine learning.
FIG. 5 is a schematic diagram showing an example of a neural network.
FIG. 6 is a diagram showing an example of an output correspondence table according to the first embodiment.
FIG. 7 is a diagram showing a device configuration example of an information processing system according to the first embodiment.
FIG. 8 is a diagram showing a configuration example of a computer system that realizes the video providing device of the first embodiment.
FIG. 9 is a diagram showing a configuration example of a signal processing system according to a second embodiment.
FIG. 10 is a diagram showing an example of an output correspondence table according to the second embodiment.
FIG. 11 is a diagram showing a configuration example of a signal processing system according to a modification of the second embodiment.
A signal generation device, a signal processing system, and a signal generation method according to embodiments will be described in detail below with reference to the drawings.
Embodiment 1.
FIG. 1 is a diagram showing a configuration example of the signal processing system according to the first embodiment. The signal processing system 100 of the present embodiment includes a video providing device 1 and an information processing system 5. The video providing device 1 is a signal generation device that generates an emotion corresponding signal, described later, corresponding to the emotion of a player or performer in an event, and transmits the generated emotion corresponding signal to the information processing system 5 directly or via another device. In the present embodiment, the video providing device 1 has a function as a signal processing device and also has a function of providing video data obtained by capturing an event to the user's information processing system 5. That is, the video providing device 1 provides the video data captured by the imaging device 2, which captures the event, to the user's information processing system 5 via the distributor device 4 operated by a distributor. Note that the video data provided by the video providing device 1 may be video data obtained by processing the video data captured by the imaging device 2 into, for example, free-viewpoint video data. The video providing device 1 further estimates the emotion of the player or performer using at least one of the imaging device 2 and the wearable device 3 worn by the player or performer in the event, generates an emotion corresponding signal corresponding to the estimation result, and provides the generated emotion corresponding signal to the information processing system 5 via the distributor device 4. The emotion corresponding signal is a signal that instructs the information processing system 5 to output vibration, sound effects, music, or the like according to the emotion of the player or performer. Details of the emotion corresponding signal will be described later.
In the following, an example in which the video providing device 1 provides the video of the event and the emotion corresponding signal to the information processing system 5 via the distributor device 4 will be described; however, they may be provided to the information processing system 5 without going through the distributor device 4. Note that the signal processing system 100 may include not only the video providing device 1 and the information processing system 5 but also at least one of the imaging device 2 and the wearable device 3.
Although one imaging device 2 is illustrated in FIG. 1, the number of imaging devices 2 may be plural. The imaging device 2 may capture all or part of the event. When a plurality of imaging devices 2 are provided, they may include an imaging device that tracks and captures a specific player or performer, a plurality of imaging devices 2 for generating free-viewpoint video, or an imaging device 2 that performs aerial photography using a drone or the like. Events to be captured by the imaging device 2 of the present embodiment are, for example, sports, concerts, and plays; specific examples include baseball, soccer, volleyball, basketball, martial arts, boat racing, horse racing, keirin (bicycle racing), concerts, and plays, but the events are not limited to these. Venues where events are held include stadiums, multipurpose halls, concert halls, boat race stadiums, racetracks, and gymnasiums.
The wearable device 3 and the imaging device 2 are examples of sensors that acquire information indicating the emotion of a player or performer in an event. The wearable device 3 is a sensor capable of detecting biological information, movement, and the like of a player or performer, and is, for example, a watch-type, wristband-type, clothing-type, eyeglass-type, or ring-type sensor, but is not limited to these. The biological information is, for example, at least one of blood pressure, heart rate, body temperature, brain waves, eye movement, and the like, but is not limited to these. The information indicating movement is at least one of acceleration, biopotential information indicating muscle movement, and the like, but is not limited to these.
One player or performer may wear a plurality of different types of wearable devices 3. When there are a plurality of players or performers whose emotions are to be estimated, each of these players or performers wears a wearable device 3.
The wearable device 3 may also be an imaging device worn by a player or performer. For example, the wearable device 3 may be a wearable camera capable of shooting from the player's or performer's point of view. Note that the imaging devices 2 that capture the video data to be provided may include an imaging device worn by a player or performer; in this case, the imaging device worn by the player or performer is both an imaging device 2 and a wearable device 3. When a wearable device 3 capable of detecting biological information, movement, and the like is used as the sensor, the sensor information acquired by the sensor is the biological information and the information indicating movement; when the imaging device 2 is used as the sensor, the sensor information acquired by the sensor is video data.
Communication between the wearable device 3 and the imaging device 2 on the one hand and the video providing device 1 on the other includes wireless communication, but wireless communication and wired communication may be mixed. For the wireless communication, dedicated lines such as local 5G (fifth-generation mobile communication system) or private LTE (Long Term Evolution) at the event venue may be used. For example, local 5G may be used for communication between the imaging device 2 and the video providing device 1, and private LTE may be used for communication between the wearable device 3 and the video providing device 1. Note that the communication methods are not limited to these.
As shown in FIG. 1, the video providing device 1 includes a video providing unit 11, an emotion information acquisition unit 12, an emotion estimation unit 13, an emotion corresponding signal generation unit 14, and an emotion corresponding signal transmission unit 15. The video providing device 1 acquires video data from the imaging device 2 and transmits the acquired video data to the distributor device 4. Note that, as described above, the video providing device 1 may process the video data before transmitting it to the distributor device 4; in this case, the video data acquired from the imaging device 2 is processed by a video processing unit (not shown), and the video providing unit 11 transmits the processed video data. The processing by the video processing unit includes, for example, processing for generating free-viewpoint video and processing for converting resolution, codec, and the like. Although not shown, the sound at the event venue may be collected by a sound collecting device at the event venue and included in the video data as sound data. The signal generation device of the present embodiment only needs to include the emotion information acquisition unit 12, the emotion estimation unit 13, the emotion corresponding signal generation unit 14, and the emotion corresponding signal transmission unit 15. Therefore, the video providing device 1 does not have to include the video providing unit 11; in this case, a separate device that provides the video data to the user may be provided, or the video data may be provided to the user from the imaging device 2. In this case, the device that provides the video data to the user or the imaging device 2 may transmit the video data to the information processing system 5 via the distributor device 4, or may transmit the video data directly to the information processing system 5.
The emotion information acquisition unit 12 acquires, from a sensor that acquires information indicating the emotion of an estimation target who is a player or performer in an event, the sensor information acquired by the sensor. The sensor includes at least one of the wearable device 3 worn by the estimation target and the imaging device 2 capable of capturing video including the estimation target. The emotion information acquisition unit 12 outputs the acquired sensor information to the emotion estimation unit 13.
The emotion estimation unit 13 estimates the emotion of the estimation target using the sensor information, and outputs the estimation result to the emotion corresponding signal generation unit 14. For example, when the sensor information is information indicated by numerical values, the emotion estimation unit 13 may hold, as a table, correspondence information indicating the correspondence between the numerical range of each item of the sensor information and an emotion, and estimate the emotion using the held table, or may estimate the emotion using machine learning. A method of estimating the emotion will be described later.
Using the estimation result received from the emotion estimation unit 13, the emotion corresponding signal generation unit 14 generates an emotion corresponding signal for causing the output unit 52 in the information processing system 5 to execute output according to the estimation result, and outputs the generated emotion corresponding signal to the emotion corresponding signal transmission unit 15. The emotion corresponding signal transmission unit 15 transmits the emotion corresponding signal to the information processing system 5 via the distributor device 4. The emotion corresponding signal transmission unit 15 may also transmit the emotion corresponding signal to the information processing system 5 without going through the distributor device 4. Although the video providing unit 11 and the emotion corresponding signal transmission unit 15 are illustrated separately in FIG. 1, the video providing unit 11 may have the function of the emotion corresponding signal transmission unit 15, and the video providing unit 11 may transmit the video data and the emotion corresponding signal to the distributor device 4. Although one distributor device 4 is illustrated in FIG. 1, there may be a plurality of distributor devices 4. The plurality of distributor devices 4 may be operated by different distributors.
Upon receiving the video data and the emotion corresponding signal from the video providing device 1, the distributor device 4 transmits the video data and the emotion corresponding signal to the information processing system 5 to which the video data is to be distributed.
The information processing system 5 can receive video data and display the received video data. As shown in FIG. 1, the information processing system 5 includes a video receiving unit 51, an output unit 52, and an emotion corresponding signal processing unit 53. As will be described later, the information processing system 5 may include a television (hereinafter abbreviated as TV) capable of receiving television broadcasts, may include a TV and a game machine, or may be a terminal such as a smartphone. Although one information processing system 5 is illustrated in FIG. 1, there may be a plurality of information processing systems 5. The video receiving unit 51 receives the video data and the emotion corresponding signal from the distributor device 4, outputs the received video data to the output unit 52, and outputs the received emotion corresponding signal to the emotion corresponding signal processing unit 53.
The output unit 52 executes output based on instructions from the emotion corresponding signal processing unit 53. The output unit 52 includes a vibration generation unit 521, a display unit 522, and a speaker 523. Although FIG. 1 shows an example in which the output unit 52 includes the vibration generation unit 521, the information processing system 5 does not have to include the vibration generation unit 521. When a plurality of information processing systems 5 are provided, information processing systems 5 that include the vibration generation unit 521 and information processing systems 5 that do not include the vibration generation unit 521 may coexist.
The vibration generation unit 521 can transmit vibration to the user. The display unit 522 can display the video data. The speaker 523 can output sound data included in the video data, and can also output sound effects and music based on the emotion corresponding signal.
Based on the emotion corresponding signal, the emotion corresponding signal processing unit 53 selects, from among the vibration generation unit 521 and the speaker 523 of the output unit 52, the device that is to perform the operation corresponding to the emotion corresponding signal, and instructs the selected device to perform output in accordance with the emotion corresponding signal. As a result, the output unit 52 executes output based on the emotion corresponding signal. When the emotion corresponding signal includes an instruction regarding vibration, the emotion corresponding signal processing unit 53 selects the vibration generation unit 521; when the emotion corresponding signal indicates an instruction regarding sound such as sound effects or music, the emotion corresponding signal processing unit 53 selects the speaker 523. When the emotion corresponding signal includes both an instruction regarding vibration and an instruction regarding sound, the emotion corresponding signal processing unit 53 selects both the vibration generation unit 521 and the speaker 523.
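For reference only, the following is a minimal Python sketch of the dispatch behavior described above, in which a received signal carrying vibration and/or sound instructions is routed to the vibration generation unit and the speaker. The class names, field names, and signal format (EmotionSignal, vibration, sound) are assumptions made for this sketch and are not identifiers from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative signal format: a signal may carry a vibration instruction,
# a sound instruction (sound effect or music), or both.
@dataclass
class EmotionSignal:
    vibration: Optional[dict] = None   # e.g. {"waveform": "high_freq", "amplitude": 0.9}
    sound: Optional[dict] = None       # e.g. {"type": "fanfare"}

class VibrationGenerator:
    def vibrate(self, instruction: dict) -> None:
        print(f"vibration generation unit 521: {instruction}")

class Speaker:
    def play(self, instruction: dict) -> None:
        print(f"speaker 523: {instruction}")

class EmotionSignalProcessor:
    """Plays the role of the emotion corresponding signal processing unit 53."""
    def __init__(self, vibrator: VibrationGenerator, speaker: Speaker):
        self.vibrator = vibrator
        self.speaker = speaker

    def handle(self, signal: EmotionSignal) -> None:
        # Select only the devices addressed by the received signal.
        if signal.vibration is not None:
            self.vibrator.vibrate(signal.vibration)
        if signal.sound is not None:
            self.speaker.play(signal.sound)

processor = EmotionSignalProcessor(VibrationGenerator(), Speaker())
processor.handle(EmotionSignal(vibration={"waveform": "high_freq", "amplitude": 0.9},
                               sound={"type": "fanfare"}))
```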
Next, generation of the emotion corresponding signal in the present embodiment will be described. FIG. 2 is a flowchart showing an example of operations related to generation of the emotion corresponding signal in the video providing device 1 of the present embodiment. As shown in FIG. 2, the video providing device 1 acquires sensor information (step S1). Acquisition of the sensor information is started, for example, when provision of the video of the event is started. More specifically, in step S1, the emotion information acquisition unit 12 acquires sensor information from at least one of the imaging device 2 and the wearable device 3 worn by the player or performer in the event, and outputs the acquired sensor information to the emotion estimation unit 13.
Next, the video providing device 1 estimates the emotion of the player or performer using the sensor information (step S2). The emotion estimation unit 13 estimates the emotion of the player or performer using the sensor information, and outputs the estimation result to the emotion corresponding signal generation unit 14.
The emotion estimation unit 13 may estimate the emotion of the player or performer from the sensor information using, for example, an emotion correspondence table, which is a table indicating the correspondence between the numerical range of each item of the sensor information and an emotion, or may estimate the emotion of the player or performer from the sensor information by machine learning. FIG. 3 is a diagram showing an example of the emotion correspondence table of the present embodiment. In the example shown in FIG. 3, for each item of information such as blood flow, heart rate, brain waves (amplitude, frequency, and the like), body movement (acceleration and the like), and muscle movement (biopotential values and the like), the ranges corresponding to emotions such as excitement, tension, anger, and relaxation are stored in the emotion correspondence table. When an item of the sensor information does not fall within any of these ranges, the emotion may be determined to be "other".
Note that the emotion estimation unit 13 may estimate that the emotion of the player or performer is the corresponding emotion, such as excitement, tension, anger, or relaxation, when the values indicated by the sensor information fall within the ranges shown in the emotion correspondence table for all of the blood flow, heart rate, brain wave, body movement, and muscle movement items shown in FIG. 3, or when any one of the items falls within the corresponding range. Alternatively, a method may be used in which a priority is set for each item of information and, when the emotions corresponding to the items differ, the determination of the item with the higher priority is given precedence. For example, when brain waves are given higher priority than muscle movement, and the value of muscle movement indicated by the sensor information is in the range corresponding to anger while the value of brain waves indicated by the sensor information is in the range corresponding to relaxation, the emotion estimation unit 13 may estimate relaxation. Alternatively, when the emotions corresponding to the items differ in this way, the emotion estimation unit 13 may determine the emotion to be "other".
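For reference only, the following is a minimal Python sketch of table-based emotion estimation with per-item priority, as outlined above. The item names, numerical ranges, and priority values are invented for illustration and are not the contents of the emotion correspondence table of FIG. 3.

```python
# Illustrative emotion correspondence table: per-item value ranges per emotion.
EMOTION_TABLE = {
    "heart_rate": {          # beats per minute (illustrative ranges)
        "excitement": (120, 200),
        "tension":    (100, 120),
        "relaxation": (50, 80),
    },
    "acceleration": {        # body movement in m/s^2 (illustrative ranges)
        "excitement": (8, 30),
        "tension":    (0, 2),
        "relaxation": (0, 1),
    },
}
# Higher number = higher priority when items disagree (illustrative).
PRIORITY = {"heart_rate": 2, "acceleration": 1}

def classify_item(item: str, value: float) -> str:
    # Return the first emotion whose range contains the value, or "other".
    for emotion, (low, high) in EMOTION_TABLE[item].items():
        if low <= value <= high:
            return emotion
    return "other"

def estimate_emotion(sensor_info: dict) -> str:
    # Classify each available item, then resolve disagreements by priority.
    votes = {item: classify_item(item, value)
             for item, value in sensor_info.items() if item in EMOTION_TABLE}
    if not votes:
        return "other"
    emotions = set(votes.values())
    if len(emotions) == 1:
        return emotions.pop()
    top_item = max(votes, key=lambda item: PRIORITY.get(item, 0))
    return votes[top_item]

print(estimate_emotion({"heart_rate": 65, "acceleration": 12}))  # higher-priority item wins
```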
FIG. 3 is an example, and the items stored in the emotion correspondence table are not limited to blood flow, heart rate, brain waves, body movement, and muscle movement; the table may include only some of these items or may include other items. The types of emotions stored in the emotion correspondence table are also not limited to the example shown in FIG. 3; the table may store only some of these types or may include other types.
Next, an example in which the emotion estimation unit 13 estimates the emotion by machine learning will be described. FIG. 4 is a diagram showing a configuration example of the emotion estimation unit 13 of the present embodiment when estimating the emotion by machine learning. In the example shown in FIG. 4, the emotion estimation unit 13 includes a learned model generation unit 131, a learned model storage unit 132, and an estimation unit 133.
The estimation unit 133 reads the learned model stored in the learned model storage unit 132 and inputs the sensor information received from the emotion information acquisition unit 12 into the read learned model, thereby estimating the emotion of the estimation target. That is, the output obtained by inputting the sensor information received from the emotion information acquisition unit 12 into the learned model is used as the estimation result of the emotion of the estimation target. The learned model is a model for estimating, from the sensor information, the emotion of the estimation target who is a player or performer, and is generated by the learned model generation unit 131 before provision of the video of the event starts, for example, as follows.
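For reference only, the following is a minimal Python sketch of the inference step performed by the estimation unit 133: a previously trained model is loaded and incoming sensor information is mapped to an emotion label. The file name, feature order, label set, and the assumption of a scikit-learn-style classifier exposing predict_proba are all illustrative assumptions, not details given in the disclosure.

```python
import pickle

EMOTIONS = ["excitement", "tension", "anger", "relaxation"]      # assumed label set
FEATURES = ["heart_rate", "acceleration", "blood_flow"]          # assumed feature order

def load_model(path: str = "emotion_model.pkl"):
    # Stands in for reading from the learned model storage unit 132.
    with open(path, "rb") as f:
        return pickle.load(f)

def estimate(model, sensor_info: dict) -> str:
    # Arrange the sensor information into the feature order the model expects,
    # then take the label with the highest predicted score.
    x = [[sensor_info[name] for name in FEATURES]]
    scores = model.predict_proba(x)[0]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return EMOTIONS[best]
```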
The learned model generation unit 131 generates a learned model using a plurality of learning data sets, each including sensor information input from the emotion information acquisition unit 12 and correct answer data corresponding to the sensor information, and stores the generated learned model in the learned model storage unit 132. The learned model is generated before the video of the event is provided.
The sensor information input to the learned model generation unit 131 is not limited to that input from the emotion information acquisition unit 12, and may be learning sensor information acquired for learning. The learning sensor information includes information on the same items in the same format as the sensor information. The learning sensor information may be input to the video providing device 1 by input means (not shown) and passed from the input means to the learned model generation unit 131, or may be transmitted from another device, received by receiving means (not shown), and passed from the receiving means to the learned model generation unit 131. The correct answer data is data indicating which of the above-described emotions, such as excitement, tension, anger, and relaxation, is the correct emotion corresponding to the sensor information. The correct answer data may be determined, for example, by asking the subject from whom the sensor information was acquired about the corresponding emotion, or an expert or the like may examine the sensor information and determine the correct answer data. The correct answer data may, for example, be input to the video providing device 1 by input means (not shown) and passed from the input means to the learned model generation unit 131, or may be transmitted from another device, received by receiving means (not shown), and passed from the receiving means to the learned model generation unit 131.
The learned model generation unit 131 generates the learned model by, for example, supervised learning. Any supervised learning algorithm may be used; for example, a neural network model can be used. A neural network is composed of an input layer made up of a plurality of neurons, an intermediate layer (hidden layer) made up of a plurality of neurons, and an output layer made up of a plurality of neurons. The intermediate layer may be one layer, or two or more layers.
FIG. 5 is a schematic diagram showing an example of a neural network. For example, in a three-layer neural network as shown in FIG. 5, when a plurality of inputs are input to the input layer (X1 to X3), their values are multiplied by weights W1 (w11 to w16) and input to the intermediate layer (Y1 to Y2), and the results are further multiplied by weights W2 (w21 to w26) and output from the output layer (Z1 to Z3). The output results change depending on the values of the weights W1 and W2.
In the present embodiment, the relationship between the sensor information and the correct answer data is learned by adjusting the weights W1 and W2 so that the output from the output layer when the sensor information is input approaches the correct answer data. Note that the machine learning algorithm is not limited to a neural network. Reinforcement learning or the like may also be used as the machine learning.
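For reference only, the following is a small numerical Python sketch of the three-layer network of FIG. 5 (inputs X1 to X3, hidden units Y1 and Y2, outputs Z1 to Z3). The sigmoid activation and the gradient-based update shown here are assumptions made for illustration; the disclosure only states that W1 and W2 are adjusted so that the output approaches the correct answer data.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))   # weights w11..w16 between input and hidden layer
W2 = rng.normal(size=(2, 3))   # weights w21..w26 between hidden and output layer

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(x):
    y = sigmoid(x @ W1)        # hidden layer Y1-Y2
    z = sigmoid(y @ W2)        # output layer Z1-Z3 (e.g. one score per emotion)
    return y, z

def train_step(x, target, lr=0.5):
    # One step of error backpropagation: nudge W2 and W1 so z moves toward target.
    global W1, W2
    y, z = forward(x)
    dz = (z - target) * z * (1.0 - z)
    dy = (dz @ W2.T) * y * (1.0 - y)
    W2 -= lr * np.outer(y, dz)
    W1 -= lr * np.outer(x, dy)

x = np.array([0.7, 0.2, 0.9])          # normalized sensor values (illustrative)
target = np.array([1.0, 0.0, 0.0])     # correct answer data: first emotion class
for _ in range(200):
    train_step(x, target)
print(forward(x)[1])                    # output is now close to the target
```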
When numerical values such as biological information are used as the sensor information, the numerical values are input to the input layer of the learned model generation unit 131. When the sensor information includes information on a plurality of items, the information on each item is input to the input layer as X1 to X3, respectively. Although FIG. 5 shows an example with three inputs and three outputs, the numbers of inputs and outputs are not limited to this example.
When the sensor information is video data, for example, video data sampled at fixed time intervals may be used as still-image data for the input data of machine learning, or all of the video data within a fixed period may be used as the input data of machine learning. For example, when the video data is obtained by tracking the estimation target, or when the face of the estimation target is captured in close-up, the emotion may appear in facial expressions and the like in the video data. When such video data is acquired, the video data can be used as the sensor information. When both numerical values such as biological information and video data are used as the sensor information, all of these are used as the input data of machine learning. When there are a plurality of players or performers to be estimated, a learned model may be generated for each estimation target, a common learned model may be generated without distinguishing the estimation targets, or, for example, a learned model may be generated for each type of sport, for each event venue, and so on.
As the sensor information, at least one of the information acquired by the wearable device 3 and the video data acquired by the imaging device 2 may be used.
In the example shown in FIG. 4, the emotion estimation unit 13 includes the learned model generation unit 131; however, a learning device that generates the learned model may be provided separately from the video providing device 1, and the learning device may include the learned model generation unit 131. In this case, the emotion estimation unit 13 does not need to include the learned model generation unit 131, and the learned model generation unit 131 of the learning device generates the learned model in the same manner as described above. The learned model generated by the learning device is then stored in the learned model storage unit 132 of the emotion estimation unit 13.
Further, position information indicating the position of the estimation target acquired by the wearable device 3 may also be used for estimating the emotion. For example, when the event is a soccer match, even if the biological information and the like are the same, the emotion may differ depending on whether the estimation target is near the opponent's goal, near the estimation target's own goal, or elsewhere. Therefore, when the emotion is estimated by machine learning, for example, the position within the soccer field may be used as one piece of the sensor information. When the emotion is estimated using the emotion correspondence table, the positions within the soccer field may be divided into a plurality of regions in advance, and the ranges of each item of information defined in the emotion correspondence table may be corrected according to the region in which the estimation target is located.
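For reference only, the following is a minimal Python sketch of correcting the table ranges according to the field region, as suggested above. The region boundaries and correction factors are invented for illustration and are not specified in the disclosure.

```python
# Illustrative per-region correction factors applied to the table ranges.
REGION_CORRECTION = {
    "near_opponent_goal": 1.10,   # widen/shift ranges by 10% (assumption)
    "near_own_goal":      1.05,
    "elsewhere":          1.00,
}

def field_region(x_m: float, field_length_m: float = 105.0) -> str:
    # Classify a position along the pitch (0 = own goal line) into three regions.
    if x_m > field_length_m - 20.0:
        return "near_opponent_goal"
    if x_m < 20.0:
        return "near_own_goal"
    return "elsewhere"

def corrected_range(base_low: float, base_high: float, region: str) -> tuple:
    k = REGION_CORRECTION[region]
    return base_low * k, base_high * k

# e.g. an illustrative heart-rate range is scaled up near the opponent's goal
print(corrected_range(120, 200, field_region(95.0)))
```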
Returning to the description of FIG. 2: after step S2, the video providing device 1 generates an emotion corresponding signal using the estimation result (step S3). More specifically, the emotion corresponding signal generation unit 14 uses the estimation result received from the emotion estimation unit 13 to generate an emotion corresponding signal corresponding to the emotion indicated by the estimation result, and outputs the generated emotion corresponding signal to the emotion corresponding signal transmission unit 15. For example, the emotion corresponding signal generation unit 14 holds, as an output correspondence table, output correspondence information indicating the correspondence between emotions and output contents in the information processing system 5, determines the output contents in the information processing system 5 using the held output correspondence table, and generates an emotion corresponding signal corresponding to the determined output contents.
FIG. 6 is a diagram showing an example of the output correspondence table of the present embodiment. In the example shown in FIG. 6, output contents are shown for each type of emotion with respect to each of the vibration function, sound effects, and music. For example, when the estimation result indicates excitement, the output content of the vibration function is high-frequency vibration with a large amplitude, the output contents of the sound effects are a fanfare, sound effects indicating excitement in comics, and sound effects indicating excitement in movies, and the output contents of the music are movie music indicating excitement and game music indicating excitement. When the estimation result indicates tension, the output content of the vibration function is peaky, intermittent vibration, the output contents of the sound effects are sound effects indicating tension in comics and sound effects indicating tension in movies, and the output contents of the music are movie music indicating tension and game music indicating tension. When the estimation result indicates anger, the output content of the vibration function is low-frequency vibration with a large amplitude, the output contents of the sound effects are the sound of a volcanic eruption, the sound of an earthquake, sound effects indicating anger in comics, and sound effects indicating anger in movies, and the output contents of the music are movie music indicating anger and game music indicating anger. When the estimation result indicates relaxation, the output content of the vibration function is low-frequency vibration with fluctuation, the output contents of the sound effects are the sound of waves, the chirping of birds, and the babbling of a stream, and the output contents of the music are classical music with a relaxing effect and music with a slow tempo.
In this way, the emotion corresponding signal generation unit 14 may generate an emotion corresponding signal that causes the vibration generation unit 521 of the information processing system 5 to vibrate according to the estimation result of the emotion estimation unit 13, or may generate an emotion corresponding signal that causes the speaker 523 of the information processing system 5 to output sound effects or music corresponding to the estimation result of the emotion estimation unit 13.
Furthermore, the output contents may be changed according to the position information indicating the position of the estimation target acquired by the wearable device 3, even for the same emotion. For example, the positions within a soccer field may be divided into a plurality of regions in advance, and the output contents may be defined according to the region in which the estimation target is located. For example, when excitement is obtained as the emotion estimation result, the output contents may be defined for each region so that they differ depending on whether the estimation target is near the opponent's goal, near the estimation target's own goal, or elsewhere. Similarly, in the case of a concert or the like, the stage may be divided into a plurality of regions, and the output contents may be defined for each region so that, even for the same emotion estimation result, the output contents change depending on the region in which the estimation target is located.
Note that FIG. 6 is an example, and the specific output contents are not limited to the example shown in FIG. 6. When determining the output contents, the emotion corresponding signal generation unit 14 does not need to select all of the vibration function, sound effects, and music as outputs to be executed by the information processing system 5; it is sufficient to select at least one of them. When the emotion corresponding signal generation unit 14 determines the output contents using the output correspondence table, it generates an emotion corresponding signal indicating an instruction for causing the information processing system 5 to execute the determined output contents.
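For reference only, the following is a minimal Python sketch of how the emotion corresponding signal generation unit 14 might turn an estimated emotion into an emotion corresponding signal via an output correspondence table. The table entries loosely paraphrase the FIG. 6 examples described above, and the dictionary format of the generated signal is an assumption of this sketch.

```python
# Illustrative output correspondence table: emotion -> output contents.
OUTPUT_TABLE = {
    "excitement": {"vibration": {"freq": "high", "amplitude": "large"},
                   "sound_effect": "fanfare",
                   "music": "exciting film score"},
    "tension":    {"vibration": {"freq": "peaky", "pattern": "intermittent"},
                   "sound_effect": "tense comic-style sting",
                   "music": "tense game music"},
    "anger":      {"vibration": {"freq": "low", "amplitude": "large"},
                   "sound_effect": "volcanic eruption",
                   "music": "angry film score"},
    "relaxation": {"vibration": {"freq": "low", "pattern": "fluctuating"},
                   "sound_effect": "sound of waves",
                   "music": "slow-tempo classical"},
}

def generate_emotion_signal(estimated_emotion: str, use=("vibration", "sound_effect")):
    """Build the signal, selecting at least one of the output kinds (not necessarily all)."""
    contents = OUTPUT_TABLE.get(estimated_emotion)
    if contents is None:
        return None  # e.g. the "other" result produces no signal in this sketch
    return {kind: contents[kind] for kind in use if kind in contents}

print(generate_emotion_signal("anger"))
```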
Returning to the description of FIG. 2: after step S3, the video providing device 1 transmits the emotion corresponding signal (step S4). More specifically, the emotion corresponding signal transmission unit 15 transmits the emotion corresponding signal received from the emotion corresponding signal generation unit 14 to the distributor device 4. Through the above processing, the emotion corresponding signal reaches the information processing system 5 via the distributor device 4. As described above, the emotion corresponding signal may be transmitted to the distributor device 4 together with the video data.
When there are a plurality of estimation targets, the emotion estimation unit 13 may select a specific estimation target, an estimation target may be designated by an operator via input means (not shown), or the user may select the target for which the emotion corresponding signal is to be transmitted. For example, when the imaging device 2 tracks and captures a specific player or performer, the player or performer being tracked is the estimation target for the video data captured by that imaging device 2. When a plurality of performers are included in the captured data, as in a concert by an idol group, the video providing device 1 may generate the emotion corresponding signal using the average of the emotion estimation results of the performers being captured. Alternatively, when a plurality of performers are included in the captured data, the user may select an estimation target on a menu screen or the like before the start of video distribution, and the video providing device 1 may acquire the user's selection result from the information processing system 5 and transmit the emotion corresponding signal corresponding to the selected estimation target to the information processing system 5. Alternatively, the video providing device 1 may transmit the emotion corresponding signals of the respective performers to the distributor device 4, and the distributor device 4 may acquire the user's selection result and transmit the corresponding emotion corresponding signal to the information processing system 5 according to the selection result. When the event is a sport such as soccer, baseball, or volleyball, the user may select which team to support on a menu screen or the like before video distribution, and the video providing device 1 may generate an emotion corresponding signal using the average of the emotion estimation results of the entire team and transmit the emotion corresponding signal corresponding to the selected team to the information processing system 5. For video data captured by a wearable camera, the video providing device 1 estimates the emotion using the sensor information acquired from the wearable device 3 worn by the player or performer corresponding to that wearable camera, and generates the emotion corresponding signal using the estimation result.
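For reference only, the following is a minimal Python sketch of combining estimation results for a team or group of performers. The disclosure states only that the average of the estimation results is used; the per-emotion score representation assumed here is an illustrative interpretation.

```python
def team_emotion(member_scores: list) -> str:
    # Average each member's per-emotion scores and return the highest-scoring emotion.
    totals = {}
    for scores in member_scores:
        for emotion, score in scores.items():
            totals[emotion] = totals.get(emotion, 0.0) + score
    n = len(member_scores)
    averages = {emotion: total / n for emotion, total in totals.items()}
    return max(averages, key=averages.get)

team = [
    {"excitement": 0.7, "tension": 0.2, "anger": 0.0, "relaxation": 0.1},
    {"excitement": 0.4, "tension": 0.5, "anger": 0.0, "relaxation": 0.1},
    {"excitement": 0.6, "tension": 0.3, "anger": 0.1, "relaxation": 0.0},
]
print(team_emotion(team))  # -> "excitement"
```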
As described above, the information processing system 5 outputs vibration, sound effects, music, and the like according to the received emotion corresponding signal. Next, a device configuration example of the information processing system 5 will be described. FIG. 7 is a diagram showing a device configuration example of the information processing system 5 of the present embodiment. FIG. 7 shows device configuration examples of information processing systems 5-1 to 5-4, each of which is an information processing system 5.
In the example shown in FIG. 7, the information processing system 5-1 includes a TV 501 and a speaker 502, the information processing system 5-2 includes a TV 501, a game machine body 503, and a controller 504, the information processing system 5-3 includes a TV 501, and the information processing system 5-4 includes a terminal 505 such as a smartphone. The TV 501 generally incorporates a display unit and a speaker, and can display the video data of the event. The TV 501 can also output sound when the video data includes sound data. Furthermore, the TV 501 can perform output corresponding to the emotion corresponding signal when the emotion corresponding signal is an instruction regarding sound. Therefore, as in the information processing system 5-3 of FIG. 7, the information processing system 5 may be constituted by the TV 501 alone. When the information processing system 5 is the TV 501 alone, as in the information processing system 5-3, the TV 501 includes the video receiving unit 51, the output unit 52, and the emotion corresponding signal processing unit 53 of the information processing system 5 shown in FIG. 1. In this case, however, the output unit 52 does not include the vibration generation unit 521.
When an external speaker 502 is connected to the TV 501 as in the information processing system 5-1, the sound data included in the video data is input to the speaker 502 via the TV 501, and the speaker 502 performs output corresponding to the sound data. When the emotion corresponding signal indicates an instruction regarding sound, the TV 501 instructs the speaker 502 to perform the output indicated by the emotion corresponding signal, and the speaker 502 outputs sound effects, music, or the like based on the emotion corresponding signal. When the information processing system 5 includes the TV 501 and the speaker 502, as in the information processing system 5-1, the TV 501 includes the video receiving unit 51, the display unit 522 of the output unit 52, and the emotion corresponding signal processing unit 53 of the information processing system 5 shown in FIG. 1, and the speaker 523 of the output unit 52 corresponds to the speaker 502.
In the configuration example of the information processing system 5-2, the game machine body 503 may receive the video data and the emotion corresponding signal and cause the TV 501 to display the video data. The game machine body 503 is a game machine capable of running games called video games, computer games, and the like. The controller 504 is a game controller corresponding to the game machine body 503, can receive input related to application software executed on the game machine body 503, and can itself vibrate. The game machine body 503 causes the TV 501 to output sound by outputting the sound data to the TV 501; when the emotion corresponding signal indicates an instruction regarding sound, the game machine body 503 instructs the TV 501 to perform the output indicated by the emotion corresponding signal, and the TV 501 performs output based on the emotion corresponding signal. When the emotion corresponding signal indicates an instruction regarding vibration, the game machine body 503 instructs the controller 504 to perform the output indicated by the emotion corresponding signal, and the controller 504 vibrates based on the emotion corresponding signal. When the information processing system 5 includes the TV 501, the game machine body 503, and the controller 504, as in the information processing system 5-2, the game machine body 503 includes the video receiving unit 51, the display unit 522 of the output unit 52, and the emotion corresponding signal processing unit 53 of the information processing system 5 shown in FIG. 1, the TV 501 includes the speaker 523 and the display unit 522 of the output unit 52, and the controller 504 includes the vibration generation unit 521 of the output unit 52. Note that the game machine body 503 may include the speaker 523 and the display unit 522 of the output unit 52, and the game machine body 503 may display the video data and output the sound effects and music based on the emotion corresponding signal.
When the information processing system 5 is a terminal 505 such as a smartphone, as in the information processing system 5-4, the terminal 505 includes the video receiving unit 51, the output unit 52, and the emotion corresponding signal processing unit 53 of the information processing system 5 shown in FIG. 1. Since the terminal 505 generally has functions for outputting vibration, display, and sound, the terminal 505 can display the video data and can also output the vibration, sound effects, music, and the like indicated by the emotion corresponding signal.
A device that receives the video data and the emotion corresponding signal, such as the TV 501, the game machine body 503, or the terminal 505 described above, can receive the video data and the emotion corresponding signal and perform operations corresponding to them, for example, by having application software installed.
As described above, the information processing system 5 may be realized by a single device or by a combination of a plurality of devices. The configuration of the information processing system 5 described above is an example; the display of the video data and the output of the sound effects and music based on the emotion corresponding signal may be performed by a personal computer or the like, and the configuration of the information processing system 5 is not limited to the examples described above.
 Next, the hardware configuration of the video providing device 1 of the present embodiment will be described. A computer system functions as the video providing device 1 of the present embodiment by executing a program, that is, a computer program in which the processing of the video providing device 1 is described. FIG. 8 is a diagram showing a configuration example of a computer system that realizes the video providing device 1 of the present embodiment. As shown in FIG. 8, this computer system includes a control unit 101, an input unit 102, a storage unit 103, a display unit 104, a communication unit 105, and an output unit 106, which are connected via a system bus 107.
 In FIG. 8, the control unit 101 is a processor such as a CPU (Central Processing Unit) and executes the program in which the processing of the video providing device 1 of the present embodiment is described. Part of the control unit 101 may be realized by dedicated hardware such as a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array). The input unit 102 is composed of, for example, a keyboard and a mouse, and is used by the user of the computer system to input various kinds of information. The storage unit 103 includes various memories such as a RAM (Random Access Memory) and a ROM (Read Only Memory) and a storage device such as a hard disk, and stores the program to be executed by the control unit 101, data obtained in the course of processing, and the like. The storage unit 103 is also used as a temporary storage area for the program. The display unit 104 is composed of a display such as an LCD (liquid crystal display panel) and presents various screens to the user of the computer system. The communication unit 105 is a receiver and a transmitter that perform communication processing. The output unit 106 is, for example, a printer or a speaker. FIG. 8 shows an example, and the configuration of the computer system is not limited to the example of FIG. 8.
 Here, an example of the operation of the computer system until the program of the present embodiment becomes executable will be described. In the computer system having the above configuration, the computer program is installed in the storage unit 103 from, for example, a CD (Compact Disc)-ROM or DVD (Digital Versatile Disc)-ROM set in a CD-ROM drive or DVD-ROM drive (not shown). When the program is executed, the program read from the storage unit 103 is stored in the main storage area of the storage unit 103. In this state, the control unit 101 executes the processing of the video providing device 1 of the present embodiment in accordance with the program stored in the storage unit 103.
 In the above description, the program describing the processing of the video providing device 1 is provided on a CD-ROM or DVD-ROM as a recording medium; however, the program is not limited to this and may, depending on the configuration of the computer system and the size of the provided program, be provided via a transmission medium such as the Internet through the communication unit 105.
 The emotion estimation unit 13 and the emotion-corresponding signal generation unit 14 shown in FIG. 1 are realized by the control unit 101 shown in FIG. 8 executing a computer program stored in the storage unit 103 shown in FIG. 8; the storage unit 103 is also used to realize the emotion estimation unit 13 and the emotion-corresponding signal generation unit 14. The video providing unit 11, the emotion information acquisition unit 12, and the emotion-corresponding signal transmission unit 15 shown in FIG. 1 are realized by the communication unit 105 shown in FIG. 8; the control unit 101 is also used to realize them. The video providing device 1 may be realized by a plurality of computer systems, for example by a cloud computer system.
 The distributor device 4 is likewise realized by, for example, a computer system having the configuration shown in FIG. 8, and so is the information processing system 5. The emotion-corresponding signal processing unit 53 shown in FIG. 1 is realized by the control unit 101 shown in FIG. 8 executing a computer program stored in the storage unit 103 shown in FIG. 8. The video receiving unit 51 is realized by the communication unit 105 shown in FIG. 8; the control unit 101 is also used to realize it. The output unit 52 is realized by the display unit 104 and the output unit 106 shown in FIG. 8. As described above, the functions of the video receiving unit 51, the output unit 52, and the emotion-corresponding signal processing unit 53 may be divided among a plurality of devices.
 As described above, the video providing device 1 of the present embodiment uses sensor information acquired by a sensor that obtains information indicating the emotion of an estimation target person, who is a player or a performer in an event, to estimate the emotion of the estimation target person, and generates an emotion-corresponding signal for outputting at least one of vibration and sound according to the estimation result. The video providing device 1 then transmits the emotion-corresponding signal to the information processing system 5 that receives the video data of the event. This makes it possible to enhance the sense of unity, with the players or performers in the event, of a user who is watching the video of the event.
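 A minimal sketch of this flow, written in Python purely for illustration, is shown below; the vibration patterns, sound-effect names, and function interfaces are assumptions made for this sketch and are not the contents of the publication.

```python
# Illustrative mapping only; the actual output content per emotion is not fixed here.
VIBRATION_PATTERNS_MS = {
    "excitement": [300, 100, 300, 100, 300],
    "tension":    [100, 50, 100],
    "relaxation": [600],
    "anger":      [500, 200, 500],
}

def make_emotion_signal(emotion: str) -> dict:
    """Build a vibration/sound instruction for the user's information processing system."""
    return {
        "vibration_ms": VIBRATION_PATTERNS_MS.get(emotion, []),
        "sound_effect": f"{emotion}.wav",  # hypothetical file name
    }

def on_sensor_update(sensor_info, estimate_emotion, transmit) -> None:
    """End-to-end flow: sensor information -> emotion estimate -> emotion-corresponding signal."""
    emotion = estimate_emotion(sensor_info)    # emotion estimation unit 13
    transmit(make_emotion_signal(emotion))     # sent directly or via the distributor device 4
```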
Embodiment 2.
 FIG. 9 is a diagram showing a configuration example of a signal processing system according to the second embodiment. The signal processing system 100a of the present embodiment includes a video providing device 1a and an information processing system 5a. The signal processing system 100a may also include at least one of the wearable device 3 and the imaging device 2 in addition to the video providing device 1a and the information processing system 5a. The video providing device 1a is the same as the video providing device 1 of Embodiment 1 except that it does not include the emotion-corresponding signal transmission unit 15 of Embodiment 1 and includes an emotion-corresponding signal generation unit 14a instead of the emotion-corresponding signal generation unit 14. The information processing system 5a is the same as the information processing system 5 of Embodiment 1 except that it does not include the emotion-corresponding signal processing unit 53 of Embodiment 1 and includes an output unit 52a instead of the output unit 52. Components having the same functions as in Embodiment 1 are denoted by the same reference signs as in Embodiment 1, and duplicate description is omitted. The differences from Embodiment 1 are mainly described below.
 In the video providing device 1a of the present embodiment, the emotion estimation unit 13 estimates the emotion of the estimation target person, who is a player or a performer, using the sensor information, as in Embodiment 1, and outputs the estimation result to the emotion-corresponding signal generation unit 14a. The emotion-corresponding signal generation unit 14a receives the estimation result output from the emotion estimation unit 13 and also receives the video data from the imaging device 2. When the video data is processed before being provided to the distributor device 4, the video data processed by a video processing unit (not shown) is input to the emotion-corresponding signal generation unit 14a. The processing in the video processing unit is the same as in Embodiment 1. As in Embodiment 1, the video data may also include sound data.
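 To make the data flow concrete, a toy stand-in for the estimation step is sketched below. The field names and thresholds are assumptions introduced only for this sketch; the actual device may estimate the emotion differently, for example with a trained model.

```python
def estimate_emotion(sensor_info: dict) -> str:
    """Toy stand-in for the emotion estimation unit 13 (illustrative thresholds only)."""
    heart_rate = sensor_info.get("heart_rate_bpm", 70)
    conductance = sensor_info.get("skin_conductance", 0.0)

    if heart_rate > 120 and conductance > 0.6:
        return "excitement"
    if heart_rate > 100:
        return "tension"
    if heart_rate < 80 and conductance < 0.2:
        return "relaxation"
    return "anger" if conductance > 0.8 else "neutral"
```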
 The emotion-corresponding signal generation unit 14a uses the estimation result received from the emotion estimation unit 13 to determine the output content in the information processing system 5a. In the present embodiment, the emotion-corresponding signal generation unit 14a superimposes an emotion-corresponding signal indicating the output content in the information processing system 5a on the video data; the emotion-corresponding signal is superimposed on at least one of the video portion of the video data and the sound data included in the video data.
 In the present embodiment as well, the emotion-corresponding signal generation unit 14a holds, for example, output correspondence information indicating the correspondence between emotions and output content in the information processing system 5a as an output correspondence table, and determines the output content using the output correspondence table.
 FIG. 10 is a diagram showing an example of the output correspondence table of the present embodiment. In the example shown in FIG. 10, the emotion-corresponding signal generation unit 14a changes, according to the emotion estimation result, the image quality of the video data (denoted as image quality in FIG. 10), the volume of the sound data (denoted as volume in FIG. 10), and the sound quality of the sound data (denoted as sound quality in FIG. 10), superimposes an animation video or an icon image on the video data (denoted as animation/icon in FIG. 10), and superimposes text, that is, character information, on the video data (denoted as text in FIG. 10). Not all of these need to be performed; it suffices that one or more of them are performed. In the present embodiment, the emotion-corresponding signal is superimposed on the video data by applying the processing illustrated in FIG. 10 to at least one of the video data and the sound data.
 As shown in FIG. 10, for example, when the emotion estimation result indicates excitement, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one with higher luminance and emphasized edges, raise the volume, change the sound quality to one emphasizing the mid-low range, superimpose on the video data an animation (animation video) or an icon (icon image) of a person running across the bottom of the screen, superimpose text such as "Charge!!!" on the video data, or perform a combination of two or more of these. When the emotion estimation result indicates tension, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one with a higher color temperature, lower the volume, change the sound quality to one emphasizing the high range, superimpose an animation of a heart changing in size, superimpose text such as "Yikes!" on the video data, or perform a combination of two or more of these. When the emotion estimation result indicates relaxation, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one with a lower color temperature, raise the volume, change the sound quality to a flat one, superimpose on the video data an animation of an animal drifting slowly across the screen, superimpose text such as "So relaxed..." on the video data, or perform a combination of two or more of these. When the emotion estimation result indicates anger, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one emphasizing red, lower the volume, superimpose on the video data an icon indicating anger so that it is displayed semi-transparently over the entire screen, superimpose text such as "Hmph!" on the video data, or perform a combination of two or more of these.
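 In effect, FIG. 10 is a lookup from the estimated emotion to a set of presentation changes. The sketch below encodes the examples above as a plain dictionary; the field names and value encodings are assumptions, and the actual table held by the emotion-corresponding signal generation unit 14a need not take this form.

```python
OUTPUT_CORRESPONDENCE_TABLE = {
    "excitement": {
        "image_quality": {"brightness": "high", "edge_enhancement": True},
        "volume": "up",
        "sound_quality": "emphasize_mid_low",
        "overlay": "running_person_animation",
        "text": "Charge!!!",
    },
    "tension": {
        "image_quality": {"color_temperature": "high"},
        "volume": "down",
        "sound_quality": "emphasize_high",
        "overlay": "beating_heart_animation",
        "text": "Yikes!",
    },
    "relaxation": {
        "image_quality": {"color_temperature": "low"},
        "volume": "up",
        "sound_quality": "flat",
        "overlay": "drifting_animal_animation",
        "text": "So relaxed...",
    },
    "anger": {
        "image_quality": {"red_emphasis": True},
        "volume": "down",
        "overlay": "translucent_anger_icon",
        "text": "Hmph!",
    },
}

def decide_output(emotion: str) -> dict:
    """Table lookup performed by the emotion-corresponding signal generation unit 14a.

    Any one or more of the returned entries may actually be applied.
    """
    return OUTPUT_CORRESPONDENCE_TABLE.get(emotion, {})
```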
 As described above, the emotion-corresponding signal generation unit 14a may superimpose the emotion-corresponding signal on the video data by changing the video data to an image quality according to the estimation result of the emotion estimation unit 13, or by changing at least one of the volume and the sound quality of the sound data included in the video data according to the estimation result of the emotion estimation unit 13. The emotion-corresponding signal generation unit 14a may also superimpose the emotion-corresponding signal on the video data by superimposing on the video data an animation video or an icon image according to the estimation result of the emotion estimation unit 13, or by superimposing on the video data character information according to the estimation result of the emotion estimation unit 13.
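 To make the image-quality path concrete, the sketch below applies a brightness or red-channel change to a single RGB frame, assuming frames are available as H x W x 3 uint8 NumPy arrays; the adjustment amounts are arbitrary, and in practice this processing would sit inside the device's video pipeline together with the animation, icon, and text overlays.

```python
import numpy as np

def apply_emotion_to_frame(frame: np.ndarray, output: dict) -> np.ndarray:
    """Apply an image-quality entry from the output correspondence table to one RGB frame."""
    frame = frame.astype(np.int16)        # widen to avoid uint8 overflow while adjusting
    quality = output.get("image_quality", {})
    if quality.get("brightness") == "high":
        frame += 30                       # raise overall luminance (illustrative amount)
    if quality.get("red_emphasis"):
        frame[..., 0] += 40               # boost the red channel (assumes RGB channel order)
    return np.clip(frame, 0, 255).astype(np.uint8)
```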
 The example shown in FIG. 10 is merely an example, and the specific method of superimposing the emotion-corresponding signal according to the emotional content is not limited to the example shown in FIG. 10.
 The emotion-corresponding signal generation unit 14a outputs the video data on which the emotion-corresponding signal is superimposed to the video providing unit 11, and the video providing unit 11 transmits this video data to the distributor device 4. When the video data input to the emotion-corresponding signal generation unit 14a includes sound data, the video data on which the emotion-corresponding signal is superimposed also includes sound data: if the volume or sound quality has been changed by the emotion-corresponding signal generation unit 14a as described above, this is the changed sound data, and otherwise it is identical to the input sound data. The distributor device 4 transmits the video data on which the emotion-corresponding signal is superimposed to the information processing system 5a.
 The video receiving unit 51 of the information processing system 5a outputs the video data on which the emotion-corresponding signal is superimposed to the output unit 52a. The output unit 52a includes the display unit 522 and the speaker 523; the display unit 522 displays the video data on which the emotion-corresponding signal is superimposed, and the speaker 523 outputs sound based on the sound data when that video data includes sound data. Through the above processing, the user can perceive the output according to the emotion of the estimation target person by watching the video data on which the emotion-corresponding signal is superimposed, which makes it possible to enhance the sense of unity, with the players or performers in the event, of the user who is watching the video of the event. The operations of the present embodiment other than those described above are the same as in Embodiment 1.
 The video providing device 1a of the present embodiment is realized by a computer system in the same manner as the video providing device 1 of Embodiment 1, and may be realized by a cloud computer system. The information processing system 5a of the present embodiment is also realized by a computer system in the same manner as the information processing system 5 of Embodiment 1; as in Embodiment 1, the information processing system 5a may be the TV 501, the terminal 505, or the combination of the game machine body 503, the controller 504, and the TV 501, or may have a configuration other than these.
<Modification>
 FIG. 9 shows an example in which the emotion-corresponding signal is superimposed on the video data; however, this may be combined with Embodiment 1 so that, in addition to superimposing the emotion-corresponding signal on the video data, an emotion-corresponding signal indicating vibration or the like is also generated. FIG. 11 is a diagram showing a configuration example of a signal processing system according to a modification of the present embodiment. The signal processing system 100b shown in FIG. 11 is the same as the signal processing system 100 of Embodiment 1 except that it includes a video providing device 1b instead of the video providing device 1. The video providing device 1b is the same as the video providing device 1 of Embodiment 1 except that it includes an emotion-corresponding signal generation unit 14b instead of the emotion-corresponding signal generation unit 14 and that the video data on which the emotion-corresponding signal is superimposed is input from the emotion-corresponding signal generation unit 14b to the video providing unit 11.
 The video providing device 1b of the present embodiment is realized by a computer system in the same manner as the video providing device 1 of Embodiment 1, and may be realized by a cloud computer system.
 The emotion-corresponding signal generation unit 14b shown in FIG. 11 generates an emotion-corresponding signal for vibration, that is, an emotion-corresponding signal relating to vibration, as in Embodiment 1, and outputs the generated emotion-corresponding signal to the emotion-corresponding signal transmission unit 15. The emotion-corresponding signal transmission unit 15 transmits the emotion-corresponding signal to the information processing system 5 via the distributor device 4, as in Embodiment 1. In addition, like the emotion-corresponding signal generation unit 14a shown in FIG. 9, the emotion-corresponding signal generation unit 14b superimposes the emotion-corresponding signal on the video data and outputs the video data on which the emotion-corresponding signal is superimposed to the video providing unit 11. The video providing unit 11 transmits the video data on which the emotion-corresponding signal is superimposed to the information processing system 5 via the distributor device 4. As a result, the user perceives the emotion of the players or performers in the event by watching the video data on which the emotion-corresponding signal is superimposed, and also feels the emotion of the players or performers through vibration, as in Embodiment 1. This makes it possible to enhance the sense of unity, with the players or performers in the event, of the user who is watching the video of the event.
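 As a sketch of how the emotion-corresponding signal generation unit 14b might combine the two paths, the snippet below reuses the illustrative helpers from the earlier sketches (make_emotion_signal, decide_output, apply_emotion_to_frame) under the same assumptions; it is not a definitive implementation.

```python
def handle_estimation_result(emotion: str, frame, transmit_signal):
    """Illustrative combination of the two paths handled by the unit 14b in FIG. 11."""
    # 1. Vibration-oriented emotion-corresponding signal, handed to the
    #    emotion-corresponding signal transmission unit 15 (as in Embodiment 1).
    transmit_signal(make_emotion_signal(emotion))

    # 2. Video data with the emotion-corresponding signal superimposed,
    #    handed to the video providing unit 11 (as in FIG. 9).
    return apply_emotion_to_frame(frame, decide_output(emotion))
```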
 The configurations shown in the above embodiments are examples; they may be combined with other known techniques, the embodiments may be combined with each other, and part of the configurations may be omitted or changed without departing from the gist.
 1, 1a, 1b video providing device; 2 imaging device; 3 wearable device; 4 distributor device; 5, 5a information processing system; 11 video providing unit; 12 emotion information acquisition unit; 13 emotion estimation unit; 14, 14a, 14b emotion-corresponding signal generation unit; 15 emotion-corresponding signal transmission unit; 51 video receiving unit; 52, 52a output unit; 53 emotion-corresponding signal processing unit; 100, 100a, 100b signal processing system; 131 trained model generation unit; 132 trained model storage unit; 133 estimation unit; 521 vibration generation unit; 522 display unit; 523 speaker.

Claims (14)

  1.  A signal generation device comprising:
     an emotion information acquisition unit to acquire, from a sensor that acquires information indicating an emotion of an estimation target person who is a player or a performer in an event, sensor information acquired by the sensor;
     an emotion estimation unit to estimate the emotion of the estimation target person using the sensor information;
     an emotion-corresponding signal generation unit to generate an emotion-corresponding signal for causing an output unit in an information processing system of a user to perform output according to an estimation result of the emotion estimation unit; and
     an emotion-corresponding signal transmission unit to transmit the emotion-corresponding signal to the information processing system directly or via another device.
  2.  The signal generation device according to claim 1, wherein the sensor includes at least one of a wearable device worn by the estimation target person and an imaging device capable of capturing video including the estimation target person.
  3.  The signal generation device according to claim 2, wherein the wearable device acquires biological information of the estimation target person as the sensor information.
  4.  The signal generation device according to any one of claims 1 to 3, wherein
     the output unit includes a vibration generation unit in the information processing system capable of transmitting vibration to the user, and
     the emotion-corresponding signal generation unit generates the emotion-corresponding signal that causes the vibration generation unit to vibrate according to the estimation result of the emotion estimation unit.
  5.  The signal generation device according to any one of claims 1 to 3, wherein
     the output unit includes a speaker, and
     the emotion-corresponding signal generation unit generates the emotion-corresponding signal that causes the speaker to output a sound effect or music according to the estimation result of the emotion estimation unit.
  6.  The signal generation device according to any one of claims 1 to 3, wherein the emotion-corresponding signal generation unit superimposes the emotion-corresponding signal on video data obtained by capturing the event.
  7.  The signal generation device according to claim 6, wherein
     the output unit includes a display unit capable of displaying the video data, and
     the emotion-corresponding signal generation unit superimposes the emotion-corresponding signal on the video data by changing the video data to an image quality according to the estimation result of the emotion estimation unit.
  8.  The signal generation device according to claim 6, wherein
     the output unit includes a speaker capable of outputting sound data included in the video data, and
     the emotion-corresponding signal generation unit superimposes the emotion-corresponding signal on the video data by changing at least one of a volume and a sound quality of the sound data included in the video data according to the estimation result of the emotion estimation unit.
  9.  The signal generation device according to claim 6, wherein
     the output unit includes a display unit capable of displaying the video data, and
     the emotion-corresponding signal generation unit superimposes the emotion-corresponding signal on the video data by superimposing, on the video data, an animation video or an icon image according to the estimation result of the emotion estimation unit.
  10.  The signal generation device according to claim 6, wherein the emotion-corresponding signal generation unit superimposes the emotion-corresponding signal on the video data by superimposing, on the video data, character information according to the estimation result of the emotion estimation unit.
  11.  The signal generation device according to any one of claims 1 to 10, wherein the information processing system includes at least one of a mobile terminal, a television, and a game machine.
  12.  A signal processing system comprising:
     a signal generation device; and
     an information processing system of a user, wherein
     the signal generation device comprises:
     an emotion information acquisition unit to acquire, from a sensor that acquires information indicating an emotion of an estimation target person who is a player or a performer in an event, sensor information acquired by the sensor;
     an emotion estimation unit to estimate the emotion of the estimation target person using the sensor information;
     an emotion-corresponding signal generation unit to generate an emotion-corresponding signal for causing the information processing system to perform output according to an estimation result of the emotion estimation unit; and
     an emotion-corresponding signal transmission unit to transmit the emotion-corresponding signal to the information processing system directly or via another device, and
     the information processing system comprises an output unit to perform output based on the emotion-corresponding signal.
  13.  The signal processing system according to claim 12, further comprising a wearable device that is the sensor and acquires biological information of the estimation target person as the sensor information.
  14.  A signal generation method comprising:
     acquiring, from a sensor that acquires information indicating an emotion of an estimation target person who is a player or a performer in an event, sensor information acquired by the sensor;
     estimating the emotion of the estimation target person using the sensor information;
     generating an emotion-corresponding signal for causing an output unit in an information processing system of a user to perform output according to a result of estimating the emotion of the estimation target person; and
     transmitting the emotion-corresponding signal to the information processing system directly or via another device.

