WO2022264203A1 - Signal generation device, signal processing system, and signal generation method - Google Patents

Signal generation device, signal processing system, and signal generation method

Info

Publication number
WO2022264203A1
Authority
WO
WIPO (PCT)
Prior art keywords
emotion
unit
processing system
corresponding signal
signal
Prior art date
Application number
PCT/JP2021/022489
Other languages
French (fr)
Japanese (ja)
Inventor
智明 龍
孝幸 永井
貴文 甲斐
Original Assignee
三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority date
Filing date
Publication date
Application filed by 三菱電機株式会社 (Mitsubishi Electric Corporation)
Priority to PCT/JP2021/022489
Publication of WO2022264203A1

Classifications

    • A — HUMAN NECESSITIES
    • A61 — MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B — DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 — Measuring for diagnostic purposes; Identification of persons
    • A61B 5/16 — Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state

Definitions

  • The present disclosure relates to a signal generation device, a signal processing system, and a signal generation method.
  • Patent Literature 1 discloses a technique in which, in a commercial facility or the like, a management server analyzes a user's physical condition or emotion based on vital values acquired by a wearable device and transmits the analysis result to a terminal of the staff of the commercial facility. This allows the staff to change their response to the user according to changes in the user's physical condition or emotions.
  • However, Patent Literature 1 does not describe the provision of video of an event, nor does it describe what kind of service should be provided to increase the added value of video distribution.
  • The present disclosure has been made in view of the above, and an object thereof is to obtain a signal generation device that can enhance the sense of unity, with a player or performer in an event, of a user who is watching video of the event.
  • The signal generation device according to the present disclosure includes an emotion information acquisition unit that acquires, from a sensor that acquires information indicating the emotion of an estimation target person who is a player or performer in an event, sensor information acquired by the sensor. The signal generation device further includes an emotion estimation unit that estimates the emotion of the estimation target person using the sensor information, an emotion-corresponding signal generation unit that generates an emotion-corresponding signal for causing an output unit in a user's information processing system to perform output according to the estimation result of the emotion estimation unit, and an emotion-corresponding signal transmission unit that transmits the emotion-corresponding signal to the information processing system either directly or via another device.
  • The signal generation device according to the present disclosure has the effect of enhancing the sense of unity, with the athletes or performers in an event, of a user who is watching video of the event.
  • FIG. 1 is a diagram showing a configuration example of a signal processing system according to a first embodiment.
  • FIG. 2 is a flowchart showing an example of operations related to generation of an emotion-corresponding signal in the image providing device of the first embodiment.
  • FIG. 3 is a diagram showing an example of an emotion correspondence table according to the first embodiment.
  • FIG. 4 is a diagram showing a configuration example of the emotion estimation unit of the first embodiment when emotions are estimated by machine learning.
  • FIG. 5 is a schematic diagram showing an example of a neural network.
  • FIG. 6 is a diagram showing an example of an output correspondence table according to the first embodiment.
  • FIG. 7 is a diagram showing a device configuration example of an information processing system according to the first embodiment.
  • FIG. 8 is a diagram showing a configuration example of a computer system that realizes the image providing device of the first embodiment.
  • FIG. 9 is a diagram showing a configuration example of a signal processing system according to a second embodiment.
  • FIG. 10 is a diagram showing an example of an output correspondence table according to the second embodiment.
  • FIG. 11 is a diagram showing a configuration example of a signal processing system according to a modification of the second embodiment.
  • FIG. 1 is a diagram illustrating a configuration example of the signal processing system according to the first embodiment.
  • As shown in FIG. 1, the signal processing system 100 includes a video providing device 1 and an information processing system 5.
  • The image providing device 1 is a signal generation device that generates an emotion-corresponding signal, described later, corresponding to the emotion of a player or performer in an event, and transmits the generated emotion-corresponding signal to the information processing system 5 directly or via another device.
  • The image providing device 1 thus has a function as a signal generation device and also has a function of providing video data obtained by photographing the event to the user's information processing system 5.
  • Specifically, the image providing device 1 provides the user's information processing system 5 with the video data captured by the imaging device 2 that photographs the event, via the distributor device 4 operated by a distributor.
  • The video data provided by the video providing device 1 may be video data obtained by processing the video data captured by the imaging device 2 into, for example, free-viewpoint video data.
  • The image providing device 1 further estimates the emotion of the athlete or performer using at least one of the imaging device 2 and the wearable device 3 worn by the athlete or performer in the event, generates an emotion-corresponding signal corresponding to the estimation result, and provides the generated emotion-corresponding signal to the information processing system 5 via the distributor device 4.
  • The emotion-corresponding signal is a signal that instructs the information processing system 5 to output vibration, sound effects, music, or the like according to the emotion of the player or performer. Details of the emotion-corresponding signal will be described later.
  • In this way, the video providing device 1 provides the video of the event and the emotion-corresponding signal to the information processing system 5 via the distributor device 4.
  • The signal processing system 100 may include not only the image providing device 1 and the information processing system 5 but also at least one of the imaging device 2 and the wearable device 3.
  • The imaging device 2 may photograph all or part of the event, or, when a plurality of imaging devices 2 are provided, an imaging device 2 that tracks and photographs a specific player or performer may be included.
  • A plurality of imaging devices 2 for generating free-viewpoint video may be included, or an imaging device 2 for performing aerial photography using a drone or the like may be included.
  • Events to be photographed by the imaging device 2 of the present embodiment are, for example, sports, concerts, plays, and the like; specific examples include baseball, soccer, volleyball, basketball, martial arts, boat racing, horse racing, bicycle racing (keirin), concerts, and theater, but the events are not limited to these.
  • Venues where events are held include stadiums, multipurpose halls, concert halls, boat racecourses, racetracks, and gymnasiums.
  • The wearable device 3 and the imaging device 2 are examples of sensors that acquire information indicating the emotions of athletes or performers in an event.
  • The wearable device 3 is a sensor capable of detecting biological information, movement, and the like of a player or performer, and is, for example, a watch-type, wristband-type, clothing-type, eyeglass-type, or ring-type sensor, but is not limited to these.
  • The biological information is, for example, at least one of blood pressure, heart rate, body temperature, electroencephalogram, eye movement, and the like, but is not limited to these.
  • The information indicating movement is, for example, at least one of acceleration, biopotential information indicating muscle movement, and the like, but is not limited to these.
  • One player or performer may wear a plurality of different types of wearable devices 3. Further, when there are a plurality of players or performers whose emotions are to be estimated, each of these players or performers wears a wearable device 3.
  • The wearable device 3 may be an imaging device worn by a player or performer.
  • For example, the wearable device 3 may be a wearable camera capable of shooting from the player's or performer's line of sight.
  • The imaging devices 2 that capture the video data to be provided may include an imaging device worn by an athlete or performer. In this case, the imaging device worn by the athlete or performer serves as both the imaging device 2 and the wearable device 3.
  • When the wearable device 3 capable of detecting biological information and movement is used as the sensor, the sensor information acquired by the sensor is information indicating the biological information and movement. When the imaging device 2 is used as the sensor, the acquired sensor information is video data.
  • Communication between the wearable device 3 or the imaging device 2 and the image providing device 1 includes wireless communication, but wireless communication and wired communication may be mixed.
  • For the wireless communication, dedicated lines such as local 5G (5th generation mobile communication system) and private LTE (Long Term Evolution) installed at the venue of the event may be used.
  • For example, local 5G may be used for communication between the imaging device 2 and the image providing device 1, and private LTE may be used for communication between the wearable device 3 and the image providing device 1. Note that the communication methods are not limited to these.
  • As shown in FIG. 1, the image providing device 1 includes an image providing unit 11, an emotion information acquisition unit 12, an emotion estimation unit 13, an emotion-corresponding signal generation unit 14, and an emotion-corresponding signal transmission unit 15.
  • The video providing device 1 acquires video data from the imaging device 2 and transmits the acquired video data to the distributor device 4.
  • The image providing device 1 may process the video data before transmitting it to the distributor device 4, as described above. In this case, a video processing unit (not shown) processes the video data, and the image providing unit 11 transmits the processed video data.
  • Processing by the video processing unit includes, for example, processing for generating free-viewpoint video, processing for converting the resolution, the codec, and the like.
  • The video data may include the sound of the event venue, collected by a sound collector at the event venue and included in the video data as sound data.
  • The signal generation device of the present embodiment only needs to include the emotion information acquisition unit 12, the emotion estimation unit 13, the emotion-corresponding signal generation unit 14, and the emotion-corresponding signal transmission unit 15. Therefore, the image providing device 1 does not have to include the image providing unit 11; in this case, a separate device for providing video data to the user may be provided, or the video data may be provided to the user from the imaging device 2. Further, in this case, the device that provides video data to the user or the imaging device 2 may transmit the video data to the information processing system 5 via the distributor device 4, or may transmit the video data directly to the information processing system 5.
  • The emotion information acquisition unit 12 acquires, from the sensor that acquires information indicating the emotion of the estimation target person who is a player or performer in the event, the sensor information acquired by the sensor.
  • The sensor includes at least one of the wearable device 3 worn by the estimation target person and the imaging device 2 capable of capturing video including the estimation target person.
  • The emotion information acquisition unit 12 outputs the acquired sensor information to the emotion estimation unit 13.
  • The emotion estimation unit 13 estimates the emotion of the estimation target person using the sensor information, and outputs the estimation result to the emotion-corresponding signal generation unit 14.
  • The emotion estimation unit 13 may estimate the emotion using a table that holds correspondence information indicating the correspondence between the numerical range of each item in the sensor information and an emotion, or may estimate the emotion using machine learning. The method of estimating the emotion will be described later.
  • Using the estimation result received from the emotion estimation unit 13, the emotion-corresponding signal generation unit 14 generates an emotion-corresponding signal for causing the output unit 52 in the information processing system 5 to perform output according to the estimation result.
  • The generated emotion-corresponding signal is output to the emotion-corresponding signal transmission unit 15.
  • The emotion-corresponding signal transmission unit 15 transmits the emotion-corresponding signal to the information processing system 5 via the distributor device 4.
  • The emotion-corresponding signal transmission unit 15 may also transmit the emotion-corresponding signal to the information processing system 5 without going through the distributor device 4. In FIG. 1, the image providing unit 11 and the emotion-corresponding signal transmission unit 15 are shown separately, but the image providing unit 11 may also function as the emotion-corresponding signal transmission unit 15, in which case the video data and the emotion-corresponding signal may be transmitted together to the distributor device 4. Although one distributor device 4 is illustrated in FIG. 1, a plurality of distributor devices 4 may be provided, and the plurality of distributor devices 4 may be operated by different distributors. A sketch of this overall flow is shown below.
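  • As an informal illustration only (not part of the disclosure), the division of roles among the units described above can be sketched in Python as follows; all class, field, and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class SensorInfo:
    heart_rate: float      # example biological information
    acceleration: float    # example information indicating movement

@dataclass
class EmotionSignal:
    vibration: Optional[str]   # e.g. "high_frequency_large_amplitude"
    sound: Optional[str]       # e.g. "fanfare"

class SignalGenerationDevice:
    """Hypothetical sketch of units 12-15 of the image providing device 1."""

    def __init__(self,
                 estimate: Callable[[SensorInfo], str],
                 to_signal: Callable[[str], EmotionSignal],
                 send: Callable[[EmotionSignal], None]):
        self.estimate = estimate      # emotion estimation unit 13
        self.to_signal = to_signal    # emotion-corresponding signal generation unit 14
        self.send = send              # emotion-corresponding signal transmission unit 15

    def on_sensor_info(self, info: SensorInfo) -> None:
        # The emotion information acquisition unit 12 hands sensor info to unit 13.
        emotion = self.estimate(info)
        signal = self.to_signal(emotion)
        # Unit 15 sends the signal directly or via the distributor device 4.
        self.send(signal)
```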
  • Upon receiving the video data and the emotion-corresponding signal from the video providing device 1, the distributor device 4 transmits the video data and the emotion-corresponding signal to the information processing system 5 to which the video data is to be distributed.
  • The information processing system 5 can receive the video data and display the received video data.
  • As shown in FIG. 1, the information processing system 5 includes a video receiving unit 51, an output unit 52, and an emotion-corresponding signal processing unit 53.
  • The information processing system 5 may include a television (hereinafter abbreviated as TV) on which television broadcasting can be viewed, may include a TV and a game machine, or may include a terminal such as a smartphone.
  • The video receiving unit 51 receives the video data and the emotion-corresponding signal from the distributor device 4, outputs the received video data to the output unit 52, and outputs the received emotion-corresponding signal to the emotion-corresponding signal processing unit 53.
  • the output unit 52 executes output based on instructions from the emotion-corresponding signal processing unit 53.
  • the output unit 52 includes a vibration generation unit 521 , a display unit 522 and a speaker 523 .
  • FIG. 1 shows an example in which the output unit 52 includes the vibration generation unit 521 , but the information processing system 5 may not include the vibration generation unit 521 .
  • the information processing system 5 including the vibration generating section 521 and the information processing system 5 not including the vibration generating section 521 may be mixed.
  • the vibration generator 521 can transmit vibration to the user.
  • the display unit 522 can display video data.
  • the speaker 523 can output sound data included in the video data, and can output sound effects and music based on the emotion corresponding signal.
  • The emotion-corresponding signal processing unit 53 selects, from among the vibration generation unit 521 and the speaker 523 of the output unit 52, the device that performs the operation according to the emotion-corresponding signal, and instructs the selected device to perform the output indicated by the emotion-corresponding signal. As a result, the output unit 52 executes output based on the emotion-corresponding signal. If the emotion-corresponding signal includes an instruction regarding vibration, the emotion-corresponding signal processing unit 53 selects the vibration generation unit 521; if it includes an instruction regarding sound, the emotion-corresponding signal processing unit 53 selects the speaker 523. When the emotion-corresponding signal includes both an instruction regarding vibration and an instruction regarding sound, the emotion-corresponding signal processing unit 53 selects both the vibration generation unit 521 and the speaker 523. A sketch of this selection follows below.
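  • The selection logic above can be sketched as follows; this is a hedged illustration only, and the signal field names ("vibration", "sound") are assumptions, since the actual signal format is not specified.

```python
def dispatch_emotion_signal(signal: dict, vibration_unit, speaker) -> None:
    """Route an emotion-corresponding signal to the matching output devices.

    `signal` is assumed to look like
    {"vibration": "peaky_intermittent", "sound": "tension_sound_effect"},
    with a key present only when the corresponding instruction is included.
    """
    if signal.get("vibration") is not None:   # instruction regarding vibration
        vibration_unit.vibrate(pattern=signal["vibration"])
    if signal.get("sound") is not None:       # instruction regarding sound
        speaker.play(signal["sound"])
    # If both instructions are present, both devices are selected.
```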
  • FIG. 2 is a flow chart showing an example of operations related to generation of an emotion corresponding signal in the image providing device 1 of the present embodiment.
  • The image providing device 1 acquires sensor information (step S1). Acquisition of sensor information is started, for example, when provision of the video of the event is started.
  • Specifically, the emotion information acquisition unit 12 acquires sensor information from at least one of the imaging device 2 and the wearable device 3 worn by the athlete or performer in the event, and outputs the acquired sensor information to the emotion estimation unit 13.
  • the image providing device 1 uses the sensor information to estimate the emotion of the player or performer (step S2).
  • the emotion estimation unit 13 estimates the emotion of the player or performer using the sensor information, and outputs the estimation result to the emotion corresponding signal generation unit 14 .
  • The emotion estimation unit 13 may estimate the emotion of the player or performer from the sensor information using, for example, an emotion correspondence table, which is a table indicating the correspondence between the numerical range of each item in the sensor information and an emotion, or may estimate the emotion of the player or performer from the sensor information using machine learning.
  • FIG. 3 is a diagram showing an example of an emotion correspondence table according to this embodiment.
  • In the emotion correspondence table, numerical ranges for each item of information, such as blood flow, heart rate, brain waves (brain wave amplitude, frequency, etc.), body movement (acceleration, etc.), and muscle movement (biopotential value, etc.), are stored in association with emotions such as excitement, tension, anger, and relaxation. If an item of the sensor information does not correspond to any of these ranges, the emotion may be determined to be "other".
  • The emotion estimation unit 13 may estimate that the emotion of the player or performer is the corresponding emotion, such as excitement, tension, anger, or relaxation, when the values indicated by the sensor information fall within the corresponding ranges for all of the items of blood flow, heart rate, electroencephalogram, body motion, and muscle motion shown in FIG. 3, or when the values fall within the corresponding range for any one of the items. A method may also be used in which a priority is set for each item of information, and when the emotions corresponding to the items differ, the determination of the item with the higher priority is given precedence.
  • For example, when brain waves are given a higher priority than muscle movement, even if the numerical values of the muscle movement indicated by the sensor information are within the range corresponding to anger, the emotion estimation unit 13 may estimate relaxation if the brain-wave values indicated by the sensor information are within the range corresponding to relaxation. Alternatively, when the emotions corresponding to the items differ in this way, the emotion estimation unit 13 may determine the emotion to be "other". A small sketch of such table-based estimation is shown below.
  • FIG. 3 is an example, and the items stored in the emotion correspondence table are not limited to blood flow, heart rate, electroencephalogram, body movement, and muscle movement; the table may include only some of them, or may include items other than these. Also, the types of emotions stored in the emotion correspondence table are not limited to the example shown in FIG. 3 and may include other types.
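  • A minimal sketch of table-based estimation with per-item priority follows; the numeric ranges and priority values are purely illustrative assumptions, not values taken from the disclosure.

```python
# Hypothetical emotion correspondence table: item -> list of ((low, high), emotion).
EMOTION_TABLE = {
    "heart_rate": [((100, 160), "excitement"), ((85, 100), "tension"), ((40, 70), "relaxation")],
    "muscle_bio": [((0.8, 1.0), "anger"), ((0.0, 0.2), "relaxation")],
    "brain_wave": [((8, 13), "relaxation"), ((13, 30), "tension")],
}
# Higher number = higher priority when the items indicate different emotions.
PRIORITY = {"brain_wave": 3, "heart_rate": 2, "muscle_bio": 1}

def estimate_emotion(sensor_info: dict) -> str:
    votes = []  # (priority, emotion) for every item whose value falls in a range
    for item, value in sensor_info.items():
        for (low, high), emotion in EMOTION_TABLE.get(item, []):
            if low <= value < high:
                votes.append((PRIORITY.get(item, 0), emotion))
                break
    if not votes:
        return "other"  # no item matched any stored range
    # Give precedence to the determination of the item with the highest priority.
    return max(votes, key=lambda v: v[0])[1]

# Brain waves (priority 3) override the heart-rate match, as in the text above.
print(estimate_emotion({"heart_rate": 120, "brain_wave": 10}))  # -> "relaxation"
```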
  • FIG. 4 is a diagram showing a configuration example of the emotion estimating section 13 of the present embodiment when estimating an emotion by machine learning.
  • the emotion estimation unit 13 includes a learned model generation unit 131 , a learned model storage unit 132 and an estimation unit 133 .
  • The estimation unit 133 reads the trained model stored in the trained model storage unit 132 and inputs the sensor information received from the emotion information acquisition unit 12 into the read trained model, thereby estimating the emotion of the estimation target person. That is, the output obtained by inputting the sensor information received from the emotion information acquisition unit 12 into the trained model is used as the estimation result of the emotion of the estimation target person.
  • The trained model is a model for estimating, from the sensor information, the emotion of the estimation target person who is a player or performer, and is generated by the trained model generation unit 131 as follows.
  • The trained model generation unit 131 generates the trained model using a plurality of learning data sets each including sensor information input from the emotion information acquisition unit 12 and correct data corresponding to the sensor information.
  • The generated trained model is stored in the trained model storage unit 132.
  • a trained model is generated before the video of the event is provided.
  • the sensor information input to the learned model generation unit 131 is not limited to that input from the emotion information acquisition unit 12, and may be learning sensor information acquired for learning.
  • the sensor information for learning includes information of similar items in the same format as the sensor information.
  • the sensor information for learning may be input to the image providing apparatus 1 by input means (not shown) and input to the trained model generation unit 131 from the input means, or may be transmitted from another device and received by receiving means (not shown). It may be input to the learned model generation unit 131 from the means.
  • The correct data is data indicating which of the above-described emotions, such as excitement, tension, anger, and relaxation, is the correct answer for the emotion corresponding to the sensor information.
  • The correct data may be determined, for example, by asking the subject from whom the sensor information was acquired which emotion corresponds to the sensor information, or an expert or the like may examine the sensor information and determine the correct data.
  • The correct data may be input to the image providing device 1 by an input means (not shown) and passed from the input means to the trained model generation unit 131, or may be transmitted from another device, received by a receiving means (not shown), and passed from the receiving means to the trained model generation unit 131.
  • the generation of a trained model in the trained model generation unit 131 is performed, for example, by supervised learning. Any supervised learning algorithm may be used, and for example, a neural network model may also be used.
  • a neural network consists of an input layer made up of multiple neurons, an intermediate layer (hidden layer) made up of multiple neurons, and an output layer made up of multiple neurons.
  • the intermediate layer may be one layer, or two or more layers.
  • FIG. 5 is a schematic diagram showing an example of a neural network.
  • For example, in a three-layer neural network as shown in FIG. 5, when a plurality of inputs are supplied to the input layer (X1-X3), their values are multiplied by the weights W1 (w11-w16) and input to the intermediate layer (Y1-Y2), and those results are multiplied by the weights W2 (w21-w26) and output from the output layer (Z1-Z3). The output results change depending on the values of the weights W1 and W2.
  • The relationship between the sensor information and the correct data is learned by adjusting the weights W1 and W2 so that the output from the output layer when the sensor information is input approaches the correct data. A worked forward pass for this structure is sketched below.
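  • The following NumPy sketch mirrors the three-layer forward pass just described (three inputs X1-X3, two intermediate neurons Y1-Y2, three outputs Z1-Z3); the weight values and the sigmoid activation are illustrative assumptions only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative weights: W1 holds w11..w16, W2 holds w21..w26.
W1 = np.array([[ 0.2, -0.5],
               [ 0.7,  0.1],
               [-0.3,  0.4]])        # shape (3 inputs, 2 intermediate neurons)
W2 = np.array([[ 0.6, -0.2,  0.3],
               [ 0.1,  0.8, -0.4]])  # shape (2 intermediate neurons, 3 outputs)

def forward(x: np.ndarray) -> np.ndarray:
    """x: sensor features (X1-X3) -> scores for emotion classes (Z1-Z3)."""
    y = sigmoid(x @ W1)  # inputs multiplied by W1, fed to the intermediate layer
    z = y @ W2           # intermediate values multiplied by W2, output from the output layer
    return z

print(forward(np.array([0.9, 0.1, 0.4])))
# Training adjusts W1 and W2 so that this output approaches the correct data.
```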
  • machine learning algorithms are not limited to neural networks. Reinforcement learning or the like may also be used as machine learning.
  • When numerical values such as biological information are used as the sensor information, the numerical values are input to the input layer of the trained model generation unit 131. When the sensor information includes information on a plurality of items, the information on each item is input to the input layer as X1 to X3, respectively.
  • Although FIG. 5 shows an example with three inputs and three outputs, the number of inputs and outputs is not limited to this example.
  • When video data is used as the sensor information, still image data extracted from the video data at fixed time intervals may be used as the input data for machine learning, or all of the video data within a fixed period may be used as the input data. When both numerical values such as biological information and image data are used as the sensor information, all of these are used as the input data for machine learning. One hedged illustration of the supervised training step follows.
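  • The supervised learning itself could be realized with any framework; as one hedged illustration only (not the method prescribed by the disclosure), a small scikit-learn classifier trained on numeric sensor features might look like this. The feature layout and values are assumptions.

```python
from sklearn.neural_network import MLPClassifier

# Hypothetical learning data sets: each row is one sensor-information sample
# (e.g. heart rate, acceleration, muscle biopotential); y holds the correct data.
X_train = [[112.0, 3.1, 0.50],   # labeled "excitement"
           [ 95.0, 0.8, 0.40],   # labeled "tension"
           [ 60.0, 0.1, 0.10],   # labeled "relaxation"
           [ 88.0, 2.5, 0.95]]   # labeled "anger"
y_train = ["excitement", "tension", "relaxation", "anger"]

# Generate the "trained model" (trained model generation unit 131).
model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
model.fit(X_train, y_train)

# Estimation (estimation unit 133): feed new sensor information to the model.
print(model.predict([[105.0, 2.7, 0.45]]))
```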
  • a trained model may be generated for each estimated target, or a common trained model may be generated without distinguishing the estimation targets.
  • a trained model may be generated for each type of sport, each venue for an event, or the like.
  • At least one of the information acquired by the wearable device 3 and the video data acquired by the imaging device 2 may be used as the sensor information.
  • In the example described above, the emotion estimation unit 13 includes the trained model generation unit 131.
  • However, a learning device that generates the trained model may be provided separately from the image providing device 1, and the learning device may include the trained model generation unit 131.
  • In this case, the emotion estimation unit 13 does not need to include the trained model generation unit 131; the trained model generation unit 131 of the learning device generates the trained model in the same manner as described above, and the trained model generated by the learning device is stored in the trained model storage unit 132 of the emotion estimation unit 13.
  • Position information indicating the position of the estimation target person acquired by the wearable device 3 may also be used for estimating the emotion.
  • For example, when the event is a soccer match, even if the biometric information and the like are the same, the emotion may differ between the case where the estimation target person is near the opponent's goal, the case where the person is near their own team's goal, and other cases. Therefore, when the emotion is estimated by machine learning, the position within the soccer field may be used as one piece of sensor information, for example.
  • When the emotion correspondence table is used, the positions within the soccer field may be divided into a plurality of regions in advance, and the range of each item of information defined in the emotion correspondence table may be corrected according to the region in which the estimation target person is located, as sketched below.
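  • One possible way to sketch the position-dependent correction is shown here; the region names and offset values are purely illustrative assumptions.

```python
# Hypothetical correction offsets applied to a heart-rate range depending on
# the region of the field in which the estimation target person is located.
REGION_OFFSET = {
    "near_opponent_goal": 10.0,
    "near_own_goal": 5.0,
    "other": 0.0,
}

def corrected_range(base_range, region):
    """Shift a (low, high) range from the emotion correspondence table
    according to the region in which the estimation target person is located."""
    low, high = base_range
    offset = REGION_OFFSET.get(region, 0.0)
    return (low + offset, high + offset)

print(corrected_range((100, 160), "near_opponent_goal"))  # -> (110.0, 170.0)
```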
  • The image providing device 1 then uses the estimation result to generate an emotion-corresponding signal (step S3).
  • Specifically, the emotion-corresponding signal generation unit 14 uses the estimation result received from the emotion estimation unit 13 to generate an emotion-corresponding signal corresponding to the emotion indicated by the estimation result, and outputs the generated emotion-corresponding signal to the emotion-corresponding signal transmission unit 15.
  • For example, the emotion-corresponding signal generation unit 14 holds, as an output correspondence table, output correspondence information indicating the correspondence between emotions and output contents in the information processing system 5, determines the output contents using the held output correspondence table, and generates an emotion-corresponding signal corresponding to the determined output contents.
  • FIG. 6 is a diagram showing an example of the output correspondence table of this embodiment.
  • In the output correspondence table, output contents are defined for each type of emotion with respect to each of the vibration function, sound effects, and music.
  • For example, in the example shown in FIG. 6, when the emotion is excitement, the output content of the vibration function is high-frequency vibration with a large amplitude, the output content of the sound effect is a fanfare or a sound effect expressing excitement as in comics, and the output content of the music is movie music or game music expressing excitement.
  • When the emotion is tension, the output content of the vibration function is peaky, intermittent vibration, the output content of the sound effect is a sound effect expressing tension as in comics or movies, and the output content of the music is movie music or game music expressing tension.
  • When the emotion is relaxation, the output content of the vibration function is low-frequency vibration with fluctuation, the output content of the sound effect is the sound of waves, the chirping of birds, or the babbling of a stream, and the output content of the music is classical music with a relaxing effect or music with a slow tempo.
  • the emotion corresponding signal generating section 14 may generate an emotion corresponding signal that causes the vibration generating section 521 of the information processing system 5 to vibrate according to the estimation result of the emotion estimating section 13.
  • the emotion corresponding signal may be generated to cause the speaker 523 to output a sound effect or music corresponding to the result of estimation by the emotion estimation unit 13 .
  • The output contents may also be changed according to the position information indicating the position of the estimation target person acquired by the wearable device 3.
  • For example, the positions within a soccer field may be divided into a plurality of areas in advance, and the output contents may be determined according to the area in which the estimation target person is located. For example, even if the estimated emotion is excitement, the output contents may be defined for each area so that they differ between the case where the person is near the opponent's goal, the case where the person is near their own team's goal, and other cases.
  • Similarly, in a concert or the like, the stage may be divided into a plurality of areas, and even for the same emotion estimation result, the output contents may be defined for each area so that they change depending on the area in which the estimation target person is located.
  • FIG. 6 is an example, and the specific output contents are not limited to the example shown in FIG. 6. Further, when determining the output contents, the emotion-corresponding signal generation unit 14 does not need to select all of the vibration function, the sound effect, and the music as outputs to be executed by the information processing system 5, and may select at least one of them. When the output contents are determined using the output correspondence table, the emotion-corresponding signal generation unit 14 generates an emotion-corresponding signal indicating an instruction to cause the information processing system 5 to execute the determined output contents, as sketched below.
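  • A hedged sketch of how the output correspondence table might drive generation of the emotion-corresponding signal follows; the output descriptors are illustrative strings, not the actual signal format of the disclosure.

```python
# Hypothetical output correspondence table (cf. FIG. 6).
OUTPUT_TABLE = {
    "excitement": {"vibration": "high_frequency_large_amplitude",
                   "sound_effect": "fanfare",
                   "music": "exciting_game_music"},
    "tension":    {"vibration": "peaky_intermittent",
                   "sound_effect": "tension_sound_effect",
                   "music": "tense_movie_music"},
    "relaxation": {"vibration": "low_frequency_with_fluctuation",
                   "sound_effect": "sound_of_waves",
                   "music": "slow_tempo_music"},
}

def generate_emotion_signal(estimated_emotion: str,
                            outputs=("vibration", "sound_effect")) -> dict:
    """Build an emotion-corresponding signal containing only the selected outputs;
    as noted above, not all of vibration, sound effect, and music need to be used."""
    contents = OUTPUT_TABLE.get(estimated_emotion, {})
    return {key: contents[key] for key in outputs if key in contents}

print(generate_emotion_signal("tension"))
# {'vibration': 'peaky_intermittent', 'sound_effect': 'tension_sound_effect'}
```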
  • the image providing device 1 transmits an emotion corresponding signal (step S4).
  • the emotion-responsive signal transmitting unit 15 transmits the emotional-responsive signal received from the emotion-responsive signal generating unit 14 to the distributor device 4 .
  • the emotion corresponding signal arrives at the information processing system 5 via the distributor device 4 .
  • the emotion corresponding signal may be transmitted to the distributor device 4 together with the video data.
  • When there are a plurality of players or performers whose emotions can be estimated, the emotion estimation unit 13 may select a specific estimation target person, the target person may be designated by the operator via an input means (not shown), or the user may select the person whose emotion-corresponding signal is to be transmitted. For example, when the imaging device 2 tracks and photographs a specific player or performer, the tracked player or performer becomes the estimation target person for the video data captured by that imaging device 2. When a plurality of performers are included in the captured data, as in an idol group concert, the image providing device 1 may generate the emotion-corresponding signal using an average value of the estimation results of the emotions of the photographed performers.
  • Alternatively, the video providing device 1 may transmit an emotion-corresponding signal for each of the plurality of performers to the distributor device 4, and the distributor device 4 may acquire the user's selection result and transmit the emotion-corresponding signal corresponding to the selection result to the information processing system 5.
  • Similarly, for a team sport, the video providing device 1 may estimate the emotion of an entire team by using the average value of the emotion estimation results of its players to generate an emotion-corresponding signal, and the emotion-corresponding signal corresponding to the team selected by the user may be transmitted to the information processing system 5. One way such averaging could be interpreted is sketched below.
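  • Because the individual estimation results are categorical, "averaging" them could be interpreted, for instance, as averaging per-emotion confidence scores across the members; the following is one hedged interpretation, not a method fixed by the disclosure.

```python
from collections import defaultdict

def aggregate_group_emotion(member_scores):
    """member_scores: one dict per player/performer mapping emotion -> confidence,
    e.g. [{"excitement": 0.7, "tension": 0.3}, {"excitement": 0.4, "tension": 0.6}]."""
    totals = defaultdict(float)
    for scores in member_scores:
        for emotion, score in scores.items():
            totals[emotion] += score
    n = len(member_scores)
    averaged = {emotion: total / n for emotion, total in totals.items()}
    # The group emotion is the one with the highest averaged confidence.
    return max(averaged, key=averaged.get), averaged

emotion, avg = aggregate_group_emotion([
    {"excitement": 0.7, "tension": 0.3},
    {"excitement": 0.4, "tension": 0.6},
])
print(emotion)  # -> excitement (averaged confidences of roughly 0.55 vs 0.45)
```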
  • When video data captured by a wearable camera is provided, the video providing device 1 estimates the emotion using sensor information acquired from the wearable device 3 worn by the player or performer corresponding to that wearable camera, and uses the estimation result to generate the emotion-corresponding signal.
  • FIG. 7 is a diagram showing a device configuration example of the information processing system 5 of this embodiment.
  • FIG. 7 shows device configuration examples of information processing systems 5-1 to 5-4, each of which is an example of the information processing system 5.
  • The information processing system 5-1 includes a TV 501 and a speaker 502, the information processing system 5-2 includes the TV 501, a game machine body 503, and a controller 504, the information processing system 5-3 includes the TV 501 alone, and the information processing system 5-4 includes a terminal 505 such as a smartphone.
  • The TV 501 generally incorporates a display unit and a speaker, and can display the video data of the event. The TV 501 can also output sound when the video data includes sound data. Furthermore, the TV 501 can perform output corresponding to the emotion-corresponding signal when the emotion-corresponding signal is an instruction regarding sound. Therefore, like the information processing system 5-3 in FIG. 7, the information processing system 5 may consist of the TV 501 alone.
  • In such a configuration, the output unit 52 does not include the vibration generation unit 521.
  • When an external speaker 502 is connected to the TV 501, as in the information processing system 5-1, sound data included in the video data is input to the speaker 502 via the TV 501, and the speaker 502 performs output corresponding to the sound data.
  • When the emotion-corresponding signal is an instruction regarding sound, the TV 501 instructs the speaker 502 to perform the output indicated by the emotion-corresponding signal, whereby the speaker 502 outputs sound effects, music, and the like based on the emotion-corresponding signal.
  • That is, when the information processing system 5 consists of the TV 501 and the speaker 502, the TV 501 is equipped with the emotion-corresponding signal processing unit 53, and the speaker 523 of the output unit 52 corresponds to the speaker 502.
  • In the information processing system 5-2, the game machine body 503 may receive the video data and the emotion-corresponding signal and cause the TV 501 to display the video data.
  • The game machine body 503 is a game machine capable of running games called video games, computer games, and the like.
  • The controller 504 is a game controller corresponding to the game machine body 503; it can receive input regarding application software executed on the game machine body 503 and can vibrate itself.
  • The game machine body 503 causes the TV 501 to output sound by outputting sound data to the TV 501, and when the emotion-corresponding signal indicates an instruction regarding sound, the game machine body 503 instructs the TV 501 to perform the output indicated by the emotion-corresponding signal, whereby the TV 501 performs output based on the emotion-corresponding signal. Further, when the emotion-corresponding signal indicates an instruction regarding vibration, the game machine body 503 instructs the controller 504 to perform the output indicated by the emotion-corresponding signal, whereby the controller 504 vibrates based on the emotion-corresponding signal.
  • That is, when the information processing system 5 includes the TV 501, the game machine body 503, and the controller 504, as in the information processing system 5-2, the game machine body 503 is provided with the video receiving unit 51 and the emotion-corresponding signal processing unit 53 of the information processing system 5 shown in FIG. 1, the TV 501 is provided with the speaker 523 and the display unit 522 of the output unit 52, and the controller 504 is provided with the vibration generation unit 521 of the output unit 52.
  • Alternatively, the game machine body 503 may include the speaker 523 and the display unit 522 of the output unit 52, and the game machine body 503 may display the video data and output the sound effects and music based on the emotion-corresponding signal.
  • When the information processing system 5 is a terminal 505 such as a smartphone, the terminal 505 is provided with the video receiving unit 51, the output unit 52, and the emotion-corresponding signal processing unit 53 of the information processing system 5 shown in FIG. 1. Since the terminal 505 generally has functions for outputting vibration, display, and sound, the terminal 505 can display the video data and can also output the vibration, sound effects, music, and the like indicated by the emotion-corresponding signal.
  • A device that receives the video data and the emotion-corresponding signal, such as the TV 501, the game machine body 503, or the terminal 505 described above, can receive the video data and the emotion-corresponding signal and perform the operations corresponding to them, for example, by installing application software.
  • The information processing system 5 may be realized by a single device, or may be realized by a combination of a plurality of devices.
  • The configuration of the information processing system 5 described above is an example; the display of the video data and the output of the sound effects and music based on the emotion-corresponding signal may also be performed by a personal computer or the like, and the configuration is not limited to the examples described above.
  • FIG. 8 is a diagram showing a configuration example of a computer system that implements the image providing device 1 of this embodiment. As shown in FIG. 8, this computer system comprises a control unit 101, an input unit 102, a storage unit 103, a display unit 104, a communication unit 105, and an output unit 106, which are connected via a system bus 107.
  • control unit 101 is, for example, a processor such as a CPU (Central Processing Unit), and executes a program describing the processing in the image providing device 1 of this embodiment.
  • part of the control unit 101 may be realized by dedicated hardware such as a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array).
  • the input unit 102 is composed of, for example, a keyboard and a mouse, and is used by the user of the computer system to input various information.
  • the storage unit 103 includes various memories such as RAM (Random Access Memory) and ROM (Read Only Memory) and storage devices such as hard disks, and stores programs to be executed by the control unit 101 and necessary information obtained in the process of processing. store data, etc.
  • the storage unit 103 is also used as a temporary storage area for programs.
  • the display unit 104 includes a display, LCD (liquid crystal display panel), etc., and displays various screens to the user of the computer system.
  • a communication unit 105 is a receiver and a transmitter that perform communication processing.
  • the output unit 106 is a printer, speaker, or the like. Note that FIG. 8 is an example, and the configuration of the computer system is not limited to the example in FIG.
  • A computer program is installed in the storage unit 103 from, for example, a CD-ROM or DVD-ROM set in a CD (Compact Disc)-ROM drive or a DVD (Digital Versatile Disc)-ROM drive (not shown).
  • Then, when the program is executed, the program read from the storage unit 103 is stored in the main storage area of the storage unit 103. In this state, the control unit 101 executes the processing of the image providing device 1 of this embodiment according to the program stored in the storage unit 103.
  • a program describing the processing in the image providing apparatus 1 is provided using a CD-ROM or DVD-ROM as a recording medium, but the configuration of the computer system and the provided program are not limited to this.
  • a program provided by a transmission medium such as the Internet via the communication unit 105 may be used depending on the capacity of the computer.
  • The emotion estimation unit 13 and the emotion-corresponding signal generation unit 14 shown in FIG. 1 are realized by the control unit 101 shown in FIG. 8 executing a computer program stored in the storage unit 103 shown in FIG. 8.
  • The storage unit 103 is also used to implement the emotion estimation unit 13 and the emotion-corresponding signal generation unit 14. The image providing unit 11, the emotion information acquisition unit 12, and the emotion-corresponding signal transmission unit 15 shown in FIG. 1 are implemented by the communication unit 105 shown in FIG. 8.
  • The control unit 101 is also used to realize the image providing unit 11, the emotion information acquisition unit 12, and the emotion-corresponding signal transmission unit 15 shown in FIG. 1.
  • the image providing device 1 may be realized by a plurality of computer systems.
  • the image providing device 1 may be realized by a cloud computer system.
  • The distributor device 4 is realized, for example, by a computer system having the configuration shown in FIG. 8.
  • The information processing system 5 is similarly realized, for example, by a computer system having the configuration shown in FIG. 8.
  • The emotion-corresponding signal processing unit 53 shown in FIG. 1 is realized by the control unit 101 shown in FIG. 8 executing a computer program stored in the storage unit 103 shown in FIG. 8.
  • The video receiving unit 51 is realized by the communication unit 105 shown in FIG. 8.
  • The control unit 101 is also used to realize the video receiving unit 51.
  • The output unit 52 is implemented by the display unit 104 and the output unit 106 shown in FIG. 8. Note that, as described above, the functions of the video receiving unit 51, the output unit 52, and the emotion-corresponding signal processing unit 53 may be divided and realized by a plurality of devices.
  • As described above, the image providing device 1 of the present embodiment estimates the emotion of the estimation target person, who is a player or performer in the event, using the sensor information acquired by the sensor that acquires information indicating the emotion of the estimation target person, and generates an emotion-corresponding signal for outputting at least one of vibration and sound according to the estimation result. The image providing device 1 then transmits the emotion-corresponding signal to the information processing system 5 that receives the video data of the event. Therefore, it is possible to enhance the sense of unity, with the players or performers in the event, of the user viewing the video of the event.
  • FIG. 9 is a diagram illustrating a configuration example of a signal processing system according to a second embodiment
  • a signal processing system 100a of the present embodiment includes a video providing device 1a and an information processing system 5a.
  • the signal processing system 100a may include at least one of the wearable device 3 and the imaging device 2 in addition to the image providing device 1a and the information processing system 5a.
  • The image providing device 1a is the same as the image providing device 1 of the first embodiment except that it does not include the emotion-corresponding signal transmission unit 15 of the first embodiment and includes an emotion-corresponding signal generation unit 14a instead of the emotion-corresponding signal generation unit 14.
  • The information processing system 5a is the same as the information processing system 5 of the first embodiment except that it does not include the emotion-corresponding signal processing unit 53 of the first embodiment and includes an output unit 52a instead of the output unit 52.
  • Components having functions similar to those of the first embodiment are denoted by the same reference numerals as in the first embodiment, and duplicate descriptions are omitted. Differences from the first embodiment will be mainly described below.
  • the emotion estimation unit 13 uses sensor information to estimate the emotion of the target player or performer, as in the first embodiment.
  • the emotion estimator 13 outputs the estimation result to the emotion corresponding signal generator 14a.
  • the estimation result output from the emotion estimation unit 13 is input to the emotion corresponding signal generation unit 14a, and video data is input from the imaging device 2 as well.
  • the image data processed by the image processing unit (not shown) is input to the emotion corresponding signal generation unit 14a. Processing in the video processing unit is the same as in the first embodiment.
  • the video data may also include sound data.
  • the emotion corresponding signal generation unit 14a determines the content of output in the information processing system 5a.
  • the emotion-corresponding signal generator 14a superimposes an emotion-corresponding signal indicating the content of output in the information processing system 5a on the video data.
  • the emotion corresponding signal is superimposed on at least one of the video portion of the video data and the sound data included in the video data.
  • the emotion-corresponding signal generator 14a holds, as an output-correspondence table, output-correspondence information indicating correspondence between emotions and output contents in the information processing system 5a, and uses the output-correspondence table to generate output contents. to decide.
  • FIG. 10 is a diagram showing an example of the output correspondence table of this embodiment.
  • In this example, according to the emotion estimation result, the emotion-corresponding signal generation unit 14a changes the image quality of the video data (described as "image quality" in FIG. 10), the volume of the sound data (described as "volume" in FIG. 10), and the sound quality of the sound data (described as "sound quality" in FIG. 10), superimposes an animation video or an icon image on the video data (described as "animation/icon" in FIG. 10), and superimposes text, that is, character information, on the video data (described as "text" in FIG. 10). It is not necessary to perform all of these; one or more of them may be performed.
  • By performing these, the emotion-corresponding signal is superimposed on the video data.
  • For example, when the emotion estimation result indicates excitement, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to a high-luminance, edge-enhanced image quality, may superimpose text such as "Assault!!!" on the video data, or may combine two or more of these.
  • For example, when the emotion estimation result indicates tension, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one set to a high color temperature, may lower the volume, may change the sound quality to one that emphasizes the high-frequency range, may superimpose an animation of a heart changing in size, may superimpose text such as "Yabai!" on the video data, or may combine two or more of these. Further, for example, when the emotion estimation result indicates relaxation, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one set to a lower color temperature, may raise the volume, may change the sound quality to a flat sound quality, may superimpose an animation of an animal slowly floating across the screen on the video data, may superimpose text such as "Mattari~" on the video data, or may combine two or more of these.
  • For example, when the emotion estimation result indicates anger, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one that emphasizes red, may lower the volume, may superimpose an icon indicating anger on the video data so that the icon is displayed semi-transparently over the entire screen, may superimpose text such as "Hmm!" on the video data, or may combine two or more of these.
  • the emotion-corresponding signal generation unit 14a may superimpose the emotion-corresponding signal on the image data by changing the image quality of the image data according to the estimation result of the emotion estimation unit 13.
  • the emotion corresponding signal may be superimposed on the video data by changing at least one of the volume and sound quality of the included sound data according to the estimation result of the emotion estimation unit 13 .
  • the emotion-corresponding signal generation unit 14a may superimpose the emotion-corresponding signal on the video data by superimposing an animation image or an icon image corresponding to the estimation result of the emotion estimating unit 13 on the video data.
  • the emotion corresponding signal may be superimposed on the video data by superimposing character information corresponding to the estimation result by the emotion estimation unit 13 on the image data.
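  • As a hedged OpenCV/NumPy sketch only, superimposing the emotion-corresponding signal on a video frame (an image-quality change plus a text overlay) might look like this; the per-emotion adjustments and text are illustrative assumptions.

```python
import cv2
import numpy as np

# Illustrative per-emotion presentation parameters (cf. FIG. 10).
PRESETS = {
    "excitement": {"gain": 1.3, "text": "Assault!!!"},
    "relaxation": {"gain": 0.9, "text": "Mattari~"},
}

def superimpose(frame: np.ndarray, emotion: str) -> np.ndarray:
    """Return a frame whose image quality is changed and on which character
    information is superimposed according to the estimated emotion."""
    preset = PRESETS.get(emotion)
    if preset is None:
        return frame
    # Change the image quality (here: a simple luminance gain).
    out = cv2.convertScaleAbs(frame, alpha=preset["gain"], beta=0)
    # Superimpose character information on the video.
    cv2.putText(out, preset["text"], (30, 60),
                cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 255), 3)
    return out
```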
  • the emotion-corresponding signal generation unit 14a outputs the image data on which the emotion-corresponding signal is superimposed to the image providing unit 11.
  • the image providing unit 11 transmits the image data on which the emotion corresponding signal is superimposed to the distributor device 4 .
  • When the video data input to the emotion-corresponding signal generation unit 14a includes sound data, the video data on which the emotion-corresponding signal is superimposed also includes sound data.
  • This sound data is the changed sound data when the volume or sound quality has been changed by the emotion-corresponding signal generation unit 14a, and is the same as the input sound data when the emotion-corresponding signal generation unit 14a has not changed the sound data.
  • the distributor device 4 transmits the video data superimposed with the emotion corresponding signal to the information processing system 5a.
  • the image receiving unit 51 of the information processing system 5a outputs the image data superimposed with the emotion corresponding signal to the output unit 52a.
  • the output unit 52a includes a display unit 522 and a speaker 523, and the display unit 522 displays video data superimposed with the emotion corresponding signal.
  • the speaker 523 outputs sound based on the sound data when the video data on which the emotion corresponding signal is superimposed includes the sound data.
  • the image providing device 1a of the present embodiment is realized by a computer system, like the image providing device 1 of the first embodiment.
  • the image providing device 1a may be realized by a cloud computer system.
  • The information processing system 5a of the present embodiment is also realized by a computer system in the same manner as the information processing system 5 of the first embodiment, and may be, for example, the terminal 505, or the game machine body 503, the controller 504, and the TV 501, or any other configuration.
  • FIG. 9 shows an example in which the emotion-corresponding signal is superimposed on the video data, but the emotion-corresponding signal for vibration may also be transmitted separately, as in the first embodiment, in addition to superimposing the emotion-corresponding signal on the video data.
  • FIG. 11 is a diagram showing a configuration example of a signal processing system according to a modification of this embodiment.
  • The signal processing system 100b shown in FIG. 11 is the same as the signal processing system 100 of the first embodiment except that an image providing device 1b is provided instead of the image providing device 1.
  • The image providing device 1b includes an emotion-corresponding signal generation unit 14b instead of the emotion-corresponding signal generation unit 14, and the video data on which the emotion-corresponding signal is superimposed is input to the image providing unit 11 from the emotion-corresponding signal generation unit 14b; other than that, it is the same as the image providing device 1 of the first embodiment.
  • the image providing device 1b of the present embodiment is realized by a computer system, like the image providing device 1 of the first embodiment.
  • the image providing device 1b may be realized by a cloud computer system.
  • The emotion-corresponding signal generation unit 14b shown in FIG. 11 generates an emotion-corresponding signal for vibration, which is an emotion-corresponding signal related to vibration, in the same manner as in the first embodiment, and outputs the generated emotion-corresponding signal to the emotion-corresponding signal transmission unit 15.
  • The emotion-corresponding signal transmission unit 15 transmits this emotion-corresponding signal to the information processing system 5 via the distributor device 4 in the same manner as in the first embodiment.
  • the emotion corresponding signal generation unit 14b superimposes the emotion corresponding signal on the video data, and outputs the video data superimposed with the emotion corresponding signal to the video providing unit 11, similarly to the emotion corresponding signal generation unit 14a shown in FIG. do.
  • the image providing unit 11 transmits the image data superimposed with the emotion corresponding signal to the information processing system 5 via the distributor device 4 .
  • Thus, the user can perceive the emotions of the athletes or performers in the event by viewing the video data on which the emotion-corresponding signal is superimposed, and can also feel the emotions of the athletes or performers through vibration as in the first embodiment. This makes it possible to enhance the sense of unity, with the players or performers in the event, of the user who is watching the video of the event.

Abstract

The signal generation device according to the present disclosure is a video providing device (1) comprising: an emotional information acquisition unit (12) for acquiring sensor information from at least one of a wearable device (3) and imaging device (2) that acquire sensor information showing an emotion of an estimation target who is a player or performer in an event; an emotion estimation unit (13) for estimating the emotion of the estimation target using the sensor information; an emotion-associated signal generation unit (14) for generating an emotion-associated signal that causes an output unit (52) in an information processing system (5) for a user to execute output according to the estimation result from the emotion estimation unit (13); and an emotion-associated signal transmission unit (15) for transmitting the emotion-associated signal to the information processing system (5) directly or via another device.

Description

信号生成装置、信号処理システムおよび信号生成方法SIGNAL GENERATOR, SIGNAL PROCESSING SYSTEM AND SIGNAL GENERATING METHOD
 本開示は、信号生成装置、信号処理システムおよび信号生成方法に関する。 The present disclosure relates to a signal generation device, a signal processing system, and a signal generation method.
 近年、ウェアラブル装置により取得されたバイタル値、撮影装置によって撮影された映像を解析することで、人の体調、感情などを推定する技術への注目が高まっている。例えば、特許文献1には、商業施設などにおいて、管理サーバが、ウェアラブル装置により取得されたバイタル値をもとにユーザの体調または感情を分析し、分析結果を、商業施設のスタッフの端末へ送信する技術が開示されている。これにより、スタッフが、ユーザの体調または感情の変化に合わせて、当該ユーザへの対応を変えることができる。 In recent years, there has been increasing interest in technologies that estimate a person's physical condition and emotions by analyzing vital values acquired by wearable devices and images captured by imaging devices. For example, in Patent Literature 1, in a commercial facility, etc., a management server analyzes the user's physical condition or emotion based on vital values acquired by a wearable device, and transmits the analysis result to the terminal of the staff of the commercial facility. A technique for doing so is disclosed. This allows the staff to change their response to the user according to changes in the user's physical condition or emotions.
JP 2018-207173 A
On the other hand, there is an increasing demand not only for watching or appreciating events such as sports and concerts directly at the event venue, but also for services that distribute video of these events. In video distribution, various ideas are being considered in order to increase added value. For example, if a user viewing the video can share the emotions of a player or a performer, the sense of unity with the player or the performer can be enhanced, and the user's satisfaction is expected to improve.
However, even if the technique described in Patent Literature 1 is applied to an event, it is only possible to grasp the emotions of the spectators from the wearable devices of the spectators at the event. Moreover, Patent Literature 1 does not describe the provision of video of an event, nor does it describe what kind of service should be provided in order to increase the added value of video distribution.
The present disclosure has been made in view of the above, and an object thereof is to obtain a signal generation device capable of enhancing the sense of unity, with a player or a performer in an event, of a user who is watching a video of the event.
In order to solve the above-described problems and achieve the object, the signal generation device according to the present disclosure includes an emotion information acquisition unit that acquires, from a sensor that acquires information indicating the emotion of an estimation target who is a player or a performer in an event, the sensor information acquired by the sensor. The signal generation device further includes an emotion estimation unit that estimates the emotion of the estimation target using the sensor information, an emotion corresponding signal generation unit that generates an emotion corresponding signal for causing an output unit in a user's information processing system to execute output according to the estimation result of the emotion estimation unit, and an emotion corresponding signal transmission unit that transmits the emotion corresponding signal to the information processing system directly or via another device.
The signal generation device according to the present disclosure has the effect of enhancing the sense of unity, with a player or a performer in an event, of a user who is watching a video of the event.
FIG. 1 is a diagram showing a configuration example of a signal processing system according to a first embodiment.
FIG. 2 is a flowchart showing an example of operations related to generation of an emotion corresponding signal in a video providing device of the first embodiment.
FIG. 3 is a diagram showing an example of an emotion correspondence table according to the first embodiment.
FIG. 4 is a diagram showing a configuration example of an emotion estimation unit of the first embodiment when estimating an emotion by machine learning.
FIG. 5 is a schematic diagram showing an example of a neural network.
FIG. 6 is a diagram showing an example of an output correspondence table according to the first embodiment.
FIG. 7 is a diagram showing a device configuration example of an information processing system according to the first embodiment.
FIG. 8 is a diagram showing a configuration example of a computer system that realizes the video providing device of the first embodiment.
FIG. 9 is a diagram showing a configuration example of a signal processing system according to a second embodiment.
FIG. 10 is a diagram showing an example of an output correspondence table according to the second embodiment.
FIG. 11 is a diagram showing a configuration example of a signal processing system according to a modification of the second embodiment.
A signal generation device, a signal processing system, and a signal generation method according to embodiments will be described in detail below with reference to the drawings.
Embodiment 1.
FIG. 1 is a diagram showing a configuration example of the signal processing system according to the first embodiment. The signal processing system 100 of the present embodiment includes a video providing device 1 and an information processing system 5. The video providing device 1 is a signal generation device that generates an emotion corresponding signal, described later, corresponding to the emotion of a player or performer in an event, and transmits the generated emotion corresponding signal to the information processing system 5 directly or via another device. In the present embodiment, the video providing device 1 has a function as a signal processing device and also has a function of providing video data obtained by capturing an event to the user's information processing system 5. That is, the video providing device 1 provides the video data captured by the imaging device 2, which captures the event, to the user's information processing system 5 via the distributor device 4 operated by a distributor. Note that the video data provided by the video providing device 1 may be video data obtained by processing the video data captured by the imaging device 2 into, for example, free-viewpoint video data. The video providing device 1 further estimates the emotion of the player or performer using at least one of the imaging device 2 and the wearable device 3 worn by the player or performer in the event, generates an emotion corresponding signal corresponding to the estimation result, and provides the generated emotion corresponding signal to the information processing system 5 via the distributor device 4. The emotion corresponding signal is a signal that instructs the information processing system 5 to output vibration, sound effects, music, or the like according to the emotion of the player or performer. Details of the emotion corresponding signal will be described later.
In the following, an example in which the video providing device 1 provides the video of the event and the emotion corresponding signal to the information processing system 5 via the distributor device 4 will be described; however, they may be provided to the information processing system 5 without going through the distributor device 4. Note that the signal processing system 100 may include not only the video providing device 1 and the information processing system 5 but also at least one of the imaging device 2 and the wearable device 3.
Although one imaging device 2 is illustrated in FIG. 1, the number of imaging devices 2 may be plural. The imaging device 2 may capture all or part of the event. When a plurality of imaging devices 2 are provided, they may include an imaging device that tracks and captures a specific player or performer, a plurality of imaging devices 2 for generating free-viewpoint video, or an imaging device 2 that performs aerial photography using a drone or the like. Events to be captured by the imaging device 2 of the present embodiment are, for example, sports, concerts, and plays; specific examples include baseball, soccer, volleyball, basketball, martial arts, boat racing, horse racing, keirin (bicycle racing), concerts, and plays, but the events are not limited to these. Venues where events are held include stadiums, multipurpose halls, concert halls, boat race stadiums, racetracks, and gymnasiums.
The wearable device 3 and the imaging device 2 are examples of sensors that acquire information indicating the emotion of a player or performer in an event. The wearable device 3 is a sensor capable of detecting biological information, movement, and the like of a player or performer, and is, for example, a watch-type, wristband-type, clothing-type, eyeglass-type, or ring-type sensor, but is not limited to these. The biological information is, for example, at least one of blood pressure, heart rate, body temperature, brain waves, eye movement, and the like, but is not limited to these. The information indicating movement is at least one of acceleration, biopotential information indicating muscle movement, and the like, but is not limited to these.
One player or performer may wear a plurality of different types of wearable devices 3. When there are a plurality of players or performers whose emotions are to be estimated, each of these players or performers wears a wearable device 3.
The wearable device 3 may also be an imaging device worn by a player or performer. For example, the wearable device 3 may be a wearable camera capable of shooting from the player's or performer's point of view. Note that the imaging devices 2 that capture the video data to be provided may include an imaging device worn by a player or performer; in this case, the imaging device worn by the player or performer is both an imaging device 2 and a wearable device 3. When a wearable device 3 capable of detecting biological information, movement, and the like is used as the sensor, the sensor information acquired by the sensor is the biological information and the information indicating movement; when the imaging device 2 is used as the sensor, the sensor information acquired by the sensor is video data.
Communication between the wearable device 3 and the imaging device 2 on the one hand and the video providing device 1 on the other includes wireless communication, but wireless communication and wired communication may be mixed. For the wireless communication, dedicated lines such as local 5G (fifth-generation mobile communication system) or private LTE (Long Term Evolution) at the event venue may be used. For example, local 5G may be used for communication between the imaging device 2 and the video providing device 1, and private LTE may be used for communication between the wearable device 3 and the video providing device 1. Note that the communication methods are not limited to these.
As shown in FIG. 1, the video providing device 1 includes a video providing unit 11, an emotion information acquisition unit 12, an emotion estimation unit 13, an emotion corresponding signal generation unit 14, and an emotion corresponding signal transmission unit 15. The video providing device 1 acquires video data from the imaging device 2 and transmits the acquired video data to the distributor device 4. Note that, as described above, the video providing device 1 may process the video data before transmitting it to the distributor device 4; in this case, the video data acquired from the imaging device 2 is processed by a video processing unit (not shown), and the video providing unit 11 transmits the processed video data. The processing by the video processing unit includes, for example, processing for generating free-viewpoint video and processing for converting resolution, codec, and the like. Although not shown, the sound at the event venue may be collected by a sound collecting device at the event venue and included in the video data as sound data. The signal generation device of the present embodiment only needs to include the emotion information acquisition unit 12, the emotion estimation unit 13, the emotion corresponding signal generation unit 14, and the emotion corresponding signal transmission unit 15. Therefore, the video providing device 1 does not have to include the video providing unit 11; in this case, a separate device that provides the video data to the user may be provided, or the video data may be provided to the user from the imaging device 2. In this case, the device that provides the video data to the user or the imaging device 2 may transmit the video data to the information processing system 5 via the distributor device 4, or may transmit the video data directly to the information processing system 5.
The emotion information acquisition unit 12 acquires, from a sensor that acquires information indicating the emotion of an estimation target who is a player or performer in an event, the sensor information acquired by the sensor. The sensor includes at least one of the wearable device 3 worn by the estimation target and the imaging device 2 capable of capturing video including the estimation target. The emotion information acquisition unit 12 outputs the acquired sensor information to the emotion estimation unit 13.
The emotion estimation unit 13 estimates the emotion of the estimation target using the sensor information, and outputs the estimation result to the emotion corresponding signal generation unit 14. For example, when the sensor information is information indicated by numerical values, the emotion estimation unit 13 may hold, as a table, correspondence information indicating the correspondence between the numerical range of each item of the sensor information and an emotion, and estimate the emotion using the held table, or may estimate the emotion using machine learning. A method of estimating the emotion will be described later.
Using the estimation result received from the emotion estimation unit 13, the emotion corresponding signal generation unit 14 generates an emotion corresponding signal for causing the output unit 52 in the information processing system 5 to execute output according to the estimation result, and outputs the generated emotion corresponding signal to the emotion corresponding signal transmission unit 15. The emotion corresponding signal transmission unit 15 transmits the emotion corresponding signal to the information processing system 5 via the distributor device 4. The emotion corresponding signal transmission unit 15 may also transmit the emotion corresponding signal to the information processing system 5 without going through the distributor device 4. Although the video providing unit 11 and the emotion corresponding signal transmission unit 15 are illustrated separately in FIG. 1, the video providing unit 11 may have the function of the emotion corresponding signal transmission unit 15, and the video providing unit 11 may transmit the video data and the emotion corresponding signal to the distributor device 4. Although one distributor device 4 is illustrated in FIG. 1, there may be a plurality of distributor devices 4. The plurality of distributor devices 4 may be operated by different distributors.
Upon receiving the video data and the emotion corresponding signal from the video providing device 1, the distributor device 4 transmits the video data and the emotion corresponding signal to the information processing system 5 to which the video data is to be distributed.
The information processing system 5 can receive video data and display the received video data. As shown in FIG. 1, the information processing system 5 includes a video receiving unit 51, an output unit 52, and an emotion corresponding signal processing unit 53. As will be described later, the information processing system 5 may include a television (hereinafter abbreviated as TV) capable of receiving television broadcasts, may include a TV and a game machine, or may be a terminal such as a smartphone. Although one information processing system 5 is illustrated in FIG. 1, there may be a plurality of information processing systems 5. The video receiving unit 51 receives the video data and the emotion corresponding signal from the distributor device 4, outputs the received video data to the output unit 52, and outputs the received emotion corresponding signal to the emotion corresponding signal processing unit 53.
The output unit 52 executes output based on instructions from the emotion corresponding signal processing unit 53. The output unit 52 includes a vibration generation unit 521, a display unit 522, and a speaker 523. Although FIG. 1 shows an example in which the output unit 52 includes the vibration generation unit 521, the information processing system 5 does not have to include the vibration generation unit 521. When a plurality of information processing systems 5 are provided, information processing systems 5 that include the vibration generation unit 521 and information processing systems 5 that do not include the vibration generation unit 521 may coexist.
The vibration generation unit 521 can transmit vibration to the user. The display unit 522 can display the video data. The speaker 523 can output sound data included in the video data, and can also output sound effects and music based on the emotion corresponding signal.
Based on the emotion corresponding signal, the emotion corresponding signal processing unit 53 selects, from among the vibration generation unit 521 and the speaker 523 of the output unit 52, the device that is to perform the operation corresponding to the emotion corresponding signal, and instructs the selected device to perform output in accordance with the emotion corresponding signal. As a result, the output unit 52 executes output based on the emotion corresponding signal. When the emotion corresponding signal includes an instruction regarding vibration, the emotion corresponding signal processing unit 53 selects the vibration generation unit 521; when the emotion corresponding signal indicates an instruction regarding sound such as sound effects or music, the emotion corresponding signal processing unit 53 selects the speaker 523. When the emotion corresponding signal includes both an instruction regarding vibration and an instruction regarding sound, the emotion corresponding signal processing unit 53 selects both the vibration generation unit 521 and the speaker 523.
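For reference only, the following is a minimal Python sketch of the dispatch behavior described above, in which a received signal carrying vibration and/or sound instructions is routed to the vibration generation unit and the speaker. The class names, field names, and signal format (EmotionSignal, vibration, sound) are assumptions made for this sketch and are not identifiers from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative signal format: a signal may carry a vibration instruction,
# a sound instruction (sound effect or music), or both.
@dataclass
class EmotionSignal:
    vibration: Optional[dict] = None   # e.g. {"waveform": "high_freq", "amplitude": 0.9}
    sound: Optional[dict] = None       # e.g. {"type": "fanfare"}

class VibrationGenerator:
    def vibrate(self, instruction: dict) -> None:
        print(f"vibration generation unit 521: {instruction}")

class Speaker:
    def play(self, instruction: dict) -> None:
        print(f"speaker 523: {instruction}")

class EmotionSignalProcessor:
    """Plays the role of the emotion corresponding signal processing unit 53."""
    def __init__(self, vibrator: VibrationGenerator, speaker: Speaker):
        self.vibrator = vibrator
        self.speaker = speaker

    def handle(self, signal: EmotionSignal) -> None:
        # Select only the devices addressed by the received signal.
        if signal.vibration is not None:
            self.vibrator.vibrate(signal.vibration)
        if signal.sound is not None:
            self.speaker.play(signal.sound)

processor = EmotionSignalProcessor(VibrationGenerator(), Speaker())
processor.handle(EmotionSignal(vibration={"waveform": "high_freq", "amplitude": 0.9},
                               sound={"type": "fanfare"}))
```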
Next, generation of the emotion corresponding signal in the present embodiment will be described. FIG. 2 is a flowchart showing an example of operations related to generation of the emotion corresponding signal in the video providing device 1 of the present embodiment. As shown in FIG. 2, the video providing device 1 acquires sensor information (step S1). Acquisition of the sensor information is started, for example, when provision of the video of the event is started. More specifically, in step S1, the emotion information acquisition unit 12 acquires sensor information from at least one of the imaging device 2 and the wearable device 3 worn by the player or performer in the event, and outputs the acquired sensor information to the emotion estimation unit 13.
Next, the video providing device 1 estimates the emotion of the player or performer using the sensor information (step S2). The emotion estimation unit 13 estimates the emotion of the player or performer using the sensor information, and outputs the estimation result to the emotion corresponding signal generation unit 14.
The emotion estimation unit 13 may estimate the emotion of the player or performer from the sensor information using, for example, an emotion correspondence table, which is a table indicating the correspondence between the numerical range of each item of the sensor information and an emotion, or may estimate the emotion of the player or performer from the sensor information by machine learning. FIG. 3 is a diagram showing an example of the emotion correspondence table of the present embodiment. In the example shown in FIG. 3, for each item of information such as blood flow, heart rate, brain waves (amplitude, frequency, and the like), body movement (acceleration and the like), and muscle movement (biopotential values and the like), the ranges corresponding to emotions such as excitement, tension, anger, and relaxation are stored in the emotion correspondence table. When an item of the sensor information does not fall within any of these ranges, the emotion may be determined to be "other".
Note that the emotion estimation unit 13 may estimate that the emotion of the player or performer is the corresponding emotion, such as excitement, tension, anger, or relaxation, when the values indicated by the sensor information fall within the ranges shown in the emotion correspondence table for all of the blood flow, heart rate, brain wave, body movement, and muscle movement items shown in FIG. 3, or when any one of the items falls within the corresponding range. Alternatively, a method may be used in which a priority is set for each item of information and, when the emotions corresponding to the items differ, the determination of the item with the higher priority is given precedence. For example, when brain waves are given higher priority than muscle movement, and the value of muscle movement indicated by the sensor information is in the range corresponding to anger while the value of brain waves indicated by the sensor information is in the range corresponding to relaxation, the emotion estimation unit 13 may estimate relaxation. Alternatively, when the emotions corresponding to the items differ in this way, the emotion estimation unit 13 may determine the emotion to be "other".
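For reference only, the following is a minimal Python sketch of table-based emotion estimation with per-item priority, as outlined above. The item names, numerical ranges, and priority values are invented for illustration and are not the contents of the emotion correspondence table of FIG. 3.

```python
# Illustrative emotion correspondence table: per-item value ranges per emotion.
EMOTION_TABLE = {
    "heart_rate": {          # beats per minute (illustrative ranges)
        "excitement": (120, 200),
        "tension":    (100, 120),
        "relaxation": (50, 80),
    },
    "acceleration": {        # body movement in m/s^2 (illustrative ranges)
        "excitement": (8, 30),
        "tension":    (0, 2),
        "relaxation": (0, 1),
    },
}
# Higher number = higher priority when items disagree (illustrative).
PRIORITY = {"heart_rate": 2, "acceleration": 1}

def classify_item(item: str, value: float) -> str:
    # Return the first emotion whose range contains the value, or "other".
    for emotion, (low, high) in EMOTION_TABLE[item].items():
        if low <= value <= high:
            return emotion
    return "other"

def estimate_emotion(sensor_info: dict) -> str:
    # Classify each available item, then resolve disagreements by priority.
    votes = {item: classify_item(item, value)
             for item, value in sensor_info.items() if item in EMOTION_TABLE}
    if not votes:
        return "other"
    emotions = set(votes.values())
    if len(emotions) == 1:
        return emotions.pop()
    top_item = max(votes, key=lambda item: PRIORITY.get(item, 0))
    return votes[top_item]

print(estimate_emotion({"heart_rate": 65, "acceleration": 12}))  # higher-priority item wins
```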
FIG. 3 is an example, and the items stored in the emotion correspondence table are not limited to blood flow, heart rate, brain waves, body movement, and muscle movement; the table may include only some of these items or may include other items. The types of emotions stored in the emotion correspondence table are also not limited to the example shown in FIG. 3; the table may store only some of these types or may include other types.
Next, an example in which the emotion estimation unit 13 estimates the emotion by machine learning will be described. FIG. 4 is a diagram showing a configuration example of the emotion estimation unit 13 of the present embodiment when estimating the emotion by machine learning. In the example shown in FIG. 4, the emotion estimation unit 13 includes a learned model generation unit 131, a learned model storage unit 132, and an estimation unit 133.
The estimation unit 133 reads the learned model stored in the learned model storage unit 132 and inputs the sensor information received from the emotion information acquisition unit 12 into the read learned model, thereby estimating the emotion of the estimation target. That is, the output obtained by inputting the sensor information received from the emotion information acquisition unit 12 into the learned model is used as the estimation result of the emotion of the estimation target. The learned model is a model for estimating, from the sensor information, the emotion of the estimation target who is a player or performer, and is generated by the learned model generation unit 131 before provision of the video of the event starts, for example, as follows.
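For reference only, the following is a minimal Python sketch of the inference step performed by the estimation unit 133: a previously trained model is loaded and incoming sensor information is mapped to an emotion label. The file name, feature order, label set, and the assumption of a scikit-learn-style classifier exposing predict_proba are all illustrative assumptions, not details given in the disclosure.

```python
import pickle

EMOTIONS = ["excitement", "tension", "anger", "relaxation"]      # assumed label set
FEATURES = ["heart_rate", "acceleration", "blood_flow"]          # assumed feature order

def load_model(path: str = "emotion_model.pkl"):
    # Stands in for reading from the learned model storage unit 132.
    with open(path, "rb") as f:
        return pickle.load(f)

def estimate(model, sensor_info: dict) -> str:
    # Arrange the sensor information into the feature order the model expects,
    # then take the label with the highest predicted score.
    x = [[sensor_info[name] for name in FEATURES]]
    scores = model.predict_proba(x)[0]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return EMOTIONS[best]
```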
The learned model generation unit 131 generates a learned model using a plurality of learning data sets, each including sensor information input from the emotion information acquisition unit 12 and correct answer data corresponding to the sensor information, and stores the generated learned model in the learned model storage unit 132. The learned model is generated before the video of the event is provided.
The sensor information input to the learned model generation unit 131 is not limited to that input from the emotion information acquisition unit 12, and may be learning sensor information acquired for learning. The learning sensor information includes information on the same items in the same format as the sensor information. The learning sensor information may be input to the video providing device 1 by input means (not shown) and passed from the input means to the learned model generation unit 131, or may be transmitted from another device, received by receiving means (not shown), and passed from the receiving means to the learned model generation unit 131. The correct answer data is data indicating which of the above-described emotions, such as excitement, tension, anger, and relaxation, is the correct emotion corresponding to the sensor information. The correct answer data may be determined, for example, by asking the subject from whom the sensor information was acquired about the corresponding emotion, or an expert or the like may examine the sensor information and determine the correct answer data. The correct answer data may, for example, be input to the video providing device 1 by input means (not shown) and passed from the input means to the learned model generation unit 131, or may be transmitted from another device, received by receiving means (not shown), and passed from the receiving means to the learned model generation unit 131.
The learned model generation unit 131 generates the learned model by, for example, supervised learning. Any supervised learning algorithm may be used; for example, a neural network model can be used. A neural network is composed of an input layer made up of a plurality of neurons, an intermediate layer (hidden layer) made up of a plurality of neurons, and an output layer made up of a plurality of neurons. The intermediate layer may be one layer, or two or more layers.
FIG. 5 is a schematic diagram showing an example of a neural network. For example, in a three-layer neural network as shown in FIG. 5, when a plurality of inputs are input to the input layer (X1 to X3), their values are multiplied by weights W1 (w11 to w16) and input to the intermediate layer (Y1 to Y2), and the results are further multiplied by weights W2 (w21 to w26) and output from the output layer (Z1 to Z3). The output results change depending on the values of the weights W1 and W2.
In the present embodiment, the relationship between the sensor information and the correct answer data is learned by adjusting the weights W1 and W2 so that the output from the output layer when the sensor information is input approaches the correct answer data. Note that the machine learning algorithm is not limited to a neural network. Reinforcement learning or the like may also be used as the machine learning.
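For reference only, the following is a small numerical Python sketch of the three-layer network of FIG. 5 (inputs X1 to X3, hidden units Y1 and Y2, outputs Z1 to Z3). The sigmoid activation and the gradient-based update shown here are assumptions made for illustration; the disclosure only states that W1 and W2 are adjusted so that the output approaches the correct answer data.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))   # weights w11..w16 between input and hidden layer
W2 = rng.normal(size=(2, 3))   # weights w21..w26 between hidden and output layer

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(x):
    y = sigmoid(x @ W1)        # hidden layer Y1-Y2
    z = sigmoid(y @ W2)        # output layer Z1-Z3 (e.g. one score per emotion)
    return y, z

def train_step(x, target, lr=0.5):
    # One step of error backpropagation: nudge W2 and W1 so z moves toward target.
    global W1, W2
    y, z = forward(x)
    dz = (z - target) * z * (1.0 - z)
    dy = (dz @ W2.T) * y * (1.0 - y)
    W2 -= lr * np.outer(y, dz)
    W1 -= lr * np.outer(x, dy)

x = np.array([0.7, 0.2, 0.9])          # normalized sensor values (illustrative)
target = np.array([1.0, 0.0, 0.0])     # correct answer data: first emotion class
for _ in range(200):
    train_step(x, target)
print(forward(x)[1])                    # output is now close to the target
```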
When numerical values such as biological information are used as the sensor information, the numerical values are input to the input layer of the learned model generation unit 131. When the sensor information includes information on a plurality of items, the information on each item is input to the input layer as X1 to X3, respectively. Although FIG. 5 shows an example with three inputs and three outputs, the numbers of inputs and outputs are not limited to this example.
When the sensor information is video data, for example, video data sampled at fixed time intervals may be used as still-image data for the input data of machine learning, or all of the video data within a fixed period may be used as the input data of machine learning. For example, when the video data is obtained by tracking the estimation target, or when the face of the estimation target is captured in close-up, the emotion may appear in facial expressions and the like in the video data. When such video data is acquired, the video data can be used as the sensor information. When both numerical values such as biological information and video data are used as the sensor information, all of these are used as the input data of machine learning. When there are a plurality of players or performers to be estimated, a learned model may be generated for each estimation target, a common learned model may be generated without distinguishing the estimation targets, or, for example, a learned model may be generated for each type of sport, for each event venue, and so on.
As the sensor information, at least one of the information acquired by the wearable device 3 and the video data acquired by the imaging device 2 may be used.
In the example shown in FIG. 4, the emotion estimation unit 13 includes the learned model generation unit 131; however, a learning device that generates the learned model may be provided separately from the video providing device 1, and the learning device may include the learned model generation unit 131. In this case, the emotion estimation unit 13 does not need to include the learned model generation unit 131, and the learned model generation unit 131 of the learning device generates the learned model in the same manner as described above. The learned model generated by the learning device is then stored in the learned model storage unit 132 of the emotion estimation unit 13.
Further, position information indicating the position of the estimation target acquired by the wearable device 3 may also be used for estimating the emotion. For example, when the event is a soccer match, even if the biological information and the like are the same, the emotion may differ depending on whether the estimation target is near the opponent's goal, near the estimation target's own goal, or elsewhere. Therefore, when the emotion is estimated by machine learning, for example, the position within the soccer field may be used as one piece of the sensor information. When the emotion is estimated using the emotion correspondence table, the positions within the soccer field may be divided into a plurality of regions in advance, and the ranges of each item of information defined in the emotion correspondence table may be corrected according to the region in which the estimation target is located.
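For reference only, the following is a minimal Python sketch of correcting the table ranges according to the field region, as suggested above. The region boundaries and correction factors are invented for illustration and are not specified in the disclosure.

```python
# Illustrative per-region correction factors applied to the table ranges.
REGION_CORRECTION = {
    "near_opponent_goal": 1.10,   # widen/shift ranges by 10% (assumption)
    "near_own_goal":      1.05,
    "elsewhere":          1.00,
}

def field_region(x_m: float, field_length_m: float = 105.0) -> str:
    # Classify a position along the pitch (0 = own goal line) into three regions.
    if x_m > field_length_m - 20.0:
        return "near_opponent_goal"
    if x_m < 20.0:
        return "near_own_goal"
    return "elsewhere"

def corrected_range(base_low: float, base_high: float, region: str) -> tuple:
    k = REGION_CORRECTION[region]
    return base_low * k, base_high * k

# e.g. an illustrative heart-rate range is scaled up near the opponent's goal
print(corrected_range(120, 200, field_region(95.0)))
```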
Returning to the description of FIG. 2: after step S2, the video providing device 1 generates an emotion corresponding signal using the estimation result (step S3). More specifically, the emotion corresponding signal generation unit 14 uses the estimation result received from the emotion estimation unit 13 to generate an emotion corresponding signal corresponding to the emotion indicated by the estimation result, and outputs the generated emotion corresponding signal to the emotion corresponding signal transmission unit 15. For example, the emotion corresponding signal generation unit 14 holds, as an output correspondence table, output correspondence information indicating the correspondence between emotions and output contents in the information processing system 5, determines the output contents in the information processing system 5 using the held output correspondence table, and generates an emotion corresponding signal corresponding to the determined output contents.
FIG. 6 is a diagram showing an example of the output correspondence table of the present embodiment. In the example shown in FIG. 6, output contents are shown for each type of emotion with respect to each of the vibration function, sound effects, and music. For example, when the estimation result indicates excitement, the output content of the vibration function is high-frequency vibration with a large amplitude, the output contents of the sound effects are a fanfare, sound effects indicating excitement in comics, and sound effects indicating excitement in movies, and the output contents of the music are movie music indicating excitement and game music indicating excitement. When the estimation result indicates tension, the output content of the vibration function is peaky, intermittent vibration, the output contents of the sound effects are sound effects indicating tension in comics and sound effects indicating tension in movies, and the output contents of the music are movie music indicating tension and game music indicating tension. When the estimation result indicates anger, the output content of the vibration function is low-frequency vibration with a large amplitude, the output contents of the sound effects are the sound of a volcanic eruption, the sound of an earthquake, sound effects indicating anger in comics, and sound effects indicating anger in movies, and the output contents of the music are movie music indicating anger and game music indicating anger. When the estimation result indicates relaxation, the output content of the vibration function is low-frequency vibration with fluctuation, the output contents of the sound effects are the sound of waves, the chirping of birds, and the babbling of a stream, and the output contents of the music are classical music with a relaxing effect and music with a slow tempo.
In this way, the emotion corresponding signal generation unit 14 may generate an emotion corresponding signal that causes the vibration generation unit 521 of the information processing system 5 to vibrate according to the estimation result of the emotion estimation unit 13, or may generate an emotion corresponding signal that causes the speaker 523 of the information processing system 5 to output sound effects or music corresponding to the estimation result of the emotion estimation unit 13.
Furthermore, the output contents may be changed according to the position information indicating the position of the estimation target acquired by the wearable device 3, even for the same emotion. For example, the positions within a soccer field may be divided into a plurality of regions in advance, and the output contents may be defined according to the region in which the estimation target is located. For example, when excitement is obtained as the emotion estimation result, the output contents may be defined for each region so that they differ depending on whether the estimation target is near the opponent's goal, near the estimation target's own goal, or elsewhere. Similarly, in the case of a concert or the like, the stage may be divided into a plurality of regions, and the output contents may be defined for each region so that, even for the same emotion estimation result, the output contents change depending on the region in which the estimation target is located.
Note that FIG. 6 is an example, and the specific output contents are not limited to the example shown in FIG. 6. When determining the output contents, the emotion corresponding signal generation unit 14 does not need to select all of the vibration function, sound effects, and music as outputs to be executed by the information processing system 5; it is sufficient to select at least one of them. When the emotion corresponding signal generation unit 14 determines the output contents using the output correspondence table, it generates an emotion corresponding signal indicating an instruction for causing the information processing system 5 to execute the determined output contents.
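For reference only, the following is a minimal Python sketch of how the emotion corresponding signal generation unit 14 might turn an estimated emotion into an emotion corresponding signal via an output correspondence table. The table entries loosely paraphrase the FIG. 6 examples described above, and the dictionary format of the generated signal is an assumption of this sketch.

```python
# Illustrative output correspondence table: emotion -> output contents.
OUTPUT_TABLE = {
    "excitement": {"vibration": {"freq": "high", "amplitude": "large"},
                   "sound_effect": "fanfare",
                   "music": "exciting film score"},
    "tension":    {"vibration": {"freq": "peaky", "pattern": "intermittent"},
                   "sound_effect": "tense comic-style sting",
                   "music": "tense game music"},
    "anger":      {"vibration": {"freq": "low", "amplitude": "large"},
                   "sound_effect": "volcanic eruption",
                   "music": "angry film score"},
    "relaxation": {"vibration": {"freq": "low", "pattern": "fluctuating"},
                   "sound_effect": "sound of waves",
                   "music": "slow-tempo classical"},
}

def generate_emotion_signal(estimated_emotion: str, use=("vibration", "sound_effect")):
    """Build the signal, selecting at least one of the output kinds (not necessarily all)."""
    contents = OUTPUT_TABLE.get(estimated_emotion)
    if contents is None:
        return None  # e.g. the "other" result produces no signal in this sketch
    return {kind: contents[kind] for kind in use if kind in contents}

print(generate_emotion_signal("anger"))
```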
Returning to the description of FIG. 2: after step S3, the video providing device 1 transmits the emotion corresponding signal (step S4). More specifically, the emotion corresponding signal transmission unit 15 transmits the emotion corresponding signal received from the emotion corresponding signal generation unit 14 to the distributor device 4. Through the above processing, the emotion corresponding signal reaches the information processing system 5 via the distributor device 4. As described above, the emotion corresponding signal may be transmitted to the distributor device 4 together with the video data.
When there are a plurality of estimation targets, the emotion estimation unit 13 may select a specific estimation target, an estimation target may be designated by an operator via input means (not shown), or the user may select the target for which the emotion corresponding signal is to be transmitted. For example, when the imaging device 2 tracks and captures a specific player or performer, the player or performer being tracked is the estimation target for the video data captured by that imaging device 2. When a plurality of performers are included in the captured data, as in a concert by an idol group, the video providing device 1 may generate the emotion corresponding signal using the average of the emotion estimation results of the performers being captured. Alternatively, when a plurality of performers are included in the captured data, the user may select an estimation target on a menu screen or the like before the start of video distribution, and the video providing device 1 may acquire the user's selection result from the information processing system 5 and transmit the emotion corresponding signal corresponding to the selected estimation target to the information processing system 5. Alternatively, the video providing device 1 may transmit the emotion corresponding signals of the respective performers to the distributor device 4, and the distributor device 4 may acquire the user's selection result and transmit the corresponding emotion corresponding signal to the information processing system 5 according to the selection result. When the event is a sport such as soccer, baseball, or volleyball, the user may select which team to support on a menu screen or the like before video distribution, and the video providing device 1 may generate an emotion corresponding signal using the average of the emotion estimation results of the entire team and transmit the emotion corresponding signal corresponding to the selected team to the information processing system 5. For video data captured by a wearable camera, the video providing device 1 estimates the emotion using the sensor information acquired from the wearable device 3 worn by the player or performer corresponding to that wearable camera, and generates the emotion corresponding signal using the estimation result.
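For reference only, the following is a minimal Python sketch of combining estimation results for a team or group of performers. The disclosure states only that the average of the estimation results is used; the per-emotion score representation assumed here is an illustrative interpretation.

```python
def team_emotion(member_scores: list) -> str:
    # Average each member's per-emotion scores and return the highest-scoring emotion.
    totals = {}
    for scores in member_scores:
        for emotion, score in scores.items():
            totals[emotion] = totals.get(emotion, 0.0) + score
    n = len(member_scores)
    averages = {emotion: total / n for emotion, total in totals.items()}
    return max(averages, key=averages.get)

team = [
    {"excitement": 0.7, "tension": 0.2, "anger": 0.0, "relaxation": 0.1},
    {"excitement": 0.4, "tension": 0.5, "anger": 0.0, "relaxation": 0.1},
    {"excitement": 0.6, "tension": 0.3, "anger": 0.1, "relaxation": 0.0},
]
print(team_emotion(team))  # -> "excitement"
```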
As described above, the information processing system 5 outputs vibration, sound effects, music, and the like according to the received emotion corresponding signal. Next, a device configuration example of the information processing system 5 will be described. FIG. 7 is a diagram showing a device configuration example of the information processing system 5 of the present embodiment. FIG. 7 shows device configuration examples of information processing systems 5-1 to 5-4, each of which is an information processing system 5.
In the example shown in FIG. 7, the information processing system 5-1 includes a TV 501 and a speaker 502, the information processing system 5-2 includes a TV 501, a game machine body 503, and a controller 504, the information processing system 5-3 includes a TV 501, and the information processing system 5-4 includes a terminal 505 such as a smartphone. The TV 501 generally incorporates a display unit and a speaker, and can display the video data of the event. The TV 501 can also output sound when the video data includes sound data. Furthermore, the TV 501 can perform output corresponding to the emotion corresponding signal when the emotion corresponding signal is an instruction regarding sound. Therefore, as in the information processing system 5-3 of FIG. 7, the information processing system 5 may be constituted by the TV 501 alone. When the information processing system 5 is the TV 501 alone, as in the information processing system 5-3, the TV 501 includes the video receiving unit 51, the output unit 52, and the emotion corresponding signal processing unit 53 of the information processing system 5 shown in FIG. 1. In this case, however, the output unit 52 does not include the vibration generation unit 521.
When an external speaker 502 is connected to the TV 501 as in the information processing system 5-1, the sound data included in the video data is input to the speaker 502 via the TV 501, and the speaker 502 performs output corresponding to the sound data. When the emotion corresponding signal indicates an instruction regarding sound, the TV 501 instructs the speaker 502 to perform the output indicated by the emotion corresponding signal, and the speaker 502 outputs sound effects, music, or the like based on the emotion corresponding signal. When the information processing system 5 includes the TV 501 and the speaker 502, as in the information processing system 5-1, the TV 501 includes the video receiving unit 51, the display unit 522 of the output unit 52, and the emotion corresponding signal processing unit 53 of the information processing system 5 shown in FIG. 1, and the speaker 523 of the output unit 52 corresponds to the speaker 502.
In the configuration example of the information processing system 5-2, the game machine body 503 may receive the video data and the emotion corresponding signal and cause the TV 501 to display the video data. The game machine body 503 is a game machine capable of running games called video games, computer games, and the like. The controller 504 is a game controller corresponding to the game machine body 503, can receive input related to application software executed on the game machine body 503, and can itself vibrate. The game machine body 503 causes the TV 501 to output sound by outputting the sound data to the TV 501; when the emotion corresponding signal indicates an instruction regarding sound, the game machine body 503 instructs the TV 501 to perform the output indicated by the emotion corresponding signal, and the TV 501 performs output based on the emotion corresponding signal. When the emotion corresponding signal indicates an instruction regarding vibration, the game machine body 503 instructs the controller 504 to perform the output indicated by the emotion corresponding signal, and the controller 504 vibrates based on the emotion corresponding signal. When the information processing system 5 includes the TV 501, the game machine body 503, and the controller 504, as in the information processing system 5-2, the game machine body 503 includes the video receiving unit 51, the display unit 522 of the output unit 52, and the emotion corresponding signal processing unit 53 of the information processing system 5 shown in FIG. 1, the TV 501 includes the speaker 523 and the display unit 522 of the output unit 52, and the controller 504 includes the vibration generation unit 521 of the output unit 52. Note that the game machine body 503 may include the speaker 523 and the display unit 522 of the output unit 52, and the game machine body 503 may display the video data and output the sound effects and music based on the emotion corresponding signal.
When the information processing system 5 is a terminal 505 such as a smartphone, as in the information processing system 5-4, the terminal 505 includes the video receiving unit 51, the output unit 52, and the emotion corresponding signal processing unit 53 of the information processing system 5 shown in FIG. 1. Since the terminal 505 generally has functions for outputting vibration, display, and sound, the terminal 505 can display the video data and can also output the vibration, sound effects, music, and the like indicated by the emotion corresponding signal.
A device that receives the video data and the emotion corresponding signal, such as the TV 501, the game machine body 503, or the terminal 505 described above, can receive the video data and the emotion corresponding signal and perform operations corresponding to them, for example, by having application software installed.
As described above, the information processing system 5 may be realized by a single device or by a combination of a plurality of devices. The configuration of the information processing system 5 described above is an example; the display of the video data and the output of the sound effects and music based on the emotion corresponding signal may be performed by a personal computer or the like, and the configuration of the information processing system 5 is not limited to the examples described above.
 Next, the hardware configuration of the video providing device 1 of the present embodiment will be described. A computer system functions as the video providing device 1 of the present embodiment by executing a program, that is, a computer program in which the processing of the video providing device 1 is described. FIG. 8 is a diagram showing a configuration example of a computer system that realizes the video providing device 1 of the present embodiment. As shown in FIG. 8, this computer system includes a control unit 101, an input unit 102, a storage unit 103, a display unit 104, a communication unit 105, and an output unit 106, which are connected via a system bus 107.
 In FIG. 8, the control unit 101 is a processor such as a CPU (Central Processing Unit) and executes the program in which the processing of the video providing device 1 of the present embodiment is described. Part of the control unit 101 may be realized by dedicated hardware such as a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array). The input unit 102 is composed of, for example, a keyboard and a mouse, and is used by the user of the computer system to input various kinds of information. The storage unit 103 includes various memories such as a RAM (Random Access Memory) and a ROM (Read Only Memory) and a storage device such as a hard disk, and stores the program to be executed by the control unit 101, data obtained in the course of processing, and the like. The storage unit 103 is also used as a temporary storage area for the program. The display unit 104 is composed of a display such as an LCD (liquid crystal display panel) and presents various screens to the user of the computer system. The communication unit 105 is a receiver and a transmitter that perform communication processing. The output unit 106 is, for example, a printer or a speaker. FIG. 8 shows an example, and the configuration of the computer system is not limited to the example of FIG. 8.
 Here, an example of the operation of the computer system until the program of the present embodiment becomes executable will be described. In the computer system having the above configuration, the computer program is installed in the storage unit 103 from, for example, a CD (Compact Disc)-ROM or DVD (Digital Versatile Disc)-ROM set in a CD-ROM drive or DVD-ROM drive (not shown). When the program is executed, the program read from the storage unit 103 is stored in the main storage area of the storage unit 103. In this state, the control unit 101 executes the processing of the video providing device 1 of the present embodiment in accordance with the program stored in the storage unit 103.
 In the above description, the program describing the processing of the video providing device 1 is provided on a CD-ROM or DVD-ROM as a recording medium; however, the program is not limited to this and may, depending on the configuration of the computer system and the size of the provided program, be provided via a transmission medium such as the Internet through the communication unit 105.
 The emotion estimation unit 13 and the emotion-corresponding signal generation unit 14 shown in FIG. 1 are realized by the control unit 101 shown in FIG. 8 executing a computer program stored in the storage unit 103 shown in FIG. 8; the storage unit 103 is also used to realize the emotion estimation unit 13 and the emotion-corresponding signal generation unit 14. The video providing unit 11, the emotion information acquisition unit 12, and the emotion-corresponding signal transmission unit 15 shown in FIG. 1 are realized by the communication unit 105 shown in FIG. 8; the control unit 101 is also used to realize them. The video providing device 1 may be realized by a plurality of computer systems, for example by a cloud computer system.
 The distributor device 4 is likewise realized by, for example, a computer system having the configuration shown in FIG. 8, and so is the information processing system 5. The emotion-corresponding signal processing unit 53 shown in FIG. 1 is realized by the control unit 101 shown in FIG. 8 executing a computer program stored in the storage unit 103 shown in FIG. 8. The video receiving unit 51 is realized by the communication unit 105 shown in FIG. 8; the control unit 101 is also used to realize it. The output unit 52 is realized by the display unit 104 and the output unit 106 shown in FIG. 8. As described above, the functions of the video receiving unit 51, the output unit 52, and the emotion-corresponding signal processing unit 53 may be divided among a plurality of devices.
 As described above, the video providing device 1 of the present embodiment uses sensor information acquired by a sensor that obtains information indicating the emotion of an estimation target person, who is a player or a performer in an event, to estimate the emotion of the estimation target person, and generates an emotion-corresponding signal for outputting at least one of vibration and sound according to the estimation result. The video providing device 1 then transmits the emotion-corresponding signal to the information processing system 5 that receives the video data of the event. This makes it possible to enhance the sense of unity, with the players or performers in the event, of a user who is watching the video of the event.
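 A minimal sketch of this flow, written in Python purely for illustration, is shown below; the vibration patterns, sound-effect names, and function interfaces are assumptions made for this sketch and are not the contents of the publication.

```python
# Illustrative mapping only; the actual output content per emotion is not fixed here.
VIBRATION_PATTERNS_MS = {
    "excitement": [300, 100, 300, 100, 300],
    "tension":    [100, 50, 100],
    "relaxation": [600],
    "anger":      [500, 200, 500],
}

def make_emotion_signal(emotion: str) -> dict:
    """Build a vibration/sound instruction for the user's information processing system."""
    return {
        "vibration_ms": VIBRATION_PATTERNS_MS.get(emotion, []),
        "sound_effect": f"{emotion}.wav",  # hypothetical file name
    }

def on_sensor_update(sensor_info, estimate_emotion, transmit) -> None:
    """End-to-end flow: sensor information -> emotion estimate -> emotion-corresponding signal."""
    emotion = estimate_emotion(sensor_info)    # emotion estimation unit 13
    transmit(make_emotion_signal(emotion))     # sent directly or via the distributor device 4
```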
Embodiment 2.
 FIG. 9 is a diagram showing a configuration example of a signal processing system according to the second embodiment. The signal processing system 100a of the present embodiment includes a video providing device 1a and an information processing system 5a. The signal processing system 100a may also include at least one of the wearable device 3 and the imaging device 2 in addition to the video providing device 1a and the information processing system 5a. The video providing device 1a is the same as the video providing device 1 of Embodiment 1 except that it does not include the emotion-corresponding signal transmission unit 15 of Embodiment 1 and includes an emotion-corresponding signal generation unit 14a instead of the emotion-corresponding signal generation unit 14. The information processing system 5a is the same as the information processing system 5 of Embodiment 1 except that it does not include the emotion-corresponding signal processing unit 53 of Embodiment 1 and includes an output unit 52a instead of the output unit 52. Components having the same functions as in Embodiment 1 are denoted by the same reference signs as in Embodiment 1, and duplicate description is omitted. The differences from Embodiment 1 are mainly described below.
 In the video providing device 1a of the present embodiment, the emotion estimation unit 13 estimates the emotion of the estimation target person, who is a player or a performer, using the sensor information, as in Embodiment 1, and outputs the estimation result to the emotion-corresponding signal generation unit 14a. The emotion-corresponding signal generation unit 14a receives the estimation result output from the emotion estimation unit 13 and also receives the video data from the imaging device 2. When the video data is processed before being provided to the distributor device 4, the video data processed by a video processing unit (not shown) is input to the emotion-corresponding signal generation unit 14a. The processing in the video processing unit is the same as in Embodiment 1. As in Embodiment 1, the video data may also include sound data.
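 To make the data flow concrete, a toy stand-in for the estimation step is sketched below. The field names and thresholds are assumptions introduced only for this sketch; the actual device may estimate the emotion differently, for example with a trained model.

```python
def estimate_emotion(sensor_info: dict) -> str:
    """Toy stand-in for the emotion estimation unit 13 (illustrative thresholds only)."""
    heart_rate = sensor_info.get("heart_rate_bpm", 70)
    conductance = sensor_info.get("skin_conductance", 0.0)

    if heart_rate > 120 and conductance > 0.6:
        return "excitement"
    if heart_rate > 100:
        return "tension"
    if heart_rate < 80 and conductance < 0.2:
        return "relaxation"
    return "anger" if conductance > 0.8 else "neutral"
```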
 The emotion-corresponding signal generation unit 14a uses the estimation result received from the emotion estimation unit 13 to determine the output content in the information processing system 5a. In the present embodiment, the emotion-corresponding signal generation unit 14a superimposes an emotion-corresponding signal indicating the output content in the information processing system 5a on the video data; the emotion-corresponding signal is superimposed on at least one of the video portion of the video data and the sound data included in the video data.
 In the present embodiment as well, the emotion-corresponding signal generation unit 14a holds, for example, output correspondence information indicating the correspondence between emotions and output content in the information processing system 5a as an output correspondence table, and determines the output content using the output correspondence table.
 FIG. 10 is a diagram showing an example of the output correspondence table of the present embodiment. In the example shown in FIG. 10, the emotion-corresponding signal generation unit 14a changes, according to the emotion estimation result, the image quality of the video data (denoted as image quality in FIG. 10), the volume of the sound data (denoted as volume in FIG. 10), and the sound quality of the sound data (denoted as sound quality in FIG. 10), superimposes an animation video or an icon image on the video data (denoted as animation/icon in FIG. 10), and superimposes text, that is, character information, on the video data (denoted as text in FIG. 10). Not all of these need to be performed; it suffices that one or more of them are performed. In the present embodiment, the emotion-corresponding signal is superimposed on the video data by applying the processing illustrated in FIG. 10 to at least one of the video data and the sound data.
 As shown in FIG. 10, for example, when the emotion estimation result indicates excitement, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one with higher luminance and emphasized edges, raise the volume, change the sound quality to one emphasizing the mid-low range, superimpose on the video data an animation (animation video) or an icon (icon image) of a person running across the bottom of the screen, superimpose text such as "Charge!!!" on the video data, or perform a combination of two or more of these. When the emotion estimation result indicates tension, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one with a higher color temperature, lower the volume, change the sound quality to one emphasizing the high range, superimpose an animation of a heart changing in size, superimpose text such as "Yikes!" on the video data, or perform a combination of two or more of these. When the emotion estimation result indicates relaxation, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one with a lower color temperature, raise the volume, change the sound quality to a flat one, superimpose on the video data an animation of an animal drifting slowly across the screen, superimpose text such as "So relaxed..." on the video data, or perform a combination of two or more of these. When the emotion estimation result indicates anger, the emotion-corresponding signal generation unit 14a may change the image quality of the video data to one emphasizing red, lower the volume, superimpose on the video data an icon indicating anger so that it is displayed semi-transparently over the entire screen, superimpose text such as "Hmph!" on the video data, or perform a combination of two or more of these.
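 In effect, FIG. 10 is a lookup from the estimated emotion to a set of presentation changes. The sketch below encodes the examples above as a plain dictionary; the field names and value encodings are assumptions, and the actual table held by the emotion-corresponding signal generation unit 14a need not take this form.

```python
OUTPUT_CORRESPONDENCE_TABLE = {
    "excitement": {
        "image_quality": {"brightness": "high", "edge_enhancement": True},
        "volume": "up",
        "sound_quality": "emphasize_mid_low",
        "overlay": "running_person_animation",
        "text": "Charge!!!",
    },
    "tension": {
        "image_quality": {"color_temperature": "high"},
        "volume": "down",
        "sound_quality": "emphasize_high",
        "overlay": "beating_heart_animation",
        "text": "Yikes!",
    },
    "relaxation": {
        "image_quality": {"color_temperature": "low"},
        "volume": "up",
        "sound_quality": "flat",
        "overlay": "drifting_animal_animation",
        "text": "So relaxed...",
    },
    "anger": {
        "image_quality": {"red_emphasis": True},
        "volume": "down",
        "overlay": "translucent_anger_icon",
        "text": "Hmph!",
    },
}

def decide_output(emotion: str) -> dict:
    """Table lookup performed by the emotion-corresponding signal generation unit 14a.

    Any one or more of the returned entries may actually be applied.
    """
    return OUTPUT_CORRESPONDENCE_TABLE.get(emotion, {})
```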
 As described above, the emotion-corresponding signal generation unit 14a may superimpose the emotion-corresponding signal on the video data by changing the video data to an image quality according to the estimation result of the emotion estimation unit 13, or by changing at least one of the volume and the sound quality of the sound data included in the video data according to the estimation result of the emotion estimation unit 13. The emotion-corresponding signal generation unit 14a may also superimpose the emotion-corresponding signal on the video data by superimposing on the video data an animation video or an icon image according to the estimation result of the emotion estimation unit 13, or by superimposing on the video data character information according to the estimation result of the emotion estimation unit 13.
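 To make the image-quality path concrete, the sketch below applies a brightness or red-channel change to a single RGB frame, assuming frames are available as H x W x 3 uint8 NumPy arrays; the adjustment amounts are arbitrary, and in practice this processing would sit inside the device's video pipeline together with the animation, icon, and text overlays.

```python
import numpy as np

def apply_emotion_to_frame(frame: np.ndarray, output: dict) -> np.ndarray:
    """Apply an image-quality entry from the output correspondence table to one RGB frame."""
    frame = frame.astype(np.int16)        # widen to avoid uint8 overflow while adjusting
    quality = output.get("image_quality", {})
    if quality.get("brightness") == "high":
        frame += 30                       # raise overall luminance (illustrative amount)
    if quality.get("red_emphasis"):
        frame[..., 0] += 40               # boost the red channel (assumes RGB channel order)
    return np.clip(frame, 0, 255).astype(np.uint8)
```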
 The example shown in FIG. 10 is merely an example, and the specific method of superimposing the emotion-corresponding signal according to the emotional content is not limited to the example shown in FIG. 10.
 The emotion-corresponding signal generation unit 14a outputs the video data on which the emotion-corresponding signal is superimposed to the video providing unit 11, and the video providing unit 11 transmits this video data to the distributor device 4. When the video data input to the emotion-corresponding signal generation unit 14a includes sound data, the video data on which the emotion-corresponding signal is superimposed also includes sound data: if the volume or sound quality has been changed by the emotion-corresponding signal generation unit 14a as described above, this is the changed sound data, and otherwise it is identical to the input sound data. The distributor device 4 transmits the video data on which the emotion-corresponding signal is superimposed to the information processing system 5a.
 The video receiving unit 51 of the information processing system 5a outputs the video data on which the emotion-corresponding signal is superimposed to the output unit 52a. The output unit 52a includes the display unit 522 and the speaker 523; the display unit 522 displays the video data on which the emotion-corresponding signal is superimposed, and the speaker 523 outputs sound based on the sound data when that video data includes sound data. Through the above processing, the user can perceive the output according to the emotion of the estimation target person by watching the video data on which the emotion-corresponding signal is superimposed, which makes it possible to enhance the sense of unity, with the players or performers in the event, of the user who is watching the video of the event. The operations of the present embodiment other than those described above are the same as in Embodiment 1.
 The video providing device 1a of the present embodiment is realized by a computer system in the same manner as the video providing device 1 of Embodiment 1, and may be realized by a cloud computer system. The information processing system 5a of the present embodiment is also realized by a computer system in the same manner as the information processing system 5 of Embodiment 1; as in Embodiment 1, the information processing system 5a may be the TV 501, the terminal 505, or the combination of the game machine body 503, the controller 504, and the TV 501, or may have a configuration other than these.
<Modification>
 FIG. 9 shows an example in which the emotion-corresponding signal is superimposed on the video data; however, this may be combined with Embodiment 1 so that, in addition to superimposing the emotion-corresponding signal on the video data, an emotion-corresponding signal indicating vibration or the like is also generated. FIG. 11 is a diagram showing a configuration example of a signal processing system according to a modification of the present embodiment. The signal processing system 100b shown in FIG. 11 is the same as the signal processing system 100 of Embodiment 1 except that it includes a video providing device 1b instead of the video providing device 1. The video providing device 1b is the same as the video providing device 1 of Embodiment 1 except that it includes an emotion-corresponding signal generation unit 14b instead of the emotion-corresponding signal generation unit 14 and that the video data on which the emotion-corresponding signal is superimposed is input from the emotion-corresponding signal generation unit 14b to the video providing unit 11.
 The video providing device 1b of the present embodiment is realized by a computer system in the same manner as the video providing device 1 of Embodiment 1, and may be realized by a cloud computer system.
 The emotion-corresponding signal generation unit 14b shown in FIG. 11 generates an emotion-corresponding signal for vibration, that is, an emotion-corresponding signal relating to vibration, as in Embodiment 1, and outputs the generated emotion-corresponding signal to the emotion-corresponding signal transmission unit 15. The emotion-corresponding signal transmission unit 15 transmits the emotion-corresponding signal to the information processing system 5 via the distributor device 4, as in Embodiment 1. In addition, like the emotion-corresponding signal generation unit 14a shown in FIG. 9, the emotion-corresponding signal generation unit 14b superimposes the emotion-corresponding signal on the video data and outputs the video data on which the emotion-corresponding signal is superimposed to the video providing unit 11. The video providing unit 11 transmits the video data on which the emotion-corresponding signal is superimposed to the information processing system 5 via the distributor device 4. As a result, the user perceives the emotion of the players or performers in the event by watching the video data on which the emotion-corresponding signal is superimposed, and also feels the emotion of the players or performers through vibration, as in Embodiment 1. This makes it possible to enhance the sense of unity, with the players or performers in the event, of the user who is watching the video of the event.
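 As a sketch of how the emotion-corresponding signal generation unit 14b might combine the two paths, the snippet below reuses the illustrative helpers from the earlier sketches (make_emotion_signal, decide_output, apply_emotion_to_frame) under the same assumptions; it is not a definitive implementation.

```python
def handle_estimation_result(emotion: str, frame, transmit_signal):
    """Illustrative combination of the two paths handled by the unit 14b in FIG. 11."""
    # 1. Vibration-oriented emotion-corresponding signal, handed to the
    #    emotion-corresponding signal transmission unit 15 (as in Embodiment 1).
    transmit_signal(make_emotion_signal(emotion))

    # 2. Video data with the emotion-corresponding signal superimposed,
    #    handed to the video providing unit 11 (as in FIG. 9).
    return apply_emotion_to_frame(frame, decide_output(emotion))
```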
 The configurations shown in the above embodiments are examples; they may be combined with other known techniques, the embodiments may be combined with each other, and part of the configurations may be omitted or changed without departing from the gist.
 1, 1a, 1b video providing device; 2 imaging device; 3 wearable device; 4 distributor device; 5, 5a information processing system; 11 video providing unit; 12 emotion information acquisition unit; 13 emotion estimation unit; 14, 14a, 14b emotion-corresponding signal generation unit; 15 emotion-corresponding signal transmission unit; 51 video receiving unit; 52, 52a output unit; 53 emotion-corresponding signal processing unit; 100, 100a, 100b signal processing system; 131 trained model generation unit; 132 trained model storage unit; 133 estimation unit; 521 vibration generation unit; 522 display unit; 523 speaker.

Claims (14)

  1.  A signal generation device comprising:
     an emotion information acquisition unit to acquire, from a sensor that acquires information indicating an emotion of an estimation target person who is a player or a performer in an event, sensor information acquired by the sensor;
     an emotion estimation unit to estimate the emotion of the estimation target person using the sensor information;
     an emotion-corresponding signal generation unit to generate an emotion-corresponding signal for causing an output unit in an information processing system of a user to perform output according to an estimation result of the emotion estimation unit; and
     an emotion-corresponding signal transmission unit to transmit the emotion-corresponding signal to the information processing system directly or via another device.
  2.  The signal generation device according to claim 1, wherein the sensor includes at least one of a wearable device worn by the estimation target person and an imaging device capable of capturing video including the estimation target person.
  3.  The signal generation device according to claim 2, wherein the wearable device acquires biological information of the estimation target person as the sensor information.
  4.  The signal generation device according to any one of claims 1 to 3, wherein
     the output unit includes a vibration generation unit in the information processing system capable of transmitting vibration to the user, and
     the emotion-corresponding signal generation unit generates the emotion-corresponding signal that causes the vibration generation unit to vibrate according to the estimation result of the emotion estimation unit.
  5.  The signal generation device according to any one of claims 1 to 3, wherein
     the output unit includes a speaker, and
     the emotion-corresponding signal generation unit generates the emotion-corresponding signal that causes the speaker to output a sound effect or music according to the estimation result of the emotion estimation unit.
  6.  The signal generation device according to any one of claims 1 to 3, wherein the emotion-corresponding signal generation unit superimposes the emotion-corresponding signal on video data obtained by capturing the event.
  7.  The signal generation device according to claim 6, wherein
     the output unit includes a display unit capable of displaying the video data, and
     the emotion-corresponding signal generation unit superimposes the emotion-corresponding signal on the video data by changing the video data to an image quality according to the estimation result of the emotion estimation unit.
  8.  The signal generation device according to claim 6, wherein
     the output unit includes a speaker capable of outputting sound data included in the video data, and
     the emotion-corresponding signal generation unit superimposes the emotion-corresponding signal on the video data by changing at least one of a volume and a sound quality of the sound data included in the video data according to the estimation result of the emotion estimation unit.
  9.  The signal generation device according to claim 6, wherein
     the output unit includes a display unit capable of displaying the video data, and
     the emotion-corresponding signal generation unit superimposes the emotion-corresponding signal on the video data by superimposing, on the video data, an animation video or an icon image according to the estimation result of the emotion estimation unit.
  10.  The signal generation device according to claim 6, wherein the emotion-corresponding signal generation unit superimposes the emotion-corresponding signal on the video data by superimposing, on the video data, character information according to the estimation result of the emotion estimation unit.
  11.  The signal generation device according to any one of claims 1 to 10, wherein the information processing system includes at least one of a mobile terminal, a television, and a game machine.
  12.  A signal processing system comprising:
     a signal generation device; and
     an information processing system of a user, wherein
     the signal generation device comprises:
     an emotion information acquisition unit to acquire, from a sensor that acquires information indicating an emotion of an estimation target person who is a player or a performer in an event, sensor information acquired by the sensor;
     an emotion estimation unit to estimate the emotion of the estimation target person using the sensor information;
     an emotion-corresponding signal generation unit to generate an emotion-corresponding signal for causing the information processing system to perform output according to an estimation result of the emotion estimation unit; and
     an emotion-corresponding signal transmission unit to transmit the emotion-corresponding signal to the information processing system directly or via another device, and
     the information processing system comprises an output unit to perform output based on the emotion-corresponding signal.
  13.  The signal processing system according to claim 12, further comprising a wearable device that is the sensor and acquires biological information of the estimation target person as the sensor information.
  14.  A signal generation method comprising:
     acquiring, from a sensor that acquires information indicating an emotion of an estimation target person who is a player or a performer in an event, sensor information acquired by the sensor;
     estimating the emotion of the estimation target person using the sensor information;
     generating an emotion-corresponding signal for causing an output unit in an information processing system of a user to perform output according to a result of estimating the emotion of the estimation target person; and
     transmitting the emotion-corresponding signal to the information processing system directly or via another device.

