WO2021240644A1 - Information output program, device, and method - Google Patents

Information output program, device, and method Download PDF

Info

Publication number
WO2021240644A1
WO2021240644A1 (PCT/JP2020/020734)
Authority
WO
WIPO (PCT)
Prior art keywords
information
event
moving image
output
commentary
Prior art date
Application number
PCT/JP2020/020734
Other languages
French (fr)
Japanese (ja)
Inventor
健二 山本
教数 塩月
Original Assignee
Fujitsu Limited (富士通株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Limited (富士通株式会社)
Priority to PCT/JP2020/020734 priority Critical patent/WO2021240644A1/en
Priority to JP2022527479A priority patent/JPWO2021240837A1/ja
Priority to PCT/JP2020/039429 priority patent/WO2021240837A1/en
Publication of WO2021240644A1 publication Critical patent/WO2021240644A1/en

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/854 Content authoring

Definitions

  • the disclosed technology relates to an information output program, an information output device, and an information output method.
  • There is a known commentary-audio generation device that generates commentary audio by speech-synthesizing a commentary manuscript (text data) related to the content of a video.
  • This device detects, from the audio of the video, talking sections (voiced sections) and pause sections (sections that are silent or contain only background sound). It then adjusts the speech rate of the commentary audio based on the length of each pause section and adds the rate-adjusted commentary audio to the video audio.
  • The disclosed technology aims to output play-by-play and commentary information at appropriate timing in live-distribution content such as a sports game, without requiring a pre-prepared manuscript or a live announcer and commentator.
  • The disclosed technique acquires a sports video including sound information and a moving image, together with event information related to the event indicated by each section of the moving image, and generates a live commentary sentence on the event for each section based on the acquired event information. The disclosed technique further adjusts the output timing of the generated live commentary based on the per-section output timing of at least one of the sound information and the moving image, and outputs the commentary together with at least one of them.
  • One aspect is that play-by-play and commentary information can be output at appropriate timing in live-distribution content such as a sports game, without requiring a pre-prepared manuscript or a live announcer and commentator.
  • In the following embodiment, the disclosed technology is applied to the live-distribution content of a baseball game.
  • A case will be described in which content is distributed by adding the generated live commentary to the sound information collected at the stadium (hereinafter, "stadium audio"), to the moving image shot at the stadium, or to the video including the stadium audio.
  • the information output system 100 includes an information output device 10, a video distribution system 32, a stats input system 34, and a user terminal 36.
  • the video distribution system 32 shoots a baseball game held at a stadium with a camera and outputs the shot video.
  • the video includes a stadium sound and a moving image composed of a plurality of frames.
  • Time information is associated with each sampling point of the stadium sound and each frame of the moving image, and the stadium sound and the moving image are synchronized based on this time information.
  • the time information is the date and time when the video was shot, the elapsed time from the start of the game, and the like.
  • the stats input system 34 is a system for a person in charge to input stats information about a match while acquiring a video output from the video distribution system 32 and watching the video.
  • the content of the event is input for each event corresponding to one play such as pitching, hitting, running, and defense.
  • a time stamp is added to each event together with the input of the event, for example, manually by the person in charge.
  • the user terminal 36 is a terminal used by a user who uses the service provided by the information output system 100.
  • the user terminal 36 has a function of receiving content distributed from the information output device 10 and a function of outputting at least one of audio and moving images.
  • the user terminal 36 is, for example, a personal computer, a smartphone, a tablet terminal, a mobile phone, a television, a radio, or the like.
  • the information output device 10 generates a commentary on the video and outputs the content with the commentary added to the video acquired from the video distribution system 32.
  • The information output device 10 includes a video acquisition unit 11, an analysis unit 12, a stats acquisition unit 13, a synchronization unit 14, a generation unit 15, and a synthesis unit 16. Further, a scene information DB (database) 21, a stats DB 22, an event DB 23, and a template DB 24 are stored in a predetermined storage area of the information output device 10.
  • The video acquisition unit 11, the analysis unit 12, the stats acquisition unit 13, and the synchronization unit 14 are examples of the acquisition unit of the disclosed technology. Further, the synthesis unit 16 is an example of the output unit of the disclosed technology.
  • the video acquisition unit 11 acquires the video output from the video distribution system 32, and divides the video into a stadium audio and a moving image.
  • the video acquisition unit 11 passes the divided moving image to the analysis unit 12, and also delivers the acquired video to the synthesis unit 16.
  • The analysis unit 12 acquires scene information for each section corresponding to each event by image analysis of the moving image delivered from the video acquisition unit 11. Specifically, the analysis unit 12 detects camera-cut switching points from differences in pixel values between frames of the moving image, and treats the span between switching points as one section. Further, the analysis unit 12 recognizes the scene shown by the moving image of each section using a recognition model.
  • The scene can be, for example, a scene capturing the defensive formation, a scene capturing the batter standing in the batter's box, a scene showing the bench, a base-running scene, a pickoff scene, a sliding scene, and the like.
  • the recognition model is machine-learned in advance about the correspondence between the moving image for each section and the label indicating the type of the correct scene shown by the moving image.
  • Further, the analysis unit 12 acquires information such as the ball count, strike count, and out count (hereinafter "BSO"), the score, the inning, and the runner situation from the telop (on-screen caption) portion of the frame images in each section. This information can be obtained by comparison with a predetermined format, character recognition processing, or the like.
  • Hereinafter, the BSO is written as "ball count (B) - strike count (S) - out count (O)" (for example, 0-0-0).
  • the analysis unit 12 stores the information acquired for each section in the scene information DB 21 in association with the time information associated with the start frame of the section and the time information associated with the end frame.
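The cut-based section detection described above (pixel-value differences between consecutive frames, with the span between two switching points treated as one section) can be sketched as follows. This is a minimal illustration, not the patented implementation; the difference metric and threshold value are assumptions.

```python
import numpy as np

def detect_sections(frames, threshold=30.0):
    """Split a frame sequence into sections at camera-cut boundaries.

    A cut is assumed wherever the mean absolute pixel difference between
    consecutive frames exceeds `threshold` (an illustrative value).
    Returns (start_index, end_index) pairs, end index inclusive.
    """
    cut_points = [0]
    for i in range(1, len(frames)):
        diff = np.mean(np.abs(frames[i].astype(float) - frames[i - 1].astype(float)))
        if diff > threshold:
            cut_points.append(i)  # camera cut between frames i-1 and i
    cut_points.append(len(frames))
    return [(s, e - 1) for s, e in zip(cut_points, cut_points[1:]) if e > s]
```

In practice each section's start and end indices would then be mapped to the time information of the corresponding frames before being stored in the scene information DB 21.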
  • FIG. 3 shows an example of the scene information DB 21.
  • each row corresponds to the scene information of one section.
  • "Sequence No.” is assigned to each scene information in the order of time information.
  • the "start time” is the time information associated with the start frame of the section
  • the "end time” is the time information associated with the end frame of the section.
  • the "scene” is information indicating the type of the scene recognized by using the recognition model.
  • "inning” and "pre-event BSO” are information acquired from the telop in the frame image of the section. The information included in the scene information is not limited to the above example.
  • the stats acquisition unit 13 acquires stats information for each event input in the stats input system 34 and stores it in the stats DB 22.
  • FIG. 4 shows an example of the stats DB22.
  • each row corresponds to stats information for one event.
  • "Sequence No.” is assigned to each stats information in the order of time information.
  • The stats information includes a "start time" and an "end time", which are the times entered by the person in charge as the start and end of the event indicated by each piece of stats information.
  • The stats information includes the "inning" at the time of the event, the "batter team" (name of the team to which the batter belongs), the "batter" (name of the batter), the "pitcher team" (name of the team to which the pitcher belongs), and the "pitcher" (name of the pitcher).
  • The stats information also includes the "number of pitches in the at-bat" for the batter at the time of the event, the "event content", the "direction of hitting" when the event is a hit, and the "event result".
  • Further, the stats information includes the "pre-event BSO", which is the BSO before the event, and the "post-event BSO", which is based on the event result. The information included in the stats information is not limited to the above example.
  • the synchronization unit 14 generates event information in which the scene information and the stats information are synchronized by associating each of the stats information with the scene information of each section based on the order of the stats information.
  • Since the scene information is acquired by analyzing the moving image, the types of scenes that can be acquired are limited.
  • However, since the time information (start time and end time) of the scene information is the time information associated with each frame of the moving image, it accurately represents the timing of the information acquired from each section and is also synchronized with the time information of the stadium audio.
  • The stats information, on the other hand, contains detailed information that cannot be obtained by analyzing the moving image.
  • However, since the time information of the stats information is entered by the person in charge, it may be inaccurate or omitted, its granularity is coarse, and its synchronization with the time information of the stadium audio is not guaranteed.
  • By associating the two, the time information of the stats information is corrected based on the time information of the scene information and becomes accurate. This makes it possible to generate event information that is more detailed than the scene information and has more accurate time information than the stats information.
  • Specifically, using the sequence numbers of the scene information and the stats information, the synchronization unit 14 generates event information by associating each piece of stats information with the scene information whose common items match, while preserving the sequential order. Common items are, for example, "scene", "event content", and "pre-event BSO".
  • the synchronization unit 14 stores the generated event information in the event DB 23.
  • FIG. 5 shows an example of the event DB 23.
  • each row corresponds to one event information.
  • the event information has an item in which the item of the scene information and the item of the stats information are integrated.
  • For scene information for which no corresponding stats information exists, the synchronization unit 14 may generate event information by associating it with the same information as the immediately preceding stats information in sequence order.
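The order-preserving association performed by the synchronization unit can be sketched as follows. The field names and the fallback to the immediately preceding stats row are simplifications of the behavior described above, not the actual implementation.

```python
def synchronize(scene_rows, stats_rows, keys=("scene", "pre_event_bso")):
    """Merge scene information with stats information in sequence order.

    Each scene row is matched to the next stats row (in order) whose
    common items (`keys`, hypothetical names) all agree; scene rows with
    no match reuse the most recently matched stats row. Scene time
    information overrides stats time information in the merged result.
    """
    events, j, last = [], 0, None
    for scene in scene_rows:
        k = j
        while k < len(stats_rows) and any(
            stats_rows[k].get(f) != scene.get(f) for f in keys
        ):
            k += 1
        if k < len(stats_rows):
            last, j = stats_rows[k], k + 1
        merged = dict(last or {})
        merged.update(scene)  # accurate frame-based times win
        events.append(merged)
    return events
```

A scene row thus inherits the detailed stats fields while keeping its own accurate, frame-synchronized start and end times.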
  • each of the scene information and the event information is an example of the event information of the disclosed technology
  • the stats information is an example of the external information of the disclosed technology.
  • Based on the event information stored in the event DB 23, the generation unit 15 generates a sentence (hereinafter, "live commentary") that is play-by-play or commentary regarding the event indicated by each piece of event information. Specifically, the generation unit 15 selects a template corresponding to each piece of event information from the plurality of live commentary templates stored in the template DB 24, and generates the live commentary by combining the selected template with the event information.
  • FIG. 6 shows an example of the template DB 24.
  • each row (each record) is information about one template, and by arranging a plurality of templates, a template group corresponding to one event information is formed.
  • the same "template group ID” is assigned as identification information to the templates included in the same template group, and "sequence No. in the template group” is assigned in the order of output.
  • The template DB 24 includes information on a "speaker type" indicating whether each template assumes a play-by-play announcer or a commentator as the speaker.
  • the "template” is a format in which parameters are inserted in a part of a sentence that is a commentary or a commentary.
  • the part of ⁇ > is the part where the parameter is inserted, and the numbers 1, 2, ... Are assigned in ⁇ > in the order of appearance in each template.
  • the item name of the event information is specified as the "parameter type”.
  • A parameter type of "pre-event (or post-event) B (or S)" represents only the corresponding count of the BSO.
  • Further, for each template, the template DB 24 includes the reproduction time (hereinafter, "voice time") of the live commentary generated from that template when rendered as voice data.
  • The template DB 24 may store the items "template", "voice time", and "parameter type" for each of a plurality of languages.
  • the generation unit 15 selects the template group corresponding to the event information by using the selection model for selecting the template group suitable for the event information.
  • The selection model is a model in which the correspondence between event information and the template group optimal for the event indicated by that event information has been machine-learned in advance, and which outputs the degree of fit between the target event information and each template group.
  • The generation unit 15 calculates, for a predetermined number of template groups taken in descending order of the goodness of fit output from the selection model, the total voice time of the templates included in each group. Then, among the template groups whose total voice time is shorter than the time from the "start time" to the "end time" of the event information (hereinafter, "event time"), the generation unit 15 selects the one with the highest goodness of fit. For example, when the goodness of fit with each template group shown in FIG. 7 is obtained for event information with an event time of 20 seconds, the generation unit 15 selects the template group whose template group ID is 3.
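The two-step selection just described (rank template groups by goodness of fit, then keep the best one whose total voice time fits within the event time) can be sketched as follows; the data structures are illustrative assumptions.

```python
def select_template_group(groups, fit_scores, event_time, top_n=5):
    """Return the best-fitting template group ID whose total voice time
    is shorter than `event_time` (seconds), or None if none fits.

    groups: dict mapping group_id -> list of (template_text, voice_time)
    fit_scores: dict mapping group_id -> goodness of fit from the model
    """
    # consider the top-N groups in descending order of goodness of fit
    candidates = sorted(fit_scores, key=fit_scores.get, reverse=True)[:top_n]
    for gid in candidates:
        total_voice_time = sum(t for _, t in groups[gid])
        if total_voice_time < event_time:
            return gid  # highest-fit group that fits within the event
    return None
```

With the FIG. 7 example (event time of 20 seconds), a group whose total voice time is 25 seconds is skipped even if it has the highest fit, and the best-fitting group under 20 seconds is chosen.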
  • the generation unit 15 inserts the value of the event information item indicated by the "parameter type" into the ⁇ > part of each template included in the selected template group, and generates a live commentary.
  • For example, assume that for the event information with sequence No. 5 in the event DB 23 shown in FIG. 5, the template group with template group ID 1 shown in FIG. 6 is selected. Of these, take the template with in-group sequence No. 6 as an example.
  • In this case, the generation unit 15 inserts the event result "foul" into <1> of the template, inserts "1", the post-event B, into <2>, and inserts "1", the post-event S, into <3>.
  • the generation unit 15 generates a commentary sentence "Foul. This is one ball and one strike.”
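The parameter insertion just described can be sketched as follows; the key names are illustrative, not the patent's schema.

```python
def fill_template(template, param_types, event):
    """Insert event-information values into the <1>, <2>, ... slots.

    param_types: event-info keys for slots 1, 2, ... in order.
    """
    text = template
    for i, key in enumerate(param_types, start=1):
        text = text.replace(f"<{i}>", str(event[key]))
    return text

event = {"event_result": "Foul", "post_event_b": 1, "post_event_s": 1}
sentence = fill_template(
    "<1>. This is <2> ball and <3> strike.",
    ["event_result", "post_event_b", "post_event_s"],
    event,
)
# sentence == "Foul. This is 1 ball and 1 strike."
```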
  • the generation unit 15 passes the generated commentary to the synthesis unit 16.
  • The synthesizing unit 16 adjusts, for each section corresponding to each event, the output timing of the live commentary delivered from the generation unit 15 based on the output timing of at least one of the stadium audio and the moving image of the video delivered from the video acquisition unit 11. Then, the synthesis unit 16 generates and outputs content with live commentary so that the timing-adjusted live commentary is output together with at least one of the stadium audio and the moving image.
  • For example, the synthesizing unit 16 generates, as the content with live commentary, content in which the stadium audio is synthesized with audio data representing the live commentary (see A in FIG. 8).
  • This content is applicable to radio broadcasting and the like.
  • Alternatively, the synthesizing unit 16 generates, as the content with live commentary, content in which the original video (with stadium audio) or the moving image (without stadium audio) is synthesized with audio data representing the live commentary.
  • This content can be applied to television broadcasting, Internet video distribution, and the like.
  • Alternatively, the synthesizing unit 16 generates, as the content with live commentary, content in which the original video (with stadium audio) or the moving image (without stadium audio) is synthesized with image data (subtitles) visualizing the text of the live commentary (see B in FIG. 8). This content can also be applied to television broadcasting, Internet video distribution, and the like.
  • When synthesizing the live commentary with at least one of the stadium audio and the moving image, the synthesizing unit 16 synchronizes the time information of the moving image, or the time information of the stadium audio synchronized with it, with the time information of the event information corresponding to the live commentary. Since the time information of the event information matches the time information of the moving image, the two can easily be synchronized.
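The timing adjustment can be sketched as a simple scheduler that anchors each commentary to its event's time window; the time units and the drop-on-overflow policy are assumptions, not the patented behavior.

```python
def schedule_commentary(events, commentaries):
    """Pair each commentary with its event's start time.

    events: dicts with "start" and "end" times (seconds from video start)
    commentaries: (text, voice_time_seconds) pairs, one per event
    A commentary that cannot finish before its event ends is dropped.
    Returns (start_time, text) pairs ready to be mixed into the output.
    """
    schedule = []
    for ev, (text, voice_time) in zip(events, commentaries):
        if ev["start"] + voice_time <= ev["end"]:
            schedule.append((ev["start"], text))
    return schedule
```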
  • the information output device 10 can be realized by, for example, the computer 40 shown in FIG.
  • the computer 40 includes a CPU (Central Processing Unit) 41, a memory 42 as a temporary storage area, and a non-volatile storage unit 43. Further, the computer 40 includes an input / output device 44 such as an input unit and a display unit, and an R / W (Read / Write) unit 45 that controls reading and writing of data to the storage medium 49. Further, the computer 40 includes a communication I / F (Interface) 46 connected to a network such as the Internet.
  • the CPU 41, the memory 42, the storage unit 43, the input / output device 44, the R / W unit 45, and the communication I / F 46 are connected to each other via the bus 47.
  • the storage unit 43 can be realized by an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like.
  • the storage unit 43 as a storage medium stores an information output program 50 for causing the computer 40 to function as the information output device 10.
  • the information output program 50 includes a video acquisition process 51, an analysis process 52, a stats acquisition process 53, a synchronization process 54, a generation process 55, and a synthesis process 56.
  • the storage unit 43 has an information storage area 60 in which information constituting each of the scene information DB 21, the stats DB 22, the event DB 23, and the template DB 24 is stored.
  • the CPU 41 reads the information output program 50 from the storage unit 43, expands it into the memory 42, and sequentially executes the processes included in the information output program 50.
  • The CPU 41 operates as the video acquisition unit 11 shown in FIG. 2 by executing the video acquisition process 51. Further, the CPU 41 operates as the analysis unit 12 shown in FIG. 2 by executing the analysis process 52. Further, the CPU 41 operates as the stats acquisition unit 13 shown in FIG. 2 by executing the stats acquisition process 53. Further, the CPU 41 operates as the synchronization unit 14 shown in FIG. 2 by executing the synchronization process 54. Further, the CPU 41 operates as the generation unit 15 shown in FIG. 2 by executing the generation process 55. Further, the CPU 41 operates as the synthesis unit 16 shown in FIG. 2 by executing the synthesis process 56.
  • the CPU 41 reads information from the information storage area 60, and expands each of the scene information DB 21, the stats DB 22, the event DB 23, and the template DB 24 into the memory 42.
  • the computer 40 that has executed the information output program 50 functions as the information output device 10.
  • the CPU 41 that executes the program is hardware.
  • the function realized by the information output program 50 can also be realized by, for example, a semiconductor integrated circuit, more specifically, an ASIC (Application Specific Integrated Circuit) or the like.
  • the video distribution system 32 shoots a baseball game held at the stadium with a camera and starts outputting the shot video.
  • the stats input system 34 acquires the video output from the video distribution system 32, and the person in charge inputs the stats information regarding the match while watching the video. Further, in the information output device 10, the information output process shown in FIG. 10 is executed.
  • the information output process is an example of the information output method of the disclosed technology.
  • In step S12, the video acquisition unit 11 acquires the video for a predetermined time output from the video distribution system 32 and divides it into stadium audio and a moving image. Then, the video acquisition unit 11 passes the moving image to the analysis unit 12 and delivers the acquired video to the synthesis unit 16.
  • In step S14, the analysis unit 12 acquires scene information for each section corresponding to each event by performing image analysis on the moving image delivered from the video acquisition unit 11. Then, the analysis unit 12 stores the information acquired for each section in the scene information DB 21, associated with the time information of the start frame and the end frame of the section.
  • In step S16, the stats acquisition unit 13 acquires the stats information for each event input in the stats input system 34 and stores it in the stats DB 22.
  • In step S18, using the sequence numbers of the scene information and the stats information and preserving their order, the synchronization unit 14 associates each piece of stats information with the scene information whose common items match. As a result, the synchronization unit 14 generates event information and stores it in the event DB 23.
  • In step S20, the generation unit 15 selects a template corresponding to each piece of event information from the plurality of live commentary templates stored in the template DB 24, and generates the live commentary by combining the selected template with the event information. The generation unit 15 passes the generated commentary to the synthesis unit 16.
  • In step S22, the synthesizing unit 16 synthesizes at least one of the stadium audio and the moving image with the audio data or image data (subtitles) representing the live commentary.
  • At this time, the synthesis unit 16 synchronizes the time information of the moving image, or the time information of the stadium audio synchronized with it, with the time information of the event information corresponding to the live commentary.
  • The synthesizing unit 16 then generates and outputs content with live commentary in which at least one of the stadium audio and the moving image is synchronized with the output timing of the live commentary. The information output process then ends.
  • the content with live commentary output from the synthesis unit 16 is distributed to the user terminal 36, and the user who uses the user terminal 36 can view the content with live commentary.
  • As described above, the information output device acquires a video of a baseball game including stadium audio and a moving image, and acquires scene information for each section of the moving image corresponding to each event. Further, the information output device acquires stats information input externally based on the video. The information output device associates the stats information with the scene information to generate event information having accurate time information with respect to the video and detailed information about the event. Then, the information output device generates a live commentary for each event based on each piece of event information and a template.
  • the information output device adjusts the output timing of the generated live commentary based on the output timing of at least one section of the stadium audio and the moving image, and outputs the output together with at least one of the stadium audio and the moving image.
  • As a result, the information output device outputs play-by-play and commentary information at appropriate timing in live-distribution content such as a sports game, without requiring a pre-prepared manuscript or a live announcer and commentator.
  • Further, since no live announcer or commentator is needed, labor costs, distribution equipment, and other costs can be reduced.
  • A template may be selected according to the attribute information of the user who views the distributed content with live commentary.
  • For example, a template that generates commentary favoring one team, a template that generates commentary favoring the other team, and a template for a neutral position are prepared.
  • The user's favorite team is acquired as the user's attribute information. This information may be acquired in advance, input by the user before or during distribution, or estimated based on the user's past viewing history.
  • Then, a template that generates commentary favoring the user's favorite team is selected.
  • the live commentary can be flexibly changed according to the user's preference, and the distributed content can be diversified.
  • the attributes of the user are not limited to the favorite team, but may be gender, age, proficiency level for rules, and the like.
  • In the above, the case of generating mainly play-by-play sentences has been described, but the same applies to commentary sentences. Specifically, by preparing commentary templates that use event information, and a selection model that associates event information with the appropriate commentary template, commentary sentences corresponding to the event information can be generated. When two speakers, a play-by-play announcer and a commentator, are assumed, the synthesized voice may differ depending on the speaker when converting the sentences into voice data.
  • In the above, the event information is generated by associating the stats information with the scene information, but if the accuracy of the image analysis is improved, detailed information equivalent to the stats information may be acquired as scene information.
  • External information other than stats information may also be acquired and included in the event information. For example, information such as past head-to-head results, the pitcher's pitch types, and the batter's batting average by pitch location is prepared, and such information related to the team, pitcher, and batter of each event is included in the event information. This makes it possible to generate commentary sentences such as "So far, batter BB is hitting .300 against pitcher AA" and "This batter is strong against high outside pitches".
  • Information collected from teams and players, information collected from the Internet, and the like may also be prepared and included in the event information in association with the team, pitcher, and batter of each event. Then, for example, for event information corresponding to a scene showing the bench or the stands, a template that uses such information is selected. This information may also be used to generate live commentary for idle time between events. This makes it possible to generate commentary such as "Player AA seems unaffected by yesterday's hit-by-pitch."
  • Template selection is not limited to using a selection model; the selection may be rule-based. For example, according to a predetermined rule, template A may be selected when the event result includes an out, and template B may be selected when the pre-event BSO is 3-2-2 and there is a runner.
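The rule-based alternative could look like the following sketch; the rules mirror the example above, and the template names are otherwise hypothetical.

```python
def select_template_rule_based(event):
    """Select a template by predetermined rules instead of a model."""
    if "out" in event.get("event_result", ""):
        return "template_A"  # rule: event result includes an out
    if event.get("pre_event_bso") == "3-2-2" and event.get("runner"):
        return "template_B"  # rule: full count, two outs, runner on base
    return "template_default"
```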
  • In the above, the case of selecting a template group whose voice time is shorter than the event time has been described, but the present invention is not limited to this.
  • For example, the template group with the highest goodness of fit may be selected without considering the voice time.
  • In that case, when the voice time of the selected template group is longer than the event time, the playback speed of the audio data representing the live commentary may be increased, or some templates included in the template group may be deleted.
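When the voice time exceeds the event time, one simple remedy mentioned above is to raise the playback speed. Assuming a uniform time-stretch (an assumption; a real pipeline would likely use pitch-preserving time-stretching), the factor is just the ratio of the two durations:

```python
def playback_rate(voice_time, event_time):
    """Speed-up factor that fits commentary audio into the event window.

    1.0 means no change; values above 1.0 shorten the audio uniformly.
    """
    return max(1.0, voice_time / event_time)
```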
  • an example of a baseball game has been described as the sports video, but the application of the disclosed technology is not limited to this; it can also be applied to, for example, soccer, basketball, and the like.
  • in the case of soccer, by analyzing the moving image, the running speed of the players, the current score (telop information), the distance of passes, the bias in the positioning of all players, and the direction of attack are acquired as scene information, together with the time information corresponding to each frame.
  • as stats information, player names, positioning, plays such as sliding and passes, play results, and the like are acquired.
  • in the case of basketball, the player names, the players' running speeds, positioning, shot results, the points scored when a shot is made, and plays such as steals, no-look passes, and rebounds are acquired as scene information.
  • as stats information, the team names, player names, score, and the like are acquired.
  • in these cases as well, the event information may be generated by associating the stats information with the scene information, the template corresponding to the event information may be selected, and the commentary may be generated.
  • the disclosed technology is not limited to the program being stored in advance in the storage unit; the program according to the disclosed technology can also be provided in a form stored in a storage medium such as a CD-ROM, a DVD-ROM, or a USB memory.
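The rule-based template selection and the playback-speed adjustment described in the bullets above might be sketched as follows. The specific rule set, the dictionary field names, and the 1.5x rate cap are illustrative assumptions, not part of the disclosure.

```python
def select_template_rule_based(event):
    """Pick a template group by predetermined rules (hypothetical rule set)."""
    if "out" in event.get("event_result", ""):
        return "A"                       # e.g. any event result containing an out
    if event.get("pre_event_bso") == (3, 2, 2) and event.get("runner"):
        return "B"                       # full count with two outs and a runner on
    return "default"

def playback_rate(audio_time_s, event_time_s, max_rate=1.5):
    """Speed-up factor so the commentary audio fits inside the event time."""
    if audio_time_s <= event_time_s:
        return 1.0                       # already fits; play at normal speed
    return min(audio_time_s / event_time_s, max_rate)

event = {"event_result": "fly out", "pre_event_bso": (0, 1, 0), "runner": False}
print(select_template_rule_based(event))   # matches the "out" rule
print(playback_rate(24.0, 20.0))           # 24 s of audio into a 20 s event
```

A real implementation would likely combine such rules with dropping low-priority templates from the group, as the bullets above also suggest.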
  • 10 Information output device
  • 11 Video acquisition unit
  • 12 Analysis unit
  • 13 Stats acquisition unit
  • 14 Synchronization unit
  • 15 Generation unit
  • 16 Synthesis unit
  • 21 Scene information DB
  • 22 Stats DB
  • 23 Event DB
  • 24 Template DB
  • 32 Video distribution system
  • 34 Stats input system
  • 36 User terminal
  • 40 Computer
  • 41 CPU
  • 42 Memory
  • 43 Storage unit
  • 49 Storage medium
  • 50 Information output program
  • 100 Information output system

Abstract

This information output device: acquires video of a baseball game that includes stadium sound and a moving image; acquires scene information for each section of the moving image corresponding to an event; acquires externally input stats information based on the video; associates the stats information with the scene information to generate event information that has accurate time information with respect to the video and detailed information about each event; generates play-by-play commentary sentences for each event on the basis of the event information and templates; and adjusts the output timing of the generated commentary sentences on the basis of the per-section output timing of the stadium sound and/or the moving image, thereby outputting the commentary sentences together with the stadium sound and/or the moving image.

Description

Information output program, device, and method
 The disclosed technology relates to an information output program, an information output device, and an information output method.
 For example, videos of sports such as baseball and soccer games are distributed live. Play-by-play and commentary on the state of the game are often added to such videos. Techniques for adding play-by-play and commentary to video have therefore been proposed.
 For example, a commentary-adding speech generation device has been proposed that generates commentary speech by speech-synthesizing a commentary manuscript (text data) related to the content of a video. This device detects, from the audio of the video, speech sections in which someone is talking and pause sections containing only silence or background sound. Based on the length of each pause section, the device converts the speaking rate of the commentary speech and adds the rate-converted commentary speech to the video audio.
Japanese Unexamined Patent Publication No. 2008-39845
 In the conventional technology, the commentary manuscript from which the added commentary speech is generated must be prepared in advance. However, in live distribution of video and audio of sports games and the like, the content of the play-by-play and commentary depends on the state of the game, so it is difficult to prepare a manuscript beforehand. Moreover, with the diversification of viewing styles, providing an announcer and a commentator for each piece of distributed content imposes a heavy burden in terms of labor costs, distribution equipment, and other costs. Furthermore, the added commentary must be output at an appropriate timing with respect to the original moving image and sound.
 In one aspect, the disclosed technology aims to output information presenting play-by-play and commentary at appropriate timing for live-distributed content such as sports games, without requiring a manuscript prepared in advance or an announcer and commentator.
 In one embodiment, the disclosed technology acquires a sports video including sound information and a moving image, together with event information about the event shown in each section of the moving image, and generates, for each section, a commentary sentence about the event based on the acquired event information. The disclosed technology then adjusts the output timing of the generated commentary based on the per-section output timing of at least one of the sound information and the moving image, and outputs the commentary together with at least one of the sound information and the moving image.
 In one aspect, this has the effect of making it possible to output information presenting play-by-play and commentary at appropriate timing for live-distributed content such as sports games, without requiring a manuscript prepared in advance or an announcer and commentator.
A block diagram showing the schematic configuration of an information output system. A functional block diagram of the information output device. A diagram showing an example of the scene information DB. A diagram showing an example of the stats DB. A diagram showing an example of the event information DB. A diagram showing an example of the template DB. A diagram for explaining the selection of a template group. A diagram for explaining the generation of content with live commentary. A block diagram showing the schematic configuration of a computer that functions as the information output device. A flowchart showing an example of the information output process.
 An example of an embodiment of the disclosed technology will be described below with reference to the drawings. In the following embodiment, the disclosed technology is applied, as an example, to live-distributed content of a baseball game. Specifically, the description covers distributing content in which generated commentary is added to sound collected at the stadium (hereinafter, "stadium audio"), to a moving image shot at the stadium, or to video including the stadium audio.
 As shown in FIG. 1, the information output system 100 according to the present embodiment includes an information output device 10, a video distribution system 32, a stats input system 34, and a user terminal 36.
 The video distribution system 32 shoots a baseball game held at a stadium with a camera and outputs the captured video. The video includes stadium audio and a moving image composed of a plurality of frames. Time information is associated with each sampling point of the stadium audio and with each frame of the moving image, and the stadium audio and the moving image are synchronized based on this time information. The time information is, for example, the date and time the video was shot or the elapsed time since the start of the game.
 The stats input system 34 is a system with which an operator, while watching the video acquired from the video distribution system 32, inputs stats information about the game. As stats, the content of each event corresponding to a single play, such as a pitch, a hit, base running, or a defensive play, is entered. Along with the event input, a time stamp is attached to each event, for example manually by the operator.
 The user terminal 36 is a terminal used by a user of the service provided by the information output system 100. The user terminal 36 has a function of receiving content distributed from the information output device 10 and a function of outputting at least one of audio and a moving image. The user terminal 36 is, for example, a personal computer, smartphone, tablet, mobile phone, television, or radio.
 The information output device 10 generates commentary about the video and outputs content with live commentary, in which the commentary is added to the video acquired from the video distribution system 32. Functionally, as shown in FIG. 2, the information output device 10 includes a video acquisition unit 11, an analysis unit 12, a stats acquisition unit 13, a synchronization unit 14, a generation unit 15, and a synthesis unit 16. In addition, a scene information DB (database) 21, a stats DB 22, an event DB 23, and a template DB 24 are stored in a predetermined storage area of the information output device 10. The video acquisition unit 11, analysis unit 12, stats acquisition unit 13, and synchronization unit 14 are an example of the acquisition unit of the disclosed technology, and the synthesis unit 16 is an example of the output unit of the disclosed technology.
 The video acquisition unit 11 acquires the video output from the video distribution system 32 and splits it into stadium audio and a moving image. The video acquisition unit 11 passes the moving image to the analysis unit 12 and passes the acquired video to the synthesis unit 16.
 The analysis unit 12 performs image analysis on the moving image received from the video acquisition unit 11 and thereby acquires scene information for each section of the moving image corresponding to an event. Specifically, the analysis unit 12 detects camera cut points from differences in pixel values between frames of the moving image, and treats the span between two successive cut points as one section. The analysis unit 12 also uses a recognition model to recognize the scene shown by the moving image in each section. Scenes can include, for example, a shot of the defensive formation, a batter at the plate, the bench, base running, a pick-off throw, and sliding. The recognition model is trained in advance by machine learning on pairs of a section's moving image and a label indicating the correct scene type for that image.
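A minimal sketch of the cut-point detection described above, using the mean absolute pixel difference between consecutive frames. The threshold value and the use of plain Python lists in place of real video frames are assumptions made for illustration.

```python
def detect_sections(frames, threshold=40.0):
    """Split a frame sequence into sections at camera-cut points.

    frames: list of equally sized grayscale frames (flat lists of pixel values).
    Returns (start_index, end_index) pairs, one per detected section.
    """
    cuts = [0]
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        # Mean absolute difference between corresponding pixels.
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:            # large change -> camera cut
            cuts.append(i)
    cuts.append(len(frames))
    return [(cuts[j], cuts[j + 1] - 1) for j in range(len(cuts) - 1)]

# Two "shots": three dark frames followed by two bright frames.
frames = [[10] * 4] * 3 + [[200] * 4] * 2
print(detect_sections(frames))  # [(0, 2), (3, 4)]
```

Each returned pair corresponds to one section whose start and end frames carry the time information stored in the scene information DB.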
 The analysis unit 12 also acquires, from the telop (on-screen graphics) portion of the frame images in each section, information such as the ball count, strike count, out count (hereinafter, "BSO"), score, inning, and runner situation. This information can be obtained by comparison with predetermined formats, character recognition, and the like. In the following, BSO is written as "ball count (B)-strike count (S)-out count (O)" (for example, 0-0-0).
 The analysis unit 12 stores the information acquired for each section in the scene information DB 21, associated with the time information of the section's start frame and the time information of its end frame.
 FIG. 3 shows an example of the scene information DB 21. In the example of FIG. 3, each row (record) corresponds to the scene information of one section. A "sequence No." is assigned to each piece of scene information in order of its time information. In the example of FIG. 3, "start time" is the time information associated with the start frame of the section, and "end time" is the time information associated with its end frame. "Scene" is information indicating the scene type recognized by the recognition model, while "inning" and "pre-event BSO" are information acquired from the telop in the section's frame images. The information included in the scene information is not limited to this example.
 The stats acquisition unit 13 acquires the per-event stats information entered in the stats input system 34 and stores it in the stats DB 22. FIG. 4 shows an example of the stats DB 22, in which each row (record) corresponds to the stats information for one event. A "sequence No." is assigned to each piece of stats information in order of its time information. In the example of FIG. 4, the stats information includes a "start time" and an "end time", which are the times entered by the operator as the start and end of the event. The stats information also includes the "inning" of the event, the "batter team" (the team to which the batter belongs), the "batter" (the batter's name), the "pitcher team" (the team to which the pitcher belongs), and the "pitcher" (the pitcher's name). It further includes the "pitch count in the at-bat", indicating the number of pitches thrown to the batter in that at-bat, the "event content", the "batted-ball direction" when the event is a hit, and the "event result". The stats information also includes the "pre-event BSO", the BSO before the event, and the "post-event BSO", reflecting the event result. The information included in the stats information is not limited to this example.
 The synchronization unit 14 generates event information in which the scene information and the stats information are synchronized, by associating each piece of stats information with the scene information of a section based on the order of the stats information.
 Here, because the scene information is obtained by analyzing the moving image, the scenes that can be acquired are limited. However, since the time information (start and end times) of the scene information is the time information associated with each frame of the moving image, it represents the accurate time of the scene information acquired from each section and is also synchronized with the time information of the stadium audio. The stats information, on the other hand, can include detailed information that cannot be obtained by analyzing the moving image. However, since the time information of the stats information is entered by an operator, it may be inaccurate or missing, its granularity is coarse, and its synchronization with the time information of the stadium audio is not guaranteed.
 Therefore, by associating the stats information with the scene information, the time information of the stats information is corrected based on the time information of the scene information and becomes accurate. This makes it possible to generate event information that is more detailed than the scene information and has more accurate time information than the stats information.
 Specifically, the synchronization unit 14 generates event information by associating each piece of stats information with the scene information whose common items match it, while preserving the ordering guaranteed by the sequence numbers of the scene information and the stats information. Common items are, for example, "scene" and "event content", or "pre-event BSO".
 The synchronization unit 14 stores the generated event information in the event DB 23. FIG. 5 shows an example of the event DB 23, in which each row (record) corresponds to one piece of event information. As shown in FIG. 5, the event information has items that merge the scene information items with the stats information items. For scene information that has no corresponding stats information, the synchronization unit 14 generates event information by attaching, for example, the same information as the immediately preceding stats information in sequence-number order.
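The order-preserving association of stats records with scene records might be sketched as follows. The common items used for matching ("scene" against "event content", and the pre-event BSO) follow the tables described above, but the field names and the matching code itself are illustrative assumptions.

```python
def matches(scene, stat):
    """Common items agree: scene type vs. event content, and pre-event BSO."""
    return (scene["scene"] == stat["event_content"]
            and scene["pre_bso"] == stat["pre_bso"])

def synchronize(scenes, stats):
    """Attach stats records to scene records, preserving sequence order.

    A stats record is attached to the next scene whose common items match;
    a scene with no match inherits the most recently attached stats record.
    The scene's frame-accurate start/end times stay authoritative.
    """
    events, s, last = [], 0, None
    for scene in scenes:                 # scenes are in sequence-No. order
        if s < len(stats) and matches(scene, stats[s]):
            last = stats[s]
            s += 1
        merged = dict(scene)             # keep the scene's time information
        if last is not None:
            merged.update({k: v for k, v in last.items()
                           if k not in ("start", "end")})
        events.append(merged)
    return events

scenes = [{"start": "18:00:05", "end": "18:00:12", "scene": "pitch", "pre_bso": (0, 0, 0)},
          {"start": "18:00:12", "end": "18:00:20", "scene": "bench", "pre_bso": (1, 0, 0)}]
stats = [{"start": "18:00", "end": "18:00", "event_content": "pitch",
          "pre_bso": (0, 0, 0), "pitcher": "AA", "batter": "BB", "event_result": "ball"}]
for e in synchronize(scenes, stats):
    print(e["start"], e.get("pitcher"))
```

In this sketch the bench scene carries no stats record of its own, so it inherits the pitcher/batter details of the preceding pitch while keeping its own accurate times, mirroring the fallback rule described above.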
 Each of the scene information and the event information is an example of the event information of the disclosed technology, and the stats information is an example of the external information of the disclosed technology.
 Based on the event information stored in the event DB 23, the generation unit 15 generates sentences that provide play-by-play or commentary about the event indicated by each piece of event information (hereinafter, "commentary sentences"). Specifically, the generation unit 15 selects, from the commentary-sentence templates stored in the template DB 24, a template matching each piece of event information, and generates a commentary sentence by combining the selected template with the event information.
 FIG. 6 shows an example of the template DB 24. In the example of FIG. 6, each row (record) is information about one template, and a sequence of templates forms a template group corresponding to one piece of event information. Templates in the same group carry the same "template group ID" as identification information and an "in-group sequence No." indicating the output order. The template DB 24 also includes "speaker type" information indicating whether each template assumes a play-by-play announcer or a color commentator as the speaker.
 A "template" is a sentence of play-by-play or commentary with parameter slots. In the example of FIG. 6, the < > portions are the slots where parameters are inserted, numbered 1, 2, ... in order of appearance within each template. The parameter to insert into each < > is specified by a "parameter type", which is an item name of the event information. In FIG. 6, the parameter type "pre-event (or post-event) B (or S)" denotes only the corresponding count of the BSO. The template DB 24 also includes the playback time of the audio data generated from each template's commentary sentence (hereinafter, "audio time").
 To make the commentary-equipped content available in various languages, the template DB 24 may store "template", "audio time", and "parameter type" items for each of a plurality of languages.
 The generation of commentary sentences is described more concretely below. The generation unit 15 selects the template group corresponding to a piece of event information using a selection model. The selection model is a model trained in advance by machine learning on associations between event information and the template group best suited to the event that the information represents, and it outputs the degree of conformity between the target event information and each template group.
 For example, as shown in FIG. 7, for a predetermined number of template groups in descending order of the conformity output by the selection model, the generation unit 15 computes each group's "audio time" as the sum of the audio times of the templates it contains. The generation unit 15 then selects, among the template groups whose audio time is shorter than the time from the "start time" to the "end time" of the event information (hereinafter, the "event time"), the group with the highest conformity. For example, for event information with an event time of 20 seconds and the per-group conformity shown in FIG. 7, the generation unit 15 selects the template group whose template group ID is 3.
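The selection step above, choosing the highest-conformity template group whose total audio time fits inside the event time, can be sketched as follows. The conformity scores and audio times are made-up values in the spirit of the FIG. 7 example.

```python
def select_group(candidates, event_time):
    """candidates: list of (group_id, conformity, audio_times_of_templates).

    Returns the ID of the highest-conformity group whose summed audio time
    is shorter than the event time, or None if no group fits.
    """
    fitting = [(gid, conf) for gid, conf, times in candidates
               if sum(times) < event_time]
    if not fitting:
        return None
    return max(fitting, key=lambda pair: pair[1])[0]

candidates = [(1, 0.95, [9.0, 8.0, 7.0]),   # 24 s of audio: too long for 20 s
              (2, 0.90, [12.0, 10.0]),      # 22 s: still too long
              (3, 0.85, [6.0, 5.0, 4.0])]   # 15 s: fits
print(select_group(candidates, event_time=20.0))  # -> 3
```

The `None` case corresponds to the variation noted earlier, where the audio time constraint is dropped or the playback speed of the audio is increased instead.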
 The generation unit 15 generates a commentary sentence by inserting the values of the event information items indicated by each "parameter type" into the < > slots of the templates in the selected group. For example, suppose that for the event information with sequence No. 5 in the event DB 23 of FIG. 5, the template group with template group ID 1 in FIG. 6 is selected, and consider the template with in-group sequence No. 6. The generation unit 15 inserts the event result "foul" into <1>, the post-event B value "1" into <2>, and the post-event S value "1" into <3>. The generation unit 15 thereby produces the commentary sentence "Foul. That makes it one ball, one strike", and passes the generated sentence to the synthesis unit 16.
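Filling the < > slots with event-information values, as in the "Foul" example above, amounts to simple numbered substitution. The `<1>`-style markup follows FIG. 6, but the English template text and field names here are assumptions.

```python
import re

def fill_template(template, param_types, event):
    """Replace <1>, <2>, ... with the event items named in param_types."""
    def repl(match):
        item = param_types[int(match.group(1)) - 1]  # <n> -> n-th parameter type
        return str(event[item])
    return re.sub(r"<(\d+)>", repl, template)

event = {"event_result": "Foul", "post_b": 1, "post_s": 1}
sentence = fill_template("<1>. That makes it <2> ball, <3> strike.",
                         ["event_result", "post_b", "post_s"], event)
print(sentence)  # Foul. That makes it 1 ball, 1 strike.
```

Each slot's "parameter type" names an event information item, so the same template can be reused for every event of the same kind.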
 The synthesis unit 16 adjusts the output timing of the commentary received from the generation unit 15, based on the output timing, for each section corresponding to an event, of at least one of the stadium audio and the moving image of the video received from the video acquisition unit 11. The synthesis unit 16 then generates and outputs commentary-equipped content so that the timing-adjusted commentary is output together with at least one of the stadium audio and the moving image.
 Specifically, as shown in FIG. 8, the synthesis unit 16 generates, as commentary-equipped content, content in which the stadium audio is mixed with audio data representing the commentary (see A in FIG. 8); this content is applicable to radio broadcasting and the like. The synthesis unit 16 can also combine the original video (with stadium audio) or the moving image (without stadium audio) with audio data representing the commentary; this content is applicable to television broadcasting, Internet video distribution, and the like. The synthesis unit 16 can further combine the original video (with stadium audio) or the moving image (without stadium audio) with image data (subtitles) that visualizes the commentary text (see B in FIG. 8); this content is likewise applicable to television broadcasting, Internet video distribution, and the like.
 When combining the commentary with at least one of the stadium audio and the moving image, the synthesis unit 16 synchronizes the time information of the moving image, or the time information of the stadium audio that is synchronized with it, with the time information of the event information corresponding to the commentary. Because the time information of the event information matches the time information of the moving image, the two can be synchronized easily.
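Because the event information carries the moving image's own time stamps, placing each commentary clip reduces to scheduling it at the event's start time on the shared clock. The sketch below adds one detail not stated in the disclosure, flagged as an assumption: a clip is pushed later if the previous clip would still be playing.

```python
def schedule_commentary(events, clips):
    """Pair each commentary clip with a start time on the shared clock.

    events: dicts with 'start_s', the event's start in seconds of video time.
    clips:  dicts with 'text' and 'duration_s' for the synthesized audio.
    Assumption: clips are shifted later rather than allowed to overlap.
    """
    timeline, cursor = [], 0.0
    for event, clip in zip(events, clips):
        start = max(event["start_s"], cursor)   # don't talk over the last clip
        timeline.append((start, clip["text"]))
        cursor = start + clip["duration_s"]
    return timeline

events = [{"start_s": 5.0}, {"start_s": 9.0}]
clips = [{"text": "Here's the pitch.", "duration_s": 6.0},
         {"text": "Foul. One ball, one strike.", "duration_s": 3.0}]
print(schedule_commentary(events, clips))
```

In this example the second event begins at 9.0 s, but its clip starts at 11.0 s because the first clip runs until then; the variations described earlier (speeding up playback or dropping templates) are alternative ways to resolve the same conflict.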
 The information output device 10 can be realized, for example, by the computer 40 shown in FIG. 9. The computer 40 includes a CPU (central processing unit) 41, a memory 42 serving as a temporary storage area, and a non-volatile storage unit 43. The computer 40 also includes an input/output device 44 such as an input unit and a display unit, and an R/W (read/write) unit 45 that controls reading and writing of data on a storage medium 49. The computer 40 further includes a communication I/F (interface) 46 connected to a network such as the Internet. The CPU 41, memory 42, storage unit 43, input/output device 44, R/W unit 45, and communication I/F 46 are connected to one another via a bus 47.
 The storage unit 43 can be realized by an HDD (hard disk drive), an SSD (solid state drive), flash memory, or the like. The storage unit 43, as a storage medium, stores an information output program 50 for causing the computer 40 to function as the information output device 10. The information output program 50 has a video acquisition process 51, an analysis process 52, a stats acquisition process 53, a synchronization process 54, a generation process 55, and a synthesis process 56. The storage unit 43 also has an information storage area 60 in which the information constituting each of the scene information DB 21, stats DB 22, event DB 23, and template DB 24 is stored.
 The CPU 41 reads the information output program 50 from the storage unit 43, loads it into the memory 42, and sequentially executes the processes included in the program. By executing the video acquisition process 51, the CPU 41 operates as the video acquisition unit 11 shown in FIG. 2. Likewise, by executing the analysis process 52 it operates as the analysis unit 12, by executing the stats acquisition process 53 as the stats acquisition unit 13, by executing the synchronization process 54 as the synchronization unit 14, by executing the generation process 55 as the generation unit 15, and by executing the synthesis process 56 as the synthesis unit 16, each shown in FIG. 2. The CPU 41 also reads information from the information storage area 60 and loads each of the scene information DB 21, the stats DB 22, the event DB 23, and the template DB 24 into the memory 42. The computer 40 executing the information output program 50 thereby functions as the information output device 10. Note that the CPU 41 that executes the program is hardware.
 The functions realized by the information output program 50 can also be implemented by, for example, a semiconductor integrated circuit, more specifically an ASIC (Application Specific Integrated Circuit) or the like.
 Next, the operation of the information output system 100 according to the present embodiment will be described. The video distribution system 32 shoots a baseball game held at a stadium with a camera and starts outputting the captured video. The stats input system 34 then acquires the video output from the video distribution system 32, and a person in charge inputs stats information about the game while watching the video. In the information output device 10, the information output process shown in FIG. 10 is executed. The information output process is an example of the information output method of the disclosed technique.
 In step S12, the video acquisition unit 11 acquires a predetermined length of video output from the video distribution system 32 and splits the video into stadium audio and a moving image. The video acquisition unit 11 then passes the split moving image to the analysis unit 12 and passes the acquired video to the synthesis unit 16.
 Next, in step S14, the analysis unit 12 performs image analysis on the moving image passed from the video acquisition unit 11 to acquire scene information for each section of the moving image corresponding to an event. The analysis unit 12 then stores the information acquired for each section in the scene information DB 21, in association with the time information of the section's start frame and the time information of its end frame.
 Next, in step S16, the stats acquisition unit 13 acquires the stats information for each event input via the stats input system 34 and stores it in the stats DB 22.
 Next, in step S18, the synchronization unit 14 associates each piece of stats information with the scene information whose common items match, while preserving the ordering guaranteed by the sequence numbers of the scene information and the stats information. The synchronization unit 14 thereby generates event information and stores it in the event DB 23.
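A minimal sketch of this matching step is shown below. The field names (`inning`, `batter`, `start`) are hypothetical placeholders for the common items, not the patent's concrete format; a forward-only cursor into the stats list preserves the sequence ordering of both inputs.

```python
def synchronize(scene_infos, stats_infos, common_keys=("inning", "batter")):
    """Associate each stats record with the scene record whose common
    items match, preserving the original order of both lists."""
    events = []
    si = 0  # cursor into stats_infos; never moves backwards, keeping order
    for scene in scene_infos:
        event = dict(scene)  # event information starts as a copy of the scene info
        for j in range(si, len(stats_infos)):
            stats = stats_infos[j]
            if all(scene.get(k) == stats.get(k) for k in common_keys):
                event.update(stats)  # merge the matching stats information
                si = j + 1           # later scenes may only match later stats
                break
        events.append(event)
    return events
```

Unmatched scenes simply keep their scene information, which mirrors the later variation in which scene information alone serves as event information.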
 Next, in step S20, the generation unit 15 selects, from the plurality of live commentary templates stored in the template DB 24, a template corresponding to each piece of event information, and combines the selected template with the event information to generate a live commentary. The generation unit 15 passes the generated commentary to the synthesis unit 16.
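As a sketch of the combining step (the condition/placeholder structure below is an assumption, not the patent's concrete template format), combining a selected template with event information can be as simple as placeholder substitution:

```python
def generate_commentary(event, templates):
    """Return the first template whose condition matches the event,
    with its placeholders filled from the event information.
    `templates` is a list of (condition, text) pairs, most specific first."""
    for condition, text in templates:
        if condition(event):
            return text.format(**event)
    return ""

# Hypothetical templates for two event patterns.
TEMPLATES = [
    (lambda e: e.get("result") == "home run",
     "{batter} hits a home run! The score is now {score}."),
    (lambda e: True,  # neutral fallback template
     "{batter} steps up to the plate."),
]
```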
 Next, in step S22, the synthesis unit 16 combines at least one of the stadium audio and the moving image with audio data or image data (subtitles) representing the live commentary. In doing so, the synthesis unit 16 synchronizes the time information of the moving image, or the time information of the stadium audio synchronized with it, with the time information of the event information corresponding to the commentary. The synthesis unit 16 thereby generates and outputs content with live commentary in which the output timing of the commentary is synchronized with at least one of the stadium audio and the moving image. The information output process then ends.
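The timing adjustment in step S22 can be sketched as aligning each commentary clip with the start of its section and checking whether the synthesized audio fits within that section (the field names are hypothetical):

```python
def schedule_commentary(events):
    """For each event, place the commentary at the section's start time and
    record whether the synthesized audio fits within the section."""
    schedule = []
    for ev in events:
        section_len = ev["end"] - ev["start"]
        schedule.append({
            "at": ev["start"],                       # output timing of the commentary
            "fits": ev["audio_len"] <= section_len,  # needs adjustment if False
        })
    return schedule
```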
 The content with live commentary output from the synthesis unit 16 is distributed to the user terminal 36, where it is viewed by the user of that terminal.
 As described above, according to the information output system of the present embodiment, the information output device acquires video of a baseball game including stadium audio and a moving image, and acquires scene information for each section of the moving image corresponding to an event. The information output device also acquires stats information input externally based on the video. By associating the stats information with the scene information, it generates event information that carries both accurate time information for the video and detailed information about the event. The information output device then generates a live commentary for each event based on the event information and a template. Furthermore, it adjusts the output timing of the generated commentary based on the per-section output timing of at least one of the stadium audio and the moving image, and outputs the commentary together with at least one of them. As a result, information presenting play-by-play and commentary can be output at appropriate timing in live-distributed content such as sports games, without requiring a prepared script or a play-by-play announcer and commentator. Since announcers and commentators become unnecessary, labor and distribution equipment costs can be reduced, and the variation in commentary quality caused by differences in their skill is eliminated.
 In the above embodiment, when selecting a template according to the event information, the template may also be selected according to attribute information of the user who views the distributed content with live commentary. Specifically, three kinds of templates are prepared: one generating commentary biased toward one team, one biased toward the other team, and one neutral. The user's favorite team is acquired as the user's attribute information; it may be obtained from information registered in advance, from information entered by the user before or during distribution, or estimated from the user's past viewing history. When selecting a template, the one biased toward the user's favorite team is chosen. This allows the commentary to be flexibly adapted to the user's preferences and diversifies the distributed content. The user attribute is not limited to a favorite team; it may be gender, age group, familiarity with the rules, and so on.
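A sketch of the attribute-based selection, assuming template sets keyed by team name plus a neutral fallback (the key names are assumptions):

```python
def select_template_set(user_attrs, template_sets):
    """Return the template set biased toward the user's favorite team,
    or the neutral set when the team is unknown or has no biased set."""
    team = user_attrs.get("favorite_team")
    return template_sets.get(team, template_sets["neutral"])
```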
 The above embodiment mainly describes generating play-by-play text, but the same applies to commentary text. Specifically, by preparing commentary templates that use event information, together with a selection model that associates each piece of event information with an appropriate commentary template, commentary text corresponding to the event information can be generated. When two speakers, a play-by-play announcer and a commentator, are assumed and the commentary text is converted into audio data, a different voice may be used for each speaker.
 The above embodiment describes generating event information by associating stats information with scene information, but the scene information alone may serve as the event information. The accuracy of the image analysis may also be raised so that scene information as detailed as the stats information is acquired. Furthermore, external information other than stats information may be acquired and included in the event information. For example, information such as past head-to-head results, a pitcher's pitch types, and a batter's batting average by pitch location can be prepared, and the information relevant to the team, pitcher, and batter of each event can be included in the event information. This makes it possible to generate commentary such as "So far, batter BB is hitting .300 against pitcher AA" or "This batter is strong against high outside pitches."
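Enriching event information with such prepared external records can be a simple dictionary merge; the `(pitcher, batter)` key and field names below are hypothetical:

```python
def enrich_event(event, external_db):
    """Merge external records (e.g., a batter's average against this
    pitcher) into the event information, keyed by (pitcher, batter)."""
    extra = external_db.get((event.get("pitcher"), event.get("batter")), {})
    return {**event, **extra}
```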
 Furthermore, information gathered from interviews with teams and players or collected from the Internet may be prepared as external information and included in the event information relevant to the team, pitcher, and batter of each event. Then, for example, for event information corresponding to a scene showing the bench or the stands, a template that uses this information is selected. This information may also be used to generate commentary for idle stretches between events, making it possible to produce lines such as "Player AA reportedly shows no effects from yesterday's hit-by-pitch."
 The above embodiment describes selecting the template corresponding to the event information using a selection model, but the selection is not limited to this and may instead be rule based. For example, templates may be selected by predetermined rules such as: template A if the event result includes an out, and template B if the pre-event ball-strike-out count is 3-2-2 and there is a runner on base.
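The two example rules above can be sketched directly; the template names and event field names are hypothetical:

```python
def select_by_rule(event):
    """Rule-based template selection mirroring the example rules:
    template A when the event result includes an out; template B when the
    pre-event ball-strike-out count is 3-2-2 with a runner on base."""
    if "out" in event.get("result", ""):
        return "template_A"
    if event.get("bso") == (3, 2, 2) and event.get("runners"):
        return "template_B"
    return "template_default"
```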
 The above embodiment describes selecting a template group whose audio time is shorter than the event time, but the selection is not limited to this. When generating content with live commentary as a moving image or video with subtitles showing the commentary, the audio time need not be considered. Even when outputting audio data of the commentary, the template group with the highest fitness may be selected without regard to audio time. In that case, if the audio time of the selected template group is longer than the event time, the audio data may be processed to play back faster, or some of the templates in the group may be dropped.
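One way to sketch this trade-off (the `fitness` and `audio_time` fields are assumptions): prefer the highest-fitness group that fits in the event time, and otherwise take the best group together with the playback speed-up factor needed to make it fit.

```python
def choose_template_group(groups, event_time):
    """Return (group, speed_factor). Prefer the highest-fitness group whose
    audio fits in the event time; otherwise return the best group and the
    factor by which its audio must be sped up to fit."""
    ranked = sorted(groups, key=lambda g: g["fitness"], reverse=True)
    for g in ranked:
        if g["audio_time"] <= event_time:
            return g, 1.0
    best = ranked[0]
    return best, best["audio_time"] / event_time
```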
 The above embodiment describes a baseball game as an example of sports video, but the disclosed technique is not limited to this and can also be applied to, for example, soccer or basketball. In the case of soccer, analyzing the moving image yields, along with the time information of each frame, scene information such as player running speed, the current score (from the telop), pass distance, the positional bias of all players, and the direction of attack; the stats information includes player names, positioning, play content such as slides and passes, and play outcomes. In the case of basketball, scene information such as player names, running speed, positioning, shot outcomes, points scored on a made shot, and play content such as steals, no-look passes, and rebounds is acquired, while the stats information includes team names, player names, and scores. In either case, as in the above embodiment, event information is generated by associating the stats information with the scene information, a template corresponding to the event information is selected, and the live commentary is generated.
 The above embodiment describes the information output program as stored (installed) in the storage unit in advance, but the program is not limited to this. The program according to the disclosed technique can also be provided in a form stored in a storage medium such as a CD-ROM, a DVD-ROM, or a USB memory.
10 Information output device
11 Video acquisition unit
12 Analysis unit
13 Stats acquisition unit
14 Synchronization unit
15 Generation unit
16 Synthesis unit
21 Scene information DB
22 Stats DB
23 Event DB
24 Template DB
32 Video distribution system
34 Stats input system
36 User terminal
40 Computer
41 CPU
42 Memory
43 Storage unit
49 Storage medium
50 Information output program
100 Information output system

Claims (20)

  1.  An information output program for causing a computer to execute a process comprising:
     acquiring sports video including sound information and a moving image, and event information about an event indicated by each section of the moving image;
     generating, based on the acquired event information, a live commentary about the event for each section; and
     adjusting an output timing of the generated live commentary based on a per-section output timing of at least one of the sound information and the moving image, and outputting the live commentary together with at least one of the sound information and the moving image.
  2.  The information output program according to claim 1, wherein the sound information and audio data representing the live commentary are combined and output.
  3.  The information output program according to claim 1, wherein the moving image or the video is combined with audio data representing the live commentary, or with image data visualizing text of the live commentary, and output.
  4.  The information output program according to any one of claims 1 to 3, wherein the event information is acquired by image analysis of each section of the moving image.
  5.  The information output program according to any one of claims 1 to 4, wherein the event information is acquired from external information input from outside.
  6.  The information output program according to claim 5, wherein information in which each piece of the external information is associated with each of the sections, based on the order of the external information, is acquired as the event information.
  7.  The information output program according to any one of claims 1 to 6, wherein a template corresponding to the acquired event information is selected from a plurality of predetermined live commentary templates, and the live commentary is generated by combining the selected template with the acquired event information.
  8.  The information output program according to claim 7, wherein attribute information of a viewer of the output sound information or moving image is acquired, and the template is selected based on the attribute information of the viewer.
  9.  The information output program according to claim 7 or 8, wherein the template is selected according to the length of the generated live commentary.
  10.  An information output device comprising:
     an acquisition unit that acquires sports video including sound information and a moving image, and event information about an event indicated by each section of the moving image;
     a generation unit that generates, based on the event information acquired by the acquisition unit, a live commentary about the event for each section; and
     an output unit that adjusts an output timing of the live commentary generated by the generation unit based on a per-section output timing of at least one of the sound information and the moving image, and outputs the live commentary together with at least one of the sound information and the moving image.
  11.  The information output device according to claim 10, wherein the output unit combines and outputs the sound information and audio data representing the live commentary.
  12.  The information output device according to claim 10, wherein the output unit combines and outputs the moving image or the video with audio data representing the live commentary, or with image data visualizing text of the live commentary.
  13.  The information output device according to any one of claims 10 to 12, wherein the acquisition unit acquires the event information by image analysis of each section of the moving image.
  14.  The information output device according to any one of claims 10 to 13, wherein the acquisition unit acquires the event information from external information input from outside.
  15.  The information output device according to claim 14, wherein the acquisition unit acquires, as the event information, information in which each piece of the external information is associated with each of the sections based on the order of the external information.
  16.  The information output device according to any one of claims 10 to 15, wherein the generation unit selects a template corresponding to the acquired event information from a plurality of predetermined live commentary templates, and generates the live commentary by combining the selected template with the acquired event information.
  17.  The information output device according to claim 16, wherein the acquisition unit acquires attribute information of a viewer of the output sound information or moving image, and the generation unit selects the template based on the attribute information of the viewer.
  18.  The information output device according to claim 16 or 17, wherein the generation unit selects the template according to the length of the generated live commentary.
  19.  An information output method in which a computer executes a process comprising:
     acquiring sports video including sound information and a moving image, and event information about an event indicated by each section of the moving image;
     generating, based on the acquired event information, a live commentary about the event for each section; and
     adjusting an output timing of the generated live commentary based on a per-section output timing of at least one of the sound information and the moving image, and outputting the live commentary together with at least one of the sound information and the moving image.
  20.  A storage medium storing an information output program for causing a computer to execute a process comprising:
     acquiring sports video including sound information and a moving image, and event information about an event indicated by each section of the moving image;
     generating, based on the acquired event information, a live commentary about the event for each section; and
     adjusting an output timing of the generated live commentary based on a per-section output timing of at least one of the sound information and the moving image, and outputting the live commentary together with at least one of the sound information and the moving image.
PCT/JP2020/020734 2020-05-26 2020-05-26 Information output program, device, and method WO2021240644A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/JP2020/020734 WO2021240644A1 (en) 2020-05-26 2020-05-26 Information output program, device, and method
JP2022527479A JPWO2021240837A1 (en) 2020-05-26 2020-10-20
PCT/JP2020/039429 WO2021240837A1 (en) 2020-05-26 2020-10-20 Information output program, device, and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/020734 WO2021240644A1 (en) 2020-05-26 2020-05-26 Information output program, device, and method

Publications (1)

Publication Number Publication Date
WO2021240644A1 true WO2021240644A1 (en) 2021-12-02

Family

ID=78723236

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/JP2020/020734 WO2021240644A1 (en) 2020-05-26 2020-05-26 Information output program, device, and method
PCT/JP2020/039429 WO2021240837A1 (en) 2020-05-26 2020-10-20 Information output program, device, and method

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/039429 WO2021240837A1 (en) 2020-05-26 2020-10-20 Information output program, device, and method

Country Status (2)

Country Link
JP (1) JPWO2021240837A1 (en)
WO (2) WO2021240644A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2024011105A (en) * 2022-07-14 2024-01-25 株式会社電通 Live audio real time generation system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005025413A (en) * 2003-06-30 2005-01-27 Nec Corp Content processing device, content processing method, and program
JP2005236541A (en) * 2004-02-18 2005-09-02 Nippon Telegr & Teleph Corp <Ntt> Method, apparatus, and program for supporting associating with baseball video images
JP2007184740A (en) * 2006-01-06 2007-07-19 Nippon Hoso Kyokai <Nhk> Content transmitter and content output device
JP2012039280A (en) * 2010-08-05 2012-02-23 Nippon Hoso Kyokai <Nhk> Explanatory broadcasting text creation support device and explanatory broadcasting text creation support program
JP2012129980A (en) * 2010-11-24 2012-07-05 Jvc Kenwood Corp Chapter creation device, chapter creation method, and chapter creation program
JP2017151864A (en) * 2016-02-26 2017-08-31 国立大学法人東京工業大学 Data creation device
JP2017203827A (en) * 2016-05-10 2017-11-16 日本放送協会 Explanation voice reproduction device and program thereof
WO2018216729A1 (en) * 2017-05-24 2018-11-29 日本放送協会 Audio guidance generation device, audio guidance generation method, and broadcasting system
JP6472912B1 (en) * 2018-02-20 2019-02-20 ヤフー株式会社 Display processing program, display processing apparatus, and display processing method

Also Published As

Publication number Publication date
JPWO2021240837A1 (en) 2021-12-02
WO2021240837A1 (en) 2021-12-02

Similar Documents

Publication Publication Date Title
CN107615766B (en) System and method for creating and distributing multimedia content
US10293263B2 (en) Custom content feed based on fantasy sports data
US8121462B2 (en) Video edition device and method
JP5010292B2 (en) Video attribute information output device, video summarization device, program, and video attribute information output method
US8824863B2 (en) Information processing apparatus, information processing method, information processing program, and information processing system
US7988560B1 (en) Providing highlights of players from a fantasy sports team
JP4621758B2 (en) Content information reproducing apparatus, content information reproducing system, and information processing apparatus
TW201416888A (en) Scene clip playback system, method and recording medium thereof
JP2016107001A (en) Extraction program, method, and device
JP3923932B2 (en) Video summarization apparatus, video summarization method and program
US20130222418A1 (en) Providing a Graphic for Video Production
WO2021240644A1 (en) Information output program, device, and method
JP5407708B2 (en) Captured video processing apparatus, control method, and program
WO2022249522A1 (en) Information processing device, information processing method, and information processing system
JPWO2004012100A1 (en) Content summarization apparatus and content summarization program
JP2022067478A (en) Information processing program, device, and method
US20220238140A1 (en) Video tagging device and video tagging method
CN112233647A (en) Information processing apparatus and method, and computer-readable storage medium
JP4323937B2 (en) Video comment generating apparatus and program thereof
JP2016004566A (en) Presentation information control device, method and program
US20230179817A1 (en) Information processing apparatus, video distribution system, information processing method, and recording medium
WO2022163023A1 (en) Content correction device, content delivery server, content correction method, and recording medium
WO2022074788A1 (en) Information processing device, information processing method and program
JP2021087180A (en) Moving image editing device, moving image editing method, and computer program
JP2014053943A (en) Captured video processor, control method of the same and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20937573

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20937573

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP