WO2022230291A1 - 情報処理装置、情報処理方法、プログラム - Google Patents
情報処理装置、情報処理方法、プログラム Download PDFInfo
- Publication number
- WO2022230291A1 (PCT/JP2022/004897)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- video
- scene
- clip
- processing apparatus
- Prior art date
Classifications
- G06V20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames (G06V—Image or video recognition or understanding; G06V20/40—Scenes; scene-specific elements in video content)
- G06Q50/00 — Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism (G06Q—ICT specially adapted for administrative, commercial, financial, managerial or supervisory purposes)
- G06Q50/10 — Services
Definitions
- the present technology relates to the technical field of an information processing device, an information processing method, and a program for generating a digest video.
- Patent Literature 1 discloses a system for generating television content from information posted on a social networking system (SNS) so as to include content of high interest to viewers.
- This technology was created in view of such problems, and aims to provide video content that reflects the interests of viewers.
- An information processing apparatus includes a specifying unit that specifies auxiliary information for generating a digest video based on scene-related information about a scene that occurred at an event.
- An event is, for example, an entertainment such as a sports game or a concert.
- Auxiliary information is, for example, information used to generate a digest video, and is information used to determine which part of the captured video is cut out. For example, in the case of a sports match, information such as the name of the player, the type of scene, the type of play, and the like are specifically used as auxiliary information.
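As an illustration only, auxiliary information of this kind could be modeled as a small record. The publication prescribes no data model, so the class name, field names, and matching rule below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AuxiliaryInfo:
    """Illustrative auxiliary information (SD) for digest generation."""
    player_names: List[str] = field(default_factory=list)  # names, nicknames, jersey numbers
    scene_types: List[str] = field(default_factory=list)   # e.g. "touchdown", "offside"

    def matches(self, annotation: dict) -> bool:
        """Return True if a video annotation mentions a tracked player or scene type."""
        return (
            any(p in annotation.get("players", []) for p in self.player_names)
            or annotation.get("scene_type") in self.scene_types
        )

sd = AuxiliaryInfo(player_names=["Player A"], scene_types=["touchdown"])
hit = sd.matches({"players": ["Player A", "Player B"], "scene_type": "foul"})
```

A record like this could then be used to decide which parts of the captured video to cut out.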
- FIG. 4 is a flowchart, shown together with FIGS. 5 to 7 as a first processing flow, showing an example of processing executed by a posted data extraction unit;
- FIG. 5 is a flowchart showing an example of processing executed by a metadata extraction unit in the first processing flow;
- FIG. 6 is a flowchart showing an example of processing executed by a video analysis unit in the first processing flow;
- FIG. 7 is a flowchart showing an example of processing executed by a video generation unit in the first processing flow;
- FIG. 8 is a flowchart, shown together with FIGS. 9 to 11 as a second processing flow, showing an example of processing executed by the posted data extraction unit;
- FIG. 9 is a flowchart showing an example of processing executed by the metadata extraction unit in the second processing flow;
- FIG. 10 is a flowchart showing an example of processing executed by the video analysis unit in the second processing flow;
- FIG. 11 is a flowchart showing an example of processing executed by the video generation unit in the second processing flow;
- FIG. 12 is a flowchart showing an example of clip collection generation processing;
- FIG. 13 is a flowchart showing an example of processing for generating a clip video and a clip collection;
- FIG. 14 is a diagram showing an example of the scores assigned to videos;
- FIG. 15 is a flowchart showing an example of processing for combining clip videos to generate a clip collection for a target scene;
- FIG. 16 is a flowchart showing another example of clip video and clip collection generation processing;
- FIG. 17 is a block diagram of a computer device;
- the information processing device 1 of the present embodiment is a device that generates a digest video DV of events such as sports games, concerts, and stage performances.
- the generated digest video DV is distributed to viewers.
- the digest video DV is a video that collects important scenes to help viewers understand the flow of the match. The digest video DV may also be referred to as a highlight video.
- the information processing device 1 includes a posted data extraction unit 2, a metadata extraction unit 3, a video analysis unit 4, and a video generation unit 5.
- the posted data extraction unit 2 performs processing to extract keywords from sentences, hashtags, videos, etc. posted on SNS (Social Networking Services). Therefore, the information processing device 1 is configured to be able to communicate with the SNS server 100 via the communication network NW.
- the keywords extracted by the posted data extraction unit 2 are, for example, the player names of the players participating in the game, their jersey numbers, or the names of the coaches and referees. These pieces of information are information that can identify a person. Player names include not only first names and family names, but also nicknames and the like.
- the keyword extracted by the posted data extraction unit 2 may be scene type information indicating the content of a play. Specifically, this includes type information about scoring scenes such as touchdowns and field goals, and type information about various fouls such as offsides and holdings. It may also be type information indicating an exceptionally good play (a super play) or an unsuccessful play (a misstep).
- the information extracted by the posted data extraction unit 2 is information that serves as an index for generating the digest video DV.
- the information posted on the SNS is information used to generate digest video DV that matches the viewer's interests.
- the information extracted by the posted data extraction unit 2 is information about a specific scene in the event, and is referred to as "scene-related information”.
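The keyword extraction from SNS posts described above could, for instance, be sketched with a simple frequency heuristic. The token patterns, threshold, and function name are assumptions for illustration, not part of the disclosure.

```python
import re
from collections import Counter

def extract_keywords(posts, min_count=2):
    """Count hashtags and capitalized tokens across SNS posts and keep the
    frequent ones as candidate scene-related keywords (illustrative heuristic)."""
    tokens = []
    for text in posts:
        tokens += re.findall(r"#\w+", text)             # hashtags
        tokens += re.findall(r"\b[A-Z][a-z]+\b", text)  # capitalized words (names)
    counts = Counter(tokens)
    return [t for t, c in counts.most_common() if c >= min_count]

posts = [
    "What a run by #Smith!",
    "#Smith with the touchdown, unbelievable",
    "Refs missed a holding call there",
]
keywords = extract_keywords(posts)
```

A real implementation would also use morphological analysis and attention measures (likes, reposts), which this sketch omits.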
- the metadata extraction unit 3 performs processing for extracting metadata containing information representing the development of the game.
- Metadata may be, for example, information distributed independently by the company that manages the game, information input by a scorer who records the development of the game while watching it, or data distributed by a company that handles sports information. Alternatively, it may be information about the development of the game published on the web.
- Metadata is information, such as changes in the score, that is associated with the occurrence of a specific scene.
- the metadata may be distributed each time a specific scene occurs in the game, or may be distributed collectively after the game ends.
- the information extracted by the metadata extraction unit 3 is information about a specific scene in the event, and this information is also "scene-related information".
- the information processing device 1 is configured to allow mutual communication with the metadata server 200 via the communication network NW so that the metadata extraction unit 3 can execute metadata extraction processing.
- the video analysis unit 4 performs processing for receiving videos from a plurality of imaging devices CA arranged at the match venue, and performs image analysis processing on the received videos. The video analysis unit 4 also acquires the broadcast video VA, which is the video that was actually broadcast, and performs image analysis processing on it.
- FIG. 1 illustrates the first imaging device CA1, the second imaging device CA2, and the third imaging device CA3 as examples of the imaging device CA, but this is only an example; a single imaging device CA may be installed at the game venue, or four or more imaging devices CA may be installed.
- the video obtained from the first imaging device CA1 is referred to as the first video V1, the video obtained from the second imaging device CA2 as the second video V2, and the video obtained from the third imaging device CA3 as the third video V3.
- the imaging devices CA are synchronized with one another, so frames captured at the same timing can be identified by referring to the time code.
- the video analysis unit 4 obtains information on the subject being imaged for each time through image analysis processing.
- the subject information includes, for example, the subject's name such as a player's name, uniform number information, the imaging angle, and the subject's posture.
- the subject may be identified based on facial features, hairstyle, hair color, facial expression, and the like.
- the video analysis unit 4 obtains scene type information that identifies a scene through image analysis processing.
- the scene type information includes, for example, information such as whether the captured scene is a scoring scene, a foul scene, a player substitution scene, or an injury scene.
- the scene type may be specified by detecting the posture of the subject as described above. For example, the scene type may be specified by estimating the content of the referee's judgment by detecting the referee's posture, or the scoring scene may be detected by detecting the player's fist pump.
- the video analysis unit 4 identifies the IN point and the OUT point through image analysis processing.
- the in-point and out-point are information for specifying the clipping range of the video imaged by the imaging device CA.
- a video of a predetermined range, clipped out by a pair consisting of an in point and an out point, is referred to as a clip video CV.
- the in point and out point may be determined, for example, by using image analysis processing to identify the moment at which the play to be detected occurs and using that moment as a base point. When detecting the in point and out point from the broadcast video VA, this may be done by detecting the timing of video switching; that is, the video analysis unit 4 may specify the in point and out point by performing image analysis processing on the broadcast video VA and detecting the switching points between imaging devices CA.
- the video analysis unit 4 adds information obtained by image analysis processing to the video. For example, the fact that player A and player B are imaged in a certain time period in the first video V1, and that the time period is a touchdown scene, etc. are linked and stored. As a result, for example, when it is desired to create a digest video DV using a scene in which a specific player is imaged, it is possible to easily specify the time period in which the specific player was imaged.
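The linkage described above, between time periods and analysis results, can be sketched as a query over a list of annotations. The dictionary keys used here are illustrative; the publication does not specify a storage format.

```python
def find_spans(annotations, player):
    """Return (start, end) time spans of a video in which `player` appears.
    `annotations` is an illustrative list of dicts produced by image analysis."""
    return [
        (a["start"], a["end"])
        for a in annotations
        if player in a["players"]
    ]

annotations = [
    {"start": 120, "end": 150, "players": ["Player A", "Player B"], "scene_type": "touchdown"},
    {"start": 300, "end": 320, "players": ["Player C"], "scene_type": "foul"},
]
spans = find_spans(annotations, "Player A")
```

With such a store, the time periods in which a specific player was captured can be retrieved directly when assembling a digest.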
- the video analysis unit 4 identifies the development of the game by executing image analysis processing on the broadcast video VA.
- the broadcast video VA is generated by connecting specific partial videos (clip videos CV) drawn from the first video V1, second video V2, and third video V3, and superimposing various information such as score information and player names.
- the video analysis unit 4 may assign a score to each video by performing image analysis processing.
- the score may be calculated as a likelihood when the captured subject is specified, or may be calculated as an index indicating whether or not the video is appropriate for presentation to the viewer.
- although FIG. 1 shows a configuration in which the video analysis unit 4 acquires videos directly from the imaging devices CA, the videos may instead be acquired from a storage device in which videos captured by the imaging devices CA are stored.
- the video generation unit 5 performs processing for generating a digest video DV using the first video V1, the second video V2 and the third video V3.
- the video generation unit 5 includes a specification unit 10, a clip collection generation unit 11, and a digest video generation unit 12 (see FIG. 2).
- the specifying unit 10 performs a process of specifying the auxiliary information SD for generating the digest video DV.
- for clarity, an example of the flow of generating the digest video DV is shown below.
- the clip collection CS is a combination of a plurality of clip videos CV.
- for example, a clip video CV is obtained by cutting out the time period in which a scoring scene was captured from the first video V1 captured by the first imaging device CA1; likewise from the second video V2 captured by the second imaging device CA2 and from the third video V3 captured by the third imaging device CA3. These clip videos CV are combined to generate a clip collection CS for the scoring scene.
- Such clip collections CS are generated, for example, by the number of scoring scenes, the number of foul scenes, or the number of player substitution scenes.
- the digest video DV is generated by selecting and combining the clip collections CS to be presented to the viewer from the plurality of clip collections CS generated in this way.
- the auxiliary information SD is used to select the clip video CV to be included in the clip collection CS.
- the auxiliary information SD is a keyword used when selecting a clip collection CS included in the digest video DV from a plurality of clip collections CS.
- any information that can identify the player may be used.
- keywords such as position names and referees may be used.
- the auxiliary information SD may be a keyword as scene type information. For example, if there are many posts about foul scenes on SNS, it can be determined that viewers are highly interested in foul scenes. In that case, the clip collection CS of the foul scene is selected and incorporated into the digest video DV.
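The selection of clip collections CS according to viewer interest could be sketched as a simple keyword-overlap score. The scoring rule, data shapes, and names below are assumptions for illustration.

```python
def select_collections(collections, keywords, limit=3):
    """Pick the clip collections whose scene tags match viewer-interest
    keywords; the scoring scheme here is an assumed illustration."""
    scored = []
    for cs in collections:
        score = sum(1 for k in keywords if k in cs["tags"])
        if score:
            scored.append((score, cs["name"]))
    scored.sort(reverse=True)
    return [name for _, name in scored[:limit]]

collections = [
    {"name": "touchdown_1", "tags": ["touchdown", "Player A"]},
    {"name": "foul_1", "tags": ["holding", "Player C"]},
    {"name": "substitution_1", "tags": ["substitution"]},
]
chosen = select_collections(collections, ["touchdown", "Player A"])
```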
- the auxiliary information SD may be coarse type information such as a scoring scene or a foul scene, or a keyword indicating more detailed type information such as a field goal scene, a touchdown scene, or a specific foul name.
- the auxiliary information SD may indicate the order of combining the clip videos CV included in the clip collection CS.
- suppose, for example, that the first video V1 is a wide-angle video captured from the side of the field and the second video V2 is a video that follows the player holding the ball.
- the auxiliary information SD indicating the combining order may differ according to the scene type.
- the scoring scene may start with a wide-angle image
- the foul scene may start with a telephoto image.
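The scene-type-dependent combining order could be sketched with per-scene-type angle templates, following the example above (scoring scenes start wide-angle, foul scenes start telephoto). The camera-to-angle mapping and template contents are assumed for illustration.

```python
# Assumed angle labels per camera; the mapping is illustrative only.
ANGLE = {"CA1": "wide", "CA2": "ball", "CA3": "tele"}

# Per-scene-type combining order, following the example in the text.
ORDER = {
    "scoring": ["wide", "ball", "tele"],
    "foul": ["tele", "wide", "ball"],
}

def order_clips(clips, scene_type):
    """Sort clip videos by the angle template for the given scene type."""
    rank = {angle: i for i, angle in enumerate(ORDER[scene_type])}
    return sorted(clips, key=lambda c: rank[ANGLE[c["camera"]]])

clips = [{"camera": "CA2"}, {"camera": "CA3"}, {"camera": "CA1"}]
scoring_order = [c["camera"] for c in order_clips(clips, "scoring")]
foul_order = [c["camera"] for c in order_clips(clips, "foul")]
```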
- the auxiliary information SD may be information indicating whether or not the video has been broadcast.
- some viewers may have already watched the broadcast video VA of the game. Showing such viewers the same video again provides no new information, so it is conceivable to generate a digest video DV that preferentially uses video that has not been broadcast.
- Auxiliary information SD indicating whether or not the video has been broadcast is used in such a case for selecting a clip collection CS or selecting a clip video CV.
- the clip collection generation unit 11 generates clip videos CV based on the auxiliary information SD. Specifically, by passing the specified auxiliary information SD, such as a player name, to the video analysis unit 4, it causes the video analysis unit 4 to determine the in point and out point of the video in which that player is captured and to generate the clip video CV.
- the clip collection generation unit 11 combines clip videos CV to generate a clip collection CS.
- the order of combining the clip videos CV may be based on the auxiliary information SD, or may be a predetermined order.
- the clip collection generation unit 11 generates the clip collection CS using the analysis result of the image analysis processing by the video analysis unit 4 and the auxiliary information SD.
- when combining two clip videos CV, the clip collection generation unit 11 may insert between them an image indicating that the video is switching.
- the digest video generation unit 12 combines the clip collections CS generated by the clip collection generation unit 11 to generate a digest video DV.
- the order in which the clip collections CS are combined is determined, for example, according to the occurrence time of each scene. An image or the like representing that the video is switched may be inserted between the clip collections CS.
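Combining clip collections CS according to scene occurrence time can be sketched as a simple chronological sort; the field names are illustrative.

```python
def build_digest(collections):
    """Combine clip collections in order of scene occurrence time, as the
    digest video generation unit 12 does; data shapes are illustrative."""
    ordered = sorted(collections, key=lambda cs: cs["occurred_at"])
    return [cs["name"] for cs in ordered]

digest = build_digest([
    {"name": "foul_1", "occurred_at": 1800},
    {"name": "touchdown_1", "occurred_at": 600},
])
```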
- the generated digest video DV may be posted on SNS or uploaded on a web page.
- an example of the first processing flow is shown in FIGS. 4 to 7.
- FIG. 4 shows an example of the processing flow executed by the post data extraction unit 2 of the information processing device 1
- FIG. 5 shows an example of the processing flow executed by the metadata extraction unit 3
- FIG. 6 shows an example of the processing flow executed by the video analysis unit 4, and FIG. 7 shows an example of the processing flow executed by the video generation unit 5.
- in step S101 of FIG. 4, the posted data extraction unit 2 analyzes the SNS post data. Through this analysis processing, keywords with a high frequency of appearance and keywords attracting a high degree of attention are extracted. These keywords are, for example, the aforementioned player names and scene types.
- in step S102, the posted data extraction unit 2 determines whether the extracted keyword is related to the target event. Specifically, it determines whether an extracted person's name belongs to a member of a team participating in the game for which the digest video DV is to be generated, or whether the extracted keyword is otherwise related to the target game.
- the post data extraction unit 2 performs a process of outputting the extracted keyword to the metadata extraction unit 3 in step S103.
- if the extracted keyword is determined not to be related to the target event, the posted data extraction unit 2 skips step S103 and determines in step S104 whether the event has ended.
- if it is determined that the event has not ended, the posted data extraction unit 2 returns to step S101 and continues extracting keywords. If it is determined that the event has ended, the posted data extraction unit 2 ends the series of processes shown in FIG. 4.
- FIG. 4 and the subsequent figures show an example in which the clip collections CS for generating the digest video DV are generated in parallel with the progress of the event, so the processing is executed while the event is underway. Step S104 is therefore a process for determining whether the extraction of keywords and the like has been completed.
- in other words, keywords are continuously extracted from post data posted on the SNS from the start of an event such as a sports match until its end, and are output to the metadata extraction unit 3 as appropriate.
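The relevance check of step S102 could be sketched as a lookup against a team roster and a list of play types. The roster contents and vocabulary below are hypothetical.

```python
ROSTER = {"Player A", "Player B", "Ace"}  # hypothetical team members incl. nicknames

def is_related(keyword, roster=ROSTER, scene_terms=("touchdown", "field goal", "offside")):
    """Rough sketch of the step-S102 check: a keyword is kept if it names a
    participating player or a play type relevant to the target game."""
    return keyword in roster or keyword.lower() in scene_terms

kept = [k for k in ["Player A", "Touchdown", "Weather"] if is_related(k)]
```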
- the metadata extraction unit 3 executes the series of processes shown in FIG. 5. Specifically, in step S201, it analyzes the metadata acquired from the metadata server 200 and extracts information identifying scenes that occurred in the event. In the case of an American football game, for example, it extracts information such as the time at which a touchdown (one of the scene types) occurred, the name of the player who scored it, and the resulting change in the score.
- in step S202, the metadata extraction unit 3 determines whether a keyword extracted from SNS posts has been obtained from the posted data extraction unit 2.
- if no keyword has been obtained, the metadata extraction unit 3 returns to step S201.
- if a keyword has been obtained, the metadata extraction unit 3 identifies metadata related to it in step S203.
- the metadata extraction unit 3 outputs the specified metadata to the video analysis unit 4 in step S204.
- in step S205, the metadata extraction unit 3 determines whether the event has ended.
- if it is determined that the event has not ended, the metadata extraction unit 3 returns to step S201 to continue analyzing metadata. If it is determined that the event has ended, the metadata extraction unit 3 ends the series of processes shown in FIG. 5.
- that is, the metadata extraction unit 3 continuously executes the series of processes shown in FIG. 5 from the start of an event such as a sports match until its end, extracting information on each scene that occurs during the game from the metadata accumulated in the metadata server 200, an external information processing device.
- the video analysis unit 4 executes the series of processes shown in FIG. 6.
- in step S301, the video analysis unit 4 performs video analysis by applying image recognition processing to a plurality of videos such as the first video V1, the second video V2, the third video V3, and the broadcast video VA.
- the imaged uniform number, player's face, ball, etc. are identified.
- the video analysis unit 4 may further specify a camera angle, and may specify an IN point and an OUT point for generating a clip video CV.
- likelihood information that indicates the likelihood of recognition results may be calculated.
- the likelihood information is used for image selection processing and the like in the image generation unit 5 in the subsequent stage.
- the information specified by the image recognition process is stored in association with time information such as the elapsed time of the game and the elapsed time from the start of recording for each of the multiple videos.
- in step S302, the video analysis unit 4 determines whether the event has ended. If not, it returns to step S301 and continues the video analysis processing. If the event has ended, the video analysis unit 4 ends the series of processes shown in FIG. 6.
- various types of information are extracted from the video captured from the start of an event such as a sports match to the end of the event.
- the video generation unit 5 generates a digest video DV according to the processing results of the posted data extraction unit 2, the metadata extraction unit 3, and the video analysis unit 4.
- in step S401 of FIG. 7, the video generation unit 5 determines whether keywords and metadata have been acquired. If so, the video generation unit 5 proceeds to step S402 and generates a clip video CV for the target scene based on the keyword or metadata. This process generates the clip video CV from the in point and out point specified for the target scene by the video analysis unit 4.
- after generating the clip videos CV, in step S403 the video generation unit 5 combines them to generate a clip collection CS for the target scene.
- the clip collection CS may be generated, for example, by combining the clip videos CV obtained from the first video V1, the second video V2, and the third video V3 in a predetermined order.
- alternatively, a template may be prepared in which videos are combined in a predetermined camera-angle order according to the scene type, and each clip video CV may be fitted to the template based on the camera-angle information of its imaging device CA, so that the clip videos CV are combined in a suitable order.
- after generating the clip collection CS, the video generation unit 5 returns to step S401.
- if it is determined in step S401 that no keyword or metadata has been acquired, the video generation unit 5 proceeds to step S404 and determines whether the event has ended.
- if the event has not ended, the video generation unit 5 returns to step S401 to continue generating clip videos CV and clip collections CS.
- if the event has ended, the video generation unit 5 proceeds to step S405 and combines the clip collections CS to generate the digest video DV.
- the digest video DV is basically generated by combining in chronological order the clip collection CS for each scene that occurred during the match.
- the digest video DV is generated while selecting from the generated clip collections CS so that those with the highest priority are included.
- high-priority clip collections CS include, for example, a clip collection CS corresponding to a scene in which one of the teams scored, and a clip collection CS corresponding to a scene estimated, from SNS post data, to be of high interest to viewers.
- in determining priority, post data posted during a predetermined period (e.g., 10 or 30 minutes) after the match may also be used.
- post data posted in a predetermined period after the end of the match includes posts summarizing the match and posts referring to scenes that viewers would like to see again.
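Using post-match posts to estimate scene priority could be sketched as counting mentions within a time window. The window length, matching rule, and data shapes are illustrative assumptions.

```python
def scene_priority(scene, posts, window=(0, 30 * 60)):
    """Score a scene's viewer interest by counting post-match posts (within
    `window` seconds after the final whistle) that mention its keyword.
    A purely illustrative priority measure."""
    lo, hi = window
    return sum(
        1
        for p in posts
        if lo <= p["t_after_end"] <= hi and scene["keyword"] in p["text"]
    )

posts = [
    {"t_after_end": 120, "text": "That touchdown was insane"},
    {"t_after_end": 600, "text": "rewatch the touchdown please"},
    {"t_after_end": 4000, "text": "touchdown talk, but too late"},
]
priority = scene_priority({"keyword": "touchdown"}, posts)
```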
- after generating the digest video DV, the video generation unit 5 performs processing for saving it in step S406.
- the place where the digest video DV is saved may be a storage unit inside the information processing device 1 or may be a storage unit of a server device separate from the information processing device 1 .
- an example of the second processing flow is shown in FIGS. 8 to 11. Steps identical to those described in the first processing flow are given the same step numbers, and their description is omitted as appropriate.
- in step S101 of FIG. 8, the posted data extraction unit 2 analyzes the SNS post data. Through this analysis processing, frequently appearing keywords such as player names and scene types, and keywords attracting attention, are extracted.
- step S102 the posted data extraction unit 2 determines whether the extracted keyword is related to the target event.
- the post data extraction unit 2 performs a process of classifying the extracted keywords in step S110.
- for example, the extracted keywords are classified into keywords related to people such as players, referees, and managers; keywords related to scoring scenes such as field goals and touchdowns; and keywords related to foul scenes such as offsides and holdings.
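The step-S110 classification could be sketched with small keyword dictionaries. The vocabularies below are illustrative stand-ins for real rosters and play-type lists.

```python
PERSON_WORDS = {"referee", "coach"}       # plus a roster lookup in practice
SCORE_WORDS = {"touchdown", "field goal"}
FOUL_WORDS = {"offside", "holding"}

def classify(keyword):
    """Sketch of the step-S110 classification into person / scoring / foul."""
    k = keyword.lower()
    if k in SCORE_WORDS:
        return "scoring"
    if k in FOUL_WORDS:
        return "foul"
    if k in PERSON_WORDS or keyword.istitle():
        return "person"
    return "other"

labels = [classify(k) for k in ["Player A", "touchdown", "holding", "rain"]]
```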
- after classifying the keywords, the posted data extraction unit 2 outputs the classification result to the metadata extraction unit 3 in step S111.
- if it is determined in step S102 that the keyword is not related to the target event, or after step S111 has been executed, the posted data extraction unit 2 determines in step S104 whether the event has ended (skipping steps S110 and S111 in the former case).
- if the event has not ended, the posted data extraction unit 2 returns to step S101 and continues extracting keywords. If the event has ended, it ends the series of processes shown in FIG. 8.
- the metadata extraction unit 3 executes the series of processes shown in FIG. 9.
- in step S210 of FIG. 9, the metadata extraction unit 3 determines whether a keyword classification result has been acquired. If so, it performs branch processing according to the classification result in step S211.
- if the keyword relates to a person, the metadata extraction unit 3 identifies metadata that includes that person in step S212.
- if the keyword relates to a scoring scene, the metadata extraction unit 3 identifies metadata about the scoring scene in step S213.
- if the keyword relates to a foul scene, the metadata extraction unit 3 identifies metadata about the foul scene in step S214.
- the metadata extraction unit 3 proceeds to step S204 and outputs the specified metadata and the aforementioned classification results to the video analysis unit 4.
- in step S205, the metadata extraction unit 3 determines whether the event has ended.
- if the event has not ended, the metadata extraction unit 3 returns to step S210 to determine whether a classification result has been obtained. If the event has ended, it ends the series of processes shown in FIG. 9.
- the video analysis unit 4 executes the series of processes shown in FIG. 10.
- in step S310, the video analysis unit 4 determines whether metadata and a classification result have been acquired from the metadata extraction unit 3. If not, it executes step S310 again.
- if they have been acquired, the video analysis unit 4 proceeds to step S311 and performs branch processing according to the classification result.
- if the classification result indicates a person, the video analysis unit 4 performs uniform number recognition and face recognition by image recognition processing in step S312 in order to identify the time periods in which the specified person was captured.
- if the classification result indicates a scoring scene, the video analysis unit 4 performs scoreboard recognition by image recognition processing in step S313 to identify the scoring scene.
- scoreboard recognition may be performed, for example, by detecting where a scoreboard installed in the venue is captured and extracting its score, or by recognizing subtitles, graphics, and the like superimposed on the broadcast video VA to detect changes in both teams' scores.
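Locating a scoring scene from recognized scoreboard values could be sketched as finding change points in a time series of scores. The data shape and function name are assumptions for illustration.

```python
def score_change_times(timeline):
    """Find times at which the recognized scoreboard value changes — an
    illustrative way to locate scoring scenes from per-frame recognition
    results. `timeline` is a list of (time_sec, (home, away)) samples."""
    changes = []
    prev = None
    for t, score in timeline:
        if prev is not None and score != prev:
            changes.append(t)
        prev = score
    return changes

timeline = [(0, (0, 0)), (60, (0, 0)), (120, (7, 0)), (180, (7, 0)), (240, (7, 3))]
times = score_change_times(timeline)
```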
- image recognition processing is not performed on the entire captured video, but on a predetermined range of video around the specified time. Image recognition processing may be performed as As a result, it is possible to reduce the processing load and shorten the processing time associated with the image recognition processing.
- the video analysis unit 4 detects a foul display by image recognition processing in order to identify the foul scene in step S314.
- The image recognition processing for identifying the foul scene may, for example, identify the occurrence timing of the foul scene by recognizing a yellow flag thrown onto the field, or identify the foul scene by analyzing the broadcast video VA and recognizing subtitles, graphics, and the like superimposed on the image.
- a scene in which a yellow card or a red card is shown to a target player may be specified as a foul scene by detecting the posture of the referee.
- processing may be performed on the video of a predetermined section based on the metadata in the same manner as in step S313.
- After executing any of steps S312, S313, or S314, the video analysis unit 4 proceeds to step S315 and identifies the camera angle by image analysis processing.
- the information of the camera angle specified here is used in the later processing of generating the clip collection CS.
- step S316 the video analysis unit 4 executes image analysis processing for specifying the IN point and the OUT point.
- the in-point and the out-point may be determined based on the occurrence timing of the scene.
- For example, the IN point may be set 15 seconds before the scene occurrence timing, and the OUT point 20 seconds after the IN point.
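As a concrete illustration of this rule, a minimal sketch follows. The 15-second and 20-second offsets come from the text; the function name and the clamping at the start of the video are assumptions.

```python
# Sketch of the IN/OUT point determination described above. The 15 s / 20 s
# offsets come from the text; clamping at the start of the video is assumed.

PRE_ROLL_SEC = 15   # IN point: 15 seconds before the scene occurrence timing
CLIP_LEN_SEC = 20   # OUT point: 20 seconds after the IN point

def clip_bounds(scene_time_sec):
    in_point = max(0, scene_time_sec - PRE_ROLL_SEC)
    out_point = in_point + CLIP_LEN_SEC
    return in_point, out_point
```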
- In step S302, the video analysis unit 4 determines whether the event has ended. If it is determined that the event has not ended, the video analysis unit 4 returns to the process of step S310. On the other hand, when determining that the event has ended, the video analysis unit 4 ends the series of processes shown in FIG. 10.
- the video generation unit 5 generates a digest video DV according to the processing results of the posted data extraction unit 2, the metadata extraction unit 3, and the video analysis unit 4.
- step S410 of FIG. 11 the video generation unit 5 determines whether or not it has detected that the IN point and the OUT point have been specified.
- the video generation unit 5 proceeds to step S411 and performs processing for generating a clip video CV based on the in-point and out-point.
- In step S403, the video generation unit 5 combines the clip videos CV to generate a clip collection CS for the target scene.
- After generating the clip video CV, the video generation unit 5 returns to the process of step S410.
- If it is determined in step S410 that the IN point and OUT point have not been specified, the video generation unit 5 proceeds to step S404 and determines whether the event has ended.
- If it is determined that the event has not ended, the video generation unit 5 returns to step S410.
- On the other hand, if it is determined that the event has ended, the video generation unit 5 proceeds to step S405 to combine the clip collections CS and generate the digest video DV, and in subsequent step S406 performs processing for saving the digest video DV.
- the third processing flow is an example of generating a digest video DV without using metadata.
- A specific description will be given with reference to FIGS. 8, 10, and 11.
- the post data extraction unit 2 extracts and classifies keywords related to the event by executing the series of processes shown in FIG.
- the classification result is output to the video analysis unit 4 in step S111.
- the metadata extraction unit 3 does not perform any processing because it does not need to analyze the metadata.
- step S310 of FIG. 10 the video analysis unit 4 determines whether the keyword classification results have been acquired instead of determining whether the metadata has been acquired.
- the video generation unit 5 generates a digest video DV by executing a series of processes shown in FIG.
- the first example uses a different template for each scene type.
- step S501 of FIG. 12 the video generation unit 5 performs branch processing according to the scene type of the target scene.
- the type of target scene may be estimated from a keyword or determined based on metadata.
- the video generation unit 5 selects a touchdown scene template in step S502.
- the template is information that defines what kind of camera angle videos are to be combined and in what order.
- the video generation unit 5 selects a template for the field goal scene in step S503.
- the video generation unit 5 selects a foul scene template in step S504.
- the video generation unit 5 executes processing for generating the clip collection CS using the selected template in step S505.
- step S506 the video generation unit 5 adopts the target section in the broadcast video VA as the clip collection CS.
- the target section may be determined, for example, based on the posting time to the SNS, or may be determined based on the scene occurrence time in the metadata.
- After executing the processing in either step S505 or S506, the video generation unit 5 ends the generation processing of the clip collection CS.
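The template-driven flow above (steps S501 to S506) can be sketched as data plus a small combiner. This is a hypothetical sketch: the text only says a template defines which camera-angle videos are combined and in what order, so the angle names and dictionary layout below are invented for illustration.

```python
# Hypothetical sketch: a template defines which camera-angle videos are
# combined and in what order; the angle names themselves are invented.

TEMPLATES = {
    "touchdown":  ["side", "end_zone", "bird_eye"],     # cf. step S502
    "field_goal": ["side", "goal_back", "goal_front"],  # cf. step S503
    "foul":       ["bird_eye", "referee", "side"],      # cf. step S504
}

def generate_clip_collection(scene_type, clips_by_angle):
    """Combine clip videos in the template's order (step S505); scene types
    without a template fall back to the broadcast-video section (step S506)."""
    template = TEMPLATES.get(scene_type)
    if template is None:
        return ["broadcast_section"]
    # Angles with no captured material are simply skipped (cf. the note that
    # a clip collection may be generated without a missing angle's video).
    return [clips_by_angle[a] for a in template if a in clips_by_angle]

clips = {"side": "cv_side", "end_zone": "cv_end", "bird_eye": "cv_bird"}
```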
- In another example, not only the generation of the clip collection CS but also the determination of the IN point and OUT point for generating the clip video CV is performed according to the scene type of the target scene.
- step S421 (see FIG. 13).
- step S501 the video generation unit 5 performs branch processing according to the scene type of the target scene.
- the video generation unit 5 determines the in-point and out-point for the touchdown scene and generates a clip video CV in step S510.
- the in point and the out point may be determined, for example, so that the clip video CV has an optimum length.
- step S502 the video generation unit 5 selects a touchdown scene template.
- the video generation unit 5 determines in points and out points for the field goal scene and generates a clip video CV in step S511.
- step S503 the video generation unit 5 selects a template for a field goal scene.
- the video generation unit 5 determines the in-point and out-point for the foul scene and generates a clip video CV in step S512.
- step S504 the video generation unit 5 selects a template for foul scenes.
- step S505 the video generation unit 5 executes processing for generating a collection of clips CS using the selected template.
- step S506 the video generation unit 5 adopts the target section in the broadcast video VA as the clip collection CS.
- the target section may be determined, for example, based on the posting time to the SNS, or may be determined based on the scene occurrence time in the metadata.
- After executing the processing in either step S505 or S506, the video generation unit 5 ends the generation processing of the clip collection CS.
- Although FIGS. 12 and 13 show examples in which a single template is prepared for foul scenes, different templates may be prepared according to the type of foul. Templates may also be prepared for other scene types, such as injury scenes, in addition to the illustrated cases.
<3-1. Scoring method>
- If there is a limit on the playback time length of the clip collection CS, it may not be possible to combine all the selected clip videos CV. In such a case, scoring processing may be performed to assign a score to each clip video CV so that clip videos CV with high scores are preferentially included in the clip collection CS.
- FIG. 14 shows an example of the score given as a result of scoring for the size of the subject and the score given as a result of scoring for the direction of the subject for the clip video CV for each imaging device CA.
- Each score is a value in the range of 0 to 1, and the higher the value, the better the score.
- the first video V1 is a bird's-eye view video, and the subject is captured small, so the score for the size of the subject is 0.02.
- the score for the orientation of the subject is set to 0.1.
- the second image V2 is a telephoto image in which the player holding the ball is projected large, and the score for the size of the subject is 0.85.
- the score for the subject orientation is set to 0.9.
- the third video V3 is a bird's-eye view of a relatively narrow area, and the size of the subject is not so large, so the score for the size of the subject is 0.1.
- the score for the orientation of the subject is set to 0.1.
- a fourth image V4 is an image captured by the fourth imaging device CA4.
- the fourth image V4 is a telephoto image in which the subject is captured large, and the score for the size of the subject is 0.92. However, since the orientation of the subject does not face the imaging device CA, the score for the orientation of the subject is set to 0.1.
- Therefore, when priority is given to the size of the subject, the fourth video V4 is preferentially selected. When priority is given to video in which the front of the subject is captured, the second video V2 is preferentially selected.
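The selection just described can be expressed as a worked example using the scores quoted above for FIG. 14. The dictionary layout and the `prioritize` parameter are assumptions for illustration.

```python
# Worked example using the scores quoted for FIG. 14; the data layout and
# the "prioritize" parameter are assumptions for illustration.

scores = {
    # video: (subject-size score, subject-orientation score), each in 0..1
    "V1": (0.02, 0.1),
    "V2": (0.85, 0.9),
    "V3": (0.10, 0.1),
    "V4": (0.92, 0.1),
}

def best_video(scores, prioritize="size"):
    idx = 0 if prioritize == "size" else 1
    return max(scores, key=lambda v: scores[v][idx])
```

With the size criterion this selects the fourth video V4; with the orientation criterion it selects the second video V2, matching the text.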
- scoring may be calculated for each clip collection CS including a plurality of clip videos CV, instead of being calculated for each clip video CV. Then, when selecting the clip collections CS included in the digest video DV, the clip collections CS with high scores assigned by the scoring may be more likely to be included.
- In that case, the clip video CV including the captured image given the highest score may be selected, or the clip video CV may be selected based on the average score of its captured images.
- the average score is, for example, the average score calculated for each captured image included in the clip video CV.
- step S601 of FIG. 15 the video generation unit 5 selects the clip video CV whose score is equal to or greater than the threshold. As a result, videos with low scores and unattractive to viewers can be omitted.
- In step S602, the video generation unit 5 generates a clip collection CS by combining the clip videos CV in order of score.
- the score given by the scoring process can be regarded as an index that indicates that the video is easy for viewers to see and that the video is suitable for understanding what happened in the scene.
- the viewer who viewed the clip collection CS can correctly understand what happened in the scene. In other words, it is possible to prevent a situation in which a clip video CV with a low score is viewed and the viewer cannot understand the event that occurred in the scene.
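Steps S601 and S602 amount to a filter-then-sort over scored clips. A minimal sketch follows; the threshold value itself is an assumption.

```python
# Minimal sketch of steps S601-S602: drop clip videos whose score is below
# the threshold, then combine the rest in descending score order. The
# threshold value is an assumption.

def build_clip_collection(clip_scores, threshold=0.5):
    selected = {c: s for c, s in clip_scores.items() if s >= threshold}  # S601
    return sorted(selected, key=selected.get, reverse=True)              # S602
```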
- This example is a process for generating a clip video CV and a clip collection CS, executed in place of steps S402 and S403 in FIG. 7, or in place of steps S411 and S403 in FIG. 11.
- step S501 the video generation unit 5 performs branch processing according to the scene type of the target scene.
- the video generation unit 5 selects an optimal video (imaging device CA) for the touchdown scene in step S610.
- a plurality of images may be selected. That is, a plurality of imaging devices CA may be selected.
- step S502 the video generation unit 5 selects a touchdown scene template.
- the video generation unit 5 selects an optimal video for the field goal scene in step S611.
- step S503 the video generation unit 5 selects a template for a field goal scene.
- the video generation unit 5 selects the optimum video for the foul scene in step S612.
- step S504 the video generation unit 5 selects a template for foul scenes.
- In step S613, the video generation unit 5 determines the IN point and OUT point for the section whose score is equal to or greater than the threshold within the section in which the target scene was captured, and generates the clip video CV. This process is executed for each selected video.
- step S505 the video generation unit 5 executes processing for generating the clip collection CS using the selected template.
- step S506 the video generation unit 5 adopts the target section in the broadcast video VA as the clip collection CS.
- the target section may be determined, for example, based on the posting time to the SNS, or may be determined based on the scene occurrence time in the metadata.
- After executing the processing in either step S505 or S506, the video generation unit 5 ends the generation processing of the clip collection CS.
- This example is a process for generating a clip video CV and a clip collection CS, executed in place of steps S402 and S403 in FIG. 7, or in place of steps S411 and S403 in FIG. 11.
- step S601 of FIG. 17 the video generation unit 5 selects the clip video CV whose score is equal to or greater than the threshold. As a result, videos with low scores and unattractive to viewers can be eliminated.
- step S613 the video generation unit 5 cuts out a section of the selected clip video CV whose score is equal to or greater than the threshold value and generates a new clip video CV. Specifically, the clip video CV is generated by determining the in-point and out-point of the section whose score is above the threshold. This process is executed for each selected video.
- In step S602, the video generation unit 5 generates a clip collection CS by combining the clip videos CV in order of score.
- a section with a high score is further selected and cut out from the clip video CV with a high score, so that a digest video DV or the like using only video with high interest of the viewer can be generated.
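The cutting-out of high-score sections in step S613 can be sketched as a scan over per-segment scores. The per-second score representation and the threshold are assumptions; the patent does not specify how scores are attached to positions in the clip.

```python
# Sketch of step S613: given per-second scores inside a selected clip, keep
# the contiguous sections where the score stays at or above the threshold,
# returned as (IN point, OUT point) pairs. The per-second representation
# and the threshold are assumptions.

def high_score_sections(per_sec_scores, threshold=0.5):
    sections, start = [], None
    for t, s in enumerate(per_sec_scores):
        if s >= threshold and start is None:
            start = t                       # open a section: IN point
        elif s < threshold and start is not None:
            sections.append((start, t))     # close it: OUT point
            start = None
    if start is not None:
        sections.append((start, len(per_sec_scores)))
    return sections
```

Each returned pair would then become a new clip video CV cut from the original clip.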
- In the examples described above, post data is extracted from the SNS.
- the post data may be extracted from an unspecified number of accounts or from a specific account.
- By extracting post data from an unspecified number of accounts, it is possible to better understand the interests of viewers.
- The extraction of the posted data may extract the posted data itself, or may yield information obtained by subjecting the posted data to statistical processing. For example, it may be information extracted by statistical processing, such as a keyword that appears frequently in information posted within the last predetermined time. These pieces of information may be extracted by the SNS server 100 that manages posts to the SNS, or may be obtained from another server that analyzes posts on the SNS server 100.
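The statistic mentioned here, a keyword appearing frequently in recently posted information, can be sketched as follows. This is a hedged sketch: the post format, tokenizer, and time window are assumptions, and a real system would query the SNS server's API rather than scan raw posts.

```python
# Hedged sketch of "keywords that appear frequently in information posted
# in the last predetermined time". Post format, tokenizer, and window are
# assumptions; a real system would use the SNS server's API.

from collections import Counter

def trending_keywords(posts, now_sec, window_sec=600, top_n=3):
    recent = [p["text"] for p in posts if now_sec - p["time"] <= window_sec]
    counts = Counter(w for text in recent for w in text.lower().split())
    return [w for w, _ in counts.most_common(top_n)]

posts = [
    {"time": 95, "text": "touchdown touchdown amazing"},
    {"time": 98, "text": "what a touchdown"},
    {"time": 10, "text": "kickoff soon"},  # falls outside a 60 s window
]
```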
- In the above description, an example was shown in which the video analysis unit 4 analyzes the broadcast video VA.
- When analyzing the broadcast video VA, not only image analysis processing but also audio analysis processing of the announcer's or commentator's voice may be performed. This makes it possible to identify the scene that occurred during the match more specifically and accurately, and also facilitates identification of the player associated with the scene.
- an in-point and an out-point for generating a clip video CV may be determined by audio analysis processing.
- the commemorative play is, for example, a play at the moment when a certain player's career record reaches a predetermined numerical value, or a play at the time when the previous record is broken.
- the clip collection CS may be generated without combining the videos. For example, if there is no enlarged image with the angle of view specified by the template, the clip collection CS is generated without including that image.
- the referee's gestures may be set in detail according to the type of play and the type of foul.
- A dedicated imaging device CA that captures the referee may be placed in the venue; by identifying the referee's posture and gestures through image analysis processing, the content of the play that occurred during the match, that is, the scene type and the like, can be specified.
- the scene type information thus obtained can be used, for example, in place of metadata.
- referees to be subjected to image analysis processing may include not only the chief referee but also an assistant referee.
- the digest video DV is generated by combining a plurality of clip collections CS, but the digest video DV may be generated from one clip collection CS. Specifically, when there is one clip collection CS to be presented to the viewer, the digest video DV may be generated so as to include only one clip collection CS.
<5. Computer device>
- A configuration of a computer device including an arithmetic processing unit that implements the information processing device 1 described above will be described with reference to FIG. 18.
- The CPU 71 of the computer device functions as the arithmetic processing unit that performs the various processes described above, and executes them according to programs stored in the ROM 72 or a non-volatile memory unit 74 such as an EEP-ROM (Electrically Erasable Programmable Read-Only Memory), or according to programs loaded from the storage unit 79 into the RAM 73.
- the RAM 73 also appropriately stores data necessary for the CPU 71 to execute various processes.
- The CPU 71, ROM 72, RAM 73, and non-volatile memory unit 74 are interconnected via a bus 83.
- An input/output interface (I/F) 75 is also connected to this bus 83 .
- The input/output interface 75 is connected to an input unit 76 comprising operators and operating devices.
- various operators and operation devices such as a keyboard, mouse, key, dial, touch panel, touch pad, remote controller, etc. are assumed.
- A user's operation is detected by the input unit 76, and a signal corresponding to the input operation is interpreted by the CPU 71.
- the input/output interface 75 is connected integrally or separately with a display unit 77 such as an LCD or an organic EL panel, and an audio output unit 78 such as a speaker.
- the display unit 77 is a display unit that performs various displays, and is configured by, for example, a display device provided in the housing of the computer device, a separate display device connected to the computer device, or the like.
- The display unit 77 displays images for various types of image processing, moving images to be processed, and the like on the display screen based on instructions from the CPU 71. Further, the display unit 77 displays various operation menus, icons, messages, and the like, that is, a GUI (Graphical User Interface), based on instructions from the CPU 71.
- the input/output interface 75 may be connected to a storage unit 79 made up of a hard disk, solid-state memory, etc., and a communication unit 80 made up of a modem or the like.
- the communication unit 80 performs communication processing via a transmission line such as the Internet, wired/wireless communication with various devices, bus communication, and the like.
- a drive 81 is also connected to the input/output interface 75 as required, and a removable storage medium 82 such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory is appropriately mounted.
- Data files such as programs used for each process can be read from the removable storage medium 82 by the drive 81 .
- The read data file is stored in the storage unit 79, and the images and sounds contained in the data file are output by the display unit 77 and the audio output unit 78.
- Computer programs and the like read from the removable storage medium 82 are installed in the storage unit 79 as required.
- software for the processing of this embodiment can be installed via network communication by the communication unit 80 or via the removable storage medium 82 .
- the software may be stored in advance in the ROM 72, the storage unit 79, or the like.
- the CPU 71 performs processing operations based on various programs, thereby executing necessary information processing and communication processing as the information processing apparatus 1 including the arithmetic processing unit described above.
- the information processing apparatus 1 is not limited to being configured with a single computer device as shown in FIG. 2, and may be configured by systematizing a plurality of computer devices.
- the plurality of computer devices may be systematized by a LAN (Local Area Network) or the like, or may be remotely located by a VPN (Virtual Private Network) or the like using the Internet or the like.
- the plurality of computing devices may include computing devices as a group of servers (cloud) available through a cloud computing service.
- The information processing apparatus 1 includes a specifying unit 10 that specifies auxiliary information SD for generating a digest video DV based on scene-related information about a scene that occurred in an event such as a sports match.
- An event is, for example, an entertainment such as a sports game or a concert.
- the auxiliary information SD is, for example, information used to generate the digest video DV, and is information used to determine which part of the captured video is cut out.
- For a sports match, specifically, information such as the player's name, the scene type, and the play type is used as the auxiliary information SD.
- the scene-related information may be information including metadata distributed from another information processing apparatus (metadata server 200).
- Metadata is information that includes the progress of an event such as a sports match. Taking a sports match as an example, it contains time information of when a specific play occurred, the names of the players involved in the play, and information on the score resulting from the play.
- By specifying the auxiliary information SD based on such metadata, it is possible to more appropriately specify the time zone to be extracted from the captured video.
- The scene-related information may include information related to posts by users of social networking services (SNS). Various posts are made to the SNS according to the progress of the event. By analyzing the content posted to the SNS, it becomes possible to identify scenes in which viewers are highly interested. By specifying the auxiliary information SD based on scene-related information obtained from such an SNS, it is possible to generate a digest video DV containing appropriate scenes that match the interests of viewers.
- The information related to posts by users of the SNS is information about content posted to the SNS, and includes, for example, keywords that appear frequently in the most recent predetermined time period. This information may be a keyword extracted based on information posted to the SNS, a keyword presented by a service attached to the SNS, or a keyword presented by a service different from the SNS.
- the auxiliary information SD may be information indicating whether or not it has been adopted as the broadcast video VA. For example, if it is possible to identify the section that is used as the broadcast video VA in the captured video, it is possible to identify the section that is not used as the broadcast video VA. This makes it possible to generate the digest video DV so as to include the clip video CV that is not used as the broadcast video VA. Therefore, it is possible to provide a digest video DV containing new video for the viewer.
- the auxiliary information SD may be keyword information.
- The keyword information is, for example, player name information, scene type information, play type information, equipment names, and the like. By using the keyword information, the process of specifying the time zone to be cut out from the captured video can be realized with a small processing load.
- the keyword information may be scene type information. For example, a clip video CV to be clipped from the captured video is determined based on the scene type information. Therefore, it is possible to generate a digest video DV containing a clip video CV corresponding to a predetermined scene type.
- the keyword information may be information that identifies the participants of the event. If the event is a sports match, a scene to be cut out from the captured video is determined based on keyword information such as the name of the player who participated in the match and the uniform number. Therefore, it is possible to generate a digest video DV or the like focusing on a specific player.
- the auxiliary information SD may be information used for generating a clip collection CS including one or more clip videos CV obtained from a plurality of imaging devices CA that capture events. For example, when a specific play type is selected as the auxiliary information SD, the specific play type is captured from a plurality of videos (first video V1, second video V2, etc.) captured by a plurality of imaging devices CA. By cutting out and connecting the sections, a clip collection CS for the play type is generated. By generating the digest video DV so as to include the clip collection CS generated in this way, one play can be viewed from different angles, and the digest video DV makes it easier for the viewer to grasp the play situation. can be generated.
- the clip collection CS is a combination of clip videos CV obtained by capturing a specific scene in the event, and the auxiliary information SD may include information on the order in which the clip videos CV are combined in advance.
- The clip collection CS is formed by combining a plurality of clip videos CV as partial videos of one play captured from different angles. In generating such a clip collection CS, connecting the videos in a predetermined order makes it possible to provide the viewer with video that allows one play to be viewed from different angles, and the processing load for determining the combination order can be reduced.
- The information on the order in which the clip videos CV are combined may be information corresponding to the scene type of the specific scene. That is, the predetermined order may be an appropriate order that differs for each scene type. For example, when generating one clip collection CS for one field goal that occurred in an American football game, combining the clip videos CV in a specific order allows an appropriate clip collection CS for the field goal to be generated. Such a template arranges images taken from different angles, such as a side view, a view from behind the goal, a front view of the goal, and a bird's-eye view, in a predetermined order.
- the information processing apparatus 1 may include a clip collection generator 11 that generates a clip collection CS using the auxiliary information SD. As a result, the information processing apparatus 1 executes a series of processes from specifying the auxiliary information SD to generating the clip video CV and generating the clip collection CS.
- Since the information processing device 1 is a single device, there is no need to transmit the information required between specifying the auxiliary information SD and generating the clip collection CS to other information processing devices, so the processing load is reduced. Note that another short video or image may be inserted between one clip video CV and the next.
- the clip collection generator 11 of the information processing device 1 may generate the clip collection CS by combining the clip videos CV.
- the clip collection CS is generated only by combining the clip images CV without interposing another image. As a result, the processing load required for generating the clip collection CS can be reduced.
- the clip collection CS may be a combination of clip videos CV obtained by capturing specific scenes in the event. By combining a plurality of clip videos CV obtained by clipping images of a certain scene from different angles, a clip collection CS is generated that allows the scene to be confirmed from different angles. As a result, it is possible to generate a digest video DV that allows the user to easily comprehend what happened in each scene.
- the clip collection generation unit 11 of the information processing device 1 may generate the clip collection CS using the analysis result obtained by image analysis processing on the video obtained from the imaging device CA that captures the event and the auxiliary information SD.
- Image analysis processing for video makes it possible to specify information about the subject of the video, scene type information, and the like. As a result, it is possible to generate the clip collection CS corresponding to the auxiliary information SD, and to generate the appropriate digest video DV.
- the image analysis process may be a process of identifying a person appearing in the video. Appropriate identification of a person appearing in a video by image analysis processing makes it possible to identify a clip video CV to be included in the clip collection CS based on a keyword such as a player's name. Therefore, it is possible to reduce the processing load associated with the selection of the clip video CV.
- the image analysis process may be a process of identifying the type of scene appearing in the video. Appropriately specifying the type of the scene shown in the video by the image analysis processing makes it possible to specify the clip video CV to be included in the clip collection CS based on the keyword such as the scene type. Therefore, it is possible to reduce the processing load associated with the selection of the clip video CV.
- the image analysis process may be a process of identifying in points and out points. By specifying the in-point and out-point by the image analysis processing, it is possible to cut out a video of an appropriate section as the clip video CV. Therefore, it is possible to generate an appropriate clip collection CS and digest video DV.
- the image analysis process may include a process of giving a score to each clip video CV. Depending on the time length of the clip video CV, it may not be possible to include all the clip video CV of the scene in question in one clip collection CS. There are also clip videos CV that should not be included in the clip collection CS. By scoring each clip video CV, it is possible to generate a clip collection CS in which only appropriate clip video CVs are combined.
- a computer device executes a process of specifying auxiliary information for generating a digest video based on scene-related information about a scene that occurred at an event.
- A program to be executed by the information processing apparatus 1 described above can be recorded in advance in an HDD (Hard Disk Drive) as a recording medium built into a device such as a computer device, or in a ROM or the like in a microcomputer having a CPU.
- Alternatively, the program can be temporarily or permanently stored (recorded) in a removable recording medium such as a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a Blu-ray Disc (registered trademark), a magnetic disk, a semiconductor memory, or a memory card.
- Such removable recording media can be provided as so-called package software.
- it can also be downloaded from a download site via a network such as a LAN (Local Area Network) or the Internet.
- the present technology can also adopt the following configuration.
- (1) An information processing device comprising a specifying unit that specifies auxiliary information for generating a digest video based on scene-related information about a scene that occurred at an event.
- (2) The information processing apparatus according to (1), wherein the scene-related information is information including metadata distributed from another information processing apparatus.
- (3) The information processing apparatus according to (1) or (2), wherein the scene-related information includes information related to a post by a user using a social networking service.
- (4) The information processing apparatus according to any one of (1) to (3), wherein the auxiliary information is information indicating whether or not it has been adopted as a broadcast video.
- (5) The information processing apparatus according to any one of (1) to (4), wherein the auxiliary information is keyword information.
- (6) The information processing apparatus according to (5), wherein the keyword information is scene type information.
- (7) The information processing apparatus according to (5), wherein the keyword information is information specifying a participant of the event.
- (8) The information processing apparatus according to any one of (1) to (7), wherein the auxiliary information is information used for generating a clip collection containing one or more clip videos obtained from a plurality of imaging devices that capture the event.
- (9) The information processing apparatus according to (8), wherein the clip collection is a combination of clip videos obtained by capturing a specific scene in the event, and the auxiliary information includes information on a predetermined order of combining the clip videos.
- (10) The information processing apparatus according to (9), wherein the information on the order of combination is information corresponding to a scene type of the specific scene.
- the information processing apparatus including a clip collection generating unit that generates the clip collection using the auxiliary information.
- the clip collection generation unit generates the clip collection by combining the clip images.
- the clip collection is a combination of clip videos obtained by capturing specific scenes in the event.
- the clip collection generation unit generates the clip collection using the auxiliary information and analysis results obtained by image analysis processing on video obtained from an imaging device that captures the event (11) to (13).
- the information processing apparatus according to any one of .
- the information processing apparatus according to (14), wherein the image analysis process is a process of identifying a person appearing in a video.
- information processing device 10 identification unit 11 clip collection generation unit 200 metadata server (another information processing device)
- CV Clip video CS Clip collection VA Broadcast video
Abstract
Description
For example, Patent Document 1 below discloses a system that generates television content so as to include material of high interest to viewers, based on information posted to a social networking system (SNS: Social Networking Service).
An event is, for example, a happening such as a sports match or a concert. Auxiliary information is, for example, information used to generate the digest video; it is used to determine which portion of the captured video to cut out. For a sports match, specifically, information such as player names, scene types, and play types serves as auxiliary information.
<1. System Configuration>
<2. Processing Flow>
<2-1. First Processing Flow>
<2-2. Second Processing Flow>
<2-3. Third Processing Flow>
<2-4. Flow of Clip Collection Generation Processing>
<3. Scoring>
<3-1. Scoring Method>
<3-2. Processing Flow for Video Selection Using Scores>
<4. Modifications>
<5. Computer Device>
<6. Summary>
<7. Present Technology>
An example system configuration of the present embodiment will be described with reference to Fig. 1.
The information processing device 1 of the present embodiment is a device that generates a digest video DV for an event such as a sports match, a concert, or a stage performance. The generated digest video DV is delivered to viewers.
The video analysis unit 4 also performs processing to acquire the broadcast video VA, i.e., the video that was actually broadcast, and performs image analysis processing on the broadcast video VA.
When detecting in points and out points based on the broadcast video VA, this may be done by detecting the timing of video switches. That is, the video analysis unit 4 may identify in points and out points by performing image analysis processing on the broadcast video VA and detecting the switching points between imaging devices CA.
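A common way to detect such switching points, sketched below under the assumption that frames are available as equally sized grayscale pixel arrays, is to threshold the mean absolute difference between consecutive frames: a cut between cameras changes most pixels at once, so the difference spikes. The function name and threshold are illustrative, not part of the source.

```python
def detect_switch_points(frames, threshold=60.0):
    """Return indices where the video likely cuts to another camera.

    A cut between cameras usually changes most pixels at once, so the
    mean absolute difference between consecutive frames spikes.
    `frames` is a sequence of equally sized grayscale frames, each given
    as a list of pixel rows.
    """
    switch_points = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        diffs = [abs(a - b) for ra, rb in zip(prev, cur) for a, b in zip(ra, rb)]
        mad = sum(diffs) / len(diffs)  # mean absolute difference
        if mad > threshold:
            switch_points.append(i)    # frame i starts a new shot
    return switch_points

# Two static "shots" with a hard cut at frame index 2.
shot_a = [[10, 10], [10, 10]]
shot_b = [[200, 200], [200, 200]]
print(detect_switch_points([shot_a, shot_a, shot_b, shot_b]))  # [2]
```

In practice a production system would also suppress spikes caused by flash photography or fast motion, e.g. by requiring the difference to stay high for more than one frame.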
This makes it easy to identify the time periods in which a specific player was captured, for example when a digest video DV is to be created from scenes featuring that player.
The broadcast video VA is generated by joining specific partial videos (clip videos CV) using the first video V1, the second video V2, and the third video V3 as source material, and superimposing various information such as score information and player name information.
By generating the clip collection CS based on the auxiliary information SD, the clip videos CV cut from each video can be joined in an appropriate order, for example, as shown in Fig. 3, when the first video V1 is a wide-angle video captured from a bird's-eye view at the side of the field, the second video V2 is a telephoto video capturing the vicinity of the player holding the ball, and the third video V3 is a video captured from the goalpost side.
Since showing such viewers the same video again provides them with no meaningful information, it is conceivable to generate the digest video DV so that it includes video captured from angles the viewers have not yet watched. The auxiliary information SD indicating whether video has been broadcast is used in such cases to select the clip collection CS or the clip videos CV.
The order in which clip collections CS are joined is determined, for example, according to the occurrence time of each scene. An image or the like expressing a video transition may be inserted between clip collections CS.
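The ordering-and-transition step above can be sketched as follows; the dict keys ('scene_time', 'clips') and the string stand-in for a transition image are assumptions for illustration.

```python
def assemble_digest(clip_collections, transition="TRANSITION"):
    """Order clip collections by scene occurrence time and interleave
    a transition element between them (stand-in for a transition image).

    Each collection is a dict with a 'scene_time' (seconds from the
    start of the event) and a 'clips' list.
    """
    ordered = sorted(clip_collections, key=lambda c: c["scene_time"])
    timeline = []
    for i, collection in enumerate(ordered):
        if i > 0:
            timeline.append(transition)  # inserted between collections
        timeline.extend(collection["clips"])
    return timeline

collections = [
    {"scene_time": 1800, "clips": ["td_wide", "td_tele"]},
    {"scene_time": 300, "clips": ["fg_side"]},
]
print(assemble_digest(collections))
# ['fg_side', 'TRANSITION', 'td_wide', 'td_tele']
```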
Several examples of the processing executed by the information processing device 1 will be described.
An example of the first processing flow is shown in Figs. 4 to 7. Specifically, Fig. 4 shows an example of the processing flow executed by the post data extraction unit 2 of the information processing device 1, Fig. 5 an example executed by the metadata extraction unit 3, Fig. 6 an example executed by the video analysis unit 4, and Fig. 7 an example executed by the video generation unit 5.
On the other hand, when determining that the event has ended, the post data extraction unit 2 ends the series of processes shown in Fig. 4.
Specifically, in step S201 the metadata extraction unit 3 analyzes the metadata acquired from the metadata server 200 and extracts information for identifying scenes that occurred at the event. In an American football game, for example, it extracts information such as the time at which a scene of the scene type "touchdown" occurred, the name of the player who scored the touchdown, and the change in score resulting from the touchdown.
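Step S201 amounts to filtering a metadata feed down to scene records. The sketch below assumes a hypothetical field schema ('type', 'clock', 'player', 'score'); a real metadata server would define its own.

```python
def extract_scenes(metadata_events):
    """Pull scene-identifying fields out of event metadata records.

    Only event types relevant to the digest are kept; the field names
    here are hypothetical placeholders for a real feed's schema.
    """
    wanted_types = {"touchdown", "field_goal", "interception"}
    scenes = []
    for ev in metadata_events:
        if ev.get("type") not in wanted_types:
            continue
        scenes.append({
            "scene_type": ev["type"],
            "time": ev["clock"],             # when the scene occurred
            "player": ev.get("player"),      # who scored / was involved
            "score_change": ev.get("score"), # resulting score, if present
        })
    return scenes

events = [
    {"type": "kickoff", "clock": 0},
    {"type": "touchdown", "clock": 754, "player": "Smith", "score": "7-0"},
]
print(extract_scenes(events))
```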
On the other hand, when determining that the event has ended, the metadata extraction unit 3 ends the series of processes shown in Fig. 5.
When determining that the event has not ended, the video analysis unit 4 returns to the process of step S301 and continues the video analysis processing.
On the other hand, when determining that the event has ended, the video analysis unit 4 ends the series of processes shown in Fig. 6.
An example of the second processing flow is shown in Figs. 8 to 11. Processes similar to those described in the first processing flow are given the same step numbers, and their description is omitted as appropriate.
On the other hand, when determining that the event has ended, the post data extraction unit 2 ends the series of processes shown in Fig. 8.
When determining that a classification result has been acquired, the metadata extraction unit 3 performs branching processing according to the classification result in step S211.
On the other hand, when determining that the event has ended, the metadata extraction unit 3 ends the series of processes shown in Fig. 9.
When determining that metadata has not been acquired, the video analysis unit 4 executes the process of step S310 again.
In soccer, a scene in which a referee holds up a yellow or red card toward a player may be identified as a foul scene by detecting the referee's posture.
The camera angle information identified here is used in the subsequent processing that generates the clip collection CS.
The in point and out point may be determined with the scene occurrence timing as a reference point. For example, the in point may be set 15 seconds before the scene occurrence timing, and the out point 20 seconds after the in point.
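The offset rule just described (in point 15 s before the scene, out point 20 s after the in point) reduces to simple arithmetic; a minimal sketch, with clamping added so a scene near the start of the recording cannot produce a negative in point:

```python
def clip_bounds(scene_time, pre_seconds=15, duration=20):
    """Compute a clip's in point and out point from a scene timestamp.

    The in point is a fixed offset before the scene occurrence timing
    and the out point a fixed duration after the in point, as in the
    example in the text. Times are seconds from the recording start.
    """
    in_point = max(0, scene_time - pre_seconds)  # clamp at recording start
    out_point = in_point + duration
    return in_point, out_point

print(clip_bounds(100))  # (85, 105)
print(clip_bounds(5))    # (0, 20)
```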
When determining that the event has not ended, the video analysis unit 4 returns to the process of step S310.
On the other hand, when determining that the event has ended, the video analysis unit 4 ends the series of processes shown in Fig. 10.
The third processing flow is an example of generating the digest video DV without using metadata.
A specific processing flow of the clip collection CS generation processing described in step S403 of Figs. 7 and 11 will now be described.
<3-1. Scoring Method>
When the playback duration of the clip collection CS is limited, it may not be possible to join all of the selected clip videos CV. In such a case, scoring processing that assigns a score to each clip video CV may be performed so that clip videos CV with higher scores are preferentially included in the clip collection CS.
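One simple way to realize this preference, sketched here with an assumed (id, score, duration) tuple format, is a greedy pass over the clips in descending score order, keeping each clip that still fits the time budget:

```python
def select_clips(clips, max_duration):
    """Greedily keep the highest-scoring clips that fit the time budget.

    Each clip is a (clip_id, score, duration) tuple; scoring itself is
    assumed to have happened upstream. Selected clips are returned in
    their original order so the collection still plays chronologically.
    """
    by_score = sorted(clips, key=lambda c: c[1], reverse=True)
    chosen, total = set(), 0
    for clip_id, score, duration in by_score:
        if total + duration <= max_duration:
            chosen.add(clip_id)
            total += duration
    return [c for c in clips if c[0] in chosen]

clips = [("wide", 0.9, 12), ("tele", 0.6, 10), ("goal", 0.8, 8)]
print(select_clips(clips, max_duration=20))
# [('wide', 0.9, 12), ('goal', 0.8, 8)]
```

A greedy pass is not optimal for the underlying knapsack problem, but it is cheap and predictable, which fits the document's repeated emphasis on keeping the processing load small.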
When priority is given to video capturing the subject from the front, the second video V2 is preferentially selected.
Then, when selecting the clip collections CS to include in the digest video DV, clip collections CS with higher scores assigned by this scoring may be made more likely to be included.
A specific processing procedure of the clip collection CS generation processing described in step S403 of Figs. 7 and 11 will be described. In particular, this example describes generating the clip collection CS using scores.
Specifically, the in point and out point of each section whose score is at or above a threshold are determined to generate a clip video CV. This processing is executed for each selected video.
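Turning a per-second score series into such sections is a run-length scan: contiguous runs at or above the threshold become one (in point, out point) pair each. A minimal sketch, assuming scores sampled once per second:

```python
def score_intervals(scores, threshold):
    """Turn a per-second score series into (in_point, out_point) pairs.

    Contiguous runs of seconds whose score is at or above the threshold
    become one clip candidate each; `out_point` is exclusive.
    """
    intervals, start = [], None
    for t, s in enumerate(scores):
        if s >= threshold and start is None:
            start = t                      # a run begins at second t
        elif s < threshold and start is not None:
            intervals.append((start, t))   # the run ended at second t
            start = None
    if start is not None:                  # a run reaches the end
        intervals.append((start, len(scores)))
    return intervals

scores = [0.1, 0.2, 0.9, 0.8, 0.3, 0.7, 0.9]
print(score_intervals(scores, threshold=0.7))  # [(2, 4), (5, 7)]
```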
The examples above showed extracting post data from an SNS. Post data may be extracted from an unspecified large number of accounts or from specific accounts. Extracting post data from an unspecified large number of accounts makes it possible to better grasp viewers' interests.
On the other hand, extracting post data from specific accounts used by team officials, live commentators, and the like reduces the possibility of extracting erroneous information. That is, a certain amount of noise can be removed.
This information may be extracted by the SNS server 100 that manages posts to the SNS, or may be obtained from another server device that analyzes posts on the SNS server 100.
The scene occurrence timing may also be grasped, and the scene type identified, by audio analysis of spectator cheers and the like.
A commemorative play is, for example, a play at the moment a player's career total reaches a predetermined figure, or a play that breaks an existing record.
In such a case, a dedicated imaging device CA that captures the referee can be placed in the venue, and by identifying the referee's posture and gestures through image analysis processing, it becomes possible to identify the content of a play that occurred during the match, that is, the scene type and the like.
The referees subject to image analysis processing may include not only the head referee but also assistant referees and the like.
The configuration of a computer device including an arithmetic processing unit that realizes the information processing device 1 described above will be described with reference to Fig. 18.
The CPU 71, ROM 72, RAM 73, and nonvolatile memory unit 74 are interconnected via a bus 83. An input/output interface (I/F) 75 is also connected to the bus 83.
The input unit 76 is assumed to be any of various operators and operating devices, such as a keyboard, mouse, keys, dial, touch panel, touch pad, or remote controller.
A user operation is detected by the input unit 76, and a signal corresponding to the input operation is interpreted by the CPU 71.
The display unit 77 performs various displays and is configured, for example, by a display device provided in the housing of the computer device or by a separate display device connected to the computer device.
Based on instructions from the CPU 71, the display unit 77 displays images for various kinds of image processing, videos to be processed, and the like on the display screen. The display unit 77 also displays various operation menus, icons, messages, and the like, i.e., a GUI (Graphical User Interface), based on instructions from the CPU 71.
The drive 81 can read data files, such as programs used for each process, from the removable storage medium 82. Read data files are stored in the storage unit 79, and images and audio contained in the data files are output by the display unit 77 and the audio output unit 78. Computer programs and the like read from the removable storage medium 82 are installed in the storage unit 79 as needed.
The information processing device 1 is not limited to a single computer device as in Fig. 2; it may be configured as a system of multiple computer devices. The multiple computer devices may be systematized via a LAN (Local Area Network) or the like, or may be remotely located and connected via a VPN (Virtual Private Network) using the Internet or the like. The multiple computer devices may include computer devices provided as a server group (cloud) usable through a cloud computing service.
As described in each example above, the information processing device 1 includes the identification unit 10 that identifies the auxiliary information SD for generating the digest video DV based on scene-related information about a scene that occurred at an event such as a sports match.
An event is, for example, a happening such as a sports match or a concert. The auxiliary information SD is, for example, information used to generate the digest video DV; it is used to determine which portion of the captured video to cut out. For a sports match, specifically, information such as player names, scene types, and play types serves as auxiliary information.
By identifying the auxiliary information SD, the time periods to be cut out of the captured video can be identified, enabling generation of the digest video DV.
Metadata is information that includes the progress of an event such as a sports match; for a sports match, it includes, for example, time information on when a specific play occurred, the names of the players involved in the play, and the score change resulting from the play.
By identifying the auxiliary information SD based on such metadata, the time periods to be cut out of the captured video can be identified more appropriately.
Various posts are made to an SNS as an event progresses. By analyzing the content of posts to the SNS, it becomes possible to identify scenes of high viewer interest.
By identifying the auxiliary information SD based on scene-related information obtained from such an SNS, a digest video DV containing appropriate scenes matching viewers' interests can be generated.
As described above, information related to posts by users of an SNS is information related to information posted to the SNS, including, for example, keywords appearing frequently within a recent predetermined time period. Keywords may be extracted from information posted to the SNS, obtained from keywords presented by a service attached to the SNS, or obtained from keywords presented by a service different from the SNS.
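Counting keyword frequency within a recent window can be sketched as below. The tokenization (a naive whitespace split with a tiny stop-word list) and the post format are assumptions purely for illustration; a real SNS feed would need proper tokenization and language handling.

```python
from collections import Counter

def trending_keywords(posts, now, window_seconds=300, top_n=3):
    """Count keyword occurrences in posts within a recent time window.

    Each post is a (timestamp, text) pair with timestamps in seconds.
    Tokenization is a naive lowercase whitespace split that drops a few
    short stop words -- a stand-in for real text processing.
    """
    stop = {"the", "a", "an", "is", "to", "and", "by"}
    counts = Counter()
    for ts, text in posts:
        if now - ts > window_seconds:
            continue  # post is outside the recent window
        for word in text.lower().split():
            if word not in stop and len(word) > 2:
                counts[word] += 1
    return [w for w, _ in counts.most_common(top_n)]

posts = [
    (90, "what a touchdown by smith"),
    (95, "smith touchdown replay please"),
    (10, "kickoff is underway"),  # too old for a 60 s window ending at 100
]
print(trending_keywords(posts, now=100, window_seconds=60))
```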
For example, if the sections of the captured video adopted as the broadcast video VA can be identified, the sections not adopted as the broadcast video VA can also be identified.
This makes it possible to generate the digest video DV so that it includes clip videos CV not adopted as the broadcast video VA. Consequently, a digest video DV containing video that is new to viewers can be provided.
Keyword information is, for example, information such as player names, scene types, play types, and equipment names.
Using keyword information allows the processing of identifying the time periods to cut out of the captured video to be realized with a small processing load.
For example, the clip videos CV to be cut from the captured video are determined based on scene type information.
Consequently, a digest video DV containing clip videos CV corresponding to a predetermined scene type can be generated.
If the event is a sports match, the scenes to be cut from the captured video are determined based on keyword information such as the names and uniform numbers of the players in the match.
Consequently, a digest video DV focusing on a specific player, for example, can be generated.
For example, when a specific play type is selected as the auxiliary information SD, a clip collection CS for that play type is generated by cutting out and joining the sections in which the play type was captured from the multiple videos (the first video V1, the second video V2, and so on) captured by the multiple imaging devices CA.
By generating the digest video DV so that it includes the clip collection CS generated in this way, a single play can be viewed from different angles, producing a digest video DV that makes the play situation easier for viewers to grasp.
The clip collection CS is a combination of multiple clip videos CV, i.e., partial videos capturing a single play from different angles.
In generating such a clip collection CS, joining the videos in a predetermined order provides viewers with video in which a single play can be seen from different angles, while reducing the processing load for determining the combination order.
That is, the predetermined order may be an appropriate order that differs for each scene type.
For example, when generating one clip collection CS for one field goal that occurred in an American football game, an appropriate clip collection CS for the field goal can be generated by joining the clip videos CV in a specific order so that viewers correctly recognize the situation of the field goal, or to heighten the sense of presence. A template defining this specific order specifies, for example, that videos captured from different angles, such as a side view, a view from behind the goal, a view from in front of the goal, and a bird's-eye view, be joined in a predetermined order. By fitting the video of each imaging device CA to this template, an appropriate clip collection CS can be generated automatically, and the processing load for determining the combination order of the videos can be reduced.
The template may also differ depending on the scene type.
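The per-scene-type templates described above can be sketched as a lookup from scene type to an angle order; the angle labels and template contents here are illustrative assumptions, and angles with no available footage are simply skipped.

```python
def order_by_template(clips, scene_type, templates):
    """Arrange a scene's clips according to an angle-order template.

    `clips` maps a camera-angle label to a clip id; `templates` maps a
    scene type to the angle order to use, with a fallback entry for
    scene types that have no dedicated template.
    """
    template = templates.get(scene_type, templates["default"])
    return [clips[angle] for angle in template if angle in clips]

templates = {
    # e.g. for a field goal: side view, behind the goal, in front, bird's-eye
    "field_goal": ["side", "behind_goal", "front_goal", "birds_eye"],
    "default": ["birds_eye", "side"],
}
clips = {"side": "cv1", "front_goal": "cv3", "birds_eye": "cv4"}
print(order_by_template(clips, "field_goal", templates))
# ['cv1', 'cv3', 'cv4']  (no behind_goal footage, so that slot is skipped)
```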
In this way, the information processing device 1 executes the series of processes from identifying the auxiliary information SD through generating the clip videos CV and the clip collection CS.
When the information processing device 1 is a single device, there is no need to transmit the information required from identifying the auxiliary information SD to generating the clip collection CS to other information processing devices, reducing the processing load.
A short separate video, image, or the like may be inserted between clip videos CV.
For example, the clip collection CS is generated simply by joining the clip videos CV without inserting other video between them.
This reduces the processing load required to generate the clip collection CS.
By joining multiple clip videos CV cut from videos of a scene captured from different angles, a clip collection CS in which the scene can be viewed from different angles is generated.
This makes it possible to generate a digest video DV in which the user can easily grasp what happened in each scene.
Image analysis processing on video makes it possible to identify information about the subjects of the video, scene type information, and the like.
This enables generation of a clip collection CS corresponding to the auxiliary information SD, and thus generation of an appropriate digest video DV.
When the persons appearing in the video are appropriately identified by image analysis processing, the clip videos CV to include in the clip collection CS can be identified based on keywords such as player names.
Consequently, the processing load for selecting the clip videos CV can be reduced.
When the type of scene appearing in the video is appropriately identified by image analysis processing, the clip videos CV to include in the clip collection CS can be identified based on keywords such as the scene type.
Consequently, the processing load for selecting the clip videos CV can be reduced.
By identifying in points and out points through image analysis processing, video of appropriate sections can be cut out as clip videos CV.
Consequently, appropriate clip collections CS and digest videos DV can be generated.
Depending on the durations of the clip videos CV, it may not be possible to include all the clip videos CV capturing a scene in a single clip collection CS. There are also clip videos CV that are better left out of a clip collection CS.
By scoring each clip video CV, a clip collection CS combining only appropriate clip videos CV can be generated.
Such a program can be installed on a personal computer or the like from a removable recording medium, or downloaded from a download site via a network such as a LAN (Local Area Network) or the Internet.
The present technology can also adopt the following configurations.
(1)
An information processing device comprising
an identification unit that identifies auxiliary information for generating a digest video based on scene-related information about a scene that occurred at an event.
(2)
The information processing device according to (1) above, wherein
the scene-related information is information including metadata distributed from another information processing device.
(3)
The information processing device according to either (1) or (2) above, wherein
the scene-related information includes information related to posts by users of a social networking service.
(4)
The information processing device according to any one of (1) to (3) above, wherein
the auxiliary information is information indicating whether or not video has been adopted as broadcast video.
(5)
The information processing device according to any one of (1) to (4) above, wherein
the auxiliary information is keyword information.
(6)
The information processing device according to (5) above, wherein
the keyword information is scene type information.
(7)
The information processing device according to (5) above, wherein
the keyword information is information specifying a participant of the event.
(8)
The information processing device according to any one of (1) to (7) above, wherein
the auxiliary information is information used for generating a clip collection containing one or more clip videos obtained from a plurality of imaging devices that capture the event.
(9)
The information processing device according to (8) above, wherein
the clip collection is a combination of clip videos capturing a specific scene in the event, and
the auxiliary information includes information on a predetermined combination order of the clip videos.
(10)
The information processing device according to (9) above, wherein
the information on the combination order is information corresponding to the scene type of the specific scene.
(11)
The information processing device according to any one of (8) to (10) above, comprising
a clip collection generation unit that generates the clip collection using the auxiliary information.
(12)
The information processing device according to (11) above, wherein
the clip collection generation unit generates the clip collection by combining the clip videos.
(13)
The information processing device according to (12) above, wherein
the clip collection is a combination of clip videos capturing specific scenes in the event.
(14)
The information processing device according to any one of (11) to (13) above, wherein
the clip collection generation unit generates the clip collection using the auxiliary information and analysis results obtained by image analysis processing on video obtained from an imaging device that captures the event.
(15)
The information processing device according to (14) above, wherein
the image analysis processing is processing that identifies a person appearing in the video.
(16)
The information processing device according to (14) above, wherein
the image analysis processing is processing that identifies the type of scene appearing in the video.
(17)
The information processing device according to (14) above, wherein
the image analysis processing is processing that identifies an in point and an out point.
(18)
The information processing device according to (14) above, wherein
the image analysis processing includes processing that assigns a score to each clip video.
(19)
An information processing method in which a computer device executes
processing that identifies auxiliary information for generating a digest video based on scene-related information about a scene that occurred at an event.
(20)
A program that causes an arithmetic processing device to execute
a function of identifying auxiliary information for generating a digest video based on scene-related information about a scene that occurred at an event.
10 Identification unit
11 Clip collection generation unit
200 Metadata server (another information processing device)
CA Imaging device
DV Digest video
SD Auxiliary information
CV Clip video
CS Clip collection
VA Broadcast video
Claims (20)
- An information processing device comprising an identification unit that identifies auxiliary information for generating a digest video based on scene-related information about a scene that occurred at an event.
- The information processing device according to claim 1, wherein the scene-related information is information including metadata distributed from another information processing device.
- The information processing device according to claim 1, wherein the scene-related information includes information related to posts by users of a social networking service.
- The information processing device according to claim 1, wherein the auxiliary information is information indicating whether or not video has been adopted as broadcast video.
- The information processing device according to claim 1, wherein the auxiliary information is keyword information.
- The information processing device according to claim 5, wherein the keyword information is scene type information.
- The information processing device according to claim 5, wherein the keyword information is information specifying a participant of the event.
- The information processing device according to claim 1, wherein the auxiliary information is information used for generating a clip collection containing one or more clip videos obtained from a plurality of imaging devices that capture the event.
- The information processing device according to claim 8, wherein the clip collection is a combination of clip videos capturing a specific scene in the event, and the auxiliary information includes information on a predetermined combination order of the clip videos.
- The information processing device according to claim 9, wherein the information on the combination order is information corresponding to the scene type of the specific scene.
- The information processing device according to claim 8, comprising a clip collection generation unit that generates the clip collection using the auxiliary information.
- The information processing device according to claim 11, wherein the clip collection generation unit generates the clip collection by combining the clip videos.
- The information processing device according to claim 12, wherein the clip collection is a combination of clip videos capturing specific scenes in the event.
- The information processing device according to claim 11, wherein the clip collection generation unit generates the clip collection using the auxiliary information and analysis results obtained by image analysis processing on video obtained from an imaging device that captures the event.
- The information processing device according to claim 14, wherein the image analysis processing is processing that identifies a person appearing in the video.
- The information processing device according to claim 14, wherein the image analysis processing is processing that identifies the type of scene appearing in the video.
- The information processing device according to claim 14, wherein the image analysis processing is processing that identifies an in point and an out point.
- The information processing device according to claim 14, wherein the image analysis processing includes processing that assigns a score to each clip video.
- An information processing method in which a computer device executes processing that identifies auxiliary information for generating a digest video based on scene-related information about a scene that occurred at an event.
- A program that causes an arithmetic processing device to execute a function of identifying auxiliary information for generating a digest video based on scene-related information about a scene that occurred at an event.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023517064A JPWO2022230291A1 (ja) | 2021-04-26 | 2022-02-08 | |
EP22795214.0A EP4332871A1 (en) | 2021-04-26 | 2022-02-08 | Information processing device, information processing method, and program |
CN202280029439.XA CN117178285A (zh) | 2021-04-26 | 2022-02-08 | 信息处理装置、信息处理方法和程序 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021074249 | 2021-04-26 | ||
JP2021-074249 | 2021-04-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022230291A1 true WO2022230291A1 (ja) | 2022-11-03 |
Family
ID=83848273
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/004897 WO2022230291A1 (ja) | 2021-04-26 | 2022-02-08 | 情報処理装置、情報処理方法、プログラム |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP4332871A1 (ja) |
JP (1) | JPWO2022230291A1 (ja) |
CN (1) | CN117178285A (ja) |
WO (1) | WO2022230291A1 (ja) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017107404A (ja) | 2015-12-10 | 2017-06-15 | botPress株式会社 | コンテンツ生成装置 |
JP6765558B1 (ja) * | 2020-02-28 | 2020-10-07 | 株式会社ドワンゴ | コンテンツ配信装置、コンテンツ配信方法、コンテンツ配信システム、および、コンテンツ配信プログラム |
- 2022-02-08: CN application CN202280029439.XA filed; publication CN117178285A pending
- 2022-02-08: PCT application PCT/JP2022/004897 filed (WO2022230291A1)
- 2022-02-08: JP application JP2023517064A filed; publication JPWO2022230291A1 pending
- 2022-02-08: EP application EP22795214.0A filed; publication EP4332871A1 pending
Non-Patent Citations (2)
Title |
---|
HISASHI MIYAMORI; SATOSHI NAKAMURA; KATSUMI TANAKA: "Method of Automatically Extracting Metadata of TV Programs Using Its Live Chat on the Web", TRANSACTIONS OF THE INFORMATION PROCESSING SOCIETY OF JAPAN, vol. 46, no. SIG18 (TOD28), 15 December 2018 (2018-12-15), JP , pages 59 - 71, XP009540852, ISSN: 1882-7799 * |
YAMAUCHI, TAKANE; KITAYAMA, DAISUKE: "Characteristic scene extraction based on switching viewpoints for automatic digest video generation", THE 6TH FORUM ON DATA ENGINEERING AND INFORMATION MANAGEMENT (THE 12TH ANNUAL MEETING OF THE DATABASE SOCIETY OF JAPAN); MARCH 3 TO 5, 2014, 3 March 2014 (2014-03-03) - 5 March 2014 (2014-03-05), JP, pages 1 - 5, XP009540851 * |
Also Published As
Publication number | Publication date |
---|---|
JPWO2022230291A1 (ja) | 2022-11-03 |
EP4332871A1 (en) | 2024-03-06 |
CN117178285A (zh) | 2023-12-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107615766B (zh) | 用于创建和分配多媒体内容的系统和方法 | |
US8121462B2 (en) | Video edition device and method | |
US20090190804A1 (en) | Electronic apparatus and image processing method | |
JP4692775B2 (ja) | 映像コンテンツ再生支援方法、映像コンテンツ再生支援システム、及び情報配信プログラム | |
CN113841418A (zh) | 动态视频精彩场面 | |
CN112753227A (zh) | 用于在体育事件电视节目中检测人群噪声的发生的音频处理 | |
WO2021241430A1 (ja) | 情報処理装置、情報処理方法、プログラム | |
US9681200B2 (en) | Data processing method and device | |
JP2002335473A (ja) | 動画コンテンツの検索情報抽出システム、検索情報抽出方法、検索情報保存システム、動画コンテンツのストリーミング配信方法 | |
TW201540065A (zh) | 擷取方法及裝置(一) | |
WO2022230291A1 (ja) | 情報処理装置、情報処理方法、プログラム | |
US10200764B2 (en) | Determination method and device | |
WO2014103374A1 (ja) | 情報管理装置、サーバ及び制御方法 | |
JP2012221322A (ja) | オーサリング支援装置、オーサリング支援方法およびプログラム | |
TWI497959B (zh) | Scene extraction and playback system, method and its recording media | |
JP2022067478A (ja) | 情報処理プログラム、装置、及び方法 | |
KR101434783B1 (ko) | 신 프래그먼트 전송 시스템, 신 프래그먼트 전송방법, 및 그 기록매체 | |
JP2012231291A (ja) | 動画編集装置、動画編集方法およびプログラム | |
JP2016004566A (ja) | 提示情報制御装置、方法及びプログラム | |
CN112287771A (zh) | 用于检测视频事件的方法、装置、服务器和介质 | |
JP2010081531A (ja) | 映像処理装置及びその方法 | |
WO2022209648A1 (ja) | 情報処理装置、情報処理方法および非一時的なコンピュータ可読記憶媒体 | |
JP4276638B2 (ja) | 映像編集装置、映像編集方法、映像編集プログラム、及びプログラムの記録媒体 | |
EP3596628B1 (en) | Methods, systems and media for transforming fingerprints to detect unauthorized media content items | |
US20150208122A1 (en) | Extraction method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22795214 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023517064 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18285445 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022795214 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2022795214 Country of ref document: EP Effective date: 20231127 |