WO2019188406A1 - Subtitle generation device and subtitle generation program - Google Patents

Subtitle generation device and subtitle generation program

Info

Publication number
WO2019188406A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
displayed
map
unit
display
Application number
PCT/JP2019/010807
Other languages
French (fr)
Japanese (ja)
Inventor
Hideki Takehara (竹原 英樹)
Original Assignee
JVCKENWOOD Corporation (株式会社JVCケンウッド)
Application filed by JVCKENWOOD Corporation
Publication of WO2019188406A1 publication Critical patent/WO2019188406A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431: Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 21/438: Interfacing the downstream path of the transmission network originating from a server, e.g. retrieving MPEG packets from an IP network
    • H04N 21/47: End-user applications
    • H04N 21/488: Data services, e.g. news ticker

Definitions

  • This disclosure relates to a caption generation device and a caption generation program.
  • Audio transmitted along with video may be converted into text and displayed on the display unit as subtitles.
  • characters for supplementing the video may be displayed as subtitles on the display unit. It is required to display subtitles in a manner corresponding to the display state of the video of one or more channels displayed on the display unit.
  • Embodiments are intended to provide a caption generation device and a caption generation program capable of generating captions in a manner corresponding to the display state of video of one or more channels displayed on a display unit.
  • According to one aspect, a caption generation device is provided that includes a caption summary unit that generates a summary caption summarizing text data related to the video, according to the number of videos displayed on the display unit or the display video size indicating the size of each video displayed on the display unit.
  • According to another aspect, a caption generation device is provided that includes a map scale setting unit that sets the scale of a map displayed on the display unit, and a caption summary unit that generates a summary caption summarizing text data related to the video displayed on the display unit, according to the scale of the map.
  • According to another aspect, a caption generation program is provided that causes a computer to execute a caption summarization step of generating a summary subtitle summarizing text data related to the video, according to the number of displayed videos or the display video size indicating the size of each displayed video.
  • According to another aspect, a caption generation program is provided that causes a computer to execute a map scale setting step of setting the scale of a displayed map, and a caption summarization step of generating a summary caption summarizing text data related to the video according to the map scale.
  • According to the embodiments, captions can be generated in a manner corresponding to the number of channels of the one or more videos displayed on the display unit.
  • FIG. 1 is a block diagram showing a video display device configured to include the caption generation device of the first embodiment.
  • FIG. 2 is a block diagram illustrating a specific configuration example of the summary caption generation unit in FIG. 1.
  • FIG. 3A is a diagram illustrating an example of a one-screen mode in which an image is displayed on the display unit.
  • FIG. 3B is a diagram illustrating an example of a two-screen mode in which an image is displayed on the display unit.
  • FIG. 3C is a diagram illustrating an example of a four-screen mode in which an image is displayed on the display unit.
  • FIG. 4A is a diagram showing the channel importance level and the summary level in the single screen mode in a tabular format.
  • FIG. 4B is a diagram showing the channel importance level and the summary level in the two-screen mode in a tabular format.
  • FIG. 4C is a diagram showing the channel importance level and the summary level in the 4-screen mode in a tabular format.
  • FIG. 5A is a diagram illustrating an example of a picture-in-picture mode in which video is displayed on the display unit.
  • FIG. 5B is a diagram illustrating an example of a picture-out-picture mode in which video is displayed on the display unit.
  • FIG. 6A is a diagram showing, in a tabular form, channel importance levels and summarization degrees in the picture-in-picture mode.
  • FIG. 6B is a diagram showing in a tabular form the channel importance and the summarization degree in the picture-out-picture mode.
  • FIG. 7 is a flowchart illustrating the operation of the caption generation device according to the first embodiment and the process that the caption generation program according to the first embodiment causes the computer to execute.
  • FIG. 8 is a block diagram illustrating a video transmission / reception system including a map display device configured to include the caption generation device of the second embodiment.
  • FIG. 9 is a block diagram illustrating a specific configuration example of a map display device configured to include the caption generation device of the second embodiment.
  • FIG. 10A is a diagram illustrating an example of display state transition when the map scale is changed from 1/10,000 to 1/50,000.
  • FIG. 10B is a diagram illustrating an example of display state transition when the map scale is changed from 1/50,000 to 1/100,000.
  • FIG. 10C is a diagram illustrating an example of display state transition when the map scale is changed from 1/100,000 to 1/200,000.
  • FIG. 11 is a diagram illustrating another display method of captions in the caption generation device of the second embodiment.
  • FIG. 12A is a diagram illustrating another example of display state transition when the map scale is changed from 1/10,000 to 1/50,000.
  • FIG. 12B is a diagram illustrating another example of display state transition when the map scale is changed from 1/50,000 to 1/100,000.
  • FIG. 12C is a diagram illustrating another example of display state transition when the map scale is changed from 1/100,000 to 1/200,000.
  • FIG. 13 is a diagram showing, in tabular form, an example in which the summarization degree is set according to the number of display channels, that is, the number of camera videos displayed on the map.
  • FIG. 14 is a diagram showing an example of setting a summary level according to a map scale in a table format.
  • FIG. 15 is a flowchart illustrating the operation of the caption generation device according to the second embodiment and the process that the caption generation program according to the second embodiment causes the computer to execute.
  • FIG. 16 is a block diagram showing a posted moving image distribution system including the caption generation device according to the third embodiment.
  • FIG. 17A is a diagram illustrating a first display state of a display unit included in a computer that receives a moving image or the like distributed from a content server of the posted moving image distribution system.
  • FIG. 17B is a diagram illustrating a second display state of the display unit included in the computer that receives a moving image or the like distributed from the content server of the posted moving image distribution system.
  • FIG. 1 shows a video display device 10 configured to include the caption generation device of the first embodiment.
  • multimedia streams of channels 1 to n are input to input terminals 41t to 4nt of summary caption generation units 41 to 4n, respectively.
  • the multimedia stream includes a video stream and an audio stream.
  • the video stream includes video data
  • the audio stream includes audio data.
  • An arbitrary one of the summary subtitle generation units 41 to 4n is referred to as the summary subtitle generation unit 4, and an arbitrary one of the input terminals 41t to 4nt is referred to as the input terminal 4t.
  • n is an integer of 2 or more.
  • the multimedia stream input to the input terminals 41t to 4nt is distributed from an arbitrary content distribution source such as a terrestrial or satellite television broadcast, an Internet broadcast, and a moving image distribution website.
  • a multimedia stream in which a video shot with a smartphone or a video camera is edited as necessary with a personal computer, a smartphone, a video camera, or the like may be distributed.
  • The summary subtitle generation unit 4 functions as a video acquisition unit that acquires video and as an audio acquisition unit that acquires audio related to the video.
  • the channel number setting unit 1 sets the number of channels for displaying video based on the video data of the multimedia stream on the display unit 5.
  • the channel number setting unit 1 may set the number of channels.
  • the number of channels is the number of channels displayed on the display unit 5 among the images of channels 1 to n, and is the number of channels set from 1 to the maximum number of channels. As an example, it is assumed here that the maximum number of channels is set to four.
  • the number of channels may be fixed.
  • the channel importance setting unit 2 sets the importance of each channel. In the channel importance setting unit 2, the same importance may be set in advance for all channels. The channel importance setting unit 2 may set the importance higher as the channel number is smaller. When the user operates the operation unit 6, the channel importance level setting unit 2 may set the importance level of each channel. As will be described later, the channel importance setting unit 2 may automatically set the importance of each channel in accordance with the display mode selected by the user by operating the operation unit 6.
  • The summarization degree setting unit 3 sets the subtitle summarization degree for the video of each channel displayed on the display unit 5, according to the number of channels set by the channel number setting unit 1 and the importance of each channel set by the channel importance setting unit 2.
  • the subtitle summarization degree is an index indicating the degree of summarization of the number of characters of text data to be displayed as subtitles on the display unit 5.
  • The degree of summarization is defined by equation (1).
  • The summarization degree is 100 when the number of characters of the text data is not reduced, and decreases as more characters are removed. That is, the summarization degree indicates the percentage of the text data that remains to be displayed as subtitles.
  • Summarization degree = (number of characters in summarized text data / number of characters in original text data) × 100 (1)
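  • Equation (1) can be expressed directly in code. The following is a minimal Python sketch; the function names are illustrative, not from the disclosure:

```python
def summarization_degree(original_text: str, summarized_text: str) -> float:
    """Summarization degree per equation (1): the percentage of
    characters remaining after summarization (100 = no reduction)."""
    return len(summarized_text) / len(original_text) * 100


def character_budget(original_text: str, degree: float) -> int:
    """Inverse use of equation (1): the maximum number of characters a
    summary may contain to meet a target summarization degree."""
    return int(len(original_text) * degree / 100)


# Example: a 200-character caption summarized at degree 25
# may keep at most 50 characters.
assert character_budget("x" * 200, 25) == 50
```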
  • the summary degree setting signal indicating the summary degree of each channel set by the summary degree setting unit 3 is supplied to each summary caption generation unit 4.
  • the summary level setting unit 3 may supply the summary level setting signal to the summary caption generation units 41 to 44.
  • Each summary caption generation unit 4 generates text data (caption data) that is a caption displayed on the display unit 5 together with the video of each channel, based on the audio data accompanying the video data of the multimedia stream.
  • Each summary caption generation unit 4 summarizes the text data according to the input summary degree setting signal and generates summary caption data. Both subtitle data that does not reduce the number of characters of text data and subtitle data that reduces the number of characters of text data may be referred to as summary subtitle data.
  • Each summary caption generation unit 4 supplies video data and audio data included in the multimedia stream and summary caption data to the display unit 5.
  • the display unit 5 includes a drawing unit 51, a display panel 52, an audio processing circuit 53, and a speaker 54.
  • the audio processing circuit 53 and the speaker 54 may be provided outside the display unit 5.
  • Each summary caption generation unit 4 may also store the input multimedia stream and the generated summary caption data, and supply the stored multimedia stream and summary caption data to the display unit 5 in response to a request from the display unit 5, as in VOD (video on demand).
  • the drawing unit 51 draws video data and summary caption data of each channel.
  • the display panel 52 displays a video based on the video data and summary caption data drawn on the drawing unit 51.
  • Depending on the display mode, the video data of each channel and the summary caption data may be reduced in size when drawn.
  • the audio processing circuit 53 performs D / A conversion on the selected audio data among the audio data of the channels 1 to 4 and supplies an analog audio signal to the speaker 54.
  • the speaker 54 outputs sound based on the input analog sound signal.
  • The audio data output as sound by the speaker 54 may be fixed to the audio data of channel 1, or the configuration may allow the user to select the audio data of any channel with the operation unit 6.
  • The display unit 5 displays video based on video data and summary subtitle data selected from among the video data and summary subtitle data supplied from the summary subtitle generation units 41 to 44.
  • the display mode can be switched.
  • The display unit 5 can switch between a display mode that displays only the video data and summary subtitle data from one summary subtitle generation unit 4, and display modes that simultaneously display video from the video data and summary subtitle data of a plurality of summary subtitle generation units 4. Details of the display modes will be described later.
  • the summary caption generation unit 4 includes an audio stream acquisition unit 401, an audio recognition unit 402, a caption summary unit 403, and a multiplexing unit 404.
  • the audio stream acquisition unit 401 acquires an audio stream from the multimedia stream input to the input terminal 4t.
  • the audio stream acquisition unit 401 supplies the input multimedia stream to the multiplexing unit 404 and supplies the audio stream to the audio recognition unit 402.
  • the voice recognition unit 402 recognizes voice data included in the voice stream, generates text data, and supplies the text data to the caption summarization unit 403.
  • the caption summarization unit 403 receives a summary degree setting signal.
  • the caption summarizing section 403 summarizes the text data according to the summarization degree indicated by the summarization degree setting signal, and generates summary text data.
  • the caption summarizing unit 403 supplies both the text data before summarization and the summary text data to the multiplexing unit 404 as summary caption data.
  • the subtitle summarization section 403 generates summary subtitle data using a representative extraction type summary as a summary technique.
  • the caption summary unit 403 extracts words with high appearance frequency included in the text data as important words, and generates summary caption data.
  • the caption summary unit 403 may use a generation summary instead of the extraction summary.
  • summary caption data is generated using an expression different from the text data, such as paraphrasing, generalizing, or rearranging the text based on the content of the text data.
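  • The frequency-based extractive approach can be sketched as follows: sentences are scored by how many high-frequency words they contain and are kept, in their original order, until the character budget implied by equation (1) is used up. This is a minimal sketch under those assumptions, not the disclosed implementation:

```python
import re
from collections import Counter


def extractive_summary(text: str, degree: float) -> str:
    """Keep the highest-scoring sentences, in original order, until the
    character budget implied by the summarization degree is exhausted."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    freq = Counter(re.findall(r"\w+", text.lower()))
    # Rank sentence indices by the total corpus frequency of their words.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w] for w in re.findall(r"\w+", sentences[i].lower())),
    )
    budget = int(len(text) * degree / 100)  # equation (1), inverted
    kept, used = set(), 0
    for i in ranked:
        if used + len(sentences[i]) <= budget:
            kept.add(i)
            used += len(sentences[i])
    return " ".join(sentences[i] for i in sorted(kept))
```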
  • the multiplexing unit 404 multiplexes the video data and audio data included in the multimedia stream supplied from the audio stream acquisition unit 401 and the summary caption data supplied from the caption summary unit 403 in synchronization.
  • the multiplexing unit 404 supplies the multiplexed data to the display unit 5.
  • An example of display modes for displaying the videos of channels 1 to 4 on the display unit 5 will be described with reference to FIGS. 3A to 3C.
  • FIGS. 4A to 4C show examples of the channel importance, summarization degree, and display video size of each channel in the display modes of FIGS. 3A to 3C. It is assumed that the video data of each channel is full HD, 1920 pixels in the horizontal direction and 1080 pixels in the vertical direction, and that the display panel 52 is a full HD panel.
  • the display video size indicates the size of the video data displayed on the display unit 5.
  • FIG. 3A shows a display mode in which only the video V1 of channel 1 is displayed on the display panel 52 in full screen.
  • the display mode shown in FIG. 3A will be referred to as a single screen mode.
  • a caption ST1 based on the summary caption data of channel 1 is displayed near the lower end of the video V1.
  • Subtitle ST1 is displayed in one or more lines.
  • The number of characters per line and the number of lines of the subtitle ST1, and of the subtitles ST2 to ST4 described later, are set to values that allow the user to recognize the subtitles, according to the size and resolution of the display panel 52.
  • The channel importance setting unit 2 sets the importance of channel 1 to 100 and, since channels 2 to 4 are not displayed, sets the importance of channels 2 to 4 to 0.
  • the summarization degree setting unit 3 sets the summarization degree of subtitle data of channel 1 to 100, and sets the summarization degree of subtitle data of channels 2 to 4 to 0.
  • the display video size of the video of channel 1 is 1920 pixels in the horizontal direction and 1080 pixels in the vertical direction.
  • the text data generated by the speech recognition unit 402 is not reduced and is displayed as the caption ST1.
  • FIG. 3B shows a display mode in which the display video size of the video V1 of channel 1 and the video V2 of channel 2 is reduced and displayed on the display panel 52 side by side.
  • the display mode shown in FIG. 3B is referred to as a two-screen mode.
  • a caption ST1 based on the summary caption data of channel 1 is displayed near the lower end of the video V1
  • a caption ST2 based on the summary caption data of channel 2 is displayed near the lower end of the video V2.
  • the channel importance setting unit 2 sets the importance of the channels 1 and 2 to 100, and the channels 3 and 4 are not displayed, so the importance of the channels 3 and 4 is set to 0.
  • the summarization degree setting unit 3 sets the summarization degree of the caption data of channels 1 and 2 to 70, and sets the summarization degree of the caption data of channels 3 and 4 to 0.
  • the display video size of the video of channels 1 and 2 is 960 pixels in the horizontal direction and 540 pixels in the vertical direction.
  • subtitles ST1 and ST2 based on summary subtitle data obtained by reducing the text data generated by the speech recognition unit 402 by 30% are displayed.
  • the subtitle ST1 in the two-screen mode is a subtitle in which text data is reduced compared to the subtitle ST1 in the one-screen mode.
  • FIG. 3C shows a display mode in which the display image size of the images V1 to V4 of the channels 1 to 4 is reduced and displayed side by side on the display panel 52.
  • the display mode shown in FIG. 3C will be referred to as a 4-screen mode.
  • Subtitles ST1 to ST4 based on the summary caption data of channels 1 to 4 are displayed near the lower ends of the videos V1 to V4.
  • the channel importance setting unit 2 sets the importance of the channels 1 to 4 to 100.
  • the summarization degree setting unit 3 sets the summarization degree of subtitle data of channels 1 to 4 to 25.
  • the display video size of the video of channels 1 to 4 is the same as the display video size of the video of channels 1 and 2 in the 2-screen mode, which is 960 pixels in the horizontal direction and 540 pixels in the vertical direction.
  • subtitles ST1 to ST4 based on summary subtitle data obtained by reducing the text data generated by the speech recognition unit 402 by 75% are displayed.
  • the area where the subtitles ST1 to ST4 are displayed in the 4-screen mode is narrower than the area where the subtitles ST1 and ST2 are displayed in the 2-screen mode.
  • the subtitles ST1 and ST2 in the 4-screen mode are subtitles with text data reduced compared to the subtitles ST1 and ST2 in the 2-screen mode.
  • Another example of display modes for displaying the videos of channels 1 to 4 on the display unit 5 will be described with reference to FIGS. 5A and 5B.
  • FIGS. 6A and 6B show examples of the channel importance, summarization degree, and display video size of each channel in the display modes of FIGS. 5A and 5B.
  • FIG. 5A shows a display mode in which the channel 1 video V1 is displayed on the display panel 52 in a full screen, and the display video size of the channel 2 video V2 is reduced and superimposed on the video V1.
  • the display mode shown in FIG. 5A is referred to as a picture-in-picture mode (hereinafter referred to as PIP mode).
  • a caption ST1 based on summary caption data of channel 1 is displayed near the lower end of the video V1
  • a caption ST2 based on summary caption data of channel 2 is displayed near the lower end of the video V2.
  • The channel importance setting unit 2 sets the importance of channel 1 to 100, sets the importance of channel 2 to 25, and, since channels 3 and 4 are not displayed, sets the importance of channels 3 and 4 to 0.
  • The summarization degree setting unit 3 sets the summarization degree of the subtitle data of channel 1 to 100, that of channel 2 to 25, and that of channels 3 and 4 to 0.
  • the display image size of the channel 1 image is 1920 pixels in the horizontal direction and 1080 pixels in the vertical direction
  • the display image size of the image in channel 2 is 960 pixels in the horizontal direction and 540 pixels in the vertical direction.
  • Since the channel 2 video is superimposed on the channel 1 video, a region of 960 horizontal by 540 vertical pixels of the channel 1 video is hidden.
  • The channel 1 text data generated by the speech recognition unit 402 is displayed as the subtitle ST1 without reduction, and the subtitle ST2 is displayed based on summary subtitle data in which the channel 2 text data generated by the speech recognition unit 402 is reduced by 75%.
  • the display video size of the video of channel 2 is smaller than that of channel 1. Accordingly, the area in which the subtitle ST2 is displayed in the PIP mode is narrower than the area in which the subtitle ST1 is displayed.
  • the subtitle ST2 in the PIP mode is a subtitle in which text data is reduced compared to the subtitle ST1.
  • FIG. 5B shows a display mode in which the display video size of the video V1 of channel 1 is reduced, and the videos V2 to V4 of channels 2 to 4 are reduced further and displayed outside the video V1.
  • the display mode shown in FIG. 5B is referred to as a picture-out-picture mode (hereinafter referred to as POP mode).
  • A subtitle ST1 based on the summary caption data of channel 1 is displayed near the lower end of the video V1, and subtitles ST2 to ST4 based on the summary subtitle data of channels 2 to 4 are displayed near the lower ends of the videos V2 to V4.
  • the channel importance level setting unit 2 sets the importance level of the channel 1 to 100, and sets the importance levels of the channels 2 to 4 to 11.
  • Summarization degree setting unit 3 sets the summarization degree of caption data of channel 1 to 56, and sets the summarization degree of caption data of channels 2 to 4 to 6.
  • In the POP mode, the display video size of the video of channel 1 is 1440 pixels in the horizontal direction and 810 pixels in the vertical direction, and the display video size of the videos of channels 2 to 4 is 480 pixels in the horizontal direction and 270 pixels in the vertical direction.
  • The subtitle ST1 is displayed based on summary subtitle data in which the channel 1 text data generated by the speech recognition unit 402 is reduced by 44%, and the subtitles ST2 to ST4 are displayed based on summary subtitle data in which the text data of channels 2 to 4 is reduced by 94%.
  • the display video size of the video of channel 1 is smaller than that of channel 1 in the single screen mode of FIG. 3A or the PIP mode of FIG. 5A. Accordingly, the area in which the subtitle ST1 is displayed in the POP mode is narrower than that in the single-screen mode or the PIP mode.
  • the subtitle ST1 in the POP mode is a subtitle in which text data is reduced as compared with the subtitle ST1 in the one-screen mode or the PIP mode of FIG. 5A.
  • the display video size of the video of channels 2 to 4 is smaller than that of channel 1. Accordingly, the area where the subtitles ST2 to ST4 are displayed in the POP mode is narrower than the area where the subtitle ST1 is displayed.
  • Subtitles ST2 to ST4 in the POP mode are subtitles in which text data is reduced compared to the subtitle ST1.
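  • The per-channel settings quoted above for FIGS. 4A to 4C and 6A to 6B amount to a lookup table from display mode to (importance, summarization degree) pairs. A sketch mirroring those values (the mode labels are illustrative):

```python
# (importance, summarization degree) per channel, as quoted above for
# FIGS. 4A-4C (1-, 2-, 4-screen) and FIGS. 6A-6B (PIP, POP).
DISPLAY_MODE_SETTINGS = {
    "1-screen": {1: (100, 100), 2: (0, 0),    3: (0, 0),    4: (0, 0)},
    "2-screen": {1: (100, 70),  2: (100, 70), 3: (0, 0),    4: (0, 0)},
    "4-screen": {1: (100, 25),  2: (100, 25), 3: (100, 25), 4: (100, 25)},
    "PIP":      {1: (100, 100), 2: (25, 25),  3: (0, 0),    4: (0, 0)},
    "POP":      {1: (100, 56),  2: (11, 6),   3: (11, 6),   4: (11, 6)},
}


def summarization_degree_for(mode: str, channel: int) -> int:
    """Degree the summarization degree setting unit 3 would assign
    to the given channel in the given display mode."""
    return DISPLAY_MODE_SETTINGS[mode][channel][1]
```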
  • the channel number setting unit 1 sets the number of channels in step S101.
  • In step S102, the channel importance setting unit 2 sets the importance of each channel.
  • In step S103, the summarization degree setting unit 3 sets the summarization degree.
  • In step S104, the audio stream acquisition unit 401 separates the video stream and the audio stream and acquires the audio stream. Since the audio stream acquisition unit 401 also acquires the video stream in step S104, step S104 is a video acquisition step.
  • In step S105, the speech recognition unit 402 recognizes the speech data included in the audio stream and generates text data.
  • In step S106, the caption summary unit 403 generates the summary caption data at the set summarization degree. Step S106 is a caption summarization step.
  • In step S107, the multiplexing unit 404 multiplexes the video data and audio data included in the multimedia stream with the summary caption data.
  • In step S108, the display unit 5 displays video and outputs audio based on the multiplexed data.
  • the channel importance level setting unit 2 and the summary level setting unit 3 determine whether or not an instruction to change the display mode has been given in step S109. If an instruction to change the display mode is given (YES), the processing of steps S102 to S109 is repeated. If an instruction to change the display mode is not given (NO), the summary subtitle generating unit 4 determines whether or not the multimedia stream is continuously input in step S110.
  • If the multimedia stream continues to be input (YES), the processes in steps S104 to S110 are repeated. If the multimedia stream is not continuously input (NO), the summary subtitle generation unit 4 ends the process.
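  • The control flow of FIG. 7 (steps S101 to S110) can be sketched as nested loops. The hooks below are placeholders standing in for the units of FIGS. 1 and 2, not APIs from the disclosure:

```python
def run_caption_generation(
    streams,
    *,
    set_importance=lambda: None,           # S102: channel importance setting unit 2
    set_degree=lambda: None,               # S103: summarization degree setting unit 3
    demux_audio=lambda s: s,               # S104: audio stream acquisition unit 401
    recognize=lambda audio: "",            # S105: speech recognition unit 402
    summarize=lambda text: text,           # S106: caption summary unit 403
    mux_and_display=lambda s, cap: None,   # S107-S108: multiplexing / display
    mode_changed=lambda: False,            # S109
    stream_alive=lambda: False,            # S110
):
    """Sketch of the FIG. 7 flow; S101 (channel count) is assumed done."""
    while True:
        set_importance()                       # S102
        set_degree()                           # S103
        while True:
            for s in streams:
                audio = demux_audio(s)         # S104: video acquisition step
                text = recognize(audio)        # S105
                caption = summarize(text)      # S106: caption summarization step
                mux_and_display(s, caption)    # S107-S108
            if mode_changed():                 # S109: YES -> back to S102
                break
            if not stream_alive():             # S110: NO -> end
                return
```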
  • the caption generation device includes the summary degree setting unit 3 and the caption summary unit 403.
  • The summarization degree setting unit 3 sets the summarization degree of the subtitles of each channel displayed on the display unit 5, according to the number of channels of the one or more channels displayed on the display unit 5 (display panel 52) or the display video size of the video of each channel.
  • the subtitle summarizing section 403 summarizes the subtitles of each channel according to the subtitle summarization levels set by the summarization degree setting section 3 and generates summary subtitles.
  • captions can be generated in a manner corresponding to the display state of the video of one or a plurality of channels displayed on the display unit 5.
  • the video display device 10 that includes the caption generation device of the first embodiment and displays the video of each channel in any one of a plurality of display modes corresponds to the display state of the video of one or more channels A mode subtitle can be displayed.
  • Each unit of the video display device 10 shown in FIG. 1 and each unit of the summary subtitle generation unit 4 shown in FIG. 2 may be implemented by hardware such as an integrated circuit, or by software (a computer program). Whether hardware or software is used is a matter of design choice.
  • the flowchart shown in FIG. 7 may be processing that the subtitle generation program of the first embodiment causes the computer to execute.
  • The caption generation program may be transmitted to the video display device 10 via a network such as the Internet, or may be stored in a non-transitory storage medium and provided to the video display device 10.
  • the following configuration can be additionally provided.
  • the number of channels is 16, and the display unit 5 displays a 16-channel reduced video and corresponding subtitles.
  • the line-of-sight detection device selects four channels that are determined to be viewed with interest by the user by detecting the line of sight of the user. As shown in FIG. 3C, the display unit 5 displays the selected 4-channel reduced video and subtitles corresponding thereto.
  • the line-of-sight detection device selects one channel that is determined to be viewed with interest by the user by detecting the line of sight of the user. As shown in FIG. 3A, the display unit 5 displays the selected one-channel video and subtitles corresponding thereto.
  • FIG. 8 shows a video transmission / reception system including a map display device 20 configured to include the caption generation device of the second embodiment.
  • the map display device 20 is connected to a network 50 such as the Internet.
  • the video cameras 301 to 30n and the map providing server 40 are also connected to the network 50.
  • n is an integer of 2 or more. Any video camera among the video cameras 301 to 30n is referred to as a video camera 30.
  • the video shot by one or more video cameras 30 is video of one or more channels.
  • the network interface 201 of the map display device 20 is connected to the network 50.
  • the map display device 20 receives the map data provided from the map providing server 40, the video data transmitted by each video camera 30, and the metadata. Specifically, the map display device 20 receives from each video camera 30 video data captured by each video camera 30 and metadata which is data describing information related to the video data.
  • The map display device 20 acquires the video data and metadata by using a Web API (Web Application Programming Interface) provided by each video camera 30.
  • A Web API is an interface through which a program on one device calls, via a network, functions provided by a program on another device.
  • Each metadata is generated and recorded by each video camera 30.
  • The metadata includes, for example, position information of the video camera 30 (that is, the shooting location), shooting date, photographer information, producer information, camera number, camera priority, shooting purpose, title, shooting overview, caster name, and arbitrarily entered text such as the names of people appearing in the video.
  • the metadata includes at least position information of the video camera 30.
  • the position information of the video camera 30 may include the name of the shooting location such as Tokyo Station or Tokyo International Airport, in addition to the latitude and longitude of the shooting location.
  • the shooting date is information indicating the date and time when the video data was shot.
  • the photographer information is, for example, a photographer's name or an ID that identifies the photographer.
  • the producer information is, for example, a broadcast station name or an ID for identifying the broadcast station.
  • the camera number is a number assigned to each video camera 30.
  • the camera number may be a serial code, for example.
  • the camera priority is information indicating the priority of video data to be displayed.
  • the shooting purpose is, for example, program shooting, landscape shooting, interview, or the like.
  • the title is, for example, a program name of video data.
  • The shooting summary is assumed to describe, in summarized form, the position information (that is, the shooting location), shooting date, photographer information, producer information, camera number, camera priority, camera serial code, shooting purpose, title, caster name, character names, and other information. For the caster name and character names, person names suited to each purpose are described.
  • Although the metadata here includes the position information of the video camera 30, the position information of the video camera 30 and metadata containing the other contents may instead be provided as separate data.
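  • As a concrete illustration only, the metadata items listed above could be carried in a record such as the following; the field names are assumptions for illustration, not a format defined in the disclosure:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CameraMetadata:
    """Items described above; only the camera position is mandatory."""
    latitude: float                       # position information (required)
    longitude: float
    location_name: Optional[str] = None   # e.g. "Tokyo Station"
    shooting_date: Optional[str] = None
    photographer: Optional[str] = None
    producer: Optional[str] = None
    camera_number: Optional[str] = None   # may be a serial code
    camera_priority: Optional[int] = None
    purpose: Optional[str] = None         # e.g. "interview"
    title: Optional[str] = None
    summary: Optional[str] = None         # shooting overview
    caster_name: Optional[str] = None
    characters: Optional[str] = None      # names of people appearing
```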
  • the camera video acquisition unit 202 acquires video data transmitted from each video camera 30.
  • The metadata extraction unit 203 acquires the metadata. As described above, the camera video acquisition unit 202 and the metadata extraction unit 203 may acquire the video data and metadata transmitted from each video camera 30 by using the Web API provided by each video camera 30.
  • the map data acquisition unit 209 acquires map data. The user can change the center position of the map displayed on the display panel 2131 of the display unit 213 by operating the operation unit 214, and can also change the scale of the map.
  • The map center position setting unit 216 sets the center position of the map displayed on the display panel 2131 according to the user's operation of the operation unit 214, and the map scale setting unit 217 sets the scale of the map displayed on the display panel 2131 according to the user's operation of the operation unit 214.
  • the map data acquisition unit 209 acquires a map of the set center position and scale from the map providing server 40.
  • the metadata of each video camera 30 acquired by the metadata extraction unit 203 is supplied to the camera position acquisition unit 204 and the caption information acquisition unit 205.
  • the camera position acquisition unit 204 acquires the latitude / longitude of the shooting location included in the position information of the metadata.
  • the camera position acquisition unit 204 is a position acquisition unit that acquires position information indicating a position where a video is taken.
  • the subtitle information acquisition unit 205 acquires text data displayed as subtitles on the display panel 2131 among the text data described in the metadata, and supplies the text data to the subtitle summary unit 208.
  • The position information of the video camera 30 may also be acquired through a route different from the metadata.
  • For example, when the video camera 30 is connected to the network 50 via a router, the router may notify the map display device 20 of the position information through a route different from the metadata.
  • In this case, the metadata extraction unit 203 acquires the position information by using the Web API provided by the router.
  • The connection status between the video camera 30 and the router is obtained from information provided by the router's Web API; the map display device 20 associates the video camera 30 with the router and treats the router's position information as the position information of the video camera 30.
  • the text data displayed as subtitles may be the name of the shooting location, the shooting purpose, or an arbitrarily entered sentence, and is arbitrary. Text data to be displayed as subtitles may be determined in advance, or may be configured so that the user can select by operating the operation unit 214.
  • When the metadata includes only the position information of the video camera 30, the shooting location is displayed as the caption.
  • the display area inside / outside determination unit 206 receives information indicating the map center position set by the map center position setting unit 216 and the map scale set by the map scale setting unit 217.
  • the display area inside / outside determination unit 206 has information on the screen size of the display panel 2131.
  • Based on the input information indicating the center position and scale of the map, the display area inside/outside determination unit 206 determines whether each video camera 30 is located inside or outside the display area of the map displayed on the display panel 2131.
  • the display area inside / outside determination unit 206 functions as a determination unit that determines the number of videos to be displayed on the display unit 213 from the scale of the map and the position information indicating the position where the video was shot.
  • The display area inside/outside determination unit 206 supplies, to the summarization degree setting unit 207, information on the number of video cameras 30, among the video cameras 301 to 30n, located within the display area of the map displayed on the display panel 2131.
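  • The inside/outside test can be sketched under simplifying assumptions: an equirectangular approximation of the map area and a known physical pixel pitch for the display panel. None of the constants below come from the disclosure; `CameraMetadata` is the sketch shown earlier:

```python
import math

PIXEL_PITCH_M = 0.00025        # assumed physical size of one panel pixel (m)
M_PER_DEG_LAT = 111_320.0      # metres per degree of latitude (approx.)


def cameras_in_view(center_lat, center_lon, scale, panel_w_px, panel_h_px, cameras):
    """Return the cameras located inside the displayed map area.
    `scale` is the denominator of the map scale (50_000 for 1/50,000)."""
    # Ground distance covered by half the panel, in metres.
    half_w_m = panel_w_px / 2 * PIXEL_PITCH_M * scale
    half_h_m = panel_h_px / 2 * PIXEL_PITCH_M * scale
    dlat = half_h_m / M_PER_DEG_LAT
    dlon = half_w_m / (M_PER_DEG_LAT * math.cos(math.radians(center_lat)))
    return [
        c for c in cameras
        if abs(c.latitude - center_lat) <= dlat
        and abs(c.longitude - center_lon) <= dlon
    ]

# The summarization degree setting unit 207 would then receive
# len(cameras_in_view(...)) as the number of cameras in the display area.
```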
  • the summarization degree setting unit 207 sets the summarization degree according to the number of video cameras 30 located within the map display area.
  • the subtitle summarizing section 208 summarizes the text data supplied from the subtitle information acquiring section 205 according to the summarization degree set by the summarization degree setting section 207, and generates summary subtitle data.
  • Like the caption summary unit 403 of the first embodiment, the caption summary unit 208 may create the summary caption data using extractive or generative summarization according to the summarization degree.
  • Alternatively, the caption summary unit 208 summarizes by selecting, according to the summarization degree, one or more items from among the position information, shooting date, photographer information, producer information, camera number, camera priority, shooting purpose, title, shooting overview, caster name, and character names.
  • For example, when the summarization degree is high, the caption summary unit 208 selects ten items from the above-described items and generates summary caption data including information on the selected ten items.
  • When the summarization degree is low, the caption summary unit 208 selects two items from the above-described items and generates summary caption data including information on the two selected items.
  • priorities may be set in advance for each item included in the metadata or the like.
  • the caption summary unit 208 may be configured to select an item with a higher priority based on the priority.
  • the caption summarizing unit 208 may be configured to create summary caption data using an extraction summary or a generation summary according to the degree of summarization.
  • the caption summary unit 208 may create summary caption data from the shooting summary using the extraction type summary or the generation type summary according to the degree of summary.
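  • Item-selection summarization can be sketched as keeping the top-priority items, with the number kept scaling with the summarization degree. The priority order below is inferred from the FIG. 10 walkthrough (degree 100 keeps ten items, 70 keeps seven, 20 keeps two, 10 keeps the title alone) and is otherwise an assumption:

```python
# Highest priority first; ordered so the kept subsets nest as in FIG. 10.
ITEM_PRIORITY = [
    "title", "photographer", "position", "shooting_date", "producer",
    "camera_number", "purpose", "caster_name", "characters", "camera_priority",
]


def select_items(metadata: dict, degree: int) -> dict:
    """Keep the top-priority items present in `metadata` (item name ->
    value); degree 100 keeps all ten items, degree 10 keeps one."""
    n_items = max(1, round(len(ITEM_PRIORITY) * degree / 100))
    chosen = [k for k in ITEM_PRIORITY if metadata.get(k) is not None]
    return {k: metadata[k] for k in chosen[:n_items]}
```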
  • the summary caption data based on the text data supplied from each video camera 30 is supplied to the image composition unit 212.
  • the video reduction unit 210 reduces the video data supplied from each video camera 30 to generate reduced video data (thumbnail image).
  • the video reduction rate instruction unit 211 instructs the video reduction unit 210 to reduce the video data.
  • the reduction ratio may be fixed. When the number of pixels of one frame of video data from each video camera 30 is sufficiently smaller than the number of pixels of the display panel 2131, the video data may not be reduced.
  • the video reduction rate instruction unit 211 may be supplied with information on the number of video cameras 30 located in the display area of the map displayed on the display panel 2131 from the display area inside / outside determination unit 206.
  • the video reduction rate instructing unit 211 may change the video data reduction rate in accordance with the input information on the number of video cameras 30.
  • the image composition unit 212 synthesizes the map data supplied from the map data acquisition unit 209, the reduced video data supplied from the video reduction unit 210, and the summary caption data supplied from the caption summary unit 208.
  • the reduced video data is synthesized based on the position information of the video camera 30 so as to be arranged at the position indicated by the position information of the video camera 30 on the map data.
  • the image composition unit 212 supplies the composite image data to the display unit 213.
  • In this way, the display panel 2131 displays the map of the display area and scale selected by the user's operation, with the reduced video data of the video cameras 30 located within the display area and the summary captions related to those videos superimposed on it. This makes it possible to visualize who is shooting what video, for what purpose, at which position on the map.
  • the map display device 20 may be configured to receive audio data of sound collected at the time of shooting by each video camera 30, reproduce the audio data of the selected video camera 30, and output from the speaker.
  • the metadata extraction unit 203 functions as a voice acquisition unit that acquires voice data and a voice recognition unit that generates text data related to the video based on the acquired voice data. That is, each video camera 30 records the sound collected at the time of shooting the video data as sound data. Each video camera 30 transmits video data and audio data to the map display device 20 via the network 50. The metadata extraction unit 203 acquires audio data transmitted from each video camera 30. The metadata extraction unit 203 recognizes the acquired voice data to generate text data, and supplies the text data to the camera position acquisition unit 204 and the caption information acquisition unit 205 as metadata.
  • Metadata can be created from sound collected at the time of shooting video data, and it is not necessary to prepare metadata in advance, so that the burden on the photographer is reduced.
  • each video camera 30 is not limited to recording the sound collected when shooting the video data as the sound data.
  • Each video camera 30 may record, as audio data, sound collected when the video data is not captured, in addition to the sound collected when the video data is captured.
  • For example, the video camera 30 may collect sound containing information such as an outline of the shooting at times other than during shooting, and record it as audio data.
  • the metadata extraction unit 203 may acquire the voice data and generate text data based on the voice data. In other words, the metadata extraction unit 203 may generate text data based on audio data associated with video data.
  • the display order setting unit 215 sets the display order when the reduced video data in the plurality of video cameras 30 overlap.
  • Based on camera priority, the reduced video data captured by a camera with a higher priority may be given a higher display order.
  • the display order setting unit 215 may change the display order.
  • An example of display state transition when the map scale is changed will be described with reference to FIGS. 10A to 10C.
  • As shown in FIG. 10A, when the map M1 with a scale of 1/10,000 is displayed on the display panel 2131, only the video camera 30 with camera number 01 is located within the map M1.
  • On the map M1, the camera video Ci1 and the subtitle CST1 of the camera video Ci1 are superimposed and displayed.
  • the subtitle CST1 is displayed outside the camera video Ci1.
  • In this case, the summarization degree is set to 100. Based on this summarization degree, summary caption data including ten items (position information, shooting date, photographer information, producer information, camera number, camera priority, shooting purpose, title, caster name, and character names) is generated, and the summary caption data is displayed as the caption.
  • the camera video indicates the reduced video data described above, and the same applies to the following.
  • When the user changes the scale of the map to 1/50,000, the video cameras 30 with camera numbers 01 to 03 are located within the map M5 displayed at the 1/50,000 scale on the display panel 2131.
  • the center position of the map M1 is displaced to the right of the map M5.
  • camera video images Ci1 to Ci3 and captions CST1 to CST3 of the camera video images Ci1 to Ci3 are displayed in a superimposed manner.
  • the number of videos of the video camera 30 displayed on the map M5 is larger than that displayed on the map M1. Accordingly, the area where the subtitles CST1 to CST3 are displayed on the map M5 is narrower than the area where the subtitle CST1 is displayed on the map M1.
  • the subtitle CST1 in the map M5 is a subtitle in which text data is reduced compared to the subtitle CST1 in the map M1.
  • In this case, the summarization degree is set to 70, and summary subtitle data including seven items (position information, shooting date, photographer information, producer information, camera number, shooting purpose, and title) is displayed as the subtitles.
  • As shown in FIG. 10B, when the user further changes the scale of the map to 1/100,000, the video cameras 30 with camera numbers 01 to 06 are located within the map M10 displayed at the 1/100,000 scale on the display panel 2131. Here, the center position of the map M5 is displaced to the right relative to the map M10. On the map M10, the camera videos Ci1 to Ci6 and the captions CST1 to CST6 of the camera videos Ci1 to Ci6 are superimposed and displayed.
  • the number of videos of the video camera 30 displayed on the map M10 is larger than that displayed on the map M5. Accordingly, the area where the subtitles CST1 to CST6 are displayed on the map M10 is narrower than the area where the subtitles CST1 to CST3 are displayed on the map M5.
  • the subtitles CST1 to CST3 in the map M10 are subtitles with text data reduced from the subtitles CST1 to CST3 in the map M5.
  • In this case, the summarization degree is set to 20, and summary caption data including two items (photographer information and title) is displayed as the captions.
  • When camera videos overlap, the display order setting unit 215 can bring a camera video located underneath to the front, for example in response to the user clicking on it.
  • When the user changes the scale of the map to 1/200,000, the video cameras 30 with camera numbers 01 to 10 are located within the map M20 displayed at the 1/200,000 scale on the display panel 2131.
  • the center position of the map M10 is displaced to the left of the map M20.
  • camera videos Ci1 to Ci10 and captions CST1 to CST10 of the camera videos Ci1 to Ci10 are displayed in a superimposed manner.
  • the number of videos of the video camera 30 displayed on the map M20 is larger than that displayed on the map M10. Accordingly, the area where the subtitles CST1 to CST10 are displayed on the map M20 is narrower than the area where the subtitles CST1 to CST6 are displayed on the map M10.
  • the subtitles CST1 to CST6 in the map M20 are subtitles with text data reduced compared to the subtitles CST1 to CST6 in the map M10.
  • In this case, the summarization degree is set to 10, and summary caption data including one item (the title) is displayed as the caption.
  • As shown in FIG. 11, icons smaller than the camera videos Ci1 to Ci10, each indicating a video camera 30, may be displayed instead of the camera videos Ci1 to Ci10.
  • the video camera 30 can be identified by assigning a camera number to the icon.
  • each subtitle may be displayed larger than the icon so that each subtitle can be easily recognized.
  • When icons are displayed instead of camera videos, the captions need to supplement and explain the content of the camera videos more than when the camera videos are displayed, even if the map scale is the same (for example, 1/100,000).
  • Therefore, when the camera video is not displayed, the number of metadata items included in the summary subtitle data, or the number of characters of the summary subtitle data, may be increased compared with the case where the camera video is displayed.
  • That is, when the camera video is not displayed, the summarization degree is set larger than when the camera video is displayed, even at the same map scale.
  • For example, when the camera video is displayed, the summarization degree is set to 20 and summary caption data including two items (photographer information and title) is displayed as the captions.
  • When the camera video is not displayed, the summarization degree is set to 30 and summary caption data including three items (photographer information, producer information, and title) is displayed as the captions.
  • the subtitles CST1 to CST10 are displayed so as to be adjacent to the camera videos Ci1 to Ci10.
  • an area for displaying subtitles CST1 to CST3 may be set, for example, at the right end of the maps M1 and M5, and the subtitles CST1 to CST3 may be displayed separately from the camera videos Ci1 to Ci3. The same applies to the maps M10 and M20.
  • Alternatively, instead of displaying the captions CST1 to CST10 outside and adjacent to the camera videos Ci1 to Ci10, an area for displaying the subtitles CST1 to CST3 may be set, and the subtitles CST1 to CST3 may be displayed separated from the camera videos Ci1 to Ci3.
  • The summarization degree of the caption summary data displayed adjacent to the camera video may differ from that of the caption summary data displayed separated from the camera video.
  • In this case, the size of the subtitle display area is changed according to the scale of the map, without changing the size of the camera video display area.
  • The types of subtitle items displayed are reduced step by step, hiding items in order from the lowest priority.
  • Thus, the subtitle information makes it possible to visualize who is shooting what video, at what position on the map, and for what purpose.
  • Another example of display state transition when the map scale is changed will be described with reference to FIGS. 12A to 12C.
  • In FIGS. 12A to 12C, in all of the maps M1 to M20, the size of the camera videos is varied according to the number of videos of the video cameras 30 displayed on each map.
  • each caption is displayed inside each camera video except for the camera videos Ci1 to Ci10 displayed on the map M20.
  • Each subtitle may instead be displayed outside and adjacent to each camera video, or, in the same manner as in FIG. 11, each subtitle may be displayed separated from each camera video.
  • FIG. 13 shows an example in which the summarization degree is set according to the number of display channels, which is the number of camera videos displayed in the map.
  • In the example of FIG. 13, the numbers of display channels are 1 to 2, 3 to 5, 6 to 10, 11 to 20, and 21 or more, with summarization degrees of 100, 80, 40, 10, and 5, respectively.
  • When the number of display channels is a predetermined number or more, the summarization degree may be set to 0 so that no captions are displayed.
  • For example, the summarization degree may be set to 0 when the number of display channels is 21 or more.
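  • The FIG. 13 example is a direct lookup from the number of display channels to a summarization degree; a sketch of that table:

```python
def degree_from_channel_count(n: int) -> int:
    """Summarization degree for the number of camera videos shown on
    the map, per the FIG. 13 example (0 would suppress captions)."""
    if n <= 2:
        return 100
    if n <= 5:
        return 80
    if n <= 10:
        return 40
    if n <= 20:
        return 10
    return 5  # 21 or more; may instead be 0 to display no captions
```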
  • FIG. 14 shows an example of setting the summarization degree according to the scale of the map.
  • In the example of FIG. 14, the scale ranges are: less than 1/10,000; 1/10,000 or more and less than 1/50,000; 1/50,000 or more and less than 1/100,000; and 1/100,000 or more, and a summarization degree is set for each range.
  • When the scale exceeds a predetermined value, the summarization degree may be set to 0 so that no captions are displayed.
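  • Analogously to FIG. 13, the degree can be looked up from the scale denominator. The body text gives only the scale boundaries of FIG. 14, so the degree values below are taken from the FIGS. 10A to 10C walkthrough and are otherwise assumptions:

```python
def degree_from_scale(denominator: int) -> int:
    """Summarization degree by map scale denominator (50_000 means a
    1/50,000 map); the values follow the FIG. 10 walkthrough."""
    if denominator <= 10_000:
        return 100
    if denominator <= 50_000:
        return 70
    if denominator <= 100_000:
        return 20
    return 10  # may be 0 beyond a threshold so no captions are displayed
```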
  • In step S202, the camera video acquisition unit 202 acquires the video data transmitted from each video camera 30. Step S202 is a video acquisition step.
  • the display area inside / outside determination unit 206 detects the video camera 30 located in the map displayed on the display unit 213 (display panel 2131) in step S203.
  • In step S204, the summarization degree setting unit 207 sets the summarization degree according to the number of video cameras 30 located within the map displayed on the display unit 213.
  • In step S205, the caption summary unit 208 summarizes the text data described in the metadata according to the summarization degree and generates the summary caption data.
  • Step S205 is a caption summarization step.
  • the video reduction unit 210 reduces the video data transmitted from the video camera 30 located in the map displayed on the display unit 213 in step S206.
  • In step S207, the image composition unit 212 composites the map data, the reduced video data, and the summary caption data.
  • In step S208, the display unit 213 displays the composite image.
  • the map center position setting unit 216 and the map scale setting unit 217 determine whether or not the map center position or scale has been changed in step S209.
  • the process of setting the map scale by the map scale setting unit 217 is a map scale setting step. If the center position or scale of the map is changed (YES), the processes in steps S201 to S209 are repeated. If the map center position or scale is not changed (NO), in step S210, the map display device 20 determines whether or not an instruction to end map display has been given by the operation unit 214.
  • the video camera 30 may transmit still images to the map display device 20 at predetermined intervals.
  • The video camera 30 may capture a still image and transmit it to the map display device 20, or a still camera may be used instead of the video camera 30 to capture a still image and transmit it to the map display device 20. That is, it is only necessary that an image of the subject captured by a camera is transmitted to the map display device 20.
  • the predetermined interval is, for example, 3 seconds.
  • the plurality of camera images may be sequentially displayed at predetermined time intervals so that the plurality of camera images are not displayed simultaneously.
  • Each part of the map display device 20 shown in FIG. 9 may be implemented by hardware such as an integrated circuit, or by software (a computer program). Whether hardware or software is used is a matter of design choice.
  • the flowchart shown in FIG. 15 may be processing that the subtitle generation program of the second embodiment causes the computer to execute.
  • the map display device 20 shown in FIG. 9 can be configured by a browser which is software for viewing a map.
  • The caption generation program may be transmitted to the map display device 20 via the network 50, or may be stored in a non-transitory storage medium and provided to the map display device 20.
  • FIG. 16 shows a posted moving image distribution system including the caption generation device according to the third embodiment.
  • a content server 60 and a computer 70 are connected to the network 50.
  • the content server 60 includes a moving image storage unit 601 that stores posted moving images, a thumbnail image generation unit 602 that generates thumbnail images of the posted moving images, and a text data storage unit 603 that stores text data associated with each posted moving image.
  • the content server 60 includes a summary degree setting unit 604 and a caption summary unit 605.
  • the text data may be text describing an outline of the content of the posted moving image, text supplementing the content of the posted moving image, or a comment related to the posted moving image.
  • the posted moving images distributed by the content server 60, or the thumbnail images of the posted moving images, constitute video of one or a plurality of channels.
  • the computer 70 receives a posted moving image or a thumbnail image distributed by the content server 60.
  • the storage unit 701 provided in the computer 70 stores a browser 702 that is software for viewing a posted moving image provided by the content server 60.
  • the computer 70 can display a thumbnail image for selecting a posted moving image on the display unit 703 by executing the browser 702, or can display a posted moving image by selecting a thumbnail image.
  • FIG. 17A shows an example of the display state of the display unit 703 when the computer 70 instructs the content server 60 to display a large thumbnail image.
  • the subtitle TST1 of the thumbnail image Ti1 is displayed adjacent to the thumbnail image Ti1 of the posted moving image 001.
  • the subtitle TST2 of the thumbnail image Ti2 is displayed adjacent to the thumbnail image Ti2 of the posted moving image 002.
  • the summary degree setting unit 604 sets a summarization degree that does not reduce, or only slightly reduces, the number of characters in the text data in response to an instruction to display large thumbnail images.
  • the subtitle summarizing section 605 summarizes the text data with the summarization degree set by the summarization degree setting section 604 to generate the subtitle data of the subtitles TST1 and TST2.
  • the content server 60 distributes the video data of the large thumbnail images Ti1 and Ti2 and the summary caption data of the captions TST1 and TST2 to the computer 70.
  • FIG. 17B shows an example of the display state of the display unit 703 when the computer 70 instructs the content server 60 to display small thumbnail images.
  • Subtitles TST1 to TST9 are displayed adjacent to thumbnail images Ti1 to Ti9 of posted moving images 001 to 009.
  • the summary degree setting unit 604 sets a summarization degree that reduces the number of characters in the text data in response to an instruction to display small thumbnail images (see the server-side sketch following this list).
  • the caption summarizing section 605 summarizes the text data with the summarization degree set by the summarization degree setting section 604, and generates the summarization caption data of the captions TST1 to TST9.
  • the content server 60 distributes video data of small thumbnail images Ti1 to Ti9 and summary caption data of captions TST1 to TST9 to the computer 70.
  • the content server 60 distributes the moving image data of the posted moving image to the computer 70.
  • although the summary degree setting unit 604 and the caption summary unit 605 are provided in the content server 60 in this example, the browser 702 may be given the same functions to realize the display states of FIGS. 17A and 17B.
  • according to the caption generation devices and caption generation programs of the first to third embodiments, captions can be generated in a manner corresponding to the number of channels of the one or plurality of videos displayed on the display units 5, 213, and 703.
  • with the video display device 10, the map display device 20, and the posted moving image distribution system (computer 70), which include the caption generation devices of the first to third embodiments or execute the caption generation programs of the first to third embodiments, the user can comprehensively grasp the subtitles of each channel even if the number of video channels increases.
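The degree tables sketched in the bullets above (FIGS. 13 and 14) can be read as simple lookup functions. The sketch below is illustrative only; the degrees for FIG. 13 are the values listed above, while the degree values for the map-scale ranges of FIG. 14 are assumptions, since the bullets give only the range boundaries:

```python
def degree_by_channel_count(n_channels: int) -> int:
    """FIG. 13: summarization degree by number of display channels."""
    for upper, degree in [(2, 100), (5, 80), (10, 40), (20, 10)]:
        if n_channels <= upper:
            return degree
    return 5   # 21 or more; may instead be 0 so that no captions are displayed

def degree_by_map_scale(scale_denominator: int) -> int:
    """FIG. 14: degree by map scale 1/scale_denominator (degree values assumed)."""
    if scale_denominator < 10_000:
        return 100
    if scale_denominator < 50_000:
        return 80
    if scale_denominator < 100_000:
        return 40
    return 10   # wide-area display; may be set to 0 so that no captions are displayed
```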
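Similarly, the third embodiment's server-side behavior described in the bullets above can be sketched as follows; the `summarize()` stand-in and the degree value for small thumbnails are assumptions, since the text states only that large thumbnails use an unreduced (or slightly reduced) degree and small thumbnails a reduced one:

```python
def summarize(text: str, degree: int) -> str:
    """Crude stand-in: truncate to the character budget of equation (1)."""
    return text[: len(text) * degree // 100]

def serve_thumbnails(posted_videos: list, size: str):
    """Summary degree setting unit 604 / caption summary unit 605, sketched."""
    degree = 100 if size == "large" else 40        # 40 is an assumed value
    for video in posted_videos:
        yield video["thumbnail"], summarize(video["text"], degree)
```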

Abstract

A subtitle generation device equipped with an image acquisition unit (summarized subtitle generation units 41-4n) and a subtitle summarization unit (403). The image acquisition unit acquires an image. The subtitle summarization unit (403) generates summarized subtitles which summarize the text data pertaining to the image, according to the number of images displayed on a display unit (5) or the display image size expressing the size of the images displayed on the display unit (5).

Description

Subtitle generation device and subtitle generation program
This disclosure relates to a caption generation device and a caption generation program.
In recent years, the number of video channels that users can view, such as terrestrial or satellite television broadcasting, Internet broadcasting, and video distribution websites, has increased.
JP 2002-223399 A
JP 2010-81149 A
JP 2013-183217 A
When displaying video on a display unit, audio transmitted along with the video may be displayed on the display unit as subtitles. In addition to the video, characters supplementing the video may also be displayed on the display unit as subtitles. It is required to display subtitles in a manner corresponding to the display state of the video of one or more channels displayed on the display unit.
The embodiments aim to provide a caption generation device and a caption generation program capable of generating captions in a manner corresponding to the display state of the video of one or more channels displayed on a display unit.
According to a first aspect of the embodiments, there is provided a caption generation device including a caption summary unit that generates a summary caption summarizing text data related to a video, according to the number of videos displayed on a display unit or a display video size indicating the size of the video displayed on the display unit.
According to a second aspect of the embodiments, there is provided a caption generation device including a map scale setting unit that sets the scale of a map displayed on a display unit, and a caption summary unit that generates a summary caption summarizing text data related to the video displayed on the display unit, according to the scale of the map.
According to a third aspect of the embodiments, there is provided a caption generation program that causes a computer to execute a caption summarization step of generating a summary caption summarizing text data related to a video, according to the number of displayed videos or a display video size indicating the size of the displayed video.
According to a fourth aspect of the embodiments, there is provided a caption generation program that causes a computer to execute a map scale setting step of setting the scale of a displayed map, and a caption summarization step of generating a summary caption summarizing text data related to a video according to the scale of the map.
According to the caption generation device and the caption generation program of the embodiments, captions can be generated in a manner corresponding to the number of channels of the one or plurality of videos displayed on the display unit.
FIG. 1 is a block diagram showing a video display device configured to include the caption generation device of the first embodiment.
FIG. 2 is a block diagram showing a specific configuration example of the summary caption generation unit in FIG. 1.
FIG. 3A is a diagram showing an example of a one-screen mode in which video is displayed on the display unit.
FIG. 3B is a diagram showing an example of a two-screen mode in which video is displayed on the display unit.
FIG. 3C is a diagram showing an example of a four-screen mode in which video is displayed on the display unit.
FIG. 4A is a diagram showing, in tabular form, the channel importance and summarization degree in the one-screen mode.
FIG. 4B is a diagram showing, in tabular form, the channel importance and summarization degree in the two-screen mode.
FIG. 4C is a diagram showing, in tabular form, the channel importance and summarization degree in the four-screen mode.
FIG. 5A is a diagram showing an example of a picture-in-picture mode in which video is displayed on the display unit.
FIG. 5B is a diagram showing an example of a picture-out-picture mode in which video is displayed on the display unit.
FIG. 6A is a diagram showing, in tabular form, the channel importance and summarization degree in the picture-in-picture mode.
FIG. 6B is a diagram showing, in tabular form, the channel importance and summarization degree in the picture-out-picture mode.
FIG. 7 is a flowchart showing the operation of the caption generation device of the first embodiment and the processing that the caption generation program of the first embodiment causes a computer to execute.
FIG. 8 is a block diagram showing a video transmission/reception system including a map display device configured to include the caption generation device of the second embodiment.
FIG. 9 is a block diagram showing a specific configuration example of the map display device configured to include the caption generation device of the second embodiment.
FIG. 10A is a diagram showing an example of the transition of the display state when the scale of the map is changed from 1/10,000 to 1/50,000.
FIG. 10B is a diagram showing an example of the transition of the display state when the scale of the map is changed from 1/50,000 to 1/100,000.
FIG. 10C is a diagram showing an example of the transition of the display state when the scale of the map is changed from 1/100,000 to 1/200,000.
FIG. 11 is a diagram showing another method of displaying subtitles in the caption generation device of the second embodiment.
FIG. 12A is a diagram showing another example of the transition of the display state when the scale of the map is changed from 1/10,000 to 1/50,000.
FIG. 12B is a diagram showing another example of the transition of the display state when the scale of the map is changed from 1/50,000 to 1/100,000.
FIG. 12C is a diagram showing another example of the transition of the display state when the scale of the map is changed from 1/100,000 to 1/200,000.
FIG. 13 is a diagram showing, in tabular form, an example of setting the summarization degree according to the number of display channels, that is, the number of camera videos displayed in the map.
FIG. 14 is a diagram showing, in tabular form, an example of setting the summarization degree according to the scale of the map.
FIG. 15 is a flowchart showing the operation of the caption generation device of the second embodiment and the processing that the caption generation program of the second embodiment causes a computer to execute.
FIG. 16 is a block diagram showing a posted moving image distribution system including the caption generation device of the third embodiment.
FIG. 17A is a diagram showing a first display state of a display unit of a computer that receives moving images and the like distributed from the content server of the posted moving image distribution system.
FIG. 17B is a diagram showing a second display state of the display unit of the computer that receives moving images and the like distributed from the content server of the posted moving image distribution system.
Hereinafter, the caption generation device and the caption generation program of each embodiment will be described with reference to the accompanying drawings.
<First Embodiment>
FIG. 1 shows a video display device 10 configured to include the caption generation device of the first embodiment. In FIG. 1, multimedia streams of channels 1 to n are input to input terminals 41t to 4nt of summary caption generation units 41 to 4n, respectively. Each multimedia stream includes a video stream and an audio stream. The video stream includes video data, and the audio stream includes audio data.
Any one of the summary caption generation units 41 to 4n is referred to as a summary caption generation unit 4, and any one of the input terminals 41t to 4nt is referred to as an input terminal 4t. n is an integer of 2 or more. The multimedia streams input to the input terminals 41t to 4nt are distributed from arbitrary content distribution sources such as terrestrial or satellite television broadcasting, Internet broadcasting, and video distribution websites. A multimedia stream in which video shot with a smartphone, a video camera, or the like has been edited as necessary on a personal computer, smartphone, or video camera may also be distributed.
Since a multimedia stream including a video stream and an audio stream is input to the summary caption generation unit 4, the summary caption generation unit 4 functions as a video acquisition unit that acquires video, or as an audio acquisition unit that acquires audio related to the video.
The channel number setting unit 1 sets the number of channels for which video based on the video data of the multimedia streams is displayed on the display unit 5. The channel number setting unit 1 may set the number of channels in response to the user operating the operation unit 6. The number of channels is the number of channels, among channels 1 to n, whose video is displayed on the display unit 5, and is set between 1 and the maximum number of channels. As an example, it is assumed here that the maximum number of channels is set to 4. The number of channels may be fixed.
The channel importance setting unit 2 sets the importance of each channel. The same importance may be set in advance for all channels in the channel importance setting unit 2. The channel importance setting unit 2 may set a higher importance for a smaller channel number. The channel importance setting unit 2 may set the importance of each channel in response to the user operating the operation unit 6. As will be described later, the channel importance setting unit 2 may automatically set the importance of each channel according to the display mode the user selects by operating the operation unit 6.
The summarization degree setting unit 3 sets the summarization degree of the subtitles displayed on the display unit 5 together with the video of each channel, according to the number of channels set by the channel number setting unit 1 and the importance of each channel set by the channel importance setting unit 2. The subtitle summarization degree is an index indicating the degree to which the number of characters of the text data to be displayed as subtitles on the display unit 5 is reduced.
The summarization degree is defined by equation (1). In the first embodiment and the second embodiment described later, for ease of understanding, as can be seen from equation (1), the summarization degree in a state where the number of characters of the text data is not reduced (not summarized) is 100, and the numerical value of the summarization degree becomes smaller as the amount by which the number of characters is reduced becomes larger. That is, the summarization degree here indicates the remaining rate of the text data to be displayed as subtitles.
Summarization degree = (number of characters of the summarized text data / number of characters of the original text data) × 100 … (1)
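As a minimal illustration (not part of the patent text; the function name is an assumption), equation (1) can be computed as follows:

```python
def summarization_degree(original_text: str, summarized_text: str) -> float:
    """Equation (1): remaining rate of characters after summarization, 0-100."""
    if not original_text:
        raise ValueError("original text must be non-empty")
    return len(summarized_text) / len(original_text) * 100

# Example: 100 characters summarized down to 25 characters -> degree 25.0
print(summarization_degree("a" * 100, "a" * 25))
```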
A summarization degree setting signal indicating the summarization degree of each channel set by the summarization degree setting unit 3 is supplied to each summary caption generation unit 4. Since the number of channels is set to 4 here, the summarization degree setting unit 3 only needs to supply the summarization degree setting signal to the summary caption generation units 41 to 44.
Based on the audio data accompanying the video data of the multimedia stream, each summary caption generation unit 4 generates text data (caption data), which is the subtitle displayed on the display unit 5 together with the video of each channel. Each summary caption generation unit 4 summarizes the text data according to the input summarization degree setting signal and generates summary caption data. Both caption data in which the number of characters of the text data is not reduced and caption data in which the number of characters is reduced may be referred to as summary caption data.
Each summary caption generation unit 4 supplies the video data and audio data included in the multimedia stream, together with the summary caption data, to the display unit 5. The display unit 5 includes a drawing unit 51, a display panel 52, an audio processing circuit 53, and a speaker 54. The audio processing circuit 53 and the speaker 54 may be provided outside the display unit 5.
Here, each summary caption generation unit 4 may store the input multimedia stream and the generated summary caption data, and supply the stored multimedia stream and summary caption data to the display unit 5 in response to a request from the display unit 5, as in VOD (Video on Demand).
The drawing unit 51 draws the video data and summary caption data of each channel. The display panel 52 displays the video based on the video data and summary caption data drawn by the drawing unit 51. The video based on the video data and summary caption data of each channel may be reduced in size.
The audio processing circuit 53 D/A-converts the audio data selected from among the audio data of channels 1 to 4 and supplies an analog audio signal to the speaker 54. The speaker 54 outputs sound based on the input analog audio signal. The audio data output as sound by the speaker 54 may be fixed to the audio data of channel 1, or the user may select the audio data of any channel with the operation unit 6.
By the user operating the operation unit 6, the display unit 5 can switch the display mode in which video based on video data and summary caption data selected from those supplied from the summary caption generation units 41 to 44 is displayed. For example, the display unit 5 can switch between a display mode that displays only the video based on the video data and summary caption data from one summary caption generation unit 4, and a display mode that simultaneously displays a plurality of videos based on the video data and summary caption data from a plurality of summary caption generation units 4. Details of the display modes will be described later.
A specific configuration example of the summary caption generation unit 4 will be described with reference to FIG. 2. As shown in FIG. 2, the summary caption generation unit 4 includes an audio stream acquisition unit 401, a speech recognition unit 402, a caption summary unit 403, and a multiplexing unit 404.
The audio stream acquisition unit 401 acquires the audio stream from the multimedia stream input to the input terminal 4t. The audio stream acquisition unit 401 supplies the input multimedia stream to the multiplexing unit 404 and supplies the audio stream to the speech recognition unit 402.
The speech recognition unit 402 performs speech recognition on the audio data included in the audio stream to generate text data, and supplies the text data to the caption summary unit 403. The summarization degree setting signal is input to the caption summary unit 403. The caption summary unit 403 summarizes the text data according to the summarization degree indicated by the summarization degree setting signal to generate summary text data. The caption summary unit 403 supplies both the text data before summarization and the summary text data to the multiplexing unit 404 as summary caption data.
The caption summary unit 403 generates summary caption data using extractive summarization, a representative summarization technique. For example, the caption summary unit 403 extracts words with a high appearance frequency in the text data as important words, and generates the summary caption data. The caption summary unit 403 may use abstractive (generative) summarization instead of extractive summarization. Abstractive summarization generates summary caption data using expressions different from the text data, for example by paraphrasing, generalizing, or rearranging the text based on its content. A sketch of a frequency-based extractive approach follows.
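The following is a minimal sketch, under the assumption of sentence-level extraction with a character budget derived from equation (1); it is an illustration, not the patent's implementation:

```python
from collections import Counter

def extractive_summary(text: str, degree: int) -> str:
    """Keep the highest-scoring sentences until the character budget
    implied by the summarization degree (equation (1)) is reached."""
    budget = len(text) * degree // 100
    sentences = [s for s in text.split(". ") if s]
    freq = Counter(text.lower().split())
    # Score each sentence by the total frequency of the words it contains.
    ranked = sorted(sentences,
                    key=lambda s: -sum(freq[w] for w in s.lower().split()))
    picked, used = [], 0
    for s in ranked:
        if used + len(s) <= budget:
            picked.append(s)
            used += len(s)
    picked.sort(key=sentences.index)   # restore original sentence order
    return ". ".join(picked)
```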
The multiplexing unit 404 multiplexes the video data and audio data included in the multimedia stream supplied from the audio stream acquisition unit 401 with the summary caption data supplied from the caption summary unit 403, in synchronization with each other. The multiplexing unit 404 supplies the multiplexed data to the display unit 5.
Examples of display modes for displaying the video of channels 1 to 4 on the display unit 5 will be described with reference to FIGS. 3A to 3C. FIGS. 4A to 4C show examples of the channel importance, summarization degree, and display video size of each channel in the display modes of FIGS. 3A to 3C. It is assumed that the video data of each channel is full HD with 1920 pixels in the horizontal direction and 1080 pixels in the vertical direction, and that the display panel 52 is a full HD panel. The display video size indicates the size at which the video data is displayed on the display unit 5.
FIG. 3A shows a display mode in which only the video V1 of channel 1 is displayed on the display panel 52 in full screen. The display mode shown in FIG. 3A is referred to as the one-screen mode. A subtitle ST1 based on the summary caption data of channel 1 is displayed near the lower end of the video V1. The subtitle ST1 is displayed in one or more lines. The number of characters per line and the number of lines of the subtitle ST1 and of the subtitles ST2 to ST4 described later are set to values that allow the user to recognize the subtitles, according to the size and resolution of the display panel 52.
At this time, as shown in FIG. 4A, the channel importance setting unit 2 sets the importance of channel 1 to 100, and sets the importance of channels 2 to 4 to 0 because channels 2 to 4 are not displayed. The summarization degree setting unit 3 sets the summarization degree of the caption data of channel 1 to 100, and sets the summarization degree of the caption data of channels 2 to 4 to 0.
In the one-screen mode, the display video size of the video of channel 1 is 1920 pixels in the horizontal direction and 1080 pixels in the vertical direction. In the one-screen mode, the text data generated by the speech recognition unit 402 is displayed as the subtitle ST1 without being reduced.
FIG. 3B shows a display mode in which the display video sizes of the video V1 of channel 1 and the video V2 of channel 2 are reduced and the videos are displayed side by side on the display panel 52. The display mode shown in FIG. 3B is referred to as the two-screen mode. The subtitle ST1 based on the summary caption data of channel 1 is displayed near the lower end of the video V1, and the subtitle ST2 based on the summary caption data of channel 2 is displayed near the lower end of the video V2.
At this time, as shown in FIG. 4B, the channel importance setting unit 2 sets the importance of channels 1 and 2 to 100, and sets the importance of channels 3 and 4 to 0 because channels 3 and 4 are not displayed. The summarization degree setting unit 3 sets the summarization degree of the caption data of channels 1 and 2 to 70, and sets the summarization degree of the caption data of channels 3 and 4 to 0.
In the two-screen mode, the display video size of the video of channels 1 and 2 is 960 pixels in the horizontal direction and 540 pixels in the vertical direction. In the two-screen mode, the subtitles ST1 and ST2 are displayed based on summary caption data in which the text data generated by the speech recognition unit 402 has been reduced by 30%.
In the two-screen mode, more channels of video are displayed on the display panel 52 than in the one-screen mode. Accordingly, the areas in which the subtitles ST1 and ST2 are displayed in the two-screen mode are narrower than the area in which the subtitle ST1 is displayed in the one-screen mode. The subtitle ST1 in the two-screen mode is a subtitle whose text data is reduced compared with the subtitle ST1 in the one-screen mode.
FIG. 3C shows a display mode in which the display video sizes of the videos V1 to V4 of channels 1 to 4 are reduced and the videos are displayed side by side on the display panel 52. The display mode shown in FIG. 3C is referred to as the four-screen mode. Subtitles ST1 to ST4 based on the summary caption data of channels 1 to 4 are displayed near the lower ends of the videos V1 to V4.
At this time, as shown in FIG. 4C, the channel importance setting unit 2 sets the importance of channels 1 to 4 to 100. The summarization degree setting unit 3 sets the summarization degree of the caption data of channels 1 to 4 to 25.
In the four-screen mode, the display video size of the video of channels 1 to 4 is the same as that of channels 1 and 2 in the two-screen mode: 960 pixels in the horizontal direction and 540 pixels in the vertical direction. In the four-screen mode, the subtitles ST1 to ST4 are displayed based on summary caption data in which the text data generated by the speech recognition unit 402 has been reduced by 75%.
In the four-screen mode, more channels of video are displayed on the display panel 52 than in the two-screen mode. Accordingly, the areas in which the subtitles ST1 to ST4 are displayed in the four-screen mode are narrower than the areas in which the subtitles ST1 and ST2 are displayed in the two-screen mode. The subtitles ST1 and ST2 in the four-screen mode are subtitles whose text data is reduced compared with the subtitles ST1 and ST2 in the two-screen mode.
Other examples of display modes for displaying the video of channels 1 to 4 on the display unit 5 will be described with reference to FIGS. 5A and 5B. FIGS. 6A and 6B show examples of the channel importance, summarization degree, and display video size of each channel in the display modes of FIGS. 5A and 5B.
FIG. 5A shows a display mode in which the video V1 of channel 1 is displayed on the display panel 52 in full screen, and the display video size of the video V2 of channel 2 is reduced and superimposed on the video V1. The display mode shown in FIG. 5A is referred to as the picture-in-picture mode (hereinafter, PIP mode). The subtitle ST1 based on the summary caption data of channel 1 is displayed near the lower end of the video V1, and the subtitle ST2 based on the summary caption data of channel 2 is displayed near the lower end of the video V2.
At this time, as shown in FIG. 6A, the channel importance setting unit 2 sets the importance of channel 1 to 100, sets the importance of channel 2 to 25, and sets the importance of channels 3 and 4 to 0 because channels 3 and 4 are not displayed. The summarization degree setting unit 3 sets the summarization degree of the caption data of channel 1 to 100, sets the summarization degree of the caption data of channel 2 to 25, and sets the summarization degree of the caption data of channels 3 and 4 to 0.
In the PIP mode shown in FIG. 5A, the display video size of the video of channel 1 is 1920 pixels in the horizontal direction and 1080 pixels in the vertical direction, and the display video size of the video of channel 2 is 960 pixels in the horizontal direction and 540 pixels in the vertical direction. However, since the video of channel 2 is superimposed on the video of channel 1, a region of the video of channel 1 of 960 pixels in the horizontal direction and 540 pixels in the vertical direction is not displayed.
In the PIP mode, the text data of channel 1 generated by the speech recognition unit 402 is displayed as the subtitle ST1 without being reduced, and the subtitle ST2 is displayed based on summary caption data in which the text data of channel 2 generated by the speech recognition unit 402 has been reduced by 75%.
In the PIP mode, the display video size of the video of channel 2 is smaller than that of channel 1. Accordingly, the area in which the subtitle ST2 is displayed in the PIP mode is narrower than the area in which the subtitle ST1 is displayed. The subtitle ST2 in the PIP mode is a subtitle whose text data is reduced compared with the subtitle ST1.
FIG. 5B shows a display mode in which the display video size of the video V1 of channel 1 is reduced, and the display video sizes of the videos V2 to V4 of channels 2 to 4 are reduced and the videos are displayed outside the video V1. The display mode shown in FIG. 5B is referred to as the picture-out-picture mode (hereinafter, POP mode). The subtitle ST1 based on the summary caption data of channel 1 is displayed near the lower end of the video V1, and the subtitles ST2 to ST4 based on the summary caption data of channels 2 to 4 are displayed near the lower ends of the videos V2 to V4.
At this time, as shown in FIG. 6B, the channel importance setting unit 2 sets the importance of channel 1 to 100, and sets the importance of channels 2 to 4 to 11. The summarization degree setting unit 3 sets the summarization degree of the caption data of channel 1 to 56, and sets the summarization degree of the caption data of channels 2 to 4 to 6.
In the POP mode shown in FIG. 6B, the display video size of the video of channel 1 is 1440 pixels in the horizontal direction and 810 pixels in the vertical direction, and the display video size of the video of channels 2 to 4 is 480 pixels in the horizontal direction and 270 pixels in the vertical direction.
In the POP mode, the subtitle ST1 is displayed based on summary caption data in which the text data of channel 1 generated by the speech recognition unit 402 has been reduced by 44%, and the subtitles ST2 to ST4 are displayed based on summary caption data in which the text data of channels 2 to 4 has been reduced by 94%.
In the POP mode, the display video size of the video of channel 1 is smaller than that of channel 1 in the one-screen mode of FIG. 3A or the PIP mode of FIG. 5A. Accordingly, the area in which the subtitle ST1 is displayed in the POP mode is narrower than in the one-screen mode or the PIP mode. The subtitle ST1 in the POP mode is a subtitle whose text data is reduced compared with the subtitle ST1 in the one-screen mode or the PIP mode of FIG. 5A.
Also, in the POP mode, the display video size of the video of channels 2 to 4 is smaller than that of channel 1. Accordingly, the areas in which the subtitles ST2 to ST4 are displayed in the POP mode are narrower than the area in which the subtitle ST1 is displayed. The subtitles ST2 to ST4 in the POP mode are subtitles whose text data is reduced compared with the subtitle ST1.
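As an aside from the patent text, the per-mode settings read off FIGS. 4A to 4C, 6A, and 6B can be collected into a single table; the data structure and function name below are assumptions for illustration, while the numeric values are the ones given above:

```python
# channel -> (importance, summarization degree, display size in pixels or None)
DISPLAY_MODES = {
    "1screen": {1: (100, 100, (1920, 1080)), 2: (0, 0, None),
                3: (0, 0, None),             4: (0, 0, None)},
    "2screen": {1: (100, 70, (960, 540)),    2: (100, 70, (960, 540)),
                3: (0, 0, None),             4: (0, 0, None)},
    "4screen": {ch: (100, 25, (960, 540)) for ch in (1, 2, 3, 4)},
    "PIP":     {1: (100, 100, (1920, 1080)), 2: (25, 25, (960, 540)),
                3: (0, 0, None),             4: (0, 0, None)},
    "POP":     {1: (100, 56, (1440, 810)),   2: (11, 6, (480, 270)),
                3: (11, 6, (480, 270)),      4: (11, 6, (480, 270))},
}

def summarization_degree_for(mode: str, channel: int) -> int:
    """Degree that the summarization degree setting unit 3 would hand to the
    summary caption generation unit 4 for this channel in this mode."""
    return DISPLAY_MODES[mode][channel][1]
```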
The operation of the caption generation device of the first embodiment will be described using the flowchart shown in FIG. 7. In FIG. 7, when the processing starts, the channel number setting unit 1 sets the number of channels in step S101. Here, the number of channels is assumed to be fixed. The channel importance setting unit 2 sets the importance of each channel in step S102. The summarization degree setting unit 3 sets the summarization degree in step S103.
In step S104, the audio stream acquisition unit 401 separates the video stream and the audio stream and acquires the audio stream. Since the audio stream acquisition unit 401 acquires the video stream in step S104, step S104 is a video acquisition step. In step S105, the speech recognition unit 402 performs speech recognition on the audio data included in the audio stream to generate text data. In step S106, the caption summary unit 403 generates summary caption data with the set summarization degree. Step S106 is a caption summarization step.
In step S107, the multiplexing unit 404 multiplexes the video data and audio data included in the multimedia stream with the summary caption data. In step S108, the display unit 5 displays video and outputs audio based on the multiplexed data.
In step S109, the channel importance setting unit 2 and the summarization degree setting unit 3 determine whether an instruction to change the display mode has been given. If an instruction to change the display mode has been given (YES), the processes of steps S102 to S109 are repeated. If no instruction to change the display mode has been given (NO), the summary caption generation unit 4 determines in step S110 whether the multimedia stream is still being input.
If the multimedia stream is still being input (YES), the processes of steps S104 to S110 are repeated. If the multimedia stream is no longer being input (NO), the summary caption generation unit 4 ends the processing.
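A compact sketch of the FIG. 7 flow (steps S101 to S110); the helper functions are hypothetical placeholders for the units described above, not APIs defined by the patent:

```python
def run_video_display(streams):
    set_channel_count(4)                                   # S101 (fixed here)
    importances = set_channel_importance()                 # S102
    degrees = set_summarization_degrees(importances)       # S103
    while True:
        video, audio = demux(streams)                      # S104: video acquisition step
        text = recognize_speech(audio)                     # S105
        captions = summarize_captions(text, degrees)       # S106: caption summarization step
        display(multiplex(video, audio, captions))         # S107, S108
        if display_mode_changed():                         # S109: YES -> redo S102-S103
            importances = set_channel_importance()
            degrees = set_summarization_degrees(importances)
        elif not streams.has_more():                       # S110: NO -> end processing
            break
```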
As described above, the caption generation device of the first embodiment includes the summarization degree setting unit 3 and the caption summary unit 403. The summarization degree setting unit 3 sets the summarization degree of the subtitles of each channel displayed on the display unit 5 in association with the video of each channel, according to the number of channels of the one or plurality of videos displayed on the display unit 5 (display panel 52) or the display video size of the video of each channel. The caption summary unit 403 summarizes the subtitles of each channel according to the summarization degree of each channel set by the summarization degree setting unit 3, and generates summary subtitles.
According to the caption generation device of the first embodiment, captions can be generated in a manner corresponding to the display state of the video of one or a plurality of channels displayed on the display unit 5. The video display device 10, which includes the caption generation device of the first embodiment and displays the video of each channel in one of a plurality of display modes, can display subtitles in a manner corresponding to the display state of the video of one or a plurality of channels.
Each part of the video display device 10 shown in FIG. 1 and each part of the summary caption generation unit 4 shown in FIG. 2 may be configured by hardware such as an integrated circuit, or by software (a computer program). The choice between hardware and software is arbitrary. The flowchart shown in FIG. 7 may be processing that the caption generation program of the first embodiment causes a computer to execute.
The caption generation program may be transmitted to the video display device 10 via a network such as the Internet, or may be stored in a non-transitory storage medium and provided to the video display device 10.
In FIG. 1, the following configuration may additionally be provided. For example, with the number of channels set to 16, the display unit 5 displays reduced video of the 16 channels and the corresponding subtitles. A gaze detection device detects the user's line of sight and thereby selects the four channels that the user is judged to be watching with interest. As shown in FIG. 3C, the display unit 5 displays the reduced video of the selected four channels and the corresponding subtitles.
The gaze detection device may instead detect the user's line of sight and thereby select the one channel that the user is judged to be watching with interest. As shown in FIG. 3A, the display unit 5 then displays the video of the selected channel and the corresponding subtitle.
<Second Embodiment>
FIG. 8 shows a video transmission/reception system including a map display device 20 configured to include the caption generation device of the second embodiment. The map display device 20 is connected to a network 50 such as the Internet. Video cameras 301 to 30n and a map providing server 40 are also connected to the network 50. Here too, n is an integer of 2 or more. Any one of the video cameras 301 to 30n is referred to as a video camera 30.
The video shot by one or a plurality of video cameras 30 constitutes video of one or a plurality of channels.
A specific configuration example and the operation of the map display device 20 will be described with reference to FIG. 9. In FIG. 9, the network interface 201 of the map display device 20 is connected to the network 50. The map display device 20 receives the map data provided by the map providing server 40, and the video data and metadata transmitted by each video camera 30. Specifically, the map display device 20 receives, from each video camera 30, video data shot by the video camera 30 and metadata, which is data describing information related to the video data.
Here, it is assumed that the map display device 20 acquires the video data and the metadata using a WEB API (WEB Application Programming Interface) provided by each video camera 30. A WEB API is an interface by which a program on one device calls and uses, via a network, a function provided by a program on another device. Each piece of metadata is generated and recorded by the corresponding video camera 30.
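A minimal sketch of such a retrieval over the network 50; the endpoint path, field names, and address are hypothetical, since the patent does not specify the cameras' WEB API:

```python
import json
from urllib.request import urlopen

def fetch_camera_metadata(camera_host: str) -> dict:
    """Call the camera's (assumed) metadata endpoint."""
    with urlopen(f"http://{camera_host}/api/metadata") as resp:
        return json.load(resp)

meta = fetch_camera_metadata("192.0.2.10")      # documentation address, hypothetical
lat, lon = meta["latitude"], meta["longitude"]  # position information in the metadata
```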
In the metadata, various kinds of information related to the video data are described as text data. The metadata includes, for example, the position information of the video camera 30 (that is, the shooting location), the shooting date, photographer information, producer information, the camera number, the camera priority, the shooting purpose, the title, a shooting summary, caster names, character names, and arbitrarily entered text. The metadata includes at least the position information of the video camera 30. In addition to the latitude and longitude of the shooting location, the position information of the video camera 30 may include the name of the shooting location, such as Tokyo Station or Tokyo International Airport.
In detail, the shooting date is information indicating the date and time when the video data was shot. The photographer information is, for example, the photographer's name or an ID identifying the photographer. The producer information is, for example, the name of a broadcasting station or an ID identifying the broadcasting station. The camera number is a number assigned to each video camera 30; it may be, for example, a serial code. The camera priority is information indicating the priority of the video data to be displayed. The shooting purpose is, for example, program shooting, landscape shooting, or an interview. The title is, for example, the program name of the video data.
The shooting summary is assumed to bring together the position information (that is, the shooting location), shooting date, photographer information, producer information, camera number, camera priority, camera serial code, shooting purpose, title, caster names, and character names, described together with other information. The caster names and character names list the names of persons suited to each purpose. Although the metadata has been described as including the position information of the video camera 30, the position information of the video camera 30 and metadata consisting of the other contents may instead be provided as separate data.
The camera video acquisition unit 202 acquires the video data transmitted from each video camera 30. The metadata extraction unit 203 acquires the metadata. As described above, the camera video acquisition unit 202 and the metadata extraction unit 203 may acquire the video data and metadata transmitted from each video camera 30 using the WEB API provided by each video camera 30. The map data acquisition unit 209 acquires the map data. By operating the operation unit 214, the user can change the center position of the map displayed on the display panel 2131 of the display unit 213 and can change the scale of the map.
The map center position setting unit 216 sets the center position of the map displayed on the display panel 2131 in response to the user's operation of the operation unit 214, and the map scale setting unit 217 sets the scale of the map displayed on the display panel 2131 in response to the user's operation of the operation unit 214. The map data acquisition unit 209 acquires a map with the set center position and scale from the map providing server 40.
The metadata of each video camera 30 acquired by the metadata extraction unit 203 is supplied to the camera position acquisition unit 204 and the caption information acquisition unit 205. The camera position acquisition unit 204 acquires the latitude and longitude of the shooting location included in the position information of the metadata. That is, the camera position acquisition unit 204 is a position acquisition unit that acquires position information indicating the position where the video was shot. The caption information acquisition unit 205 acquires, from the text data described in the metadata, the text data to be displayed as subtitles on the display panel 2131, and supplies it to the caption summary unit 208.
The position information of the video camera 30 can also be acquired through a route different from the metadata. For example, when a video camera 30 that does not have a GPS acquisition function is connected to a router that has a GPS acquisition function, the router may notify the map display device 20 of the position information through a route different from the metadata. In this case, the metadata extraction unit 203 acquires the position information using a WEB API provided by the router. Here, the connection status between the video camera 30 and the router is provided as WEB API information by the router; the map display device 20 associates the video camera 30 with the router and uses the position information of the router as the position information of the video camera 30.
The text data displayed as subtitles may be the name of the shooting location, the shooting purpose, or arbitrarily entered text; it is arbitrary. The text data to be displayed as subtitles may be determined in advance, or may be selectable by the user operating the operation unit 214. When the metadata includes only the position information of the video camera 30, the shooting location is displayed as the subtitle.
 表示領域内外判定部206には、地図中心位置設定部216によって設定された地図の中心位置、及び、地図縮尺設定部217によって設定された地図の縮尺を示す情報が入力される。表示領域内外判定部206は、表示パネル2131の画面サイズの情報を有する。表示領域内外判定部206は、入力された地図の中心位置及び地図の縮尺を示す情報に基づいて、各ビデオカメラ30が表示パネル2131に表示されている地図の表示領域内に位置しているか、表示領域外に位置しているかを判定する。表示領域内外判定部206は、地図の縮尺と映像が撮影された位置を示す位置情報とから、表示部213に表示させる映像の数を判定する判定部として機能する。 The display area inside / outside determination unit 206 receives information indicating the map center position set by the map center position setting unit 216 and the map scale set by the map scale setting unit 217. The display area inside / outside determination unit 206 has information on the screen size of the display panel 2131. The display area inside / outside determination unit 206 determines whether each video camera 30 is located within the display area of the map displayed on the display panel 2131 based on the input information indicating the center position of the map and the scale of the map. It is determined whether it is located outside the display area. The display area inside / outside determination unit 206 functions as a determination unit that determines the number of videos to be displayed on the display unit 213 from the scale of the map and the position information indicating the position where the video was shot.
 The display area inside/outside determination unit 206 supplies the summarization degree setting unit 207 with information on how many of the video cameras 301 to 30n are located within the display area of the map displayed on the display panel 2131. The summarization degree setting unit 207 sets the summarization degree according to the number of video cameras 30 located within the map display area.
 The subtitle summarization unit 208 summarizes the text data supplied from the subtitle information acquisition unit 205 according to the summarization degree set by the summarization degree setting unit 207 to generate summary subtitle data.
 In the first embodiment, the subtitle summarization unit 403 creates summary subtitle data using extractive or abstractive summarization according to the summarization degree. In the second embodiment, the subtitle summarization unit 208 summarizes by selecting, according to the summarization degree, one or more items such as position information, shooting date, photographer information, producer information, camera number, camera priority, shooting purpose, title, shooting summary, caster name, and character names.
 For example, when the summarization degree is 100, the subtitle summarization unit 208 selects 10 of the items listed above and generates summary subtitle data containing the information of those 10 items. When the summarization degree is 20, the subtitle summarization unit 208 selects two of the items and generates summary subtitle data containing the information of those two items.
 Furthermore, priorities may be set in advance, in the subtitle summarization unit 208 or elsewhere, for the items included in the metadata and the like. The subtitle summarization unit 208 may be configured to select the items with the highest priority based on these priorities.
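 A minimal sketch of this priority-based selection follows, assuming the rough mapping of one item per 10 points of summarization degree suggested by the examples above (degree 100 selects 10 items, degree 20 selects two). The item names and their order are illustrative, not taken from the patent.

```python
# Candidate metadata items, in an assumed descending priority order.
PRIORITIZED_ITEMS = [
    "title", "photographer", "producer", "position", "shooting_date",
    "camera_number", "camera_priority", "purpose", "caster", "characters",
]

def summarize_items(metadata, degree):
    """Pick the highest-priority items present; about one item per 10 points."""
    n = max(0, min(len(PRIORITIZED_ITEMS), degree // 10))
    chosen = [k for k in PRIORITIZED_ITEMS if k in metadata][:n]
    return {k: metadata[k] for k in chosen}
```

 For instance, summarize_items(meta, 20) would return only the title and photographer fields when both are present, matching the two-item example above.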
 Also in the second embodiment, as in the first embodiment, the subtitle summarization unit 208 may be configured to create summary subtitle data using extractive or abstractive summarization according to the summarization degree. In particular, when a shooting summary is included in the metadata, the subtitle summarization unit 208 may create summary subtitle data from the shooting summary using extractive or abstractive summarization according to the summarization degree.
 Note that, as in the first embodiment, the number of characters in the text data may be left unreduced. The summary subtitle data based on the text data supplied from each video camera 30 is supplied to the image composition unit 212.
 The video reduction unit 210 reduces the video data supplied from each video camera 30 to generate reduced video data (thumbnail images). The video reduction rate instruction unit 211 instructs the video reduction unit 210 on the reduction rate of the video data. The reduction rate may be fixed. When the number of pixels in one frame of the video data from a video camera 30 is sufficiently smaller than the number of pixels of the display panel 2131, the video data need not be reduced.
 The video reduction rate instruction unit 211 may be supplied, from the display area inside/outside determination unit 206, with information on the number of video cameras 30 located within the display area of the map displayed on the display panel 2131. The video reduction rate instruction unit 211 may change the reduction rate of the video data according to this number.
 The image composition unit 212 composites the map data supplied from the map data acquisition unit 209, the reduced video data supplied from the video reduction unit 210, and the summary subtitle data supplied from the subtitle summarization unit 208. Here, the reduced video data is composited so as to be placed at the position on the map data indicated by the position information of the corresponding video camera 30. The image composition unit 212 supplies the composite image data to the display unit 213.
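 For illustration, a compositing step of this kind might look like the sketch below, which uses the Pillow imaging library and a simple linear mapping from latitude and longitude to pixel coordinates within known viewport bounds; both the helper names and the mapping are assumptions.

```python
from PIL import Image  # Pillow

def compose(map_img, thumbnails, bounds):
    """Paste each thumbnail at its camera's position on the map image.

    thumbnails: iterable of (thumb_img, lat, lon).
    bounds: (lat_min, lat_max, lon_min, lon_max) of the displayed area.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    w, h = map_img.size
    out = map_img.copy()
    for thumb, lat, lon in thumbnails:
        x = int((lon - lon_min) / (lon_max - lon_min) * w)
        y = int((lat_max - lat) / (lat_max - lat_min) * h)  # y grows downward
        # Center the thumbnail on the camera position.
        out.paste(thumb, (x - thumb.width // 2, y - thumb.height // 2))
    return out
```

 Subtitle text would be drawn in a further pass, adjacent to or apart from each thumbnail as described in the display examples below.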
 With the above configuration and operation, the display panel 2131 displays a map, at the scale and over the display area that the user has operated the device to show, on which the reduced video data transmitted by the video cameras 30 located within the display area and the summary subtitles related to those videos are superimposed. This makes it possible to visualize who is shooting what video, for what purpose, at which position on the map.
 In FIG. 9, the configuration for receiving audio data and outputting audio is omitted. The map display device 20 may be configured to receive audio data of the sound collected when each video camera 30 shoots, reproduce the audio data of a selected video camera 30, and output it from a speaker.
 In this case, the metadata extraction unit 203 functions as an audio acquisition unit that acquires audio data, and as a speech recognition unit that generates text data related to the video based on the acquired audio data. That is, each video camera 30 records the sound collected while shooting the video data as audio data. Each video camera 30 transmits the video data and audio data to the map display device 20 via the network 50. The metadata extraction unit 203 acquires the audio data transmitted from each video camera 30, performs speech recognition on the acquired audio data to generate text data, and supplies this text data as metadata to the camera position acquisition unit 204 and the subtitle information acquisition unit 205.
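 A hedged sketch of this audio-to-metadata path follows; the transcribe function is a placeholder for whatever speech recognition engine is actually used, since the patent does not name one.

```python
def transcribe(audio_bytes):
    """Placeholder for an actual ASR engine; the patent does not specify one."""
    raise NotImplementedError

def metadata_from_audio(camera_id, audio_bytes):
    """Turn audio recorded by a camera into subtitle-ready metadata text."""
    text = transcribe(audio_bytes)
    return {"camera_number": camera_id, "shooting_summary": text}
```

 The returned dictionary can then be summarized like any other metadata, for example by the item-selection sketch shown earlier.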
 With this configuration, metadata can be created from the sound collected when the video data is shot, eliminating the need to prepare metadata in advance and reducing the burden on the photographer.
 Each video camera 30 is not limited to recording, as audio data, only the sound collected while shooting the video data. Each video camera 30 may also record, as audio data, sound collected at times other than shooting, separately from the sound collected while shooting the video data. For example, the video camera 30 may collect, outside of shooting, audio containing information such as a shooting summary and record it as audio data. The metadata extraction unit 203 may acquire this audio data and generate text data based on it. In other words, the metadata extraction unit 203 may generate text data based on audio data associated with the video data.
 The display order setting unit 215 sets the display order for cases in which the reduced video data of multiple video cameras 30 overlap. When the metadata includes camera priorities for the video cameras 30, the display order of reduced video data shot by a higher-priority camera may be raised based on those priorities. The display order setting unit 215 may also change the display order when the user operates the operation unit 214.
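 One possible reading of this display order logic, as a sketch: thumbnails are drawn in ascending priority so that higher-priority cameras end up on top, and a user-selected camera is forced above everything else. The priority convention (larger means more important) is an assumption.

```python
def stacking_order(videos, pinned_id=None):
    """Order overlapping thumbnails for drawing; later entries end up on top.

    videos: list of dicts with 'camera_id' and an optional 'priority' key.
    pinned_id: a camera the user clicked, forced to the very top.
    """
    ordered = sorted(videos, key=lambda v: v.get("priority", 0))
    if pinned_id is not None:
        # Stable sort: only the pinned camera moves, to the end of the list.
        ordered.sort(key=lambda v: v["camera_id"] == pinned_id)
    return ordered
```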
 An example of display state transitions when the map scale is changed will be described with reference to FIGS. 10A to 10C. In FIG. 10A, with a map M1 at a scale of 1:10,000 displayed on the display panel 2131, only the video camera 30 with camera number 01 is located within the map M1. The camera video Ci1 and the subtitle CST1 of the camera video Ci1 are superimposed on the map M1. The subtitle CST1 is displayed outside the camera video Ci1.
 When the map M1 at a scale of 1:10,000 is displayed on the display panel 2131, the summarization degree is set to, for example, 100. Based on this summarization degree, position information, shooting date, photographer information, producer information, camera number, camera priority, shooting purpose, title, caster name, and character names are selected, summary subtitle data containing these 10 items is created, and the summary subtitle data is displayed as the subtitle. Note that the term camera video refers to the reduced video data described above; the same applies hereafter.
 When the user changes the map scale to 1:50,000, a map M5 at a scale of 1:50,000 is displayed on the display panel 2131, and the video cameras 30 with camera numbers 01 to 03 are located within the map M5. Here, the center position of the map M1 has been displaced to the right in the map M5. The camera videos Ci1 to Ci3 and their subtitles CST1 to CST3 are superimposed on the map M5.
 The number of video camera 30 videos displayed on the map M5 is greater than that displayed on the map M1. Accordingly, the area in which the subtitles CST1 to CST3 are displayed on the map M5 is narrower than the area in which the subtitle CST1 is displayed on the map M1. The subtitle CST1 on the map M5 has its text data reduced compared with the subtitle CST1 on the map M1.
 When the map M5 at a scale of 1:50,000 is displayed on the display panel 2131, the summarization degree is set to, for example, 70, and summary subtitle data containing seven items (position information, shooting date, photographer information, producer information, camera number, shooting purpose, and title) is displayed as the subtitle.
 In FIG. 10B, when the user further changes the map scale to 1:100,000, a map M10 at a scale of 1:100,000 is displayed on the display panel 2131, and the video cameras 30 with camera numbers 01 to 06 are located within the map M10. Here, the center position of the map M5 has been displaced to the right in the map M10. The camera videos Ci1 to Ci6 and their subtitles CST1 to CST6 are superimposed on the map M10.
 The number of video camera 30 videos displayed on the map M10 is greater than that displayed on the map M5. Accordingly, the area in which the subtitles CST1 to CST6 are displayed on the map M10 is narrower than the area in which the subtitles CST1 to CST3 are displayed on the map M5. The subtitles CST1 to CST3 on the map M10 have their text data reduced compared with the subtitles CST1 to CST3 on the map M5.
 When the map M10 at a scale of 1:100,000 is displayed on the display panel 2131, the summarization degree is set to, for example, 20, and summary subtitle data containing two items (photographer information and title) is displayed as the subtitle.
 When the videos of multiple video cameras 30 overlap as on the map M10, the display order setting unit 215 can move a video located underneath to the top, for example when the user clicks on that video.
 In FIG. 10C, when the user further changes the map scale to 1:200,000, a map M20 at a scale of 1:200,000 is displayed on the display panel 2131, and the video cameras 30 with camera numbers 01 to 10 are located within the map M20. Here, the center position of the map M10 has been displaced to the left in the map M20. The camera videos Ci1 to Ci10 and their subtitles CST1 to CST10 are superimposed on the map M20.
 The number of video camera 30 videos displayed on the map M20 is greater than that displayed on the map M10. Accordingly, the area in which the subtitles CST1 to CST10 are displayed on the map M20 is narrower than the area in which the subtitles CST1 to CST6 are displayed on the map M10. The subtitles CST1 to CST6 on the map M20 have their text data reduced compared with the subtitles CST1 to CST6 on the map M10.
 When the map M20 at a scale of 1:200,000 is displayed on the display panel 2131, the summarization degree is set to, for example, 10, and summary subtitle data containing one item (the title) is displayed as the subtitle.
 On the map M20, icons smaller than the camera videos Ci1 to Ci10, each representing a video camera 30, may be displayed instead of the camera videos Ci1 to Ci10. Here, a camera number is assigned to each icon so that the corresponding video camera 30 can be identified. When icons are displayed instead of the camera videos Ci1 to Ci10, each subtitle may be displayed larger than its icon so that the subtitle is easier to recognize.
 When camera videos are not displayed, the content of each video needs to be supplemented and explained more than when camera videos are displayed, even at the same map scale (for example, 1:100,000). Therefore, when camera videos are not displayed, the number of metadata items included in the summary subtitle data, or the number of characters in the summary subtitle data, may be increased compared with when camera videos are displayed.
 That is, when camera videos are not displayed, the summarization degree is made larger than when camera videos are displayed, even at the same map scale. For example, when camera videos are displayed and the map scale is 1:100,000, the summarization degree is set to 20, and summary subtitle data containing two items (photographer information and title) is displayed as the subtitle. On the other hand, when camera videos are not displayed and the map scale is 1:100,000, the summarization degree is set to 30, and summary subtitle data containing three items (photographer information, producer information, and title) is displayed as the subtitle.
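 As a sketch, this adjustment could be a simple increment applied when camera videos are replaced by icons; the increment of 10 matches the 20-to-30 example above but is otherwise an assumption.

```python
def effective_degree(base_degree, videos_shown, bump=10):
    """Raise the summarization degree when only icons are shown.

    bump=10 reproduces the 20 -> 30 example at 1:100,000; the exact
    increment is an assumed policy, not specified by the source.
    """
    return base_degree if videos_shown else base_degree + bump
```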
 In FIGS. 10A to 10C, the subtitles CST1 to CST10 are displayed adjacent to the camera videos Ci1 to Ci10. As shown in FIG. 11, an area for displaying the subtitles CST1 to CST3 may instead be set, for example, at the right edge of the maps M1 and M5, so that the subtitles CST1 to CST3 are displayed separated from the camera videos Ci1 to Ci3. The same applies to the maps M10 and M20.
 Alternatively, the subtitles CST1 to CST10 may be displayed outside and adjacent to the camera videos Ci1 to Ci10 as in FIGS. 10A to 10C, while an area for displaying the subtitles CST1 to CST3 is also set, as shown in FIG. 11, for example at the right edge of the maps M1 and M5, so that the subtitles CST1 to CST3 are additionally displayed separated from the camera videos Ci1 to Ci3. In this case, the summarization degree of the summary subtitle data displayed adjacent to the camera videos may be different from that of the summary subtitle data displayed separated from the camera videos.
 As described above, in the second embodiment, as shown in FIGS. 10A and 10B, the size of the subtitle display area is reduced in stages according to the map scale, without changing the size of the camera video display area, and the types of subtitle items displayed are dropped in stages, starting from those with the lowest priority. Thus, according to the second embodiment, who is shooting what video, for what purpose, at which position on the map can be visualized at progressively higher levels of abstraction.
 Also in the second embodiment, as shown in FIG. 10C, when a certain scale is reached, simple icons are displayed instead of the camera videos, and the subtitle display area is made larger than the subtitle display area at the immediately preceding scale. Thus, according to the second embodiment, even when the number of video camera 30 videos increases, the subtitle information can compensate by visualizing who is shooting what video, for what purpose, at which position on the map.
 Another example of display state transitions when the map scale is changed will be described with reference to FIGS. 12A to 12C. In FIGS. 12A to 12C, on all of the maps M1 to M20, the size of the camera videos varies according to the number of video camera 30 videos displayed on each map. In the example shown in FIGS. 12A to 12C, each subtitle is displayed inside its camera video, except for the camera videos Ci1 to Ci10 displayed on the map M20.
 Also in FIGS. 12A to 12C, each subtitle may be displayed outside and adjacent to its camera video as in FIGS. 10A to 10C, or an area for displaying the subtitles may be set as in FIG. 11 so that each subtitle is displayed separated from its camera video.
 FIG. 13 shows an example in which the summarization degree is set according to the number of display channels, that is, the number of camera videos displayed within the map. As shown in FIG. 13, for display channel counts of 1 to 2, 3 to 5, 6 to 10, 11 to 20, and 21 or more, the summarization degrees are 100, 80, 40, 10, and 5, respectively. The summarization degree may be set to 0 when the number of display channels is equal to or greater than a predetermined number, so that no subtitles are displayed; for example, the summarization degree may be set to 0 when the number of display channels is 21 or more.
 FIG. 14 shows an example in which the summarization degree is set according to the map scale. As shown in FIG. 14, when the scale denominator is 10,000 or less, more than 10,000 and up to 50,000, more than 50,000 and up to 100,000, and more than 100,000, the summarization degrees are 100, 70, 20, and 10, respectively. Similarly, when the map scale exceeds a predetermined scale, the summarization degree may be set to 0 so that no subtitles are displayed.
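 The mappings of FIGS. 13 and 14 can be read as lookup tables. The sketch below encodes both; the boundary handling follows the ranges stated above, and returning a fixed value for the last range is an assumption about how out-of-range inputs are treated.

```python
def degree_from_channels(n):
    """Summarization degree from the display channel count (FIG. 13)."""
    for limit, degree in ((2, 100), (5, 80), (10, 40), (20, 10)):
        if n <= limit:
            return degree
    return 5  # 21 or more channels (could be 0 to suppress subtitles)

def degree_from_scale(denominator):
    """Summarization degree from the map scale denominator (FIG. 14)."""
    for limit, degree in ((10_000, 100), (50_000, 70), (100_000, 20)):
        if denominator <= limit:
            return degree
    return 10  # beyond 1:100,000 (could be 0 to suppress subtitles)
```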
 The operation of the subtitle generation device of the second embodiment will be described using the flowchart shown in FIG. 15. In FIG. 15, when processing starts, the map data acquisition unit 209 acquires map data in step S201. In parallel with this, the camera video acquisition unit 202 and the metadata extraction unit 203 acquire the video data and metadata transmitted from each video camera 30 in step S202. Step S202 is a video acquisition step.
 In step S203, the display area inside/outside determination unit 206 detects the video cameras 30 located within the map displayed on the display unit 213 (display panel 2131). In step S204, the summarization degree setting unit 207 sets the summarization degree according to the number of video cameras 30 located within the map displayed on the display unit 213. In step S205, the subtitle summarization unit 208 summarizes the text data described in the metadata according to the summarization degree to generate summary subtitle data. Step S205 is a subtitle summarization step.
 In parallel with steps S204 and S205, the video reduction unit 210 reduces, in step S206, the video data transmitted from the video cameras 30 located within the map displayed on the display unit 213.
 In step S207, the image composition unit 212 composites the map data, the reduced video data, and the summary subtitle data. In step S208, the display unit 213 displays the composite image.
 In step S209, the map center position setting unit 216 and the map scale setting unit 217 determine whether the center position or the scale of the map has been changed. The process in which the map scale setting unit 217 sets the map scale is a map scale setting step. If the center position or the scale of the map has been changed (YES), the processing of steps S201 to S209 is repeated. If neither has been changed (NO), in step S210 the map display device 20 determines whether an instruction to end the map display has been given via the operation unit 214.
 If no instruction to end the map display has been given (NO), the processing of steps S208 to S210 is repeated. If the instruction to end the map display has been given (YES), the map display device 20 ends the processing.
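 Putting the flowchart together, the overall control flow could be sketched as the loop below; the device object and its method names are hypothetical stand-ins for the units described above.

```python
def run(device):
    """Event loop mirroring FIG. 15; 'device' bundles the units above."""
    while True:
        map_data = device.acquire_map()                      # S201
        videos, metadata = device.acquire_videos()           # S202
        visible = device.cameras_in_view(videos)             # S203
        degree = device.set_degree(len(visible))             # S204
        captions = device.summarize(metadata, degree)        # S205
        thumbs = device.shrink(visible)                      # S206
        frame = device.compose(map_data, thumbs, captions)   # S207
        while True:
            device.show(frame)                               # S208
            if device.view_changed():                        # S209: YES
                break                                        # back to S201
            if device.quit_requested():                      # S210: YES
                return
```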
 In the video transmission/reception system shown in FIG. 8, the case in which the video cameras 30 transmit moving images to the map display device 20 has been described, but the video cameras 30 may instead transmit still images to the map display device 20 at a predetermined interval. In this case, a video camera 30 may capture the still images and transmit them to the map display device 20, or a still camera may capture the still images and transmit them to the map display device 20 in place of the video camera 30. That is, it suffices that video in which a camera has captured a subject is transmitted to the map display device 20. Here, the predetermined interval is assumed to be, for example, 3 seconds.
 In the map display device 20, when multiple camera videos are superimposed on a map, the camera videos may be displayed in turn at predetermined time intervals so that multiple camera videos are not displayed simultaneously. Also, in the map display device 20, as shown in FIG. 11, an area for displaying the camera videos together with the subtitles may be set at the right edge.
 Each part of the map display device 20 shown in FIG. 9 may be implemented in hardware such as an integrated circuit, or in software (a computer program). The division of roles between hardware and software is arbitrary. The flowchart shown in FIG. 15 may be processing that the subtitle generation program of the second embodiment causes a computer to execute. The map display device 20 shown in FIG. 9 can be implemented by a browser, which is software for viewing maps.
 Similarly, the subtitle generation program may be transmitted to the map display device 20 via the network 50, or may be stored in a non-transitory storage medium and provided to the map display device 20.
<Third Embodiment>
 FIG. 16 shows a posted moving image distribution system configured to include the subtitle generation device of the third embodiment. A content server 60 and a computer 70 are connected to the network 50. The content server 60 includes a moving image storage unit 601 that stores posted moving images, a thumbnail image generation unit 602 that generates thumbnail images of the posted moving images, and a text data storage unit 603 that stores text data associated with each posted moving image.
 The content server 60 also includes a summarization degree setting unit 604 and a subtitle summarization unit 605.
 The text data may be text describing an outline of the content of a posted moving image, text supplementing the content of the posted moving image, or a comment related to the posted moving image.
 The posted moving images, or the thumbnail images of the posted moving images, distributed by the content server 60 are videos of one or more channels. The computer 70 receives the posted moving images, thumbnail images, and the like distributed by the content server 60.
 A storage unit 701 provided in the computer 70 stores a browser 702, which is software for viewing the posted moving images provided by the content server 60. By running the browser 702, the computer 70 can display on the display unit 703 thumbnail images for selecting posted moving images, and can display a posted moving image when its thumbnail image is selected.
 FIG. 17A shows an example of the display state of the display unit 703 when the computer 70 instructs the content server 60 to display large thumbnail images. The subtitle TST1 of the thumbnail image Ti1 is displayed adjacent to the thumbnail image Ti1 of the posted moving image 001, and the subtitle TST2 of the thumbnail image Ti2 is displayed adjacent to the thumbnail image Ti2 of the posted moving image 002.
 In response to the instruction to display large thumbnail images, the summarization degree setting unit 604 sets a summarization degree at which the number of characters in the text data is not reduced, or is reduced only slightly. The subtitle summarization unit 605 summarizes the text data at the summarization degree set by the summarization degree setting unit 604 to generate summary subtitle data for the subtitles TST1 and TST2. The content server 60 distributes the video data of the large thumbnail images Ti1 and Ti2 and the summary subtitle data of the subtitles TST1 and TST2 to the computer 70.
 FIG. 17B shows an example of the display state of the display unit 703 when the computer 70 instructs the content server 60 to display small thumbnail images. The subtitles TST1 to TST9 are displayed adjacent to the thumbnail images Ti1 to Ti9 of the posted moving images 001 to 009.
 In response to the instruction to display small thumbnail images, the summarization degree setting unit 604 sets a summarization degree at which the number of characters in the text data is greatly reduced. The subtitle summarization unit 605 summarizes the text data at the summarization degree set by the summarization degree setting unit 604 to generate summary subtitle data for the subtitles TST1 to TST9. The content server 60 distributes the video data of the small thumbnail images Ti1 to Ti9 and the summary subtitle data of the subtitles TST1 to TST9 to the computer 70.
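 As an illustrative sketch, the thumbnail-size-dependent behavior of FIGS. 17A and 17B could reduce to a two-level mapping like the following; the specific degree values 100 and 20 are assumptions chosen to echo the second embodiment.

```python
def thumbnail_degree(large):
    """Assumed two-level mapping: large thumbnails keep most of the text."""
    return 100 if large else 20

def captions_for_page(summarize, posts, large):
    """Summarize each post's text at the degree implied by thumbnail size.

    posts: iterable of (post_id, text) pairs; summarize(text, degree)
    is whatever summarizer the server provides.
    """
    degree = thumbnail_degree(large)
    return {post_id: summarize(text, degree) for post_id, text in posts}
```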
 In FIG. 17A or FIG. 17B, when reproduction of the posted moving image whose thumbnail image has been selected is instructed, the content server 60 distributes the moving image data of that posted moving image to the computer 70.
 In the posted moving image distribution system shown in FIG. 16, the summarization degree setting unit 604 and the subtitle summarization unit 605 are provided in the content server 60, but the browser 702 may be given the same functions to realize the display states of FIGS. 17A and 17B.
 As described above, according to the subtitle generation devices and subtitle generation programs of the first to third embodiments, subtitles can be generated in a manner corresponding to the number of channels of the one or more videos displayed on the display units 5, 213, and 703. With the video display device 10, the map display device 20, or the posted moving image distribution system (computer 70) that includes the subtitle generation device of the first to third embodiments or executes the subtitle generation program of the first to third embodiments, the user can comprehensively grasp the subtitles of each channel even when the number of video channels increases.
 The present invention is not limited to the first to third embodiments described above, and various modifications are possible without departing from the gist of the present invention.
 The disclosure of the present application is related to the subject matter described in Japanese Patent Application No. 2018-058459 filed on March 26, 2018, the entire disclosure of which is incorporated herein by reference.

Claims (7)

  1.  A subtitle generation device comprising a subtitle summarization unit that generates summary subtitles summarizing text data related to videos, according to the number of videos displayed on a display unit or a display video size indicating the size of the videos displayed on the display unit.
  2.  The subtitle generation device according to claim 1, further comprising:
     a position acquisition unit that acquires position information indicating a position at which the video was shot;
     a map scale setting unit that sets a scale of a map displayed on the display unit; and
     a determination unit that determines the number of videos to be displayed on the display unit from the scale of the map and the position information.
  3.  A subtitle generation device comprising:
     a map scale setting unit that sets a scale of a map displayed on a display unit; and
     a subtitle summarization unit that generates, according to the scale of the map, summary subtitles summarizing text data related to videos displayed on the display unit.
  4.  The subtitle generation device according to any one of claims 1 to 3, further comprising a speech recognition unit that generates the text data related to the video based on audio data.
  5.  The subtitle generation device according to any one of claims 1 to 3, wherein the text data is metadata in which various information related to the video is described.
  6.  A subtitle generation program that causes a computer to execute a subtitle summarization step of generating summary subtitles summarizing text data related to videos, according to the number of videos displayed or a display video size indicating the size of the displayed videos.
  7.  A subtitle generation program that causes a computer to execute:
     a map scale setting step of setting a scale of a map to be displayed; and
     a subtitle summarization step of generating, according to the scale of the map, summary subtitles summarizing text data related to videos.
PCT/JP2019/010807 2018-03-26 2019-03-15 Subtitle generation device and subtitle generation program WO2019188406A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-058459 2018-03-26
JP2018058459A JP2019169928A (en) 2018-03-26 2018-03-26 Subtitle generation device and subtitle generation program

Publications (1)

Publication Number Publication Date
WO2019188406A1 true WO2019188406A1 (en) 2019-10-03

Family

ID=68060075

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/010807 WO2019188406A1 (en) 2018-03-26 2019-03-15 Subtitle generation device and subtitle generation program

Country Status (2)

Country Link
JP (1) JP2019169928A (en)
WO (1) WO2019188406A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7282118B2 (en) * 2021-03-16 2023-05-26 株式会社ほぼ日 Program and information processing method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009229172A (en) * 2008-03-21 2009-10-08 Alpine Electronics Inc Information-providing system and information-providing method
JP2017067834A (en) * 2015-09-28 2017-04-06 株式会社オプティム A taken image display device of unmanned aircraft, taken image display method, and taken image display program
JP2017131552A (en) * 2016-01-29 2017-08-03 ブラザー工業株式会社 Information processor and program

Also Published As

Publication number Publication date
JP2019169928A (en) 2019-10-03

Similar Documents

Publication Publication Date Title
US20090172512A1 (en) Screen generating apparatus and screen layout sharing system
US20160261927A1 (en) Method and System for Providing and Displaying Optional Overlays
JP5013832B2 (en) Image control apparatus and method
JP6399725B1 (en) Text content generation device, transmission device, reception device, and program
JP4935818B2 (en) Digital broadcast receiving apparatus and digital broadcast receiving method
US8543912B2 (en) Methods, systems, and computer products for implementing content conversion and presentation services
JP6700957B2 (en) Subtitle data generation device and program
JPH0965300A (en) Information transmission/reception system, transmission information generator and received information reproducing device used for this system
US20100083314A1 (en) Information processing apparatus, information acquisition method, recording medium recording information acquisition program, and information retrieval system
EP2566173A1 (en) Reception apparatus, reception method and external apparatus linking system
JP2004312208A (en) Device, method and program for displaying video
US20130117798A1 (en) Augmenting content generating apparatus and method, augmented broadcasting transmission apparatus and method, and augmented broadcasting reception apparatus and method
WO2019188406A1 (en) Subtitle generation device and subtitle generation program
US9342813B2 (en) Apparatus and method for displaying log information associated with a plurality of displayed contents
KR101587442B1 (en) Method of providing augmented contents and apparatus for performing the same, method of registering augmented contents and apparatus for performing the same, system for providing targeting augmented contents
US20130104165A1 (en) Method and apparatus for receiving augmented broadcasting content, method and apparatus for providing augmented content, and system for providing augmented content
US20080013917A1 (en) Information intermediation system
JP2006100949A (en) Program table video signal generating apparatus, program table video control apparatus, and television receiver
JP2009100163A (en) Content playback apparatus, content playback system, content playback method, and program
JP2015037290A (en) Video control device, video display device, video control system, and method
JP2007104540A (en) Device, program and method for distributing picked-up image
KR100889725B1 (en) Presentation Method and VOD System for Contents Information Provision in VOD Service
US20220224980A1 (en) Artificial intelligence information processing device and artificial intelligence information processing method
WO2023042403A1 (en) Content distribution server
KR100965387B1 (en) Rich media server and rich media transmission system and rich media transmission method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19774849

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19774849

Country of ref document: EP

Kind code of ref document: A1