WO2022065537A1

WO2022065537A1 - Video reproduction device for providing subtitle synchronization and method for operating same

Info

Publication number: WO2022065537A1
Application number: PCT/KR2020/012833
Authority: WO
Inventors: 이은진
Original assignee: 주식회사 파이프랩스
Priority date: 2020-09-23
Filing date: 2020-09-23
Publication date: 2022-03-31

Abstract

A method for operating a video reproduction device according to an embodiment of the present invention comprises the steps of: acquiring video information to be reproduced; acquiring subtitle information corresponding to the video information; processing one or more timeline indices from one or more audio tracks extracted from the video information; comparing the one or more timeline indices with the subtitle information so as to calculate a correction time; and outputting the video information which enables synchronization of the subtitle information according to the correction time.

Description

Video reproducing apparatus providing subtitle synchronization and operating method therefor

The present invention relates to an image reproducing apparatus and an operating method thereof. More specifically, the present invention relates to a video reproducing apparatus providing subtitle synchronization and an operating method thereof.

In the midst of the modern flood of information, the volume of video content is continuously increasing, and displaying subtitles together with multimedia video content so that it can be used effectively has become a very common technique.

In particular, as viewing of video content on mobile terminals gradually increases and becomes global, video content with caption information is increasing day by day, and the function of adding subtitles is also gradually improving.

However, although this automatic subtitle addition function is being improved by artificial intelligence and machine learning, its recognition rate is not high, and incorrect recognition causes the problem that the utterance timing of a speaker in the actual video information and the subtitle content are not accurately synchronized.

In addition, although a subtitle text file in which the actual time sections are marked is sometimes provided separately for video content, minute errors arise from differences in how the playback applications of mobile operating systems behave, so that the speech timing marked for the video and the timing of the actual audio speech sometimes disagree. In particular, in the process of distributing video content, synchronization is also lost because of errors introduced by conversion or editing for each device and operating system.

To address this, video players provide a function that lets the viewer adjust the audio sync manually, but it is difficult to adjust the sync accurately this way, and the problem cannot be solved when the deviation differs from section to section.

In particular, setting subtitle synchronization becomes increasingly difficult when automated subtitle technologies such as the aforementioned speech-to-text (STT) conversion and machine translation are applied in combination.

The present invention has been devised to solve the above problems. An object of the present invention is to provide a video reproducing apparatus and an operating method thereof that calculate a correction time for subtitle information by using an audio level-based timeline index extracted from the audio track information of the video information, and that output video information with which the subtitle information can be synchronized according to the correction time, thereby enabling synchronization correction for each subtitle time section at an appropriate timing according to the synchronization method and settings desired by a user.

According to an embodiment of the present invention, there is provided a method of operating an image reproducing apparatus, the method comprising: acquiring image information to be reproduced; obtaining subtitle information corresponding to the image information; processing one or more timeline indices from one or more audio tracks extracted from the image information; calculating a correction time by comparing the one or more timeline indexes with the subtitle information; and outputting the image information with which the subtitle information can be synchronized according to the correction time.

In addition, an apparatus according to an embodiment of the present invention for solving the above problems is an image reproducing apparatus, comprising: an image information acquisition unit for acquiring image information to be reproduced; a caption obtaining unit obtaining caption information corresponding to the image information; a timeline index processing unit that processes one or more timeline indices from one or more audio tracks extracted from the image information; a correction time calculator for calculating a correction time by comparing the one or more timeline indexes with the subtitle information; and an output unit for outputting the image information with which the subtitle information can be synchronized according to the correction time.

According to an embodiment of the present invention, a correction time for subtitle information is calculated using an audio level-based timeline index extracted from the audio track information of image information, and image information with which the subtitle information can be synchronized according to the correction time is output. This makes it possible to provide an image reproducing apparatus and an operating method thereof that enable synchronization correction for each subtitle time section at an appropriate timing according to the synchronization method and settings desired by a user.

In addition, according to an embodiment of the present invention, a timeline index is synthesized and calculated according to audio track and STT subtitle information, thereby providing more accurate synchronization correction for each subtitle time section.

FIG. 1 is a block diagram illustrating an image reproducing apparatus according to an embodiment of the present invention.

FIG. 2 is a block diagram illustrating a subtitle synchronizer in more detail according to an embodiment of the present invention.

FIG. 3 is a flowchart illustrating a method of operating an image reproducing apparatus according to an embodiment of the present invention.

FIGS. 4 to 6 are diagrams for explaining a correction time calculation method according to an embodiment of the present invention.

FIGS. 7 to 9 are diagrams for explaining an image playback interface according to an embodiment of the present invention.

FIGS. 10 and 11 are diagrams illustrating the timeline index according to each setting and synthesis operation according to an embodiment of the present invention, and the corresponding correction of actual captions.

The following is merely illustrative of the principles of the invention. Therefore, those skilled in the art will be able to devise various devices that, although not explicitly described or shown herein, embody the principles of the present invention and are included within its spirit and scope. Further, it should be understood that all conditional terms and examples listed herein are, in principle, expressly intended solely to enable the concept of the present invention to be understood, and that the invention is not limited to the specifically enumerated embodiments and states.

Moreover, it is to be understood that all detailed description reciting the principles, aspects, and embodiments of the invention, as well as specific embodiments, are intended to cover structural and functional equivalents of such matters. It should also be understood that such equivalents include not only currently known equivalents, but also equivalents developed in the future, i.e., all devices invented to perform the same function, regardless of structure.

Thus, for example, the block diagrams herein are to be understood as representing conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, all flowcharts, state transition diagrams, pseudo code, and the like should be understood to represent various processes that may be tangibly embodied on computer-readable media and performed by a computer or processor, whether or not such a computer or processor is explicitly shown.

In addition, the explicit use of terms presented as processor, control, or similar concepts should not be construed as referring exclusively to hardware capable of executing software, and should be understood to implicitly include, without limitation, digital signal processor (DSP) hardware, read-only memory (ROM) for storing software, random access memory (RAM), and non-volatile memory. Other common hardware may also be included.

The above objects, features, and advantages will become more apparent through the following detailed description taken in conjunction with the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains will be able to easily implement the technical idea of the present invention. In addition, in the description of the present invention, if it is determined that a detailed description of a known technology related to the present invention may unnecessarily obscure the gist of the present invention, the detailed description thereof will be omitted.

Hereinafter, a preferred embodiment according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating an image reproducing apparatus according to an embodiment of the present invention, and FIG. 2 is a block diagram illustrating a subtitle synchronizer according to an embodiment of the present invention in more detail.

Referring to FIG. 1, an image reproducing apparatus 100 according to an embodiment of the present invention includes a controller 110, a communication unit 105, a user input unit 120, a video reproducing unit 130, a subtitle synchronizer 140, an output unit 150, a setting unit 160, and a storage unit 170.

The controller 110 includes one or more microprocessors for controlling the overall functional operation of the image reproducing apparatus 100 and each component.

The communication unit 105 includes one or more communication modules for performing communication between the image reproducing apparatus 100 and a network.

Here, the communication unit 105 may be connected wirelessly or by wire through a local area network (LAN) or the Internet, through a Universal Serial Bus (USB) port, through a mobile communication network such as 3G, 4G, or 5G, or through a short-range wireless communication method such as Near Field Communication (NFC), Radio Frequency Identification (RFID), or Wi-Fi. For example, the communication unit 105 may include a mobile communication module, a wireless Internet module, a short-range communication module, or a wired communication module.

The user input unit 120 may include one or more interface modules for receiving a user input. More specifically, the user input unit 120 may receive and process various interface input information for reproduction of a caption video, such as input information for video reproduction, setting information for caption synchronization, and caption synchronization input information.

The video reproducing unit 130 obtains and reproduces image information according to a user input, and the reproduced image information may be output through the output unit 150 .

The video reproducing unit 130 may include one or more image decoding modules for reproducing image information. The image decoding module obtains and decodes a bitstream of image information encoded with one or more codecs, thereby obtaining image frames and an audio signal in a form that the output unit 150 can output. Here, the image information may be obtained from an image file pre-stored in the storage unit 170 of the image reproducing apparatus 100 or may be obtained from an image stream received from an external server and buffered in the storage unit 170.

In addition, the video reproducing unit 130 may obtain caption information corresponding to the image information and insert it into the image frame information output from the output unit 150. Here, the caption information may be obtained from a caption file pre-stored in the storage unit 170 in correspondence with the video file or video stream, from STT subtitle information generated from the audio signal of the video file or video stream using an STT (Speech To Text) application process, or from second-language STT subtitle information obtained through machine translation of the STT subtitle information.

Then, the subtitle synchronization unit 140 calculates a correction time for synchronizing the subtitle information with the image information according to a user input from the user input unit 120, and uses the correction time to perform temporal correction of the subtitle information so that the dialogue start point of each subtitle section is synchronized with the dialogue start point in the video.

More specifically, the subtitle synchronizer 140 may process one or more timeline indexes from one or more audio tracks extracted from the image information, calculate the correction time by comparing the one or more timeline indexes with the subtitle information, and output, through the output unit 150, image information with which the subtitle information can be synchronized according to the correction time.

In addition, the caption synchronizer 140 may output, through the output unit 150 , image information in which the caption sync is corrected according to the correction time synchronization for each caption section in response to a user request input through the video information interface.
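The per-section temporal correction described above can be sketched as follows. This is only an illustration under assumptions: the cue layout `(start, end, text)`, the function name `apply_corrections`, and the use of seconds are hypothetical and do not come from the specification.

```python
def apply_corrections(cues, corrections):
    """Shift each cue by its own correction time (all values in seconds).

    cues: list of (start, end, text) tuples (hypothetical layout).
    corrections: one signed shift per cue, in the same order.
    """
    shifted = []
    for (start, end, text), delta in zip(cues, corrections):
        new_start = max(0, start + delta)      # never place a cue before the video begins
        new_end = max(new_start, end + delta)  # keep the cue from ending before it starts
        shifted.append((new_start, new_end, text))
    return shifted


cues = [(121, 124, "first line"), (151, 155, "second line")]
corrected = apply_corrections(cues, [30, -2])
```

Because each section carries its own shift, a deviation that differs from section to section can be corrected without applying a single global offset.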

Here, the output unit 150 may include a display module and an audio output module for outputting image reproduction information in which subtitle information is inserted, and may further output, under the control of the controller 110, a graphical user interface for receiving a user input through the user input unit 120.

According to this embodiment of the present invention, the correction time of the subtitle information is calculated using the audio level-based timeline index extracted from the audio track information of the video information, and image information with which the subtitle information can be synchronized according to the correction time is output, so that synchronization correction can be performed for each subtitle time section at an appropriate timing according to the synchronization method and settings desired by the user. This will be described in more detail with reference to FIG. 2.

Referring to FIG. 2, the caption synchronizer 140 according to an embodiment of the present invention includes an image information acquisition unit 141, a caption acquisition unit 143, an audio track extractor 145, a timeline index processing unit 147, and a correction time calculator 149.

First, the image information acquisition unit 141 acquires image information being reproduced by the video playback unit 130 .

Then, the subtitle acquisition unit 143 acquires the subtitle information obtained in advance in correspondence with the image information reproduced by the video playback unit 130. As described above, the subtitle information may be obtained from a subtitle file pre-stored in the storage unit 170 in correspondence with an image file or an image stream, from STT subtitle information generated from the audio signal of the image file or image stream using an STT (Speech To Text) application process, or from second-language STT subtitle information obtained through machine translation of the STT subtitle information.

In addition, the audio track extractor 145 extracts one or more audio track information from the image information reproduced by the video reproducing unit 130 . Here, there may be one or more audio track information, and a plurality of audio track information according to a plurality of languages may be included in the image file information.

Then, the timeline index processing unit 147 processes one or more timeline indexes from one or more audio tracks extracted from the image information.

Here, the timeline index processing may include mapping an amount of change in audio level for each timeline to a timeline index based on the volume information of the audio track. Accordingly, it is possible to determine a comparison criterion between the dialogue starting point in the image information and the dialogue starting point based on the amount of change in the audio level.
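As a rough sketch of this mapping, the audio track can be cut into fixed windows, with each window's volume and its change from the previous window stored as one timeline index entry. The RMS volume measure, the one-second window, and the field names are assumptions made for illustration; the specification does not fix them.

```python
import math


def timeline_index(samples, sample_rate, window_s=1.0):
    """Map each time window to its audio level and the change from the previous window."""
    window = int(sample_rate * window_s)
    index = []
    prev_level = 0.0
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        # RMS amplitude is used here as a stand-in for "volume information".
        level = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        index.append({
            "time_s": start / sample_rate,
            "level": level,
            "delta": level - prev_level,
        })
        prev_level = level
    return index


# Two windows of silence followed by one window of constant-amplitude signal.
samples = [0.0] * 16 + [0.5] * 8
idx = timeline_index(samples, sample_rate=8)
```

In this toy input the third entry has the only positive `delta`, which is the kind of level change that can serve as a candidate dialogue start point.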

Accordingly, the correction time calculator 149 calculates a correction time by comparing the one or more timeline indices with the subtitle information, and the output unit 150 outputs the image information with which the subtitle information can be synchronized according to the correction time.

More specifically, the correction time calculator 149 may variably determine the synchronization processing between the subtitle information and the image information based on the timeline index according to the format of the subtitle information and the setting of the setting unit 160 .

For example, the caption information may include caption section file text information obtained from the caption file. In this case, the correction time calculation unit 149 may calculate the correction time for each subtitle section by comparing the dialogue start point information identified from the timeline index with the dialogue start point information of the caption section file text information.

In addition, the caption information may include STT text information for a caption section obtained by STT (Speech To Text) conversion from the audio track information. In this case, the correction time calculator 149 may calculate the correction time for each subtitle section by comparing the conversation start point information identified from the timeline index with the conversation start point information of the STT text information of the subtitle section.
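A minimal sketch of the per-section comparison follows, assuming that dialogue start points are already available as lists of seconds and that each subtitle section is paired with the nearest audio start point. The nearest-neighbour pairing and the name `section_corrections` are illustrative assumptions; the specification does not say how sections are paired.

```python
def section_corrections(audio_starts, subtitle_starts):
    """Return one correction time (seconds) per subtitle section.

    Each subtitle start is paired with the nearest dialogue start found in
    the timeline index; the correction is audio start minus subtitle start.
    """
    corrections = []
    for st in subtitle_starts:
        nearest = min(audio_starts, key=lambda at: abs(at - st))
        corrections.append(nearest - st)
    return corrections


audio_starts = [151, 190, 2170]      # dialogue starts from the timeline index
subtitle_starts = [121, 185, 2161]   # starts from a subtitle file or STT text
per_section = section_corrections(audio_starts, subtitle_starts)
```

Nearest-neighbour pairing can mismatch sections when the offset is larger than the gap between neighbouring dialogues, so a real implementation would likely constrain the pairing further.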

Also, the timeline index processing unit 147 may map and allocate the timeline index for each subtitle time section divided according to the subtitle section threshold setting. This enables more accurate dialogue starting point information to be determined by extending or reducing the number of subtitle sections in the same audio track.

That is, there may be multiple pieces of dialogue start point information, and their number may be increased or decreased according to the subtitle section threshold setting, which may be determined according to a user setting or the like. The lower the subtitle section threshold is set, the more accurately the correction time may be calculated, but the amount of computation of the image reproducing apparatus 100 may increase.

In addition, the timeline index processing unit 147 may generate a new timeline index by performing a subtitle section synthesis operation using the audio level index for each time period calculated from the audio track and the subtitle section index obtained by STT (Speech to Text) conversion from the audio track information.

Referring back to FIG. 1, the setting unit 160 may set the subtitle section threshold information and the weight information for each index used in the synthesis operation of the timeline index processing unit 147, so that the user can determine detailed settings for more accurate subtitle synchronization by entering them through a simple interface.

In addition, the output unit 150 may include at least one display module and audio output module for outputting image information, audio information extracted from the image information, and subtitle information, and may output a setting interface for the above-described subtitle synchronization setting information.

In addition, the output unit 150 may reproduce the video information reproduced by the video playback unit 130 together with the unsynchronized subtitle information, output a notification interface indicating that the subtitle information can be synchronized once the correction time has been calculated, and, according to a user input corresponding to the notification interface, output the image information in which the subtitle information is synchronized according to the correction time.

Accordingly, the user can easily check whether synchronization is ready once the correction time has been calculated, and simply by entering an execution input through the subtitle synchronization interface, synchronization of the subtitle information with the image information according to the calculated correction time is processed quickly, which improves user convenience.

FIG. 3 is a flowchart for explaining a method of operating an image reproducing apparatus according to an embodiment of the present invention, FIGS. 4 to 6 are views for explaining a correction time calculation method according to an embodiment of the present invention, and FIGS. 7 to 9 are diagrams for explaining an image playback interface according to an embodiment of the present invention.

Referring to FIG. 3 , first, the video reproducing apparatus 100 according to an embodiment of the present invention obtains caption information corresponding to the video reproduction information reproduced by the video reproducing unit 130 ( S101 ).

Here, as described above, the caption information may be obtained from a separate caption file, obtained by STT, or may be obtained from text data processed by machine translation from STT.

Then, the image reproducing apparatus 100 extracts audio track information from the image information being reproduced (S103).

Thereafter, the image reproducing apparatus 100 performs timeline index processing from one or more audio tracks (S105).

Here, the timeline index processing is devised to calculate the correction time. Referring to FIGS. 4 to 6, the timeline index may be mapped to level information on the volume increase calculated for each time section of the audio track. That is, the timeline index processing unit 147 may map the volume increase information for each timeline to the timeline index information. When the volume increase is equal to or greater than a certain value, that point can be regarded as corresponding to dialogue start point information in the video information, and synchronization with the subtitle information can accordingly be performed based on the volume increase.
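The threshold test on the volume increase might look like the sketch below. The per-window level list, the one-second window, and the `min_increase` value of 0.2 are assumptions chosen for the example, not values given in the specification.

```python
def dialogue_starts(levels, window_s=1.0, min_increase=0.2):
    """Return the times (seconds) at which the level rises by at least min_increase."""
    starts = []
    prev = 0.0
    for i, level in enumerate(levels):
        if level - prev >= min_increase:  # volume increase at or above the set value
            starts.append(i * window_s)
        prev = level
    return starts


levels = [0.0, 0.05, 0.6, 0.55, 0.1, 0.5]  # one audio level per window
starts = dialogue_starts(levels)
```

Only the jumps into the third and sixth windows exceed the threshold here, so those two window times are reported as candidate dialogue start points.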

More specifically, as shown in FIG. 4, even if the dialogue start time identified from the video differs from the existing subtitle start time, the timeline index processing unit 147 performs index verification based on the extracted audio track information, the synchronization correction time between the video and the subtitles is calculated, and the dialogue start time of the subtitle information is corrected based on that correction time, so that very accurate and fast subtitle synchronization correction is possible.

For example, as shown in FIG. 5, when a subtitle file is obtained as the subtitle information, the timeline index processing unit 147 may perform synchronization correction processing by comparing the time information of the subtitle file with the timeline index.

And, as shown in FIG. 6, the video reproducing apparatus 100 can compare the STT caption information obtained from the audio track with the timeline index information and synchronize it accordingly, thereby outputting caption information synchronized with the audio of the video information. Because the timeline index information compensates for problems caused by the low recognition rate of STT, this can also have the effect of improving STT performance.

When the timeline index process is completed in this way, the video reproducing apparatus 100 outputs caption-synchronized video information (S107).

Here, referring to FIGS. 7 to 9 , the image reproducing apparatus 100 according to an embodiment of the present invention outputs an interface for image reproduction, but when the timeline index processing is not completed, a 'subtitle sync OFF' notification interface can be inserted and outputted.

Also, referring to FIG. 8, when the timeline index process is completed, the video reproducing apparatus 100 according to an embodiment of the present invention may output a 'subtitle sync ON' function interface indicating that synchronized subtitle information can be inserted.

In this way, automatic subtitle sync correction can be provided when the user simply selects the 'Subtitle Sync ON' function button. When the subtitle sync correction has been processed, correction application completion information may be output as shown in the upper part of FIG. 8. However, depending on the interface setting, only the correction application completion message may be displayed, and the detailed correction time information (+30) may be omitted.

Meanwhile, FIG. 9 is a diagram illustrating a setting interface of the setting unit 160 according to an embodiment of the present invention.

Referring to FIG. 9, the setting interface according to an embodiment of the present invention allows at least one of the following to be set as a reference element for video and subtitle synchronization: an audio level reference setting, an audio level and subtitle (STT or subtitle file) reference synthesis setting, and an audio level and subtitle (machine translation) reference synthesis setting. In particular, FIG. 9 shows a case in which the user has entered audio level and machine-translated subtitle information as the current settings.

In addition, through the setting interface, the setting unit 160 may store and manage, as index acquisition targets for generating a timeline index for each audio track according to the user's settings, timeline audio index information, STT text index information, STT machine translation index information, and synthesis weight information.

More specifically, referring to FIGS. 10 and 11 , the timeline index may be mapped for each subtitle section. For example, the first section may be from 02:01 to 02:42, ..., the 131st section may be from 36:01 to 36:09. Accordingly, the audio conversation section of the video for each timeline may be mapped for each timeline index.

In particular, the voice dialogue section index of the audio track may be separated and indexed into each dialogue section according to the calculation of the timeline audio level, and the separation unit may be, for example, 1000 milliseconds.

And, for index mapping, the timeline index processing unit 147 may acquire level type information from timeline audio information obtained from an audio track. The level type information may include dialogue (D) type information, background sound (M) type information, dialogue and background sound (DM) type information, or machine learning audio model type information.

Accordingly, the timeline index processing unit 147 may allocate a level type to each subtitle section to which a timeline index is assigned, and those subtitle sections may be classified according to the various level types or configured separately.
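One possible in-memory shape for such an index is sketched below. The class and field names are hypothetical, the machine learning audio model type is left out for brevity, and the level types assigned to the two example sections are invented for illustration; only the section numbers and times come from the example above.

```python
from dataclasses import dataclass
from enum import Enum


class LevelType(Enum):
    DIALOGUE = "D"
    BACKGROUND = "M"
    DIALOGUE_AND_BACKGROUND = "DM"


@dataclass(frozen=True)
class TimelineIndexEntry:
    section_no: int        # subtitle section number
    start: str             # section start, mm:ss
    end: str               # section end, mm:ss
    level_type: LevelType  # level type allocated to the section


entries = [
    TimelineIndexEntry(1, "02:01", "02:42", LevelType.DIALOGUE),
    TimelineIndexEntry(131, "36:01", "36:09", LevelType.DIALOGUE_AND_BACKGROUND),
]

# Sections containing dialogue are the ones relevant for subtitle comparison.
dialogue_sections = [e.section_no for e in entries if "D" in e.level_type.value]
```

Keeping the level type on each entry lets later steps skip sections that contain only background sound when searching for dialogue start points.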

Meanwhile, the correction time calculator 149 may calculate the difference time information calculated according to the comparison between the audio section index corresponding to the timeline index process and the caption section index of the subtitle file or STT subtitle information as the correction time.

Here, let T1 be the synchronization reference time (e.g., 00:07), let ATn be the conversation start time (e.g., 02:31) of the first audio level section of the timeline index that begins after the synchronization reference time T1, and let STn be the conversation start time after T1 (e.g., 02:01) according to the subtitle section index corresponding to that audio level section. The correction time STa may then be calculated as ATn - STn, and the actual correction time is, for example, 02:31 - 02:01 = +30 seconds.
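The calculation ATn - STn can be worked through in a few lines, assuming the times are mm:ss strings read as minutes and seconds and that both indexes are given as plain lists of start times. The helper names `to_seconds` and `correction_time` are hypothetical.

```python
def to_seconds(mmss):
    """Convert an mm:ss string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def correction_time(t1, audio_starts, subtitle_starts):
    """Return ATn - STn in seconds for the first starts after reference time T1."""
    ref = to_seconds(t1)
    at_n = min(s for s in map(to_seconds, audio_starts) if s > ref)     # ATn
    st_n = min(s for s in map(to_seconds, subtitle_starts) if s > ref)  # STn
    return at_n - st_n


sta = correction_time("00:07", ["02:31", "03:01"], ["02:01", "02:31"])
```

With the example values, ATn is 151 seconds and STn is 121 seconds, so the subtitles would be shifted 30 seconds later. If no start time follows T1, `min` raises `ValueError`, which a caller would need to handle.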

Meanwhile, referring to FIG. 11, the audio level-based timeline index may include information sampled from the timeline for each threshold time period. The point in time up to which the next conversation start point does not appear within a specific timeline index may serve as the end point of that section. By setting the threshold time differently, for example to short, medium, or long, the setting unit 160 may cause the subtitle time periods to be compared to be subdivided or merged as necessary.

However, as shown in FIG. 11, the timeline index processing unit 147 preferably sets, as the basic section, the threshold time that yields the smallest distribution error between the subtitle sections obtained from the caption information and the timeline index sections obtained from the audio level information.
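Selecting the threshold time could be sketched as a small search over candidate values. The specification does not define "distribution error", so the metric below (a penalty for a differing section count plus the summed difference in section lengths) is purely an assumption, as are the function names and the candidate thresholds.

```python
def split_sections(starts, threshold_s):
    """Group start points into sections; a gap above threshold_s opens a new section."""
    sections = []
    for t in starts:
        if sections and t - sections[-1][-1] <= threshold_s:
            sections[-1].append(t)
        else:
            sections.append([t])
    return [(s[0], s[-1]) for s in sections]


def distribution_error(audio_sections, subtitle_sections):
    """Hypothetical error: section-count mismatch plus section-length mismatch."""
    count_gap = abs(len(audio_sections) - len(subtitle_sections))
    length_gap = sum(
        abs((a_end - a_start) - (s_end - s_start))
        for (a_start, a_end), (s_start, s_end) in zip(audio_sections, subtitle_sections)
    )
    return count_gap * 10 + length_gap


def best_threshold(starts, subtitle_sections, candidates):
    """Pick the candidate threshold whose sections best match the subtitle sections."""
    return min(
        candidates,
        key=lambda th: distribution_error(split_sections(starts, th), subtitle_sections),
    )


starts = [10, 11, 12, 20, 21, 40]                 # dialogue activity, seconds
subtitle_sections = [(5, 7), (15, 16), (35, 35)]  # (start, end) from the captions
chosen = best_threshold(starts, subtitle_sections, candidates=[1, 10, 30])
```

Here the shortest threshold reproduces three sections with the same lengths as the subtitle sections, while the longer thresholds merge sections and score worse.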

Meanwhile, referring to FIG. 12, the timeline index processing unit 147 may generate the timeline index by synthesizing the timeline index calculated from the audio level and the timeline index calculated from the STT according to a predetermined weight.

To this end, the image reproducing apparatus 100 may receive a user's synthesis weight setting input through the setting unit 160, and the timeline index processing unit 147 may calculate the correction time more effectively by combining, based on the received synthesis weight information, the audio level-based timeline index obtained from the image information with the STT-based timeline index. This can be processed for each subtitle section, so that accurate image synchronization can be achieved for each subtitle section.
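One simple way to read the weighted synthesis is as a per-section weighted average of the two indexes' dialogue start times. That reading, the function name, and the single `audio_weight` parameter are assumptions; the specification states only that the indexes are combined according to a weight.

```python
def synthesize_index(audio_starts, stt_starts, audio_weight=0.5):
    """Blend paired start times (seconds) from the two indexes.

    audio_weight is the weight of the audio level-based index; the STT-based
    index receives the remaining 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    if len(audio_starts) != len(stt_starts):
        raise ValueError("both indexes must cover the same sections")
    stt_weight = 1.0 - audio_weight
    return [audio_weight * a + stt_weight * s for a, s in zip(audio_starts, stt_starts)]


blended = synthesize_index([150.0, 200.0], [152.0, 204.0], audio_weight=0.75)
```

A weight closer to 1 trusts the audio level more, which may suit content where STT recognition is unreliable; a weight closer to 0 trusts the STT timestamps more.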

Meanwhile, in generating the STT timeline index, the timeline index processing unit 147 may generate a morpheme-based index using the original language and the translated language, and this index may be used to calculate the correction time.

For example, the timeline index processing unit 147 may generate an STT index for each timeline from the audio track information, and if the language of the original subtitle differs from the device language information, machine translation or language synchronization suited to the language of the target device may be performed.

In addition, the timeline index processing unit 147 may perform morpheme analysis for each subtitle section and classify the subtitle section by part-of-speech (eg, classification of nouns/verbs/propositions, etc., basic type conversion).

Accordingly, the timeline index processing unit 147 may generate the audio level index for each timeline on a morpheme basis, and may also index the subtitle information based on morpheme section information and use it in calculating and comparing the correction time, so that image synchronization can be processed for each morpheme section.

On the other hand, when an original subtitle file exists and STT-processed subtitle information also exists, both an original subtitle morpheme index and an STT subtitle morpheme index may exist, and the timeline index processing unit 147 may correct timeline errors by comparing the STT subtitle morpheme index with the morpheme index of the original subtitles. That is, in this case, by generating a timeline index for comparison and sharing on a morpheme basis regardless of the audio level, more accurate video synchronization can be processed for each morpheme section of the subtitle.

For example, the timeline index processing unit 147 may compare the morphemes of the original subtitle index with those of the STT subtitle index, and may generate a corrected timeline index from the entries in which the same morphemes are shared.

A case in which morphemes are shared between the timeline indexes of the original subtitle and the STT subtitle may be illustrated as follows.

- Original (02:01 James, hello) vs STT (02:31 James)

- Original (02:31 help, yesterday, bank) vs STT (03:01 help, bank)

Accordingly, the correction time calculation unit 149 may calculate, as the correction time, the time difference between the original subtitle index entry and the STT subtitle index entry that share morphemes, which is 30 seconds in the above example.
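The morpheme-sharing comparison in the example above can be sketched as follows. The description does not specify the matching rule, so pairing each original entry with the STT entry that shares the most morphemes is an assumption, as are the function names parse_mmss and shared_morpheme_corrections. The index entries are the ones given in the example.

```python
def parse_mmss(stamp: str) -> int:
    """Convert an 'MM:SS' timestamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def shared_morpheme_corrections(original, stt):
    """Return the STT-minus-original time difference for each original entry
    that shares at least one morpheme with an STT entry (assumed matching rule)."""
    corrections = []
    for orig_stamp, orig_morphemes in original:
        # Pick the STT entry with the largest morpheme overlap.
        best = max(stt, key=lambda entry: len(orig_morphemes & entry[1]), default=None)
        if best is None or not (orig_morphemes & best[1]):
            continue  # no shared morpheme, so no correction from this entry
        corrections.append(parse_mmss(best[0]) - parse_mmss(orig_stamp))
    return corrections


# Entries taken from the example in the description.
original_index = [
    ("02:01", {"James", "hello"}),
    ("02:31", {"help", "yesterday", "bank"}),
]
stt_index = [
    ("02:31", {"James"}),
    ("03:01", {"help", "bank"}),
]
print(shared_morpheme_corrections(original_index, stt_index))  # [30, 30]
```

Both matched pairs yield the same 30-second difference, which corresponds to the correction time stated for the example.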

Such correction processes of the timeline index processing unit 147 may be combined and configured through the setting unit 160.

The above-described correction processes may include, for example, at least one of an audio-level-based timeline index generation process, an original subtitle timeline index generation process, and an STT subtitle timeline index generation process in the timeline index processing unit 147.

In addition, the correction time calculation unit 149 may use at least one of the audio-level-based timeline index, the original subtitle timeline index, and the STT subtitle timeline index to perform at least one of comparison error correction, synthesis correction using weight values, and error correction by morpheme comparison or morpheme sharing.

The setting unit 160 may receive one or more pieces of setting information for this purpose and store them in the storage unit 170, thereby enabling effective and optimized subtitle synchronization settings for each image reproducing apparatus 100.

The method according to the present invention described above may be produced as a program to be executed by a computer and stored in a computer-readable recording medium. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices.

The computer-readable recording medium may also be distributed over network-connected computer systems, so that computer-readable code can be stored and executed in a distributed manner. In addition, functional programs, codes, and code segments for implementing the method can be easily inferred by programmers in the art to which the present invention pertains.

In addition, although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described above. Various modifications may be made by those of ordinary skill in the art to which the present invention belongs without departing from the gist of the present invention as claimed in the claims, and these modifications should not be understood separately from the technical spirit or scope of the present invention.

Claims

In the method of operating a video reproducing apparatus,

acquiring image information to be played back;

obtaining subtitle information corresponding to the image information;

processing one or more timeline indices from one or more audio tracks extracted from the image information;

calculating a correction time by comparing the one or more timeline indexes with the subtitle information; and

outputting the image information in which the subtitle information can be synchronized according to the correction time

A method of operating a video reproducing device.
2. The method of claim 1,

The step of processing the timeline index,

based on the volume information of the audio track, mapping the amount of change in audio level for each timeline to a timeline index

A method of operating a video reproducing device.
3. The method of claim 1,

The subtitle information includes subtitle section file text information obtained from the subtitle file,

The step of calculating the correction time comprises:

calculating the correction time by comparing dialogue starting point information identified from the timeline index with dialogue starting point information of the subtitle section file text information

A method of operating a video reproducing device.
4. The method of claim 1,

The subtitle information includes STT text information of a subtitle section obtained according to STT (Speech To Text) conversion from the audio track information,

The step of calculating the correction time comprises:

calculating the correction time by comparing dialogue starting point information identified from the timeline index with dialogue starting point information of the subtitle section STT text information

A method of operating a video reproducing device.
5. The method of claim 1,

wherein the timeline index is allocated to each subtitle time period divided according to the setting of the subtitle period threshold value.

A method of operating a video reproducing device.
6. The method of claim 1,

wherein the timeline index is calculated according to a subtitle section synthesis operation using an audio level index for each time period calculated from the audio track and a subtitle section index obtained by STT (Speech To Text) conversion from the audio track information.

A method of operating a video reproducing device.
7. The method of claim 6,

The method further comprising the step of setting subtitle section threshold information and weight information for each index for the synthesis operation

A method of operating a video reproducing device.
8. The method of claim 1,

The step of outputting the image information includes:

reproducing video information together with unsynchronized subtitle information;

outputting a notification interface informing that the subtitle information can be synchronized when the correction time is calculated; and

according to a user input corresponding to the notification interface, outputting the image information in which the subtitle information is synchronized according to the correction time

A method of operating a video reproducing device.
In the video reproducing apparatus,

an image information acquisition unit for acquiring image information to be reproduced;

a caption obtaining unit obtaining caption information corresponding to the image information;

a timeline index processing unit that processes one or more timeline indices from one or more audio tracks extracted from the image information;

a correction time calculator for calculating a correction time by comparing the one or more timeline indexes with the subtitle information; and

an output unit for outputting the image information in which the subtitle information can be synchronized according to the correction time

video playback device.
10. The video reproducing apparatus of claim 9,

wherein the timeline index processing unit maps, based on the volume information of the audio track, the amount of change in audio level for each timeline to a timeline index.

video playback device.
11. The video reproducing apparatus of claim 9,

The subtitle information includes subtitle section file text information obtained from the subtitle file,

The correction time calculation unit compares the dialogue starting point information identified from the timeline index with the dialogue starting point information of the subtitle section file text information to calculate the correction time for each subtitle section

video playback device.
12. The video reproducing apparatus of claim 9,

The subtitle information includes STT text information of a subtitle section obtained according to STT (Speech To Text) conversion from the audio track information,

The correction time calculation unit compares the dialogue starting point information identified from the timeline index with the dialogue starting point information of the subtitle section STT text information to calculate the correction time for each subtitle section

video playback device.
13. The video reproducing apparatus of claim 9,

wherein the timeline index is allocated to each subtitle time period divided according to the setting of the subtitle period threshold value.

video playback device.
14. The video reproducing apparatus of claim 9,

wherein the timeline index is calculated according to a subtitle section synthesis operation using an audio level index for each time period calculated from the audio track and a subtitle section index obtained by STT (Speech To Text) conversion from the audio track information.

video playback device.
15. The video reproducing apparatus of claim 9,

wherein the timeline index processing unit sets the subtitle section threshold information and the weight information for each index for the synthesis operation

video playback device.
16. The video reproducing apparatus of claim 9,

wherein the output unit reproduces the video information together with the unsynchronized subtitle information, outputs, when the correction time is calculated, a notification interface informing that the subtitle information can be synchronized, and outputs, according to a user input corresponding to the notification interface, the image information in which the subtitle information is synchronized according to the correction time

video playback device.