WO2018121001A1 - Method and system for outputting simultaneous interpretation of digital television program, and smart terminal - Google Patents


Info

Publication number
WO2018121001A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
audio
audio data
simultaneous interpretation
program
Prior art date
Application number
PCT/CN2017/106377
Other languages
French (fr)
Chinese (zh)
Inventor
何加军
Original Assignee
深圳市九洲电器有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市九洲电器有限公司
Publication of WO2018121001A1 publication Critical patent/WO2018121001A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233: Processing of audio elementary streams
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/235: Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/236: Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/23614: Multiplexing of additional data and video streams
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439: Processing of audio elementary streams
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs

Definitions

  • the present application relates to the field of digital television, and in particular, to a method and system for simultaneous interpretation of digital television programs and an intelligent terminal.
  • At present, the digital television program sound played by a digital TV set-top box is the original sound of the program, so that the user can watch the program in its original form.
  • However, the original sound of the program may be in a foreign language, for example in an English TV program.
  • TV programs spoken in a foreign language often provide bilingual subtitles so that viewers who do not understand the language can still watch them; such a viewer can only rely on the Chinese subtitles at the bottom of the screen to follow the content of the program.
  • While reading the subtitles, however, the viewer often cannot attend to the picture, which greatly affects the viewing experience and causes inconvenience to the audience.
  • S10: controlling the audio/video terminal to buffer and store the television program data stream;
  • S20: parsing and separating the video data, audio data, and subtitle data from the buffered television program data stream, marking time stamps at separation and tagging all three with synchronization labels;
  • S30: segmenting the audio data, and decoding the segmented audio data to generate segmented original PCM data;
  • S40: sending the segmented original PCM data to the cloud server, performing timbre learning against a preset timbre database, and matching and identifying the timbre of the audio data;
  • S50: translating the original PCM data in the cloud server into text in the language required by the user, comparing the translation result with the subtitle data, and using the subtitle data to correct the content and timing of the translation result;
  • S60: according to the recognized timbre, converting the corrected translation result into speech data of the same timbre, synchronizing the speech data with the video data and subtitle data according to the time stamps, and synthesizing a new program data stream for playback.
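Read together, steps S10 to S60 describe a pipeline. The following Python sketch mirrors that flow on toy data under loudly stated assumptions: the stream layout, the `translate` callback, and the timbre lookup are all hypothetical stand-ins invented for this example, and real speech synthesis is reduced to tagging text with the matched timbre.

```python
def interpret_program(stream, target_lang, timbre_db, translate):
    """Toy end-to-end sketch of steps S10-S60 (hypothetical structure).

    stream: {"video": [...], "audio": [...], "subs": [(pts, text), ...]}
    translate: callable(text, target_lang) -> translated text
    """
    # S20: the three elementary streams, each keeping its timestamps (pts)
    video, subs = stream["video"], stream["subs"]
    # S40: timbre matching stubbed as a database lookup
    timbre = timbre_db.get("speaker", "neutral")
    # S50: translate each subtitle-aligned segment
    translated = [(pts, translate(text, target_lang)) for pts, text in subs]
    # S60: "synthesize" speech in the matched timbre and remux by pts
    new_audio = [(pts, f"[{timbre}] {text}") for pts, text in translated]
    return {"video": video, "audio": new_audio, "subs": subs}

stream = {
    "video": [(0, b"frame0"), (40, b"frame1")],
    "audio": [(0, b"pcm0"), (40, b"pcm1")],
    "subs": [(0, "Hello."), (40, "Goodbye.")],
}
out = interpret_program(stream, "zh", {"speaker": "male_adult"},
                        lambda text, lang: f"{lang}:{text}")
```

Every stage that matters (demultiplexing, timbre learning, translation, synthesis) is elaborated in the embodiments below; this skeleton only shows how the timestamps carry synchronization through the whole chain.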
  • step S20 further includes:
  • after the audio data is acquired, filtering out ambient sounds other than the human voice.
  • the step S30 further includes: parsing punctuation marks in the subtitle data, acquiring a time position at each period, and segmenting the audio data according to the time position at the period.
  • the step S60 further includes: comparing the amplitude of the converted voice data with the amplitude of the original audio data, so that the amplitude of the converted voice data is consistent with the amplitude of the original audio data.
  • a television program cache module, for controlling an audio/video terminal to buffer and store a television program data stream;
  • a data separation module, which parses and separates the video data, audio data, and subtitle data from the buffered television program data stream, marking time stamps at separation and tagging all three with synchronization labels;
  • the audio segmentation module segments the audio data, and decodes the segmented audio data to generate segmented original PCM data;
  • a timbre matching module, which sends the segmented original PCM data to the cloud server for timbre learning against the preset timbre database, and matches and identifies the timbre of the audio data;
  • an audio translation module, which translates the original PCM data in the cloud server into text in the language required by the user, compares the translation result with the subtitle data, and uses the subtitle data to correct the content and timing of the translation result;
  • an audio synthesis module, which converts the corrected translation result into speech data of the same timbre according to the recognized timbre, synchronizes the speech data with the video data and subtitle data according to the time stamps, and synthesizes a new program data stream for playback.
  • the data separation module filters ambient sounds other than vocals after the audio data is acquired.
  • the audio segmentation module parses the punctuation marks in the subtitle data, acquires the temporal position at each period, and segments the audio data according to the temporal position at the period.
  • the audio synthesis module compares the amplitude of the converted voice data with the amplitude of the original audio data, so that the amplitude of the converted voice data is consistent with the amplitude of the original audio data.
  • at least one processor; and
  • a memory storing instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the simultaneous interpretation output method for digital television programs described above.
  • The embodiment of the present application further provides a non-transitory computer-readable storage medium storing computer-executable instructions for causing a smart terminal to execute the simultaneous interpretation output method for digital television programs described above.
  • The embodiment of the present application further provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a smart terminal, cause the smart terminal to perform the simultaneous interpretation output method for digital television programs described above.
  • The method, system, and smart terminal for simultaneous interpretation output of digital television programs provided by the embodiments of the present application separate the buffered television program data stream into video, audio, and subtitles; segment the audio data, recognize its timbre, and translate it; and use the subtitle data and time stamps for correction and synchronization, thereby interpreting the original audio data into audio data in the language required by the user before playback. The user can thus understand the audio of the TV program without watching the subtitles, which brings great convenience when watching TV programs; the user also does not miss the on-screen content, greatly improving the viewing experience.
  • FIG. 1 is a flow chart of a method for simultaneous interpretation output of a digital television program in an embodiment
  • FIG. 2 is a structural diagram of a simultaneous interpretation output system of a digital television program in an embodiment
  • FIG. 3 is a schematic diagram of a hardware structure of an intelligent terminal in an embodiment.
  • FIG. 1 is a flow chart of a method for simultaneous interpretation output of a digital television program in an embodiment. As shown in FIG. 1, the method includes the following steps:
  • S10: Control the audio/video terminal to buffer and store the TV program data stream.
  • S20: Parse and separate the video data, audio data, and subtitle data from the buffered TV program data stream, marking time stamps at separation and tagging all three with synchronization labels.
  • the video data, the audio data, and the subtitle data are separated for subsequent audio conversion.
  • the time stamps are marked when the three are separated, and the synchronization labels are marked for the three, thus ensuring subsequent synchronization operations.
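As an illustration of this separation step, a minimal demultiplexer can be sketched as below; the packet layout `(stream_type, pts, payload)` is an assumption made for the example, not a format defined by the application.

```python
def demux(packets):
    """Split a program stream into video/audio/subtitle tracks.

    Each packet is (stream_type, pts, payload); the timestamp (pts)
    is kept with every packet so the three tracks can be
    re-synchronized after the audio has been replaced.
    """
    tracks = {"video": [], "audio": [], "subtitle": []}
    for stream_type, pts, payload in packets:
        tracks[stream_type].append((pts, payload))
    return tracks

packets = [
    ("video", 0, b"v0"), ("audio", 0, b"a0"), ("subtitle", 0, "Hello."),
    ("video", 40, b"v1"), ("audio", 40, b"a1"),
]
tracks = demux(packets)
```

Because every entry retains its timestamp, the synchronization labels described above reduce to comparing `pts` values across the three tracks.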
  • This step further includes: after the audio data is acquired, filtering out ambient sounds other than the human voice.
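The application does not specify how this ambient-sound filtering is performed. One crude sketch, assuming the human voice mostly lies in the 300-3400 Hz band, is a first-order band-pass over PCM samples; a production system would more likely use voice-activity detection or source separation.

```python
import math

def band_pass(samples, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Crude vocal band-pass: a first-order high-pass at low_hz
    followed by a first-order low-pass at high_hz. This only
    illustrates attenuating energy outside the typical speech band."""
    dt = 1.0 / sample_rate
    rc_hp = 1.0 / (2 * math.pi * low_hz)
    rc_lp = 1.0 / (2 * math.pi * high_hz)
    a_hp = rc_hp / (rc_hp + dt)
    a_lp = dt / (rc_lp + dt)
    out, prev_in, hp, lp = [], 0.0, 0.0, 0.0
    for x in samples:
        hp = a_hp * (hp + x - prev_in)   # high-pass stage
        prev_in = x
        lp = lp + a_lp * (hp - lp)       # low-pass stage
        out.append(lp)
    return out

tone = [1.0] * 2000   # constant offset: energy far below the voice band
filtered = band_pass(tone, sample_rate=8000)
```

After the initial transient, the sub-band energy (here a DC offset) is driven toward zero while in-band speech would pass largely unchanged.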
  • S30: Segment the audio data, and decode the segmented audio data to generate segmented original PCM (pulse code modulation) data.
  • The audio data needs to be segmented; segmentation also makes the subsequent translation processing easier.
  • the audio data is decoded into original PCM data after segmentation so that it can be identified and processed.
  • Segmenting the audio data in this step specifically comprises: parsing the punctuation marks in the subtitle data, obtaining the time position of each period, and segmenting the audio data at those positions, so that the audio is segmented along complete, coherent sentences.
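A minimal sketch of this period-based segmentation, assuming subtitle cues carry millisecond start/end times and the audio is a mono PCM sample buffer (both layouts are assumptions made for the example):

```python
def split_points_from_subtitles(cues):
    """Return the end times of sentence-final cues.

    cues: list of (start_ms, end_ms, text). A cue whose text ends a
    sentence contributes its end time as an audio split point.
    """
    return [end for _, end, text in cues
            if text.rstrip().endswith((".", "?", "!", "。"))]

def segment_audio(samples, sample_rate, split_points_ms):
    """Cut a mono PCM sample buffer at the given millisecond offsets."""
    segments, start = [], 0
    for ms in split_points_ms:
        idx = ms * sample_rate // 1000
        segments.append(samples[start:idx])
        start = idx
    segments.append(samples[start:])
    return segments

cues = [(0, 1000, "Hello there."), (1000, 1800, "How"),
        (1800, 2500, "are you?")]
points = split_points_from_subtitles(cues)     # sentence boundaries
segments = segment_audio(list(range(3000)), 1000, points)
```

Cutting at sentence ends rather than at fixed intervals is what keeps each segment translatable as a complete statement.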
  • S40 Send the segmented original PCM data to the cloud server for timbre learning through a preset timbre database, and match the timbre that identifies the audio data.
  • The timbre of the audio is also an important parameter in audio translation; reproducing the timbre accurately greatly improves the effect of the simultaneous interpretation.
  • Therefore, in this embodiment, after the audio data is converted into PCM data, it is sent to the cloud server, where the preset timbre database is used to match and identify the timbre.
  • The preset timbre database is built by recording voices of speakers of different ages and genders.
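The application leaves the matching mechanism itself unspecified. One hedged sketch: represent each database entry as a feature vector (for example averaged spectral features) and pick the entry most similar to the features of the incoming PCM data by cosine similarity; the labels and three-dimensional vectors below are purely illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_timbre(features, database):
    """database: {label: feature_vector}; return the best-matching label."""
    return max(database, key=lambda label: cosine(features, database[label]))

# Illustrative database of age/gender timbre profiles
db = {
    "male_adult":   [1.0, 0.2, 0.1],
    "female_adult": [0.1, 1.0, 0.3],
    "child":        [0.2, 0.3, 1.0],
}
best = match_timbre([0.9, 0.25, 0.15], db)   # closest to "male_adult"
```

Whatever feature representation is actually used, the matched label is what later tells the synthesizer which voice to generate.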
  • S50: In the cloud server, the original PCM data is translated into the language required by the user; the translation result is compared with the subtitle data, and the subtitle data is used to correct the content and timing of the translation result.
  • Since the original PCM data is spoken in a foreign language, it needs to be translated into the language required by the user.
  • The original PCM data is first translated in the cloud server into text sentences in the language required by the user.
  • After the text translation is complete, the translation result is compared with the subtitle data; the subtitle data is used to correct the content of the translation result and to synchronize it in time, eliminating errors in content and timing.
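One way this correction could work, sketched with Python's standard `difflib` (the 0.6 similarity threshold and the cue layout are assumptions made for the example): when the machine translation of a sentence is close to the on-screen subtitle, the subtitle wording is preferred, and the cue's timing is always adopted for synchronization.

```python
import difflib

def correct_with_subtitles(translated, cues):
    """translated: list of machine-translated sentences.
    cues: list of (start_ms, end_ms, subtitle_text) in the same order.

    If a translation closely matches the human-made subtitle, trust
    the subtitle wording; either way, take the cue's timing so the
    synthesized speech stays aligned with video and subtitles.
    """
    corrected = []
    for sent, (start, end, sub) in zip(translated, cues):
        ratio = difflib.SequenceMatcher(None, sent, sub).ratio()
        corrected.append((start, end, sub if ratio > 0.6 else sent))
    return corrected

translated = ["Helo world.", "Goodbye."]
cues = [(0, 1000, "Hello world."), (1000, 2000, "XYZQ!!")]
corrected = correct_with_subtitles(translated, cues)
```

In the first cue the near-match lets the subtitle fix a translation typo; in the second the subtitle is unrelated, so the translation is kept while its timing is still taken from the cue.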
  • S60 Convert the corrected translation result into speech data of the same timbre according to the recognized timbre, and synthesize the speech data in synchronization with the video data and the subtitle data according to the time stamp, and synthesize a new program data stream for playing.
  • After the text translation result has been corrected, since the timbre of the audio data has already been identified, the corrected translation result is converted, using the recognized timbre, into speech data of the same timbre.
  • The newly translated audio data is finally synchronized with the video data and subtitle data according to the time stamps and synthesized into a new program data stream for playback. The simultaneous interpretation of the television program is thus complete, and the user can understand the program's audio, meeting the user's needs.
  • This step further comprises: comparing and adjusting the amplitude of the converted speech data against the amplitude of the original audio data, so that the amplitude of the converted speech data is consistent with the amplitude of the original audio data.
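The amplitude adjustment can be sketched as a simple RMS gain match (a sketch on toy float samples; real code would guard against silence and clipping):

```python
import math

def rms(samples):
    """Root-mean-square level of a sample buffer."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_amplitude(synth, original):
    """Scale synthesized speech so its RMS level matches the original."""
    gain = rms(original) / rms(synth)
    return [s * gain for s in synth]

original = [0.8, -0.8, 0.8, -0.8]   # loud original program audio
synth = [0.2, -0.2, 0.2, -0.2]      # quieter synthesized speech
matched = match_amplitude(synth, original)
```

Matching loudness this way keeps the interpreted track from sounding jarringly louder or quieter than the program it replaces.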
  • In summary, the simultaneous interpretation output method for digital television programs separates the buffered television program data stream into video, audio, and subtitles; segments the audio data, recognizes its timbre, and translates it; and uses the subtitle data and time stamps for correction and synchronization. The original audio data is thereby interpreted into audio data in the language required by the user and played back, so that the user can understand the program's audio without watching the subtitles, does not miss the on-screen content, and enjoys a greatly improved viewing experience.
  • the application also provides a simultaneous interpretation output system for digital television programs, as shown in FIG. 2, the system includes:
  • The TV program cache module 100 controls the audio/video terminal to buffer and store the TV program data stream.
  • Because the subsequent translation processing takes time, the TV program cache module 100 first needs to cache the TV program data stream; the processing is performed within the buffering time.
  • The data separation module 200 parses and separates the video data, audio data, and subtitle data from the buffered television program data stream, marking time stamps at separation and tagging all three with synchronization labels.
  • The data separation module 200 separates the three streams in preparation for the subsequent audio conversion.
  • Marking time stamps at separation and tagging the three streams with synchronization labels ensures that subsequent operations stay synchronized.
  • The audio data also contains a large amount of ambient sound, which interferes with the human voice; therefore, the data separation module 200 filters out ambient sounds other than the human voice after acquiring the audio data.
  • The audio segmentation module 300 segments the audio data and decodes the segmented audio data to generate segmented original PCM (pulse code modulation) data.
  • The audio segmentation module 300 needs to segment the audio data; segmentation also makes the subsequent translation processing easier.
  • The audio data is decoded into original PCM data after segmentation so that it can be recognized and processed.
  • The audio segmentation module 300 parses the punctuation marks in the subtitle data, obtains the time position of each period, and segments the audio data at those positions, so that the audio is segmented along complete, coherent sentences.
  • The timbre matching module 400 sends the segmented original PCM data to the cloud server for timbre learning against the preset timbre database, and matches and identifies the timbre of the audio data.
  • The timbre of the audio is also an important parameter in audio translation; reproducing the timbre accurately greatly improves the effect of the simultaneous interpretation. Therefore, in this embodiment, after the audio data is converted into PCM data, it is sent to the cloud server.
  • The timbre matching module 400 uses the preset timbre database to match the timbre in the PCM data, so that the original voice can be restored as faithfully as possible.
  • The preset timbre database is built by recording voices of speakers of different ages and genders.
  • The audio translation module 500 translates the original PCM data in the cloud server into text in the language required by the user, compares the translation result with the subtitle data, and uses the subtitle data to correct the content and timing of the translation result.
  • Specifically, the audio translation module 500 first translates the original PCM data in the cloud server into text sentences in the language required by the user; after the text translation is complete, the translation result is compared with the subtitle data, which is used to correct the content of the translation and to synchronize it in time, eliminating errors in content and timing.
  • The audio synthesis module 600 converts the corrected translation result into speech data of the same timbre according to the recognized timbre, synchronizes the speech data with the video data and subtitle data according to the time stamps, and synthesizes a new program data stream for playback.
  • Since the timbre of the audio data has already been identified, the audio synthesis module 600 converts the corrected translation result into speech data of that timbre, obtaining the newly translated audio data; finally, this audio is synchronized with the video data and subtitle data according to the time stamps and synthesized into a new program data stream for playback, completing the simultaneous interpretation of the television program so that the user can understand its audio.
  • the audio synthesis module 600 compares the amplitude of the converted speech data with the amplitude of the original audio data, so that the amplitude of the converted speech data is consistent with the amplitude of the original audio data.
  • In summary, the simultaneous interpretation output system for digital television programs separates the buffered television program data stream into video, audio, and subtitles; segments the audio data, recognizes its timbre, and translates it; and uses the subtitle data and time stamps for correction and synchronization. The original audio data is thereby interpreted into audio data in the language required by the user and played back, so that the user can understand the program's audio without watching the subtitles, does not miss the on-screen content, and enjoys a greatly improved viewing experience.
  • The method and system for simultaneous interpretation output of digital television programs of the present application separate the buffered television program data stream into three parts, video, audio, and subtitles; segment the audio data, recognize its timbre, and translate it; and use the subtitle data and time stamps for correction and synchronization. The original audio data is thereby interpreted into audio data in the language required by the user and played back, so that the user can understand the program's audio without watching the subtitles, which brings great convenience when watching TV programs; the user also does not miss the on-screen content, greatly improving the viewing experience.
  • FIG. 3 is a schematic diagram of a hardware structure of an intelligent terminal according to an embodiment of the present application.
  • the smart terminal 700 can perform the simultaneous interpretation output method of the digital television program described in any of the foregoing method embodiments.
  • the smart terminal 700 can include, but is not limited to, a set top box, a television, a mobile phone, a tablet, and the like.
  • the smart terminal 700 includes:
  • one or more processors 701 and a memory 702; one processor 701 is taken as an example in FIG. 3.
  • The processor 701 and the memory 702 may be connected by a bus or other means; a bus connection is taken as an example in FIG. 3.
  • The memory 702, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the simultaneous interpretation output method for digital television programs in the embodiments of the present application (for example, the television program cache module 100, data separation module 200, audio segmentation module 300, timbre matching module 400, audio translation module 500, and audio synthesis module 600 shown in FIG. 2).
  • The processor 701 performs the various functional applications and data processing of the simultaneous interpretation output system, i.e., implements the simultaneous interpretation output method of any of the above method embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 702.
  • The memory 702 can include a program storage area and a data storage area: the program storage area can store an operating system and applications required for at least one function, while the data storage area can store data created according to the use of the simultaneous interpretation output system, and the like.
  • memory 702 can include high speed random access memory, and can also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device.
  • memory 702 can optionally include memory remotely located relative to processor 701 that can be connected to smart terminal 700 over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The one or more modules are stored in the memory 702, and when executed by the one or more processors 701, perform the simultaneous interpretation output method in any of the above method embodiments, for example, performing method steps S10 to S60 in FIG. 1 described above and implementing the functions of modules 100-600 in FIG. 2.
  • The embodiment of the present application further provides a non-transitory computer-readable storage medium storing computer-executable instructions, which are executed by one or more processors (for example, by the processor 701 in FIG. 3) so that the one or more processors may perform the simultaneous interpretation output method in any of the foregoing method embodiments, for example, performing method steps S10 to S60 in FIG. 1 described above and implementing the functions of modules 100-600 in FIG. 2.
  • The embodiment of the present application further provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a smart terminal, cause the smart terminal to perform the simultaneous interpretation output method in any of the foregoing method embodiments, for example, performing method steps S10 to S60 in FIG. 1 described above and implementing the functions of modules 100-600 in FIG. 2.
  • The various embodiments can be implemented by means of software plus a general-purpose hardware platform, or, of course, by hardware.
  • a person skilled in the art can understand that all or part of the process of implementing the above embodiments can be completed by a computer program to instruct related hardware, and the program can be stored in a non-transitory computer readable storage medium.
  • the program when executed, may include the flow of an embodiment of the methods as described above.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Abstract

The present application relates to a method and system for outputting simultaneous interpretation of a digital television program, and a smart terminal. The method comprises: separating the video, audio, and subtitle of a cached television program data stream, performing segmentation, timbre recognition, translation, and other processing on audio data, performing correction and synchronization by using subtitle data and timestamps to simultaneously interpret the original audio data into audio data in a language that is required by a user, and playing the audio for the user, so that the user is capable of understanding the audio of a television program without watching subtitles, thereby bringing great convenience to users when watching television programs, preventing the users from missing frame contents of the television programs, and greatly improving watching experience of the users.

Description

数字电视节目同声翻译输出方法、系统及智能终端Simultaneous translation output method, system and intelligent terminal for digital television program 技术领域Technical field
本申请涉及数字电视领域,尤其涉及一种数字电视节目同声翻译输出方法、系统及智能终端。The present application relates to the field of digital television, and in particular, to a method and system for simultaneous interpretation of digital television programs and an intelligent terminal.
背景技术Background technique
目前,数字电视机顶盒(或电视机)播放的数字电视节目声音,都是节目中对应的原始声音,使得用户能够原汁原味的观看电视节目。At present, the digital television program sounds played by the digital TV set-top box (or television) are the original sounds corresponding to the programs, so that the user can watch the television programs in an original manner.
但是,节目的原始声音有可能是外语发声,例如英语电视节目。外语发声的电视节目为使听不懂外语的观众能够正常观看,往往会提供双语字幕,观众如果听不懂外语,就只能依赖于看屏幕下方的中文字幕才能看懂电视节目内容,而观看下方的中文字幕,往往会顾及不到电视节目中的内容画面,这将会很大程度的影响观众的观看效果,使得观众不能很好的观看电视节目,给观众带来不便。However, the original sound of the program may be a foreign language, such as an English TV program. TV programs that are spoken in a foreign language often provide bilingual subtitles for viewers who do not understand foreign languages. If the audience does not understand foreign languages, they can only rely on the Chinese subtitles at the bottom of the screen to understand the content of the TV programs. The Chinese subtitles below often do not take into account the content of the TV program, which will greatly affect the viewer's viewing effect, so that the viewer can not watch the TV program very well, causing inconvenience to the audience.
申请内容Application content
有鉴于此,有必要针对上述外语发声电视节目,观众观看中文字幕影响观众观看电视节目,带来不便的问题,提供一种数字电视节目同声翻译输出方法、系统及智能终端。In view of this, it is necessary to provide a method, system and intelligent terminal for simultaneous interpretation of digital television programs in view of the above-mentioned foreign language sounding television programs, viewers watching Chinese subtitles affecting viewers watching television programs, causing inconvenience.
本申请实施例提供的一种数字电视节目同声翻译输出方法,包括如下步骤:A method for simultaneous interpretation output of a digital television program provided by an embodiment of the present application includes the following steps:
S10:控制音视频终端缓冲存储电视节目数据流;S10: controlling the audio and video terminal to buffer the stored television program data stream;
S20:由缓冲存储的电视节目数据流中分别解析分离出视频数据、音频数据以及字幕数据,并在分离时标记时间戳,为三者标记上同步标签;S20: parsing and separating the video data, the audio data, and the subtitle data from the buffered television program data stream, and marking the time stamp when separating, marking the synchronization label for the three;
S30:对音频数据进行分段,并将分段后的音频数据进行解码处理,生成分段的原始PCM数据; S30: segmenting the audio data, and decoding the segmented audio data to generate segmented original PCM data;
S40:将分段的原始PCM数据发送到云端服务器通过预设的音色数据库进行音色学习,匹配识别出音频数据的音色;S40: Send the segmented original PCM data to the cloud server, perform timbre learning through a preset timbre database, and match and identify the timbre of the audio data;
S50:将原始的PCM数据在云端服务器进行用户所需语言的文字翻译,并将翻译结果与字幕数据进行比对,采用字幕数据对翻译结果进行内容和时间的同步修正;S50: translating the original PCM data on the cloud server into text in the language desired by the user, comparing the translation result with the subtitle data, and using the subtitle data to synchronously correct the content and timing of the translation result;
S60:根据识别出的音色,将修正后的翻译结果转换成相同音色的语音数据,并将语音数据按照时间戳与视频数据、字幕数据进行同步合成,合成新的节目数据流进行播放。S60: converting the corrected translation result into speech data of the same timbre according to the recognized timbre, synthesizing the speech data in synchronization with the video data and subtitle data according to the timestamps, and composing a new program data stream for playback.
在其中的一个实施方式中,所述步骤S20还包括:In one embodiment, the step S20 further includes:
在获取到音频数据后,对除人声之外的环境声音进行过滤。After the audio data is acquired, the ambient sounds other than the human voice are filtered.
在其中的一个实施方式中,所述步骤S30还包括:解析字幕数据中的标点符号,获取每一个句号处的时间位置,按照句号处的时间位置对音频数据进行分段。In one embodiment, the step S30 further includes: parsing punctuation marks in the subtitle data, acquiring a time position at each period, and segmenting the audio data according to the time position at the period.
在其中的一个实施方式中,所述步骤S60还包括:将转换后的语音数据的振幅与原音频数据的振幅进行比对调整,使转换后语音数据的振幅与原音频数据的振幅保持一致。In one embodiment, the step S60 further includes: comparing the amplitude of the converted voice data with the amplitude of the original audio data, so that the amplitude of the converted voice data is consistent with the amplitude of the original audio data.
本申请实施例提供的一种数字电视节目同声翻译输出系统,包括:A simultaneous interpretation output system for digital television programs provided by the embodiments of the present application includes:
电视节目缓存模块,控制音视频终端缓冲存储电视节目数据流;a television program cache module for controlling an audio and video terminal to buffer and store a television program data stream;
数据分离模块,由缓冲存储的电视节目数据流中分别解析分离出视频数据、音频数据以及字幕数据,并在分离时标记时间戳,为三者标记上同步标签;a data separation module for parsing the buffered television program data stream to separate out video data, audio data, and subtitle data, marking timestamps at the time of separation, and tagging all three with synchronization labels;
音频分段模块,对音频数据进行分段,并将分段后的音频数据进行解码处理,生成分段的原始PCM数据;The audio segmentation module segments the audio data, and decodes the segmented audio data to generate segmented original PCM data;
音色匹配模块,将分段的原始PCM数据发送到云端服务器通过预设的音色数据库进行音色学习,匹配识别出音频数据的音色;The tone matching module sends the segmented original PCM data to the cloud server to learn the tone color through the preset tone color database, and matches the tone color of the recognized audio data;
音频翻译模块,将原始的PCM数据在云端服务器进行用户所需语言的文字翻译,并将翻译结果与字幕数据进行比对,采用字幕数据对翻译结果进行内容和时间的同步修正;an audio translation module for translating the original PCM data on the cloud server into text in the language desired by the user, comparing the translation result with the subtitle data, and using the subtitle data to synchronously correct the content and timing of the translation result;
音频合成模块,根据识别出的音色,将修正后的翻译结果转换成相同音色的语音数据,并将语音数据按照时间戳与视频数据、字幕数据进行同步合成,合成新的节目数据流进行播放。The audio synthesis module converts the corrected translation result into voice data of the same timbre according to the recognized timbre, and synchronizes the voice data with the video data and the subtitle data according to the time stamp, and synthesizes a new program data stream for playing.
在其中的一个实施方式中,所述数据分离模块在获取到音频数据后,对除人声之外的环境声音进行过滤。In one embodiment, the data separation module filters ambient sounds other than vocals after the audio data is acquired.
在其中的一个实施方式中,所述音频分段模块解析字幕数据中的标点符号,获取每一个句号处的时间位置,按照句号处的时间位置对音频数据进行分段。In one of the embodiments, the audio segmentation module parses the punctuation marks in the subtitle data, acquires the temporal position at each period, and segments the audio data according to the temporal position at the period.
在其中的一个实施方式中,所述音频合成模块将转换后的语音数据的振幅与原音频数据的振幅进行比对调整,使转换后语音数据的振幅与原音频数据的振幅保持一致。In one embodiment, the audio synthesis module compares the amplitude of the converted voice data with the amplitude of the original audio data, so that the amplitude of the converted voice data is consistent with the amplitude of the original audio data.
本申请实施例提供的一种智能终端,包括:An intelligent terminal provided by the embodiment of the present application includes:
至少一个处理器;以及,At least one processor; and,
与所述至少一个处理器通信连接的存储器;其中,a memory communicatively coupled to the at least one processor; wherein
所述存储器存储有可被所述至少一个处理器执行的指令,所述指令被所述至少一个处理器执行,以使所述至少一个处理器能够执行如上所述的数字电视节目同声翻译输出方法。The memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the digital television program simultaneous interpretation output method described above.
本申请实施例还提供了一种非暂态计算机可读存储介质,所述非暂态计算机可读存储介质存储有计算机可执行指令,所述计算机可执行指令用于使智能终端执行如上所述的数字电视节目同声翻译输出方法。The embodiment of the present application further provides a non-transitory computer-readable storage medium storing computer-executable instructions for causing a smart terminal to perform the digital television program simultaneous interpretation output method described above.
本申请实施例还提供了一种计算机程序产品,所述计算机程序产品包括存储在非暂态计算机可读存储介质上的计算机程序,所述计算机程序包括程序指令,当所述程序指令被智能终端执行时,使所述智能终端执行如上所述的数字电视节目同声翻译输出方法。The embodiment of the present application further provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a smart terminal, cause the smart terminal to perform the digital television program simultaneous interpretation output method described above.
本申请实施例提供的数字电视节目同声翻译输出方法、系统及智能终端,将缓存的电视节目数据流进行视频、音频和字幕三者的分离,然后对音频数据进行分段、音色识别和翻译处理等处理,并利用字幕数据和时间戳进行修正和同步处理,完成将原始音频数据同声翻译成用户所需语言的音频数据,进而播放给用户,使得用户能够无需观看字幕就能够听懂电视节目的音频,给用户观看电视节目带来了极大的便利,用户不会因此错过电视节目的画面内容,大大提高了用户的观看体验。The digital television program simultaneous interpretation output method, system, and smart terminal provided by the embodiments of the present application separate the buffered television program data stream into video, audio, and subtitles, then apply segmentation, timbre recognition, and translation to the audio data, and use the subtitle data and timestamps for correction and synchronization. The original audio data is thereby simultaneously interpreted into audio data in the language desired by the user and played back to the user, so that the user can understand the program's audio without watching subtitles. This brings great convenience to watching television: the user no longer misses the on-screen picture content, which greatly improves the viewing experience.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1是一个实施例中的数字电视节目同声翻译输出方法的流程图;1 is a flow chart of a method for simultaneous interpretation output of a digital television program in an embodiment;
图2是一个实施例中的数字电视节目同声翻译输出系统的结构图;2 is a structural diagram of a simultaneous interpretation output system of a digital television program in an embodiment;
图3是一个实施例中的智能终端的硬件结构示意图。FIG. 3 is a schematic diagram of a hardware structure of an intelligent terminal in an embodiment.
具体实施方式DETAILED DESCRIPTION
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅用以解释本申请,并不用于限定本申请。In order to make the objects, technical solutions, and advantages of the present application more comprehensible, the present application will be further described in detail below with reference to the accompanying drawings and embodiments. It is understood that the specific embodiments described herein are merely illustrative of the application and are not intended to be limiting.
图1是一个实施例中的数字电视节目同声翻译输出方法的流程图,如图1所示,该方法包括如下步骤:1 is a flow chart of a method for simultaneous interpretation output of a digital television program in an embodiment. As shown in FIG. 1, the method includes the following steps:
S10:控制音视频终端缓冲存储电视节目数据流。S10: Control the audio and video terminal to buffer and store the television program data stream.
由于电视节目很多是实时节目,电视节目数据流很多是实时流,故为使得能够对电视节目进行同声翻译,该实施例中,首先需要对电视节目数据流进行缓存播放,通过缓存时间对电视节目数据流进行处理。Since many television programs are live and many program data streams are real-time streams, in this embodiment the television program data stream is first buffered before playback so that simultaneous interpretation of the program becomes possible; the buffering time is then used to process the program data stream.
S20:由缓冲存储的电视节目数据流中分别解析分离出视频数据、音频数据以及字幕数据,并在分离时标记时间戳,为三者标记上同步标签。S20: separately parsing and separating the video data, the audio data, and the caption data from the buffered TV program data stream, and marking the time stamp when separating, and marking the synchronization tag for the three.
在缓存存储电视节目数据流之后,将视频数据、音频数据以及字幕数据三者分离,以便后续进行音频转换。该实施例中,为保证分离后重组能够同步,在三者分离时标记时间戳,并为三者标记上同步标签,这样保证后续的同步性操作。After the cached television program data stream is stored, the video data, the audio data, and the subtitle data are separated for subsequent audio conversion. In this embodiment, in order to ensure that the reorganization can be synchronized after separation, the time stamps are marked when the three are separated, and the synchronization labels are marked for the three, thus ensuring subsequent synchronization operations.
由于音频数据除包含人声之外,还包括大量的环境声音,环境声音会对人声造成干扰,故进一步的,该步骤还包括:在获取到音频数据后,对除人声之外的环境声音进行过滤。Besides the human voice, the audio data also contains a large amount of ambient sound, and ambient sound interferes with the voice. Therefore, this step further includes: after the audio data is acquired, filtering out ambient sounds other than the human voice.
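For illustration only, the separation and synchronization tagging described in step S20 can be sketched as follows. The packet layout (stream type, presentation timestamp, payload) is a hypothetical simplification for the example, not a format defined by the application:

```python
from collections import defaultdict

def demux_with_sync_tags(packets):
    """Split a buffered program stream into video/audio/subtitle tracks.

    Each packet is a (stream_type, pts_ms, payload) tuple. The original
    presentation timestamp is kept on every unit as its synchronization
    tag, so the three tracks can be re-aligned after being processed
    independently.
    """
    tracks = defaultdict(list)
    for stream_type, pts_ms, payload in packets:
        # Tag every unit with its timestamp at the moment of separation.
        tracks[stream_type].append({"pts_ms": pts_ms, "data": payload})
    return tracks["video"], tracks["audio"], tracks["subtitle"]
```

Because each unit carries its original timestamp, the later synthesis step can merge the translated audio back with the untouched video and subtitle tracks.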
S30:对音频数据进行分段,并将分段后的音频数据进行解码处理,生成分段的原始PCM(一种编码格式,也称为脉冲编码调制)数据。S30: Segment the audio data, and decode the segmented audio data to generate segmented original PCM (an encoding format, also called pulse code modulation) data.
为保证音频数据语句的完整性和合理性,需要对音频数据进行分段,分段后也便利翻译处理。分段后将音频数据解码成原始PCM数据,以便能够识别和处理。In order to ensure the integrity and rationality of the audio data statement, the audio data needs to be segmented, and the translation processing is also facilitated after segmentation. The audio data is decoded into original PCM data after segmentation so that it can be identified and processed.
进一步的,该步骤中对音频数据进行分段具体为:解析字幕数据中的标点符号,获取每一个句号处的时间位置,按照句号处的时间位置对音频数据进行分段,这样就按照语句的完整性和连贯性很好的对音频数据进行了分段。Further, segmenting the audio data in the step is specifically: parsing the punctuation marks in the subtitle data, obtaining the time position at each period, and segmenting the audio data according to the time position at the period, so that according to the statement The integrity and consistency of the audio data is segmented.
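A minimal sketch of the period-based segmentation described above, assuming subtitle cues arrive as (end-time, text) pairs; this representation and the millisecond units are assumptions made for the example:

```python
def segment_audio_by_periods(subtitles, audio_duration_ms):
    """Derive audio segment boundaries from sentence-ending punctuation.

    subtitles: list of (end_time_ms, text) cues. A cue whose text ends
    with a period marks a segment boundary, so each resulting segment
    covers one complete sentence.
    """
    boundaries = [end for end, text in subtitles
                  if text.rstrip().endswith((".", "。"))]
    segments, start = [], 0
    for b in boundaries:
        segments.append((start, b))
        start = b
    if start < audio_duration_ms:
        # Keep any trailing audio after the last full sentence.
        segments.append((start, audio_duration_ms))
    return segments
```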
S40:将分段的原始PCM数据发送到云端服务器通过预设的音色数据库进行音色学习,匹配识别出音频数据的音色。S40: Send the segmented original PCM data to the cloud server for timbre learning through a preset timbre database, and match the timbre that identifies the audio data.
由于音频翻译时,除音频内容外,音频的音色也是重要的参数,音色的准确翻译能够极大的保证同声翻译的效果,故该实施例中,在将音频数据转换为PCM数据后,发送到前端进行音色学习处理,利用预先设置的音色数据库来匹配PCM数据中的音色,最大可能的真实还原。预设的音色数据库通过输入不同年龄和性别的声音来构建。In audio translation, besides the audio content itself, the timbre of the audio is also an important parameter, and accurate reproduction of the timbre greatly improves the effect of the simultaneous interpretation. Therefore, in this embodiment, after the audio data is converted into PCM data, it is sent to the front end for timbre learning, where a preset timbre database is used to match the timbre in the PCM data and restore it as faithfully as possible. The preset timbre database is built by inputting voices of different ages and genders.
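As a rough illustration of the matching step, the sketch below does a nearest-neighbor lookup against a toy timbre database. The profile names and feature values are invented for the example and do not come from the application, which leaves the matching algorithm unspecified:

```python
import math

# Hypothetical preset timbre database: profile -> reference feature
# vector (e.g., mean pitch in Hz, spectral brightness). Illustrative only.
TIMBRE_DB = {
    "adult_male":   (120.0, 0.30),
    "adult_female": (210.0, 0.45),
    "child":        (300.0, 0.60),
}

def match_timbre(features, db=TIMBRE_DB):
    """Return the database profile closest to the extracted features."""
    def dist(ref):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(features, ref)))
    return min(db, key=lambda name: dist(db[name]))
```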
S50:将原始的PCM数据在云端服务器进行用户所需语言的文字翻译,并将翻译结果与字幕数据进行比对,采用字幕数据对翻译结果进行内容和时间的同步修正。S50: The original PCM data is translated in the language required by the user in the cloud server, and the translation result is compared with the subtitle data, and the subtitle data is used to synchronously correct the content and time of the translation result.
在音色学习完毕后,由于原始的PCM数据为外语发声,故需要进行翻译,翻译成用户所需要的语言发声。首先将原始的PCM数据在云端服务器翻译成用户所需语言的文字语句,文字语句翻译完毕后,由于翻译可能存在较大的误差,故将翻译结果与字幕数据进行比对,利用字幕数据来对翻译结果进行内容修正,并且进行时间上的同步,消除翻译结果在内容和时间同步上的误差。After the timbre learning is completed, the original PCM data, being foreign-language speech, must be translated into speech in the language the user needs. The original PCM data is first translated on the cloud server into text sentences in the language desired by the user. Because the translation may contain considerable errors, the translation result is then compared with the subtitle data: the subtitle data is used to correct the content of the translation result and to synchronize it in time, eliminating errors in both content and timing.
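The content-and-time correction against the subtitle track could be approximated as below, using a plain string-similarity test. The similarity threshold and the (start-time, text) cue representation are assumptions for the sketch, not details given in the application:

```python
import difflib

def correct_with_subtitles(translated, subtitle_cues, threshold=0.6):
    """Use the broadcast subtitle track to correct machine translation.

    translated: list of (start_ms, text) from speech translation.
    subtitle_cues: list of (start_ms, text) from the subtitle stream.
    When a translated sentence diverges too far from the corresponding
    subtitle, the subtitle text is taken as authoritative; either way,
    the subtitle timestamp is adopted for time synchronization.
    """
    corrected = []
    for (t_ms, text), (s_ms, s_text) in zip(translated, subtitle_cues):
        ratio = difflib.SequenceMatcher(None, text, s_text).ratio()
        if ratio < threshold:
            corrected.append((s_ms, s_text))   # content + time correction
        else:
            corrected.append((s_ms, text))     # keep translation, align time
    return corrected
```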
S60:根据识别出的音色,将修正后的翻译结果转换成相同音色的语音数据,并将语音数据按照时间戳与视频数据、字幕数据进行同步合成,合成新的节目数据流进行播放。 S60: Convert the corrected translation result into speech data of the same timbre according to the recognized timbre, and synthesize the speech data in synchronization with the video data and the subtitle data according to the time stamp, and synthesize a new program data stream for playing.
在文字翻译得到翻译结果并修正后,由于之前已经得到音频数据的音色数据,则结合识别出的音色,来对翻译结果进行语音合成,将修正后的翻译结果转换成相同音色的语音数据,得到翻译后的新音频数据,最后按照时间戳与视频数据、字幕数据进行同步合成,得到翻译后的节目数据流进行播放,即可完成对电视节目的同声翻译,使得用户能够听懂电视节目的音频,满足用户需求。After the text translation result has been obtained and corrected, and since the timbre data of the audio has already been obtained, the recognized timbre is used to synthesize speech from the translation result: the corrected translation result is converted into speech data of the same timbre, yielding new, translated audio data. Finally, this audio is synthesized in synchronization with the video data and subtitle data according to the timestamps, producing a translated program data stream for playback. The simultaneous interpretation of the television program is thus completed, enabling the user to understand the program's audio and meeting the user's needs.
此外,为进一步提高同声翻译的效果,该步骤还包括:将转换后的语音数据的振幅与原音频数据的振幅进行比对调整,使转换后语音数据的振幅与原音频数据的振幅保持一致。In addition, to further improve the effect of the simultaneous interpretation, this step further includes: comparing and adjusting the amplitude of the converted speech data against the amplitude of the original audio data, so that the amplitude of the converted speech data is kept consistent with that of the original audio data.
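A minimal sketch of this amplitude alignment, here done by peak matching over plain PCM sample lists; the application does not specify the exact comparison method, so peak normalization is an assumption:

```python
def match_amplitude(synth_samples, original_samples):
    """Scale synthesized PCM so its peak amplitude matches the original's."""
    orig_peak = max(abs(s) for s in original_samples)
    synth_peak = max(abs(s) for s in synth_samples)
    if synth_peak == 0:
        # Silent synthesis: nothing to scale.
        return list(synth_samples)
    gain = orig_peak / synth_peak
    return [s * gain for s in synth_samples]
```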
该数字电视节目同声翻译输出方法,将缓存的电视节目数据流进行视频、音频和字幕三者的分离,然后对音频数据进行分段、音色识别和翻译处理等处理,并利用字幕数据和时间戳进行修正和同步处理,完成将原始音频数据同声翻译成用户所需语言的音频数据,进而播放给用户,使得用户能够无需观看字幕就能够听懂电视节目的音频,给用户观看电视节目带来了极大的便利,用户不会因此错过电视节目的画面内容,大大提高了用户的观看体验。This digital television program simultaneous interpretation output method separates the buffered television program data stream into video, audio, and subtitles, then applies segmentation, timbre recognition, and translation to the audio data, and uses the subtitle data and timestamps for correction and synchronization. The original audio data is thereby simultaneously interpreted into audio data in the language desired by the user and played back, so that the user can understand the program's audio without watching subtitles. This brings great convenience to watching television: the user no longer misses the on-screen picture content, which greatly improves the viewing experience.
同时,本申请还提供一种数字电视节目同声翻译输出系统,如图2所示,该系统包括:Meanwhile, the application also provides a simultaneous interpretation output system for digital television programs, as shown in FIG. 2, the system includes:
电视节目缓存模块100,控制音视频终端缓冲存储电视节目数据流。The television program cache module 100 controls the audio and video terminal to buffer and store the television program data stream.
由于电视节目很多是实时节目,电视节目数据流很多是实时流,故为使得能够对电视节目进行同声翻译,该实施例中,电视节目缓存模块100首先需要对电视节目数据流进行缓存播放,通过缓存时间对电视节目数据流进行处理。Since many television programs are live and many program data streams are real-time streams, in this embodiment the television program cache module 100 first buffers the television program data stream before playback so that simultaneous interpretation of the program becomes possible; the buffering time is then used to process the program data stream.
数据分离模块200,由缓冲存储的电视节目数据流中分别解析分离出视频数据、音频数据以及字幕数据,并在分离时标记时间戳,为三者标记上同步标签。The data separation module 200 separately parses and separates the video data, the audio data, and the subtitle data from the buffered television program data stream, and marks the time stamp when separated, and marks the synchronization label for the three.
在缓存存储电视节目数据流之后,数据分离模块200将视频数据、音频数据以及字幕数据三者分离,以便后续进行音频转换。该实施例中,为保证分离后重组能够同步,数据分离模块200在三者分离时标记时间戳,并为三者标记上同步标签,这样保证后续的同步性操作。After the cached television program data stream is stored, the data separation module 200 separates the video data, the audio data, and the subtitle data for subsequent audio conversion. In this embodiment, in order to ensure that the reorganization can be synchronized after the separation, the data separation module 200 marks the time stamp when the three are separated, and marks the synchronization label for the three, thus ensuring subsequent synchronization operations.
由于音频数据除包含人声之外,还包括大量的环境声音,环境声音会对人声造成干扰,故进一步的,数据分离模块200在获取到音频数据后,对除人声之外的环境声音进行过滤。Besides the human voice, the audio data also contains a large amount of ambient sound, and ambient sound interferes with the voice. Therefore, after acquiring the audio data, the data separation module 200 filters out ambient sounds other than the human voice.
音频分段模块300,对音频数据进行分段,并将分段后的音频数据进行解码处理,生成分段的原始PCM(一种编码格式,也称为脉冲编码调制)数据。The audio segmentation module 300 segments the audio data and decodes the segmented audio data to generate segmented original PCM (an encoding format, also referred to as pulse code modulation) data.
为保证音频数据语句的完整性和合理性,音频分段模块300需要对音频数据进行分段,分段后也便利翻译处理。分段后将音频数据解码成原始PCM数据,以便能够识别和处理。In order to ensure the integrity and rationality of the audio data statement, the audio segmentation module 300 needs to segment the audio data, and facilitates the translation process after segmentation. The audio data is decoded into original PCM data after segmentation so that it can be identified and processed.
进一步的,音频分段模块300解析字幕数据中的标点符号,获取每一个句号处的时间位置,按照句号处的时间位置对音频数据进行分段,这样就按照语句的完整性和连贯性很好的对音频数据进行了分段。Further, the audio segmentation module 300 parses the punctuation marks in the subtitle data, obtains the time position at each period, and segments the audio data according to the time position at the period, so that the integrity and consistency of the statement are good. The audio data is segmented.
音色匹配模块400,将分段的原始PCM数据发送到云端服务器通过预设的音色数据库进行音色学习,匹配识别出音频数据的音色。The timbre matching module 400 sends the segmented original PCM data to the cloud server for timbre learning through a preset timbre database, and matches the timbre of the recognized audio data.
由于音频翻译时,除音频内容外,音频的音色也是重要的参数,音色的准确翻译能够极大的保证同声翻译的效果,故该实施例中,在将音频数据转换为PCM数据后,发送到前端进行音色学习处理,音色匹配模块400利用预先设置的音色数据库来匹配PCM数据中的音色,最大可能的真实还原。预设的音色数据库通过输入不同年龄和性别的声音来构建。In audio translation, besides the audio content itself, the timbre of the audio is also an important parameter, and accurate reproduction of the timbre greatly improves the effect of the simultaneous interpretation. Therefore, in this embodiment, after the audio data is converted into PCM data, it is sent to the front end for timbre learning, where the timbre matching module 400 uses a preset timbre database to match the timbre in the PCM data and restore it as faithfully as possible. The preset timbre database is built by inputting voices of different ages and genders.
音频翻译模块500,将原始的PCM数据在云端服务器进行用户所需语言的文字翻译,并将翻译结果与字幕数据进行比对,采用字幕数据对翻译结果进行内容和时间的同步修正。The audio translation module 500 performs the text translation of the original PCM data in the language required by the user in the cloud server, compares the translation result with the subtitle data, and uses the subtitle data to synchronously correct the content and time of the translation result.
在音色学习完毕后,由于原始的PCM数据为外语发声,故需要进行翻译,翻译成用户所需要的语言发声。音频翻译模块500首先将原始的PCM数据在云端服务器翻译成用户所需语言的文字语句,文字语句翻译完毕后,由于翻译可能存在较大的误差,故将翻译结果与字幕数据进行比对,利用字幕数据来对翻译结果进行内容修正,并且进行时间上的同步,消除翻译结果在内容和时间同步上的误差。After the timbre learning is completed, the original PCM data, being foreign-language speech, must be translated into speech in the language the user needs. The audio translation module 500 first translates the original PCM data on the cloud server into text sentences in the language desired by the user. Because the translation may contain considerable errors, the translation result is then compared with the subtitle data: the subtitle data is used to correct the content of the translation result and to synchronize it in time, eliminating errors in both content and timing.
音频合成模块600,根据识别出的音色,将修正后的翻译结果转换成相同音色的语音数据,并将语音数据按照时间戳与视频数据、字幕数据进行同步合成,合成新的节目数据流进行播放。The audio synthesis module 600 converts the corrected translation result into speech data of the same timbre according to the recognized timbre, synthesizes the speech data in synchronization with the video data and subtitle data according to the timestamps, and composes a new program data stream for playback.
在文字翻译得到翻译结果并修正后,由于之前已经得到音频数据的音色数据,音频合成模块600则结合识别出的音色,来对翻译结果进行语音合成,将修正后的翻译结果转换成相同音色的语音数据,得到翻译后的新音频数据,最后按照时间戳与视频数据、字幕数据进行同步合成,得到翻译后的节目数据流进行播放,即可完成对电视节目的同声翻译,使得用户能够听懂电视节目的音频,满足用户需求。After the text translation result has been obtained and corrected, and since the timbre data of the audio has already been obtained, the audio synthesis module 600 uses the recognized timbre to synthesize speech from the translation result: the corrected translation result is converted into speech data of the same timbre, yielding new, translated audio data. Finally, this audio is synthesized in synchronization with the video data and subtitle data according to the timestamps, producing a translated program data stream for playback. The simultaneous interpretation of the television program is thus completed, enabling the user to understand the program's audio and meeting the user's needs.
此外,为进一步提高同声翻译的效果,音频合成模块600将转换后的语音数据的振幅与原音频数据的振幅进行比对调整,使转换后语音数据的振幅与原音频数据的振幅保持一致。In addition, in order to further improve the effect of the simultaneous interpretation, the audio synthesis module 600 compares the amplitude of the converted speech data with the amplitude of the original audio data, so that the amplitude of the converted speech data is consistent with the amplitude of the original audio data.
该数字电视节目同声翻译输出系统,将缓存的电视节目数据流进行视频、音频和字幕三者的分离,然后对音频数据进行分段、音色识别和翻译处理等处理,并利用字幕数据和时间戳进行修正和同步处理,完成将原始音频数据同声翻译成用户所需语言的音频数据,进而播放给用户,使得用户能够无需观看字幕就能够听懂电视节目的音频,给用户观看电视节目带来了极大的便利,用户不会因此错过电视节目的画面内容,大大提高了用户的观看体验。This digital television program simultaneous interpretation output system separates the buffered television program data stream into video, audio, and subtitles, then applies segmentation, timbre recognition, and translation to the audio data, and uses the subtitle data and timestamps for correction and synchronization. The original audio data is thereby simultaneously interpreted into audio data in the language desired by the user and played back, so that the user can understand the program's audio without watching subtitles. This brings great convenience to watching television: the user no longer misses the on-screen picture content, which greatly improves the viewing experience.
本申请数字电视节目同声翻译输出方法及系统,将缓存的电视节目数据流进行视频、音频和字幕三者的分离,然后对音频数据进行分段、音色识别和翻译处理等处理,并利用字幕数据和时间戳进行修正和同步处理,完成将原始音频数据同声翻译成用户所需语言的音频数据,进而播放给用户,使得用户能够无需观看字幕就能够听懂电视节目的音频,给用户观看电视节目带来了极大的便利,用户不会因此错过电视节目的画面内容,大大提高了用户的观看体验。The digital television program simultaneous interpretation output method and system of the present application separate the buffered television program data stream into video, audio, and subtitles, then apply segmentation, timbre recognition, and translation to the audio data, and use the subtitle data and timestamps for correction and synchronization. The original audio data is thereby simultaneously interpreted into audio data in the language desired by the user and played back to the user, so that the user can understand the program's audio without watching subtitles. This brings great convenience to watching television: the user no longer misses the on-screen picture content, which greatly improves the viewing experience.
图3是本申请实施例提供的一种智能终端的硬件结构示意图,该智能终端700可以执行上述任意方法实施例中所述的数字电视节目同声翻译输出方法。该智能终端700可以包括但不限于:机顶盒、电视机、手机、平板电脑等。FIG. 3 is a schematic diagram of a hardware structure of an intelligent terminal according to an embodiment of the present application. The smart terminal 700 can perform the simultaneous interpretation output method of the digital television program described in any of the foregoing method embodiments. The smart terminal 700 can include, but is not limited to, a set top box, a television, a mobile phone, a tablet, and the like.
具体地,请参阅图3,该智能终端700包括: Specifically, referring to FIG. 3, the smart terminal 700 includes:
一个或多个处理器701以及存储器702,图3中以一个处理器701为例。One or more processors 701 and a memory 702 are exemplified by a processor 701 in FIG.
处理器701和存储器702可以通过总线或者其他方式连接,图3中以通过总线连接为例。The processor 701 and the memory 702 may be connected by a bus or other means, as exemplified by a bus connection in FIG.
存储器702作为一种非暂态计算机可读存储介质,可用于存储非暂态软件程序、非暂态性计算机可执行程序以及模块,如本申请实施例中的数字电视节目同声翻译输出方法对应的程序指令/模块(例如,附图2所示的电视节目缓存模块100、数据分离模块200、音频分段模块300、音色匹配模块400、音频翻译模块500以及音频合成模块600)。处理器701通过运行存储在存储器702中的非暂态软件程序、指令以及模块,从而执行数字电视节目同声翻译输出系统的各种功能应用以及数据处理,即实现上述任一方法实施例的数字电视节目同声翻译输出方法。As a non-transitory computer-readable storage medium, the memory 702 can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the digital television program simultaneous interpretation output method in the embodiments of the present application (for example, the television program cache module 100, the data separation module 200, the audio segmentation module 300, the timbre matching module 400, the audio translation module 500, and the audio synthesis module 600 shown in FIG. 2). By running the non-transitory software programs, instructions, and modules stored in the memory 702, the processor 701 performs the various functional applications and data processing of the digital television program simultaneous interpretation output system, that is, implements the digital television program simultaneous interpretation output method of any of the above method embodiments.
存储器702可以包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需要的应用程序;存储数据区可存储根据数字电视节目同声翻译输出系统的使用所创建的数据等。此外,存储器702可以包括高速随机存取存储器,还可以包括非暂态存储器,例如至少一个磁盘存储器件、闪存器件、或其他非暂态固态存储器件。在一些实施例中,存储器702可选包括相对于处理器701远程设置的存储器,这些远程存储器可以通过网络连接至智能终端700。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。The memory 702 can include a program storage area and a data storage area, wherein the program storage area can store an operating system and an application required for at least one function, and the data storage area can store data created according to the use of the digital television program simultaneous interpretation output system, and the like. In addition, the memory 702 can include high-speed random access memory, and can also include non-transitory memory such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 702 optionally includes memory located remotely from the processor 701, and such remote memory can be connected to the smart terminal 700 over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
所述一个或者多个模块存储在所述存储器702中,当被所述一个或者多个处理器701执行时,执行上述任意方法实施例中的数字电视节目同声翻译输出方法,例如,执行以上描述的图1中的方法步骤S10至S60,实现图2中的模块100-600的功能。The one or more modules are stored in the memory 702 and, when executed by the one or more processors 701, perform the digital television program simultaneous interpretation output method of any of the above method embodiments, for example performing the method steps S10 to S60 in FIG. 1 described above and implementing the functions of the modules 100-600 in FIG. 2.
本申请实施例还提供了一种非暂态计算机可读存储介质,所述非暂态计算机可读存储介质存储有计算机可执行指令,该计算机可执行指令被一个或多个处理器执行,例如被图3中的一个处理器701执行,可使得上述一个或多个处理器执行上述任意方法实施例中的数字电视节目同声翻译输出方法,例如,执行以上描述的图1中的方法步骤S10至S60,实现图2中的模块100-600的功能。The embodiment of the present application further provides a non-transitory computer-readable storage medium storing computer-executable instructions. When the computer-executable instructions are executed by one or more processors, for example by one processor 701 in FIG. 3, they can cause the one or more processors to perform the digital television program simultaneous interpretation output method of any of the above method embodiments, for example performing the method steps S10 to S60 in FIG. 1 described above and implementing the functions of the modules 100-600 in FIG. 2.
本申请实施例还提供了一种计算机程序产品,所述计算机程序产品包括存储在非暂态计算机可读存储介质上的计算机程序,所述计算机程序包括程序指令,当所述程序指令被智能终端执行时,使所述智能终端执行上述任意方法实施例中的数字电视节目同声翻译输出方法,例如,执行以上描述的图1中的方法步骤S10至S60,实现图2中的模块100-600的功能。The embodiment of the present application further provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a smart terminal, cause the smart terminal to perform the digital television program simultaneous interpretation output method of any of the above method embodiments, for example performing the method steps S10 to S60 in FIG. 1 described above and implementing the functions of the modules 100-600 in FIG. 2.
以上所描述的系统实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。The system embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
通过以上的实施方式的描述,本领域普通技术人员可以清楚地了解到各实施方式可借助软件加通用硬件平台的方式来实现,当然也可以通过硬件。本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一非暂态计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。Through the description of the above embodiments, those skilled in the art can clearly understand that the various embodiments can be implemented by means of software plus a general hardware platform, and of course, by hardware. A person skilled in the art can understand that all or part of the process of implementing the above embodiments can be completed by a computer program to instruct related hardware, and the program can be stored in a non-transitory computer readable storage medium. The program, when executed, may include the flow of an embodiment of the methods as described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
上述产品可执行本申请实施例所提供的方法,具备执行方法相应的功能模块和有益效果。未在本实施例中详尽描述的技术细节,可参见本申请实施例所提供的方法。The above products can perform the methods provided by the embodiments of the present application, and have the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present application.
以上仅为本申请的较佳实施例而已,并不用以限制本申请,凡在本申请的精神和原则之内所作的任何修改、等同替换和改进等,均应包含在本申请的保护范围之内。The above are only preferred embodiments of the present application and are not intended to limit the present application. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present application shall be included within the scope of protection of the present application.

Claims (11)

  1. A simultaneous interpretation output method for a digital television program, characterized in that it comprises the following steps:
    S10: controlling an audio/video terminal to buffer a television program data stream;
    S20: parsing the buffered television program data stream to separate out video data, audio data, and subtitle data, and marking timestamps at the time of separation so that all three carry synchronization tags;
    S30: segmenting the audio data, and decoding the segmented audio data to generate segmented raw PCM data;
    S40: sending the segmented raw PCM data to a cloud server for timbre learning against a preset timbre database, so as to match and identify the timbre of the audio data;
    S50: translating the raw PCM data at the cloud server into text in the language required by the user, comparing the translation result with the subtitle data, and using the subtitle data to synchronously correct the content and timing of the translation result;
    S60: converting the corrected translation result into speech data of the identified timbre, synchronously combining the speech data with the video data and subtitle data according to the timestamps, and composing a new program data stream for playback.
  2. The simultaneous interpretation output method for a digital television program according to claim 1, wherein step S20 further comprises:
    after the audio data is obtained, filtering out ambient sounds other than the human voice.
  3. The simultaneous interpretation output method for a digital television program according to claim 2, wherein step S30 further comprises: parsing the punctuation marks in the subtitle data, obtaining the time position of each full stop, and segmenting the audio data according to the time positions of the full stops.
  4. The simultaneous interpretation output method for a digital television program according to claim 3, wherein step S60 further comprises: comparing and adjusting the amplitude of the converted speech data against the amplitude of the original audio data, so that the amplitude of the converted speech data remains consistent with that of the original audio data.
  5. A simultaneous interpretation output system for digital television programs, characterized in that it comprises:
    a television program buffering module, configured to control an audio/video terminal to buffer a television program data stream;
    a data separation module, configured to parse the buffered television program data stream to separate out video data, audio data, and subtitle data, and to mark timestamps at the time of separation so that all three carry synchronization tags;
    an audio segmentation module, configured to segment the audio data and decode the segmented audio data to generate segmented raw PCM data;
    a timbre matching module, configured to send the segmented raw PCM data to a cloud server for timbre learning against a preset timbre database, so as to match and identify the timbre of the audio data;
    an audio translation module, configured to translate the raw PCM data at the cloud server into text in the language required by the user, compare the translation result with the subtitle data, and use the subtitle data to synchronously correct the content and timing of the translation result;
    an audio synthesis module, configured to convert the corrected translation result into speech data of the identified timbre, synchronously combine the speech data with the video data and subtitle data according to the timestamps, and compose a new program data stream for playback.
  6. The simultaneous interpretation output system for digital television programs according to claim 5, wherein the data separation module filters out ambient sounds other than the human voice after the audio data is obtained.
  7. The simultaneous interpretation output system for digital television programs according to claim 6, wherein the audio segmentation module parses the punctuation marks in the subtitle data, obtains the time position of each full stop, and segments the audio data according to the time positions of the full stops.
  8. The simultaneous interpretation output system for digital television programs according to claim 7, wherein the audio synthesis module compares and adjusts the amplitude of the converted speech data against the amplitude of the original audio data, so that the amplitude of the converted speech data remains consistent with that of the original audio data.
  9. A smart terminal, characterized in that it comprises:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the simultaneous interpretation output method for a digital television program according to any one of claims 1-4.
  10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer-executable instructions for causing a smart terminal to perform the simultaneous interpretation output method for a digital television program according to any one of claims 1-4.
  11. A computer program product, characterized in that the computer program product comprises a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a smart terminal, cause the smart terminal to perform the simultaneous interpretation output method for a digital television program according to any one of claims 1-4.
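The claimed pipeline couples subtitle-driven audio segmentation (claims 3 and 7: cut the audio at the time position of each full stop in the subtitle data) with amplitude matching of the synthesized speech (claims 4 and 8: keep the converted speech at the same level as the original audio). The following is a minimal illustrative sketch of those two steps, not an implementation from the patent; the function names, the `(timestamp, text)` cue format, and the use of peak amplitude as the comparison metric are all assumptions made for the example.

```python
import re

def segment_points(subtitles):
    """Return the time position (seconds) of each subtitle cue that ends
    in a full stop, as recovered from the demultiplexed subtitle stream.
    `subtitles` is a list of (timestamp_seconds, text) pairs (assumed format)."""
    return [t for t, text in subtitles if re.search(r"[.。]\s*$", text)]

def segment_audio(samples, sample_rate, cut_times):
    """Cut a decoded PCM sample sequence at the given times (seconds),
    yielding the audio segments sent on for timbre matching/translation."""
    cuts = [int(t * sample_rate) for t in cut_times]
    bounds = [0] + cuts + [len(samples)]
    return [samples[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]

def match_amplitude(converted, original):
    """Scale the synthesized (translated) speech so its peak amplitude
    matches the original audio segment's peak amplitude."""
    peak_orig = max(map(abs, original)) or 1
    peak_conv = max(map(abs, converted)) or 1
    scale = peak_orig / peak_conv
    return [s * scale for s in converted]
```

With cues `[(1.0, "Hello world."), (2.5, "and then"), (4.0, "Goodbye.")]`, only the first and last cues end in a full stop, so the audio is cut at 1.0 s and 4.0 s; a separate step would then rescale each synthesized segment to the original segment's peak level before remultiplexing by timestamp.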
PCT/CN2017/106377 2016-12-30 2017-10-16 Method and system for outputting simultaneous interpretation of digital television program, and smart terminal WO2018121001A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611253202.2A CN106791913A (en) 2016-12-30 2016-12-30 Digital television program simultaneous interpretation output method and system
CN201611253202.2 2016-12-30

Publications (1)

Publication Number Publication Date
WO2018121001A1 true WO2018121001A1 (en) 2018-07-05

Family

ID=58953091

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/106377 WO2018121001A1 (en) 2016-12-30 2017-10-16 Method and system for outputting simultaneous interpretation of digital television program, and smart terminal

Country Status (2)

Country Link
CN (1) CN106791913A (en)
WO (1) WO2018121001A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113225615A (en) * 2021-04-20 2021-08-06 深圳市九洲电器有限公司 Television program playing method, terminal equipment, server and storage medium
CN113473238A (en) * 2020-04-29 2021-10-01 海信集团有限公司 Intelligent device and simultaneous interpretation method during video call
CN113891168A (en) * 2021-10-19 2022-01-04 北京有竹居网络技术有限公司 Subtitle processing method, subtitle processing device, electronic equipment and storage medium
CN114157920A (en) * 2021-12-10 2022-03-08 深圳Tcl新技术有限公司 Playing method and device for displaying sign language, smart television and storage medium
CN114283227A (en) * 2021-11-26 2022-04-05 北京百度网讯科技有限公司 Virtual character driving method and device, electronic device and readable storage medium

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106791913A (en) * 2016-12-30 2017-05-31 深圳市九洲电器有限公司 Digital television program simultaneous interpretation output intent and system
CN107222792A (en) * 2017-07-11 2017-09-29 成都德芯数字科技股份有限公司 A kind of caption superposition method and device
CN107527618A (en) * 2017-07-13 2017-12-29 安徽声讯信息技术有限公司 A kind of audio word synchronous playing system
CN107688792B (en) * 2017-09-05 2020-06-05 语联网(武汉)信息技术有限公司 Video translation method and system
CN107992485A (en) * 2017-11-27 2018-05-04 北京搜狗科技发展有限公司 A kind of simultaneous interpretation method and device
CN109963092B (en) * 2017-12-26 2021-12-17 深圳市优必选科技有限公司 Subtitle processing method and device and terminal
CN108366305A (en) * 2018-02-07 2018-08-03 深圳佳力拓科技有限公司 A kind of code stream without subtitle shows the method and system of subtitle by speech recognition
US11582527B2 (en) 2018-02-26 2023-02-14 Google Llc Automated voice translation dubbing for prerecorded video
CN108447486B (en) * 2018-02-28 2021-12-03 科大讯飞股份有限公司 Voice translation method and device
US11277674B2 (en) 2018-04-04 2022-03-15 Nooggi Pte Ltd Method and system for promoting interaction during live streaming events
CN108962293B (en) * 2018-07-10 2021-11-05 武汉轻工大学 Video correction method, system, terminal device and storage medium
CN109119063B (en) * 2018-08-31 2019-11-22 腾讯科技(深圳)有限公司 Video dubs generation method, device, equipment and storage medium
CN110121097A (en) * 2019-05-13 2019-08-13 深圳市亿联智能有限公司 Multimedia playing apparatus and method with accessible function
CN110335610A (en) * 2019-07-19 2019-10-15 北京硬壳科技有限公司 The control method and display of multimedia translation
KR20210032809A (en) 2019-09-17 2021-03-25 삼성전자주식회사 Real-time interpretation method and apparatus
CN110767233A (en) * 2019-10-30 2020-02-07 合肥名阳信息技术有限公司 Voice conversion system and method
CN111931523A (en) * 2020-04-26 2020-11-13 永康龙飘传感科技有限公司 Method and system for translating characters and sign language in news broadcast in real time
CN113808576A (en) * 2020-06-16 2021-12-17 阿里巴巴集团控股有限公司 Voice conversion method, device and computer system
CN111916053B (en) * 2020-08-17 2022-05-20 北京字节跳动网络技术有限公司 Voice generation method, device, equipment and computer readable medium
CN112423106A (en) * 2020-11-06 2021-02-26 四川长虹电器股份有限公司 Method and system for automatically translating accompanying sound
CN114007116A (en) * 2022-01-05 2022-02-01 凯新创达(深圳)科技发展有限公司 Video processing method and video processing device
CN114554238B (en) * 2022-02-23 2023-08-11 北京有竹居网络技术有限公司 Live broadcast voice simultaneous transmission method, device, medium and electronic equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001224002A (en) * 2000-02-08 2001-08-17 Atr Interpreting Telecommunications Res Lab Sound/video synchronization method and computer-readable recording medium recording a sound/video processing program
CN1774715A (en) * 2003-04-14 2006-05-17 皇家飞利浦电子股份有限公司 System and method for performing automatic dubbing on an audio-visual stream
CN102821259A (en) * 2012-07-20 2012-12-12 冠捷显示科技(厦门)有限公司 TV (television) system with multi-language speech translation and realization method thereof
CN103491429A (en) * 2013-09-04 2014-01-01 张家港保税区润桐电子技术研发有限公司 Audio processing method and audio processing equipment
CN104427294A (en) * 2013-08-29 2015-03-18 中兴通讯股份有限公司 Method for supporting video conference simultaneous interpretation and cloud-terminal server thereof
CN204697226U (en) * 2015-06-19 2015-10-07 深圳市人和智聚科技开发有限公司 A kind of electronic equipment with video playback capability
CN105227967A (en) * 2015-10-08 2016-01-06 微鲸科技有限公司 Support the television set of intelligent translation
CN105704579A (en) * 2014-11-27 2016-06-22 南京苏宁软件技术有限公司 Real-time automatic caption translation method during media playing and system
CN106791913A (en) * 2016-12-30 2017-05-31 深圳市九洲电器有限公司 Digital television program simultaneous interpretation output method and system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6850266B1 (en) * 1998-06-04 2005-02-01 Roberto Trinca Process for carrying out videoconferences with the simultaneous insertion of auxiliary information and films with television modalities
CN102881283B (en) * 2011-07-13 2014-05-28 三星电子(中国)研发中心 Method and system for processing voice
KR20150025750A (en) * 2013-08-30 2015-03-11 삼성전자주식회사 user terminal apparatus and two way translation method thereof
CN104299619B (en) * 2014-09-29 2017-09-19 广东欧珀移动通信有限公司 A kind of processing method and processing device of audio file
CN105280179A (en) * 2015-11-02 2016-01-27 小天才科技有限公司 Text-to-speech processing method and system
CN105957517A (en) * 2016-04-29 2016-09-21 中国南方电网有限责任公司电网技术研究中心 Voice data structural transformation method based on open source API and system thereof


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113473238A (en) * 2020-04-29 2021-10-01 海信集团有限公司 Intelligent device and simultaneous interpretation method during video call
CN113473238B (en) * 2020-04-29 2022-10-18 海信集团有限公司 Intelligent device and simultaneous interpretation method during video call
CN113225615A (en) * 2021-04-20 2021-08-06 深圳市九洲电器有限公司 Television program playing method, terminal equipment, server and storage medium
CN113225615B (en) * 2021-04-20 2023-08-08 深圳市九洲电器有限公司 Television program playing method, terminal equipment, server and storage medium
CN113891168A (en) * 2021-10-19 2022-01-04 北京有竹居网络技术有限公司 Subtitle processing method, subtitle processing device, electronic equipment and storage medium
CN113891168B (en) * 2021-10-19 2023-12-19 北京有竹居网络技术有限公司 Subtitle processing method, subtitle processing device, electronic equipment and storage medium
CN114283227A (en) * 2021-11-26 2022-04-05 北京百度网讯科技有限公司 Virtual character driving method and device, electronic device and readable storage medium
CN114283227B (en) * 2021-11-26 2023-04-07 北京百度网讯科技有限公司 Virtual character driving method and device, electronic equipment and readable storage medium
CN114157920A (en) * 2021-12-10 2022-03-08 深圳Tcl新技术有限公司 Playing method and device for displaying sign language, smart television and storage medium
CN114157920B (en) * 2021-12-10 2023-07-25 深圳Tcl新技术有限公司 Method and device for playing sign language, intelligent television and storage medium

Also Published As

Publication number Publication date
CN106791913A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
WO2018121001A1 (en) Method and system for outputting simultaneous interpretation of digital television program, and smart terminal
US20200336796A1 (en) Video stream processing method and apparatus, computer device, and storage medium
US11252444B2 (en) Video stream processing method, computer device, and storage medium
WO2020024353A1 (en) Video playback method and device, terminal device, and storage medium
US8229748B2 (en) Methods and apparatus to present a video program to a visually impaired person
WO2016037440A1 (en) Video voice conversion method and device and server
CN111010586A (en) Live broadcast method, device, equipment and storage medium based on artificial intelligence
KR101899588B1 (en) System for automatically generating a sign language animation data, broadcasting system using the same and broadcasting method
US20160066055A1 (en) Method and system for automatically adding subtitles to streaming media content
US20060285654A1 (en) System and method for performing automatic dubbing on an audio-visual stream
CN112437337B (en) Method, system and equipment for realizing live caption
CN103067775A (en) Subtitle display method for audio/video terminal, audio/video terminal and server
US20130076981A1 (en) Optimizing timed text generation for live closed captions and subtitles
KR102044689B1 (en) System and method for creating broadcast subtitle
US20160098395A1 (en) System and method for separate audio program translation
KR20150021258A (en) Display apparatus and control method thereof
CN109785832A (en) A kind of old man's set-top box Intelligent voice recognition method suitable for accent again
US10582270B2 (en) Sending device, sending method, receiving device, receiving method, information processing device, and information processing method
WO2021020825A1 (en) Electronic device, control method thereof, and recording medium
CN103902531A (en) Audio and video recording and broadcasting method for Chinese and foreign language automatic real-time voice translation and subtitle annotation
KR101990019B1 (en) Terminal for performing hybrid caption effect, and method thereby
JP2021090172A (en) Caption data generation device, content distribution system, video reproduction device, program, and caption data generation method
CN113709579B (en) Audio and video data transmission method and device and storage medium
CN112055253B (en) Method and device for adding and multiplexing independent subtitle stream
KR102160117B1 (en) a real-time broadcast content generating system for disabled

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17888555

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25/10/2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17888555

Country of ref document: EP

Kind code of ref document: A1