WO2017054488A1 - Television play control method, server and television play control system - Google Patents


Info

Publication number
WO2017054488A1
WO2017054488A1 · PCT/CN2016/084461 · CN2016084461W
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
audio
data
television terminal
time segment
Prior art date
Application number
PCT/CN2016/084461
Other languages
French (fr)
Chinese (zh)
Inventor
戚炎兴 (Qi Yanxing)
Original Assignee
深圳TCL新技术有限公司 (Shenzhen TCL New Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳TCL新技术有限公司 (Shenzhen TCL New Technology Co., Ltd.)
Publication of WO2017054488A1 publication Critical patent/WO2017054488A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/233: Processing of audio elementary streams
    • H04N 21/235: Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47: End-user applications
    • H04N 21/485: End-user interface for client configuration
    • H04N 21/4852: End-user interface for client configuration for modifying audio parameters, e.g. switching between mono and stereo
    • H04N 21/4856: End-user interface for client configuration for language selection, e.g. for the menu or subtitles
    • H04N 21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85: Assembly of content; Generation of multimedia applications
    • H04N 21/854: Content authoring
    • H04N 21/8547: Content authoring involving timestamps for synchronizing content

Definitions

  • The present invention relates to the field of television technologies, and in particular to a television broadcast control method, a server, and a television broadcast control system.
  • Most video files provide only one audio track, yet two or more subtitle tracks.
  • The user can therefore only listen to the default audio provided in the video file; when the user does not understand its language, the character dialogue and plot can only be followed by reading the subtitles, which degrades the audiovisual experience.
  • The main objective of the present invention is to provide a television broadcast control method, a server and a television broadcast control system that supply audio the user can understand, according to the language requirements of different users, so that the user no longer has to rely on subtitles to follow the character dialogue and plot, thereby improving the experience of watching television.
  • The present invention provides a television broadcast control method, which includes the following steps.
  • The present invention further provides a server, where the server includes:
  • a first receiving module configured to receive first audio data and subtitle data sent by the television terminal;
  • a generating processing module configured to perform recognition processing on the first audio data and the subtitle data, and to generate a role list and sample audio parameters;
  • a synthesis processing module configured to send the role list and sample audio parameters to the television terminal and, upon receiving the user setting parameters reported by the television terminal according to the role list and the sample audio parameters, to synthesize the first audio data into second audio data;
  • a first sending module configured to send the second audio data to the television terminal, so that the second audio data and the subtitle data are played at the television terminal.
  • The present invention further provides a television broadcast control system comprising a television terminal and a server as described above, the television terminal comprising:
  • a second sending module configured to send first audio data and subtitle data to the server;
  • a second receiving module configured to receive the role list and sample audio parameters generated by the server after the first audio data and the subtitle data have been recognized and processed;
  • a feedback module configured to generate user setting parameters according to the role list and the sample audio parameters, and to feed the user setting parameters back to the server;
  • an acquiring module configured to acquire the second audio data that the server synthesizes from the first audio data upon receiving the user setting parameters;
  • a synchronous play module configured to synchronously play the second audio data, the video data, and the subtitle data;
  • wherein the television terminal extracts the video data, the first audio data, and the subtitle data from a video file.
  • In the television broadcast control method, the server and the television broadcast control system provided by the present invention, the server first receives the first audio data and the subtitle data sent by the television terminal and performs recognition processing to generate a role list and sample audio parameters; it then sends the role list and the sample audio parameters to the television terminal and, upon receiving the user setting parameters fed back by the television terminal, synthesizes the first audio data into second audio data according to those parameters; finally, it sends the second audio data to the television terminal so that the second audio data and the subtitle data are played at the television terminal.
  • In this way, audio that the user can understand is provided, the user's personalized requirements for the character dialogue are satisfied, and the user no longer has to rely on subtitles alone to follow the dialogue and plot, which improves the experience of watching television.
  • FIG. 1 is a schematic flowchart of an embodiment of a television broadcast control method according to the present invention;
  • FIG. 2 is a schematic diagram of the refinement of the step in FIG. 1 of recognizing the first audio data and generating a role list and sample audio parameters;
  • FIG. 3 is a waveform diagram of a subtitle timestamp and the first audio data;
  • FIG. 4 is a schematic diagram of the refinement of the step in FIG. 2 of performing spectrum analysis on the first audio data in the time segments and classifying it to generate a role list;
  • FIG. 5 is a schematic diagram of the refinement of the step in FIG. 2 of generating sample audio parameters corresponding to the role list by using speech synthesis technology;
  • FIG. 6 is a schematic diagram of the refinement of the step in FIG. 1 of sending the role list and sample audio parameters to the television terminal and, upon receiving the user setting parameters fed back by the television terminal, synthesizing the first audio data into the second audio data according to those parameters;
  • FIG. 7 is a schematic diagram of a synthesized waveform of the second audio data;
  • FIG. 8 is a schematic diagram of the functional modules of an embodiment of a server according to the present invention;
  • FIG. 9 is a schematic diagram of the refinement of the functional modules of the generation processing module in FIG. 8;
  • FIG. 10 is a schematic diagram of the refinement of the functional modules of the categorizing unit in FIG. 9;
  • FIG. 11 is a schematic diagram of the refinement of the functional modules of the generating unit in FIG. 9;
  • FIG. 12 is a schematic diagram of the refinement of the functional modules of the synthesis processing module in FIG. 8;
  • FIG. 13 is a schematic diagram of the functional modules of an embodiment of a television broadcast control system according to the present invention;
  • FIG. 14 is a schematic diagram of the refinement of the functional modules of the television terminal in FIG. 13.
  • The television broadcast control method includes the following steps:
  • Step S10: the server receives the first audio data and the subtitle data sent by the television terminal.
  • The audio and video playback of the television is completed by the television terminal in cooperation with the server.
  • The television terminal collates and transmits the audio, subtitle and other data, and provides a user interface for the user to set parameters.
  • The server receives the audio, subtitle and other data sent by the television terminal, processes the audio and subtitle data to synthesize new audio data, and transmits it to the television terminal for playback.
  • When the user turns on the dubbing setting function through the remote controller of the television terminal, the television terminal quickly decodes the video file, extracts the audio track selected by the user (or the default audio) and the subtitles selected by the user (or the default subtitles), and packages the audio data and the subtitle data for sending to the server.
  • Step S20: recognition processing is performed on the first audio data and the subtitle data, and a role list and sample audio parameters are generated.
  • The server performs recognition processing on the first audio data and the subtitle data to generate a role list and sample audio parameters.
  • To generate the role list, a predetermined number of timestamps may first be selected, for example three; the audio data within those three timestamps is then recognized and analyzed separately, and voices that sound similar across the timestamps are classified as one role, namely role 1, role 2, and so on.
  • The specific classification method can distinguish the voices statistically according to the audio spectrum.
  • The sample audio is preset, fixed audio; it can be audio of different genders, with sample audio corresponding to different frequency ranges for each gender.
  • Examples include male high-pitch audio, male mid-pitch audio, male low-pitch audio, female high-pitch audio, female mid-pitch audio, female low-pitch audio, and so on.
  • The frequency ranges of the audio can be subdivided further and are not limited to the high, mid and low ranges in this embodiment.
  • The sample audio can also be the audio of a famous or professional voice actor.
  • Step S30: the role list and the sample audio parameters are sent to the television terminal and, upon receiving the user setting parameters fed back by the television terminal, the first audio data is synthesized into second audio data.
  • The server sends the generated role list and sample audio parameters to the television terminal; the television terminal presents a user interface on the television screen on which the user enters selections from the role list and the sample audio parameters to generate the user setting parameters, and the television terminal then feeds the user setting parameters back to the server.
  • The server synthesizes the first audio data into the second audio data according to the user setting parameters. The synthesis requires a text-to-speech engine and a vocal cancellation program: the text-to-speech engine generates new audio data corresponding to the subtitle data (according to the user setting parameters), the vocal cancellation program removes the vocals from the first audio data, and the new audio data and the vocal-removed first audio data are then combined into the second audio data.
  • Step S40: the second audio data is sent to the television terminal, so that the second audio data and the subtitle data are played at the television terminal.
  • The server sends the synthesized second audio data corresponding to the role list to the television terminal. It can be understood that, in addition to extracting the audio data and subtitle data from the video file, the television terminal also extracts the video data; when the television terminal receives the second audio data, it synchronizes the video data with the second audio data and the subtitle data and plays them.
  • In the television broadcast control method provided by the present invention, the server first receives the first audio data and the subtitle data sent by the television terminal and performs recognition processing to generate a role list and sample audio parameters; it then sends the role list and the sample audio parameters to the television terminal and, upon receiving the user setting parameters fed back by the television terminal, synthesizes the first audio data into second audio data according to those parameters; finally, it sends the second audio data to the television terminal so that the second audio data and the subtitle data are played at the television terminal.
  • In this way, audio that the user can understand is provided, the user's personalized requirements for the character dialogue are satisfied, and the user no longer has to rely on subtitles alone to follow the dialogue and plot, which improves the experience of watching television.
  • The step S20 includes:
  • Step S201: the server extracts subtitle timestamps from the subtitle data;
  • Step S202: the time segments in which the first audio data appears are found according to the subtitle timestamps.
  • The server extracts the subtitle timestamps from the subtitle data, finds the time segments in which character dubbing appears according to those timestamps, calls the speech recognition module to perform recognition processing, and selects by statistics the audio data that appears most frequently within the time segments for the user to choose from.
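The timestamp extraction in steps S201 and S202 can be sketched as follows. The patent does not name a subtitle format, so SRT-style "HH:MM:SS,mmm --> HH:MM:SS,mmm" cues are assumed here purely for illustration:

```python
import re

# Assumed SRT-style cue timing line, e.g. "00:00:01,000 --> 00:00:03,500"
TIME_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def extract_segments(srt_text):
    """Return (start, end) time segments in which dialogue, and hence
    character dubbing, appears, taken from the subtitle timestamps."""
    segments = []
    for match in TIME_RE.finditer(srt_text):
        start = to_seconds(*match.groups()[:4])
        end = to_seconds(*match.groups()[4:])
        segments.append((start, end))
    return segments

srt = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:01:10,250 --> 00:01:12,000
Who are you?"""

print(extract_segments(srt))  # [(1.0, 3.5), (70.25, 72.0)]
```

Only the audio samples inside these segments would then be passed to the speech recognition module.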
  • Step S203: spectrum analysis is performed on the first audio data within the time segments, and the data is classified to generate a role list.
  • The step S203 may specifically include:
  • Step S2031: the first audio data within the first time segment and the second time segment is acquired respectively;
  • Step S2032: it is determined whether the spectrum range and the spectrum amplitude of the first audio data in the first time segment are consistent with those in the second time segment.
  • The server acquires the first audio data within the first time segment and the second time segment respectively, analyzes the spectrum range and spectrum amplitude of the first audio data in each segment, and determines whether the spectrum range and spectrum amplitude of the first audio data in the first time segment are consistent with those of the first audio data in the second time segment.
  • Step S2033: if yes, the first audio data within the first time segment and the second time segment is classified into the same role;
  • Step S2034: if no, the first audio data within the first time segment and the second time segment is classified into different roles.
  • If the server determines that the spectra are consistent, the first audio data in the two time segments is classified into the same role; otherwise, it is classified into different roles.
  • Whether the spectrum range and spectrum amplitude of the first audio data in two time segments are consistent can be judged by similarity: if the similarity between them is greater than or equal to 90%, they are determined to be consistent.
  • The value of the similarity threshold is not limited to this embodiment and may be chosen reasonably according to actual needs.
  • The audio spectrum of one time segment is used as a reference and defined as role 1, and is then compared with the audio spectra of subsequent time segments. If the features of the two spectra are judged to be close, the audio in both time segments is classified as role 1; if the features do not match, the audio in the later time segment is classified as role 2, and so on until the audio spectra of all time segments have been recognized. Finally, the number of occurrences of each role is counted, and the roles with the most occurrences are taken as the main personas.
  • This embodiment mainly analyzes the audio spectrum of the audio data within the timestamps, because each persona's voice differs in spectrum: for example, the spectrum of a male voice is concentrated mainly in the mid and low frequency regions, while that of a female voice is concentrated in the mid and high frequency regions. In addition, between personas, the spectral amplitudes at individual frequency points also differ. The voices of different personas can therefore be distinguished by combining the spectrum range and the spectrum amplitude.
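The greedy reference-and-compare classification described above can be sketched as follows. The 90% threshold comes from the text; the particular similarity measure and the toy per-band spectra are assumptions made for illustration:

```python
def spectra_match(spec_a, spec_b, threshold=0.9):
    """Compare two magnitude spectra (lists of per-band amplitudes).
    Consistent with the text, two segments belong to the same role when
    their similarity is at least 90%. The measure used here
    (1 - normalised absolute difference) is only one possible choice."""
    diff = sum(abs(a - b) for a, b in zip(spec_a, spec_b))
    total = sum(abs(a) + abs(b) for a, b in zip(spec_a, spec_b)) or 1.0
    return (1.0 - diff / total) >= threshold

def classify_roles(segment_spectra):
    """Greedy classification matching steps S2031-S2034: the first
    segment's spectrum defines role 1; a later segment joins an existing
    role when its spectrum matches, otherwise it opens a new role."""
    roles = []    # one reference spectrum per role
    labels = []
    for spec in segment_spectra:
        for i, ref in enumerate(roles):
            if spectra_match(spec, ref):
                labels.append(i + 1)
                break
        else:
            roles.append(spec)
            labels.append(len(roles))
    return labels

# Toy per-band amplitudes: low-frequency-heavy (male-like) versus
# high-frequency-heavy (female-like) spectra.
male = [8.0, 6.0, 2.0, 1.0]
female = [1.0, 2.0, 6.0, 8.0]
print(classify_roles([male, female, male]))  # [1, 2, 1]
```

Counting how often each label occurs then yields the main personas, as the text describes.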
  • Step S204: sample audio parameters corresponding to the role list are generated by using speech synthesis technology.
  • A subtitle timestamp indicates that the audio data in the corresponding time segment contains persona audio, and speech recognition is performed on the audio data in that time segment to recognize the audio of one of the personas.
  • The sample audio is preset, fixed audio; it may be audio of different genders, with sample audio corresponding to different frequency ranges for each gender. For example, a prompt such as "Do you want to use this voice as the character's dubbing?" may be played, and male high-pitch, male mid-pitch, male low-pitch, female high-pitch, female mid-pitch, female low-pitch and other sample audio provided. Of course, in other embodiments the frequency ranges of the audio may be subdivided further and are not limited to the high, mid and low ranges in this embodiment.
  • The role list and sample parameters are presented in a pop-up user interface for the user to select from, where the role list is the result of the role classification described above, and the sample parameters comprise the timestamp parameters in each role classification and the sample audio the user can preview.
  • The timestamp parameters allow the user to preview the original dubbing as well as the sample audio.
  • The step S204 includes:
  • Step S2041: for each role in the role list, a predetermined number of subtitle timestamps is extracted from the subtitle data;
  • Step S2042: the text-to-speech engine generates a predetermined number of sample audio parameters corresponding to the predetermined number of subtitle timestamps, which are sent to the television terminal for preview and selection.
  • The user can preview both the original voice of each character and the selected sample audio.
  • The television terminal transmits the corresponding parameters to the server, and the server uses the text-to-speech engine to generate a predetermined number of sample audio parameters corresponding to the predetermined number of subtitle timestamps, which are sent to the television terminal for preview and selection.
  • For example, the server provides three timestamps for each role classification and at the same time sends the generated sample audio to the television terminal.
  • The user may select any of the provided timestamps for a role classification to play the audio of the corresponding time at the television terminal, so that the user recognizes the person that the role classification represents.
  • The user can also audition the sample audio produced by the text-to-speech engine in order to select and confirm the appropriate sample audio parameters.
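After previewing, the terminal reports the user's choices back to the server. For illustration, the user setting parameters might take the following shape; the field names and preset sample-audio labels are assumptions, not from the patent:

```python
# Preset sample-audio labels the server might offer (illustrative names
# for the male/female high/mid/low-pitch samples described above).
SAMPLE_AUDIO = {
    "male_high", "male_mid", "male_low",
    "female_high", "female_mid", "female_low",
}

def validate_settings(settings):
    """Check that every recognised role picked one of the preset
    sample audios before synthesis begins."""
    return all(v["sample_audio"] in SAMPLE_AUDIO for v in settings.values())

# One entry per role in the role list, as the terminal might report it.
user_settings = {
    "role_1": {"sample_audio": "female_mid", "language": "en"},
    "role_2": {"sample_audio": "male_low", "language": "en"},
}
print(validate_settings(user_settings))  # True
```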
  • The step S30 includes:
  • Step S301: the generated role list and sample audio parameters are sent to the television terminal;
  • Step S302: the user setting parameters fed back by the television terminal are received.
  • The server sends the generated role list and sample audio parameters to the television terminal; the television terminal presents a user interface on the television screen on which the user enters selections from the role list and the sample audio parameters to generate the user setting parameters, and then feeds the user setting parameters back to the server.
  • Step S303: audio filtering is performed on the first audio data, and the second audio data corresponding to the role list is synthesized by the text-to-speech engine in combination with the user setting parameters.
  • New audio data corresponding to the subtitle data may be generated by the text-to-speech engine (according to the user's setting parameters), vocal cancellation may be performed on the first audio data, and the new audio data and the vocal-removed first audio data may then be synthesized into the second audio data corresponding to the role list.
  • The existing vocal elimination method mainly exploits the fact that the vocals are usually identical in the left and right channels, and subtracts the channels from each other to remove their common part; however, this method not only causes a large loss to the background sound (especially in the low-frequency part), but also fails to eliminate the vocals well when they differ between the two channels.
  • In this embodiment a bandpass filter is therefore used: within the pass band of the filter only the amplitude of the original voice is reduced, which does not affect the intelligibility of the synthesized audio, so the low-frequency and high-frequency portions are better preserved.
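A minimal illustration of this band-limited vocal reduction: within an assumed vocal band the amplitude is only reduced, leaving low and high frequencies intact. The 300-3400 Hz band and the 0.25 gain are illustrative choices, not values from the patent, and a direct DFT is used for clarity rather than speed:

```python
import cmath
import math

def attenuate_band(samples, rate, lo, hi, gain=0.25):
    """Reduce, rather than fully remove, the amplitude of the assumed
    vocal band, keeping low and high frequencies intact. A direct DFT
    is used for clarity; real code would use an FFT or a time-domain
    band filter."""
    n = len(samples)
    spec = [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]
    for k in range(n):
        freq = k * rate / n
        freq = min(freq, rate - freq)   # mirror for the conjugate bins
        if lo <= freq <= hi:
            spec[k] *= gain
    # inverse DFT back to the time domain
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

# Toy check: a 1000 Hz tone (inside the assumed vocal band) is reduced,
# while a 100 Hz tone (background bass) passes through.
rate, n = 8000, 80
vocal = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(n)]
bass = [math.sin(2 * math.pi * 100 * t / rate) for t in range(n)]
print(max(abs(x) for x in attenuate_band(vocal, rate, 300, 3400)))  # ~0.25
print(max(abs(x) for x in attenuate_band(bass, rate, 300, 3400)))   # ~1.0
```

Because the band is attenuated rather than subtracted channel-against-channel, the low and high frequencies of the background sound survive, which is the advantage the text claims over left/right cancellation.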
  • The server 1 includes:
  • The first receiving module 10 is configured to receive the first audio data and the subtitle data sent by the television terminal.
  • The audio and video playback of the television is completed by the television terminal in cooperation with the server 1.
  • The television terminal collates and transmits the audio and subtitle data, and provides a user interface for the user to set parameters.
  • The server 1 receives the audio, subtitle and other data sent by the television terminal, processes the audio and subtitle data to synthesize new audio data, and transmits it to the television terminal for playback.
  • When the user turns on the dubbing setting function through the remote controller of the television terminal, the television terminal quickly decodes the video file, extracts the audio track selected by the user (or the default audio) and the subtitles selected by the user (or the default subtitles), and packages the audio data and the subtitle data for sending to the server 1.
  • The generating processing module 20 is configured to perform recognition processing on the first audio data and the subtitle data to generate a role list and sample audio parameters.
  • The server 1 performs recognition processing on the first audio data and the subtitle data to generate a role list and sample audio parameters.
  • To generate the role list, a predetermined number of timestamps may first be selected, for example three; the audio data within those three timestamps is then recognized and analyzed separately, and voices that sound similar across the timestamps are classified as one role, namely role 1, role 2, and so on.
  • The specific classification method can distinguish the voices statistically according to the audio spectrum.
  • The sample audio is preset, fixed audio; it can be audio of different genders, with sample audio corresponding to different frequency ranges for each gender.
  • Examples include male high-pitch audio, male mid-pitch audio, male low-pitch audio, female high-pitch audio, female mid-pitch audio, female low-pitch audio, and so on.
  • The frequency ranges of the audio can be subdivided further and are not limited to the high, mid and low ranges in this embodiment.
  • The sample audio can also be the audio of a famous or professional voice actor.
  • The synthesis processing module 30 is configured to send the role list and sample audio parameters to the television terminal and, upon receiving the user setting parameters fed back by the television terminal according to the role list and the sample audio parameters, to synthesize the first audio data into second audio data.
  • The server 1 sends the generated role list and sample audio parameters to the television terminal; the television terminal presents a user interface on the television screen on which the user enters selections from the role list and the sample audio parameters to generate the user setting parameters, and then feeds the user setting parameters back to the server 1.
  • The server 1 synthesizes the first audio data into the second audio data according to the user setting parameters. The synthesis requires a text-to-speech engine and a vocal cancellation program: the text-to-speech engine generates new audio data corresponding to the subtitle data (according to the user setting parameters), the vocal cancellation program removes the vocals from the first audio data, and the new audio data and the vocal-removed first audio data are then combined into the second audio data.
  • The first sending module 40 is configured to send the second audio data to the television terminal, so that the second audio data and the subtitle data are played at the television terminal.
  • The server 1 sends the synthesized second audio data corresponding to the role list to the television terminal. It can be understood that, in addition to extracting the audio data and subtitle data from the video file, the television terminal also extracts the video data; when the television terminal receives the second audio data, it synchronizes the video data with the second audio data and the subtitle data and plays them.
  • In the server 1 provided by the present invention, the first audio data and the subtitle data sent by the television terminal are first received and recognition processing is performed to generate a role list and sample audio parameters; the role list and the sample audio parameters are then sent to the television terminal and, upon receiving the user setting parameters fed back by the television terminal, the first audio data is synthesized into second audio data according to those parameters; finally, the second audio data is sent to the television terminal so that the second audio data and the subtitle data are played at the television terminal.
  • In this way, audio that the user can understand is provided, the user's personalized requirements for the character dialogue are satisfied, and the user no longer has to rely on subtitles alone to follow the dialogue and plot, which improves the experience of watching television.
  • The generation processing module 20 includes:
  • an obtaining unit 201 configured to extract subtitle timestamps from the subtitle data;
  • a searching unit 202 configured to search, according to the subtitle timestamps, for the time segments in which the first audio data appears.
  • The server 1 extracts the subtitle timestamps from the subtitle data, finds the time segments in which character dubbing appears according to those timestamps, calls the speech recognition module to perform recognition processing, and selects by statistics the audio data that appears most frequently within the time segments for the user to choose from.
  • The categorizing unit 203 is configured to perform spectrum analysis on the first audio data within the time segments and to classify the data to generate a role list.
  • The categorizing unit 203 includes:
  • an obtaining subunit 2031 configured to respectively acquire the first audio data within the first time segment and the second time segment;
  • a determining subunit 2032 configured to determine whether the spectrum range and the spectrum amplitude of the first audio data in the first time segment are consistent with those in the second time segment.
  • The server acquires the first audio data within the first time segment and the second time segment respectively, analyzes the spectrum range and spectrum amplitude of the first audio data in each segment, and determines whether the spectrum range and spectrum amplitude of the first audio data in the first time segment are consistent with those of the first audio data in the second time segment.
  • a first categorization subunit 2033 configured to classify the first audio data within the first time segment and the second time segment into the same role when their spectrum ranges and spectrum amplitudes are determined to be consistent;
  • a second categorization subunit 2034 configured to classify the first audio data within the first time segment and the second time segment into different roles when their spectrum ranges and/or spectrum amplitudes are determined to be inconsistent.
  • the generating unit 204 is configured to generate a sample audio parameter corresponding to the character list by using a voice synthesis technology.
  • the generating unit 204 includes:
  • an extracting subunit 2041 configured to extract, for each of the roles in the role list, a predetermined number of subtitle timestamps from the subtitle data;
  • a generating subunit 2042 configured to generate, through the text-to-speech engine, a predetermined number of sample audio parameters corresponding to the predetermined number of subtitle timestamps, to send to the television terminal for preview selection.
  • the synthesis processing module 30 includes:
  • the sending unit 301 is configured to send the generated role list and sample audio parameters to the television terminal;
  • the receiving unit 302 is configured to receive the user setting parameters fed back by the television terminal according to the role list and the sample audio parameters.
  • the server 1 sends the generated role list and sample audio parameters to the television terminal; the television terminal presents a user interface on the television screen on which the user inputs and selects from the role list and the sample audio parameters to generate the user setting parameters, and then feeds the user setting parameters back to the server 1.
  • the synthesizing unit 303 is configured to perform audio filtering on the first audio data, and to synthesize the second audio data corresponding to the role list through the text-to-speech engine in combination with the user setting parameters.
  • new audio data corresponding to the subtitle data may be generated by the text-to-speech engine (varying according to the user's setting parameters), the first audio data is subjected to vocal cancellation according to the vocal cancellation program, and the new audio data and the vocal-cancelled first audio data are then synthesized into the second audio data corresponding to the role list.
  • the existing vocal cancellation method mainly exploits the fact that vocals are pronounced identically in the left and right channels, subtracting one channel from the other to remove the common part of the two channels; however, this method not only causes a large loss to the background sound (especially in the low-frequency part), but also fails to cancel the vocals well when the vocals differ between the two channels.
  • a band-pass filter is therefore used: within the filter's pass band, it is only necessary to reduce the amplitude of the original speech enough that it does not affect the intelligibility of the synthesized audio, so the low-frequency and high-frequency portions are better preserved.
  • the present invention also provides a television broadcast control system 100.
  • the television broadcast control system 100 includes a television terminal 2 and a server 1 as described above.
  • the television terminal 2 includes:
  • a second sending module 50 configured to send first audio data and caption data to the server 1;
  • the second receiving module 60 is configured to receive a role list and sample audio parameters generated by the server 1 after the first audio data and the caption data are identified and processed;
  • the feedback module 70 is configured to generate a user setting parameter according to the role list and the sample audio parameter, and feed back the user setting parameter to the server 1;
  • the obtaining module 80 is configured to acquire the second audio data synthesized from the first audio data by the server 1 upon receiving the user setting parameters;
  • the synchronous play module 90 is configured to synchronously play the second audio data, the video data, and the caption data.
  • when receiving the second audio data corresponding to the role list synthesized by the server 1, the television terminal 2 synchronizes the second audio data with the video data and the caption data and then plays them, so that the audio of the video file is pre-processed by the server 1 and synthesized into a language the user can understand, which enhances the user's viewing experience; in addition, a variety of character audio options can be provided to the user, further enhancing the user experience.


Abstract

Disclosed is a television play control method, comprising the following steps: a server receiving first audio data and subtitle data transmitted by a television terminal; performing an identification process on the first audio data and the subtitle data to generate a role list and a sample audio parameter; transmitting the role list and the sample audio parameter to the television terminal, and when receiving a user setting parameter fed back by the television terminal according to the role list and the sample audio parameter, synthesizing the first audio data into second audio data; and transmitting the second audio data to the television terminal, so as to control the playing of the second audio data and the subtitle data on the television terminal. Also disclosed are a server and a television play control system. The present invention can correspondingly provide audio which can be understood by a user according to language requirements of different users, so as to avoid the defect that character dialogues and the plots can only be understood by means of subtitles, thereby improving the user experience in watching television.

Description

Television broadcast control method, server and television broadcast control system

Technical Field

The present invention relates to the field of television technologies, and in particular, to a television broadcast control method, a server, and a television broadcast control system.

Background

When playing a video file, current television terminals usually switch character dubbing and subtitles according to the audio tracks and subtitle data in the video file, so that different users can select a language they understand for playback. However, this playback method has at least the following drawback:

Most video files may provide only one audio language while providing two or more subtitle languages. In that case, the user can only listen to the default audio provided in the video file; when the user does not understand the default language, the character dialogue and plot can only be followed by reading the subtitles. This degrades the user's audiovisual experience.
Summary of the Invention

The main objective of the present invention is to provide a television broadcast control method, a server, and a television broadcast control system, which are designed to provide audio that can be understood by the user according to the language requirements of different users, so as to avoid the defect that character dialogue and plot can only be understood by means of subtitles, thereby improving the user's television viewing experience.

To achieve the above objective, the present invention provides a television broadcast control method, which includes the following steps:

the server receives first audio data and subtitle data sent by a television terminal;

performing recognition processing on the first audio data and the subtitle data to generate a role list and sample audio parameters;

sending the role list and sample audio parameters to the television terminal and, upon receiving user setting parameters fed back by the television terminal according to the role list and sample audio parameters, synthesizing the first audio data into second audio data;

sending the second audio data to the television terminal to control the second audio data and the subtitle data to be played on the television terminal.
In addition, to achieve the above objective, the present invention further provides a server, which includes:

a first receiving module configured to receive first audio data and subtitle data sent by a television terminal;

a generation processing module configured to perform recognition processing on the first audio data and the subtitle data to generate a role list and sample audio parameters;

a synthesis processing module configured to send the role list and sample audio parameters to the television terminal and, upon receiving user setting parameters fed back by the television terminal according to the role list and sample audio parameters, synthesize the first audio data into second audio data;

a first sending module configured to send the second audio data to the television terminal to control the second audio data and the subtitle data to be played on the television terminal.
In addition, to achieve the above objective, the present invention further provides a television broadcast control system, which includes a television terminal and the server described above, the television terminal including:

a second sending module configured to send the first audio data and subtitle data to the server;

a second receiving module configured to receive the role list and sample audio parameters generated by the server after recognition processing of the first audio data and subtitle data;

a feedback module configured to generate user setting parameters according to the role list and sample audio parameters, and feed the user setting parameters back to the server;

an acquisition module configured to acquire the second audio data synthesized from the first audio data by the server upon receiving the user setting parameters;

a synchronous playback module configured to play the second audio data, the video data, and the subtitle data synchronously;

wherein the television terminal extracts the video data, the first audio data, and the subtitle data from a video file.
With the television broadcast control method, server, and television broadcast control system provided by the present invention, the server first receives the first audio data and subtitle data sent by the television terminal and performs recognition processing to generate a role list and sample audio parameters; it then sends the role list and sample audio parameters to the television terminal and, upon receiving the user setting parameters fed back by the television terminal, synthesizes the first audio data into second audio data according to the user setting parameters; finally, it sends the second audio data to the television terminal to control the second audio data and the subtitle data to be played on the television terminal. In this way, audio that the user can understand can be provided according to the language requirements of different users, and the user's personalized requirements for character dialogue can also be met, thereby avoiding the defect that character dialogue and plot can only be followed through subtitles, and improving the user's television viewing experience.
Brief Description of the Drawings

FIG. 1 is a schematic flowchart of an embodiment of a television broadcast control method according to the present invention;

FIG. 2 is a schematic flowchart detailing the step in FIG. 1 of performing recognition processing on the first audio data to generate a role list and sample audio parameters;

FIG. 3 is a schematic waveform diagram of subtitle timestamps and the first audio data;

FIG. 4 is a schematic flowchart detailing the step in FIG. 2 of performing spectrum analysis on the first audio data within the time segments and categorizing it to generate a role list;

FIG. 5 is a schematic flowchart detailing the step in FIG. 2 of generating sample audio parameters corresponding to the role list by using speech synthesis technology;

FIG. 6 is a schematic flowchart detailing the step in FIG. 1 of sending the role list and sample audio parameters to the television terminal and, upon receiving user setting parameters fed back by the television terminal, synthesizing the first audio data into second audio data according to the user setting parameters;

FIG. 7 is a schematic diagram of the synthesized waveform of the second audio data;

FIG. 8 is a schematic diagram of the functional modules of an embodiment of a server according to the present invention;

FIG. 9 is a schematic diagram of the detailed functional modules of the generation processing module in FIG. 8;

FIG. 10 is a schematic diagram of the detailed functional modules of the categorizing unit in FIG. 9;

FIG. 11 is a schematic diagram of the detailed functional modules of the generating unit in FIG. 9;

FIG. 12 is a schematic diagram of the detailed functional modules of the synthesis processing module in FIG. 8;

FIG. 13 is a schematic diagram of the functional modules of an embodiment of a television broadcast control system according to the present invention;

FIG. 14 is a schematic diagram of the detailed functional modules of the television terminal in FIG. 13.

The implementation, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The present invention provides a television broadcast control method. Referring to FIG. 1, in an embodiment, the television broadcast control method includes the following steps:

Step S10: the server receives first audio data and subtitle data sent by a television terminal.

In this embodiment, audio and video playback on the television is accomplished by the television terminal in cooperation with the server. The television terminal handles the collation and transmission of audio, subtitle, and other data, and provides a user interface for the user to set parameters. The server receives the audio, subtitle, and other data sent by the television terminal and processes the audio and subtitle data to synthesize new audio, which is transmitted to the television terminal for playback.

In this embodiment, when the user enables the dubbing setting function through the remote control of the television terminal, the television terminal quickly decodes the video file, extracts the audio selected by the user (or the default audio) and the subtitles selected by the user (or the default subtitles), and packages the audio data and subtitle data for sending to the server.

Step S20: perform recognition processing on the first audio data and the subtitle data to generate a role list and sample audio parameters.

The server performs recognition processing on the first audio data and subtitle data to generate the role list and sample audio parameters. To generate the role list, a predetermined number of timestamps may first be selected, e.g., three timestamps; the audio data within those timestamps is then recognized and analyzed, and voices with similar pronunciation across the timestamps are grouped into the same class, i.e., different roles such as role 1 and role 2. The specific classification can be performed statistically according to the audio spectrum. The sample audio is preset fixed audio, which may include audio of different genders and sample audio for different pitch ranges of each gender. For example, a specific utterance such as "Do you want to use this voice as the dubbing for the character?" may be provided as male high-pitched, male mid-pitched, male low-pitched, female high-pitched, female mid-pitched, and female low-pitched sample audio. Of course, in other embodiments, the frequency range of the audio may be subdivided further and is not limited to the high, mid, and low ranges of this embodiment. In addition, the sample audio may also be the audio of famous or professional voice actors.
Step S30: send the role list and sample audio parameters to the television terminal and, upon receiving user setting parameters fed back by the television terminal, synthesize the first audio data into second audio data.

In this embodiment, the server sends the generated role list and sample audio parameters to the television terminal. The television terminal presents a user interface on the television screen on which the user inputs and selects from the role list and sample audio parameters to generate the user setting parameters; the television terminal then feeds the user setting parameters back to the server. The server synthesizes the first audio data into the second audio data according to the user setting parameters. The synthesis of the second audio data requires a text-to-speech engine and a vocal cancellation program: the text-to-speech engine generates new audio data corresponding to the subtitle data (varying according to the user setting parameters), the vocal cancellation program removes the vocals from the first audio data, and the new audio data is then combined with the vocal-cancelled first audio data to form the second audio data.

Step S40: send the second audio data to the television terminal to control the second audio data and the subtitle data to be played on the television terminal.

In this embodiment, the server sends the synthesized second audio data corresponding to the role list to the television terminal. It can be understood that, in addition to extracting the audio data and subtitle data from the video file, the television terminal also extracts the video data from the video file; upon receiving the second audio data, the television terminal synchronizes the video data with the second audio data and the subtitle data, and finally plays them.

With the television broadcast control method provided by the present invention, the server first receives the first audio data and subtitle data sent by the television terminal and performs recognition processing to generate a role list and sample audio parameters; it then sends the role list and sample audio parameters to the television terminal and, upon receiving the user setting parameters fed back by the television terminal, synthesizes the first audio data into second audio data according to the user setting parameters; finally, it sends the second audio data to the television terminal to control the second audio data and the subtitle data to be played on the television terminal. In this way, audio that the user can understand can be provided according to the language requirements of different users, and the user's personalized requirements for character dialogue can also be met, thereby avoiding the defect that character dialogue and plot can only be followed through subtitles, and improving the user's television viewing experience.
In an embodiment, as shown in FIG. 2, based on the embodiment shown in FIG. 1, step S20 includes:

Step S201: the server extracts subtitle timestamps from the subtitle data.

Step S202: find the time segments in which the first audio data appears according to the subtitle timestamps.

In this embodiment, referring to FIG. 3, the server extracts the subtitle timestamps from the subtitle data, finds the time segments in which character dubbing appears according to the subtitle timestamps, and calls the speech recognition module to perform recognition, counting the several most frequently occurring voices within those time segments for the user to select from.
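The timestamp extraction in steps S201 and S202 can be sketched roughly as follows. This is a minimal illustration assuming the subtitle data is in SubRip (.srt) form; the application itself does not fix a subtitle format, so the regular expression and millisecond conversion here are illustrative.

```python
import re

# Matches SubRip-style timestamp lines, e.g. "00:01:02,100 --> 00:01:05,100"
TS_LINE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)

def to_ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def extract_time_segments(subtitle_text):
    """Return (start_ms, end_ms) pairs: the time segments in which
    character dubbing appears, according to the subtitle timestamps."""
    segments = []
    for match in TS_LINE.finditer(subtitle_text):
        g = match.groups()
        segments.append((to_ms(*g[0:4]), to_ms(*g[4:8])))
    return segments

srt = """1
00:01:02,100 --> 00:01:05,100
Hello there.

2
00:01:07,000 --> 00:01:09,500
General Kenobi."""
print(extract_time_segments(srt))  # [(62100, 65100), (67000, 69500)]
```

Only the audio samples falling inside these segments then need to be handed to the spectrum analysis and speech recognition steps.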
It can be understood that there are many roles in television dubbing. The main characters usually have many lines, while there may also be many lines from roles that appear infrequently; if the user had to make a choice for every one of them, the user's operational burden would increase.

Step S203: perform spectrum analysis on the first audio data within the time segments, and categorize it to generate a role list.

In this embodiment, by performing spectrum analysis on the first audio data within the time segments, audio with similar spectra is identified using the spectrum range and spectrum amplitude and grouped into the same class to generate the role list.
In an optional embodiment, as shown in FIG. 4, step S203 may specifically include:

Step S2031: respectively acquire the first audio data in the first time segment and the second time segment.

Step S2032: determine whether the spectrum range and spectrum amplitude of the first audio data in the first time segment and the second time segment are consistent.

In this embodiment, taking two time segments, e.g., a first time segment and a second time segment, as an example, the server respectively acquires the first audio data in the two segments, analyzes the spectrum range and spectrum amplitude of the first audio data in each segment, and determines whether the spectrum range and spectrum amplitude of the first audio data in the first time segment are consistent with those of the first audio data in the second time segment.

Step S2033: if so, classify the first audio data in the first time segment and the second time segment as the same role.

Step S2034: if not, classify the first audio data in the first time segment and the second time segment as different roles.

In this embodiment, if the server determines they are consistent, the first audio data in the first and second time segments is classified as the same role; if they are inconsistent, it is classified as different roles.

It can be understood that, when judging whether the spectrum range and spectrum amplitude of the first audio data in two time segments are consistent, they may be deemed consistent when their similarity is greater than or equal to 90%. Of course, in other embodiments, the similarity threshold is not limited to this embodiment and may be chosen reasonably according to actual needs.

The audio spectrum of one of the time segments is taken as a reference and defined as role 1, and is then compared with the audio spectra of the subsequent time segments. If the features of two spectra are judged to be close, the audio in both time segments is classified as role 1; if they do not match, the audio in the later time segment is classified as role 2, and so on until the audio spectra of all time segments have been identified. Finally, the number of occurrences of each role is counted; the roles that occur more often are the main characters.

In this embodiment, the specific content of the audio data need not be recognized, because the content of the audio data is already provided in the subtitle data; this embodiment mainly performs audio spectrum analysis on the audio data within the timestamps. The pronunciation of each character differs in spectrum: for example, the spectrum of a male voice is concentrated mainly in the low-to-mid frequency region, while that of a female voice is concentrated in the mid-to-high frequency region. In addition, the spectral amplitudes at individual frequency points also differ between characters. Therefore, combining the spectrum range and spectrum amplitude suffices to distinguish the characters' speech.
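A rough sketch of the spectrum-based role classification described above, assuming uncompressed mono samples. The coarse band-energy signature and cosine-similarity comparison stand in for the "spectrum range and spectrum amplitude" features, and the 0.9 threshold mirrors the 90% similarity criterion of this embodiment; none of these specific choices are prescribed by the application.

```python
import numpy as np

def spectral_signature(samples, bands=32):
    """Average magnitude spectrum folded into a few coarse bands:
    a crude stand-in for the spectrum range and amplitude features."""
    spectrum = np.abs(np.fft.rfft(samples))
    edges = np.linspace(0, len(spectrum), bands + 1, dtype=int)
    return np.array([spectrum[a:b].mean() for a, b in zip(edges[:-1], edges[1:])])

def same_role(sig_a, sig_b, threshold=0.9):
    """Classify two segments as the same role when the cosine similarity
    of their signatures reaches the threshold (90% in this embodiment)."""
    cos = np.dot(sig_a, sig_b) / (np.linalg.norm(sig_a) * np.linalg.norm(sig_b))
    return bool(cos >= threshold)

# Synthetic stand-ins for dubbing segments: two low-pitched voices and one
# high-pitched voice, one second each at a 16 kHz sample rate.
rate = 16000
t = np.arange(rate) / rate
low_voice_1 = np.sin(2 * np.pi * 150 * t) + 0.3 * np.sin(2 * np.pi * 300 * t)
low_voice_2 = np.sin(2 * np.pi * 160 * t) + 0.3 * np.sin(2 * np.pi * 320 * t)
high_voice = np.sin(2 * np.pi * 2400 * t)

a, b, c = (spectral_signature(x) for x in (low_voice_1, low_voice_2, high_voice))
print(same_role(a, b))  # similar low-pitched voices: same role
print(same_role(a, c))  # low vs. high pitch: different roles
```

In a full implementation each signature would be computed from the first audio data inside one subtitle time segment, and a running list of reference signatures (role 1, role 2, ...) would be kept as described above.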
Step S204: generate sample audio parameters corresponding to the role list by using speech synthesis technology.

In this embodiment, taking the read timestamp 00:01:02:100-00:01:05:100 as an example, this timestamp indicates that the audio data within this time segment contains a character's voice; performing speech recognition on the audio data of this time segment identifies the audio of one of the characters.

In this embodiment, the sample audio is preset fixed audio, which may include audio of different genders and sample audio for different pitch ranges of each gender. For example, a specific utterance such as "Do you want to use this voice as the dubbing for the character?" may be provided as male high-pitched, male mid-pitched, male low-pitched, female high-pitched, female mid-pitched, and female low-pitched sample audio. Of course, in other embodiments, the frequency range of the audio may be subdivided further and is not limited to the high, mid, and low ranges of this embodiment. In this embodiment, after receiving the role list and sample audio parameters sent by the server, the television terminal pops up a user interface for the user to make selections, where the role list is the role classification result described above, and the sample parameters are the timestamp parameters in each role class together with the sample audio available for the user to preview. Through the timestamp parameters, the user can preview the original dubbing as well as the sample audio.
In an embodiment, as shown in FIG. 5, based on the embodiment shown in FIG. 2, step S204 includes:

Step S2041: for each role in the role list, extract a predetermined number of subtitle timestamps from the subtitle data.

Step S2042: generate, through the text-to-speech engine, a predetermined number of sample audio parameters corresponding to the predetermined number of subtitle timestamps, for sending to the television terminal for preview selection.

In this embodiment, the user can preview the character's original dubbing and the selectable sample audio. Upon receiving the user setting parameters, i.e., the sample audio selected by the user, the television terminal transmits the corresponding parameters to the server, and the server uses the text-to-speech engine to generate a predetermined number of sample audio parameters corresponding to the predetermined number of subtitle timestamps, for sending to the television terminal for preview selection.

For example, in the character audio recognition process, three role classes with similar pronunciation are obtained statistically. The server provides three timestamps for each role class and simultaneously sends the generated sample audio to the television terminal. The user can then use the timestamps provided for each role class to play the audio of the corresponding times on the television terminal, so as to identify the character that the role class represents. In addition, the user can preview and audition the sample audio produced by the text-to-speech engine to select and confirm suitable sample audio parameters.
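The role list and sample audio parameters exchanged in this example, three preview timestamps per role class plus the preset voice options, might look roughly like the following. All field names and values are hypothetical, since the application does not define a concrete message format.

```python
# Illustrative payload the server might send to the television terminal.
# Every key and value below is a hypothetical stand-in.
preview_payload = {
    "roles": [
        {"role": "role 1", "occurrences": 57,
         "preview_timestamps": ["00:01:02,100", "00:04:11,300", "00:09:48,000"]},
        {"role": "role 2", "occurrences": 34,
         "preview_timestamps": ["00:02:20,400", "00:05:02,250", "00:11:07,900"]},
        {"role": "role 3", "occurrences": 9,
         "preview_timestamps": ["00:03:15,000", "00:08:40,500", "00:12:33,100"]},
    ],
    "sample_audio": [
        {"id": "male_high"}, {"id": "male_mid"}, {"id": "male_low"},
        {"id": "female_high"}, {"id": "female_mid"}, {"id": "female_low"},
    ],
}

def user_setting_parameters(selection):
    """Map each role class to the sample audio chosen by the user;
    this is the 'user setting parameters' fed back to the server."""
    valid = {s["id"] for s in preview_payload["sample_audio"]}
    assert set(selection.values()) <= valid
    return [{"role": r, "voice": v} for r, v in selection.items()]

print(user_setting_parameters({"role 1": "male_low", "role 2": "female_mid"}))
```

Roles the user leaves unset could simply keep the original dubbing, which is consistent with only the main characters needing a selection.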
在一实施例中,如图6所示,在上述图1所示的基础上,所述步骤S30包括:In an embodiment, as shown in FIG. 6, on the basis of the foregoing FIG. 1, the step S30 includes:
步骤S301,将生成的所述角色列表和样例音频参数发送至所述电视终端;Step S301, the generated role list and sample audio parameters are sent to the television terminal;
步骤S302,接收所述电视终端反馈的用户设置参数;Step S302, receiving user setting parameters fed back by the television terminal;
本实施例中，服务器将生成的所述角色列表和样例音频参数发送至所述电视终端，电视终端通过在电视屏幕上呈现出用户界面，以供用户在用户界面上输入并选择角色列表和样例音频参数，从而生成所述用户设置参数，然后所述电视终端将所述用户设置参数反馈给所述服务器。In this embodiment, the server sends the generated role list and sample audio parameters to the television terminal; the television terminal presents a user interface on the television screen for the user to enter and select from the role list and the sample audio parameters, thereby generating the user setting parameters, which the television terminal then feeds back to the server.
步骤S303,对所述第一音频数据进行音频过滤,通过文本语音引擎并结合所述用户设置参数,合成与所述角色列表对应的所述第二音频数据。Step S303, performing audio filtering on the first audio data, and synthesizing the second audio data corresponding to the role list by using a text speech engine and combining the user setting parameters.
本实施例中，参照图7，可以根据所述文本语音引擎产生对应所述字幕数据的新的音频数据(具体根据用户的设置参数而不同)，并根据所述人声消除程序将所述第一音频数据进行人声消除，然后将所述新的音频数据与经人声消除的第一音频数据合成为与所述角色列表对应的第二音频数据。In this embodiment, referring to FIG. 7, new audio data corresponding to the subtitle data may be generated by the text-to-speech engine (varying with the user's setting parameters), the first audio data may be subjected to vocal cancellation by the vocal-cancellation program, and the new audio data may then be combined with the vocal-cancelled first audio data into the second audio data corresponding to the role list.
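The synthesis step described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: `tts_engine` and `remove_vocals` are hypothetical stand-ins for the text-to-speech engine and the vocal-cancellation program, and the `(start_sec, end_sec, text)` cue format is an assumption.

```python
import numpy as np

def synthesize_second_audio(first_audio, sr, cues, tts_engine, remove_vocals):
    """Overlay TTS speech on the vocal-suppressed original track.

    `cues` is a list of (start_sec, end_sec, text) subtitle cues;
    `tts_engine(text, sr)` and `remove_vocals(segment, sr)` are
    hypothetical callables standing in for the text-to-speech engine
    and the vocal-cancellation program.
    """
    out = first_audio.copy()
    for start, end, text in cues:
        a, b = int(start * sr), int(end * sr)
        out[a:b] = remove_vocals(out[a:b], sr)   # suppress original dialogue
        speech = tts_engine(text, sr)            # new dialogue audio
        n = min(len(speech), b - a)
        out[a:a + n] += speech[:n]               # mix within the cue window only
    return out
```

Only samples inside each cue window are touched, matching the statement that audio outside the timestamp ranges is unaffected.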
其中，现有人声消除方法主要是利用左右两个声道中人声发音相同的特点，将左右两个声道进行相减，从而去除两个声道中相同的部分，但这种方法不仅对背景声造成较大的损失（特别在低频部分），而且在人声发音在两个声道中不相同时，无法很好地消除人声。本申请采用带通滤波器的方法，在带通滤波器的频带范围内，只需达到降低原发音的幅度，不影响合成音频的辨别即可，从而可以较好地保留低频及高频部分。此外，对时间戳范围外的音频数据也没有造成任何影响。Existing vocal-cancellation methods mainly exploit the fact that the vocals sound the same in the left and right channels: subtracting one channel from the other removes their common component. However, this approach causes a considerable loss of background sound (especially in the low-frequency band), and it cannot remove the vocals well when they differ between the two channels. The present application instead uses a band-pass filter: within the filter's pass band, it only needs to lower the amplitude of the original dialogue enough that it does not interfere with the intelligibility of the synthesized audio, so the low-frequency and high-frequency components are well preserved. Moreover, audio data outside the timestamp ranges is left entirely unaffected.
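A band-pass attenuation of the kind described can be sketched with a simple FFT mask. The band limits (300–3400 Hz, a common telephony voice band) and the attenuation gain below are illustrative assumptions; the patent does not specify concrete values:

```python
import numpy as np

def attenuate_voice_band(segment, sr, low_hz=300.0, high_hz=3400.0, gain=0.2):
    """Reduce the amplitude of the (assumed) dialogue band only.

    Unlike channel subtraction, frequencies below `low_hz` and above
    `high_hz` pass through untouched, so bass and treble survive.
    The gain lowers, rather than zeroes, the band: the original voice
    only needs to be quiet enough not to mask the synthesized speech.
    """
    spectrum = np.fft.rfft(segment)
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    band = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[band] *= gain
    return np.fft.irfft(spectrum, n=len(segment))
```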
本发明还提供一种服务器1,参照图8,在一实施例中,所述服务器1包括:The present invention also provides a server 1. Referring to FIG. 8, in an embodiment, the server 1 includes:
第一接收模块10,用于接收电视终端发送的第一音频数据和字幕数据; The first receiving module 10 is configured to receive first audio data and subtitle data sent by the television terminal;
本实施例中，电视的音频和视频播放，由电视终端与服务器1协作完成，所述电视终端完成音频、字幕等数据的整理与传输，并提供用户界面以供用户进行参数设置。而服务器1接收电视终端发送的音频、字幕等数据，并完成音频、字幕数据的处理，以合成音频数据后传输给电视终端进行显示。In this embodiment, the audio and video playback of the television is accomplished by the television terminal in cooperation with the server 1. The television terminal handles the collation and transmission of audio, subtitle, and related data, and provides a user interface for the user to set parameters. The server 1 receives the audio and subtitle data sent by the television terminal, processes the audio and subtitle data, synthesizes the audio data, and transmits the result to the television terminal for presentation.
本实施例中，用户通过电视终端的遥控器开启配音设置功能时，电视终端则对视频文件进行快速解码，提取出用户选择的音频或默认音频及用户选择的字幕或默认字幕，并将音频数据及字幕数据打包发送到服务器1。In this embodiment, when the user enables the dubbing setting function via the television terminal's remote control, the television terminal quickly decodes the video file, extracts the user-selected audio (or the default audio) and the user-selected subtitles (or the default subtitles), and packages the audio data and subtitle data for transmission to the server 1.
生成处理模块20,用于对所述第一音频数据和字幕数据进行识别处理,生成角色列表和样例音频参数;The generating processing module 20 is configured to perform recognition processing on the first audio data and the caption data to generate a role list and sample audio parameters;
所述服务器1对所述第一音频数据和字幕数据进行识别处理，生成角色列表和样例音频参数。其中，所述角色列表的生成，可以先选取预定数量的时间戳，如选取三段时间戳，然后分别对所述三段时间戳内的音频数据进行识别分析，并将各时间戳内发音类似的语音归为一类即角色1、角色2等不同的角色。具体归类方法，可以根据音频频谱来进行区分统计。而样例音频是预设的固定音频，可为不同性别的音频，以及不同性别不同频率对应的样例音频，例如，选取一段特定的音频“是否选择本段语音为人物角色的配音？”，并分别提供男声高音音频、男声中音音频、男声低音音频、女声高音音频、女声中音音频、女声低音音频等样例音频，当然，在其他实施例中，还可以将音频的频率范围进一步细分，并不局限于本实施例中的高、中、低三种音频范围。此外，所述样例音频还可以是著名配音人员或专业配音人员的音频。The server 1 performs recognition processing on the first audio data and the subtitle data to generate a role list and sample audio parameters. To generate the role list, a predetermined number of timestamps may first be selected, e.g. three timestamp segments; the audio data within those segments is then recognized and analyzed, and voices with similar pronunciation across the segments are grouped into one class each, i.e. distinct roles such as role 1, role 2, and so on. A concrete grouping method is to distinguish and tally the voices by their audio spectra. The sample audio consists of preset fixed clips, which may cover different genders and, for each gender, different pitch ranges. For example, a specific utterance such as "Do you want to use this voice as the dubbing for the character?" may be offered as male high-pitch, male mid-pitch, male low-pitch, female high-pitch, female mid-pitch, and female low-pitch sample audio. Of course, in other embodiments the frequency range of the audio may be subdivided further and is not limited to the high, middle, and low ranges of this embodiment. In addition, the sample audio may also be recordings of famous or professional voice actors.
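The preset sample audio described here (two genders, each in high, middle, and low registers) can be represented as a small parameter table. The field names and pitch values below are illustrative assumptions, not taken from the patent:

```python
# Illustrative preset table for sample audio parameters.
# The pitch values are assumed typical speaking fundamentals, not patent values.
SAMPLE_AUDIO_PRESETS = [
    {"id": 0, "gender": "male",   "register": "high", "pitch_hz": 180},
    {"id": 1, "gender": "male",   "register": "mid",  "pitch_hz": 140},
    {"id": 2, "gender": "male",   "register": "low",  "pitch_hz": 100},
    {"id": 3, "gender": "female", "register": "high", "pitch_hz": 280},
    {"id": 4, "gender": "female", "register": "mid",  "pitch_hz": 220},
    {"id": 5, "gender": "female", "register": "low",  "pitch_hz": 180},
]

def presets_for(gender):
    """Return the sample-audio options offered for one gender."""
    return [p for p in SAMPLE_AUDIO_PRESETS if p["gender"] == gender]
```

Other embodiments could simply extend this table with finer registers or named voice-actor presets.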
合成处理模块30，用于将所述角色列表和样例音频参数发送至所述电视终端，并在接收到所述电视终端根据所述角色列表和样例音频参数反馈的用户设置参数时，将所述第一音频数据合成为第二音频数据；a synthesis processing module 30, configured to send the role list and sample audio parameters to the television terminal, and to synthesize the first audio data into second audio data upon receiving the user setting parameters fed back by the television terminal according to the role list and sample audio parameters;
本实施例中，服务器1将生成的所述角色列表和样例音频参数发送至所述电视终端，所述电视终端在电视屏幕上呈现出用户界面，以供用户在用户界面上输入并选择角色列表和样例音频参数，从而生成所述用户设置参数，然后所述电视终端将所述用户设置参数反馈给所述服务器1。所述服务器1根据所述用户设置参数将所述第一音频数据合成为第二音频数据，其中，所述第二音频数据的合成过程，需要用到文本语音引擎和人声消除程序，具体可以根据所述文本语音引擎产生对应所述字幕数据的新的音频数据（具体根据用户设置参数而不同），并根据所述人声消除程序将所述第一音频数据进行人声消除，然后将所述新的音频数据与经人声消除的第一音频数据合成为第二音频数据。In this embodiment, the server 1 sends the generated role list and sample audio parameters to the television terminal; the television terminal presents a user interface on the television screen for the user to enter and select from the role list and the sample audio parameters, thereby generating the user setting parameters, which the television terminal then feeds back to the server 1. The server 1 synthesizes the first audio data into the second audio data according to the user setting parameters. This synthesis requires a text-to-speech engine and a vocal-cancellation program: new audio data corresponding to the subtitle data is generated by the text-to-speech engine (varying with the user setting parameters), the first audio data is subjected to vocal cancellation by the vocal-cancellation program, and the new audio data is then combined with the vocal-cancelled first audio data into the second audio data.
第一发送模块40,用于将所述第二音频数据发送至所述电视终端,以控制所述第二音频数据以及所述字幕数据在所述电视终端进行播放。The first sending module 40 is configured to send the second audio data to the television terminal to control the second audio data and the caption data to be played in the television terminal.
本实施例中，所述服务器1将合成的与所述角色列表对应的所述第二音频数据，发送至所述电视终端。可以理解的是，所述电视终端除了从所述视频文件中提取出音频数据、字幕数据外，还会从所述视频文件中提取出视频数据，此时，电视终端在接收到所述第二音频数据时，会将所述视频数据与所述第二音频数据以及字幕数据进行同步处理，最后进行播放。In this embodiment, the server 1 sends the synthesized second audio data corresponding to the role list to the television terminal. It can be understood that, in addition to extracting the audio data and subtitle data from the video file, the television terminal also extracts the video data from it; upon receiving the second audio data, the television terminal synchronizes the video data with the second audio data and the subtitle data, and finally plays them.
本发明提供的服务器1，首先通过接收电视终端发送的第一音频数据和字幕数据，并进行识别处理，以生成角色列表和样例音频参数，然后将所述角色列表和样例音频参数发送至所述电视终端，在接收到所述电视终端反馈的用户设置参数时，根据所述用户设置参数将所述第一音频数据合成为第二音频数据，最终将所述第二音频数据发送至所述电视终端，以控制所述第二音频数据以及所述字幕数据在所述电视终端进行播放。这样，可以根据不同用户的语言需求，对应提供可被用户理解的音频，还可以满足用户对人物对白的个性化要求，从而可以避免只能借助字幕来了解人物对白及剧情的缺陷，进而提高用户观看电视的体验感。The server 1 provided by the present invention first receives the first audio data and subtitle data sent by the television terminal and performs recognition processing on them to generate a role list and sample audio parameters; it then sends the role list and sample audio parameters to the television terminal and, upon receiving the user setting parameters fed back by the television terminal, synthesizes the first audio data into second audio data according to those parameters; finally, it sends the second audio data to the television terminal, so as to control the playback of the second audio data and the subtitle data on the television terminal. In this way, audio understandable to each user can be provided according to that user's language needs, and the user's personalized preferences for character dialogue can be satisfied, avoiding the drawback of having to rely on subtitles alone to follow the dialogue and plot, and thereby improving the user's television-viewing experience.
在一实施例中,如图9所示,在上述图8所示的基础上,所述生成处理模块20包括:In an embodiment, as shown in FIG. 9, on the basis of the foregoing FIG. 8, the generation processing module 20 includes:
获取单元201,用于从所述字幕数据中提取出字幕时间戳;An obtaining unit 201, configured to extract a subtitle timestamp from the subtitle data;
查找单元202,用于根据所述字幕时间戳,查找出所述第一音频数据出现的时间片段;The searching unit 202 is configured to search, according to the subtitle timestamp, a time segment in which the first audio data appears;
本实施例中，参照图3，服务器1从所述字幕数据中提取出字幕时间戳，并根据所述字幕时间戳查找出角色配音出现的时间片段，并调用语音识别模块进行识别处理，统计出所述时间片段内出现频次较高的若干个音频数据供用户进行选择。In this embodiment, referring to FIG. 3, the server 1 extracts subtitle timestamps from the subtitle data, finds the time segments in which character dubbing appears according to those timestamps, and invokes the speech recognition module to perform recognition processing, tallying the several most frequently occurring voices within those time segments for the user to choose from.
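Extracting subtitle timestamps and the corresponding dubbing time segments can be sketched as follows, assuming SRT-style cues; the patent does not fix a particular subtitle format:

```python
import re

# Assumed SubRip (SRT) cue syntax: "HH:MM:SS,mmm --> HH:MM:SS,mmm".
CUE_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})")

def _to_sec(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def extract_segments(subtitle_text):
    """Return (start_sec, end_sec) pairs where dialogue audio appears."""
    segments = []
    for match in CUE_RE.finditer(subtitle_text):
        g = match.groups()
        segments.append((_to_sec(*g[:4]), _to_sec(*g[4:])))
    return segments
```

The resulting segments are exactly the windows the server would pass to speech recognition and, later, to vocal cancellation.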
可以理解的是，电视配音中存在很多个角色，而主要人物角色的配音通常较多，而那些出现频次较低的配音可能也较多，如果都由用户来选择，则会增加用户的操作负担。Understandably, a TV soundtrack contains many characters: the main characters usually have many dubbed lines, but there may also be many voices that appear only infrequently. If the user had to make a choice for every one of them, the user's operation burden would increase.
归类单元203,用于对所述时间片段内的所述第一音频数据进行频谱分析,并进行归类生成角色列表;The categorizing unit 203 is configured to perform spectrum analysis on the first audio data in the time segment, and perform categorization to generate a role list.
本实施例中，通过对所述时间片段内的第一音频数据进行频谱分析，利用频谱范围及频谱幅度，找出频谱接近的音频，并归为同一类而生成角色列表。In this embodiment, spectrum analysis is performed on the first audio data within the time segments; using the spectrum range and spectrum amplitude, audio with similar spectra is identified and grouped into the same class, thereby generating the role list.
在一可选实施例中,参照图10,所述归类单元203包括:In an optional embodiment, referring to FIG. 10, the categorizing unit 203 includes:
获取子单元2031,用于分别获取第一时间片段和第二时间片段内的第一音频数据;The obtaining subunit 2031 is configured to respectively acquire first audio data in the first time segment and the second time segment;
判断子单元2032,用于判断所述第一时间片段和第二时间片段内的第一音频数据的频谱范围及频谱幅度是否一致;a determining subunit 2032, configured to determine whether a spectrum range and a spectrum amplitude of the first audio data in the first time segment and the second time segment are consistent;
本实施例中，以两个时间片段如第一时间片段和第二时间片段为例，服务器分别获取第一时间片段和第二时间片段内的第一音频数据，并分别对第一时间片段和第二时间片段内的第一音频数据的频谱范围及频谱幅度进行分析，判断第一时间片段内的第一音频数据的频谱范围及频谱幅度是否与第二时间片段内的第一音频数据的频谱范围及频谱幅度一致。In this embodiment, taking two time segments, e.g. a first time segment and a second time segment, as an example, the server obtains the first audio data within the first and second time segments respectively, analyzes the spectrum range and spectrum amplitude of the first audio data in each segment, and determines whether the spectrum range and spectrum amplitude of the first audio data in the first time segment are consistent with those of the first audio data in the second time segment.
第一归类子单元2033，用于在判断所述第一时间片段和第二时间片段内的第一音频数据的频谱范围及频谱幅度一致时，则将所述第一时间片段和第二时间片段内的第一音频数据归类为同一角色；a first categorizing subunit 2033, configured to classify the first audio data in the first time segment and the second time segment as the same role when the spectrum ranges and spectrum amplitudes of the first audio data in the two segments are determined to be consistent;
第二归类子单元2034，用于在判断所述第一时间片段和第二时间片段内的第一音频数据的频谱范围和/或频谱幅度不一致时，则将所述第一时间片段和第二时间片段内的第一音频数据归类为不同角色。a second categorizing subunit 2034, configured to classify the first audio data in the first time segment and the second time segment as different roles when the spectrum ranges and/or spectrum amplitudes of the first audio data in the two segments are determined to be inconsistent.
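The consistency check performed by the two categorizing subunits can be sketched as follows. The spectral signature and the tolerance `tol` are illustrative assumptions, since the patent only says the spectrum range and amplitude must be "consistent" without quantifying it:

```python
import numpy as np

def spectral_signature(segment, sr):
    """Crude (low edge, high edge, peak amplitude) signature of a segment."""
    spec = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    active = freqs[spec > 0.1 * spec.max()]   # bins carrying significant energy
    return active.min(), active.max(), spec.max()

def same_role(seg_a, seg_b, sr, tol=0.25):
    """Treat two segments as the same role when range and amplitude agree.

    `tol` is an assumed relative tolerance for the comparison.
    """
    lo_a, hi_a, amp_a = spectral_signature(seg_a, sr)
    lo_b, hi_b, amp_b = spectral_signature(seg_b, sr)
    close = lambda x, y: abs(x - y) <= tol * max(abs(x), abs(y), 1e-9)
    return close(lo_a, lo_b) and close(hi_a, hi_b) and close(amp_a, amp_b)
```

Segments that pass the check would be merged under one role entry; segments that fail would open a new role in the list.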
生成单元204,用于利用语音合成技术,生成与所述角色列表对应的样例音频参数。The generating unit 204 is configured to generate a sample audio parameter corresponding to the character list by using a voice synthesis technology.
在一实施例中,如图11所示,在上述图9所示的基础上,所述生成单元204包括:In an embodiment, as shown in FIG. 11, on the basis of the foregoing FIG. 9, the generating unit 204 includes:
提取子单元2041，用于针对所述角色列表中的每个角色，从所述字幕数据中提取出预定数量的字幕时间戳；an extracting subunit 2041, configured to extract, for each role in the role list, a predetermined number of subtitle timestamps from the subtitle data;
生成子单元2042,用于通过文本语音引擎,对应所述预定数量的字幕时间戳生成预定数量的样例音频参数,以发至所述电视终端进行预览选择。The generating subunit 2042 is configured to generate, by the text speech engine, a predetermined number of sample audio parameters corresponding to the predetermined number of subtitle timestamps, to send to the television terminal for preview selection.
在一实施例中,如图12所示,在上述图8所示的基础上,所述合成处理模块30包括:In an embodiment, as shown in FIG. 12, on the basis of the foregoing FIG. 8, the synthesis processing module 30 includes:
发送单元301,用于将生成的所述角色列表和样例音频参数发送至所述电视终端;The sending unit 301 is configured to send the generated role list and sample audio parameters to the television terminal;
接收单元302,用于接收所述电视终端根据所述角色列表和样例音频参数反馈的用户设置参数;The receiving unit 302 is configured to receive user setting parameters that are feedback by the television terminal according to the role list and the sample audio parameters.
本实施例中，服务器1将生成的所述角色列表和样例音频参数发送至所述电视终端，电视终端通过在电视屏幕上呈现出用户界面，以供用户在用户界面上输入并选择角色列表和样例音频参数，从而生成所述用户设置参数，然后所述电视终端将所述用户设置参数反馈给所述服务器1。In this embodiment, the server 1 sends the generated role list and sample audio parameters to the television terminal; the television terminal presents a user interface on the television screen for the user to enter and select from the role list and the sample audio parameters, thereby generating the user setting parameters, which the television terminal then feeds back to the server 1.
合成单元303,用于对所述第一音频数据进行音频过滤,通过文本语音引擎并结合所述用户设置参数,合成与所述角色列表对应的所述第二音频数据。The synthesizing unit 303 is configured to perform audio filtering on the first audio data, and synthesize the second audio data corresponding to the role list by using a text speech engine and combining the user setting parameters.
本实施例中，参照图7，具体可以根据所述文本语音引擎产生对应所述字幕数据的新的音频数据（具体根据用户的设置参数而不同），并根据所述人声消除程序将所述第一音频数据进行人声消除，然后将所述新的音频数据与经人声消除的第一音频数据合成为与所述角色列表对应的第二音频数据。In this embodiment, referring to FIG. 7, new audio data corresponding to the subtitle data may be generated by the text-to-speech engine (varying with the user's setting parameters), the first audio data may be subjected to vocal cancellation by the vocal-cancellation program, and the new audio data may then be combined with the vocal-cancelled first audio data into the second audio data corresponding to the role list.
其中，现有人声消除方法主要是利用左右两个声道中人声发音相同的特点，将左右两个声道进行相减，从而去除两个声道中相同的部分，但这种方法不仅对背景声造成较大的损失（特别在低频部分），而且在人声发音在两个声道中不相同时，无法很好地消除人声。本申请采用带通滤波器的方法，在带通滤波器的频带范围内，只需达到降低原发音的幅度，不影响合成音频的辨别即可，从而可以较好地保留低频及高频部分。此外，对时间戳范围外的音频数据也没有造成任何影响。Existing vocal-cancellation methods mainly exploit the fact that the vocals sound the same in the left and right channels: subtracting one channel from the other removes their common component. However, this approach causes a considerable loss of background sound (especially in the low-frequency band), and it cannot remove the vocals well when they differ between the two channels. The present application instead uses a band-pass filter: within the filter's pass band, it only needs to lower the amplitude of the original dialogue enough that it does not interfere with the intelligibility of the synthesized audio, so the low-frequency and high-frequency components are well preserved. Moreover, audio data outside the timestamp ranges is left entirely unaffected.
本发明还提供一种电视播放控制系统100，参照图13，在一实施例中，所述电视播放控制系统100包括电视终端2以及如上所述的服务器1，参照图14，所述电视终端2包括：The present invention also provides a television play control system 100. Referring to FIG. 13, in an embodiment, the television play control system 100 includes a television terminal 2 and the server 1 described above. Referring to FIG. 14, the television terminal 2 includes:
第二发送模块50,用于向服务器1发送第一音频数据和字幕数据;a second sending module 50, configured to send first audio data and caption data to the server 1;
第二接收模块60,用于接收所述服务器1对所述第一音频数据和字幕数据进行识别处理后,生成的角色列表和样例音频参数;The second receiving module 60 is configured to receive a role list and sample audio parameters generated by the server 1 after the first audio data and the caption data are identified and processed;
反馈模块70,用于根据所述角色列表和样例音频参数生成用户设置参数,并将所述用户设置参数反馈给所述服务器1;The feedback module 70 is configured to generate a user setting parameter according to the role list and the sample audio parameter, and feed back the user setting parameter to the server 1;
获取模块80，用于获取所述服务器1在接收到所述用户设置参数时，将所述第一音频数据合成的第二音频数据；an obtaining module 80, configured to obtain the second audio data that the server 1 synthesizes from the first audio data upon receiving the user setting parameters;
同步播放模块90,用于将所述第二音频数据、视频数据以及字幕数据进行同步播放。The synchronous play module 90 is configured to synchronously play the second audio data, the video data, and the caption data.
本实施例中，电视终端2在接收到所述服务器1合成的与所述角色列表对应的所述第二音频数据时，将所述第二音频数据与视频数据以及字幕数据进行同步处理后，最终进行播放，这样，通过服务器1对视频文件的音频进行预处理，合成用户可以理解的语言，可以增强用户的观看体验；此外，还可以为用户提供多种角色音频的选择，从而进一步增强了用户体验感。In this embodiment, upon receiving the second audio data corresponding to the role list synthesized by the server 1, the television terminal 2 synchronizes the second audio data with the video data and subtitle data and finally plays them. In this way, the server 1 preprocesses the audio of the video file and synthesizes it into a language the user can understand, which enhances the viewing experience; in addition, the user is offered a choice among multiple character voices, further enhancing the user experience.
本实施例原理,请参照上述各实施例,在此不再赘述。For the principle of the embodiment, please refer to the foregoing embodiments, and details are not described herein again.
以上仅为本发明的优选实施例，并非因此限制本发明的专利范围，凡是利用本发明说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，均同理包括在本发明的专利保护范围内。The above are merely preferred embodiments of the present invention and do not therefore limit its patent scope; any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (14)

  1. 一种电视播放控制方法,其特征在于,所述电视播放控制方法包括以下步骤:A television broadcast control method, characterized in that the television broadcast control method comprises the following steps:
    服务器接收电视终端发送的第一音频数据和字幕数据; Receiving, by the server, first audio data and subtitle data sent by the television terminal;
    对所述第一音频数据和字幕数据进行识别处理,生成角色列表和样例音频参数; Performing recognition processing on the first audio data and the caption data to generate a character list and sample audio parameters;
    将所述角色列表和样例音频参数发送至所述电视终端，并在接收到所述电视终端根据所述角色列表和样例音频参数反馈的用户设置参数时，将所述第一音频数据合成为第二音频数据;Transmitting the role list and sample audio parameters to the television terminal, and synthesizing the first audio data into second audio data upon receiving the user setting parameters fed back by the television terminal according to the role list and sample audio parameters;
    将所述第二音频数据发送至所述电视终端,以控制所述第二音频数据以及所述字幕数据在所述电视终端进行播放;Transmitting the second audio data to the television terminal to control the second audio data and the subtitle data to be played at the television terminal;
    其中,所述对所述第一音频数据和字幕数据进行识别处理,生成角色列表和样例音频参数的步骤包括:The step of performing the identification process on the first audio data and the caption data to generate a role list and sample audio parameters includes:
    所述服务器从所述字幕数据中提取出字幕时间戳;The server extracts a subtitle timestamp from the subtitle data;
    根据所述字幕时间戳,查找出所述第一音频数据出现的时间片段;And finding, according to the subtitle timestamp, a time segment in which the first audio data appears;
    对所述时间片段内的所述第一音频数据进行频谱分析,并进行归类生成角色列表;Performing spectrum analysis on the first audio data in the time segment, and performing categorization to generate a role list;
    利用语音合成技术,生成与所述角色列表对应的样例音频参数;Generating a sample audio parameter corresponding to the character list by using a speech synthesis technology;
    其中,所述电视终端从视频文件中提取出所述第一音频数据和所述字幕数据,并将所述第一音频数据和字幕数据发送至所述服务器。The television terminal extracts the first audio data and the subtitle data from a video file, and sends the first audio data and subtitle data to the server.
  2. 如权利要求1所述的电视播放控制方法,其特征在于,所述利用语音合成技术,生成与所述角色列表对应的样例音频参数的步骤包括:The television broadcast control method according to claim 1, wherein the step of generating a sample audio parameter corresponding to the character list by using a speech synthesis technology comprises:
    针对所述角色列表中的每个角色,从所述字幕数据中提取出预定数量的字幕时间戳;Extracting a predetermined number of subtitle timestamps from the subtitle data for each character in the role list;
    通过文本语音引擎,对应所述预定数量的字幕时间戳生成预定数量的样例音频参数,以发至所述电视终端进行预览选择。A predetermined number of sample audio parameters are generated by the text-to-speech engine corresponding to the predetermined number of subtitle timestamps for transmission to the television terminal for preview selection.
  3. 如权利要求1所述的电视播放控制方法，其特征在于，所述将所述角色列表和样例音频参数发送至所述电视终端，并在接收到所述电视终端根据所述角色列表和样例音频参数反馈的用户设置参数时，将所述第一音频数据合成为第二音频数据的步骤包括：The television play control method according to claim 1, wherein the step of sending the role list and sample audio parameters to the television terminal and, upon receiving the user setting parameters fed back by the television terminal according to the role list and sample audio parameters, synthesizing the first audio data into second audio data comprises:
    将生成的所述角色列表和样例音频参数发送至所述电视终端;Sending the generated role list and sample audio parameters to the television terminal;
    接收所述电视终端根据所述角色列表和样例音频参数反馈的用户设置参数;Receiving user setting parameters that are feedback by the television terminal according to the role list and the sample audio parameters;
    对所述第一音频数据进行音频过滤,通过文本语音引擎并结合所述用户设置参数,合成与所述角色列表对应的所述第二音频数据;Performing audio filtering on the first audio data, synthesizing the second audio data corresponding to the role list by using a text speech engine and combining the user setting parameters;
    其中,所述电视终端接收用户通过用户界面选择的角色列表和样例音频参数,以生成所述用户设置参数,并将所述用户设置参数反馈给所述服务器。The television terminal receives a role list and a sample audio parameter selected by the user through the user interface to generate the user setting parameter, and feeds back the user setting parameter to the server.
  4. 如权利要求1所述的电视播放控制方法,其特征在于,所述对所述时间片段内的所述第一音频数据进行频谱分析,并进行归类生成角色列表的步骤包括:The television broadcast control method according to claim 1, wherein the step of performing spectrum analysis on the first audio data in the time segment and performing categorization to generate a role list comprises:
    分别获取第一时间片段和第二时间片段内的第一音频数据;Acquiring first audio data in the first time segment and the second time segment, respectively;
    判断所述第一时间片段和第二时间片段内的第一音频数据的频谱范围及频谱幅度是否一致;Determining whether a spectrum range and a spectrum amplitude of the first audio data in the first time segment and the second time segment are consistent;
    若是,则将所述第一时间片段和第二时间片段内的第一音频数据归类为同一角色;If yes, classifying the first audio data in the first time segment and the second time segment into the same role;
    若否,则将所述第一时间片段和第二时间片段内的第一音频数据归类为不同角色。If not, the first audio data in the first time segment and the second time segment are classified into different roles.
  5. 一种服务器,其特征在于,所述服务器包括:A server, wherein the server comprises:
    第一接收模块,用于接收电视终端发送的第一音频数据和字幕数据; a first receiving module, configured to receive first audio data and subtitle data sent by the television terminal;
    生成处理模块,用于对所述第一音频数据和字幕数据进行识别处理,生成角色列表和样例音频参数; a generating processing module, configured to perform recognition processing on the first audio data and the caption data, and generate a role list and sample audio parameters;
    合成处理模块，用于将所述角色列表和样例音频参数发送至所述电视终端，并在接收到所述电视终端根据所述角色列表和样例音频参数反馈的用户设置参数时，将所述第一音频数据合成为第二音频数据；a synthesis processing module, configured to send the role list and sample audio parameters to the television terminal, and to synthesize the first audio data into second audio data upon receiving the user setting parameters fed back by the television terminal according to the role list and sample audio parameters;
    第一发送模块,用于将所述第二音频数据发送至所述电视终端,以控制所述第二音频数据以及所述字幕数据在所述电视终端进行播放。And a first sending module, configured to send the second audio data to the television terminal to control the second audio data and the subtitle data to be played at the television terminal.
  6. 如权利要求5所述的服务器,其特征在于,所述生成处理模块包括:The server according to claim 5, wherein the generation processing module comprises:
    获取单元,用于从所述字幕数据中提取出字幕时间戳;An obtaining unit, configured to extract a subtitle timestamp from the subtitle data;
    查找单元,用于根据所述字幕时间戳,查找出所述第一音频数据出现的时间片段;a searching unit, configured to find a time segment in which the first audio data appears according to the subtitle time stamp;
    归类单元,用于对所述时间片段内的所述第一音频数据进行频谱分析,并进行归类生成角色列表;a categorizing unit, configured to perform spectrum analysis on the first audio data in the time segment, and perform categorization to generate a role list;
    生成单元,用于利用语音合成技术,生成与所述角色列表对应的样例音频参数;a generating unit, configured to generate a sample audio parameter corresponding to the role list by using a voice synthesis technology;
    其中,所述电视终端从视频文件中提取出所述第一音频数据和所述字幕数据,并将所述第一音频数据和字幕数据发送至所述服务器。The television terminal extracts the first audio data and the subtitle data from a video file, and sends the first audio data and subtitle data to the server.
  7. 如权利要求6所述的服务器,其特征在于,所述生成单元包括:The server according to claim 6, wherein the generating unit comprises:
    提取子单元，用于针对所述角色列表中的每个角色，从所述字幕数据中提取出预定数量的字幕时间戳；an extracting subunit, configured to extract, for each role in the role list, a predetermined number of subtitle timestamps from the subtitle data;
    生成子单元,用于通过文本语音引擎,对应所述预定数量的字幕时间戳生成预定数量的样例音频参数,以发至所述电视终端进行预览选择。And generating a subunit, configured to generate, by the text and speech engine, a predetermined number of sample audio parameters corresponding to the predetermined number of subtitle timestamps, to send to the television terminal for preview selection.
  8. 如权利要求6所述的服务器,其特征在于,所述合成处理模块包括:The server according to claim 6, wherein said synthesis processing module comprises:
    发送单元,用于将生成的所述角色列表和样例音频参数发送至所述电视终端;a sending unit, configured to send the generated role list and sample audio parameters to the television terminal;
    接收单元,用于接收所述电视终端根据所述角色列表和样例音频参数反馈的用户设置参数;a receiving unit, configured to receive user setting parameters that are feedback by the television terminal according to the role list and sample audio parameters;
    合成单元,用于对所述第一音频数据进行音频过滤,通过文本语音引擎并结合所述用户设置参数,合成与所述角色列表对应的所述第二音频数据;a synthesizing unit, configured to perform audio filtering on the first audio data, and synthesize the second audio data corresponding to the role list by using a text speech engine and combining the user setting parameters;
    其中,所述电视终端接收用户通过用户界面选择的角色列表和样例音频参数,以生成所述用户设置参数,并将所述用户设置参数反馈给所述服务器。The television terminal receives a role list and a sample audio parameter selected by the user through the user interface to generate the user setting parameter, and feeds back the user setting parameter to the server.
  9. 如权利要求6所述的服务器,其特征在于,所述归类单元包括:The server according to claim 6, wherein said categorizing unit comprises:
    获取子单元,用于分别获取第一时间片段和第二时间片段内的第一音频数据;Obtaining a subunit, configured to respectively acquire first audio data in the first time segment and the second time segment;
    判断子单元,用于判断所述第一时间片段和第二时间片段内的第一音频数据的频谱范围及频谱幅度是否一致;a determining subunit, configured to determine whether a spectrum range and a spectrum amplitude of the first audio data in the first time segment and the second time segment are consistent;
    第一归类子单元，用于在判断所述第一时间片段和第二时间片段内的第一音频数据的频谱范围及频谱幅度一致时，将所述第一时间片段和第二时间片段内的第一音频数据归类为同一角色；a first categorizing subunit, configured to classify the first audio data in the first time segment and the second time segment as the same role when the spectrum ranges and spectrum amplitudes of the first audio data in the two segments are determined to be consistent;
    第二归类子单元，用于在判断所述第一时间片段和第二时间片段内的第一音频数据的频谱范围和/或频谱幅度不一致时，将所述第一时间片段和第二时间片段内的第一音频数据归类为不同角色。a second categorizing subunit, configured to classify the first audio data in the first time segment and the second time segment as different roles when the spectrum ranges and/or spectrum amplitudes of the first audio data in the two segments are determined to be inconsistent.
  10. 一种电视播放控制系统,其特征在于,所述电视播放控制系统包括电视终端以及服务器,所述电视终端包括:A television broadcast control system, comprising: a television terminal and a server, the television terminal comprising:
    第二发送模块,用于向服务器发送第一音频数据和字幕数据;a second sending module, configured to send first audio data and subtitle data to the server;
    第二接收模块,用于接收所述服务器对所述第一音频数据和字幕数据进行识别处理后,生成的角色列表和样例音频参数;a second receiving module, configured to receive a role list and sample audio parameters generated by the server after the first audio data and the caption data are identified and processed;
    反馈模块,用于根据所述角色列表和样例音频参数生成用户设置参数,并将所述用户设置参数反馈给所述服务器;a feedback module, configured to generate a user setting parameter according to the role list and the sample audio parameter, and feed back the user setting parameter to the server;
    获取模块，用于获取所述服务器在接收到所述用户设置参数时，将所述第一音频数据合成的第二音频数据；an obtaining module, configured to obtain the second audio data that the server synthesizes from the first audio data upon receiving the user setting parameters;
    同步播放模块,用于将所述第二音频数据、视频数据以及字幕数据进行同步播放;a synchronous play module, configured to synchronously play the second audio data, the video data, and the caption data;
    其中,所述电视终端从视频文件中提取出所述视频数据、所述第一音频数据以及所述字幕数据;The television terminal extracts the video data, the first audio data, and the caption data from a video file;
    所述服务器包括:The server includes:
    第一接收模块,用于接收电视终端发送的第一音频数据和字幕数据; a first receiving module, configured to receive first audio data and subtitle data sent by the television terminal;
    生成处理模块,用于对所述第一音频数据和字幕数据进行识别处理,生成角色列表和样例音频参数; a generating processing module, configured to perform recognition processing on the first audio data and the caption data, and generate a role list and sample audio parameters;
    合成处理模块,用于将所述角色列表和样例音频参数发送至所述电视终端,并在接收到所述电视终端根据所述角色列表和样例音频参数反馈的用户设置参数时,将所述第一音频数据合成为第二音频数据;a synthesis processing module, configured to send the role list and sample audio parameters to the television terminal, and when receiving the user setting parameter that is reported by the television terminal according to the role list and the sample audio parameter, The first audio data is synthesized into the second audio data;
    第一发送模块,用于将所述第二音频数据发送至所述电视终端,以控制所述第二音频数据以及所述字幕数据在所述电视终端进行播放。And a first sending module, configured to send the second audio data to the television terminal to control the second audio data and the subtitle data to be played at the television terminal.
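The terminal-server exchange recited in claim 10 can be illustrated with a short sketch. This is not the patent's implementation: every class, method, and field name below is hypothetical, and real audio decoding, speech synthesis, and playback are replaced by plain Python values so the round trip (send audio and subtitles, receive role list and samples, feed back settings, receive second audio, play synchronously) is visible.

```python
# Minimal sketch (all names hypothetical) of the claim-10 message exchange
# between the television terminal and the server.

class Server:
    def identify(self, first_audio, subtitles):
        """Recognition processing: build a role list and sample audio parameters."""
        roles = sorted({cue["role"] for cue in subtitles})
        samples = {role: {"pitch": 1.0, "rate": 1.0} for role in roles}
        return roles, samples

    def synthesize(self, first_audio, user_settings):
        """Synthesize the second audio data from the first, per user settings."""
        return [{"chunk": chunk, "voice": user_settings} for chunk in first_audio]

class TelevisionTerminal:
    def __init__(self, server):
        self.server = server

    def play(self, video, first_audio, subtitles):
        roles, samples = self.server.identify(first_audio, subtitles)      # send + receive
        user_settings = {role: samples[role] for role in roles}            # feedback module
        second_audio = self.server.synthesize(first_audio, user_settings)  # acquiring module
        return {"video": video, "audio": second_audio, "subtitles": subtitles}
```

In this sketch the terminal accepts the server's defaults unchanged; in the claimed system the user setting parameters come from a user-interface selection.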
  11. The television play control system according to claim 10, characterized in that the generation processing module comprises:
    an acquiring unit, configured to extract subtitle timestamps from the subtitle data;
    a searching unit, configured to find, according to the subtitle timestamps, the time segments in which the first audio data appears;
    a classifying unit, configured to perform spectrum analysis on the first audio data within the time segments and classify it to generate a role list;
    a generating unit, configured to generate, by using speech synthesis technology, sample audio parameters corresponding to the role list.
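The first two stages of claim 11, extracting subtitle timestamps and turning them into the time segments where speech occurs, can be sketched as follows. The SRT-style `HH:MM:SS,mmm --> HH:MM:SS,mmm` cue format is an assumption for illustration; the patent does not specify a subtitle format.

```python
# Hypothetical sketch: parse subtitle timestamps and derive the time segments
# in which the first audio data appears.
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def to_ms(h, m, s, ms):
    """Convert one parsed timestamp to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def speech_segments(subtitle_text):
    """Return (start_ms, end_ms) pairs, one per subtitle cue line."""
    segments = []
    for line in subtitle_text.splitlines():
        stamps = TIMESTAMP.findall(line)
        if len(stamps) == 2:  # a "start --> end" cue line
            segments.append((to_ms(*stamps[0]), to_ms(*stamps[1])))
    return segments
```

Each returned pair marks a stretch of the first audio data that the classifying unit would then analyze.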
  12. The television play control system according to claim 11, characterized in that the generating unit comprises:
    an extracting subunit, configured to extract, for each role in the role list, a predetermined number of subtitle timestamps from the subtitle data;
    a generating subunit, configured to generate, through a text-to-speech engine, a predetermined number of sample audio parameters corresponding to the predetermined number of subtitle timestamps, to be sent to the television terminal for preview and selection.
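A sketch of the claim-12 subunits: a predetermined number of subtitle cues is taken per role, and a text-to-speech engine renders one preview sample per cue. `fake_tts` is a stand-in for a real TTS engine, and all names are illustrative rather than from the patent.

```python
# Hypothetical sketch: generate a fixed number of preview samples per role.

def fake_tts(text, voice="default"):
    """Stand-in TTS: returns a sample-audio-parameter record, not real audio."""
    return {"voice": voice, "text": text, "duration_ms": 80 * len(text)}

def preview_samples(cues_by_role, count=3, tts=fake_tts):
    """Generate up to `count` sample audio parameters per role for preview selection."""
    return {
        role: [tts(text) for _, text in cues[:count]]
        for role, cues in cues_by_role.items()
    }
```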
  13. The television play control system according to claim 11, characterized in that the synthesis processing module comprises:
    a sending unit, configured to send the generated role list and sample audio parameters to the television terminal;
    a receiving unit, configured to receive the user setting parameters fed back by the television terminal according to the role list and sample audio parameters;
    a synthesizing unit, configured to perform audio filtering on the first audio data and to synthesize, through a text-to-speech engine in combination with the user setting parameters, the second audio data corresponding to the role list;
    wherein the television terminal receives the role list and sample audio parameters selected by the user through a user interface, so as to generate the user setting parameters, and feeds the user setting parameters back to the server.
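At sample level, the synthesizing unit of claim 13 can be pictured as: within each dialogue segment the original voice is audio-filtered (here simply muted) and the TTS-rendered replacement is mixed in. Index-based segments and the function name are assumptions for illustration; real systems would filter rather than zero the original track.

```python
# Hypothetical sketch: filter the original dialogue out of each segment of the
# first audio data, then mix in the synthesized replacement voice.

def synthesize_second_audio(first_audio, segments, replacements):
    """first_audio: list of samples; segments: (start, end) index pairs;
    replacements: one list of synthesized samples per segment, same order."""
    second = list(first_audio)
    for (start, end), chunk in zip(segments, replacements):
        for i in range(start, min(end, len(second))):
            second[i] = 0.0                       # filter out the original voice
        for offset, sample in enumerate(chunk):
            if start + offset < len(second):
                second[start + offset] += sample  # mix in the synthesized voice
    return second
```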
  14. The television play control system according to claim 11, characterized in that the classifying unit comprises:
    an acquiring subunit, configured to respectively acquire the first audio data within a first time segment and a second time segment;
    a judging subunit, configured to judge whether the spectrum range and spectrum amplitude of the first audio data within the first time segment and the second time segment are consistent;
    a first classifying subunit, configured to classify the first audio data within the first time segment and the second time segment as the same role when the spectrum range and spectrum amplitude are judged to be consistent;
    a second classifying subunit, configured to classify the first audio data within the first time segment and the second time segment as different roles when the spectrum range and/or spectrum amplitude are judged to be inconsistent.
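The judging and classifying subunits of claim 14 can be sketched as: two time segments are assigned the same role only when both the spectrum range (the span of active frequency bins) and the spectrum amplitude (the peak magnitude) agree. The naive DFT, the noise floor, and the amplitude tolerance are illustrative assumptions, not values from the patent.

```python
# Hypothetical sketch: spectrum-based same-role judgment for two segments.
import cmath

def dft_magnitudes(samples):
    """Naive O(n^2) DFT; returns the magnitude of each positive-frequency bin."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(1, n // 2)]

def spectral_features(samples, floor=0.05):
    """(range, amplitude): span of non-negligible bins and the peak magnitude."""
    mags = dft_magnitudes(samples)
    peak = max(mags)
    active = [k for k, m in enumerate(mags) if m > floor * peak]
    return (min(active), max(active)), peak

def same_role(segment_a, segment_b, amp_tol=0.5):
    """Same role iff the spectrum ranges match and amplitudes agree within amp_tol."""
    range_a, amp_a = spectral_features(segment_a)
    range_b, amp_b = spectral_features(segment_b)
    return range_a == range_b and abs(amp_a - amp_b) <= amp_tol * max(amp_a, amp_b)
```

A low-pitched and a high-pitched segment land in different frequency bins, so their spectrum ranges differ and they are classified as different roles.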
PCT/CN2016/084461 2015-09-29 2016-06-02 Television play control method, server and television play control system WO2017054488A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510633934.3 2015-09-29
CN201510633934.3A CN105227966A (en) 2015-09-29 2015-09-29 Television play control method, server and television play control system

Publications (1)

Publication Number Publication Date
WO2017054488A1 2017-04-06

Family

ID=54996603

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/084461 WO2017054488A1 (en) 2015-09-29 2016-06-02 Television play control method, server and television play control system

Country Status (2)

Country Link
CN (1) CN105227966A (en)
WO (1) WO2017054488A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112714348A (en) * 2020-12-28 2021-04-27 深圳市亿联智能有限公司 Intelligent audio and video synchronization method

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105227966A (en) 2015-09-29 2016-01-06 深圳Tcl新技术有限公司 Television play control method, server and television play control system
CN107659850B (en) * 2016-11-24 2019-09-17 腾讯科技(北京)有限公司 Media information processing method and device
CN107172449A (en) * 2017-06-19 2017-09-15 微鲸科技有限公司 Multi-medium play method, device and multimedia storage method
CN107484016A (en) * 2017-09-05 2017-12-15 深圳Tcl新技术有限公司 Video dubs switching method, television set and computer-readable recording medium
CN109242802B (en) * 2018-09-28 2021-06-15 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN110366032B (en) * 2019-08-09 2020-12-15 腾讯科技(深圳)有限公司 Video data processing method and device and video playing method and device
CN113766288B (en) * 2021-08-04 2023-05-23 深圳Tcl新技术有限公司 Electric quantity prompting method, device and computer readable storage medium
CN114554285B (en) * 2022-02-25 2024-08-02 京东方科技集团股份有限公司 Video interpolation processing method, video interpolation processing device and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1774715A (en) * 2003-04-14 2006-05-17 皇家飞利浦电子股份有限公司 System and method for performing automatic dubbing on an audio-visual stream
CN101189657A (en) * 2005-05-31 2008-05-28 皇家飞利浦电子股份有限公司 A method and a device for performing an automatic dubbing on a multimedia signal
CN101359473A (en) * 2007-07-30 2009-02-04 国际商业机器公司 Auto speech conversion method and apparatus
US20120105719A1 (en) * 2010-10-29 2012-05-03 Lsi Corporation Speech substitution of a real-time multimedia presentation
US20120259630A1 (en) * 2011-04-11 2012-10-11 Samsung Electronics Co., Ltd. Display apparatus and voice conversion method thereof
CN105227966A (en) * 2015-09-29 2016-01-06 深圳Tcl新技术有限公司 Television play control method, server and television play control system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1534595A (en) * 2003-03-28 2004-10-06 中颖电子(上海)有限公司 Speech sound change over synthesis device and its method



Also Published As

Publication number Publication date
CN105227966A (en) 2016-01-06

Similar Documents

Publication Publication Date Title
WO2017054488A1 (en) Television play control method, server and television play control system
WO2014107101A1 (en) Display apparatus and method for controlling the same
WO2014003283A1 (en) Display apparatus, method for controlling display apparatus, and interactive system
WO2017143692A1 (en) Smart television and voice control method therefor
WO2018043991A1 (en) Speech recognition method and apparatus based on speaker recognition
WO2017177524A1 (en) Audio and video playing synchronization method and device
WO2017160073A1 (en) Method and device for accelerated playback, transmission and storage of media files
WO2014107097A1 (en) Display apparatus and method for controlling the display apparatus
WO2019080406A1 (en) Television voice interaction method, voice interaction control device and storage medium
WO2018032680A1 (en) Method and system for playing audio and video
WO2018006489A1 (en) Terminal voice interaction method and device
WO2014107102A1 (en) Display apparatus and method of controlling display apparatus
WO2017045441A1 (en) Smart television-based audio playback method and apparatus
WO2016032021A1 (en) Apparatus and method for recognizing voice commands
WO2016091011A1 (en) Subtitle switching method and device
WO2017005066A1 (en) Method and apparatus for recording audio and video synchronization timestamp
WO2019051902A1 (en) Terminal control method, air conditioner and computer-readable storage medium
WO2018028124A1 (en) Television set and signal source switching method thereof
WO2021261830A1 (en) Video quality assessment method and apparatus
WO2019114127A1 (en) Voice output method and device for air conditioner
WO2017020649A1 (en) Audio/video playback control method and device thereof
WO2019085543A1 (en) Television system and television control method
WO2018233221A1 (en) Multi-window sound output method, television, and computer-readable storage medium
WO2017121066A1 (en) Application program display method and system
WO2016095280A1 (en) Karaoke scoring method and device

Legal Events

121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16850113; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 16850113; Country of ref document: EP; Kind code of ref document: A1)