WO2018130173A1 - Dubbing method, terminal device, server and storage medium

Info

Publication number
WO2018130173A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
information
server
point
recorded
Application number
PCT/CN2018/072201
Other languages
French (fr)
Chinese (zh)
Inventor
李钟伟
Original Assignee
腾讯科技(深圳)有限公司
Priority to CN201710029246.5
Priority to CN201710029246.5A (patent CN107071512B)
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018130173A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/4104 Peripherals receiving signals from specially adapted client devices
    • H04N21/4122 Peripherals receiving signals from specially adapted client devices additional display device, e.g. video projector
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/4104 Peripherals receiving signals from specially adapted client devices
    • H04N21/4126 The peripheral being portable, e.g. PDAs or mobile phones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4396 Processing of audio elementary streams by muting the audio signal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles

Abstract

Disclosed are a dubbing method, apparatus and system. The dubbing method comprises: in response to a user instruction, playing back a video; acquiring a video start point and a video termination point in the video, said points being selected by the user; generating, according to the video start point and the video termination point, video information of the video to be dubbed; and sending the video information to a server so that the server generates, according to the video information, a video to be dubbed.

Description

Dubbing method, terminal device, server and storage medium
The present application claims priority to Chinese Patent Application No. 201710029246.5, entitled "Dubbing method, apparatus and system" and filed with the Chinese Patent Office on January 16, 2017, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of video processing, and in particular to a dubbing method, a terminal device, a server, and a storage medium.
Background
At present, some dubbing software provides a user dubbing function: it receives audio data that a user submits for a video to be dubbed that the user has selected, and generates a video dubbed by the user. Videos to be dubbed are generally provided by the operator of the dubbing software, and the user selects one of interest. Some dubbing software also allows users to upload video files that they have shot themselves as videos to be dubbed.
Summary
Embodiments of the present application provide a dubbing method, apparatus and system.
A dubbing method according to an embodiment of the present application is applicable to a terminal device, and includes:
playing a video in response to a user instruction;
acquiring a video start point and a video termination point selected by the user in the video;
generating video information of a video to be dubbed according to the video start point and the video termination point; and
sending the video information to a server, so that the server generates the video to be dubbed according to the video information.
A dubbing method applied to a server includes:
acquiring video information of a video to be dubbed from a terminal device, where the video information is generated by the terminal device according to a video start point and a video termination point selected by a user in a video being played; and
generating the video to be dubbed according to the video information.
A terminal device includes a processor and a memory, the memory storing computer-readable instructions that cause the processor to perform the following operations:
playing a video in response to a user instruction;
acquiring a video start point and a video termination point selected by the user in the video;
generating video information of a video to be dubbed according to the video start point and the video termination point; and
sending the video information to a server, so that the server generates the video to be dubbed according to the video information.
A server includes a processor and a memory, the memory storing computer-readable instructions that cause the processor to perform the following operations:
acquiring video information of a video to be dubbed from a terminal device, where the video information is generated by the terminal device according to a video start point and a video termination point selected by a user in a video being played; and
generating the video to be dubbed according to the video information.
An embodiment of the present application further provides a non-volatile computer-readable storage medium storing computer-readable instructions that cause at least one processor to perform the method described above.
The technical solutions of the embodiments of the present application can, according to a user instruction, clip user-specified video content from a video played on the terminal device to generate a video to be dubbed, which enriches the sources of material available to the dubbing system and improves its service capability.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an implementation environment provided by an example of the present application;
FIG. 2 is a schematic diagram of a server cluster architecture provided by an embodiment of the present application;
FIG. 3 is a flowchart of a dubbing method provided by an embodiment of the present application;
FIG. 4A is a flowchart of a method by which a first client obtains a video to be dubbed, provided by an embodiment of the present application;
FIG. 4B is a flowchart of a method for obtaining a video to be dubbed, provided by an embodiment of the present application;
FIG. 5 shows a video editing method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of editing a video, provided by an embodiment of the present application;
FIG. 7A is a flowchart of a dubbing method provided by an embodiment of the present application;
FIG. 7B is a flowchart of a dubbing method provided by an embodiment of the present application;
FIG. 8 is a flowchart of a target video generation method provided by an embodiment of the present application;
FIG. 9 is a flowchart of a subtitle acquisition method provided by an embodiment of the present application;
FIG. 10 is a flowchart of a speech recognition method provided by an embodiment of the present application;
FIG. 11 is a block diagram of a dubbing apparatus provided by an embodiment of the present application;
FIG. 12 is a block diagram of a target video generation module provided by an embodiment of the present application;
FIG. 13 is a block diagram of an identifier generation module provided by an embodiment of the present application;
FIG. 14 is a structural block diagram of a terminal provided by an embodiment of the present application;
FIG. 15 is a structural block diagram of a server provided by an embodiment of the present application.
Detailed description
The embodiments described herein are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Refer to FIG. 1, which shows a schematic diagram of an implementation environment provided by an embodiment of the present application. The implementation environment includes a first terminal 120, a server 140, and a second terminal 160.
A first client runs on the first terminal 120. The first terminal 120 may be a mobile phone, a tablet computer, an OTT device, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, a desktop computer, or the like. An OTT device connects a television set to the Internet, so that the television set can play content obtained from the Internet. OTT devices may include smart television sets, set-top boxes, network television boxes, and the like. A network television box, also called a smart television box, is a device that connects a television set to the Internet: it obtains the data of network programs from the Internet and provides it to the television set for display.
The server 140 may be a single server, a server cluster composed of several servers, or a cloud computing service center.
A second client runs on the second terminal 160. The second terminal 160 may be a mobile phone, a tablet computer, a laptop computer, a desktop computer, or the like.
The server 140 may establish communication connections with the first terminal 120 and the second terminal 160 respectively through a communication network. The network may be a wireless network or a wired network.
In the embodiments of the present application, the first client may be any client that has a user interface (UI) and is capable of communicating with the server 140. For example, the first client may be a video service client, a cable television client, a game client, a browser, a client dedicated to video dubbing, or the like.
In the embodiments of the present application, the second client may be any client that has a user interface (UI) and is capable of communicating with the server 140. For example, the second client may be a video editing client, a social application client, an instant messaging client, a payment application client, a client dedicated to video dubbing, or the like.
In practical applications, the first client and the second client may be two clients with different functions or two clients with the same function. Correspondingly, the first terminal and the second terminal are both terminal devices. When the client running on a terminal device implements the functions of the first client side in the method examples of the embodiments, the terminal device serves as the first terminal; when the client running on the terminal device implements the functions of the second client side in the method examples of the present application, the terminal device serves as the second terminal. In practical applications, the same client may serve as either the first client or the second client, and the same terminal may serve as either the first terminal or the second terminal.
In one example, as shown in FIG. 2, when the background server 140 has a cluster architecture, it may include a communication server 142, a management server 144, and a video server 146.
The communication server 142 provides communication services with the first client and the second client, and also provides communication services with the management server 144 and the video server 146.
The management server 144 provides functions for managing video files and audio files.
The video server 146 provides video editing and dubbing functions.
The above servers may establish communication connections with one another through a communication network. The network may be a wireless network or a wired network.
Refer to FIG. 3, which shows a flowchart of a dubbing method provided by an embodiment of the present application. The method is applicable to the implementation environment shown in FIG. 1 and may include the following steps.
Step 301: The first client obtains a video to be dubbed in response to a user instruction.
If the first client runs on a terminal device with a remote control, such as a smart television set or a set-top box, the user instruction may be triggered by pressing or long-pressing a designated button on the remote control, or by using the remote control to click or double-click a designated icon. If the first client runs on a terminal device with keys and a screen, such as a television set, a desktop computer, or a portable computer, the user instruction may be triggered by pressing or long-pressing a designated key, or by clicking or double-clicking a designated icon. If the first client runs on a mobile phone or a tablet computer, the user instruction may also be triggered by gestures such as clicking, double-clicking, sliding, or dragging. In response to the user instruction, the first client enters a dubbing mode. Refer to FIG. 4A, which shows a flowchart of a method by which the first client obtains the video to be dubbed in the dubbing mode.
Step 3011A: Acquire a video identifier selected by the user.
Step 3012A: Acquire a video start point and a video termination point selected by the user.
Step 3013A: In the video file corresponding to the video identifier, copy the video content between the video start point and the video termination point to obtain the video to be dubbed.
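One possible way to carry out the copy in step 3013A is to hand the two points to the ffmpeg command-line tool. The sketch below only builds the argument list; the function name, the file names, and the choice of seconds as the unit for the two points are assumptions made for illustration and do not come from the application.

```python
from typing import List


def build_clip_command(src: str, start_s: float, end_s: float, dst: str) -> List[str]:
    """Build an ffmpeg argument list that copies [start_s, end_s] of src into dst."""
    if start_s < 0 or end_s <= start_s:
        raise ValueError("require 0 <= start_s < end_s")
    return [
        "ffmpeg",
        "-i", src,                 # video file found through the video identifier
        "-ss", f"{start_s:.3f}",   # video start point, in seconds
        "-to", f"{end_s:.3f}",     # video termination point, in seconds
        "-c", "copy",              # copy the streams without re-encoding
        dst,
    ]


# Hypothetical file names, used only to show the call.
cmd = build_clip_command("movie.mp4", 12.5, 30.0, "clip.mp4")
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed. With stream copy the cut may not be frame-accurate; re-encoding instead of `-c copy` trades speed for precision.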
In some examples, the video to be dubbed may also be obtained by the method shown in FIG. 4B. The method may include the following steps.
Step 3011B: Play a video in response to a user instruction.
Step 3012B: Acquire a video start point and a video termination point selected by the user in the video.
Step 3013B: Generate video information of the video to be dubbed according to the video start point and the video termination point.
Step 3014B: Send the video information to the server, so that the server generates the video to be dubbed according to the video information.
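Steps 3013B and 3014B can be sketched on the terminal side as follows. The field names, the JSON encoding, and the injected `transport` callable are all assumptions for illustration; the application does not specify a wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class VideoInfo:
    """Assumed shape of the video information for a video to be dubbed."""
    video_id: str
    start_s: float  # video start point, seconds from the beginning
    end_s: float    # video termination point, seconds from the beginning


def build_video_info(video_id: str, start_s: float, end_s: float) -> VideoInfo:
    """Step 3013B: turn the user's two selected points into video information."""
    if not 0 <= start_s < end_s:
        raise ValueError("start point must precede termination point")
    return VideoInfo(video_id, start_s, end_s)


def send_video_info(info: VideoInfo, transport: Callable[[bytes], None]) -> None:
    """Step 3014B: serialise the information and pass it to a transport."""
    transport(json.dumps(asdict(info)).encode("utf-8"))


# A list stands in for the network connection to the server.
sent = []
send_video_info(build_video_info("ep-01", 65.0, 92.5), sent.append)
```

Passing the transport in as a callable keeps the sketch independent of any particular network library.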
The technical solutions of the embodiments of the present application can, according to a user instruction, clip user-specified video content from a video played on the terminal device to generate a video to be dubbed, which enriches the sources of material available to the dubbing system and improves its service capability.
In the embodiments, the video being played is a video obtained by the terminal device over the Internet, for example an OTT video.
In some examples, the terminal device may clip the video data between the video start point and the video termination point from the video, and send the video data to the server as the video information, so that the server stores the video data as the video to be dubbed.
In some examples, the terminal device may send the video identifier of the video, information about the video start point, and information about the video termination point to the server as the video information, so that the server clips the video to be dubbed from the video corresponding to the video identifier according to the information about the video start point and the information about the video termination point.
In some examples, the information about the video start point includes a first video screenshot corresponding to the video start point in the video, and the information about the video termination point includes a second video screenshot corresponding to the video termination point in the video. The terminal device may send the video information to the server, so that the server determines the video start point and the video termination point in the video corresponding to the video identifier according to the first video screenshot and the second video screenshot, and clips the video to be dubbed from the video according to the video start point and the video termination point.
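The application does not say how the server matches a screenshot to a position in the video. A minimal sketch, assuming each frame and each screenshot is available as a flattened list of grayscale pixel values of equal length, is to pick the frame with the smallest mean absolute difference. All names below are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[int]  # flattened grayscale pixels, each in 0..255


def frame_distance(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two frames of equal size."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def locate(frames: List[Frame], screenshot: Frame) -> int:
    """Index of the frame that is closest to the screenshot."""
    if not frames:
        raise ValueError("no frames to search")
    return min(range(len(frames)), key=lambda i: frame_distance(frames[i], screenshot))


def locate_clip(frames: List[Frame], first_shot: Frame, second_shot: Frame) -> Tuple[int, int]:
    """Find the start frame, then search for the end frame at or after it."""
    start = locate(frames, first_shot)
    end = start + locate(frames[start:], second_shot)
    return start, end


frames = [[0] * 4, [10] * 4, [50] * 4, [90] * 4, [200] * 4]
print(locate_clip(frames, [11, 10, 9, 10], [88, 91, 90, 90]))  # (1, 3)
```

A production system would more likely compare perceptual hashes or decoded keyframes, but the search structure would be similar.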
In some examples, the information about the video start point includes a first time corresponding to the video start point in the video, and the information about the video termination point includes a second time corresponding to the video termination point in the video. The terminal device may send the video information to the server, so that the server clips the video to be dubbed from the video according to the first time and the second time.
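The three forms of video information described above (clipped video data, an identifier with two screenshots, an identifier with two times) can be modelled as three message types that the server handles differently. The class names, field names, and returned descriptions below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class RawClip:
    """The terminal already clipped the data; the server only stores it."""
    data: bytes


@dataclass
class ScreenshotRange:
    """Video identifier plus one screenshot for each end of the clip."""
    video_id: str
    first_screenshot: bytes
    second_screenshot: bytes


@dataclass
class TimeRange:
    """Video identifier plus the first and second time, in seconds."""
    video_id: str
    first_time_s: float
    second_time_s: float


VideoInfoMessage = Union[RawClip, ScreenshotRange, TimeRange]


def plan_server_action(info: VideoInfoMessage) -> str:
    """Describe what the server would do for each form of video information."""
    if isinstance(info, RawClip):
        return "store received data as the video to be dubbed"
    if isinstance(info, ScreenshotRange):
        return f"match screenshots in video {info.video_id}, then clip"
    if isinstance(info, TimeRange):
        if not 0 <= info.first_time_s < info.second_time_s:
            raise ValueError("first time must precede second time")
        return f"clip video {info.video_id} from {info.first_time_s}s to {info.second_time_s}s"
    raise TypeError(f"unsupported video information: {type(info).__name__}")
```

Keeping the forms as separate types makes it explicit that only the first one carries video data, while the other two require the server to hold the source video.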
In some examples, the terminal device may also edit the video to be dubbed by interacting with the server. The editing operations include, but are not limited to, picture cropping, video trimming, video addition, muting, dubbing, and graphics processing.
In some examples, the method may further include:
generating an audio file corresponding to the video to be dubbed in response to a dubbing instruction; and
sending the audio file to the server, so that the server generates a dubbed video file according to the video to be dubbed corresponding to the video identifier and the audio file corresponding to the video identifier.
Here, the terminal device may acquire audio input by the user through various devices equipped with a sound pickup, and generate the audio file. Such devices may include a microphone, a remote control with a microphone, a mobile phone, and the like. The terminal device may communicate with the device equipped with the sound pickup over a wired connection or a wireless connection (for example, infrared, Bluetooth, or Wi-Fi).
Step 302: The first client sends the video to be dubbed to the server.
Further, before sending the video to be dubbed to the server, the first client may also save the video to be dubbed locally.
Step 303: The server acquires the video to be dubbed and generates a target video according to the video to be dubbed.
Specifically, if the video to be dubbed satisfies the constraints on the target video, the video to be dubbed may be used directly as the target video; if it does not, the video to be dubbed is edited to generate the target video. The constraints on the target video include, but are not limited to, the target video containing no audio data.
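The decision in step 303 can be sketched with a toy model in which a video is just a name plus a tuple of audio tracks. The `Video` class and both function names are hypothetical, and the only constraint checked is the one the text names: no audio data.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Video:
    """Toy stand-in for a video file: a name and its audio tracks."""
    name: str
    audio_tracks: Tuple[str, ...] = ()


def meets_target_constraints(video: Video) -> bool:
    """The one constraint named in the text: the target video has no audio data."""
    return len(video.audio_tracks) == 0


def make_target_video(video: Video) -> Video:
    """Use the video directly if it qualifies, otherwise edit it by dropping audio."""
    if meets_target_constraints(video):
        return video
    return replace(video, audio_tracks=())
```

On real files, dropping the audio could be done with ffmpeg's `-an` output option; the model above only illustrates the branch.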
Step 304: The server generates a management identifier corresponding to the target video, and obtains an interaction identifier corresponding to the management identifier.
Specifically, the management identifier may be an ID (identification) number or a key value used to identify the target video. All audio files and video files related to the target video have the same management identifier, and the server manages the video files and/or audio files according to the management identifier.
The interaction identifier enables the second client to obtain the target video generated by the server and the management identifier. The interaction identifier may be the same as or different from the management identifier. The interaction identifier is generated according to the management identifier and may take forms including, but not limited to, a web address, a two-dimensional code, a barcode, or a combination thereof.
In one embodiment of the present application, the interaction identifier includes a web address corresponding to the management identifier and that web address represented in the form of a two-dimensional code. The target video and the management identifier are stored at the location indicated by the web address.
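A minimal sketch of deriving a web-address interaction identifier from a management identifier, and recovering the management identifier from it again, might look like this. The base URL, the `mid` query parameter, and the use of a random UUID are assumptions; the application does not prescribe any of them.

```python
import uuid
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical address; the application does not name a host or path.
BASE_URL = "https://dub.example.com/target"


def new_management_id() -> str:
    """Generate a management identifier (here a random 32-character hex string)."""
    return uuid.uuid4().hex


def interaction_url(management_id: str) -> str:
    """Derive the web-address form of the interaction identifier."""
    return f"{BASE_URL}?{urlencode({'mid': management_id})}"


def management_id_from_url(url: str) -> str:
    """Recover the management identifier, as a second client would after scanning."""
    values = parse_qs(urlparse(url).query).get("mid", [])
    if len(values) != 1:
        raise ValueError("URL does not carry exactly one management identifier")
    return values[0]
```

Rendering the URL as a two-dimensional code would need an additional encoding library and is left out of the sketch.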
Step 305: The server sends the interaction identifier to the first client.
Step 306: The first client acquires the interaction identifier from the server and makes the interaction identifier available to the second client.
In some examples, the method may further include:
The terminal device displays the interaction identifier of the video to be dubbed sent by the server, where the interaction identifier can be recognized by a terminal device, which thereby obtains the video to be dubbed from the server. Here, the second client may run on that terminal device.
Step 307: The second client obtains the target video and the management identifier from the server according to the interaction identifier.
After the first client acquires the two-dimensional code, the second client can obtain the two-dimensional code by scanning it and, through the two-dimensional code, access the web address that it represents, thereby obtaining the target video and the management identifier.
Further, the second client may also perform editing operations on the target video, including but not limited to picture cropping, video trimming, video addition, muting, dubbing, and graphics processing, to obtain an edited target video, and send the edited target video together with the management identifier to the server to replace the target video corresponding to the management identifier on the server side.
Further, the second client may also issue a video editing instruction to the server by interacting with the server, the editing instruction also including the management identifier. The server performs the editing operations on the target video corresponding to the management identifier; the editing operations include, but are not limited to, picture cropping, video trimming, video addition, muting, dubbing, and graphics processing. The server obtains the edited target video, replaces the original target video with the edited target video, and pushes the edited target video to the second client.
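The server-side edit-and-replace behaviour can be sketched as a small in-memory store keyed by management identifier. The class, its methods, and the representation of a video as `bytes` and of an edit as a function are hypothetical simplifications.

```python
from typing import Callable, Dict


class TargetVideoStore:
    """In-memory stand-in for the server's target videos, keyed by management identifier."""

    def __init__(self) -> None:
        self._videos: Dict[str, bytes] = {}

    def put(self, management_id: str, video: bytes) -> None:
        self._videos[management_id] = video

    def get(self, management_id: str) -> bytes:
        return self._videos[management_id]

    def apply_edit(self, management_id: str, edit: Callable[[bytes], bytes]) -> bytes:
        """Edit the stored target video, replace the original, and return the
        edited video so that it can be pushed to the second client."""
        edited = edit(self._videos[management_id])  # KeyError for an unknown identifier
        self._videos[management_id] = edited
        return edited


store = TargetVideoStore()
store.put("m1", b"abcdef")
pushed = store.apply_edit("m1", lambda video: video[:3])  # a toy "trim" edit
```

Because the edited video overwrites the original under the same identifier, later steps that look up the target video by management identifier see the edited version.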
Step 308: In response to a dubbing instruction, generate an audio file corresponding to the management identifier and send the audio file to the server.
Specifically, in response to the dubbing instruction, the second client may obtain an audio file by recording one, selecting an existing one, or similar means, and send the audio file together with the management identifier to the server so that the server can obtain the audio file.
Further, if the audio file is generated by recording, the target video is played during recording so that the user can dub along with it. If, before step 308, the second client has edited the target video, either by interacting with the server or through its own editing function, the edited target video is played during recording instead.
In some examples, the method may further include:
In response to a dubbing instruction, the terminal device may generate an audio file corresponding to the to-be-dubbed video and send the audio file to the server, so that the server generates a dubbed video file from the to-be-dubbed video corresponding to the video identifier and the audio file corresponding to the video identifier.
For example, the terminal device may record audio through a voice input device such as a microphone and generate the audio file. During recording, the terminal device may play the video at the same time so that the user can dub along with it.
Step 309: The server generates a dubbed video file from the audio file corresponding to the management identifier and the target video corresponding to the management identifier.
If, before step 308, the second client has edited the target video, either by interacting with the server or through its own editing function, the target video on the server has already been replaced, and the server generates the dubbed video file from the audio file and the replacement target video.
Further, in response to a sending instruction from the second client, the server may send the video file to the second client.
Further, in response to a sharing instruction sent by the second client, the server may also share the video file with other users.
In summary, the method provided in this embodiment implements video dubbing through three-party interaction among the first client, the second client, and the server. The dubbing work itself is done on the server; the user only needs to select the to-be-dubbed video and record an audio file, which simplifies the dubbing process. Further, the source of the to-be-dubbed video is not limited: it may be a video resource that the user selects from a video library, or a video resource that the user watches on a television, such as an OTT video.
OTT is short for "Over The Top" and refers to providing application services to users over the Internet. Such services differ from the communication services that operators themselves provide: they only use the operator's network, while the service is provided by a third party other than the operator. Typical OTT services currently include Internet TV and the Apple App Store. Internet companies use telecom operators' broadband networks to develop their own businesses, for example Google, Apple, Skype, and Netflix abroad and QQ in China. Netflix online video and the apps in the various mobile app stores are all OTT. The embodiments of the present application can dub directly on OTT video, which significantly broadens the sources of dubbing material.
Further, before step 308, the target video may be edited by the server or the second client. Referring to FIG. 5, which shows the video editing method of the present application, the method includes the following steps:
Step S310: Decompose the target video frame by frame, in time-axis order, into a combination of video frames. The time axis is a straight line formed by two or more time points arranged in chronological order.
Generate a decomposed temporary file from the combination of video frames; each video frame includes graphic data.
Step S320: Receive a video editing instruction, and edit the decomposed video frames according to the video editing instruction.
Step S330: Obtain the edited target video from the editing result.
Taking picture cropping as an example, if the video editing instruction is a picture cropping instruction, it includes width data and height data of the video picture.
(1) If the picture cropping is performed on the second client, the second client directly edits each video frame in the temporary file according to the width data and height data of the video picture, and obtains the cropped target file from the editing result.
(2) If the picture cropping is performed on the server, the second client, in response to the picture cropping instruction, obtains the width data and height data of the cropped video picture and transmits them to the server, so that the server crops the target video it holds according to the width data and height data. The cropping method is the same as in (1).
Further, other video editing instructions from the user may also be received, including video trimming, video insertion, muting, dubbing, and graphics processing.
By supporting multiple kinds of editing on the target video, the examples of the present application can meet a variety of user editing needs and ultimately achieve a better dubbing result; picture cropping, for instance, can remove the original subtitles from the target video.
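As an illustration only, the per-frame picture cropping described above might be sketched as follows. The frame model (a frame as rows of pixel values), the function names, and the choice to keep the top-left region are assumptions made for this sketch; the application specifies only that the instruction carries width data and height data.

```python
from typing import List

# Hypothetical frame model: a frame is a list of pixel rows.
Frame = List[List[int]]


def crop_frame(frame: Frame, width: int, height: int) -> Frame:
    """Keep the top-left width x height region of a single frame."""
    if height > len(frame) or any(width > len(row) for row in frame[:height]):
        raise ValueError("crop size exceeds frame size")
    return [row[:width] for row in frame[:height]]


def crop_video(frames: List[Frame], width: int, height: int) -> List[Frame]:
    """Apply the same crop to every frame of the decomposed temporary file."""
    return [crop_frame(f, width, height) for f in frames]


# A 4x3 frame whose last row stands in for a burned-in subtitle band.
frame = [[1, 1, 1, 1], [2, 2, 2, 2], [9, 9, 9, 9]]
cropped = crop_video([frame, frame], width=4, height=2)
```

In this toy input, cropping to a height of 2 drops the bottom row of each frame, which is how cropping could remove subtitles located at the bottom of the picture.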
Further, for video editing instructions such as video trimming, video insertion, muting, dubbing, and graphics processing, refer to FIG. 6, which shows a schematic flowchart of editing a video according to such instructions. Step S320 specifically includes:
S3201: Receive a video editing instruction, where the video editing instruction includes the start point and end point of the edit and the type of video editing;
S3202: Match the start point and the end point against the time points on the time axis to obtain a first matching time point corresponding to the start point and a second matching time point corresponding to the end point;
S3203: Find the first video frame corresponding to the first matching time point and the second video frame corresponding to the second matching time point;
S3204: Edit the video frames between the first video frame and the second video frame according to the type of video editing.
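Steps S3201 to S3204 can be sketched as follows under stated assumptions: the time axis is a sorted list of per-frame timestamps in milliseconds, "matching" means choosing the nearest time point, and the edited range includes both the first and the second video frame. None of these details is fixed by the application, and the function names are hypothetical.

```python
from bisect import bisect_left
from typing import Callable, List, Tuple


def match_time_point(timeline: List[int], t: int) -> int:
    """Return the index of the time point nearest to t (timeline is sorted)."""
    i = bisect_left(timeline, t)
    if i == 0:
        return 0
    if i == len(timeline):
        return len(timeline) - 1
    # Pick the nearer neighbour; ties go to the earlier time point.
    return i if timeline[i] - t < t - timeline[i - 1] else i - 1


def frames_to_edit(timeline: List[int], start: int, end: int) -> Tuple[int, int]:
    """S3202 and S3203: map the start and end points to frame indices."""
    first = match_time_point(timeline, start)
    second = match_time_point(timeline, end)
    if first > second:
        raise ValueError("start point lies after end point")
    return first, second


def apply_edit(frames: list, timeline: List[int], start: int, end: int,
               edit: Callable) -> list:
    """S3204: apply edit to every frame from the first to the second frame."""
    first, second = frames_to_edit(timeline, start, end)
    edited = [edit(f) for f in frames[first:second + 1]]
    return frames[:first] + edited + frames[second + 1:]


timeline = [0, 40, 80, 120, 160]           # 25 frames per second, in ms
frames = ["f0", "f1", "f2", "f3", "f4"]    # placeholders for frame data
result = apply_edit(frames, timeline, start=50, end=130, edit=str.upper)
```

Here the `edit` callable stands in for whichever editing type the instruction names; a trimming or insertion edit would change the number of frames rather than transform them in place.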
Step S320 is described below for each specific type of video editing.
(1) Video trimming
If the type of video editing is video trimming, the start point and the end point are matched against the time points on the time axis to obtain a first matching time point corresponding to the start point and a second matching time point corresponding to the end point. The first video frame corresponding to the first matching time point and the second video frame corresponding to the second matching time point are found, and the video frames between the first video frame and the second video frame are trimmed from the temporary file.
(2) Video insertion
If the type of video editing is video insertion, the start point and the end point are matched against the time points on the time axis to obtain a first matching time point corresponding to the start point and a second matching time point corresponding to the end point, and the first video frame corresponding to the first matching time point and the second video frame corresponding to the second matching time point are found. If the start point and the end point are the time points of two adjacent frames of image data, the video frames to be added are inserted between the first video frame and the second video frame. If the time points of multiple frames of graphic data lie between the start point and the end point, the video frames to be added may be inserted at a preset position between the first video frame and the second video frame according to a preset rule.
(3) Muting
If the type of video editing is muting, the start point and the end point are matched against the time points on the time axis to obtain a first matching time point corresponding to the start point and a second matching time point corresponding to the end point, and the first video frame corresponding to the first matching time point and the second video frame corresponding to the second matching time point are found. The sound data between the first video frame and the second video frame is then deleted.
(4) Dubbing
If the type of video editing is dubbing, the start point and the end point are matched against the time points on the time axis to obtain a first matching time point corresponding to the start point and a second matching time point corresponding to the end point, and the first video frame corresponding to the first matching time point and the second video frame corresponding to the second matching time point are found. The sound data selected by the user is then added between the first video frame and the second video frame. If the video frames between them originally carry sound data, the original sound data is erased first and the user-selected sound data is then added.
(5) Graphics processing
If the type of video editing is graphics processing, the start point and the end point are matched against the time points on the time axis to obtain a first matching time point corresponding to the start point and a second matching time point corresponding to the end point, and the first video frame corresponding to the first matching time point and the second video frame corresponding to the second matching time point are found. The contrast, brightness, and color saturation of the image data in the video frames between the first video frame and the second video frame are then adjusted.
Of course, the video editing in step S320 is not limited to the types above and may include other processing. The types above can also be combined flexibly: for example, the video frames may first be muted and the muted frames then dubbed, or the video may first be trimmed and the video frames to be added then inserted at the corresponding position of the trimmed frames. Note that if the video editing instruction does not include a start point and an end point, the start point defaults to the first time point on the time axis of the whole video and the end point defaults to the last time point on that time axis.
By decomposing the target video frame by frame, the examples of the present application allow editing to be accurate to the individual frame, which improves the precision of video processing and the editing result.
Referring to FIG. 7A, a dubbing method is shown, which includes the following steps:
Step S401A: Obtain video information of a to-be-dubbed video from a terminal device, where the video information is generated by the terminal device according to a video start point and a video end point that the user selects in the video being played;
Step S402A: Generate the to-be-dubbed video according to the video information.
In some examples, the server may clip the to-be-dubbed video out of the video corresponding to the video identifier according to the information on the video start point and the information on the video end point.
In some examples, the information on the video start point includes a first video screenshot that corresponds to the video start point in the video, and the information on the video end point includes a second video screenshot that corresponds to the video end point. The server may locate the video start point and the video end point in the video corresponding to the video identifier according to the first and second video screenshots, and clip the video data between the two points as the to-be-dubbed video.
In some examples, the information on the video start point includes a first time that corresponds to the video start point in the video, and the information on the video end point includes a second time that corresponds to the video end point. The server may clip the video data between the first time and the second time out of the video as the to-be-dubbed video.
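One possible way for a server to clip the video data between the first time and the second time is to call the FFmpeg command-line tool. The application does not name any tool, so the use of FFmpeg, the file names, and the helper name `build_clip_command` are assumptions of this sketch. The function only builds the argument list; running it (for example with `subprocess.run`) requires FFmpeg to be installed.

```python
from typing import List


def build_clip_command(src: str, first_time: float, second_time: float,
                       dst: str) -> List[str]:
    """Build an ffmpeg invocation that keeps [first_time, second_time] of src."""
    if not 0 <= first_time < second_time:
        raise ValueError("expected 0 <= first_time < second_time")
    return [
        "ffmpeg",
        "-i", src,
        "-ss", f"{first_time:.3f}",   # start of the clip, in seconds
        "-to", f"{second_time:.3f}",  # end of the clip, in seconds
        "-c", "copy",                 # copy streams without re-encoding
        dst,
    ]


cmd = build_clip_command("source.mp4", 12.5, 20.0, "to_be_dubbed.mp4")
```

Stream copy avoids re-encoding but may not cut exactly at the requested times; omitting `-c copy` re-encodes the clip and generally cuts more precisely at the cost of processing time.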
In some examples, the server may also receive an audio file sent by the terminal device and generate a dubbed video file from the to-be-dubbed video corresponding to the video identifier and the audio file corresponding to the video identifier.
Referring to FIG. 7B, a dubbing method is shown, which includes the following steps:
Step S401B: Obtain a to-be-dubbed video from the first client.
Step S402B: Generate a target video from the to-be-dubbed video.
Referring to FIG. 8, a target video generation method is shown:
S4021: Determine whether the to-be-dubbed video contains audio data;
S4022: If so, remove the audio data from the to-be-dubbed video to obtain the target video;
S4023: If not, use the to-be-dubbed video directly as the target video.
Specifically, the audio data in the to-be-dubbed video can be removed in either of the following two ways:
(1) Decode the file that contains the to-be-dubbed video to obtain video data and audio data, and re-encode the video data alone to obtain the target video;
(2) Remove the audio data from the to-be-dubbed video directly by digital filtering to obtain the target video.
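The decision in S4021 to S4023 can be sketched with the FFmpeg tools, which the application does not name; their use here is an assumption. The sketch also departs from method (1) in one respect: it copies the video stream unchanged with `-c:v copy` and drops the audio with `-an`, rather than decoding and re-encoding. The helper names are hypothetical, and the functions only build commands and interpret probe output, so FFmpeg is needed only when the commands are actually run.

```python
from typing import List, Optional


def build_probe_command(src: str) -> List[str]:
    """ffprobe call that prints one line per audio stream found in src."""
    return ["ffprobe", "-v", "error", "-select_streams", "a",
            "-show_entries", "stream=index", "-of", "csv=p=0", src]


def has_audio_stream(probe_output: str) -> bool:
    """S4021: any non-blank line of probe output means audio is present."""
    return any(line.strip() for line in probe_output.splitlines())


def build_mute_command(src: str, dst: str) -> List[str]:
    """S4022: drop all audio (-an) and copy the video stream unchanged."""
    return ["ffmpeg", "-i", src, "-an", "-c:v", "copy", dst]


def plan_target_video(src: str, dst: str,
                      probe_output: str) -> Optional[List[str]]:
    """Return the command to run, or None when src can be used as-is (S4023)."""
    if has_audio_stream(probe_output):
        return build_mute_command(src, dst)
    return None


with_audio = plan_target_video("in.mp4", "target.mp4", "1\n")
without_audio = plan_target_video("in.mp4", "target.mp4", "")
```

A caller would run `build_probe_command(src)` first, pass its standard output to `plan_target_video`, and execute the returned command only if it is not `None`.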
Step S403B: Generate a management identifier corresponding to the target video, and obtain an interaction identifier corresponding to the management identifier, so that the second client can obtain the target video and the management identifier according to the interaction identifier.
In the embodiments of the present application, the management identifier corresponding to the target video may be generated according to a preset identifier generation method. Identifier generation methods include but are not limited to generating the identifier randomly, generating it from the generation time of the target video, and generating it from the generation time of the target video together with other attribute parameters.
In the embodiments of the present application, a web address may be generated from the management identifier using a preset web address generation algorithm. The generated web address is one kind of interaction identifier and corresponds one-to-one with the management identifier. After it is generated, the web address is pushed to the first client. Further, the web address pushed to the first client may take the form of a character string, a two-dimensional code, or a barcode.
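A minimal sketch of random identifier generation and of a web address that maps one-to-one to the management identifier is given below. The base address `https://example.com/dub`, the query parameter name `id`, and the function names are hypothetical; the application leaves the generation algorithm open.

```python
import uuid
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://example.com/dub"  # hypothetical service address


def new_management_id() -> str:
    """Randomly generate a management identifier (32 hex characters)."""
    return uuid.uuid4().hex


def interaction_url(management_id: str, base_url: str = BASE_URL) -> str:
    """Embed the management identifier in a web address."""
    return f"{base_url}?{urlencode({'id': management_id})}"


def management_id_from_url(url: str) -> str:
    """Recover the management identifier from the web address."""
    return parse_qs(urlparse(url).query)["id"][0]


mid = new_management_id()
url = interaction_url(mid)
```

Because the identifier can be recovered from the address, the second client can obtain both the target video and the management identifier from the address alone; rendering the address as a two-dimensional code would be a separate step that this sketch does not cover.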
Step S404B: Obtain the audio file corresponding to the management identifier from the second client.
Step S405B: Generate a dubbed video file from the audio file corresponding to the management identifier and the target video corresponding to the management identifier.
Further, refer to FIG. 9, which shows a flowchart of a subtitle acquisition method. In response to a subtitle generation instruction, after the audio file corresponding to the management identifier is obtained from the second client, the method further includes:
Step S410: Perform speech recognition on the audio in the audio file.
Specifically, refer to FIG. 10, which shows a flowchart of a method for performing speech recognition on the audio in the audio file. Step S410 includes the following steps:
Step S4101: Obtain the audio data in the audio file.
Step S4102: Segment the audio data according to the time intervals between utterances to obtain audio data segments, and record the time information of each audio data segment.
Specifically, segmenting the audio data by speaking interval means using speech recognition on the audio waveform to determine where sentences should break. Because speakers talk at different rates (normal, faster, and slower), the pause interval and the duration of each speech segment can be set according to the speech rate in the audio data to make sentence breaking more accurate. Segmenting the audio data into audio data segments keeps the amount of subtitle text shown on screen at any one time comfortable for the viewer to read and easy to take in.
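The segmentation in step S4102 can be approximated by a simple pause detector. This sketch assumes the audio is available as a list of amplitude samples and treats any sample whose magnitude is at or below a threshold as silence; the threshold, the minimum pause length, and the function name are illustrative parameters rather than values from the application, which describes waveform-based recognition without fixing an algorithm.

```python
from typing import List, Tuple


def split_on_pauses(samples: List[float], sample_rate: int,
                    silence_threshold: float,
                    min_pause_s: float) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for each speech segment.

    A run of silent samples lasting at least min_pause_s ends a segment.
    """
    min_pause = round(min_pause_s * sample_rate)
    segments: List[Tuple[float, float]] = []
    seg_start = None   # index of the first loud sample in the current segment
    last_loud = None   # index of the most recent loud sample
    for i, s in enumerate(samples):
        if abs(s) > silence_threshold:
            if seg_start is None:
                seg_start = i
            elif i - last_loud > min_pause:
                # The silent gap was long enough: close the previous segment.
                segments.append((seg_start / sample_rate,
                                 (last_loud + 1) / sample_rate))
                seg_start = i
            last_loud = i
    if seg_start is not None:
        segments.append((seg_start / sample_rate,
                         (last_loud + 1) / sample_rate))
    return segments


samples = [0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0]
segments = split_on_pauses(samples, sample_rate=10,
                           silence_threshold=0.5, min_pause_s=0.3)
```

Adapting to speech rate, as the text suggests, would amount to choosing a shorter `min_pause_s` for fast speakers and a longer one for slow speakers.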
Step S4103: Obtain the corresponding text data segments through speech recognition.
Specifically, obtaining the corresponding text data segment from an audio data segment through speech recognition includes: matching the audio data segment against lexicons to obtain the classified lexicon for that audio data segment, and performing speech recognition using the matched classified lexicon. The classified lexicons include two or more language lexicons and two or more professional-subject lexicons. Matching the audio data segment against the lexicons yields the language lexicon for the original language of the audio, and the vocabulary in that lexicon can speed up recognition of the corresponding text data. Matching can likewise yield the professional-subject lexicon for the subject of the audio; for example, audio on a historical topic can be matched to the history lexicon, whose vocabulary can further speed up recognition.
Specifically, speech recognition may convert the audio content of an audio data segment directly into text in the original spoken language, or it may produce text in another language. In the latter case, the language selected by the user is obtained, the audio data segment is recognized as text in the original spoken language, and the recognized text is then translated into the language selected by the user.
In the embodiments, interval identifiers are added to the corresponding text data segment according to the length of the speaking intervals. Text data segments obtained by speech recognition contain many punctuation marks, and many of them do not fit the context. To make later proofreading easier, the recognized text data segment can be filtered so that the bytes occupied by punctuation marks are converted into interval identifiers of the same byte length, which a human proofreader can then replace with punctuation that fits the context.
Specifically, the text data obtained by speech recognition may be split and line-wrapped according to the start time and end time of each text data segment, forming subtitle text that corresponds to the audio data in the audio file. The main criterion for splitting and wrapping is how well the subtitles match the audio in the audio-video.
Step S420: Generate a subtitle file corresponding to the management identifier from the recognition result.
The text data segments are recorded in the form of a subtitle file. Note that after the subtitle file for the audio-video data is generated, its output mode can be chosen according to the situation. Output modes include but are not limited to: generating a subtitle file in a specific format that conforms to a subtitle format standard; or, when the video is played, integrating the subtitle file into the audio-video output stream so that the player handles subtitle display.
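As one example of a standard subtitle format, timed text data segments could be written out in the widely used SubRip (SRT) layout. The application does not name a format, so the choice of SRT is an assumption, as are the function names and the segment representation of (start seconds, end seconds, text).

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Render numbered cues, one per text data segment, separated by blank lines."""
    blocks = []
    for n, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{n}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


srt = to_srt([(0.1, 0.5, "Hello."), (1.0, 1.2, "Goodbye.")])
```

The start and end times are the time information recorded for each audio data segment, which is what keeps each subtitle aligned with the speech it transcribes.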
Step S430: Transmit the subtitle file to the second client so that the second client can correct the subtitle file and return a correction result.
Step S440: Obtain the target subtitle file according to the correction result.
The correction result is either a confirmation instruction or a corrected subtitle file. If the second client has corrected the subtitle file, it returns the corrected subtitle file, which is used as the target subtitle file. If the second client has made no correction, it returns a confirmation instruction, and the original subtitle file is used as the target subtitle file. The target subtitle file also corresponds to the management identifier.
Further, after the target subtitle file is obtained, the audio file, the target video, and the target subtitle file that correspond to the same management identifier can be combined in step S405B to obtain the dubbed video file.
This embodiment provides a dubbing method that automatically generates a subtitle file through speech recognition and generates the dubbed file based on the management identifier. The user only needs to record the sound for the target video to obtain an audio file; the dubbing is then completed and the subtitles generated automatically, which spares the user the complexity of producing the dubbed file and improves the user experience.
The following are apparatus embodiments of the present application, which can be used to carry out the method embodiments. For details not disclosed in the apparatus embodiments, refer to the method embodiments.
Referring to FIG. 11, a dubbing apparatus is shown. The apparatus has the functions of the server in the method examples above; these functions may be implemented by hardware or by hardware executing corresponding software. The apparatus may include:
A to-be-dubbed video acquisition module 501, configured to obtain a to-be-dubbed video from the first client. It can be used to perform step 303 and step S401B above.
A target video generation module 502, configured to generate a target video from the to-be-dubbed video. It can be used to perform step 303 and step S402B above.
An identifier generation module 503, configured to generate a management identifier corresponding to the target video and obtain an interaction identifier corresponding to the management identifier, so that the second client can obtain the target video and the management identifier according to the interaction identifier. It can be used to perform step 304 and step S403B above.
An audio file acquisition module 504, configured to obtain the audio file corresponding to the management identifier from the second client. It can be used to perform step 308 and step S404B above.
A synthesis module 505, configured to generate a dubbed video file from the audio file corresponding to the management identifier and the target video corresponding to the management identifier. It can be used to perform step 309 and step S405B above.
Specifically, refer to FIG. 12, which shows a block diagram of the target video generation module. The target video generation module 502 may include:
A determining unit 5021, configured to determine whether the to-be-dubbed video contains audio data. It can be used to perform step S4021 above.
A muting unit 5022, configured to remove the audio data from the to-be-dubbed video. It can be used to perform step S4022 above.
Specifically, refer to FIG. 13, which shows a block diagram of the identifier generation module. The identifier generation module 503 may include:
A management identifier generation unit 5031, configured to generate the management identifier corresponding to the target video according to a preset identifier generation method. It can be used to perform step 304 and step S403B above.
A web address generation unit 5032, configured to generate a web address from the management identifier using a preset web address generation algorithm. It can be used to perform step 304 and step S403B above.
A two-dimensional code generation unit 5033, configured to generate a two-dimensional code from the web address. It can be used to perform step 304 and step S403B above.
相应的,本装置还可以包括:二维码推送模块506,用于将所述二维码推送至所述第一客户端。可以用于执行上述步骤304。Correspondingly, the device may further include: a two-dimensional code pushing module 506, configured to push the two-dimensional code to the first client. Can be used to perform the above step 304.
Further, the device may also include:
The speech recognition module 507 is configured to perform speech recognition on the audio in the audio file. It may be used to perform step S410 described above.
The subtitle file generating module 508 is configured to generate a subtitle file from the recognition result. It may be used to perform step S420 described above.
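To illustrate the subtitle file generating module 508, the sketch below converts timed recognition results into SubRip (SRT) text. The SRT format and the `(start, end, text)` segment shape are assumptions; the application specifies neither the subtitle format nor the form of the recognition result.

```python
from typing import Iterable, Tuple

# (start in seconds, end in seconds, recognized text): an assumed result shape
Segment = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

For example, `segments_to_srt([(0.0, 1.25, "Hello")])` yields a single cue running from `00:00:00,000` to `00:00:01,250`.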
Further, the device may also include:
The video editing module 509 is configured to perform video editing.
The video file sending module 510 is configured to send the dubbed video file to the second client.
The video file sharing module 511 is configured to share the dubbed video file with other users.
An exemplary embodiment of the present application further provides a dubbing system, the system including a first client 601, a second client 602, and a server 603.
The first client 601 is configured to: obtain a to-be-dubbed video in response to a user instruction; send the to-be-dubbed video to the server; and acquire an interaction identifier from the server and make the interaction identifier available to the second client.
The second client 602 is configured to: acquire a target video from the server according to the interaction identifier; and, in response to a dubbing instruction, generate an audio file corresponding to the management identifier and send the audio file to the server.
The server 603 is configured to: acquire the to-be-dubbed video; generate a target video from the to-be-dubbed video; generate a management identifier corresponding to the target video, and obtain an interaction identifier corresponding to the management identifier; send the interaction identifier to the first client; send the target video to the second client; and obtain a dubbed video file from the audio file and the target video on the server.
Specifically, the server 603 may be the dubbing device described above.
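The interaction among the first client, the second client, and the server can be sketched with a small in-memory model. The class name `DubbingServer`, its method names, and the `share-` prefix used for the interaction identifier are hypothetical. Muting and actual audio/video synthesis are omitted, so the "dubbed file" is represented simply as a pair of byte strings.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DubbingServer:
    targets: Dict[str, bytes] = field(default_factory=dict)     # management id -> target video
    interactions: Dict[str, str] = field(default_factory=dict)  # interaction id -> management id

    def submit_video(self, video: bytes) -> str:
        """First client uploads a to-be-dubbed video; returns an interaction identifier."""
        management_id = uuid.uuid4().hex
        self.targets[management_id] = video  # target video generation (muting) omitted
        interaction_id = f"share-{management_id}"
        self.interactions[interaction_id] = management_id
        return interaction_id

    def fetch_target(self, interaction_id: str) -> Tuple[str, bytes]:
        """Second client exchanges the interaction identifier for the
        management identifier and the target video."""
        management_id = self.interactions[interaction_id]
        return management_id, self.targets[management_id]

    def submit_audio(self, management_id: str, audio: bytes) -> Tuple[bytes, bytes]:
        """Pair the uploaded audio with the target video that shares its management identifier."""
        return self.targets[management_id], audio
```

In this model the management identifier is the key that ties the audio file from the second client back to the target video stored on the server.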
The first client 601 may include:
The video identifier selection module 6011 is configured to acquire a video identifier selected by the user.
The time point obtaining module 6012 is configured to acquire a video starting point and a video termination point selected by the user.
The to-be-dubbed video obtaining module 6013 is configured to copy, from the video file corresponding to the video identifier, the video content between the video starting point and the video termination point, to obtain the to-be-dubbed video.
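As an illustration of the to-be-dubbed video obtaining module 6013, the sketch below builds a command that copies the content between two time points out of a source file. The use of ffmpeg and the function name `build_clip_command` are assumptions; the application does not state how the copy is performed.

```python
from typing import List


def build_clip_command(source: str, start_s: float, end_s: float, output: str) -> List[str]:
    """Return ffmpeg arguments that copy the span [start_s, end_s] of `source`.

    Assumption: the start and end points are expressed in seconds.
    """
    if not 0 <= start_s < end_s:
        raise ValueError("start must be non-negative and earlier than end")
    return [
        "ffmpeg", "-i", source,
        "-ss", f"{start_s:.3f}",  # video starting point
        "-to", f"{end_s:.3f}",    # video termination point
        "-c", "copy",             # copy streams without re-encoding
        output,
    ]
```

Because the streams are copied rather than re-encoded, the cut generally snaps to nearby keyframes, so the clip boundaries are approximate; an implementation that needs frame accuracy would have to re-encode.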
The second client 602 may include:
The interaction identifier obtaining module 6021 is configured to acquire the interaction identifier.
The interaction result obtaining module 6022 is configured to obtain the target video and the management identifier from the server according to the interaction identifier.
The audio file obtaining module 6023 is configured to generate an audio file corresponding to the management identifier.
The audio file sending module 6024 is configured to send the audio file to the server.
Further, the second client may also include:
The picture cropping module 6025 is configured to obtain, in response to a picture cropping instruction, the width data and height data of the cropped video picture.
It should be noted that, when the device and system provided by the foregoing embodiments implement their functions, the division into the functional modules described above is merely an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the device embodiments provided above and the method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
Refer to FIG. 14, which shows a schematic structural diagram of a terminal provided by an embodiment of the present application. The terminal is used to implement the dubbing method provided in the foregoing embodiments.
The terminal may include components such as an RF (Radio Frequency) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (wireless fidelity) module 170, a processor 180 including one or more processing cores, and a power supply 190. Those skilled in the art will understand that the terminal structure shown in FIG. 14 does not limit the terminal, which may include more or fewer components than illustrated, combine certain components, or use a different component arrangement.
The memory 120 may be used to store software programs and modules; the processor 180 performs various functional applications and data processing by running the software programs and modules stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, application programs required by functions, and the like, and the data storage area may store data created through use of the terminal, and the like. In addition, the memory 120 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device. Accordingly, the memory 120 may also include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.
The processor 180 is the control center of the terminal. It connects the various parts of the entire terminal using various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing the software programs and/or modules stored in the memory 120 and invoking the data stored in the memory 120.
The terminal further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors. The one or more programs contain instructions for performing the dubbing method described above.
Refer to FIG. 15, which shows a schematic structural diagram of a server provided by an embodiment of the present application. The server is used to implement the server-side dubbing method provided in the foregoing embodiments. Specifically:
The server 1200 includes a central processing unit (CPU) 1201, a system memory 1204 including a random access memory (RAM) 1202 and a read-only memory (ROM) 1203, and a system bus 1205 connecting the system memory 1204 and the central processing unit 1201. The server 1200 further includes a basic input/output system (I/O system) 1206 that helps transfer information between devices within the computer, and a mass storage device 1207 for storing an operating system 1213, application programs 1214, and other program modules 1215.
The basic input/output system 1206 includes a display 1208 for displaying information and an input device 1209, such as a mouse or keyboard, for the user to input information. The display 1208 and the input device 1209 are both connected to the central processing unit 1201 through an input/output controller 1210 connected to the system bus 1205. The basic input/output system 1206 may further include the input/output controller 1210 for receiving and processing input from a number of other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 1210 also provides output to a display screen, a printer, or another type of output device.
The mass storage device 1207 is connected to the central processing unit 1201 through a mass storage controller (not shown) connected to the system bus 1205. The mass storage device 1207 and its associated computer-readable medium provide non-volatile storage for the server 1200. That is, the mass storage device 1207 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
Without loss of generality, the computer-readable medium may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to those listed above. The system memory 1204 and the mass storage device 1207 described above may be collectively referred to as memory.
According to various embodiments of the present application, the server 1200 may also operate by connecting, through a network such as the Internet, to a remote computer on the network. That is, the server 1200 may be connected to the network 1212 through a network interface unit 1211 connected to the system bus 1205; in other words, the network interface unit 1211 may also be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs, which are stored in the memory and configured to be executed by one or more processors. The one or more programs contain instructions for performing the server-side method described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example a memory including instructions. The instructions may be executed by the processor of the terminal to perform the steps in the foregoing method embodiments, or by the processor of the server to perform the steps on the background server side in the foregoing method embodiments. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It should be understood that "a plurality of" as used herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
Those of ordinary skill in the art will understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes only some embodiments of the present application and is not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the principles of the present application shall fall within the scope of protection of the present application.

Claims (24)

  1. A dubbing method, applied to a terminal device, the method comprising:
    playing a video in response to a user instruction;
    acquiring a video starting point and a video termination point selected by a user in the video;
    generating video information of a to-be-dubbed video according to the video starting point and the video termination point; and
    sending the video information to a server, so that the server generates the to-be-dubbed video according to the video information.
  2. The method according to claim 1, wherein generating the video information of the to-be-dubbed video according to the video starting point and the video termination point, and sending the video information to the server so that the server generates the to-be-dubbed video according to the video information, comprises:
    capturing, in the video, the video data between the video starting point and the video termination point, and sending the video data to the server as the video information, so that the server stores the video data as the to-be-dubbed video.
  3. The method according to claim 1, wherein generating the video information according to the video starting point and the video termination point, and sending the video information to the server so that the server generates the to-be-dubbed video according to the video information, comprises:
    sending a video identifier of the video, information of the video starting point, and information of the video termination point to the server as the video information, so that the server extracts the to-be-dubbed video from the video corresponding to the video identifier according to the information of the video starting point and the information of the video termination point.
  4. The method according to claim 3, wherein the information of the video starting point comprises a first video screenshot corresponding to the video starting point in the video, and the information of the video termination point comprises a second video screenshot corresponding to the video termination point in the video,
    and wherein sending the video information to the server, so that the server generates the to-be-dubbed video according to the video information, comprises:
    sending the video information to the server, so that the server determines the video starting point and the video termination point in the video corresponding to the video identifier according to the first video screenshot and the second video screenshot, and extracts the to-be-dubbed video from the video according to the video starting point and the video termination point.
  5. The method according to claim 3, wherein the information of the video starting point comprises a first time corresponding to the video starting point in the video, and the information of the video termination point comprises a second time corresponding to the video termination point in the video,
    and wherein sending the video information to the server, so that the server generates the to-be-dubbed video according to the video information, comprises:
    sending the video information to the server, so that the server extracts the to-be-dubbed video from the video according to the first time and the second time.
  6. The method according to claim 1, wherein the method further comprises:
    generating, in response to a dubbing instruction, an audio file corresponding to the to-be-dubbed video; and
    sending the audio file to the server, so that the server generates a dubbed video file according to the to-be-dubbed video corresponding to the video identifier and the audio file corresponding to the video identifier.
  7. The method according to claim 1, further comprising:
    displaying an interaction identifier of the to-be-dubbed video sent by the server, wherein the interaction identifier can be recognized by a terminal device so that the terminal device obtains the to-be-dubbed video from the server.
  8. A dubbing method, applied to a server, the method comprising:
    acquiring video information of a to-be-dubbed video from a terminal device, wherein the video information is generated by the terminal device according to a video starting point and a video termination point selected by a user in a played video; and
    generating the to-be-dubbed video according to the video information.
  9. The method according to claim 8, wherein the video information comprises a video identifier of the video, information of the video starting point, and information of the video termination point, and generating the to-be-dubbed video according to the video information comprises:
    extracting the to-be-dubbed video from the video corresponding to the video identifier according to the information of the video starting point and the information of the video termination point.
  10. The method according to claim 9, wherein the information of the video starting point comprises a first video screenshot corresponding to the video starting point in the video, and the information of the video termination point comprises a second video screenshot corresponding to the video termination point in the video,
    and wherein extracting the to-be-dubbed video from the video corresponding to the video identifier according to the information of the video starting point and the information of the video termination point comprises:
    determining the video starting point and the video termination point in the video corresponding to the video identifier according to the first video screenshot and the second video screenshot, and extracting the video data between the video starting point and the video termination point from the video as the to-be-dubbed video.
  11. The method according to claim 9, wherein the information of the video starting point comprises a first time corresponding to the video starting point in the video, and the information of the video termination point comprises a second time corresponding to the video termination point in the video,
    and wherein extracting the to-be-dubbed video from the video corresponding to the video identifier according to the information of the video starting point and the information of the video termination point comprises:
    extracting the video data between the first time and the second time from the video as the to-be-dubbed video.
  12. The method according to claim 8, wherein the method further comprises:
    receiving an audio file sent by the terminal device; and
    generating a dubbed video file according to the to-be-dubbed video corresponding to the video identifier and the audio file corresponding to the video identifier.
  13. A terminal device, comprising a processor and a memory, wherein the memory stores computer-readable instructions that cause the processor to perform the following operations:
    playing a video in response to a user instruction;
    acquiring a video starting point and a video termination point selected by a user in the video;
    generating video information of a to-be-dubbed video according to the video starting point and the video termination point; and
    sending the video information to a server, so that the server generates the to-be-dubbed video according to the video information.
  14. The terminal device according to claim 13, wherein the instructions cause the processor to perform the following operations:
    capturing, in the video, the video data between the video starting point and the video termination point; and
    sending the video data to the server as the video information, so that the server stores the video data as the to-be-dubbed video.
  15. The terminal device according to claim 13, wherein the instructions cause the processor to perform the following operation:
    sending a video identifier of the video, information of the video starting point, and information of the video termination point to the server as the video information, so that the server extracts the to-be-dubbed video from the video corresponding to the video identifier according to the information of the video starting point and the information of the video termination point.
  16. The terminal device according to claim 13, wherein the instructions cause the processor to perform the following operations:
    generating, in response to a dubbing instruction, an audio file corresponding to the to-be-dubbed video;
    the sending module being further configured to send the audio file to the server, so that the server generates a dubbed video file according to the to-be-dubbed video corresponding to the video identifier and the audio file corresponding to the video identifier.
  17. The terminal device according to claim 13, wherein the instructions cause the processor to perform the following operation:
    displaying an interaction identifier of the to-be-dubbed video sent by the server, wherein the interaction identifier can be recognized by a terminal device so that the terminal device obtains the to-be-dubbed video from the server.
  18. A server, comprising a processor and a memory, wherein the memory stores computer-readable instructions that cause the processor to perform the following operations:
    acquiring video information of a to-be-dubbed video from a terminal device, wherein the video information is generated by the terminal device according to a starting point and a video termination point selected by a user in a played video; and
    generating the to-be-dubbed video according to the video information.
  19. The server according to claim 18, wherein the instructions cause the processor to perform the following operations:
    the video information comprising a video identifier of the video, information of the video starting point, and information of the video termination point,
    extracting the to-be-dubbed video from the video corresponding to the video identifier according to the information of the video starting point and the information of the video termination point.
  20. The server according to claim 18, wherein the instructions cause the processor to perform the following operations:
    the information of the video starting point comprising a first video screenshot corresponding to the video starting point in the video, and the information of the video termination point comprising a second video screenshot corresponding to the video termination point in the video,
    determining the video starting point and the video termination point in the video corresponding to the video identifier according to the first video screenshot and the second video screenshot, and extracting the video data between the video starting point and the video termination point from the video as the to-be-dubbed video.
  21. The server according to claim 18, wherein the instructions cause the processor to perform the following operations:
    the information of the video starting point comprising a first time corresponding to the video starting point in the video, and the information of the video termination point comprising a second time corresponding to the video termination point in the video,
    extracting the video data between the first time and the second time from the video as the to-be-dubbed video.
  22. The server according to claim 18, wherein the instructions cause the processor to perform the following operations:
    receiving an audio file sent by the terminal device; and
    generating a dubbed video file according to the to-be-dubbed video corresponding to the video identifier and the audio file corresponding to the video identifier.
  23. A non-volatile computer-readable storage medium, storing computer-readable instructions that cause at least one processor to perform the method according to any one of claims 1 to 7.
  24. A non-volatile computer-readable storage medium, storing computer-readable instructions that cause at least one processor to perform the method according to any one of claims 8 to 12.
PCT/CN2018/072201 2017-01-16 2018-01-11 Dubbing method, terminal device, server and storage medium WO2018130173A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710029246.5 2017-01-16
CN201710029246.5A CN107071512B (en) 2017-01-16 2017-01-16 A kind of dubbing method, apparatus and system

Publications (1)

Publication Number Publication Date
WO2018130173A1 true WO2018130173A1 (en) 2018-07-19

Family

ID=59599023

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072201 WO2018130173A1 (en) 2017-01-16 2018-01-11 Dubbing method, terminal device, server and storage medium

Country Status (2)

Country Link
CN (1) CN107071512B (en)
WO (1) WO2018130173A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107071512B (en) * 2017-01-16 2019-06-25 腾讯科技(深圳)有限公司 A kind of dubbing method, apparatus and system
CN108305636B (en) * 2017-11-06 2019-11-15 腾讯科技(深圳)有限公司 A kind of audio file processing method and processing device
CN109274900A (en) * 2018-09-05 2019-01-25 浙江工业大学 A kind of video dubbing method
CN109618116B (en) * 2018-12-25 2020-07-28 北京微播视界科技有限公司 Multimedia information processing method, electronic equipment and computer storage medium
CN110830851A (en) * 2019-10-30 2020-02-21 深圳点猫科技有限公司 Method and device for making video file
CN111986656B (en) * 2020-08-31 2021-07-30 上海松鼠课堂人工智能科技有限公司 Teaching video automatic caption processing method and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060088283A1 (en) * 2004-10-27 2006-04-27 Funai Electric Co., Ltd. Video recorder to be connected to a digital video camcorder via IEEE 1394 serial bus
CN103763480A (en) * 2014-01-24 2014-04-30 三星电子(中国)研发中心 Method and equipment for obtaining video dubbing
CN105744182A (en) * 2016-04-22 2016-07-06 广东小天才科技有限公司 Video production method and device
CN105959773A (en) * 2016-04-29 2016-09-21 魔方天空科技(北京)有限公司 Multimedia file processing method and device
CN106060424A (en) * 2016-06-14 2016-10-26 徐文波 Video dubbing method and device
CN106293347A (en) * 2016-08-16 2017-01-04 广东小天才科技有限公司 Human-computer interaction learning method and device, and user terminal
CN106911900A (en) * 2017-04-06 2017-06-30 腾讯科技(深圳)有限公司 Video dubbing method and device
CN107071512A (en) * 2017-01-16 2017-08-18 腾讯科技(深圳)有限公司 Dubbing method, apparatus and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103179450A (en) * 2011-12-23 2013-06-26 腾讯科技(深圳)有限公司 Method, device and system for playing video, and audio track server
US9467750B2 (en) * 2013-05-31 2016-10-11 Adobe Systems Incorporated Placing unobtrusive overlays in video content
CN105828220A (en) * 2016-03-23 2016-08-03 乐视网信息技术(北京)股份有限公司 Method and device of adding audio file in video file
CN106331749B (en) * 2016-08-31 2020-04-24 北京贝塔科技股份有限公司 Video request method and system
CN106792013A (en) * 2016-11-29 2017-05-31 青岛海尔多媒体有限公司 A kind of method, the TV interactive for television broadcast sounds

Also Published As

Publication number Publication date
CN107071512A (en) 2017-08-18
CN107071512B (en) 2019-06-25

Similar Documents

Publication Publication Date Title
WO2018130173A1 (en) Dubbing method, terminal device, server and storage medium
US9799375B2 (en) Method and device for adjusting playback progress of video file
US9329692B2 (en) Actionable content displayed on a touch screen
US8302010B2 (en) Transcript editor
JP5174675B2 (en) Interactive TV without trigger
WO2018227761A1 (en) Correction device for recorded and broadcasted data for teaching
WO2016119370A1 (en) Method and device for implementing sound recording, and mobile terminal
US9055193B2 (en) System and method of a remote conference
US20180308524A1 (en) System and method for preparing and capturing a video file embedded with an image file
CN109147791A (en) Shorthand system and method
US10062130B2 (en) Generating authenticated instruments for oral agreements
WO2019047878A1 (en) Method for controlling terminal by voice, terminal, server and storage medium
CN105162839A (en) Data processing method, data processing device and data processing system
JP2019512144A (en) Real-time content editing using limited dialogue function
CN112261416A (en) Cloud-based video processing method and device, storage medium and electronic equipment
TW201624272A (en) Recording and playing script system and method
CN111161710A (en) Simultaneous interpretation method and device, electronic equipment and storage medium
WO2020010817A1 (en) Video processing method and device, and terminal and storage medium
CN104079948B (en) Method and device for generating ring signal file
WO2020007083A1 (en) Method and apparatus for processing information associated with video, electronic device, and storage medium
US9049416B2 (en) System and method for constructing scene clip, and record medium thereof
WO2019227431A1 (en) Template sharing method used for generating multimedia content, apparatus and terminal device
CN113422998A (en) Method, device, equipment and storage medium for generating short video and note content
CN109782997B (en) Data processing method, device and storage medium
US20200097528A1 (en) Method and Device for Quickly Inserting Text of Speech Carrier

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18739442

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18739442

Country of ref document: EP

Kind code of ref document: A1