WO2022179530A1 - Video dubbing method, related device, and computer-readable storage medium - Google Patents

Video dubbing method, related device, and computer-readable storage medium

Info

Publication number: WO2022179530A1
Application number: PCT/CN2022/077496
Authority: WIPO (PCT)
Prior art keywords: terminal, dubbing, video, interface, input operation
Other languages: English (en), French (fr)
Inventors: 马玉, 王卫星, 梅浩
Original Assignee: 花瓣云科技有限公司
Application filed by 花瓣云科技有限公司
Priority: EP22758894.4A, published as EP4284005A1

Classifications

    • H04N 21/8173: End-user applications, e.g. Web browser, game
    • H04N 21/4312: Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • G06F 16/735: Querying video data, filtering based on additional data, e.g. user or group profiles
    • G06F 16/738: Querying video data, presentation of query results
    • G06F 16/7844: Retrieval of video data using metadata automatically derived from the content, using original textual content or text extracted from visual content or transcript of audio data
    • G06F 3/0481: GUI interaction techniques based on specific properties of the displayed interaction object or a metaphor-based environment
    • G06F 3/04817: GUI interaction techniques using icons
    • G06F 3/0482: Interaction with lists of selectable items, e.g. menus
    • G06F 3/0484: GUI interaction techniques for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/04847: Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
    • G06F 3/0488: GUI interaction using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • H04N 21/278: Content descriptor database or directory service for end-user access
    • H04N 21/42203: Input-only peripherals, sound input device, e.g. microphone
    • H04N 21/432: Content retrieval operation from a local storage medium, e.g. hard-disk
    • H04N 21/439: Processing of audio elementary streams
    • H04N 21/44016: Processing of video elementary streams involving splicing one content stream with another, e.g. for substituting a video clip
    • H04N 21/4438: Window management, e.g. event handling following interaction with the user interface
    • H04N 21/47217: End-user interface for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H04N 21/4788: Supplemental services communicating with other users, e.g. chatting
    • H04N 21/4828: End-user interface for program selection for searching program descriptors
    • H04N 21/4884: Data services for displaying subtitles

Definitions

  • the present application relates to the technical field of video dubbing, and in particular, to a video dubbing method, a related device, and a computer-readable storage medium.
  • The first type lets the user select one of the app's built-in voices, convert text to speech, and automatically mix the synthesized voice with background music. It is suitable for scenarios such as short-video self-media dubbing, advertising and promotional dubbing, corporate publicity, commentary dubbing, and audio reading; such apps include "micro-dubbing" and the like.
  • The second type lets users dub video material they make themselves or existing video material on the app, which is more entertaining and is similar to moving the immersive voice-acting experience online; such apps include "Dubbing Show" and the like.
  • However, current online video dubbing apps support dubbing only of uploaded material, which limits the sources of dubbing material.
  • Embodiments of the present application provide a video dubbing method, a related device, and a computer-readable storage medium, which solve the problem of limited sources of dubbing materials.
  • In a first aspect, an embodiment of the present application provides a video dubbing method, including: after detecting an operation instruction for capturing and dubbing the currently displayed video, a first terminal captures the currently displayed video, obtains a captured video clip, and displays a video dubbing control; after detecting a trigger operation on the video dubbing control, the first terminal creates and displays a dubbing room for the captured video clip; when the first terminal determines through analysis that the number of dubbing roles in the captured video clip is not 0, and after detecting a first input operation on the dubbing room, the first terminal displays a dubbing interface, where the dubbing interface includes a first display frame used to display and play the dubbing material.
  • In this way, the first terminal can capture video directly in the video application to obtain dubbing material, making the sources of dubbing material more extensive and solving the problem of limited sources of dubbing material.
  • In a possible implementation, the method further includes: the first terminal sends a request message to the network device, where the request message includes the original video ID of the captured video clip, the start time of the clip, and the end time of the clip; the first terminal receives a first response sent by the network device, where the first response includes information on the number of dubbing roles; and the first terminal performs a first operation based on the dubbing role information.
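The request message and the first response are small structured payloads. The following is a minimal, non-authoritative Python sketch of what they might carry, based only on the fields named above; the field names, types, and example values are illustrative assumptions, not part of the application.

```python
from dataclasses import dataclass

@dataclass
class DubbingRequest:
    """Request from the first terminal to the network device (fields per the
    description above: original video ID plus the clip's start/end times)."""
    original_video_id: str
    clip_start_s: float  # start time of the captured clip, in seconds
    clip_end_s: float    # end time of the captured clip, in seconds

@dataclass
class DubbingResponse:
    """First response from the network device."""
    num_dubbing_roles: int  # number of dubbable roles found in the clip

# Example: a clip captured from 04:15 to 20:39 of some original video
req = DubbingRequest("video-123", clip_start_s=4 * 60 + 15, clip_end_s=20 * 60 + 39)
resp = DubbingResponse(num_dubbing_roles=2)  # drives the "first operation"
```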
  • In this solution, after the first terminal captures the video clip, it can directly create a dubbing room to obtain the dubbing material, with no need for cumbersome operations such as uploading a video, adding dubbing characters, editing character subtitles, adding background music, or adding tags. This reduces the complexity of preparing dubbing material for the user and thus improves the user's dubbing experience.
  • the first terminal performs a first operation based on the dubbing role information, including:
  • if the number of dubbing roles is 0, the first terminal displays first prompt information, where the first prompt information indicates that the captured video clip is unavailable; if the number of dubbing roles is greater than 1, after detecting a second input operation on the dubbing room, the first terminal sends a first instruction to a second terminal, where the first instruction instructs the video application account of the second terminal to access the dubbing room; when the video application account of the second terminal accesses the dubbing room, the first terminal assigns dubbing roles and generates first information, where the first information indicates the correspondence between the video application accounts of the terminals in the dubbing room and the dubbing roles; the first terminal sends the first information to the network device; and the first terminal receives the dubbing material sent by the network device, where the dubbing material is obtained by the network device based on the first information.
  • In this solution, the first terminal invites other users to enter the dubbing room for dubbing, thereby realizing multi-user real-time dubbing and improving the user's dubbing experience.
  • In a possible implementation, the first terminal assigning dubbing roles and generating the first information includes: the first terminal binds the dubbing roles to the video application accounts of the terminals in the dubbing room; the first terminal generates the first information, where the first information indicates the correspondence between the video application accounts of the terminals in the dubbing room and the dubbing roles; and the first terminal sends a notification message to the second terminal, where the notification message indicates the dubbing role assigned to the video application account of the second terminal.
  • In another possible implementation, the first terminal assigning dubbing roles and generating the first information includes: the first terminal sends a second instruction to the second terminal, where the second instruction instructs the video application account of the second terminal to select a dubbing role; the first terminal receives a confirmation message sent by the second terminal, where the confirmation message indicates the dubbing role selected by the second terminal; and the first terminal generates the first information based on the confirmation message, where the first information indicates the correspondence between the video application accounts of the terminals accessing the dubbing room and the dubbing roles.
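Both implementations produce the same "first information": a mapping from the video application accounts in the dubbing room to their dubbing roles. A hedged sketch of that bookkeeping follows; the room ID, account names, and role names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DubbingRoom:
    room_id: str
    # "first information": video application account -> assigned dubbing role
    role_assignments: dict[str, str] = field(default_factory=dict)

    def bind_role(self, account: str, role: str) -> None:
        """Bind a dubbing role to an account in the room (first implementation),
        or record a role the second terminal confirmed (second implementation)."""
        if role in self.role_assignments.values():
            raise ValueError(f"role {role!r} is already assigned")
        self.role_assignments[account] = role

room = DubbingRoom(room_id="8f3a21")
room.bind_role("account_of_first_terminal", "Role A")
room.bind_role("account_of_second_terminal", "Role B")
# The first terminal would then send room.role_assignments to the network device.
```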
  • In a possible implementation, the method further includes: after the first terminal detects a third input operation on the dubbing interface, the dubbing mode is suspended, where the dubbing mode is that the first terminal collects external audio in real time as dubbing audio while playing the dubbing material in the first display frame; when the dubbing mode is suspended, if the first terminal detects a fourth input operation on the dubbing interface, the first terminal sends a third instruction to the second terminal, where the third instruction instructs the video application account of the second terminal to enter the voice call mode.
  • In this solution, the first terminal enables the voice call mode so that the users of the terminals in the dubbing room can talk in real time, which improves the interactivity between users and thus the user's dubbing experience.
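The third and fourth input operations effectively drive a small session state machine: dubbing (collecting external audio while the material plays), paused, and voice call. Below is a minimal sketch of that state logic; the state and event names are invented for illustration and are not part of the application.

```python
from enum import Enum, auto

class SessionState(Enum):
    DUBBING = auto()     # collecting external audio, playing the dubbing material
    PAUSED = auto()      # dubbing mode suspended (third input operation)
    VOICE_CALL = auto()  # real-time talk between users (fourth input operation)

def on_input(state: SessionState, event: str) -> SessionState:
    """Transitions implied by the description above; unknown events are ignored."""
    if state is SessionState.DUBBING and event == "third_input":
        return SessionState.PAUSED
    if state is SessionState.PAUSED and event == "fourth_input":
        # Here the first terminal would send the third instruction so that the
        # second terminal's video application account enters voice call mode.
        return SessionState.VOICE_CALL
    return state

state = SessionState.DUBBING
state = on_input(state, "third_input")   # -> PAUSED
state = on_input(state, "fourth_input")  # -> VOICE_CALL
```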
  • In a possible implementation, the method further includes: after the first terminal detects the third input operation on the dubbing interface, the dubbing mode is suspended; when the dubbing mode is suspended, if the first terminal detects a fifth input operation on the dubbing interface, the first terminal displays a playback interface, where the playback interface includes a second display frame; after the first terminal detects a sixth input operation on the playback interface, the first terminal plays back the first video clip in the second display frame and plays back the external audio collected in real time by the first terminal and the second terminal in the dubbing mode, where the first video clip is the dubbed video clip in the dubbing material.
  • In this solution, the first terminal can play back the dubbed video so that the user can preview the dubbing effect in advance and adjust the subsequent dubbing strategy based on it.
  • In a possible implementation, the method further includes: after the first terminal detects a seventh input operation on the dubbing interface, the first terminal displays a preview interface, where the preview interface includes a third display frame used to display a second video clip, and the second video clip is the dubbed video clip in the dubbing material; after the first terminal detects an eighth input operation on the preview interface, the first terminal displays a cut interface, where the cut interface includes a fourth display frame used to display the cut second video clip; after the first terminal detects a ninth input operation on the cut interface, the first terminal cuts the second video clip together with the external audio collected in real time by the first terminal and the second terminal in the dubbing mode.
  • In this solution, the user can cut the dubbed video to obtain a personalized dubbing work that meets their needs, which improves the user's dubbing experience.
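The essential constraint when cutting is that the dubbed video and every recorded audio track are trimmed over the same time range so they stay in sync. A sketch under that assumption follows; the track names and in-memory representation are hypothetical, and real cutting would operate on media files.

```python
from dataclasses import dataclass

@dataclass
class TrackCut:
    name: str
    start_s: float  # cut start, relative to the clip
    end_s: float    # cut end, relative to the clip

def cut_dubbed_clip(clip_duration_s: float, cut_start_s: float, cut_end_s: float,
                    track_names: list[str]) -> list[TrackCut]:
    """Apply one cut range to the video and all real-time-recorded audio tracks."""
    if not (0 <= cut_start_s < cut_end_s <= clip_duration_s):
        raise ValueError("cut range must lie within the dubbed clip")
    return [TrackCut(name, cut_start_s, cut_end_s) for name in track_names]

# Keep 10 s..42 s of the second video clip plus both terminals' recorded audio
cuts = cut_dubbed_clip(60.0, 10.0, 42.0,
                       ["video", "audio_first_terminal", "audio_second_terminal"])
```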
  • In a second aspect, an embodiment of the present application provides a video dubbing method, including: a network device receives a request message sent by a first terminal, where the request message includes the original video ID of a captured video clip, the start time of the clip, and the end time of the clip; the network device finds the original video of the captured clip in a video resource library based on the original video ID; the network device obtains the playback position of the captured clip in the original video based on the start time and the end time of the clip; and the network device analyzes the dubbable roles in the captured video based on its playback position in the original video.
  • In this solution, when the number of dubbing roles is greater than 1, the first terminal can invite multiple users to dub online, thereby realizing real-time multi-user dubbing and improving the user's dubbing experience.
  • In a possible implementation, the method further includes: the network device receives the first information sent by the first terminal, where the first information indicates the correspondence between the video application accounts of the terminals in the dubbing room and the dubbing roles; the network device captures the video clip corresponding to the playback position of the captured clip in its original video to obtain a captured video clip; the network device mutes the assigned dubbing roles in the captured video clip based on the first information to obtain the dubbing material; and the network device sends the dubbing material to the first terminal.
  • In this solution, the network device sends the dubbing material to the first terminal, so that after the first terminal captures the video clip, it can directly create a dubbing room to obtain the dubbing material, with no need for cumbersome operations such as uploading a video, adding dubbing characters, editing character subtitles, adding background music, or adding tags. This reduces the complexity of preparing dubbing material for the user and thus improves the user's dubbing experience.
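On the server side, the two steps described above are (a) locating the captured clip inside the original video and analyzing which roles speak within it, and (b) muting the dialogue of every role that the first information assigns to an account. A non-authoritative sketch follows, assuming the video resource library stores a per-role dialogue timeline; the library layout and all names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class DialogueLine:
    role: str
    start_s: float
    end_s: float

# Hypothetical video resource library: original video ID -> dialogue timeline
VIDEO_LIBRARY: dict[str, list[DialogueLine]] = {
    "video-123": [
        DialogueLine("Role A", 260.0, 265.5),
        DialogueLine("Role B", 266.0, 270.0),
        DialogueLine("Role A", 900.0, 905.0),
    ],
}

def _in_clip(line: DialogueLine, start_s: float, end_s: float) -> bool:
    return line.start_s < end_s and line.end_s > start_s  # overlap test

def dubbable_roles(video_id: str, start_s: float, end_s: float) -> set[str]:
    """Roles with dialogue inside the clip's playback position in the original."""
    return {l.role for l in VIDEO_LIBRARY[video_id] if _in_clip(l, start_s, end_s)}

def spans_to_mute(video_id: str, start_s: float, end_s: float,
                  role_assignments: dict[str, str]) -> list[DialogueLine]:
    """Dialogue spans to silence: lines of roles assigned in the first information."""
    assigned = set(role_assignments.values())
    return [l for l in VIDEO_LIBRARY[video_id]
            if l.role in assigned and _in_clip(l, start_s, end_s)]

roles = dubbable_roles("video-123", 255.0, 1239.0)  # -> {"Role A", "Role B"}
mute = spans_to_mute("video-123", 255.0, 1239.0,
                     {"acct1": "Role A", "acct2": "Role B"})
```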
  • In a third aspect, an embodiment of the present application provides a video dubbing method, including: a second terminal receives a first instruction sent by a first terminal, where the first instruction instructs the video application account of the second terminal to access the dubbing room created by the first terminal; and in response to the first instruction, the second terminal connects its video application account to the dubbing room created by the first terminal.
  • In a possible implementation, the method further includes: the second terminal receives a notification message sent by the first terminal, where the notification message indicates the dubbing role assigned to the video application account of the second terminal.
  • In a possible implementation, the method further includes: the second terminal receives a second instruction sent by the first terminal, where the second instruction instructs the video application account of the second terminal to select a dubbing role; and the second terminal sends a confirmation message to the first terminal, where the confirmation message indicates the dubbing role selected by the video application account of the second terminal.
  • In a possible implementation, the method further includes: the second terminal receives a third instruction sent by the first terminal, where the third instruction instructs the video application account of the second terminal to enter the voice call mode; and in response to the third instruction, the second terminal causes its video application account to enter the voice call mode.
  • In this solution, the second terminal enters the voice call mode, so that the users of the terminals in the dubbing room can talk in real time, which improves the interactivity between users and thus the user's dubbing experience.
  • an embodiment of the present application provides a terminal, where the terminal may be the first terminal in the foregoing first aspect, including: a memory, a processor, and a touch screen;
  • the memory is configured to store a computer program, and the computer program includes program instructions;
  • the processor is configured to invoke the program instructions so that the terminal performs the following steps: after detecting an operation instruction for capturing and dubbing the currently displayed video, capturing the currently displayed video, obtaining a captured video clip, and instructing the touch screen to display a video dubbing control; after detecting a trigger operation on the video dubbing control, creating a dubbing room for the captured video clip and instructing the touch screen to display it; and when analysis shows that the number of dubbing roles in the captured video clip is not 0, after detecting a first input operation on the dubbing room, instructing the touch screen to display the dubbing interface.
  • In a possible implementation, creating a dubbing room for the captured video clip and instructing the touch screen to display it includes: sending a request message to a network device through a communication module; receiving, through the communication module, a first response sent by the network device; and performing a first operation based on the dubbing role information.
  • In a possible implementation, the processor performing the first operation based on the dubbing role information includes: if the number of dubbing roles is 0, instructing the touch screen to display the first prompt information; if the number of dubbing roles is greater than 1, after detecting a second input operation on the dubbing room, sending a first instruction to the second terminal through the communication module; when the video application account of the second terminal accesses the dubbing room, assigning dubbing roles and generating the first information; sending the first information to the network device through the communication module; and receiving, through the communication module, the dubbing material sent by the network device.
  • In a possible implementation, the processor assigning dubbing roles and generating the first information includes: binding the dubbing roles to the video application accounts of the terminals in the dubbing room; generating the first information; and sending a notification message to the second terminal through the communication module.
  • In another possible implementation, the processor assigning dubbing roles and generating the first information includes: sending a second instruction to the second terminal through the communication module; receiving, through the communication module, a confirmation message sent by the second terminal; and generating the first information based on the confirmation message.
  • In a possible implementation, the terminal further performs the following steps: after detecting a third input operation on the dubbing interface, suspending the dubbing mode; when the dubbing mode is suspended, if a fourth input operation on the dubbing interface is detected, sending the third instruction to the second terminal through the communication module, where the third instruction instructs the video application account of the second terminal to enter the voice call mode.
  • In a possible implementation, the terminal further performs the following steps: after detecting the third input operation on the dubbing interface, suspending the dubbing mode; when the dubbing mode is suspended, if a fifth input operation on the dubbing interface is detected, instructing the touch screen to display the playback interface; and after a sixth input operation on the playback interface is detected, instructing the touch screen to play back the first video clip in the second display frame and to play the external audio collected in real time by the first terminal and the second terminal in the dubbing mode.
  • In a possible implementation, the terminal further performs the following steps: after detecting a seventh input operation on the dubbing interface, instructing the touch screen to display the preview interface; after detecting an eighth input operation on the preview interface, instructing the touch screen to display the cut interface; and after detecting a ninth input operation on the cut interface, cutting the second video clip together with the external audio collected in real time by the first terminal and the second terminal in the dubbing mode.
  • an embodiment of the present application provides a network device, where the network device may be the network device in the second aspect, including: a memory, a processor, and a communication module;
  • the memory is configured to store a computer program, and the computer program includes program instructions;
  • the processor is configured to invoke the program instructions so that the network device performs the following steps: receiving, through a communication module, a request message sent by the first terminal; finding the original video of the captured video clip in a video resource library based on the original video ID of the clip; obtaining the playback position of the captured clip in the original video based on the start time and the end time of the clip; analyzing the dubbable roles in the captured video based on its playback position in the original video and obtaining information on the number of dubbing roles; generating a first response based on the information on the number of dubbing roles; and sending the first response to the first terminal through the communication module.
  • In a possible implementation, the network device further performs the following steps: receiving, through the communication module, the first information sent by the first terminal; capturing the video clip corresponding to the playback position of the captured clip in its original video; muting the assigned dubbing roles in the captured video clip based on the first information to obtain the dubbing material; and sending the dubbing material to the first terminal through the communication module.
  • an embodiment of the present application provides a terminal, where the terminal may be the second terminal in the third aspect, including: a memory, a processor, a communication module, and a touch screen;
  • the memory is configured to store a computer program, and the computer program includes program instructions;
  • the processor is configured to invoke the program instructions so that the terminal performs the following steps: receiving, through a communication module, a first instruction sent by the first terminal; and in response to the first instruction, connecting its video application account to the dubbing room created by the first terminal.
  • the method includes: receiving a notification message sent by the first terminal through a communication module.
  • In a possible implementation, the terminal further performs the following steps: receiving, through the communication module, a second instruction sent by the first terminal; and sending a confirmation message to the first terminal through the communication module.
  • In a possible implementation, the terminal further performs the following steps: receiving, through the communication module, the third instruction sent by the first terminal; and in response to the third instruction, causing its video application account to enter the voice call mode.
  • the present application provides a terminal, where the terminal may be the first terminal in the above-mentioned first aspect, and includes: one or more functional modules.
  • One or more functional modules are used to perform the video dubbing method in any possible implementation manner of the first aspect.
  • the present application provides a network device, where the network device may be the network device in the foregoing second aspect, including: one or more functional modules.
  • One or more functional modules are used to perform the method for video dubbing in any possible implementation manner of the second aspect.
  • the present application provides a terminal, where the terminal may be the second terminal in the above-mentioned third aspect, including: one or more functional modules.
  • One or more functional modules are used to perform the method for video dubbing in any possible implementation manner of the third aspect.
  • Further, an embodiment of the present application provides a computer storage medium, including computer instructions, which, when run on an electronic device, cause the electronic device to perform the video dubbing method in any possible implementation of any of the foregoing aspects.
  • Further, an embodiment of the present application provides a computer program product, which, when run on a computer, causes the computer to perform the video dubbing method in any possible implementation of any of the foregoing aspects.
  • FIG. 1A is a diagram of a main interface of a terminal provided by an embodiment of the present application.
  • FIG. 1B to FIG. 1D are interface diagrams of a video application provided by an embodiment of the present application.
  • FIG. 1E to FIG. 1G are interface diagrams of capturing a video in a video application provided by an embodiment of the present application.
  • FIG. 2A to FIG. 2E are interface diagrams of a dubbing room provided by an embodiment of the present application.
  • FIG. 3A to FIG. 3D are dubbing interface diagrams of a first terminal provided by an embodiment of the present application.
  • FIG. 3E is a diagram of a dubbing playback interface provided by an embodiment of the present application.
  • FIG. 3F is a dubbing interface diagram of another first terminal provided by an embodiment of the present application.
  • FIG. 4 is a dubbing interface diagram of a second terminal provided by an embodiment of the present application.
  • FIG. 5A to FIG. 5C are preview interface diagrams of a first terminal provided by an embodiment of the present application.
  • FIG. 6 is a diagram of a dubbing personal home page interface provided by an embodiment of the present application.
  • FIG. 7 is a flowchart of a video dubbing method provided by an embodiment of the present application.
  • FIG. 9 is a flowchart of communication between a first terminal and a network device provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a first terminal provided by an embodiment of the present application.
  • FIG. 11 is a block diagram of a software structure of a first terminal provided by an embodiment of the present application.
  • The terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more.
  • the first terminal and the second terminal in the embodiments of the present application may be communication devices such as a smart phone, a tablet computer, and a smart watch, and the network device may be a server such as a video server.
  • the first terminal may be terminal devices such as a smartphone, a tablet computer, and a notebook computer, and the embodiment of the present application takes a smartphone as the first terminal 100 as an example for illustration.
  • When the first terminal 100 plays a video using the video playback software, it receives a user operation to capture the video. After the video is successfully captured, the first terminal 100 sends the captured and saved video clip to the network device; the network device parses and processes the captured video clip, generates a dubbing clip, and sends it to the video client of the first terminal 100 to provide the user with a video dubbing service.
  • In this way, the dubbing material of the first terminal 100 may be clips captured in the video playback software, so the sources of the dubbing material are more extensive.
  • The first terminal 100 may display a home screen interface 110, which displays a page on which application icons are placed, and the page includes a plurality of application icons (for example, a settings application icon, a music application icon, a memo application icon, a cloud sharing application icon, a video application icon 111, etc.).
  • A page indicator is also displayed below the multiple application icons to indicate the positional relationship between the currently displayed page and other pages. Below the page indicator are multiple tray icons (e.g., a dialer application icon, a message application icon, a contact application icon, and a camera application icon), and the tray icons remain displayed when switching pages.
  • In some embodiments, the above page may also include multiple application icons and a page indicator; the page indicator may not be part of the page but exist independently; and the above icons are also optional, which is not limited in this embodiment of the present application.
  • The first terminal 100 may receive a user input operation (e.g., a single click) on the video application icon 111, and in response, the first terminal 100 may display the main video interface 120 shown in FIG. 1B.
  • The main video interface 120 includes a video search box 121 and a recommended video display box 122. The video search box 121 is used for the first terminal 100 to detect an external input operation (such as entering characters) and to search the video resource library for the video corresponding to the characters in the video search box 121; the recommended video display box 122 includes a recommended video display page, which is used to display recommended videos to the user.
  • Recommended videos may be pushed based on the type of video the user watches, or determined based on video playback volume, which is not limited in this embodiment of the present application; the recommended video display page includes a page indicator to indicate the positional relationship between the current recommended video display page and other recommended video display pages.
  • the main video interface 120 further includes a personalized video display area 123, and the personalized video display area 123 is used for the first terminal 100 to display videos that meet the user's viewing needs to the user based on the user's video historical viewing data and big data.
  • The personalized video display area 123 includes a "Guess You Like" control 1231, a pushed-video display box 1232, and a pushed-video name icon 1233.
  • The main video interface 120 also includes tray application icons such as a home page icon 124, a member application icon, a dubbing application icon 125, and a personal application icon.
  • When the first terminal 100 detects a user input operation (e.g., a click) on a tray application icon, the first terminal 100 displays the corresponding home interface in response. For example, when the first terminal 100 detects a user click on the home page icon 124, the first terminal 100 displays the main video interface 120 in response; when the first terminal 100 detects an input operation (e.g., a click) on the dubbing application icon 125, the first terminal 100 displays the dubbing personal home page shown in FIG. 6.
  • In some embodiments, the above main video interface may include a plurality of tray application icons, and the tray application icons are optional; the controls and icons are also optional, which is not limited in this embodiment of the present application.
  • After the first terminal 100 detects a user input operation on the video search box 121, the first terminal 100 displays a keyboard input box in the main video interface 120 in response; the first terminal 100 can detect a user input operation (e.g., a click) on the keyboard input box and, in response, display the characters output by the user through the keyboard input box in the video search box; after the first terminal 100 detects a user input operation such as a click on the search icon 129, the first terminal 100 searches the video resource library for the target video entered by the user in response, and displays the search result interface 140 shown in FIG. 1D.
  • Alternatively, the first terminal 100 may receive an input operation such as a click performed by the user on the video search box 121 shown in FIG. 1B, and in response, the first terminal 100 may display the search video interface 130 shown in FIG. 1C.
  • The search video interface 130 includes a search box 131 and a search control 132; the user's search records are displayed below the search box 131.
  • When the first terminal 100 detects an input operation such as a user click on the search box 131, the first terminal 100 displays a keyboard input box 133 in the search video interface 130 in response.
  • When the first terminal 100 detects an input operation such as a click on the keyboard input box 133, the first terminal 100 displays the characters entered by the user through the keyboard input box 133 in the search box 131 in response; after detecting a user input operation on the search control 132 or a click on the confirmation control 1331 in the keyboard input box 133, the first terminal 100 searches the video resource library for the target video corresponding to the characters in the search box 131 in response, and displays the search result interface 140 shown in FIG. 1D. In some embodiments, the search history display area in the video search interface is optional, which is not limited in this embodiment of the present application.
  • the search result interface 140 includes a video display area 141 , a search box 142 , a search control 143 , a return control 144 , a play control 145 , a download control 146 and a plurality of selection controls 147 .
  • The video display area 141 includes a first display area 1411 and a second display area 1412. The first display area is used to display the cover of the searched target video, and the second display area is used to display the information of the searched target video (such as the video name, video type, and video actors).
  • When the first terminal 100 detects an input operation on the return control 144, the first terminal 100 may display the previous display interface (e.g., the video search interface 130 or the main video interface 120), or may display the interface 110, which is not limited in this embodiment of the present application.
  • When the first terminal 100 detects an input operation such as a click on the search box 142, the first terminal 100 responds to the input operation with reference to the above-described embodiment in which the first terminal 100 responds to the user's input operation on the video search box 121, which is not repeated in this embodiment of the present application.
  • When the first terminal 100 detects an input operation such as a click on the playback control 145, the first terminal 100 starts playing the target video from the beginning in response; alternatively, when the first terminal 100 detects such an input operation, it starts playing the target video, based on historical data, from the moment at which the user last stopped watching it.
  • If the first terminal 100 detects a user input operation on the download control 146 and then detects an input operation such as a click on a selection control 147, the first terminal 100 downloads the video selected by the user in response.
  • If the first terminal 100 detects an input operation such as a click on a selection control 147 without having detected an input operation on the download control 146, the first terminal 100 plays the episode corresponding to that selection control 147.
  • the embodiment of the present application uses the first terminal 100 to respond to the user's input operation on the playback control 145 as an example for illustration.
  • When the first terminal 100 detects an input operation such as a click on the playback control 145, playback starts from the beginning, and the target video is displayed on the video playback interface 150 shown in FIG. 1E.
  • The video playback interface 150 is used to display the video being played, and includes a lock screen control 151, a play/pause control 152, a video capture control 153, a time progress bar 154, and a progress adjustment control 155.
  • the left side of the time progress bar 154 can display the duration of the current video playback, and the right side of the time progress bar 154 can display the total duration of the currently played video.
  • When playback reaches the end of the time progress bar 154, the video playback ends. When the first terminal 100 detects a user input operation on the progress adjustment control 155 (for example, a left or right slide), the first terminal 100 starts playing the video from the time corresponding to the position of the progress adjustment control 155 on the time progress bar 154. When the video is playing and the first terminal 100 detects an input operation such as a user click on the play/pause control 152, the first terminal 100 pauses the video in response and sets the function of the play/pause control 152 to play (that is, when the first terminal 100 detects an input operation on the play/pause control 152 again, the first terminal 100 plays the video). When the video is paused and the first terminal 100 detects an input operation such as a user click on the play/pause control 152, the first terminal 100 plays the video in response and sets the function of the play/pause control 152 to pause.
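The play/pause control described here is a toggle whose function flips on each input, and the progress adjustment control maps a position on the time progress bar to a playback time. A small sketch of that control logic follows; the class and method names are illustrative assumptions.

```python
class VideoPlayer:
    """Sketch of the play/pause toggle and seek behavior described above."""

    def __init__(self, total_s: float):
        self.total_s = total_s
        self.position_s = 0.0
        self.playing = False

    def toggle_play_pause(self) -> None:
        # Playing -> the control pauses; paused -> the same control plays,
        # i.e. the control's function flips with each input operation.
        self.playing = not self.playing

    def seek(self, position_s: float) -> None:
        # Dragging the progress adjustment control resumes playback from the
        # time corresponding to its position on the time progress bar.
        self.position_s = max(0.0, min(position_s, self.total_s))

player = VideoPlayer(total_s=45 * 60)
player.toggle_play_pause()  # play
player.seek(4 * 60 + 15)    # jump to 04:15
player.toggle_play_pause()  # pause
```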
  • The video capture control 153 can be used by the first terminal 100 to capture a clip of the currently playing video. When the first terminal 100 detects an input operation such as a user click on the video capture control 153, the first terminal 100 can display the video clip capture interface 160 shown in FIG. 1F.
  • In some embodiments, the video playback interface 150 has other controls, all of which are optional; the embodiments of the present application are merely illustrative and not limiting.
  • the video clip capture interface 160 includes a time progress bar 161 , a first capture control 162 and a second capture control 163 .
  • When the first terminal 100 detects a user input operation on the first capture control 162 (for example, a left or right slide), the first terminal 100 displays the video frame at the time corresponding to the position of the first capture control 162 on the time progress bar 161, and the left side of the time progress bar 161 displays that time.
  • Likewise, when the first terminal 100 detects a user input operation on the second capture control 163 (for example, a left or right slide), the first terminal 100 displays the video frame at the time corresponding to the position of the second capture control 163 on the time progress bar 161, and the left side of the time progress bar 161 displays that time.
  • When the first terminal 100 detects a user input operation (for example, a single click) on the confirm control 164, the first terminal 100 cuts the original video using the time corresponding to the position of the first capture control 162 on the time progress bar 161 (04:15 in FIG. 1F) as the start cut point and the time corresponding to the position of the second capture control 163 (20:39 in FIG. 1F) as the end cut point, obtaining the captured video clip.
  • In some embodiments, the positions displayed by the first capture control and the second capture control on the time progress bar of the video clip capture interface are variable, which is not limited in this embodiment of the present application.
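Capturing amounts to converting the two controls' "MM:SS" positions into a start/end range of the original video. A sketch follows; the function names are illustrative, and the timestamps are the ones shown in FIG. 1F.

```python
def to_seconds(timestamp: str) -> int:
    """Parse an 'MM:SS' position on the time progress bar into seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)

def capture_range(first_control: str, second_control: str) -> tuple[int, int]:
    """Start and end cut points from the first and second capture controls."""
    start_s, end_s = to_seconds(first_control), to_seconds(second_control)
    if start_s >= end_s:
        raise ValueError("start cut point must precede end cut point")
    return start_s, end_s

start_s, end_s = capture_range("04:15", "20:39")  # -> (255, 1239)
```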
  • After the video clip is captured, the first terminal 100 displays the dubbing selection interface 170 shown in FIG. 1G.
  • The dubbing selection interface 170 includes a captured video clip display area 171 and a plurality of application icons (e.g., a share-with-friends application icon and a dubbing application icon 172, etc.).
  • application icons in the dubbing selection interface are optional, and only two are listed in the embodiment of the present application for illustration. The selection of application icons is not limited in the embodiment of the present application.
  • The captured video clip display area 171 is used to display the cover of the captured video clip. The first terminal 100 can detect a user input operation (for example, a click) on the dubbing application icon 172; when this input operation is detected, the first terminal 100 creates a dubbing room in response and displays the first interface 210 of the dubbing room shown in FIG. 2A.
  • the first interface 210 of the dubbing room includes a dubbing room ID icon 211 , a return control 212 , a display area 213 and an information display box 214 .
  • the ID icon 211 of the dubbing room displays the ID number of the current dubbing room, and the ID of the dubbing room is unique and is used to distinguish the current dubbing room from other dubbing rooms;
  • the display area 213 is used to display the cover of the dubbing material, and the dubbing material is the above-mentioned captured video clip;
  • the information display box 214 is used to display the information of the dubbing material (for example, the name of the dubbing character, the gender of the dubbing character, and the source of the dubbing material).
  • the first interface 210 of the dubbing room also includes a start dubbing control 215 and an invite friend control 216 .
  • When the first terminal 100 detects that the number of dubbable roles in the dubbing material is equal to one, the first terminal 100 does not respond to user input operations on the invite friend control 216. In this case, if the first terminal 100 detects an input operation on the start dubbing control 215, the first terminal 100 displays the second interface of the dubbing room in response, where the second interface of the dubbing room is a single-person dubbing interface.
• When the first terminal 100 detects that the number of dubbable characters in the dubbing material is greater than one, after the first terminal 100 detects an input operation on the invite friend control 216, the first terminal 100 displays, on the first interface 210 of the dubbing room, the invite friend information box 217 shown in FIG. 2B.
• the invite friend information box 217 includes controls such as an invite WeChat friend control 2171, an invite QQ friend control 2172, and an invite video friend control 2173.
• After the first terminal 100 detects an input operation on the invite WeChat friend control 2171 or the invite QQ friend control 2172, the first terminal 100 displays the friend list interface of the corresponding communication software such as WeChat or QQ; when the first terminal 100 detects an input operation on the friend list interface, the first terminal 100 sends a dubbing request link to the second terminal 200, where the second terminal 200 is the smartphone or tablet of the friend selected in the friend list.
• When the second terminal 200 detects an input operation (for example, a click) on the dubbing request link, the second terminal 200 detects the installed video application and opens it; after the second terminal 200 detects an input operation of logging in to the video application with a social account such as WeChat or QQ, the second terminal 200 logs in to the video application and enters the dubbing room.
• the embodiments of the present application take, as an example, the case where the first terminal 100 detects an input operation (for example, a click) on the invite video friend control 2173 and displays the friend list interface 220 shown in FIG. 2C in response to the input operation.
  • the buddy list interface 220 includes a plurality of buddy display boxes 221 , and each buddy display box includes a buddy name, a buddy icon, and a selection control 222 .
• When the first terminal 100 detects an input operation (for example, a click) on a selection control 222, the first terminal 100 selects the corresponding friend.
• the buddy list interface 220 also includes a search box 223 and a search icon 224, so that after the first terminal 100 detects an input operation on the search box 223 and the search icon 224, the first terminal 100 searches the buddy list for the user name entered in the search box 223. In FIG. 2C, after the first terminal 100 detects the input operations on the selection controls corresponding to the user Li Hua and the user Xiao Li, the first terminal 100 responds to the input operations and sends the dubbing link to the second terminals 200 of the user Li Hua and the user Xiao Li, so that the video applications in their second terminals can access the dubbing room.
• After the first terminal 100 detects an input operation on the completion control 225 in the buddy list interface 220, the first terminal 100 enters the role selection interface 230 shown in FIG. 2D.
• When the first terminal 100 detects that the video applications in the second terminals 200 of the invited friends are all connected to the dubbing room, the first terminal 100 displays the role selection interface 230, which includes a display area 231 and a role selection function area 232.
  • the display area 231 is used for displaying the cover image of the dubbing material
  • the character selection function area 232 is used for assigning the dubbing characters.
• the role selection function area 232 displays information such as the name and gender of each dubbing role, and each dubbing role corresponds to a role assignment control 233.
• When the first terminal 100 detects the user's input operation (for example, a click) on a role assignment control 233, the first terminal 100 displays a user selection box 234 on the role selection interface 230; the user selection box 234 displays the user names and avatars of the users who have entered the dubbing room, and includes a plurality of selection controls 235, each of which corresponds to a user.
• When the first terminal 100 detects an input operation (for example, a click) on a selection control 235, the first terminal 100 assigns the dubbing role to the user corresponding to that selection control and displays the role selection interface as shown in FIG. 2E. For example, in FIG. 2D, after the first terminal 100 detects an input operation on the role assignment control 233 of the character B, the first terminal 100 responds to the operation, displays the user selection box 234, and displays in the user selection box 234 the user names and avatars of the users who have entered the dubbing room; after detecting a click operation on the selection control 235 corresponding to the user Li Hua, the first terminal 100 assigns the character B to the user Li Hua in response to the click operation.
  • the first terminal 100 displays the users who have been assigned roles in the role selection function area 232, and each user who has been assigned a role corresponds to a cancel control 236 .
• When the first terminal 100 detects an input operation such as the user's click on the cancel control 236, the first terminal 100 responds to the input operation, cancels the dubbing role assigned to the user, removes the user's avatar and user name from the role selection function area, and replaces the cancel control 236 with the role assignment control 233.
• After the dubbing roles have been assigned, when the first terminal 100 detects an input operation such as a click on the start dubbing control 215, the first terminal 100 responds to the input operation and displays the dubbing interface 310 of the dubbing room shown in FIG. 3A.
  • the dubbing interface 310 includes a dubbing segment display area 311 , a subtitle display area 312 , a play/pause control 313 , a dubbing control 314 and a submission control 315 .
• When the first terminal 100 detects an input operation such as the user's click on the play/pause control 313, the first terminal 100 responds to the input operation and starts the dubbing mode, that is, the dubbing video is played in the dubbing segment display area 311, the subtitles are scrolled in the subtitle display area 312, and external audio is collected in real time; the function of the play/pause control 313 is then set to pause dubbing.
  • prompt information 316 is displayed in the subtitle display area 312, and the prompt information 316 is used to indicate the preparation time for dubbing.
• When the first terminal 100 detects an input operation such as a click on the dubbing control 314, the first terminal 100 receives and saves the user's voice in response to the input operation. When the first terminal 100 is in the dubbing mode, if an input operation such as a click on the play/pause control 313 is detected, the first terminal 100 responds to the input operation, pauses the dubbing mode, restores the function of the play/pause control 313 to starting dubbing, and displays a pause icon 317 in the dubbing segment display area 311.
• When dubbing is completed, a completion prompt box 323 as shown in FIG. 3F is displayed in the dubbing interface 310; when the user's input operation on the "Yes" control is detected, the first terminal 100 displays a preview interface 510 as shown in FIG. 5A.
• The first terminal 100 is the device that creates the dubbing room, and the electronic device of the other users invited by the first terminal 100 is the second terminal 200; the dubbing interface of the second terminal 200 is the dubbing interface 410 shown in FIG. 4.
  • the dubbing interface 410 includes a dubbing control 411; when the second terminal 200 detects an input operation such as clicking on the dubbing control 411, the second terminal 200 receives and saves the user's voice in response to the input operation.
• When the first terminal 100 detects an input operation such as a click on the play/pause control 313 or the pause icon 317, the first terminal 100 resumes the dubbing mode in the dubbing room, and the first terminal 100 and the second terminal 200 continue to collect external audio in real time.
• When the dubbing mode is paused, if the first terminal 100 detects an input operation such as a click on the more function control 318, the first terminal 100 displays the more operation function box 319 shown in FIG. 3B; the more operation function box 319 includes a voice call function control 3191 and a playback dubbing function control 3192.
• When the first terminal 100 detects an input operation such as a click on the voice call function control 3191, the first terminal 100 responds to the input operation, enters the voice call mode, and displays the voice control 320 and the exit control 321.
• When the first terminal 100 detects an input operation on the voice control 320, the first terminal 100 collects the user's voice in real time and plays it in the dubbing room in real time; when the first terminal 100 detects an input operation on the voice control 320 again, the first terminal 100 stops collecting and playing the user's voice, and no longer grants the voice permission to the user of the first terminal 100.
• In the voice call mode, when the first terminal 100 detects an input operation such as a click on the exit control 321, the first terminal 100 displays the voice mode function box 322 shown in FIG. 3D in the dubbing interface 310.
  • the voice mode function box 322 includes a “Yes” control 3221 and a “No” control 3222.
• When the first terminal 100 detects an input operation such as a click on the "Yes" control 3221, the first terminal 100 exits the voice call mode and returns to the dubbing interface 310 shown in FIG. 3A.
• When the first terminal 100 detects an input operation such as a click on the playback dubbing function control 3192, the first terminal 100 responds to the input operation and displays the dubbing playback interface 330 shown in FIG. 3E.
  • the dubbing playback interface 330 includes a dubbing work display area 331 , a time progress bar 332 , a progress drag bar 333 , a back control 334 , a forward control 335 and a play/pause control 336 .
  • the dubbing work display area 331 is used to play the dubbing video recorded by the user
• the progress drag bar 333 is used to adjust the playback progress of the dubbing work; the dubbing work display area 331 displays the image frame of the dubbing video at the moment corresponding to the position of the progress drag bar 333 on the time progress bar 332.
• When the first terminal 100 detects an input operation such as a click on the back control 334, the first terminal 100 rewinds the playback time of the dubbed video on the time progress bar 332 by a preset time period. As shown in FIG. 3E, the playback time of the dubbed video on the time progress bar is 6s; when the first terminal 100 receives the user's click operation on the back control 334, if the preset time period is 5s, the playback progress of the dubbed video becomes 1s, that is, the time displayed on the time progress bar 332 is 1s.
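• The rewind behavior described above amounts to clamping the new playback position to the bounds of the dubbed video. A minimal Kotlin sketch, assuming a 5s preset as in the example:

```kotlin
// Sketch: move the playback position by a signed delta and clamp it to the
// clip bounds; the back control uses a negative delta of the preset period.
fun seekBy(currentMs: Long, deltaMs: Long, durationMs: Long): Long =
    (currentMs + deltaMs).coerceIn(0L, durationMs)

fun main() {
    // At 6s, the back control with a 5s preset yields 1s, matching the example.
    println(seekBy(currentMs = 6_000, deltaMs = -5_000, durationMs = 12_000)) // 1000
}
```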
• In the case of playing the dubbed video, when the first terminal 100 detects an input operation such as a click on the play/pause control 336, the first terminal 100 stops playing the dubbed video recorded by the user; when paused, if the first terminal 100 detects an input operation such as a click on the play/pause control 336, the first terminal 100 plays the dubbed video recorded by the user. When the first terminal 100 detects an input operation such as a click on the return control, the first terminal 100 returns to the dubbing interface 310 shown in FIG. 3A.
• the preview interface 510 includes a dubbed video playback area 511, a progress adjustment control 512, a vocal adjustment function control 513, a video cutting function control 514, a vocal volume adjustment control 515, a background volume adjustment control 516, a re-recording control 517, and a generate work control 518.
• After the first terminal 100 displays the preview interface 510, the dubbed video playback area 511 starts to play the dubbed video. When the first terminal 100 detects an input operation (for example, sliding left/right) on the progress adjustment control 512, the first terminal 100 adjusts the playback progress of the dubbed video; similarly, when the first terminal 100 detects an input operation of sliding left/right on the vocal volume adjustment control 515, the first terminal 100 turns up/down the vocal volume in the dubbed video, and when the first terminal 100 detects an input operation of sliding left/right on the background volume adjustment control 516, the first terminal 100 turns up/down the background volume in the dubbed video. When the first terminal 100 detects an input operation such as a single click on the re-recording control 517, the first terminal 100 displays the dubbing interface 310 shown in FIG. 3A, and the users in the dubbing room dub again. When the first terminal 100 detects an input operation such as a click on the vocal adjustment function control 513, the first terminal 100 displays the vocal adjustment function box 519 shown in FIG. 5B in the preview interface 510. When the first terminal 100 detects an input operation such as a click on the generate work control 518, the first terminal 100 uploads the dubbing work and sends the dubbing work to the dubbing personal homepage of each user participating in the dubbing. When the first terminal 100 detects an input operation such as a click on the video cutting function control 514, the first terminal 100 displays the video cutting interface 530 shown in FIG. 5C.
  • the vocal adjustment function box includes a noise reduction application icon 5191, a vocal back icon 5192, and a vocal forward icon 5193;
• When the first terminal 100 detects an input operation such as a click on the noise reduction application icon 5191, the first terminal 100 reduces the volume of the "non-human voice" noise in the audio dubbed by the user (for example, the noise of the surrounding environment while the user is dubbing) to ensure the sound quality of the dubbed work; when an input operation such as a click on the vocal back icon 5192 is detected, the first terminal 100 adjusts the audio of the dubbed character so that it is delayed relative to the corresponding subtitles and image frames of the dubbed video; when an input operation such as a click on the vocal forward icon 5193 is detected, the first terminal 100 adjusts the audio of the dubbed character so that it is ahead of the corresponding subtitles and image frames of the dubbed video; when the first terminal 100 detects an input operation such as a click on the save control 5194, the first terminal 100 saves the adjustment.
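• The vocal back/forward adjustment can be pictured as shifting the timestamps of the dubbed track relative to the video frames and subtitles. A minimal Kotlin sketch; the sample type and names are illustrative assumptions:

```kotlin
// Sketch: shift the dubbed track's presentation timestamps by a signed offset.
// A positive offset delays the vocal (back icon 5192); a negative offset
// advances it (forward icon 5193). Types are illustrative assumptions.
data class AudioSample(val ptsMs: Long, val data: ByteArray)

fun shiftVocalTrack(track: List<AudioSample>, offsetMs: Long): List<AudioSample> =
    track.map { it.copy(ptsMs = it.ptsMs + offsetMs) }
        .filter { it.ptsMs >= 0 } // drop samples shifted before the clip start
```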
  • the video cutting interface 530 includes a dubbed video playing area 531 , a subtitle preview area 534 , a first cutting control 532 , and a second cutting control 533 .
• When the first terminal 100 detects an input operation (for example, sliding left/right) on the first cut control 532/second cut control 533, the first terminal 100 responds to the input operation and displays, in the dubbed video playback area 531, the image frame of the dubbed video at the corresponding time, where the corresponding time is the time corresponding to the position of the first cut control 532/second cut control 533 on the time progress bar; the first terminal 100 also displays the subtitle corresponding to that image frame in the subtitle preview area 534.
• For example, in FIG. 5C, when the first terminal 100 detects that the first cut control 532 has been slid left to 00:03 on the time progress bar, the first terminal 100 displays the image frame of the dubbed video at 00:03 in the dubbed video playback area 531, and displays the subtitle corresponding to that image frame in the subtitle preview area 534. In this manner, when the user cuts the dubbed video, the first terminal 100 shows the user the image frames and subtitles of the starting cut point (for example, 00:03 in FIG. 5C) and the end cut point (for example, 00:12 in FIG. 5C), so that the user knows whether the chosen cut points of the dubbed video are as expected.
• The first terminal 100 then cuts the dubbed video, using the moment corresponding to the first cut control 532 on the progress bar as the starting cut point and the moment corresponding to the second cut control 533 on the progress bar as the end cut point, and saves the cut dubbed video.
• After the first terminal 100 detects an input operation such as a click on the generate work control 518 of the preview interface 510, the first terminal 100 uploads the dubbing work to the dubbing user's dubbing personal homepage 610 shown in FIG. 6.
• When the first terminal 100 detects an input operation such as a click on the dubbing application icon in the main video interface 120 in FIG. 1B, the first terminal 100 displays the dubbing personal homepage 610 shown in FIG. 6.
  • the dubbing personal homepage 610 includes a user information column 611, and the user information column 611 includes a work information column 6111, a friend information column 6112, a follow information column and a favorite information column.
• When the first terminal 100 detects an input operation (for example, a single click) on the work information column 6111, the first terminal 100 displays the user's dubbing works.
  • FIG. 7 is a flowchart of a video dubbing method provided by an embodiment of the present application.
  • the first terminal intercepts a video in its video application to obtain the intercepted video segment, and dubs the intercepted video segment to obtain a dubbing work. Finally, the first terminal uploads the dubbing work to the dubbing personal homepage of the dubbing user.
  • the first terminal may be the first terminal 100 in the foregoing embodiment
  • the second terminal may be the second terminal 200 in the foregoing embodiment.
  • Step S701 After the first terminal detects a video clipping instruction for its video playing interface, the clipping interface is displayed.
• the video clipping instruction may be an input operation (such as a click) on the video clipping control 153 in the video playback interface 150 in the above-mentioned embodiment of FIG. 1E; it may also be a voice signal, for example, after the voice function module of the first terminal receives the voice command to intercept the video, it can identify and parse the audio through the internal voice recognition module and, after the parsing is completed, generate a trigger signal to trigger the first terminal to intercept the video displayed in its video playback interface. The embodiment of the present application only exemplifies the video clipping instruction, and its specific form is not limited in the embodiment of the present application.
  • the video playback interface 150 is not limited to triggering display through the process of FIGS. 1A-1D , which is not limited in this embodiment of the present application; the interception interface may be the video clip interception interface 160 in the above-mentioned embodiment of FIG. 1F .
  • the embodiments of the present application are only for illustration, and do not impose any limitations.
  • Step S702 After detecting an operation instruction for clipping and dubbing of the currently displayed video, the first terminal intercepts the currently displayed video, obtains a clipped segment of the video, and displays a video dubbing control.
• the interception and dubbing operation instruction may be an input operation (for example, a click) on the first interception control 162/second interception control 163 in the above-mentioned embodiment of FIG. 1F. For the process of intercepting the video by the first terminal, please refer to the above-mentioned embodiment of FIG. 1F, which is not repeated in this embodiment of the present application. After the interception, the video dubbing control is displayed.
  • the video dubbing control may be the dubbing application icon 172 in the above-mentioned embodiment of FIG. 1G , and the embodiment of the present application is only for illustration and not limitation.
  • Step S703 After detecting the triggering operation for the dubbing control, the first terminal creates and displays a dubbing room for the video clipping segment.
• the triggering operation may be an input operation (such as a click) on the dubbing application icon 172 in the above-mentioned embodiment of FIG. 1G, and the embodiment of the present application is only illustrative and not limiting.
  • the interface in the dubbing room may be the first interface in the dubbing room in the above-mentioned embodiment of FIG. 2A , and the embodiment of the present application is only for illustration and not limitation.
  • the process of interacting with a network device after the first terminal creates a dubbing room may be as shown in FIG. 8 ; wherein, the network device is a video server.
  • Step S801 The first terminal sends a request message to the network device.
• the request message includes the original video ID of the video clip, the start time of the video clip, the end time of the video clip, and a dubbing request; the start time of the video clip is the start playing time of the video clip in the original video, and the end time of the video clip is the end playing time of the video clip in the original video.
  • Step S802 The network device generates a first response based on the request message.
• Specifically, the network device finds the original video in the video resource library through the ID of the original video, then finds the playback position of the video clip in the original video (the playback time period corresponding to the video clip in the original video) based on the start playing time and the end playing time of the video clip in the original video, and uses the AI module to analyze, based on the information of the original video (such as character information, audio information, etc.), the information of the dubbable characters in the video at the playback position (for example, the name of the character, the gender of the character, etc.).
  • the network device responds to the dubbing request in the request message, and generates a dubbing room ID.
• the dubbing room ID is unique and is used to distinguish the dubbing room created by the first terminal from other dubbing rooms, so as to avoid the network device incorrectly sending messages to other terminal devices during the interaction between the first terminal and the network device. Then, the network device generates a first response; the first response includes the information of the dubbing characters, the ID of the dubbing room, and the quantity information of the dubbing characters.
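• The exchange in steps S801 to S803 can be pictured as two small messages. The following Kotlin sketch uses hypothetical field names; the patent does not fix a wire format:

```kotlin
// Hypothetical shapes for the S801 request and the S802 first response.
// Field names and types are illustrative assumptions, not a defined protocol.
data class DubbingRequest(
    val originalVideoId: String, // ID of the original video in the resource library
    val clipStartMs: Long,       // start playing time of the clip in the original video
    val clipEndMs: Long          // end playing time of the clip in the original video
)

data class DubbingCharacter(val name: String, val gender: String)

data class FirstResponse(
    val roomId: String,                     // unique dubbing room ID
    val characters: List<DubbingCharacter>  // character info analyzed by the AI module
) {
    val characterCount: Int get() = characters.size
}
```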
  • Step S803 The network device sends the first response to the first terminal.
  • Step S804 The first terminal analyzes and processes the intercepted video segment based on the first response.
• After receiving the first response, the first terminal has the following three processing situations for the intercepted video clip (summarized in the sketch after this list), based on the quantity information of the dubbing characters in the first response:
• when the number of dubbing characters is 0, the intercepted video clip cannot be used for dubbing and is an invalid video clip; the first terminal will display a prompt message in its display area such as the touch screen, where the prompt message is used to indicate to the user that the video clip is not available.
  • the first terminal may receive other video clips uploaded by the user or re-intercept the video clips as dubbing materials.
• when the number of dubbing characters is 1, the dubbing room is a single-person dubbing room, and the first terminal cannot send a dubbing invitation link to other terminal devices;
• when the number of dubbing characters is N (N greater than 1), the first terminal can invite at most N-1 video application accounts of second terminals to access the dubbing room;
• the first terminal sends an invitation link to the second terminal, where the invitation link is the second request message; the second terminal is the terminal device corresponding to a friend in the friend list of the video application account on the first terminal.
  • the buddy list interface of the video application account of the first terminal may be the buddy list interface 220 in the above-mentioned embodiment of FIG. 2C .
• For the specific operation and process of inviting friends by the first terminal, please refer to the above-mentioned embodiments of FIGS. 2A to 2C, which are not repeated in this embodiment of the present application.
• the buddy list of the video application account of the first terminal may be a buddy list in communication software such as WeChat and QQ, or the list of friends followed by the video application account.
• when the first terminal sends an invitation link to a friend in third-party software such as WeChat, the second terminal can log in to the video application with the account of the third-party software such as WeChat, thereby accessing the dubbing room created by the first terminal.
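• As noted above, the handling of the first response branches on the number of dubbing characters N. A minimal Kotlin sketch of the three situations; the type names are illustrative assumptions:

```kotlin
// Sketch: three-way handling of the clipped segment based on the number of
// dubbing characters reported in the first response. Names are illustrative.
sealed interface ClipHandling
object InvalidClip : ClipHandling        // N == 0: prompt that the clip is unavailable
object SinglePersonRoom : ClipHandling   // N == 1: no invitation link can be sent
data class MultiPersonRoom(val maxInvites: Int) : ClipHandling // N > 1

fun handleCharacterCount(n: Int): ClipHandling = when {
    n == 0 -> InvalidClip
    n == 1 -> SinglePersonRoom
    else -> MultiPersonRoom(maxInvites = n - 1) // at most N-1 second terminals invited
}
```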
  • Step S704 In the case that the number of dubbing roles of the video clipping segment analyzed by the first terminal is not 0, the first terminal assigns the dubbing roles.
  • the first terminal allocates dubbing roles, and there are mainly two ways to allocate dubbing roles:
  • the first terminal assigns a dubbing role to the video application account accessing the terminal in the dubbing room.
• For the content and process of assigning a dubbing role by the first terminal, refer to the above-mentioned embodiments of FIG. 2D to FIG. 2E, which are not repeated here.
• After assigning the dubbing roles, the first terminal sends a notification message to the second terminal so that the video application account of the second terminal knows its corresponding dubbing role. Then, the first terminal generates first information for indicating the correspondence between the video application accounts of the terminals in the dubbing room and the dubbing characters, and sends it to the network device.
  • the first terminal opens the authority to allow the video application account of the terminal accessing the dubbing room to select the dubbing role, that is, the first terminal sends to the second terminal a second instruction instructing the video application account of the second terminal to select the dubbing role; wherein , each video application account can only select one dubbing role, and two or more video application accounts cannot select the same dubbing role.
  • the second terminal sends a confirmation message to the first terminal for notifying the first terminal of the dubbing role it has selected.
  • the first terminal generates, based on the confirmation message, first information for indicating the correspondence between the video application account of the terminal in the dubbing room and the dubbing character, and sends the first information to the network device.
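• In both manners, the result is the same first information: a one-to-one mapping from video application accounts to dubbing roles. A minimal Kotlin sketch of validating that mapping before sending it to the network device; the helper name is an illustrative assumption:

```kotlin
// Sketch: the "first information" as an account -> role map, validated so that
// no two accounts share a dubbing role. Names are illustrative assumptions.
fun buildFirstInformation(assignments: Map<String, String>): Map<String, String> {
    require(assignments.values.toSet().size == assignments.size) {
        "two or more video application accounts selected the same dubbing role"
    }
    return assignments // ready to be sent to the network device
}
```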
• FIG. 9 is a flowchart of communication between the first terminal and the network device provided by an embodiment of the present application; the process is expanded and explained below:
  • Step S901 The first terminal sends the first information to the network device.
  • Step S902 The network device intercepts the video clip corresponding to the playback position of the video clip in the original video, and obtains the clipped video clip.
  • Step S903 Based on the first information, the network device performs muting processing on the corresponding dubbing characters in the clipped video clips to obtain dubbing materials.
• Exemplarily, the network device performs muting processing on the selected dubbing characters in the clipped video clip to obtain the dubbing material. It should be noted that, when the number of dubbing roles in the video clip is N, if only N-1 dubbing roles are allocated, the network device only mutes those N-1 dubbing roles in the clipped video clip.
  • Step S904 The network device sends the dubbing material to the first terminal.
  • the first terminal does not need to assign and select a dubbing role, and the network device automatically sends the clipped video clip after muting to the first terminal as a dubbing material.
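• The muting in step S903 can be pictured as silencing the assigned characters' speech segments inside the clipped audio. A minimal Kotlin sketch under the assumption that the AI module provides per-role speech segments; all names are illustrative:

```kotlin
// Sketch: zero out the audio of the assigned dubbing roles inside the clipped
// segment, leaving unassigned roles audible. Types are illustrative assumptions.
data class CharacterSegment(val role: String, val startMs: Long, val endMs: Long)

fun muteAssignedRoles(
    pcm: ShortArray,                  // mono PCM of the clipped video's audio track
    sampleRate: Int,
    segments: List<CharacterSegment>, // per-role speech segments from the AI module
    assignedRoles: Set<String>        // roles present in the first information
): ShortArray {
    val out = pcm.copyOf()
    for (seg in segments.filter { it.role in assignedRoles }) {
        val from = (seg.startMs * sampleRate / 1000).toInt().coerceIn(0, out.size)
        val to = (seg.endMs * sampleRate / 1000).toInt().coerceIn(from, out.size)
        out.fill(0, from, to) // silence only the assigned role's lines
    }
    return out
}
```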
  • Step S705 In the case where the number of dubbing roles in the video clipping segment analyzed by the first terminal is not 0, the first terminal displays a dubbing interface after detecting the first input operation for the dubbing room.
• the first input operation may be an input operation (such as a single click) on the start dubbing control 215 in the above-mentioned embodiment of FIG. 2A, or may be an operation of inputting a voice command in the dubbing room.
  • the dubbing interface may be the dubbing interface 310 in the above-mentioned embodiment of FIG. 3A , and the layout and configuration of the dubbing interface are only illustrative and not limiting in this embodiment of the present application.
  • the second terminal accessing the dubbing room also displays a dubbing interface.
  • the first terminal is the main device for creating a dubbing room.
  • the first terminal has more operation authority; for example, the first terminal has the authority to pause/start dubbing mode, play back dubbing video , and the permission to turn on/off the voice call mode.
• the authority of the first terminal to execute the pause/start dubbing mode may be exercised, as in the above-mentioned embodiment of FIG. 3A, by detecting an input operation (such as a single click) on the play/pause control 313 to start/pause the dubbing mode; this embodiment is for illustration only and is not limited. When in the dubbing mode, the first terminal and the second terminal collect external audio in real time, and the dubbing material is played and displayed in the first display frame of the dubbing interface.
• In the dubbing mode, when the first terminal detects a third input operation for the dubbing interface, the dubbing mode is suspended; the third input operation may be the input operation (for example, a single click) on the play/pause control 313 in the above-mentioned embodiment of FIG. 3A, which is only exemplified in this embodiment of the present application and is not limited.
• the permission of the first terminal to enable/disable the voice call mode may be exercised as in the above-mentioned embodiment of FIG. 3B, where an input operation (for example, a single click) on the voice call function control 3191 is detected to enable/disable the voice call mode. When the first terminal turns on the voice call mode, it sends, to the second terminal in the dubbing room, a third instruction for instructing it to enter the voice call mode; the interface of the voice call mode may be the dubbing interface in the above-mentioned embodiment of FIG. 3C, and this embodiment of the present application is only used for illustration and is not limited.
• In the voice call mode, each terminal accessing the dubbing room can collect the user audio in real time and transmit the collected user audio to the voice call platform of the dubbing room in real time, and the voice call platform sends the user's real-time audio to all terminals in the dubbing room, so that multi-user real-time voice communication can be realized.
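• The voice call platform described above behaves like a simple fan-out relay. A minimal Kotlin sketch; the platform class and its API are illustrative assumptions, not part of the embodiment:

```kotlin
// Sketch: each terminal pushes captured audio frames to the room's platform,
// which forwards them to every other terminal in the dubbing room.
class VoiceCallPlatform {
    private val terminals = mutableMapOf<String, (ByteArray) -> Unit>()

    fun join(terminalId: String, onAudio: (ByteArray) -> Unit) {
        terminals[terminalId] = onAudio
    }

    // Called in real time with each audio frame captured by a terminal.
    fun publish(fromTerminalId: String, frame: ByteArray) {
        for ((id, deliver) in terminals) {
            if (id != fromTerminalId) deliver(frame) // do not echo back to the sender
        }
    }
}
```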
• When the first terminal is in the dubbing pause mode, the first terminal detects a fifth input operation on the dubbing interface to exercise the right to play back the dubbing video.
  • the fifth input operation may be an input operation such as a click on the playback dubbing function control 3192 in the above-mentioned embodiment of FIG. 3B , or may be an input operation such as a voice command, and the embodiment of the present application is only for illustration and does not limit it.
  • the first terminal displays a playback interface, and the playback interface includes a second display frame for displaying the playback dubbing work.
• the playback interface may be the dubbing playback interface 330 in the above-mentioned embodiment of FIG. 3E, and the second display frame may be the dubbing work display area 331 in the above-mentioned embodiment of FIG. 3E; the location and layout of the playback interface are only exemplified in this embodiment of the present application and are not limited.
• when the first terminal detects a sixth input operation on the playback interface, the first terminal adjusts the playback progress of the dubbed work; the sixth input operation can be the input operation (for example, sliding left/right) on the progress drag bar 333 in the above-mentioned embodiment of FIG. 3E, or an input operation (for example, a click) on the back control 334/forward control 335, which is only exemplified in this embodiment of the present application and is not limited.
  • the specific operation and process of playing back the dubbed work by the first terminal reference may be made to the specific content of the above-mentioned embodiment in FIG. 3E , which will not be repeated here.
  • Step S706 After detecting the seventh input operation for the dubbing interface, the first terminal displays a preview interface.
  • the seventh input operation may be the input operation (such as clicking) for the submission control 315 in the above-mentioned embodiment of FIG. 3A, or may be the user's voice input operation.
  • the embodiment of the present application is only for illustration and does not limit.
  • the preview interface includes a third display frame for displaying a second video clip; wherein the second video clip is a dubbed video clip in the dubbing material.
  • the preview interface may be the preview interface 510 in the above-mentioned embodiment of FIG. 5A
  • the third display frame may be the video playback area 511.
• the layout and configuration of the preview interface, and the shape and size of the third display frame in the preview interface, are only used for illustration in the embodiments of the present application and are not limited.
• when the first terminal detects an eighth input operation for the preview interface, the first terminal displays a cutting interface, where the cutting interface includes a fourth display frame for playing the cut second video clip.
  • the clipping interface may be the video clipping interface 530 in the above-mentioned embodiment of FIG. 5C .
• the layout and configuration of the clipping interface are only exemplified in this embodiment of the present application and are not limited; the fourth display frame may be the dubbed video playing area 531 in the above-mentioned video clipping interface 530, and the shape and position of the fourth display frame are only exemplified in the embodiment of the present application and are not limited.
• When the first terminal detects the ninth input operation for the cutting interface, it executes the input operation, cuts the second video clip and the user audio collected in the dubbing mode by the terminals accessing the dubbing room, and obtains the cut second video clip and the cut audio; exemplarily, the ninth input operation may be the input operation (for example, a click) on the first cut control 532/second cut control 533 in the above-mentioned embodiment of FIG. 5C.
  • the embodiments of the present application only provide examples for the ninth input operation, and the embodiments of the present application do not limit other forms of the ninth input operation.
  • For the specific process and content of cutting and dubbing the video by the first terminal please refer to the specific content in the above-mentioned embodiment of FIG. 5C , which is not repeated in this embodiment of the present application.
• when the first terminal detects a work adjustment operation, the first terminal responds to the work adjustment operation and adjusts and modifies the second video clip and the user audio collected in the dubbing mode by the terminals accessing the dubbing room, to obtain the adjusted and modified second video clip and the adjusted and modified audio;
• the work adjustment operation may be an input operation (for example, a single click) on the vocal adjustment function control 513 in the above-mentioned embodiment of FIG. 5A, an input operation (for example, sliding left/right) on the vocal volume adjustment control 515, or an input operation (for example, sliding left/right) on the background volume adjustment control 516, among other input operations, which are not limited in the embodiments of the present application.
• Step S707 After the first terminal detects the upload instruction for the dubbed video, the first terminal uploads the dubbed video to the network device. The dubbed video includes the second video clip and the user audio collected in the dubbing mode by the terminals accessing the dubbing room; if the first terminal has detected a work adjustment operation, the dubbed video includes the adjusted and modified second video clip and the adjusted and modified audio.
  • the first terminal encodes the video and audio in the dubbed video, and sends the encoded video and audio to the network device in the form of a data stream.
  • the upload instruction may be an input operation (such as a click) for generating the work control 518 in the above-mentioned embodiment of FIG. 5A , and this embodiment of the present application is only for illustration and not limited.
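• Sending the encoded result "in the form of a data stream" can be pictured as a chunked write to the network device. A minimal Kotlin sketch; the chunk size and the output stream are illustrative assumptions:

```kotlin
import java.io.OutputStream

// Sketch: stream the encoded dubbed video to the network device in chunks.
fun uploadDubbedVideo(encoded: ByteArray, out: OutputStream, chunkSize: Int = 64 * 1024) {
    var offset = 0
    while (offset < encoded.size) {
        val len = minOf(chunkSize, encoded.size - offset)
        out.write(encoded, offset, len) // send the next chunk of the data stream
        offset += len
    }
    out.flush()
}
```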
  • Step S708 The network device transcodes and synthesizes the video and audio in the dubbing video, generates a dubbing work, and uploads the dubbing work to the dubbing personal homepage of the terminal participating in the dubbing.
  • the terminal may display and play the dubbing work after detecting the triggering operation for viewing the work on the dubbing personal homepage.
  • the first terminal intercepts the video played in its video application after detecting the interception and dubbing operation instruction.
• the first terminal may create a dubbing room using the intercepted video as the dubbing material, which makes the materials that users can dub more abundant.
• the video dubbing method described in the embodiment of the present application supports real-time dubbing by multiple people; when the number of dubbing characters is greater than 1, the first terminal can invite friends to enter the dubbing room for real-time dubbing, so that when dubbing, users no longer speak to the air, which improves the user's dubbing experience.
• the first terminal 1000 may include at least: a processor 1010, an external memory interface 1020, an internal memory 1021, an antenna 1, an antenna 2, a mobile communication module 1040, a wireless communication module 1050, an audio module 1060, a speaker 1060A, a receiver 1060B, a microphone 1061, a sensor module 1070, a motor 1081, a display screen 1091, and the like.
  • the sensor module 1070 may include a pressure sensor 1070A, a touch sensor 1070B, and the like.
  • the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the first terminal 1000 .
  • the first terminal 1000 may include more or less components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
• the processor 1010 may include one or more processing units; for example, the processor 1010 may include an application processor (Application Processor, AP), a modem processor, a graphics processor (Graphics Processing Unit, GPU), an image signal processor (Image Signal Processor, ISP), a controller, a memory, a video codec, a digital signal processor (Digital Signal Processor, DSP), a baseband processor, and/or a neural-network processing unit (Neural-Network Processing Unit, NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the first terminal 1000 .
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 1010 for storing instructions and data.
  • the memory in processor 1010 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 1010 . If the processor 1010 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided, and the waiting time of the processor 1010 is reduced, thereby increasing the efficiency of the system.
  • the processor 1010 may include one or more interfaces.
• the interface may include an inter-integrated circuit (Inter-Integrated Circuit, I2C) interface, an inter-integrated circuit sound (Inter-Integrated Circuit Sound, I2S) interface, a pulse code modulation (Pulse Code Modulation, PCM) interface, a universal asynchronous receiver/transmitter (Universal Asynchronous Receiver/Transmitter, UART) interface, a mobile industry processor interface (Mobile Industry Processor Interface, MIPI), a general-purpose input/output (General-Purpose Input/Output, GPIO) interface, a subscriber identity module (Subscriber Identity Module, SIM) interface, and/or a universal serial bus (Universal Serial Bus, USB) interface, etc.
• the I2C interface is a bidirectional synchronous serial bus that includes a serial data line (Serial Data Line, SDA) and a serial clock line (Serial Clock Line, SCL).
  • the processor 1010 may contain multiple sets of I2C buses.
  • the processor 1010 can be respectively coupled to the touch sensor 1070B, the charger, the flash, etc. through different I2C bus interfaces.
  • the processor 1010 may couple the touch sensor 1070B through the I2C interface, so that the processor 1010 and the touch sensor 1070B communicate through the I2C bus interface, so as to realize the touch function of the first terminal 1000 .
  • the I2S interface can be used for audio communication.
  • the processor 1010 may contain multiple sets of I2S buses.
  • the processor 1010 can be coupled with the audio module 1060 through the I2S bus, so as to realize the communication between the processor 1010 and the audio module 1060 .
  • the audio module 1060 can transmit audio signals to the wireless communication module 1050 through the I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communications, sampling, quantizing and encoding analog signals.
  • the audio module 1060 and the wireless communication module 1050 may be coupled through a PCM bus interface.
  • the audio module 1060 can also transmit audio signals to the wireless communication module 1050 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is typically used to connect the processor 1010 with the wireless communication module 1050 .
  • the processor 1010 communicates with the Bluetooth module in the wireless communication module 1050 through the UART interface to implement the Bluetooth function.
  • the audio module 1060 can transmit the audio signal to the wireless communication module 1050 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
  • the MIPI interface can be used to connect peripheral devices such as the processor 1010 and the display screen 1091 .
  • MIPI interface includes display serial interface (Display Serial Interface, DSI) and so on.
  • the processor 1010 communicates with the display screen 1091 through a DSI interface to implement the display function of the first terminal 1000 .
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface may be used to connect the processor 1010 with the display screen 1091, the wireless communication module 1050, the audio module 1060, the sensor module 1070, and the like.
  • the GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the interface connection relationship between the modules illustrated in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the first terminal 1000 .
  • the first terminal 1000 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the wireless communication function of the first terminal 1000 may be implemented by the antenna 1, the antenna 2, the mobile communication module 1040, the wireless communication module 1050, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the first terminal 1000 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 1040 may provide a wireless communication solution including 2G/3G/4G/5G and the like applied on the first terminal 1000 .
  • the mobile communication module 1040 may include at least one filter, switch, power amplifier, low noise amplifier (Low Noise Amplifier, LNA) and the like.
  • the mobile communication module 1040 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 1040 can also amplify the signal modulated by the modulation and demodulation processor, and then convert it into electromagnetic waves for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 1040 may be provided in the processor 1010 .
  • at least part of the functional modules of the mobile communication module 1040 may be provided in the same device as at least part of the modules of the processor 1010 .
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs a sound signal through an audio device (not limited to the speaker 1060A, the receiver 1060B, etc.), or displays an image or video through the display screen 1091 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 1010, and may be provided in the same device as the mobile communication module 1040 or other functional modules.
  • the wireless communication module 1050 can provide applications on the first terminal 1000 including wireless local area networks (Wireless Local Area Networks, WLAN) (such as wireless fidelity (Wireless Fidelity, Wi-Fi) networks), Bluetooth (BlueTooth, BT), global navigation Satellite system (Global Navigation Satellite System, GNSS), frequency modulation (Frequency Modulation, FM), near field communication technology (Near Field Communication, NFC), infrared technology (InfRared, IR) and other wireless communication solutions.
  • the wireless communication module 1050 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 1050 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 1010 .
  • the wireless communication module 1050 can also receive the signal to be sent from the processor 1010 , perform frequency modulation and amplification on the signal, and then convert it into electromagnetic waves for radiation through the antenna 2 .
  • the antenna 1 of the first terminal 1000 is coupled with the mobile communication module 1040, and the antenna 2 is coupled with the wireless communication module 1050, so that the first terminal 1000 can communicate with the network and other devices through wireless communication technology.
• the wireless communication technology may include Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time-Division Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
• the GNSS may include the Global Positioning System (GPS), the Global Navigation Satellite System (GLONASS), the BeiDou Navigation Satellite System (BDS), the Quasi-Zenith Satellite System (QZSS), and/or the Satellite Based Augmentation Systems (SBAS).
  • the first terminal 1000 implements a display function through a GPU, a display screen 1091, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 1091 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 1010 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the display screen 1091 is used to display images, videos, and the like.
  • the display screen 1091 includes a display panel.
• the display panel may be a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), an active-matrix organic light-emitting diode (Active-Matrix Organic Light-Emitting Diode, AMOLED), a flexible light-emitting diode (Flex Light-Emitting Diode, FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (Quantum Dot Light Emitting Diodes, QLED), and so on.
  • the first terminal 1000 may include 1 or N display screens 1091 , where N is a positive integer greater than 1.
• the digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the first terminal 1000 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy, and the like.
  • Video codecs are used to compress or decompress digital video.
  • the first terminal 1000 may support one or more video codecs. In this way, the first terminal 1000 can play or record videos in multiple encoding formats, such as: Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
  • NPU is a neural network (Neural-Network, NN) computing processor.
  • Applications such as intelligent cognition of the first terminal 1000 can be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 1020 can be used to connect an external memory card, such as a Micro SD card, so as to expand the storage capacity of the first terminal 1000.
• the external memory card communicates with the processor 1010 through the external memory interface 1020 to implement the data storage function, for example, saving files such as music and videos in the external memory card.
  • Internal memory 1021 may be used to store computer executable program code, which includes instructions.
  • the processor 1010 executes various functional applications and data processing of the first terminal 1000 by executing the instructions stored in the internal memory 1021 .
  • the internal memory 1021 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the first terminal 1000 and the like.
  • the internal memory 1021 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, Universal Flash Storage (Universal Flash Storage, UFS), and the like.
  • the first terminal 1000 may implement an audio function through an audio module 1060, a speaker 1060A, a receiver 1060B, a microphone 1061, an application processor, and the like. Such as music playback, recording, dubbing, etc.
  • the audio module 1060 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 1060 may also be used to encode and decode audio signals. In some embodiments, the audio module 1060 may be provided in the processor 1010, or some functional modules of the audio module 1060 may be provided in the processor 1010.
• the speaker 1060A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the first terminal 1000 can listen to music through the speaker 1060A, or listen to a hands-free call.
• the receiver 1060B, also referred to as the "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be answered by placing the receiver 1060B close to the human ear.
• the microphone 1061, also referred to as a "mike" or "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can input a sound signal into the microphone 1061 by speaking close to it.
  • the first terminal 1000 may be provided with at least one microphone 1061 . In other embodiments, the first terminal 1000 may be provided with two microphones 1061, which may implement a noise reduction function in addition to collecting sound signals. In other embodiments, the first terminal 1000 may further be provided with three, four or more microphones 1061 to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions. In this embodiment of the present application, the microphone 1061 can collect the user's audio in real time, so that the processor 1010 can match the user's audio with the processed dubbing material.
  • The pressure sensor 1070A is used to sense a pressure signal and can convert the pressure signal into an electrical signal.
  • In some embodiments, the pressure sensor 1070A may be disposed on the display screen 1091. There are many types of pressure sensors, such as resistive, inductive, and capacitive pressure sensors.
  • A capacitive pressure sensor may include at least two parallel plates made of a conductive material. When a force acts on the pressure sensor 1070A, the capacitance between the electrodes changes.
  • The first terminal 1000 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 1091, the first terminal 1000 detects the intensity of the touch operation according to the pressure sensor 1070A.
  • The first terminal 1000 may also calculate the touched position according to the detection signal of the pressure sensor 1070A.
  • In some embodiments, touch operations that act on the same touch position but have different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction for viewing the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction for creating a new short message is executed. A hedged sketch of this threshold dispatch follows.
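  As an illustration only, the pressure-threshold dispatch in the short-message example could be sketched as below; the threshold value, class name, and method names are assumptions, not part of the disclosure:

```java
// Illustrative sketch: dispatching on touch pressure relative to a first pressure threshold.
import android.view.MotionEvent;
import android.view.View;

public class PressureAwareIcon extends View {
    private static final float FIRST_PRESSURE_THRESHOLD = 0.8f; // assumed normalized value

    public PressureAwareIcon(android.content.Context context) {
        super(context);
    }

    @Override
    public boolean onTouchEvent(MotionEvent event) {
        if (event.getAction() == MotionEvent.ACTION_UP) {
            // getPressure() reports the normalized pressure of the touch.
            if (event.getPressure() < FIRST_PRESSURE_THRESHOLD) {
                viewShortMessage();   // light press: view the short message
            } else {
                createShortMessage(); // firm press: create a new short message
            }
            return true;
        }
        return super.onTouchEvent(event);
    }

    private void viewShortMessage() { /* open the message list */ }
    private void createShortMessage() { /* open the compose screen */ }
}
```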
  • The touch sensor 1070B is also called a "touch panel".
  • The touch sensor 1070B may be disposed on the display screen 1091, and the touch sensor 1070B and the display screen 1091 form a touchscreen, also referred to as a "touch screen".
  • The touch sensor 1070B is used to detect a touch operation acting on or near it.
  • The touch sensor can pass the detected touch operation to the application processor to determine the type of the touch event.
  • Visual output related to the touch operation may be provided through the display screen 1091.
  • In other embodiments, the touch sensor 1070B may also be disposed on a surface of the first terminal 1000 at a position different from that of the display screen 1091.
  • The motor 1081 can generate a vibration alert.
  • The motor 1081 can be used for incoming-call vibration alerts, and can also be used for touch vibration feedback.
  • For example, touch operations acting on different applications (for example, photographing and audio playback) can correspond to different vibration feedback effects.
  • Touch operations acting on different areas of the display screen 1091 can also correspond to different vibration feedback effects of the motor 1081.
  • Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • The touch vibration feedback effect can also be customized; a hedged sketch is given below.
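  A minimal sketch of customized vibration feedback, assuming the standard Android Vibrator/VibrationEffect APIs (API level 26+); the timings are illustrative assumptions:

```java
// Illustrative sketch: customized touch and incoming-call vibration feedback.
import android.content.Context;
import android.os.VibrationEffect;
import android.os.Vibrator;

public final class HapticFeedbackHelper {
    private final Vibrator vibrator;

    public HapticFeedbackHelper(Context context) {
        vibrator = (Vibrator) context.getSystemService(Context.VIBRATOR_SERVICE);
    }

    // Short tick for ordinary touch feedback.
    public void onTouchFeedback() {
        vibrator.vibrate(VibrationEffect.createOneShot(20, VibrationEffect.DEFAULT_AMPLITUDE));
    }

    // A longer custom waveform, e.g. for an incoming-call alert:
    // wait 0 ms, vibrate 400 ms, pause 200 ms, vibrate 400 ms; -1 = do not repeat.
    public void onIncomingCall() {
        long[] timings = {0, 400, 200, 400};
        vibrator.vibrate(VibrationEffect.createWaveform(timings, -1));
    }
}
```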
  • For the structure of the second terminal, refer to the structure of the first terminal in the embodiment of FIG. 10; details are not repeated in this embodiment of the present application.
  • The software system of the first terminal 1000 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • The embodiments of the present invention take the Android system with a layered architecture as an example to exemplarily describe the software structure of the first terminal 1000.
  • FIG. 11 is a block diagram of the software structure of a first terminal 1000 according to an embodiment of the present application.
  • The layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces.
  • In some embodiments, the Android system is divided into four layers: from top to bottom, an application layer, an application framework layer, an Android runtime and system library layer, and a kernel layer.
  • The application layer can include a series of application packages.
  • The application packages can include applications such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Bluetooth, Music, Video, and Messages.
  • The application framework layer provides an application programming interface (Application Programming Interface, API) and a programming framework for the applications in the application layer.
  • The application framework layer includes some predefined functions.
  • The application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
  • The window manager is used to manage window programs.
  • The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, and so on.
  • The content provider is used to store and retrieve data and make the data accessible to applications.
  • The data may include videos, images, audio, calls made and received, browsing history and bookmarks, a phone book, and the like.
  • The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system can be used to build applications.
  • A display interface can consist of one or more views.
  • For example, a display interface including a short message notification icon may include a view for displaying text and a view for displaying a picture.
  • The phone manager is used to provide the communication functions of the first terminal 1000, for example, management of call statuses (including connected, hung up, and the like).
  • The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.
  • The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify of download completion, message reminders, and the like.
  • The notification manager may also present a notification in the status bar at the top of the system in the form of a chart or scroll-bar text (for example, a notification of an application running in the background), or present a notification on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is produced, the electronic device vibrates, or an indicator light blinks. A hedged sketch of posting such a notification follows.
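  A minimal sketch of posting a "download complete" status-bar notification, assuming the standard Android notification APIs (API level 26+, where channels are required); channel IDs and strings are assumptions:

```java
// Illustrative sketch: posting a download-completion notification via the notification manager.
import android.app.Notification;
import android.app.NotificationChannel;
import android.app.NotificationManager;
import android.content.Context;

public final class DownloadNotifier {
    private static final String CHANNEL_ID = "downloads";

    public static void notifyDownloadComplete(Context context, String fileName) {
        NotificationManager nm =
                (NotificationManager) context.getSystemService(Context.NOTIFICATION_SERVICE);
        // Creating a channel that already exists is a no-op.
        nm.createNotificationChannel(new NotificationChannel(
                CHANNEL_ID, "Downloads", NotificationManager.IMPORTANCE_DEFAULT));
        Notification n = new Notification.Builder(context, CHANNEL_ID)
                .setSmallIcon(android.R.drawable.stat_sys_download_done)
                .setContentTitle("Download complete")
                .setContentText(fileName)
                .setAutoCancel(true) // disappears once tapped, no further interaction needed
                .build();
        nm.notify(1, n);
    }
}
```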
  • The Android runtime includes core libraries and a virtual machine, and is responsible for scheduling and management of the Android system.
  • The core libraries consist of two parts: one part is the utility functions that the Java language needs to call, and the other is the Android core library.
  • The application layer and the application framework layer run in the virtual machine.
  • The virtual machine executes the Java files of the application layer and the application framework layer as binary files.
  • The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
  • The system library may include multiple functional modules, for example, a surface manager (Surface Manager), media libraries (Media Libraries), a 3D graphics processing library (for example, OpenGL ES), and a 2D graphics engine (for example, SGL).
  • The surface manager is used to manage the display subsystem and provides fusion of 2D and 3D layers for multiple applications.
  • The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • The media library can support a variety of audio and video encoding formats, for example, MPEG-4, H.264, MP3, AAC, AMR, JPG, and PNG. A hedged sketch of inspecting a clip's tracks follows.
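  As an illustrative sketch, the platform media stack can be used to list a clip's audio and video tracks and their MIME types (for example, video/avc for H.264 and audio/mp4a-latm for AAC); the file path is an assumption:

```java
// Illustrative sketch: listing the tracks of a media file with MediaExtractor.
import android.media.MediaExtractor;
import android.media.MediaFormat;
import java.io.IOException;

public final class TrackInspector {
    public static void printTracks(String path) throws IOException {
        MediaExtractor extractor = new MediaExtractor();
        try {
            extractor.setDataSource(path);
            for (int i = 0; i < extractor.getTrackCount(); i++) {
                MediaFormat format = extractor.getTrackFormat(i);
                String mime = format.getString(MediaFormat.KEY_MIME);
                System.out.println("track " + i + ": " + mime);
            }
        } finally {
            extractor.release();
        }
    }
}
```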
  • The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, layer processing, and the like.
  • The 2D graphics engine is a drawing engine for 2D drawing.
  • The kernel layer is the layer between hardware and software.
  • The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
  • The workflow of the software and hardware of the first terminal 1000 is exemplarily described below with reference to the video dubbing scenario.
  • When the touch sensor 1070B receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer.
  • The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and a timestamp of the touch operation). The raw input event is stored at the kernel layer.
  • The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking an example in which the touch operation is a touch click operation and the control corresponding to the click operation is the start dubbing control 215 in the embodiment of FIG. 2A: the video application calls an interface of the application framework layer to start the dubbing function, then starts the microphone driver by calling the kernel layer, collects the user's audio in real time through the microphone, and matches the user's audio with the dubbing material. A hedged sketch of this wiring follows.
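  A hedged sketch of the application-layer end of this flow is given below; the control and class names are assumptions, and DubbingCapture refers to the capture sketch shown earlier:

```java
// Illustrative sketch: the click on the start-dubbing control reaches the app as an
// onClick callback, which starts audio capture; the microphone driver runs underneath.
import android.view.View;

public class DubbingRoomController {
    private final DubbingCapture capture = new DubbingCapture();

    // startDubbingControl corresponds to control 215 in the embodiment of FIG. 2A (assumed).
    public void bind(View startDubbingControl) {
        startDubbingControl.setOnClickListener(v -> {
            // By this point the framework has already turned the raw input event
            // into a click dispatched to this control.
            capture.start();
        });
    }

    public void stopDubbing() {
        capture.stop();
    }
}
```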
  • The software system of the second terminal may adopt a layered architecture, an event-driven architecture, a microkernel architecture, or a microservice architecture, and the embodiments of the present application may be combined arbitrarily to achieve different technical effects.
  • For the software system of the second terminal, refer to the embodiment in FIG. 11; details are not repeated in this embodiment of the present application.
  • All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof.
  • When software is used for implementation, the embodiments may be implemented completely or partially in the form of a computer program product.
  • The computer program product includes one or more computer instructions.
  • When the computer program instructions are loaded and executed on a computer, the processes or functions described herein are produced completely or partially.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) manner.
  • The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more usable media.
  • The usable media may be magnetic media (for example, a floppy disk, a hard disk, or a magnetic tape), optical media (for example, a DVD), or semiconductor media (for example, a solid-state drive), and the like.
  • All or some of the processes of the foregoing method embodiments may be completed by a computer program instructing relevant hardware, and the program may be stored in a computer-readable storage medium.
  • When the program is executed, the processes of the foregoing method embodiments may be included.
  • The foregoing storage medium includes any medium that can store program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

This application provides a video dubbing method, a related device, and a computer-readable storage medium. The method includes: after detecting a clip-and-dub operation instruction for a currently displayed video, a first terminal clips the currently displayed video to obtain a clipped video segment and displays a video dubbing control; after detecting a trigger operation on the video dubbing control, the first terminal creates and displays a dubbing room for the clipped video segment; and when the first terminal determines by analysis that the number of dubbing roles in the clipped video segment is not 0, the first terminal displays a dubbing interface after detecting a first input operation on the dubbing room. With this method, dubbing material can be obtained from a video resource library instead of having to be uploaded and processed in advance, which broadens the sources of dubbing material and thus gives users a better dubbing experience.

Description

一种视频配音的方法、相关设备以及计算机可读存储介质
本申请要求于2021年02月24日提交中国专利局、申请号为202110205548.X、申请名称为“一种视频配音的方法、相关设备以及计算机可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及视频配音技术领域,尤其涉及一种视频配音的方法、相关设备以及计算机可读存储介质。
背景技术
随着《声临其境》等配音类节目走向了前台,开辟了综艺节目的新领域,给观众带来了全新娱乐体验,同时也衍生了泛娱乐社交行业的新玩法,即线上配音。
目前,线上视频配音APP主要2类:第一类是选择APP内置声线,通过文字转语音,结合背景音乐自动合成声音,适合短视频自媒体配音、广告叫卖促销配音、企业宣传、解说配音、有声阅读等场景,此类APP比如“微配音”等;第二类是根据自己制作视频素材或者用APP上已有视频配音,比较有趣味性,类似于把《声临其境》搬到线上,此类APP比如“配音秀”等。
但是,目前的线上视频配音APP仅支持已上传的配音素材配音,使得配音素材的来源有限。
发明内容
本申请实施例提供了一种视频配音的方法、相关设备以及计算机可读存储介质,解决了配音素材来源有限的问题。
第一方面,本申请实施例提供了一种视频配音的方法,包括:第一终端检测到针对当前显示视频的截取配音操作指令后,截取所述当前显示视频,得到视频截取片段并显示视频配音控件;所述第一终端检测到针对所述视频配音控件的触发操作后,创建并显示针对所述视频截取片段的配音间;在所述第一终端分析所述视频截取片段的配音角色数量不为0的情况下,所述第一终端检测到针对所述配音间的第一输入操作后,显示配音界面;其中,所述配音界面包括第一展示框,所述第一展示框用于显示和播放配音素材。通过上述方法,第一终端可以直接在视频应用中截取视频获取配音素材,使得配音素材的来源更加广泛,解决了配音素材来源有限的问题。
在一种可能实现的方式中,所述第一终端检测到针对所述视频配音控件的触发操作后,创建并显示针对所述视频截取片段的配音间之后,包括:所述第一终端向网络设备发送请求消息;所述请求消息包括所述视频截取片段的原始视频ID、所述视频截取片段的起始时间以及所述视频截取片段的结束时间;所述第一终端接收所述网络设备发送的第一响应;所述第一响应包括配音角色的数量信息;所述第一终端基于所述配音角色信息执行第一操作。这样,第一终端截取视频片段后,可以直接创建配音间获取配音素材,不必执行诸如:上传视频、添加配音角色、编辑角色字幕、添加背景音乐、添加标签等繁琐的操作,降低了用户上传配音素材的复杂度,从而,提高了用户的配音体验。
在一种可能实现的方式中,所述第一终端基于所述配音角色信息执行第一操作,包括:
若所述配音角色的数量为0,所述第一终端显示第一提示信息;所述第一提示信息用于指示所述视频截取片段不可用;若所述配音角色的数量大于1,所述第一终端检测到针对所述配音间的第二输入操作后,向第二终端发送第一指令;所述第一指令用于指示所述第二终端的视频应用账号接入所述配音间;在所述第二终端的视频应用账号接入所述配音间的情况下,所述第一终端分配配音角色并生成第一信息;所述第一信息用于指示接入所述配音间内终端的视频应用账号与所述配音角色的对应关系;所述第一终端向所述网络设备发送所述第一信息;所述第一终端接收所述网络设备发送的所述配音素材;所述配音素材是所述网络设备基于所述第一信息得到的。通过上述方法,第一终端在配音角色数量大于1的情况下,邀请其它用户进入配音间配音,实现多用户实时配音,提高了用户的配音体验。
在一种可能实现的方式中,所述第一终端分配配音角色并生成第一信息,包括:所述第一终端绑定所述配音角色与接入所述配音间内终端的视频应用账号;所述第一终端生成所述第一信息;所述第一信息用于指示接入所述配音间内终端的视频应用账号与所述配音角色的对应关系;所述第一终端向所述第二终端发送通知消息;所述通知消息用于指示所述第二终端的视频应用账号所分配的配音角色。通过上述方法,第一终端生成第一信息后有利于网络设备生成应用于多人配音的配音素材,使得第一终端可以实现多人在线实时配音,提高了用户的配音体验。
在一种可能实现的方式中,所述第一终端分配配音角色并生成第一信息,包括:所述第一终端向所述第二终端发送第二指令;所述第二指令用于指示所述第二终端的视频应用账号选择配音角色;所述第一终端接收所述第二终端发送的确认消息;所述确认消息用于指示所述第二终端选择的配音角色;所述第一终端基于所述确认消息生成所述第一信息;所述第一信息用于指示接入所述配音间内终端的视频应用账号与所述配音角色的对应关系。通过上述方法,第一终端生成第一信息后有利于网络设备生成应用于多人配音的配音素材,使得第一终端可以实现多人在线实时配音,提高了用户的配音体验。
在一种可能实现的方式中,所述第一终端基于所述配音角色信息执行第一操作之后,包括:所述第一终端检测到针对所述配音界面的第三输入操作后,暂停配音模式;其中,所述配音模式为:所述第一终端实时采集外部音频作为配音音频并在所述第一展示框中播放所述配音素材;在暂停配音模式的情况下,若所述第一终端检测到针对所述配音界面的第四输入操作后,所述第一终端向所述第二终端发送第三指令;所述第三指令用于指示所述第二终端的视频应用账号进入语音通话模式。通过上述方法,第一终端开启语音模式,使得接入配音间内的终端设备的用户可以实时对话,提高了用户之间的互动性,从而提高了用户的配音体验。
在一种可能实现的方式中,所述第一终端基于所述配音角色信息执行第一操作之后,包括:所述第一终端检测到针对所述配音界面的第三输入操作后,暂停配音模式;其中,所述配音模式为:所述第一终端实时采集外部音频作为配音音频并在所述第一展示框中播放所述配音素材;在暂停配音模式的情况下,若所述第一终端检测到针对所述配音界面的第五输入操作,所述第一终端显示回放界面;所述回放界面包括第二展示框;所述第一终端检测到针 对所述回放界面的第六输入操作后,在所述第二展示框中回放第一视频片段并回放所述第一终端和所述第二终端在所述配音模式下实时采集的外部音频;其中,所述第一视频片段为所述配音素材中已配音的视频片段。通过上述方法,第一终端可以回放已配音的视频,从而使得用户可以提前预览配音效果,使得用户可以基于配音效果对后续配音的策略进行调整,提高了用户的配音体验。
在一种可能实现的方式中,所述第一终端检测到针对所述配音间的第一输入操作后,显示配音界面之后,包括:所述第一终端检测到针对所述配音界面的第七输入操作后,显示预览界面;所述预览界面包括第三展示框,所述第三展示框用于显示第二视频片段;其中,所述第二视频片段为所述配音素材中已配音的视频片段;所述第一终端检测到针对所述预览界面的第八输入操作后,显示剪切界面;所述剪切界面包括第四展示框,所述第四展示框用于显示剪切后的所述第二视频片段;所述第一终端检测到针对所述剪切界面的第九输入操作后,剪切所述第二视频片段与所述第一终端和所述第二终端在配音模式下实时采集的外部音频。通过上述方法,用户可以对已配音的视频进行剪切,得到符合需求个性化的配音作品,提高了用户的配音体验。
第二方面,本申请实施例提供了一种视频配音的方法,包括:网络设备接收第一终端发送的请求消息;所述请求消息包括视频截取片段的原始视频ID、视频截取片段的起始时间以及视频截取片段的结束时间;所述网络设备基于所述视频截取片段的原始视频ID从视频资源库中找到所述视频截取片段的原始视频;所述网络设备基于所述视频截取片段的起始时间以及所述视频截取片段的结束时间在所述原始视频中获取所述视频截取片段的播放位置;所述网络设备基于所述视频截取片段在所述原始视频中的播放位置分析所述截取视频中的可配音的角色,并得到配音角色的数量信息;所述网络设备基于所述配音角色的数量信息生成第一响应;所述网络设备将所述第一响应发送给所述第一终端。通过上述方法,第一终端可以在配音角色数量大于1的情况下,邀请多用户在线配音,从而实现多人实时在线配音,提高了用户的配音体验。
在一种可能实现的方式中,所述网络设备基于所述配音角色信息生成第一响应之后,包括:所述网络设备接收所述第一终端发送的第一信息;所述第一信息用于指示接入所述配音间内终端的视频应用账号与所述配音角色的对应关系;所述网络设备截取所述视频截取片段在其原始视频中播放位置对应的视频片段,得到截取后的视频片段;所述网络设备基于所述第一信息对所述截取后的视频片段中已分配的配音角色进行消音处理得到配音素材;所述网络设备将所述配音素材发送给所述第一终端。通过上述方法,网络设备将配音素材发送给第一终端,使得第一终端截取视频片段后,可以直接创建配音间获取配音素材,不必执行诸如:上传视频、添加配音角色、编辑角色字幕、添加背景音乐、添加标签等繁琐的操作,降低了用户上传配音素材的复杂度,从而,提高了用户的配音体验。
第三方面,本申请实施例提供了一种视频配音的方法,包括:第二终端接收第一终端发送的第一指令;所述第一指令用于指示所述第二终端的视频应用账号接入所述第一终端创建的配音间;所述第二终端响应所述第一指令,将其视频应用账号接入所述第一终端创建的配音间。通过上述方法,可以实现多人同时在线配音,用户配音不再对空讲话,提高了用户的 配音体验。
在一种可能实现的方式中,所述第二终端响应所述第二请求消息,接入所述第一终端创建的配音间之后,包括:所述第二终端接收所述第一终端发送的通知消息;所述通知消息用于指示所述第二终端的视频应用账号所分配的配音角色。通过上述方法,在接入配音间内的第二终端选择配音角色后,网络设备会对待配音视频片段中以选择的角色进行消音处理,保留未选择的配音角色的音频以及待配音视频片段的背景音,在一定程度上保证了配音素材内容的丰富性,使得用户在配音时有更好的配音体验。
在一种可能实现的方式中,所述第二终端响应所述第二请求消息,接入所述第一终端创建的配音间之后,包括:所述第二终端接收所述第一终端发送的第二指令;所述第二指令用于指示第二终端的视频应用账号选择配音角色;所述第二终端向所述第一终端发送确认消息;所述确认消息用于指示所述第二终端的视频应用账号选择的配音角色。通过上述方法,使得用户可以根据自身的需求和兴趣选择合适的配音角色进行配音,提高了用户的配音体验。
在一种可能实现的方式中,所述第二终端响应所述第一指令,将其视频应用的用户账号接入所述第一终端创建的配音间之后,还包括:所述第二终端接收所述第一终端发送的第三指令;所述第三指令用于指示所述第二终端的视频应用账号进入语音通话模式;所述第二终端响应所述第三指令,令其视频应用账号进入所述语音通话模式。通过上述方法,第二终端进入语音模式,使得接入配音间内的终端设备的用户可以实时对话,提高了用户之间的互动性,从而提高了用户的配音体验。
第四方面,本申请实施例提供一种终端,所述终端可以为上述第一方面中的第一终端,包括:存储器、处理器和触控屏;
所述存储器,用于存储计算机程序,所述计算机程序包括程序指令;
所述处理器用于调用所述程序指令,使得所述终端执行如下步骤:检测到针对当前显示视频的截取配音操作指令后,截取所述当前显示视频,得到视频截取片段并指示所述触控屏显示视频配音控件;检测到针对所述视频配音控件的触发操作后,创建并指示所述触控屏显示针对所述视频截取片段的配音间;在分析所述视频截取片段的配音角色数量不为0的情况下,检测到针对所述配音间的第一输入操作后,指示所述触控屏显示配音界面。
在一种可能实现的方式中,所述处理器检测到针对所述视频配音控件的触发操作后,创建并指示所述触控屏显示针对所述视频截取片段的配音间之后,包括:通过通信模块向网络设备发送请求消息;通过通信模块接收所述网络设备发送的第一响应;基于所述配音角色信息执行第一操作。
在一种可能实现的方式中,所述处理器基于所述配音角色信息执行第一操作,包括:若所述配音角色的数量为0,指示所述触控屏显示第一提示信息;若所述配音角色的数量大于1,检测到针对所述配音间的第二输入操作后,通过通信模块向第二终端发送第一指令;在所述第二终端的视频应用账号接入所述配音间的情况下,分配配音角色并生成第一信息;通过通信模块向所述网络设备发送所述第一信息;通过通信模块接收所述网络设备发送的配音素材。
在一种可能实现的方式中,所述处理器分配配音角色并生成第一信息,包括:绑定所述配音角色与接入所述配音间内终端的视频应用账号;生成所述第一信息;通过通信模块向所 述第二终端发送通知消息。
在一种可能实现的方式中,所述处理器分配配音角色并生成第一信息,包括:通过通信模块向所述第二终端发送第二指令;通过通信模块接收所述第二终端发送的确认消息;基于所述确认消息生成所述第一信息。
在一种可能实现的方式中,所述处理器基于所述配音角色信息执行第一操作之后,包括:
检测到针对所述配音界面的第三输入操作后,暂停配音模式;在暂停配音模式的情况下,检测到针对所述配音界面的第四输入操作后,通过通信模块向所述第二终端发送第三指令。
在一种可能实现的方式中,所述处理器基于所述配音角色信息执行第一操作之后,包括:
检测到针对所述配音界面的第三输入操作后,暂停配音模式;在暂停配音模式的情况下,若检测到针对所述配音界面的第五输入操作,指示所述触控屏显示回放界面;检测到针对所述回放界面的第六输入操作后,指示所述触控屏在所述第二展示框中回放第一视频片段并播放所述第一终端和所述第二终端在所述配音模式下实时采集的外部音频。
在一种可能实现的方式中,所述处理器检测到针对所述配音间的第一输入操作后,指示所述触控屏显示配音界面之后,包括:检测到针对所述配音界面的第七输入操作后,指示所述触控屏显示预览界面;检测到针对所述预览界面的第八输入操作后,指示所述触控屏显示剪切界面;检测到针对所述剪切界面的第九输入操作后,剪切所述第二视频片段与所述第一终端和所述第二终端在配音模式下实时采集的外部音频。
第五方面,本申请实施例提供一种网络设备,所述网络设备可以为上述第二方面中的网络设备,包括:存储器、处理器和通信模块;
所述存储器,用于存储计算机程序,所述计算机程序包括程序指令;
所述处理器用于调用所述程序指令,使得所述网络设备执行如下步骤:通过通信模块接收第一终端发送的请求消息;基于所述视频截取片段的原始视频ID从视频资源库中找到所述视频截取片段的原始视频;基于所述视频截取片段的起始时间以及所述视频截取片段的结束时间在所述原始视频中获取所述视频截取片段的播放位置;基于所述视频截取片段在所述原始视频中的播放位置分析所述截取视频中的可配音的角色,并得到配音角色的数量信息;基于所述配音角色的数量信息生成第一响应;通过通信模块将所述第一响应发送给所述第一终端。
在一种可能实现的方式中,所述处理器基于所述配音角色信息生成第一响应之后,包括:
通过通信模块接收所述第一终端发送的第一信息;截取所述视频截取片段在其原始视频中播放位置对应的视频片段,得到截取后的视频片段;基于所述第一信息对所述截取后的视频片段中已分配的配音角色进行消音处理得到配音素材;通过通信模块将所述配音素材发送给所述第一终端。
第六方面,本申请实施例提供一种终端,所述终端可以为上述第三方面中的第二终端,包括:存储器、处理器、通信模块和触控屏;
所述存储器,用于存储计算机程序,所述计算机程序包括程序指令;
所述处理器用于调用所述程序指令,使得所述终端执行如下步骤:通过通信模块接收第一终端发送的第一指令;响应所述第一指令,将其视频应用账号接入所述第一终端创建的配音间。
在一种可能实现的方式中,所述处理器响应所述第二请求消息,接入所述第一终端创建 的配音间之后,包括:通过通信模块接收所述第一终端发送的通知消息。
在一种可能实现的方式中,所述处理器响应所述第二请求消息,接入所述第一终端创建的配音间之后,包括:通过通信模块接收所述第一终端发送的第二指令;通过通信模块向所述第一终端发送确认消息。
在一种可能实现的方式中,所述处理器响应所述第一指令,将其视频应用的用户账号接入所述第一终端创建的配音间之后,还包括:通过通信模块接收所述第一终端发送的第三指令;响应所述第三指令,令其视频应用账号进入所述语音通话模式。
第七方面,本申请提供了一种终端,所述终端可以为上述第一方面中的第一终端,包括:一个或多个功能模块。一个或多个功能模块用于执行上述第一方面任一项可能的实现方式中的视频配音的方法。
第八方面,本申请提供了一种网络设备,所述网络设备可以为上述第二方面中的网络设备,包括:一个或多个功能模块。一个或多个功能模块用于执行上述第二方面任一项可能的实现方式中的视频配音的方法。
第九方面,本申请提供了一种终端,所述终端可以为上述第三方面中的第二终端,包括:一个或多个功能模块。一个或多个功能模块用于执行上述第三方面任一项可能的实现方式中的视频配音的方法。
第十方面,本申请实施例提供了一种计算机存储介质,包括计算机指令,当计算机指令在电子设备上运行时,使得通信装置执行上述任一方面任一项可能的实现方式中的视频配音的方法。
第十一方面,本申请实施例提供了一种计算机程序产品,当计算机程序产品在计算机上运行时,使得计算机执行上述任一方面任一项可能的实现方式中的视频配音的方法。
附图说明
图1A为本申请实施例提供的一种终端主界面图;
图1B-图1D是本申请实施例提供的一种视频应用界面图;
图1E-图1G是本申请实施例提供的一种视频应用的截取视频界面图;
图2A-图2E是本申请实施例提供的一种的配音间界面图;
图3A-图3D是本申请实施例提供的一种第一终端的配音界面图;
图3E是本申请实施例提供的一种配音回放界面图;
图3F是本申请实施例提供的另一种第一终端的配音界面图;
图4是本申请实施例提供的一种第二终端的配音界面图;
图5A-图5C是本申请实施例提供的一种第一终端的预览界面图;
图6是本申请实施例提供的一种配音个人主页界面图;
图7是本申请实施例提供的一种视频配音方法流程图;
图8是本申请实施例提供的一种创建配音间的流程图;
图9是本申请实施例提供的一种第一终端与网络设备通信的流程图;
图10是本申请实施例提供的一种第一终端的结构示意图;
图11是本申请实施例提供的一种第一终端的软件结构框图。
具体实施方式
下面将结合附图对本申请实施例中的技术方案进行清楚、详尽地描述。其中,在本申请实施例的描述中,除非另有说明,“/”表示或的意思,例如,A/B可以表示A或B;文本中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,另外,在本申请实施例的描述中,“多个”是指两个或多于两个。
以下,术语“第一”、“第二”仅用于描述目的,而不能理解为暗示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征,在本申请实施例的描述中,除非另有说明,“多个”的含义是两个或两个以上。
本申请实施例中的第一终端和第二终端可以为智能手机、平板电脑以及智能手表等通信设备,网络设备可以为视频服务端等服务器。
下面结合应用场景,介绍本申请实施例涉及的一种视频配音的方法。
在一些应用场景中,第一终端可以为智能手机、平板电脑以及笔记本电脑等终端设备,本申请实施例以智能手机为第一终端100为例,进行举例说明。第一终端100使用视频播放软件播放视频时,接收到接收用户截取视频的操作,在截取视频成功后,第一终端100将截取保存后的视频片段发送给网络设备,网络设备对截取的视频片段进行解析和处理后,生成配音片段发送给第一终端100的视频客户端,为用户提供视频配音服务。相较于传统的配音软件,第一终端100的配音素材可以为视频播放软件中的截取片段,配音素材来源更加广泛。
示例性的,如图1A所示,第一终端100可以显示有主屏幕的界面110,该界面110中显示了一个放置有应用图标的页面,该页面包括多个应用图标(例如,设置应用图标、音乐应用图标、备忘录应用图标、云共享应用图标、视频应用图标111、云共享应用图标等)。多个应用图标下面还显示包括有页面指示符,以表明当前显示页面与其它页面的位置关系。页面指示符下方有多个托盘图标(例如拨号应用图标、信息应用图标、联系人应用图标、相机应用图标),托盘应用图标在页面切换时保持显示。在一些实施例中,上述页面也可以包括多个应用图标和页面指示符,页面指示符可以不是页面的一部分,单独存在,上述图标也是可选的,本申请实施例对此不作限制。
第一终端100可以接收用户作用于视频应用图标111的输入操作(例如单击),响应于所述输入操作,第一终端100可以显示如图1B所示的视频主界面120。
如图1B所示,所述视频主界面120包括视频搜索框121、推荐视频展示框122;其中,视频搜索框121用于第一终端100检测到外部的输入操作(例如输入字符),在视频库中搜寻视频搜索框121中字符对应的视频;推荐视频展示框122包括推荐视频展示页面,所述推荐视频展示页面用于向用户展示推荐视频,所述推荐视频的推送可以是基于用户观看视频后的评分来决定的,也可以是基于视频的播放量来决定的,本申请实施例不做限制;在所述推荐视频展示页面包括页面指示符,以表明当前推荐视频展示页面与其它推荐视频展示页面的位置关系,所述页面指示符在推荐视频展示页面切换时保持显示。视频主界面120还包括个性视频展示区域123,所述个性视频展示区域123用于第一终端100基于用户视频历史观看数 据以及大数据向用户展示符合用户观看需求的视频,在个性视频展示区域123中包括“猜你喜欢”控件1231、推送视频展示框1232以及推送视频的名称图标1233。在视频主界面120的底部有多个托盘应用图标(例如首页图标124、会员应用图标、配音应用图标125以及个人应用图标),所述托盘应用图标在视频主界面的页面切换时保持显示。当设备100检测到用户对托盘应用图标的输入操作(例如单击),响应用户的操作,第一终端100显示不同的主界面;例如,当第一终端100检测到用户对首页图标124的单击操作时,第一终端100响应用户的单击操作,显示视频主界面120;当第一终端100检测到针对配音应用图标125的输入操作(例如单击)时,第一终端100显示如图6所述的配音个人主页。在一些实施例中,上述视频主界面可以包括多个托盘应用图标,且托盘应用图标是可选的;对于上述视频主界面中的推荐视频展示框、个性视频展示区域以及个性视频展示区域中的控件和图标也是可选的,本申请实施例对此不作限制。
第一终端100检测用户作用于视频搜索框121的输入操作后,响应所述输入操作,第一终端100在视频主界面120中显示键盘输入框;第一终端100可以检测用户作用于所述键盘输入框的输入操作(例如单击),响应所述作用于键盘输入框的操作,第一终端100在视频搜索框中显示用户通过键盘输入框输出的字符;第一终端100检测用户作用于搜索图标129的单击等输入操作,响应该输入操作,第一终端100在视频资源库中搜寻用户输入的目标视频,并显示如图1D所示的搜索结果界面140。
在一种可能实现的方式中,第一终端100可以接收用户作用于如图1B中的视频搜索框121的单击等输入操作,响应于所述输入操作,第一终端100可以显示如图1C所示的搜索视频界面130。
如图1C所示,搜索视频界面130包括搜索框131、搜索控件132;在搜索框131的下方显示用户的搜索记录。当第一终端100检测到用户对搜索框131的单击等输入操作后,响应于所述输入操作,第一终端100在搜索视频界面130中显示键盘输入框133。当第一终端100检测到用户对键盘输入框133的单击等输入操作,响应于该输入操作,第一终端100在搜索框131中显示用户基于键盘输入框133输入的字符;当第一终端100检测到用户对搜索控件132的输入操作或者对键盘输入框133中的确认控件1331的单击等输入操作后,响应该输入操作,第一终端100在视频资源库中搜寻搜索框131中字符对应的目标视频,并显示如图1D所示的搜索结果界面140。在一些实施例中,视频搜索界面中搜索历史展示区是可选的,本申请实施例对此不作限制。
如图1D所示,搜索结果界面140包括视频显示区域141、搜索框142、搜索控件143、返回控件144、播放控件145、下载控件146以及多个选集控件147。其中,视频显示区域141包括第一显示区域1411和第二显示区域1412,第一显示区域用于显示搜索的目标视频的封面,第二显示区域用于显示搜索的目标视频的信息(例如视频名称、视频类型、视频参演人员等等)。第一终端100检测到用于返回控件144的输入操作后,响应所述操作,第一终端100可以显示上一显示界面(例如视频搜索界面130或视频主界面120),也可以显示界面110,本申请实施例不做限制。当第一终端100检测到用户对搜索框142的单击等输入操作后,电子设备响应所述输入操作请参考上述实施例图1C中第一终端100响应用户对于搜索框142的操作或第一终端100响应用户作用于视频搜索框121的输入操作,本申请实施例不再赘述。当第一终端100检测到用户对播放控件145的单击等输入操作后,电子设备响应所述输入操作,从起始时刻开始播放所述目标视频;在一种可选的方式中,当第一终端100检测到用户对播放控件145的单击等输入操作,电子设备响应所述操作,基于历史数据,从用户上一次 观看所述目标视频的历史时刻开始播放所述目标视频。在第一终端100检测到用户针对下载控件146的输入操作后,第一终端100检测到用户对选集控件147的单击等输入操作,第一终端100响应用户的输入操作,下载用户所选的视频。在第一终端100未检测到用户针对下载控件146的输入操作的情况下,检测到用户对选集控件147的单击等输入操作,第一终端100播放选集控件147对应集数的视频。
本申请实施例以第一终端100响应用户对播放控件145的输入操作为例进行举例说明,当第一终端100检测到用户对于播放控件145的单击等输入操作后,从起始时刻开始播放目标视频,显示如图1E的视频播放界面150。
如图1E所示,视频播放界面150用于显示正在播放的视频,包括锁定屏幕控件151、播放/暂停控件152、视频截取控件153以及进度调整控件155。其中,时间进度条154的左侧可以显示当前视频播放的时长,时间进度条154的右侧可以显示当前播放视频的总时长,随着视频的播放,时间进度条154上的进度调整控件155从左至右滑动,当进度调整控件155滑动到时间进度条154的最右端时,视频播放结束;当第一终端100检测到用户对进度调整控件155的输入操作(例如左滑或右滑)时,第一终端100从进度调整控件155在时间进度条154所在位置对应的时刻开始播放视频;当视频处于播放状态时,第一终端100检测到用户对播放/暂停控件152的单击等输入操作时,第一终端100响应该输入操作,暂停视频,并将播放/暂停控件152的功能设置为播放视频功能(即当第一终端100再次检测到针对播放/暂停控件152的输入操作时,第一终端100播放视频);当视频处于暂停播放状态时;第一终端100检测到用户对播放/暂停控件152的单击等输入操作时,第一终端100响应该输入操作,播放视频,并将播放/暂停控件152的功能设置为暂停视频的功能(即当第一终端100再次检测到针对播放/暂停控件152的输入操作时,第一终端100暂停播放视频)。视频截取控件153可用于第一终端100截取当前播放视频的片段,第一终端100检测到用户对视频截取控件153的单击等输入操作时,第一终端100可以显示图1F所示的视频片段截取界面160。在一些实施例中,视频播放界面150除了上述控件外,还有其它控件,这些控件都是可选的,本申请实施例仅做举例说明,不做限制。
如图1F所示,视频片段截取界面160包括时间进度条161、第一截取控件162以及第二截取控件163。当第一终端100检测到用户对第一截取控件162的输入操作(例如左滑或者右滑)时,第一终端100显示第一截取控件162在时间进度条161所在位置对应时刻的视频帧图像,时间进度条161的左侧显示第一截取控件162在时间进度条161所在位置对应时刻;同理,当第一终端100检测到用户对第二截取控件163的输入操作(例如左滑或者右滑)时,第一终端100显示第二截取控件163在时间进度条161所在位置对应时刻的视频帧图像,时间进度条161的左侧显示第二截取控件163在时间进度条161所在位置对应时刻。当第一终端100检测到用户对确定控件164的输入操作(例如单击)后,第一终端100以第一截取控件162在时间进度条161所在位置对应时刻(图1F为04:15)为起始剪切点,以第二截取控件163在时间进度条161所在位置对应时刻(图1F为20:39)为结束剪切点来剪切原视频,得到剪切视频。在一些实施例中,视频片段截取界面的第一截取控件在时间进度条所在位置对应时刻和第二截取控件在时间进度条所在位置对应时刻显示的位置可变,本申请实施例不做限制。第一终端100剪切视频后,显示如图1G所示的配音选择界面170。
如图1G所示,配音选择界面170包括截取视频片段展示区域171和多个应用图标(例如分享好友应用图标和配音应用图标172等)。在一些实施例中,配音选择界面的应用图标是可选的,本申请实施例仅列举两个作举例说明,对于应用图标的选择,本申请实施例不做限 制。其中,截取片段展示区域171用于显示截取视频片段的封面;第一终端100可以检测用户对配音应用图标172的输入操作(例如单击),当电子设备检测到所述输入操作后,第一终端100执行所述输入操作,创建配音间,并显示图2A所示的配音间的第一界面210。
如图2A所示,配音间第一界面210包括配音间ID图标211、返回控件212、展示区域213以及信息展示框214。其中,配音间ID图标211显示当前配音间的ID号,配音间的ID是唯一的,用于区别当前配音间和其它配音间;展示区域213用于显示配音素材的封面,配音素材为上述截取的视频片段;信息展示框214用于显示配音素材的信息(例如配音角色名、配音角色性别以及配音素材的来源等信息)。配音间的第一界面210还包括开始配音控件215和邀请好友控件216。当第一终端100检测到配音素材中可配音角色的数量为一个时,第一终端100不检测用户对邀请好友控件216的输入操作。当第一终端100检测到配音素材中可配音角色的数量等于一个,且第一终端100未检测到用户对邀请好友控件216的输入操作时,若第一终端100检测到对开始配音控件215的输入操作,第一终端100响应该输入操作,显示配音间的第二界面;其中,所述配音间第二界面为单人配音界面。当第一终端100检测到配音素材中可配音角色的数量大于一个时,第一终端100检测到对邀请好友控件216的输入操作后,第一终端100在配音间的第一界面210中显示图2B所示的邀请好友信息框217。
如图2B所示，邀请好友信息框217包括邀请微信好友控件2171、邀请QQ好友控件2172以及邀请视频好友控件2173等其他控件。当第一终端100检测到对邀请微信好友控件2171和邀请QQ好友控件2172的输入操作（例如单击）时，第一终端100的主屏幕会显示微信或QQ等通信软件的好友列表界面；当第一终端100检测到针对所述好友列表界面的输入操作时，第一终端100会向第二终端200发送配音请求链接，第二终端200为所述好友列表中被选中好友的智能手机或平板电脑等终端设备；当第二终端200检测到对所述配音请求链接的输入操作后（例如单击），第二终端200检测安装的视频应用，并打开所述视频应用；在第二终端200检测到使用微信或QQ等社交账号登录视频应用的输入操作后，登录视频应用并进入配音间。本申请实施例以第一终端100检测到对邀请视频好友控件2173的输入操作（例如单击），第一终端100响应该输入操作显示图2C所示的好友列表界面220为例进行举例说明。
如图2C所示，好友列表界面220包括多个好友显示框221，在每个好友显示框中包括好友名称、好友图标以及选择控件222。当第一终端100检测到对于选择控件222的输入操作时（例如单击），第一终端100会选中好友。另外，在好友列表界面220中还包括搜索框223以及搜索图标224，用于第一终端100检测到针对搜索框223以及搜索图标224的输入操作后，第一终端100在好友列表中搜索所述搜索框223中的对应好友用户名；在图2C中，第一终端100检测到对用户李华以及用户小丽对应的选择控件的输入操作后，第一终端100响应该输入操作，向用户李华以及用户小丽的电子设备200发送配音链接，使得李华和小丽的第二电子设备中的视频应用能够接入配音间。当第一终端100检测到好友列表界面220中的完成控件225后，第一终端100进入图2D所示的角色选择界面230。
如图2D所示,当第一终端100检测到邀请的好友的第二终端200中的视频应用都接入配音间后,第一终端100显示角色选择界面230,角色选择界面230包括展示区域231以及角色选择功能区域232。其中,展示区域231用于显示配音素材的封面图像,角色选择功能区域232用于配音角色的分配。在图2D中,角色选择功能区域232显示配音角色姓名以及性别等信息,每个配音分别对应一个角色分配控件233。当第一终端100检测到用户对角色分配控件233的输入操作(例如单击),第一终端100在角色选择界面230上显示用户选择框234;其中,用户选择框234显示参与配音的用户名和头像,用户选择框234包括多个选择控 件235,每个选择控件235对应一个用户,当第一终端100检测到对选择控件235的输入操作(例如单击),第一终端100为选择控件对应的用户分配配音角色并显示如图2E所示的角色选择界面。例如,在图2D中,第一终端100检测到针对角色B的角色分配控件233的输入操作后,第一终端100响应该操作,显示用户选择框234,并在用户选择框234中显示进入配音间的用户名以及用户头像;在检测到针对用户李华对应的选择控件235的单击操作后,响应该单击操作,将角色B分配给用户李华。
在图2E中,第一终端100检测并响应用户分配配音角色的操作后,第一终端100在角色选择功能区域232中显示已分配角色的用户,每个已分配角色的用户对应一个撤销控件236。当第一终端100检测到用户针对撤销控件236的单击等输入操作时,第一终端100响应该输入操作,取消用户已分配的配音角色,并将所述用户的头像和用户名等信息移除角色选择功能区域232,将撤销控件236替换为角色分配控件233。如图2E所示,当第一终端100响应配音角色分配的相关输入操作后,第一终端100检测到对开始配音控件215的单击等输入操作后,第一终端100响应该输入操作,显示如图3A所示配音间的配音界面310。
如图3A所示,配音界面310包括配音片段显示区域311,字幕显示区域312、播放/暂停控件313、配音控件314以及提交控件315。当第一终端100检测到用户对播放/暂停控件313的单击等输入操作时,第一终端100响应所述输入操作,开启配音模式,即在配音片段显示区域311中播放配音视频,并在字幕显示区域312中滚动显示字幕,实时采集外部音频;并将播放/暂停控件313的功能设置为成暂停配音功能。当配音角色变化时,在字幕显示区域312中显示提示信息316,提示信息316用于指示待配音的准备时间。当第一终端100检测到对配音控件314的单击等输入操作时,第一终端100响应该输入操作,接收并保存用户的语音。在第一终端100处于配音模式的过程中,若检测到对播放/暂停控件313的单击等输入操作,第一终端100响应该输入操作,暂停配音模式,并将播放/暂停控件313的功能设置为成开始配音功能,并在配音片段显示区域311中显示暂停图标317。当第一终端100检测到提交控件315的输入操作后,在配音显示界面310中显示如图3F所示的完成提示框323,当检测到用户对“是”控件的输入操作时,第一终端100显示如图5A所示的预览界面510。
本申请实施例中第一终端100是创建配音间的设备,第一终端100邀请其它用户的电子设备为第二终端200,第二终端200的配音界面为如图4所示的配音界面410,在配音界面410中,包括配音控件411;当第二终端200检测到对配音控件411的单击等输入操作时,第二终端200响应该输入操作,接收并保存用户的语音。
当第一终端100检测到对播放/暂停控件313或对暂停控件317的单击等输入操作时,第一终端100恢复配音间的配音模式,第一终端100和第二终端200继续实时采集外部音频。在配音模式暂停的情况下,若第一终端100检测到对更多功能控件318的单击等输入操作后,显示如图3B所示更多操作功能框319。
如图3B所示,第一终端100在暂停配音模式的情况下,检测到对更多功能控件318的单击等输入操作时,第一终端100显示更多操作功能框319;更多操作功能框319包括语音通话功能控件3191和回放配音功能控件3192。当第一终端100检测到对语音通话功能控件3191的单击等输入操作时,第一终端100响应输入操作,进入语音通话模式,在配音界面310中显示如图3C所示的语音控件320和退出控件321。当第一终端100检测到对语音控件320的输入操作时,第一终端100实时采集用户的语音,并在配音间中实时播放所述用户的语音;当第一终端100再次检测到对语音控件320的输入操作时,第一终端100停止采集和停止播放用户的语音,不再对第一终端100的用户开放语音权限。当在语音通话模式下,第一终端 100检测到对退出控件321的单击等输入操作时,第一终端100在配音界面310中显示如图3D所示的语音模式功能框322。
如图3D所示,语音模式功能框322包括“是”控件3221和“否”控件3222,当第一终端100检测到对“是”控件3221的单击等输入操作时,第一终端100退出语音通话模式,回到如图3A所示的配音界面310。
如图3B所示，当第一终端100检测到对回放配音功能控件3192的单击等输入操作时，第一终端100响应该输入操作，并显示如图3E所示的配音回放界面330。
如图3E所示，配音回放界面330包括配音作品展示区域331、时间进度条332、进度拖条333、后退控件334、前进控件335以及播放/暂停控件336。其中，配音作品展示区域331用于播放用户录制的配音视频，进度拖条333用于调整配音作品的播放进度。当第一终端100检测到对进度拖条333的输入操作（例如向左滑/向右滑）时，配音作品展示区域331显示进度拖条333在时间进度条332上对应时刻的配音视频的图像帧。当第一终端100检测到对后退控件334的单击等输入操作时，第一终端100将配音视频在时间进度条332上的播放时刻后退预设时间段；如图3E所示，配音视频在时间进度条上的播放时刻为6s，当第一终端100接收到用户对后退控件334的单击操作时，若预设时间段为5s，那么，配音视频的播放进度为1s，即配音视频当前的播放时刻在时间进度条332上显示的时刻为1s。在配音视频播放的情况下，当第一终端100检测到对播放/暂停控件336的单击等输入操作时，第一终端100停止播放用户录制的配音视频；在配音视频暂停播放的情况下，当第一终端100检测到对播放/暂停控件336的单击等输入操作时，第一终端100播放用户录制的配音视频。当第一终端100检测到对返回控件336的单击等输入操作时，第一终端100返回如图3A所述的配音界面310。
如图5A所示，预览界面510包括配音视频播放区域511、进度调节控件512、人声调节功能控件513、视频剪切功能控件514、人声音量调节控件515、背景音量调节控件516、重录控件517以及生成作品控件518。当第一终端100显示预览界面510时，配音视频播放区域511开始播放已配音视频；当第一终端100检测到对进度调节控件512的输入操作时（例如左滑/右滑），第一终端100调整已配音视频的播放进度；同理，当第一终端100检测到对人声音量调节控件515向左滑/向右滑的输入操作时，第一终端100调高/调低已配音视频中配音角色的音量；当第一终端100检测到对背景音量调节控件516向左滑/向右滑的输入操作时，第一终端100调高/调低已配音视频中的背景音量；当第一终端100检测到对重录控件517的单击等输入操作时，第一终端100显示如图3A的配音界面310，此时，需要配音间内的用户重新配音；当第一终端100检测到对人声调节功能控件513的单击等输入操作时，第一终端100在预览界面510中显示如图5B所示的人声调节功能框519；当第一终端100检测到对生成作品控件518的单击等输入操作时，第一终端100上传配音作品，并将配音作品发送到参与配音用户的配音个人主页上。当第一终端100检测到对视频剪切功能控件514的单击等输入操作时，第一终端100显示如图5C所示的视频剪切界面530。
如图5B所示,人声调节功能框包括减少噪音应用图标5191、人声后退图标5192、人声前进图标5193;当第一终端100检测到对减少噪音应用图标5191的单击等输入操作时,第一终端100会降低用户配音音频中“非人声”的杂音的音量(例如用户配音时,周围环境的噪音等),以保证配音作品的音质;当第一终端100检测到对人声后退图标5192的单击等输入操作时,第一终端100调整配音角色的音频,使得配音角色的音频延迟于对应的字幕和配音视频的图像帧;当第一终端100检测到对人声前进图标5193的单击等输入操作时,第一终端100调整配音角色的音频,使得配音角色的音频超前于对应的字幕和配音视频的图像帧; 当第一终端100检测到对保存控件5194的单击等输入操作时,第一终端100保存调节后的已配音视频,并回到图5A所示的预览界面510。
如图5C所示，视频剪切界面530包括配音视频播放区域531、字幕预览区域534、第一剪切控件532、第二剪切控件533。当第一终端100检测到对第一剪切控件532/第二剪切控件533的输入操作（例如左滑/右滑）时，第一终端100响应所述输入操作，在配音视频播放区域531上显示配音视频对应的时刻的图像帧；所述对应时刻为第一剪切控件532/第二剪切控件在时间进度条上对应的时刻。同时，第一终端100在字幕预览区域显示所述图像帧对应的字幕。例如，在图5C中，当第一终端100检测到第一剪切控件532被左滑到时间进度条上的00:03时刻，第一终端100会在配音视频播放区域531上显示配音视频在00:03时刻的图像帧，以及在字幕预览区域534展示在00:03时刻图像帧对应的字幕；通过上述方式，当用户剪切配音视频时，第一终端100通过向用户展示配音视频起始剪切点（例如图5C中的00:03）和结束剪切点（例如图5C中的00:12）的图像帧以及字幕，让用户获悉其配音视频剪切点的确定是否符合预期。当第一终端100检测到对保存控件535的单击等输入操作时，第一终端100以第一剪切控件532在进度条上的时刻为起始剪切点，以第二剪切控件533在进度条上的时刻为结束剪切点剪切配音视频，并保存已剪切的配音视频。
当第一终端100检测到针对预览界面510的生成作品控件518的单击等输入操作后，第一终端100将配音作品上传至如图6所示的配音用户的配音个人主页610。当第一终端100检测到针对如图1B中视频主界面120中的配音应用图标125的单击等输入操作时，第一终端100显示如图6所示的配音个人主页610。在配音个人主页610中包括用户信息栏611，用户信息栏611包括作品信息栏6111、好友信息栏6112、关注信息栏以及收藏信息栏。当第一终端100检测到针对配音作品信息栏6111的输入操作（例如单击）时，第一终端100显示用户的配音作品。
下面结合附图详细描述本申请实施例提供的视频配音方法。
请参见图7,图7是本申请实施例提供的一种视频配音方法流程图。在图7实施例中,所述第一终端在其视频应用中截取视频,得到截取视频片段,并为所述截取视频片段配音,得到配音作品。最后,第一终端将所述配音作品上传至参与配音用户的配音个人主页上。示例性的,第一终端可以为上述实施例中的第一终端100,第二终端可以为上述实施例中的第二终端200。下面,对视频配音的流程进行展开描述:
步骤S701:第一终端检测到针对其视频播放界面的截取视频指令后,显示截取界面。
示例性的,该截取视频指令可以是上述图1E实施例中针对视频播放界面150中视频截取控件153的输入操作(例如单击);也可以是语音信号,例如,当第一终端的语音功能模块接收到截取视频的语音指令后,可以通过内部的语音识别模块对所述音频进行识别和解析,解析完成后,生成触发信号来触发第一终端截取在其视频播放界面中显示的视频;本申请实施例仅对截取配音操作指令进行举例说明,截取配音操作指令的具体形式本申请实施例不做限制。其中,视频播放界面150不限定通过图1A-图1D的过程来触发显示,本申请实施例不做限制;该截取界面可以为上述图1F实施例中的视频片段截取界面160,对于截取界面的布局和配置,本申请实施例仅作举例说明,不做任何限制。
步骤S702:第一终端检测到针对当前显示视频的截取配音操作指令后,截取所述当前显示视频,得到视频截取片段并显示视频配音控件。
示例性的,该截取配音操作指令可以为上述图1F实施例中针对第一截取控件162/第二截取控件163的输入操作(例如单击),第一终端截取视频的过程请参考上述图1F的实施例, 本申请实施例不再赘述。
在第一终端截取视频并保存视频截取片段后,显示视频配音控件。示例性的,所述视频配音控件可以为上述图1G实施例中的配音应用图标172,本申请实施例仅做举例说明,不做限制。
步骤S703:第一终端检测到针对所述配音控件的触发操作后,创建并显示针对所述视频截取片段的配音间。
示例性的,该触发操作可以为上述图1F实施例中针对配音应用图标172的输入操作(例如单击),本申请实施例仅做举例说明,不做限制。配音间的界面可以为上述图2A实施例中的配音间第一界面,本申请实施例仅做举例说明,不做限制。
具体地,第一终端创建配音间之后与网络设备交互的流程可以如图8所示;其中,所述网络设备为视频服务端。下面,结合附图对其进行展开描述:
步骤S801:第一终端向网络设备发送请求消息。
具体地,所述请求消息包括视频截取片段的原始视频ID、视频截取片段的起始时间、视频截取片段的结束时间以及配音请求;视频截取片段的起始时间为所述视频截取片段在原始视频的起始播放时间,视频截取片段的结束时间为所述视频截取片段在原始视频中的结束播放时间。
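作为示意（并非本申请公开内容的一部分），上述请求消息的载荷可以用如下极简的数据结构草图表示；字段名与类型均为本示例的假设：

```java
// 示意性草图：第一终端向网络设备发送的请求消息载荷（字段名为假设）。
public final class DubbingRequest {
    public final String originalVideoId;   // 视频截取片段的原始视频ID
    public final long startTimeMs;         // 在原始视频中的起始播放时间（毫秒）
    public final long endTimeMs;           // 在原始视频中的结束播放时间（毫秒）
    public final boolean dubbingRequested; // 配音请求标志

    public DubbingRequest(String originalVideoId, long startTimeMs,
                          long endTimeMs, boolean dubbingRequested) {
        this.originalVideoId = originalVideoId;
        this.startTimeMs = startTimeMs;
        this.endTimeMs = endTimeMs;
        this.dubbingRequested = dubbingRequested;
    }
}
```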
步骤S802:网络设备基于所述请求消息生成第一响应。
具体地,网络设备在接收到请求消息后,通过原始视频的ID在视频资源库中找到原始视频,然后基于视频截取片段在原始视频中的起始播放时间和视频截取片段在原始视频中的结束播放时间找到该视频截取片段在原始视频中的播放位置(原始视频中所述视频截取片段对应的播放时间段),并基于原始视频的信息(例如角色信息、音频信息等)使用AI模块分析在该播放位置中视频中的配音角色的信息(例如角色的名称、角色的性别等信息)以及配音角色的数量。另外,网络设备响应请求消息中的配音请求,生成配音间ID,该配音间ID具备唯一性,用于区分第一终端创建的配音间与其它配音间,以免第一终端与网络设备交互的过程中,网络设备将消息错发给其它终端设备。然后,网络设备生成第一响应;第一响应包括配音角色的信息以及配音间ID以及配音角色的数量信息。
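与之对应，第一响应的载荷也可以用类似的草图示意（同样仅为说明性示例，字段名为假设）：

```java
// 示意性草图：网络设备返回给第一终端的第一响应载荷（字段名为假设）。
import java.util.List;

public final class DubbingResponse {
    public final String dubbingRoomId;   // 唯一的配音间ID
    public final int roleCount;          // 配音角色的数量信息
    public final List<String> roleNames; // 配音角色的名称等信息（示意）

    public DubbingResponse(String dubbingRoomId, int roleCount, List<String> roleNames) {
        this.dubbingRoomId = dubbingRoomId;
        this.roleCount = roleCount;
        this.roleNames = roleNames;
    }
}
```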
步骤S803:网络设备将第一响应发送给第一终端。
步骤S804:第一终端基于第一响应对所述截取视频片段进行分析和处理。
具体地,第一终端接收到第一响应后,基于第一响应中的配音角色的数量信息对该截取视频片段有以下三种处理情况:
第一种情况,当配音角色数量为0时,此时该截取视频片段不能用于配音,为无效视频片段;第一终端会在其诸如触控屏等显示区域上显示提示消息,所述提示消息用于指示用户所述视频截取片段不可用。在这种情况下,第一终端可以接收用户上传的其他视频片段或者重新截取视频片段作为配音素材。
第二种情况，当配音角色数量为1时，所述配音间为单人配音间，第一终端不能向其它终端设备发送配音邀请链接；
第三种情况,当配音角色数量为N(N>1)时,第一终端最多可以邀请N-1个第二终端的视频应用账号接入配音间;当第一终端检测到针对配音间的第二输入操作时,即第一终端接收到邀请好友的指令时,第一终端向第二终端发送邀请链接,所述邀请链接为第二请求消息;第二终端为在第一终端的视频应用账号的好友列表中好友对应的终端设备。示例性的,第一终端的视频应用账号的好友列表界面可以是上述图2C实施例中的好友列表界面220,第 一终端邀请好友的具体操作和过程请参考上述图2A~图2C实施例中的具体内容,本申请实施例不再赘述。
第一终端的视频应用账号的好友列表可以是微信、QQ等通信软件中的好友列表,也可以是视频应用账号的好友列表中关注的好友列表。当第一终端向例如微信等第三方软件中的好友列表发送邀请链接时,第二终端即可通过使微信等第三方软件的账号登录视频应用,从而接入第一终端创建的配音间。
步骤S704:在第一终端分析视频截取片段的配音角色数量不为0的情况下,第一终端分配配音角色。
具体的,在配音角色数量为N(N>1)的情况下,第一终端分配配音角色,分配配音角色的方式主要有两种:
第一种方式,第一终端为接入配音间终端的视频应用账号分配配音角色,示例性的,第一终端分配配音角色的内容和过程参考上述图2D~图2E实施例中的内容,本申请实施例不再赘述。第一终端分配配音角色完毕后,会向第二终端发送通知消息以便第二终端的视频应用账号知晓其所对应的配音角色。然后,第一终端生成用于指示配音间内终端的视频应用账号与配音角色的对应关系的第一信息,并将其发送给网络设备。
第二种方式,第一终端开放权限让接入配音间终端的视频应用账号选择配音角色,即第一终端向第二终端发送指示第二终端的视频应用账号选择配音角色的第二指令;其中,每个视频应用账号只能选择一个配音角色,两个及其以上的视频应用账号不能选择同一个配音角色。当配音角色选择完毕后,第二终端向第一终端发送用于告知第一终端其选择的配音角色的确认消息。然后,第一终端基于确认消息生成用于指示配音间内终端的视频应用账号与配音角色的对应关系的第一信息,并将第一信息发送给网络设备。
第一终端将第一信息发送给网络设备后，网络设备基于所述第一信息执行一系列操作，最终将消音后的视频片段作为处理后的配音素材发送给第一终端用于配音。网络设备基于所述第一信息执行一系列操作的具体内容请参见图9，图9是本申请实施例提供的一种第一终端与网络设备通信的流程图，下面结合附图，对所述流程进行展开说明：
步骤S901:第一终端将第一信息发送给网络设备。
步骤S902:网络设备截取所述视频截取片段在其原始视频中播放位置对应的视频片段,得到截取后的视频片段。
步骤S903:网络设备基于所述第一信息对所述截取后的视频片段中对应的配音角色进行消音处理得到配音素材。
具体地,网络设备基于第一信息中第二终端的视频账号与视频截取片段中配音角色的对应关系,将所述截取后的视频片段中已选择的配音角色进行消音处理,得到配音素材。需要说明的是,当视频截取片段中的配音角色数量为N时,若只对其中的N-1个配音角色进行分配;那么网络设备只对所述截取后的视频片段中的该N-1个配音角色进行消音处理。
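下面给出消音处理思路的一个极简草图（仅为说明性示意：假设输入为 PCM 采样与各角色台词的起止采样位置，类型与方法名均为假设，并非本申请限定的实现）：

```java
// 示意性草图：按第一信息中已分配的配音角色，将其台词时间段内的音轨采样置零（消音）。
import java.util.List;
import java.util.Set;

public final class RoleMuter {
    /** 一条台词：所属配音角色ID及其在片段音轨中的起止采样位置（假设的数据结构）。 */
    public record Line(String roleId, int startSample, int endSample) {}

    /**
     * @param pcm           截取后视频片段的音轨采样（假设为混合单声道）
     * @param lines         片段中各台词的时间段列表
     * @param assignedRoles 第一信息中已分配（需消音）的配音角色ID集合
     */
    public static void muteAssignedRoles(short[] pcm, List<Line> lines, Set<String> assignedRoles) {
        for (Line line : lines) {
            if (!assignedRoles.contains(line.roleId())) {
                continue; // 未分配的角色保留原音，背景音也不受影响
            }
            int end = Math.min(line.endSample(), pcm.length);
            for (int i = Math.max(0, line.startSample()); i < end; i++) {
                pcm[i] = 0; // 置零即消音；实际系统可能采用声源分离等更复杂的处理
            }
        }
    }
}
```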
步骤S904:网络设备将配音素材发送给第一终端。
在配音角色数量等于1的情况下,第一终端无需分配和选择配音角色,网络设备自动将消音后的截取后的视频片段作为配音素材发送给第一终端。
步骤S705:在第一终端分析视频截取片段的配音角色数量不为0的情况下,第一终端检测到针对所述配音间的第一输入操作后,显示配音界面。
示例性的,所述第一输入操作可以为上述图2A实施例中针对开始配音控件215的输入操作(例如单击),也可以为对配音间输入语音指令的操作,本申请实施例仅作举例说明,不 做限制;配音界面可以为上述图3A实施例中的配音界面310,对于配音界面的布局和配置,本申请实施例仅作举例说明,不做限制。另外,在第一终端检测到针对所述配音间的第一输入操作后,接入配音间的第二终端也显示配音界面。
需要说明的是,第一终端是创建配音间的主设备,相较于第二终端,第一终端拥有更多的操作权限;例如,第一终端有暂停/开始配音模式的权限、回放配音视频的权限以及开启/关闭语音通话模式的权限。示例性的,第一终端执行暂停/开始配音模式的权限可以如上述图3A实施例中,检测到对播放/暂停控件313的输入操作(例如单击),来开启/暂停配音模式,本申请实施例仅作举例说明,不做限制;当处于配音模式时,第一终端和第二终端实时采集外部音频,并在其配音界面的第一展示框中播放显示所述配音素材。在处于配音模式的情况下,当第一终端检测到针对所述配音界面的第三输入操作后,暂停配音模式;所述第三输入操作可以为上述图3A实施例中,检测到对播放/暂停控件313的输入操作(例如单击),本申请实施例仅举例说明,不做限制。
示例性的，第一终端执行开启/关闭语音通话模式的权限可以如上述图3B实施例中，检测到对语音通话功能控件3191的输入操作（例如单击），来开启/关闭语音通话模式。当第一终端开启语音通话模式后，会向接入配音间中的第二终端发送用于指示其进入语音通话模式的第三指令；语音通话模式的界面可以为上述图3C实施例中的配音界面310，本申请实施例仅作举例说明，不做限制。在语音通话模式下，接入配音间内的终端可以实时采集用户音频，并将采集到的用户音频实时传输至配音间中的语音通话平台，由语音通话平台将所述用户实时音频发送给接入配音间内的所有终端，实现多用户实时语音交流。
当第一终端处于暂停配音模式的情况下,第一终端检测到针对配音界面的第五输入操作来执行回放配音视频的权限。示例性的,第五输入操作可以为上述图3B实施例中针对回放配音功能控件3192的单击等输入操作,也可以为语音指令等输入操作,本申请实施例仅作举例说明,不做限制;当检测到第五输入操作后,第一终端显示回放界面,所述回放界面包括展示回放配音作品的第二展示框。示例性的,回放界面可以为上述图3E实施例中的配音回放界面330,第二展示框可以为上述图3E实施例中的配音作品展示区域331;对于第二展示框的形状以及在回放界面的位置布局,本申请实施例仅做举例说明,不做任何限制。在第一终端检测到针对回放界面的第六输入操作后,在第二展示框中回放第一视频片段并回放在配音模式下接入配音间内的终端采集的用户音频;其中,第六输入操作为回放配音视频的输入操作,配音模式为接入配音间的终端设备在其配音界面中播放消音后的配音素材,并实时采集外部的音频的工作模式,第一视频片段为所述配音素材中已配音的视频片段;示例性的,第六输入操作可以为上述图3E实施例中针对进度拖条333的输入操作(例如左滑/右滑),也可以是针对后退控件334/前进控件335的输入操作(例如单击),本申请实施例仅做举例说明,不做限制。第一终端回放配音作品的具体操作和过程可参考上述图3E实施例的具体内容,此处不再赘述。
步骤S706:第一终端检测到针对所述配音界面的第七输入操作后,显示预览界面。
示例性的,第七输入操作可以为上述图3A实施例中,针对提交控件315的输入操作(例如单击),也可以为用户的语音输入操作,本申请实施例仅作举例说明,不做限制。预览界面包括用于显示第二视频片段的第三展示框;其中,所述第二视频片段为所述配音素材中已配音的视频片段。示例性的,预览界面可以为上述图5A实施例中的预览界面510,第三展示框可以为视频播放区域511,对于预览界面的布局和配置以及第三展示框的形状以及在预览界面中的位置布局,本申请实施例仅做举例说明,不做限制。
在一种可能实现的方式中,第一终端检测到针对作品预览界面的第八输入操作后,显示剪切界面,该剪切界面包括用于播放剪切后第二视频片段的第四展示框。示例性的,剪切界面可以为上述图5C实施例中的视频剪切界面530,本申请实施例对剪切界面的布局和配置仅做举例说明,不做限制;第四展示框可以为上述图5C实施例中的配音视频播放区域531,本申请实施例对第四展示框的形状以及在剪切界面中的位置仅做举例说明,不做限制。当第一终端检测到针对剪切界面的第九输入操作后,执行该输入操作,剪切第二视频片段和在配音模式下接入配音间内的终端采集的用户音频,得到剪切后的第二视频片段以及剪切后的音频;示例性的,第九输入操作可以为上述图5C实施例中针对第一剪切控件532/第二剪切控件533的输入操作(例如单击),本申请实施例对于第九输入操作仅做举例说明,对于第九输入操作的其它形式,本申请实施例不做限制。第一终端剪切配音视频的具体过程和内容请参考上述图5C实施例中的具体内容,本申请实施例在此不做赘述。
在一种可能实现的方式中,第一终端检测到针对预览界面的作品调节操作后,响应该作品调节操作,对第二视频片段以及在配音模式下接入配音间内的终端采集的用户音频进行调节和修改得到调节修改后的第二视频片段以及调节修改后的音频;示例性,作品调节操作可以为上述图5A实施例中针对人声调节功能控件513的输入操作(例如单击),也可以为针对人声音量调节控件515的输入操作(例如左滑/右滑),也可以为针对背景音量调节控件516的输入操作(例如左滑/右滑)等其他控件的输入操作,本申请实施例不做限制。
步骤S707:所述第一终端检测到针对配音视频的上传指令后,将所述配音视频上传网络设备。
具体地,若在预览界面中,第一终端没有检测到所述第八输入操作或作品调节操作,所述配音视频为包括第二视频片段和在配音模式下接入配音间内的终端采集的用户音频的视频片段;若在预览界面中,第一终端检测到所述第八输入操作,所述视频片段为包括剪切后的第二视频片段和剪切后的音频的视频片段;若在预览界面中,第一终端检测到作品调节操作,所述配音视频为包括调节修改后的第二视频片段和调节修改后的音频的视频片段。第一终端检测到上传指令后,将配音视频中的视频和音频进行编码,并将编码后的视频和音频以数据流的形式发送给网络设备。示例性的,上传指令可以是上述图5A实施例中针对生成作品控件518的输入操作(例如单击),本申请实施例仅做举例说明,不做限制。
步骤S708:网络设备将配音视频中的视频和音频进行转码合成,生成配音作品并将配音作品上传至参与配音的终端的配音个人主页。
具体的,所述终端检测到针对配音个人主页查看作品的触发操作后,即可展示并播放配音作品。
本申请实施例，第一终端检测到截取配音操作指令后截取在其视频应用中播放的视频，当用户想为所述截取视频配音时，第一终端可以创建以所述截取视频为配音素材的配音间；使得用户可配音的素材更加丰富。另外，本申请实施例所述的视频配音方法支持多人实时配音，当配音角色的数量大于1时，第一终端可以邀请好友进入配音间实时配音，相比传统多人依次配音的方法，用户在配音时，不再对空说话，提高了用户的配音体验。
接下来介绍本申请实施例中的第一终端的结构。如图10所示,第一终端1000至少可以包括:可以包括处理器1010,外部存储器接口1020,内部存储器1021,天线1,天线2,移动通信模块1040,无线通信模块1050,音频模块1060,扬声器1060A,受话器1060B,麦克风1061,传感器模块1070,马达1081,显示屏1091等。其中传感器模块1070可以包括压 力传感器1070A,触摸传感器1070B等。
可以理解的是,本发明实施例示意的结构并不构成对第一终端1000的具体限定。在本申请另一些实施例中,第一终端1000可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器1010可以包括一个或多个处理单元,例如:处理器1010可以包括应用处理器(Application Processor,AP),调制解调处理器,图形处理器(Graphics Processing Unit,GPU),图像信号处理器(Image Signal Processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(Digital Signal Processor,DSP),基带处理器,和/或神经网络处理器(Neural-Network Processing Unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
其中,控制器可以是第一终端1000的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器1010中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器1010中的存储器为高速缓冲存储器。该存储器可以保存处理器1010刚用过或循环使用的指令或数据。如果处理器1010需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器1010的等待时间,因而提高了系统的效率。
在一些实施例中,处理器1010可以包括一个或多个接口。接口可以包括集成电路(Inter-Integrated Circuit,I2C)接口,集成电路内置音频(Inter-Integrated Circuit Sound,I2S)接口,脉冲编码调制(Pulse Code Modulation,PCM)接口,通用异步收发传输器(Universal Asynchronous Receiver/Transmitter,UART)接口,移动产业处理器接口(Mobile Industry Processor Interface,MIPI),通用输入输出(General-Purpose Input/Output,GPIO)接口,用户标识模块(Subscriber Identity Module,SIM)接口,和/或通用串行总线(Universal Serial Bus,USB)接口等。
I2C接口是一种双向同步串行总线,包括一根串行数据线(Serial Data Line,SDA)和一根串行时钟线(Derail Clock Line,SCL)。在一些实施例中,处理器1010可以包含多组I2C总线。处理器1010可以通过不同的I2C总线接口分别耦合触摸传感器1070B,充电器,闪光灯等。例如:处理器1010可以通过I2C接口耦合触摸传感器1070B,使处理器1010与触摸传感器1070B通过I2C总线接口通信,实现第一终端1000的触摸功能。
I2S接口可以用于音频通信。在一些实施例中,处理器1010可以包含多组I2S总线。处理器1010可以通过I2S总线与音频模块1060耦合,实现处理器1010与音频模块1060之间的通信。在一些实施例中,音频模块1060可以通过I2S接口向无线通信模块1050传递音频信号,实现通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块1060与无线通信模块1050可以通过PCM总线接口耦合。在一些实施例中,音频模块1060也可以通过PCM接口向无线通信模块1050传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器1010与无线通信模块1050。例如:处理器1010通过UART接口与无线通信模块1050中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块1060可以通过UART 接口向无线通信模块1050传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器1010与显示屏1091等外围器件。MIPI接口包括显示屏串行接口(Display Serial Interface,DSI)等。在一些实施例中,处理器1010和显示屏1091通过DSI接口通信,实现第一终端1000的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器1010与显示屏1091,无线通信模块1050,音频模块1060,传感器模块1070等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
可以理解的是,本发明实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对第一终端1000的结构限定。在本申请另一些实施例中,第一终端1000也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
第一终端1000的无线通信功能可以通过天线1,天线2,移动通信模块1040,无线通信模块1050,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。第一终端1000中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块1040可以提供应用在第一终端1000上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块1040可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(Low Noise Amplifier,LNA)等。移动通信模块1040可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块1040还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块1040的至少部分功能模块可以被设置于处理器1010中。在一些实施例中,移动通信模块1040的至少部分功能模块可以与处理器1010的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器1060A,受话器1060B等)输出声音信号,或通过显示屏1091显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器1010,与移动通信模块1040或其他功能模块设置在同一个器件中。
无线通信模块1050可以提供应用在第一终端1000上的包括无线局域网(Wireless Local Area Networks,WLAN)(如无线保真(Wireless Fidelity,Wi-Fi)网络),蓝牙(BlueTooth,BT),全球导航卫星系统(Global Navigation Satellite System,GNSS),调频(Frequency Modulation,FM),近距离无线通信技术(Near Field Communication,NFC),红外技术(InfRared,IR)等无线通信的解决方案。无线通信模块1050可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块1050经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器1010。无线通信模块1050还可以从处理器1010接收待发送的信号,对其进行调频、放大,经天线2转为电磁波辐射出去。
在一些实施例中,第一终端1000的天线1和移动通信模块1040耦合,天线2和无线通信模块1050耦合,使得第一终端1000可以通过无线通信技术与网络以及其他设备通信。所 述无线通信技术可以包括全球移动通讯系统(Global System For Mobile Communications,GSM),通用分组无线服务(General Packet Radio Service,GPRS),码分多址接入(Code Division Multiple Access,CDMA),宽带码分多址(Wideband Code Division Multiple Access,WCDMA),时分码分多址(Time-Division Code Division Multiple Access,TD-SCDMA),长期演进(Long Term Evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(Global Positioning System,GPS),全球导航卫星系统(Global Navigation Satellite System,GLONASS),北斗卫星导航系统(Beidou Navigation Satellite System,BDS),准天顶卫星系统(Quasi-Zenith Satellite System,QZSS)和/或星基增强系统(Satellite Based Augmentation Systems,SBAS)。
第一终端1000通过GPU,显示屏1091,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏1091和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器1010可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏1091用于显示图像,视频等。显示屏1091包括显示面板。显示面板可以采用液晶显示屏(Liquid Crystal Display,LCD),有机发光二极管(Organic Light-Emitting Diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(Active-Matrix Organic Light Emitting Diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(Quantum Dot Light Emitting Diodes,QLED)等。在一些实施例中,第一终端1000可以包括1个或N个显示屏1091,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当第一终端1000在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。第一终端1000可以支持一种或多种视频编解码器。这样,第一终端1000可以播放或录制多种编码格式的视频,例如:动态图像专家组(Moving Picture Experts Group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(Neural-Network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现第一终端1000的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口1020可以用于连接外部存储卡,例如Micro SD卡,实现扩展第一终端1000的存储能力。外部存储卡通过外部存储器接口1020与处理器1010通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器1021可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器1010通过运行存储在内部存储器1021的指令,从而执行第一终端1000的各种功能应用以及数据处理。内部存储器1021可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储第一终端1000使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器1021可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(Universal Flash Storage,UFS)等。
第一终端1000可以通过音频模块1060,扬声器1060A,受话器1060B,麦克风1061,以及应用处理器等实现音频功能。例如音乐播放,录音、配音等。
音频模块1060用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块1060还可以用于对音频信号编码和解码。在一些实施例中,音 频模块1060可以设置于处理器1010中,或将音频模块1060的部分功能模块设置于处理器1010中。
扬声器1060A,也称“喇叭”,用于将音频电信号转换为声音信号。第一终端1000可以通过扬声器1060A收听音乐,或收听免提通话。
受话器1060B,也称“听筒”,用于将音频电信号转换成声音信号。当第一终端1000接听电话或语音信息时,可以通过将受话器1060B靠近人耳接听语音。
麦克风1061,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风1061发声,将声音信号输入到麦克风1061。第一终端1000可以设置至少一个麦克风1061。在另一些实施例中,第一终端1000可以设置两个麦克风1061,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,第一终端1000还可以设置三个,四个或更多麦克风1061,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。本申请实施例中,麦克风1061可以实时采集用户的音频,以便于处理器1010将用户的音频与所述处理后的配音素材相匹配。
压力传感器1070A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器1070A可以设置于显示屏1091。压力传感器1070A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器1070A,电极之间的电容改变。第一终端1000根据电容的变化确定压力的强度。当有触摸操作作用于显示屏1091,第一终端1000根据压力传感器1070A检测所述触摸操作强度。第一终端1000也可以根据压力传感器1070A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
触摸传感器1070B,也称“触控面板”。触摸传感器1070B可以设置于显示屏1091,由触摸传感器1070B与显示屏1091组成触摸屏,也称“触控屏”。触摸传感器1070B用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏1091提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器1070B也可以设置于第一终端1000的表面,与显示屏1091所处的位置不同。
马达1081可以产生振动提示。马达1081可以用于来电振动提示，也可以用于触摸振动反馈。例如，作用于不同应用（例如拍照，音频播放等）的触摸操作，可以对应不同的振动反馈效果。作用于显示屏1091不同区域的触摸操作，马达1081也可对应不同的振动反馈效果。不同的应用场景（例如：时间提醒，接收信息，闹钟，游戏等）也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
第二终端的结构请参考图10实施例中第一终端的结构,本申请实施例不再赘述。
第一终端1000的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本发明实施例以分层架构的Android系统为例,示例性说明第一终端1000的软件结构。请参见图11,图11是本申请实施例提供的一种第一终端1000的软件结构框图。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包。
如图11所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序。
应用程序框架层为应用程序层的应用程序提供应用编程接口(Application Programming Interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图11所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供第一终端1000的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(Surface Manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
下面结合视频配音场景，示例性说明第一终端1000软件以及硬件的工作流程。
当触摸传感器1070B接收到触摸操作,相应的硬件中断被发给内核层。内核层将触摸操作加工成原始输入事件(包括触摸坐标,触摸操作的时间戳等信息)。原始输入事件被存储在 内核层。应用程序框架层从内核层获取原始输入事件,识别该输入事件所对应的控件。以该触摸操作是触摸单击操作,该单击操作所对应的控件为上述图2A实施例中开始配音控件215为例,视频应用调用应用框架层的接口,启动配音功能,进而通过调用内核层启动麦克风驱动,通过麦克风实时采集用户音频,将用户音频与配音素材匹配。
第二终端的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,本申请的各实施方式可以任意进行组合,以实现不同的技术效果。第二终端的软件系统请参考图11中的实施例,本申请实施例不再赘述。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘Solid State Disk)等。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,该流程可以由计算机程序来指令相关的硬件完成,该程序可存储于计算机可读取存储介质中,该程序在执行时,可包括如上述各方法实施例的流程。而前述的存储介质包括:ROM或随机存储记忆体RAM、磁碟或者光盘等各种可存储程序代码的介质。
总之,以上所述仅为本发明技术方案的实施例而已,并非用于限定本发明的保护范围。凡根据本发明的揭露,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。

Claims (18)

  1. 一种视频配音的方法,其特征在于,包括:
    第一终端检测到针对当前显示视频的截取配音操作指令后,截取所述当前显示视频,得到视频截取片段,并显示视频配音控件;
    所述第一终端检测到针对所述视频配音控件的触发操作后,创建并显示针对所述视频截取片段的配音间;
    在所述第一终端分析所述视频截取片段的配音角色数量不为0的情况下,所述第一终端检测到针对所述配音间的第一输入操作后,显示配音界面;其中,所述配音界面包括第一展示框,所述第一展示框用于显示和播放配音素材。
  2. 如权利要求1所述的方法,其特征在于,所述第一终端检测到针对所述视频配音控件的触发操作后,创建并显示针对所述视频截取片段的配音间之后,包括:
    所述第一终端向网络设备发送请求消息;所述请求消息包括所述视频截取片段的原始视频ID、所述视频截取片段的起始时间以及所述视频截取片段的结束时间;
    所述第一终端接收所述网络设备发送的第一响应;所述第一响应包括配音角色的数量信息;
    所述第一终端基于所述配音角色信息执行第一操作。
  3. 如权利要求2所述的方法,其特征在于,所述第一终端基于所述配音角色信息执行第一操作,包括:
    若所述配音角色的数量为0,所述第一终端显示第一提示信息;所述第一提示信息用于指示所述视频截取片段不可用;
    若所述配音角色的数量大于1,所述第一终端检测到针对所述配音间的第二输入操作后,向第二终端发送第一指令;所述第一指令用于指示所述第二终端的视频应用账号接入所述配音间;
    在所述第二终端的视频应用账号接入所述配音间的情况下,所述第一终端分配配音角色并生成第一信息;所述第一信息用于指示接入所述配音间内终端的视频应用账号与所述配音角色的对应关系;
    所述第一终端向所述网络设备发送所述第一信息;
    所述第一终端接收所述网络设备发送的所述的配音素材;所述配音素材是所述网络设备基于所述第一信息得到的。
  4. 如权利要求3所述的方法,其特征在于,所述第一终端分配配音角色并生成第一信息,包括:
    所述第一终端绑定所述配音角色与接入所述配音间内终端的视频应用账号;
    所述第一终端生成所述第一信息;所述第一信息用于指示接入所述配音间内终端的视频应用账号与所述配音角色的对应关系;
    所述第一终端向所述第二终端发送通知消息;所述通知消息用于指示所述第二终端的视频应用账号所分配的配音角色。
  5. 如权利要求3所述的方法,其特征在于,所述第一终端分配配音角色并生成第一信息,包括:
    所述第一终端向所述第二终端发送第二指令;所述第二指令用于指示所述第二终端的视频应用账号选择配音角色;
    所述第一终端接收所述第二终端发送的确认消息;所述确认消息用于指示所述第二终端选择的配音角色;
    所述第一终端基于所述确认消息生成所述第一信息;所述第一信息用于指示接入所述配音间内终端的视频应用账号与所述配音角色的对应关系。
  6. 如权利要求3-5任一项所述的方法,其特征在于,所述第一终端基于所述配音角色信息执行第一操作之后,包括:
    所述第一终端检测到针对所述配音界面的第三输入操作后,暂停配音模式;其中,所述配音模式为:所述第一终端实时采集外部音频作为配音音频并在所述第一展示框中播放所述配音素材;
    在暂停配音模式的情况下,若所述第一终端检测到针对所述配音界面的第四输入操作后,所述第一终端向所述第二终端发送第三指令;所述第三指令用于指示所述第二终端的视频应用账号进入语音通话模式。
  7. 如权利要求3-5任一项所述的方法,其特征在于,所述第一终端基于所述配音角色信息执行第一操作之后,包括:
    所述第一终端检测到针对所述配音界面的第三输入操作后,暂停配音模式;其中,所述配音模式为:所述第一终端实时采集外部音频作为配音音频并在所述第一展示框中播放所述配音素材;
    在暂停配音模式的情况下,若所述第一终端检测到针对所述配音界面的第五输入操作,所述第一终端显示回放界面;所述回放界面包括第二展示框;
    所述第一终端检测到针对所述回放界面的第六输入操作后,在所述第二展示框中回放第一视频片段并回放所述第一终端和所述第二终端在所述配音模式下实时采集的外部音频;其中,所述第一视频片段为所述配音素材中已配音的视频片段。
  8. 如权利要求3-7任一项所述的方法,其特征在于,所述第一终端检测到针对所述配音间的第一输入操作后,显示配音界面之后,包括:
    所述第一终端检测到针对所述配音界面的第七输入操作后,显示预览界面;所述预览界面包括第三展示框,所述第三展示框用于显示第二视频片段;其中,所述第二视频片段为所述配音素材中已配音的视频片段;
    所述第一终端检测到针对所述预览界面的第八输入操作后,显示剪切界面;所述剪切界面包括第四展示框,所述第四展示框用于显示剪切后的所述第二视频片段;
    所述第一终端检测到针对所述剪切界面的第九输入操作后,剪切所述第二视频片段与所述第一终端和所述第二终端在配音模式下实时采集的外部音频。
  9. 一种视频配音的方法,其特征在于,包括:
    网络设备接收第一终端发送的请求消息;所述请求消息包括视频截取片段的原始视频ID、 视频截取片段的起始时间以及视频截取片段的结束时间;
    所述网络设备基于所述视频截取片段的原始视频ID从视频资源库中找到所述视频截取片段的原始视频;
    所述网络设备基于所述视频截取片段的起始时间以及所述视频截取片段的结束时间在所述原始视频中获取所述视频截取片段的播放位置;
    所述网络设备基于所述视频截取片段在所述原始视频中的播放位置分析所述截取视频中的可配音的角色,并得到配音角色的数量信息;
    所述网络设备基于所述配音角色的数量信息生成第一响应;
    所述网络设备将所述第一响应发送给所述第一终端。
  10. 如权利要求9所述的方法,其特征在于,所述网络设备基于所述配音角色信息生成第一响应之后,包括:
    所述网络设备接收所述第一终端发送的第一信息;所述第一信息用于指示接入所述配音间内终端的视频应用账号与所述配音角色的对应关系;
    所述网络设备截取所述视频截取片段在其原始视频中播放位置对应的视频片段,得到截取后的视频片段;
    所述网络设备基于所述第一信息对所述截取后的视频片段中已分配的配音角色进行消音处理得到处理后的配音素材;
    所述网络设备将所述处理后的配音素材发送给所述第一终端。
  11. 一种视频配音的方法,其特征在于,包括:
    第二终端接收第一终端发送的第一指令;所述第一指令用于指示所述第二终端的视频应用账号接入所述第一终端创建的配音间;
    所述第二终端响应所述第一指令,将其视频应用账号接入所述第一终端创建的配音间。
  12. 如权利要求11所述的方法,其特征在于,所述第二终端响应所述第二请求消息,接入所述第一终端创建的配音间之后,包括:
    所述第二终端接收所述第一终端发送的通知消息;所述通知消息用于指示所述第二终端的视频应用账号所分配的配音角色。
  13. 如权利要求11所述的方法,其特征在于,所述第二终端响应所述第二请求消息,接入所述第一终端创建的配音间之后,包括:
    所述第二终端接收所述第一终端发送的第二指令;所述第二指令用于指示第二终端的视频应用账号选择配音角色;
    所述第二终端向所述第一终端发送确认消息;所述确认消息用于指示所述第二终端的视频应用账号选择的配音角色。
  14. 如权利要求11所述的方法,其特征在于,所述第二终端响应所述第一指令,将其视频应用的用户账号接入所述第一终端创建的配音间之后,还包括:
    所述第二终端接收所述第一终端发送的第三指令;所述第三指令用于指示所述第二终端的视频应用账号进入语音通话模式;
    所述第二终端响应所述第三指令,令其视频应用账号进入所述语音通话模式。
  15. 一种终端,其特征在于,包括:存储器、处理器、通信模块和触控屏;其中:
    所述触控屏用于显示内容;
    所述通信模块用于向其它终端或网络设备通信;
    所述存储器,用于存储计算机程序,所述计算机程序包括程序指令;
    所述处理器用于调用所述程序指令,使得所述终端执行如权利要求1-8任一项所述的方法。
  16. 一种网络设备,其特征在于,包括:存储器、处理器和通信模块;其中:
    所述通信模块用于向其它终端或网络设备通信;
    所述存储器,用于存储计算机程序,所述计算机程序包括程序指令;
    所述处理器用于调用所述程序指令,使得所述网络设备执行如权利要求9-10任一项所述的方法。
  17. 一种终端,其特征在于,包括:存储器、处理器、通信模块和触控屏;其中:
    所述触控屏用于显示内容;
    所述通信模块用于向其它终端或网络设备通信;
    所述存储器,用于存储计算机程序,所述计算机程序包括程序指令;
    所述处理器用于调用所述程序指令,使得所述终端执行如权利要求11-14任一项所述的方法。
  18. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,该计算机程序被处理器执行时,实现如权利要求1-14任意一项所述的方法。
PCT/CN2022/077496 2021-02-24 2022-02-23 一种视频配音的方法、相关设备以及计算机可读存储介质 WO2022179530A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP22758894.4A EP4284005A1 (en) 2021-02-24 2022-02-23 Video dubbing method, related device, and computer readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110205548.XA CN115037975B (zh) 2021-02-24 2021-02-24 一种视频配音的方法、相关设备以及计算机可读存储介质
CN202110205548.X 2021-02-24

Publications (1)

Publication Number Publication Date
WO2022179530A1 true WO2022179530A1 (zh) 2022-09-01

Family

ID=83047758

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/077496 WO2022179530A1 (zh) 2021-02-24 2022-02-23 一种视频配音的方法、相关设备以及计算机可读存储介质

Country Status (3)

Country Link
EP (1) EP4284005A1 (zh)
CN (1) CN115037975B (zh)
WO (1) WO2022179530A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080131095A1 (en) * 2006-12-01 2008-06-05 Toshiya Mizushima Information recording and/or playback apparatus
CN106293347A (zh) * 2016-08-16 2017-01-04 广东小天才科技有限公司 一种人机交互的学习方法及装置、用户终端
KR20170004774A (ko) * 2015-07-03 2017-01-11 삼성전자주식회사 디스플레이장치, 서버 및 그 제어방법
CN106911900A (zh) * 2017-04-06 2017-06-30 腾讯科技(深圳)有限公司 视频配音方法及装置
CN107659850A (zh) * 2016-11-24 2018-02-02 腾讯科技(北京)有限公司 媒体信息处理方法和装置
CN110650366A (zh) * 2019-10-29 2020-01-03 成都超有爱科技有限公司 互动配音方法、装置、电子设备及可读存储介质
CN110753263A (zh) * 2019-10-29 2020-02-04 腾讯科技(深圳)有限公司 视频配音方法、装置、终端及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105959773B (zh) * 2016-04-29 2019-06-18 魔方天空科技(北京)有限公司 多媒体文件的处理方法和装置
WO2020081872A1 (en) * 2018-10-18 2020-04-23 Warner Bros. Entertainment Inc. Characterizing content for audio-video dubbing and other transformations
CN110933330A (zh) * 2019-12-09 2020-03-27 广州酷狗计算机科技有限公司 视频配音方法、装置、计算机设备及计算机可读存储介质
CN112261435B (zh) * 2020-11-06 2022-04-08 腾讯科技(深圳)有限公司 一种社交互动方法、装置、系统、设备及存储介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080131095A1 (en) * 2006-12-01 2008-06-05 Toshiya Mizushima Information recording and/or playback apparatus
KR20170004774A (ko) * 2015-07-03 2017-01-11 삼성전자주식회사 디스플레이장치, 서버 및 그 제어방법
CN106293347A (zh) * 2016-08-16 2017-01-04 广东小天才科技有限公司 一种人机交互的学习方法及装置、用户终端
CN107659850A (zh) * 2016-11-24 2018-02-02 腾讯科技(北京)有限公司 媒体信息处理方法和装置
CN106911900A (zh) * 2017-04-06 2017-06-30 腾讯科技(深圳)有限公司 视频配音方法及装置
CN110650366A (zh) * 2019-10-29 2020-01-03 成都超有爱科技有限公司 互动配音方法、装置、电子设备及可读存储介质
CN110753263A (zh) * 2019-10-29 2020-02-04 腾讯科技(深圳)有限公司 视频配音方法、装置、终端及存储介质

Also Published As

Publication number Publication date
CN115037975B (zh) 2024-03-01
CN115037975A (zh) 2022-09-09
EP4284005A1 (en) 2023-11-29

Similar Documents

Publication Publication Date Title
JP7414842B2 (ja) コメント追加方法及び電子デバイス
US11818420B2 (en) Cross-device content projection method and electronic device
JP7324313B2 (ja) 音声対話方法及び装置、端末、並びに記憶媒体
CN112394895B (zh) 画面跨设备显示方法与装置、电子设备
CN108924464B (zh) 视频文件的生成方法、装置及存储介质
WO2016177296A1 (zh) 一种生成视频的方法和装置
US11489972B2 (en) Method for presenting video on electronic device when there is incoming call and electronic device
WO2021249318A1 (zh) 一种投屏方法和终端
KR20160026317A (ko) 음성 녹음 방법 및 장치
RU2619089C2 (ru) Способ и устройство для воспроизведения множества видео
CN114040242B (zh) 投屏方法、电子设备和存储介质
WO2022078295A1 (zh) 一种设备推荐方法及电子设备
WO2022048347A1 (zh) 一种视频编辑方法及设备
WO2022042769A2 (zh) 多屏交互的系统、方法、装置和介质
US20240053868A1 (en) Feedback method, apparatus, and system
WO2023005711A1 (zh) 一种服务的推荐方法及电子设备
CN115098449B (zh) 一种文件清理方法及电子设备
WO2022179530A1 (zh) 一种视频配音的方法、相关设备以及计算机可读存储介质
WO2022267640A1 (zh) 视频共享方法、电子设备及存储介质
CN116193179A (zh) 会议记录方法、终端设备和会议记录系统
CN114079691A (zh) 一种设备识别方法及相关装置
CN114244955A (zh) 一种服务的分享方法、系统及电子设备
EP4167580A1 (en) Audio control method, system, and electronic device
WO2024113999A1 (zh) 游戏管理的方法及终端设备
CN118051289A (zh) 页面播报模式的切换方法、电子设备及可读介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22758894

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022758894

Country of ref document: EP

Effective date: 20230821

NENP Non-entry into the national phase

Ref country code: DE