WO2023051246A1 - Video recording method, apparatus, device, and storage medium - Google Patents

Video recording method, apparatus, device, and storage medium

Info

Publication number
WO2023051246A1
WO2023051246A1 · PCT/CN2022/118698 · CN2022118698W
Authority
WO
WIPO (PCT)
Prior art keywords
target
special effect
audio
matching degree
image
Prior art date
Application number
PCT/CN2022/118698
Other languages
English (en)
French (fr)
Inventor
吴紫阳
陶璐
朱意星
王燚
李松达
冯穗豫
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司 filed Critical 北京字节跳动网络技术有限公司
Publication of WO2023051246A1 publication Critical patent/WO2023051246A1/zh

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations

Definitions

  • Embodiments of the present disclosure relate to the field of Internet technology, for example, to a video recording method, apparatus, device, and storage medium.
  • In the related art, the way a terminal device records the audio data and images of a user singing a target song through a video application is relatively monotonous, and the user experience is poor.
  • Embodiments of the present disclosure provide a video recording method, apparatus, device, and storage medium for recording the audio data and images of a user singing a song, which can make recorded videos more entertaining and improve the user experience.
  • In a first aspect, an embodiment of the present disclosure provides a video recording method, including: collecting voice data and images of a target user; determining the matching degree between the voice data and a reference audio; determining a target special effect according to the matching degree; adding the target special effect to the collected images to obtain target images; and performing audio and video encoding on the voice data and the target images to obtain a target video.
  • In a second aspect, an embodiment of the present disclosure further provides a video recording apparatus, including:
  • a collection module configured to collect voice data and images of a target user;
  • a matching degree determination module configured to determine the matching degree between the voice data and a reference audio;
  • a target special effect determination module configured to determine a target special effect according to the matching degree;
  • a target image acquisition module configured to add the target special effect to the collected images to obtain target images; and
  • a target video acquisition module configured to perform audio and video encoding on the voice data and the target images to obtain a target video.
  • In a third aspect, an embodiment of the present disclosure further provides an electronic device, including: one or more processing devices; and a storage device configured to store one or more programs.
  • When the one or more programs are executed by the one or more processing devices, the one or more processing devices implement the video recording method according to the embodiments of the present disclosure.
  • In a fourth aspect, the embodiments of the present disclosure further provide a computer-readable medium storing a computer program which, when executed by a processing device, implements the video recording method described in the embodiments of the present disclosure.
  • FIG. 1 is a flowchart of a video recording method in an embodiment of the present disclosure
  • FIG. 2 is a schematic structural diagram of a video recording device in an embodiment of the present disclosure
  • Fig. 3 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
  • the term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • Fig. 1 is a flowchart of a video recording method provided by an embodiment of the present disclosure. This embodiment is applicable to recording a video of a user singing a song.
  • The method may be performed by a video recording apparatus, which may be implemented in hardware and/or software and can generally be integrated into a device with a video recording function; the device may be an electronic device such as a server, a mobile terminal, or a server cluster. As shown in Figure 1, the method includes the following steps:
  • Step 110: collect voice data and images of the target user.
  • the voice data may be a voice produced by the user imitating a certain reference audio or singing a target song.
  • the image can be a half-length or full-body portrait (including a face) of the target user.
  • the target user can cause the terminal device to collect voice data and images by triggering a recording instruction.
  • the target user can trigger the recording instruction by clicking the recording button on the interface, or by voice or gesture.
  • when the terminal device receives the recording instruction triggered by the target user, it controls the voice collection module (e.g., a microphone) and the image collection module (e.g., a camera) to start working, so as to collect the target user's singing audio data and images.
  • before collecting the voice data and images of the target user, the following steps are also included: receiving the reference audio selected by the target user; segmenting the reference audio to obtain multiple sub-audios; and playing the multiple sub-audios in sequence according to the timestamps, so that the target user imitates the played sub-audio to produce speech.
  • the reference audio may be a song, a piece of audio played by a musical instrument, an animal call, or the like; the reference audio is not limited here. The reference audio may be segmented by duration, for example into 5-second segments, or, if the reference audio contains text content, it may be divided according to the text content: the text content is first split into sentences, and the reference audio is segmented according to the sentence boundaries.
  • after the reference audio is segmented, the sub-audios are played segment by segment according to the timestamps, so that the target user imitates each sub-audio segment to produce speech.
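As an illustrative sketch only (not the disclosure's implementation), duration-based segmentation of a reference audio buffer might look like the following; the 16 kHz sample rate and 5-second segment length are assumed values for the example:

```python
# Hypothetical sketch: split a mono audio buffer into fixed-duration
# sub-audios, each tagged with the timestamp (in seconds) at which it
# should be played back in sequence.

def segment_audio(samples, sample_rate=16000, segment_seconds=5):
    """Split `samples` into a list of (timestamp_seconds, chunk) pairs."""
    step = sample_rate * segment_seconds
    segments = []
    for start in range(0, len(samples), step):
        timestamp = start / sample_rate
        segments.append((timestamp, samples[start:start + step]))
    return segments

# Example: 12 seconds of dummy audio at 16 kHz yields three sub-audios
# with timestamps 0 s, 5 s, and 10 s; the last one is only 2 s long.
audio = [0.0] * (16000 * 12)
subs = segment_audio(audio)
```

Sentence-based segmentation would work the same way, except that the split points would come from the sentence boundaries of the text content rather than a fixed step.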
  • when the reference audio is a target song, the process of segmenting the reference audio to obtain multiple sub-audios may be: receiving the target song selected by the target user; obtaining the Musical Instrument Digital Interface (MIDI) file and lyrics of the target song; and decomposing the lyrics into multiple lyric sentences. Playing the multiple sub-audios in sequence according to the timestamps, so that the target user imitates the played sub-audio to produce speech, may then be: playing the multiple lyric sentences and the MIDI file in sequence according to the timestamps, so that the target user sings the target song according to the displayed lyric sentence and the melody corresponding to the MIDI file.
  • the target song may be a song selected by the user from a song library, and the MIDI file may be understood as a music file in MIDI format.
  • Playing the MIDI file according to the timestamp can be understood as playing the melody corresponding to the MIDI file in sequence according to the timestamp; playing the lyrics of multiple sentences in sequence according to the timestamp can be understood as displaying the lyrics of multiple sentences on the interface according to the timestamp.
  • Step 120: determine the matching degree between the voice data and the reference audio.
  • the matching degree may be characterized by the similarity between the voice data and the reference audio. In this embodiment, if the similarity between the voice data and the reference audio is less than a preset threshold (or preset threshold range), the matching degree is low; if the similarity is greater than or equal to the preset threshold (or preset threshold range), the matching degree is high.
  • the manner of determining the matching degree between the voice data and the reference audio may be: extracting the voice features of the voice data and the audio features of the reference audio; determining the similarity between the voice features and the audio features; and determining the similarity as the matching degree between the voice data and the reference audio.
  • the audio feature may be characterized by a pitch difference sequence.
  • the process of extracting the voice features of the voice data and the audio features of the reference audio may be: performing note segmentation and quantization on the voice data, and building a pitch difference sequence of the voice data from the quantized notes to obtain the voice pitch difference sequence, i.e. the voice features; and obtaining the reference pitch difference sequence of the reference audio, i.e. the audio features. Multiple distances between the voice pitch difference sequence and the reference pitch difference sequence are then calculated and combined to obtain the similarity between the voice features and the audio features.
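The "pitch difference sequence" idea above can be sketched as follows: after note segmentation and quantization, a melody is represented by the intervals between consecutive note pitches, which makes the feature independent of the key the user happens to sing in. The MIDI pitch numbers below are illustrative, not data from the disclosure:

```python
# Hypothetical sketch of a pitch difference sequence built from
# quantized notes (here given as MIDI pitch numbers).

def pitch_difference_sequence(notes):
    """Return the consecutive pitch intervals of a quantized note sequence."""
    return [b - a for a, b in zip(notes, notes[1:])]

# The same tune sung a whole tone higher yields the same sequence,
# so the feature tolerates the singer choosing a different key:
reference = [60, 62, 64, 62, 60]   # C D E D C
transposed = [62, 64, 66, 64, 62]  # D E F# E D
```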
  • the multiple distances may include a pitch sequence distance, a note duration sequence distance, and an overall matching distance.
  • one manner of combining the multiple distances is to perform a weighted summation of them.
  • the reference pitch difference sequence may be obtained from the song library using a Dynamic Time Warping (DTW) algorithm.
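A minimal sketch of the two ideas above, assuming a textbook DTW cost and made-up weights (the disclosure only says that multiple distances are combined by weighted summation; the mapping from combined distance to a similarity score is an illustrative assumption):

```python
# Hypothetical sketch: DTW distance between a sung pitch-difference
# sequence and a reference one, plus a weighted combination of several
# distances into a single similarity score.

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance with absolute-value cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping paths.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def combined_similarity(distances, weights):
    """Weighted sum of distances, mapped into (0, 1]: smaller distance,
    higher similarity."""
    total = sum(w * dist for w, dist in zip(weights, distances))
    return 1.0 / (1.0 + total)
```

A perfectly matching pair of sequences gives a DTW distance of 0 and thus the maximum similarity of 1.0.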
  • Step 130: determine the target special effect according to the matching degree.
  • the special effect may be a special effect added to the captured image.
  • Special effects include reward special effects and punishment special effects: when the matching degree exceeds a certain value, a reward special effect may be selected; when it falls below that value, a punishment special effect may be selected.
  • For example, reward special effects may beautify the target user, add cute stickers, or add beautiful scenes;
  • punishment special effects may enlarge the target user's head, make the user look fatter, add spoof scenes, and so on.
  • the special effect may be stored in the form of a special effect package (program package), in which the program code for performing special effect processing on an image is written; the image special effect can be added by calling the special effect package.
  • determining the target special effect according to the matching degree can be understood as calling the special effect package corresponding to the target special effect according to the matching degree.
  • the method further includes the following step: pre-establishing a corresponding relationship between the matching degree and the special effect.
  • the manner of determining the target special effect according to the matching degree may be: determining the target special effect corresponding to the matching degree according to the corresponding relationship.
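The pre-established correspondence can be pictured as a simple lookup, as in the sketch below. The threshold value, effect-package names, and the `index` parameter are all illustrative assumptions; the disclosure only specifies that reward effects correspond to high matching degrees and punishment effects to low ones:

```python
# Hypothetical correspondence between matching degree and special
# effect packages. Names echo the examples in the text (beautify,
# cute stickers, big head, spoof scenes) but are invented identifiers.

REWARD_EFFECTS = ["beautify", "cute_sticker", "scenic_background"]
PUNISHMENT_EFFECTS = ["big_head", "fatten", "spoof_background"]

def select_target_effect(matching_degree, threshold=0.7, index=0):
    """Pick the effect-package name corresponding to the matching degree."""
    table = REWARD_EFFECTS if matching_degree >= threshold else PUNISHMENT_EFFECTS
    return table[index % len(table)]
```

The variant that also uses the user's characteristic information would first narrow `table` down to the effect set associated with, say, the user's clothing style, before indexing by matching degree.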
  • the manner of determining the target special effect according to the matching degree may also be: performing feature extraction on the target user in the collected image to obtain the characteristic information of the target user; determining the target special effect according to the characteristic information and the matching degree.
  • the characteristic information may be information such as clothing characteristics (such as color and style) of the target user.
  • the process of determining the target special effect according to the characteristic information and the matching degree may be: first obtain the special effect set corresponding to the characteristic information, and then select the target special effect corresponding to the matching degree from the special effect set.
  • Step 140: add the target special effect to the captured images to obtain target images.
  • the special effect can be stored in the form of a special effect package (program package).
  • the program code for performing special effect processing on the image is written in the special effect package, and the image special effect can be added by calling the special effect package.
  • the method of adding target special effects to the captured image and obtaining the target image may be: calling a special effect package corresponding to the target special effect to perform special effect processing on the captured image to obtain the target image.
  • the special effect package is pre-developed by the developer, and the special effect package corresponding to the target special effect is called through the calling interface, so as to realize the special effect processing on the image.
  • the process of adding the target special effect to the captured images may be: adding the target special effect to the images captured between the current matching degree calculation and the next one; or adding the target special effect to a set number of images captured starting from the current matching degree calculation.
  • the matching degree for the singing audio data may be calculated as follows: one matching degree calculation is performed for every N imitated sub-audio segments.
  • N can be a positive integer greater than or equal to 1.
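The windowing described above can be sketched as follows, assuming frames are simply tagged with the name of the effect to apply (the tagging representation and function names are illustrative, not the disclosure's implementation):

```python
# Hypothetical sketch: recompute the matching degree after every N
# imitated sub-audios, and apply the chosen effect either to all frames
# until the next score arrives or to a set number of frames.

def should_recompute(segment_index, n):
    """True when `segment_index` (0-based) completes a batch of N sub-audios."""
    return (segment_index + 1) % n == 0

def apply_effect_window(frames, effect, count=None):
    """Tag captured frames with `effect`; limit to `count` frames if set."""
    window = frames if count is None else frames[:count]
    return [(frame, effect) for frame in window]
```

With `count=None` the effect persists for every frame captured until the next matching degree is computed; with a fixed `count` it applies only to that many frames from the current score onward.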
  • Step 150: perform audio and video encoding on the voice data and the target images to obtain the target video.
  • the solution of this embodiment is also applicable to a multi-person chorus scene: during a chorus, the matching degree can be calculated separately for each user participating in the singing, and reward or punishment special effects can be added to the images based on each matching degree.
  • for the specific process, reference may be made to the foregoing embodiments; details are not repeated here.
  • the technical solution of the embodiments of the present disclosure collects the voice data and images of the target user; determines the matching degree between the voice data and the reference audio; determines the target special effect according to the matching degree; adds the target special effect to the collected images to obtain target images; and performs audio and video encoding on the voice data and the target images to obtain the target video.
  • special effects obtained according to the matching degree are added to the collected images, which can make recorded videos more entertaining, enrich the ways videos are presented, and improve the user experience.
  • Fig. 2 is a schematic structural diagram of a video recording device provided by an embodiment of the present disclosure. As shown in Figure 2, the device includes:
  • Acquisition module 210 is configured to collect voice data and images of the target user
  • the matching degree determining module 220 is configured to determine the matching degree between the speech data and the reference audio
  • the target special effect determining module 230 is configured to determine the target special effect according to the matching degree
  • the target image acquisition module 240 is configured to add the target special effect to the captured image to obtain the target image;
  • the target video acquisition module 250 is configured to perform audio and video encoding on the voice data and the target image to obtain the target video.
  • the video recording apparatus also includes a reference audio playback module configured to: receive the reference audio selected by the target user; segment the reference audio to obtain multiple sub-audios; and play the multiple sub-audios in sequence according to the timestamps, so that the target user imitates the played sub-audio to produce speech.
  • when the reference audio is a target song, the reference audio playback module is further configured to: obtain the MIDI file and lyrics of the target song; decompose the lyrics into multiple lyric sentences; and play the lyric sentences and the MIDI file in sequence according to the timestamps, so that the target user sings the target song according to the displayed lyric sentence and the melody corresponding to the MIDI file.
  • the matching degree determination module 220 is further configured to: extract the voice features of the voice data and the audio features of the reference audio; determine the similarity between the voice features and the audio features; and determine the similarity as the matching degree between the voice data and the reference audio.
  • the video recording apparatus also includes a correspondence establishment module configured to pre-establish the correspondence between matching degrees and special effects.
  • the target special effect determination module 230 is further configured to: determine the target special effect corresponding to the matching degree according to the correspondence.
  • the target special effect determination module 230 is further configured to: perform feature extraction on the target user in the collected images to obtain the target user's characteristic information; and determine the target special effect according to the characteristic information and the matching degree.
  • the target image acquisition module 240 is further configured to: add the target special effect to the images captured between the current matching degree calculation and the next one, or to a set number of images captured starting from the current matching degree calculation.
  • the target image acquisition module 240 is further configured to: invoke the special effect package corresponding to the target special effect to perform special effect processing on the collected images to obtain the target images.
  • the above-mentioned device can execute the methods provided by all the foregoing embodiments of the present disclosure, and has corresponding functional modules and advantageous effects for executing the above-mentioned methods.
  • FIG. 3 shows a schematic structural diagram of an electronic device 300 suitable for implementing an embodiment of the present disclosure.
  • the electronic device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals); fixed terminals such as digital TVs and desktop computers; or various forms of servers, such as standalone servers or server clusters.
  • the electronic device shown in FIG. 3 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.
  • an electronic device 300 may include a processing device (such as a central processing unit or a graphics processing unit) 301, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303. The RAM 303 also stores various programs and data necessary for the operation of the electronic device 300.
  • the processing device 301, ROM 302, and RAM 303 are connected to each other through a bus 304.
  • An input/output (I/O) interface 305 is also connected to the bus 304 .
  • the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 307 including, for example, a liquid crystal display (LCD), speakers, and vibrators; storage devices 308 including, for example, a magnetic tape and a hard disk; and a communication device 309.
  • the communication device 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 3 shows an electronic device 300 having various devices, it should be understood that implementing or possessing all of the devices shown is not required; more or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the video recording method.
  • the computer program may be downloaded and installed from a network via the communication device 309, or installed from the storage device 308, or installed from the ROM 302.
  • Computer readable media may be non-transitory computer readable media.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium may send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server may communicate using any currently known or future network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected by digital data communication (e.g., a communication network) in any form or medium.
  • Examples of communication networks include local area networks (“LANs”), wide area networks (“WANs”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device: collects the voice data and images of the target user; determines the matching degree between the voice data and the reference audio; determines the target special effect according to the matching degree; adds the target special effect to the collected images to obtain target images; and performs audio and video encoding on the voice data and the target images to obtain the target video.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the “C” language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet Service Provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
  • For example, and without limitation, exemplary types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media include one or more wire-based electrical connections, portable computer disks, hard drives, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disc read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • the embodiments of the present disclosure disclose a video recording method, including:
  • before collecting the voice data and images of the target user, the method further includes:
  • when the reference audio is a target song, segmenting the reference audio to obtain multiple sub-audios includes:
  • playing the multiple sub-audios in sequence according to the timestamps, so that the target user imitates the played sub-audio to produce speech, includes:
  • determining the matching degree between the voice data and the reference audio includes:
  • the similarity is determined as the matching degree between the voice data and the reference audio.
  • the method also includes:
  • a target special effect corresponding to the matching degree is determined according to the corresponding relationship.
  • determining the target special effect according to the matching degree includes:
  • adding the target special effect to the captured image includes:
  • adding the target special effect to the captured image to obtain the target image includes:
  • the special effect package corresponding to the target special effect is invoked to perform special effect processing on the collected image to obtain the target image.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Embodiments of the present disclosure disclose a video recording method, apparatus, device, and storage medium, including: collecting voice data and images of a target user; determining the matching degree between the voice data and a reference audio; determining a target special effect according to the matching degree; adding the target special effect to the collected images to obtain target images; and performing audio and video encoding on the voice data and the target images to obtain a target video.

Description

Video Recording Method, Apparatus, Device, and Storage Medium
This application claims priority to Chinese Patent Application No. 202111165277.6, filed with the Chinese Patent Office on September 30, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the field of Internet technology, for example, to a video recording method, apparatus, device, and storage medium.
Background
With the development of Internet technology, many video applications support the recording of songs: while a user is singing a song, the video application can record the user and share the recorded video to the application's network platform over the network.
In the related art, the way a terminal device records the audio data and images of a user singing a target song through such a video application is relatively monotonous, and the user experience is poor.
Summary
Embodiments of the present disclosure provide a video recording method, apparatus, device, and storage medium to record the audio data and images of a user singing a song, which can make recorded videos more entertaining and improve the user experience.
In a first aspect, an embodiment of the present disclosure provides a video recording method, including:
collecting voice data and images of a target user;
determining the matching degree between the voice data and a reference audio;
determining a target special effect according to the matching degree;
adding the target special effect to the collected images to obtain target images; and
performing audio and video encoding on the voice data and the target images to obtain a target video.
In a second aspect, an embodiment of the present disclosure further provides a video recording apparatus, including:
a collection module configured to collect voice data and images of a target user;
a matching degree determination module configured to determine the matching degree between the voice data and a reference audio;
a target special effect determination module configured to determine a target special effect according to the matching degree;
a target image acquisition module configured to add the target special effect to the collected images to obtain target images; and
a target video acquisition module configured to perform audio and video encoding on the voice data and the target images to obtain a target video.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including:
one or more processing devices; and
a storage device configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processing devices, the one or more processing devices implement the video recording method described in the embodiments of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable medium storing a computer program which, when executed by a processing device, implements the video recording method described in the embodiments of the present disclosure.
Brief Description of the Drawings
FIG. 1 is a flowchart of a video recording method in an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a video recording apparatus in an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.
Detailed Description
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, the method embodiments may include additional steps and/or omit some of the steps shown. The scope of the present disclosure is not limited in this respect.
The term "comprise" and its variations as used herein are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not intended to limit the order of, or interdependence between, the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "a/an" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
FIG. 1 is a flowchart of a video recording method provided by an embodiment of the present disclosure. This embodiment is applicable to recording a video of a user singing a song. The method may be performed by a video recording apparatus, which may be implemented in hardware and/or software and can generally be integrated into a device with a video recording function; the device may be an electronic device such as a server, a mobile terminal, or a server cluster. As shown in FIG. 1, the method includes the following steps:
Step 110: collect voice data and images of the target user.
The voice data may be speech produced by the user imitating a reference audio or singing a target song. The images may be half-body or full-body portraits of the target user (including the face).
In this embodiment, the target user can cause the terminal device to collect voice data and images by triggering a recording instruction. The target user may trigger the recording instruction by clicking a recording button on the interface, or by voice or gesture. When the terminal device receives the recording instruction triggered by the target user, it controls the voice collection module (e.g., a microphone) and the image collection module (e.g., a camera) to start working, so as to collect the target user's singing audio data and images.
For example, before collecting the voice data and images of the target user, the method further includes the following steps: receiving the reference audio selected by the target user; segmenting the reference audio to obtain multiple sub-audios; and playing the multiple sub-audios in sequence according to the timestamps, so that the target user imitates the played sub-audio to produce speech.
The reference audio may be a song, a piece of audio played by a musical instrument, an animal call, or the like; the reference audio is not limited here. The reference audio may be segmented by duration, for example into 5-second segments, or, if the reference audio contains text content, it may be divided according to the text content: the text content is first split into sentences, and the reference audio is segmented according to the sentence boundaries.
In this embodiment, after the reference audio is segmented, the sub-audios are played segment by segment according to the timestamps, so that the target user imitates each sub-audio segment to produce speech.
For example, if the reference audio is a target song, the process of segmenting the reference audio to obtain multiple sub-audios may be: receiving the target song selected by the target user; obtaining the Musical Instrument Digital Interface (MIDI) file and lyrics of the target song; and decomposing the lyrics into multiple lyric sentences. The process of playing the multiple sub-audios in sequence according to the timestamps, so that the target user imitates the played sub-audio to produce speech, may be: playing the multiple lyric sentences and the MIDI file in sequence according to the timestamps, so that the target user sings the target song according to the displayed lyric sentence and the melody corresponding to the MIDI file.
The target song may be a song selected by the user from a song library, and the MIDI file may be understood as a music file in MIDI format. Playing the MIDI file according to the timestamps can be understood as playing the melody corresponding to the MIDI file in sequence according to the timestamps; playing the multiple lyric sentences in sequence according to the timestamps can be understood as displaying them on the interface in sequence according to the timestamps.
步骤120,确定语音数据与基准音频间的匹配度。
其中,匹配度可以由语音数据与基准音频间的相似度表征。本实施例中,若语音数据与基准音频间的相似度小于预设阈值(或预设阈值范围),则匹配度低,若语音数据与基准音频间的相似度大于或等于预设阈值(或预设阈值范围),则匹配度高。
例如,确定语音数据与基准音频间的匹配度的方式可以是:提取语音数据的语音特征及基准音频的音频特征;确定语音特征与音频特征间的相似度;将相似度确定为语音数据与基准音频间的匹配度。
其中，音频特征可以由音高差序列来表征。提取语音数据的语音特征及基准音频的音频特征的过程可以是：对语音数据进行音符切分并量化，基于量化后的音符建立语音数据的音高差序列，获得语音音高差序列，即语音特征；获取基准音频的基准音高差序列，即音频特征。然后计算语音音高差序列与基准音高差序列间的多种距离，将多种距离进行综合，获得语音特征与音频特征间的相似度。
其中,多种距离可以包括音高序列距离、音长序列距离及整体匹配距离。将多种距离进行综合的方式可以是,将多种距离进行加权求和。获取基准音高差序列的方式可以是采用动态时间规整算法(Dynamic Time Warping,DTW)从歌曲库中获取基准音高差序列。
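下面用一个极简的 Python 草图示意“音高差序列 + 多种距离加权求和”的匹配度计算（假设性实现：音符以 MIDI 音高数值表示，DTW 为教科书式实现，权重仅为举例，并非本公开限定的实际参数）：

```python
def pitch_diff_sequence(pitches):
    """将音符序列转换为相邻音符的音高差序列(量化到半音)，具有移调不变性。"""
    return [round(b - a) for a, b in zip(pitches, pitches[1:])]


def dtw_distance(seq_a, seq_b):
    """教科书式动态时间规整(DTW)距离，局部代价取|x - y|。"""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(seq_a[i - 1] - seq_b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def matching_degree(dist_pitch, dist_duration, dist_overall,
                    weights=(0.5, 0.3, 0.2)):
    """对音高序列距离、音长序列距离与整体匹配距离加权求和，
    并映射为(0, 1]内的相似度，距离为0时相似度为1。"""
    combined = sum(w * d for w, d in
                   zip(weights, (dist_pitch, dist_duration, dist_overall)))
    return 1.0 / (1.0 + combined)
```

相似度到匹配度的映射方式（此处取 1/(1+距离)）仅为一种示意，实际可按阈值或区间另行设计。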
步骤130,根据匹配度确定目标特效。
其中，特效可以是添加至采集的图像上的特殊效果。特效包括奖励特效和惩罚特效。当匹配度超过一定值时，可以选择奖励特效，若匹配度低于一定值时，则可以选择惩罚特效。示例性的，奖励特效可以是：对目标用户进行美颜、添加可爱贴纸、唯美场景；惩罚特效可以是：将目标用户变大头、变胖、添加恶搞场景等。本实施例中，特效可以是以特效包(程序包)的形式存储，特效包中编写了对图像进行特效处理的程序代码，通过调用特效包可以实现图像特效的添加。根据匹配度确定目标特效可以理解为根据匹配度调用目标特效对应的特效包。
例如,该方法还包括如下步骤:预先建立匹配度与特效间的对应关系。根据匹配度确定目标特效的方式可以是:根据对应关系确定匹配度对应的目标特效。
例如,根据匹配度确定目标特效的方式还可以是:对采集的图像中的目标用户进行特征提取,获得目标用户的特征信息;根据特征信息和匹配度确定目标特效。
其中,特征信息可以是目标用户的衣着特征(如颜色、样式)等信息。例如,根据特征信息和匹配度确定目标特效的过程可以是:首先获取特征信息对应的特效集,然后从该特效集中选择匹配度对应的目标特效。
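“先获取特征信息对应的特效集，再从该特效集中选择匹配度对应的目标特效”的逻辑可以草绘如下（其中的特效名称、特征取值与阈值均为假设，仅作示意，实际应对应预先开发的特效包）：

```python
# 假设性的特效集：以特征信息为键，每个特效集内含奖励/惩罚两类特效名称。
EFFECT_SETS = {
    "default":    {"reward": "beautify",     "penalty": "big_head"},
    "red_outfit": {"reward": "cute_sticker", "penalty": "funny_scene"},
}


def select_effect(matching_degree, feature_info="default", threshold=0.6):
    """先按特征信息取特效集，再按匹配度与阈值的关系从集中选取目标特效。"""
    effect_set = EFFECT_SETS.get(feature_info, EFFECT_SETS["default"])
    return effect_set["reward" if matching_degree >= threshold else "penalty"]
```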
步骤140,将目标特效添加至采集的图像中,获得目标图像。
本实施例中,特效可以是以特效包(程序包)的形式存储,特效包中编写了对图像进行特效处理的程序代码,通过调用特效包可以实现图像特效的添加。
例如,将目标特效添加至采集的图像中,获得目标图像的方式可以是:调用目标特效对应的特效包对采集的图像进行特效处理,获得目标图像。
其中,特效包是由开发人员预先开发的,通过调用接口调用目标特效对应的特效包,以实现对图像的特效处理。
例如,将目标特效添加至采集的图像中的过程可以是:将目标特效添加至在当前匹配度和下一个匹配度间采集的图像中;或者,将目标特效添加至从当前匹配度开始采集的设定数量的图像中。
本实施例中，对语音数据进行匹配度计算的方式可以是：每模仿N段子音频进行一次匹配度的计算。其中N可以是大于或等于1的正整数。
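“将目标特效添加至当前匹配度与下一个匹配度之间采集的图像”这一帧级关联可以草绘如下（假设性实现：帧与打分事件均以时间戳表示，一个特效自其打分时刻起持续生效，直至下一次打分产生新特效）：

```python
def annotate_frames(frames, score_events):
    """为每一帧标注其采集时刻生效的特效。

    frames 为 (时间戳, 图像) 列表；score_events 为每次匹配度计算产生的
    (时间戳, 特效名称) 列表。首个打分事件之前采集的帧不添加特效(None)。
    """
    annotated, active = [], None
    events = sorted(score_events)
    for t, image in frames:
        # 消费所有不晚于当前帧时刻的打分事件，取最近一次的特效为当前特效
        while events and events[0][0] <= t:
            active = events.pop(0)[1]
        annotated.append((t, image, active))
    return annotated
```

若采用“从当前匹配度起添加设定数量的图像”的另一种方式，可在此基础上改为按帧计数而非按时间区间判断。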
步骤150,对语音数据和目标图像进行音视频编码,获得目标视频。
例如，在获得添加了特效的目标图像后，将语音数据和目标图像进行音视频编码，获得目标视频。
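音视频编码封装通常可交由现成编码器完成，例如通过命令行调用 ffmpeg。下面的 Python 草图仅演示如何拼装一条将图像序列与语音轨道编码为视频文件的 ffmpeg 命令（路径、帧率与编码器选择均为举例，并非本公开限定的实现）：

```python
def build_encode_command(image_pattern, audio_path, output_path, fps=30):
    """拼装一条 ffmpeg 命令：将图像序列与语音轨道编码封装为一个视频文件。

    此处以 H.264 视频编码、AAC 音频编码为例，仅作示意。
    """
    return [
        "ffmpeg",
        "-framerate", str(fps),
        "-i", image_pattern,   # 例如 "frames/%05d.png"，即添加特效后的目标图像序列
        "-i", audio_path,      # 采集到的语音数据
        "-c:v", "libx264",
        "-c:a", "aac",
        "-shortest",           # 以较短的流为准结束编码
        output_path,
    ]
```

拼装好的命令可经 subprocess.run(cmd, check=True) 执行；实际产品中也可改用移动端系统自带的硬件编码接口。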
例如,本实施例的方案还适用于多人合唱的场景,在多人合唱的过程中,可以对每个参与演唱的用户分别进行匹配度的计算,并基于匹配度对图像添加奖励特效或者惩罚特效。具体过程可参见上述实施例,此处不再赘述。
本公开实施例的技术方案，采集目标用户的语音数据和图像；确定语音数据与基准音频间的匹配度；根据匹配度确定目标特效；将目标特效添加至采集的图像中，获得目标图像；对语音数据和目标图像进行音视频编码，获得目标视频。本公开实施例提供的视频录制方法，将根据匹配度获得的特效添加至采集的图像中，可以提高录制视频的趣味性，丰富视频的呈现方式，提高用户体验。
图2是本公开实施例提供的一种视频录制装置的结构示意图。如图2所示,该装置包括:
采集模块210,设置为采集目标用户的语音数据和图像;
匹配度确定模块220,设置为确定所述语音数据与基准音频间的匹配度;
目标特效确定模块230,设置为根据所述匹配度确定目标特效;
目标图像获取模块240,设置为将所述目标特效添加至采集的图像中,获得目标图像;
目标视频获取模块250,设置为对所述语音数据和所述目标图像进行音视频编码,获得目标视频。
例如,视频录制装置还包括:基准音频播放模块,设置为:
接收目标用户选择的基准音频;
对所述基准音频进行分段处理,获得多段子音频;
按照时间戳依次播放所述多段子音频,使得所述目标用户模仿播放的子音频以发出语音。
例如,当所述基准音频为目标歌曲时,基准音频播放模块,还设置为:
获取所述目标歌曲的MIDI文件及歌词;
对所述歌词进行分解,获得多句子歌词;
将所述多句子歌词和所述MIDI文件按照时间戳依次进行播放,使得所述目标用户根据播放的子歌词和所述MIDI文件对应的旋律演唱所述目标歌曲。
例如,匹配度确定模块220,还设置为:
提取所述语音数据的语音特征及所述基准音频的音频特征;
确定所述语音特征与所述音频特征间的相似度;
将所述相似度确定为所述语音数据与基准音频间的匹配度。
例如,视频录制装置还包括:对应关系建立模块,设置为:
预先建立匹配度与特效间的对应关系;
例如,目标特效确定模块230,还设置为:
根据所述对应关系确定所述匹配度对应的目标特效。
例如,目标特效确定模块230,还设置为:
对采集的图像中的目标用户进行特征提取,获得所述目标用户的特征信息;
根据所述特征信息和所述匹配度确定目标特效。
例如,目标图像获取模块240,还设置为:
将所述目标特效添加至在当前匹配度和下一个匹配度间采集的图像中;或者,将所述目标特效添加至从当前匹配度开始采集的设定数量的图像中。
例如,目标图像获取模块240,还设置为:
调用所述目标特效对应的特效包对采集的图像进行特效处理,获得目标图像。
上述装置可执行本公开前述所有实施例所提供的方法,具备执行上述方法相应的功能模块和有益效果。未在本实施例中详尽描述的技术细节,可参见本公开前述所有实施例所提供的方法。
下面参考图3,其示出了适于用来实现本公开实施例的电子设备300的结构示意图。本公开实施例中的电子设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端,或者多种形式的服务器,如独立服务器或者服务器集群。图3示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图3所示，电子设备300可以包括处理装置(例如中央处理器、图形处理器等)301，其可以根据存储在只读存储装置(ROM)302中的程序或者从存储装置308加载到随机访问存储装置(RAM)303中的程序而执行多种适当的动作和处理。在RAM 303中，还存储有电子设备300操作所需的多种程序和数据。处理装置301、ROM 302以及RAM 303通过总线304彼此相连。输入/输出(I/O)接口305也连接至总线304。
通常,以下装置可以连接至I/O接口305:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置306;包括例如液晶显示器(LCD)、扬声器、振动器等的输出装置307;包括例如磁带、硬盘等的存储装置308;以及通信装置309。通信装置309可以允许电子设备300与其他设备进行无线或有线通信以交换数据。虽然图3示出了具有多种装置的电子设备300,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
根据本公开的实施例，上文参考流程图描述的过程可以被实现为计算机软件程序。例如，本公开的实施例包括一种计算机程序产品，其包括承载在计算机可读介质上的计算机程序，该计算机程序包含用于执行视频录制方法的程序代码。在这样的实施例中，该计算机程序可以通过通信装置309从网络上被下载和安装，或者从存储装置308被安装，或者从ROM 302被安装。在该计算机程序被处理装置301执行时，执行本公开实施例的方法中限定的上述功能。计算机可读介质可以为非暂态计算机可读介质。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号, 其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任意合适的组合。
在一些实施方式中,客户端、服务器可以利用诸如HTTP(HyperText Transfer Protocol,超文本传输协议)之类的任何当前已知或未来研发的网络协议进行通信,并且可以与任意形式或介质的数字数据通信(例如,通信网络)互连。通信网络的示例包括局域网(“LAN”),广域网(“WAN”),网际网(例如,互联网)以及端对端网络(例如,ad hoc端对端网络),以及任何当前已知或未来研发的网络。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:采集目标用户的语音数据和图像;确定所述语音数据与基准音频间的匹配度;根据所述匹配度确定目标特效;将所述目标特效添加至采集的图像中,获得目标图像;对所述语音数据和所述目标图像进行音视频编码,获得目标视频。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括但不限于面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开多种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,单元的名称在某种情况下并不构成对该单元本身的限定。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上系统(SOC)、复杂可编程逻辑设备(CPLD)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行系统、装置或设备使用或与指令执行系统、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体系统、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
根据本公开实施例的一个或多个实施例,本公开实施例公开了一种视频录制方法,包括:
采集目标用户的语音数据和图像;
确定所述语音数据与基准音频间的匹配度;
根据所述匹配度确定目标特效;
将所述目标特效添加至采集的图像中,获得目标图像;
对所述语音数据和所述目标图像进行音视频编码,获得目标视频。
例如,在采集目标用户的语音数据和图像之前,还包括:
接收目标用户选择的基准音频;
对所述基准音频进行分段处理,获得多段子音频;
按照时间戳依次播放所述多段子音频,使得所述目标用户模仿播放的子音频以发出语音。
例如,若所述基准音频为目标歌曲,则对所述基准音频进行分段处理,获得多段子音频,包括:
获取所述目标歌曲的MIDI文件及歌词;
对所述歌词进行分解,获得多句子歌词;
按照时间戳依次播放所述多段子音频,使得所述目标用户模仿播放的子音频以发出语音,包括:
将所述多句子歌词和所述MIDI文件按照时间戳依次进行播放,使得所述目标用户根据播放的子歌词和所述MIDI文件对应的旋律演唱所述目标歌曲。
例如,确定所述语音数据与基准音频间的匹配度,包括:
提取所述语音数据的语音特征及所述基准音频的音频特征;
确定所述语音特征与所述音频特征间的相似度;
将所述相似度确定为所述语音数据与基准音频间的匹配度。
例如,所述方法还包括:
预先建立匹配度与特效间的对应关系;
根据所述匹配度确定目标特效,包括:
根据所述对应关系确定所述匹配度对应的目标特效。
例如,根据所述匹配度确定目标特效,包括:
对采集的图像中的目标用户进行特征提取,获得所述目标用户的特征信息;
根据所述特征信息和所述匹配度确定目标特效。
例如,将所述目标特效添加至采集的图像中,包括:
将所述目标特效添加至在当前匹配度和下一个匹配度间采集的图像中;或者,将所述目标特效添加至从当前匹配度开始采集的设定数量的图像中。
例如,将所述目标特效添加至采集的图像中,获得目标图像,包括:
调用所述目标特效对应的特效包对采集的图像进行特效处理,获得目标图像。

Claims (11)

  1. 一种视频录制方法,包括:
    采集目标用户的语音数据和图像;
    确定所述语音数据与基准音频间的匹配度;
    根据所述匹配度确定目标特效;
    将所述目标特效添加至采集的图像中,获得目标图像;
    对所述语音数据和所述目标图像进行音视频编码,获得目标视频。
  2. 根据权利要求1所述的方法,在所述采集目标用户的语音数据和图像之前,还包括:
    接收目标用户选择的基准音频;
    对所述基准音频进行分段处理,获得多段子音频;
    按照时间戳依次播放所述多段子音频,使得所述目标用户模仿播放的子音频以发出语音。
  3. 根据权利要求2所述的方法,其中,响应于确定所述基准音频为目标歌曲,所述对所述基准音频进行分段处理,获得多段子音频,包括:
    获取所述目标歌曲的MIDI文件及歌词;
    对所述歌词进行分解,获得多句子歌词;
    所述按照时间戳依次播放所述多段子音频,使得所述目标用户模仿播放的子音频以发出语音,包括:
    将所述多句子歌词和所述MIDI文件按照时间戳依次进行播放,使得所述目标用户根据播放的子歌词和所述MIDI文件对应的旋律演唱所述目标歌曲。
  4. 根据权利要求2所述的方法,其中,所述确定所述语音数据与基准音频间的匹配度,包括:
    提取所述语音数据的语音特征及所述基准音频的音频特征;
    确定所述语音特征与所述音频特征间的相似度;
    将所述相似度确定为所述语音数据与基准音频间的匹配度。
  5. 根据权利要求1所述的方法,还包括:
    预先建立匹配度与特效间的对应关系;
    所述根据所述匹配度确定目标特效,包括:
    根据所述对应关系确定所述匹配度对应的目标特效。
  6. 根据权利要求1所述的方法,其中,所述根据所述匹配度确定目标特效,包括:
    对采集的图像中的目标用户进行特征提取,获得所述目标用户的特征信息;
    根据所述特征信息和所述匹配度确定目标特效。
  7. 根据权利要求1所述的方法,其中,所述将所述目标特效添加至采集的图像中,包括:
    将所述目标特效添加至在当前匹配度和下一个匹配度间采集的图像中;或者,将所述目标特效添加至从当前匹配度开始采集的设定数量的图像中。
  8. 根据权利要求1所述的方法,其中,所述将所述目标特效添加至采集的图像中,获得目标图像,包括:
    调用所述目标特效对应的特效包对采集的图像进行特效处理,获得目标图像。
  9. 一种视频录制装置,包括:
    采集模块,设置为采集目标用户的语音数据和图像;
    匹配度确定模块,设置为确定所述语音数据与基准音频间的匹配度;
    目标特效确定模块,设置为根据所述匹配度确定目标特效;
    目标图像获取模块,设置为将所述目标特效添加至采集的图像中,获得目标图像;
    目标视频获取模块,设置为对所述语音数据和所述目标图像进行音视频编码,获得目标视频。
  10. 一种电子设备,包括:
    一个或多个处理装置;
    存储装置,设置为存储一个或多个程序;
    当所述一个或多个程序被所述一个或多个处理装置执行时，使得所述一个或多个处理装置实现如权利要求1-8中任一所述的视频录制方法。
  11. 一种计算机可读介质,其上存储有计算机程序,所述计算机程序被处理装置执行时实现如权利要求1-8中任一所述的视频录制方法。
PCT/CN2022/118698 2021-09-30 2022-09-14 视频录制方法、装置、设备及存储介质 WO2023051246A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111165277.6 2021-09-30
CN202111165277.6A CN113923390A (zh) 2021-09-30 2021-09-30 视频录制方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2023051246A1 true WO2023051246A1 (zh) 2023-04-06


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1601600A (zh) * 2003-09-24 2005-03-30 乐金电子(惠州)有限公司 卡拉ok系统中的歌词着色方法
US20120067196A1 (en) * 2009-06-02 2012-03-22 Indian Institute of Technology Autonomous Research and Educational Institution System and method for scoring a singing voice
CN104581348A (zh) * 2015-01-27 2015-04-29 苏州乐聚一堂电子科技有限公司 伴唱视觉特效系统及伴唱视觉特效处理方法
CN108259983A (zh) * 2017-12-29 2018-07-06 广州市百果园信息技术有限公司 一种视频图像处理方法、计算机可读存储介质和终端
CN112380379A (zh) * 2020-11-18 2021-02-19 北京字节跳动网络技术有限公司 歌词特效展示方法、装置、电子设备及计算机可读介质
CN113923390A (zh) * 2021-09-30 2022-01-11 北京字节跳动网络技术有限公司 视频录制方法、装置、设备及存储介质

