WO2020253806A1 - Method, apparatus, device and storage medium for generating a display video - Google Patents
Method, apparatus, device and storage medium for generating a display video
- Publication number
- WO2020253806A1 (PCT/CN2020/096969)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- displayed
- content
- beat
- pictures
- music
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0276—Advertisement creation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/265—Mixing
Definitions
- The present disclosure relates to the field of Internet technology, for example, to a method, apparatus, device, and storage medium for generating a display video.
- Advertising is a means of publicity that openly and widely conveys information to the public, through some form of media, for a specific demand.
- the embodiments of the present disclosure provide a method, device, equipment, and storage medium for generating a display video, so as to reduce the cost of generating the display video and improve the quality of the display video.
- the embodiment of the present disclosure provides a method for generating a display video, including:
- Obtaining data, wherein the data includes one of the following: at least two pictures of the content to be displayed; at least two pictures of the content to be displayed and characteristic information of the content to be displayed;
- determining, according to the acquired data, a music segment matching the content to be displayed;
- performing feature extraction on the music segment to obtain beat information of the music segment, wherein the beat information includes at least two beat points;
- generating a display video according to the at least two pictures of the content to be displayed and the music segment matching the content to be displayed, wherein the time point at which each picture is presented in the display video corresponds to a beat point in the beat information.
- the embodiment of the present disclosure also provides a device for generating a display video, including:
- the characteristic information acquiring module is configured to acquire data, wherein the data includes one of the following: at least two pictures of the content to be displayed; at least two pictures of the content to be displayed and characteristic information of the content to be displayed;
- a music segment determining module configured to determine a music segment matching the content to be displayed according to the acquired data
- the beat information acquisition module is configured to perform feature extraction on the music fragment to obtain beat information of the music fragment, wherein the beat information includes at least two beat points;
- The display video generation module is configured to generate a display video based on at least two pictures of the content to be displayed and a music clip matching the content to be displayed, wherein the time point at which each picture is presented in the display video corresponds to a beat point in the beat information.
- An embodiment of the present disclosure also provides an electronic device, which includes:
- one or more processing devices;
- a storage device configured to store one or more programs;
- when the one or more programs are executed by the one or more processing devices, the one or more processing devices implement the method for generating a display video according to the embodiments of the present disclosure.
- the embodiment of the present disclosure also provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing device, the method for generating a display video as described in the embodiment of the present disclosure is realized.
- FIG. 1 is a flowchart of a method for generating a display video provided by Embodiment 1 of the present disclosure
- FIG. 2 is a schematic structural diagram of an apparatus for generating a display video provided by Embodiment 2 of the present disclosure
- FIG. 3 is a schematic structural diagram of an electronic device provided in the third embodiment of the present disclosure.
- FIG. 1 is a flowchart of a method for generating a display video according to Embodiment 1 of the present disclosure. This embodiment is applicable to a situation where a display video is generated based on a picture of a content to be displayed.
- the method can be executed by a display video generation device
- the device can be composed of hardware and/or software, and is generally integrated in electronic equipment. As shown in Figure 1, the method includes the following steps:
- Step 110: Obtain at least two pictures of the content to be displayed and/or characteristic information of the content to be displayed.
- The content to be displayed may be, for example, commodities, concerts, competitions, film and television dramas, or tourist attractions that need to be promoted.
- the characteristic information of the content to be displayed may include category information of the content to be displayed, information about the owner of the content to be displayed, and delivery data of the content to be displayed.
- The owner information of the content to be displayed may be the producer of the content to be displayed, such as the manufacturer of a product, the organizer of a concert, or the producer of a film and television series; the delivery data of the content to be displayed may be the sales volume, view count, click count, and the like after the content was first released.
- the user uploads at least two pictures of the content to be displayed and characteristic information of the content to be displayed.
- Step 120: Determine a music segment matching the content to be displayed according to the acquired at least two pictures of the content to be displayed and/or characteristic information of the content to be displayed.
- the music clip is used as background music for the display video.
- the feature of the content to be displayed is obtained, and the matching music segment is obtained according to the feature of the content to be displayed.
- a music segment matching the content to be displayed is determined based on at least two pictures and characteristic information at the same time.
- Determining a music segment that matches the content to be displayed may be implemented as follows: perform feature extraction on the at least two pictures to obtain a first feature vector; generate a second feature vector according to the characteristic information; and input the first feature vector and/or the second feature vector into a set neural network model to obtain a music segment matching the content to be displayed.
- the set neural network can be a deep neural network (Deep Neural Network, DNN) or a Convolutional Neural Network (Convolutional Neural Networks, CNN).
- The set neural network has the ability to output a music segment matching the content to be displayed according to the input first feature vector and/or second feature vector.
- the manner of performing feature extraction on at least two pictures may be to input at least two pictures into a feature extraction neural network to perform feature extraction, so as to obtain the first feature vector corresponding to the at least two pictures.
- the method of generating the second feature vector according to the feature information may be to obtain vector elements corresponding to the feature information, and then form the second feature vector.
- After the first feature vector and the second feature vector are obtained, the two feature vectors, or one of them, are input into the set neural network, so as to obtain a music segment matching the content to be displayed.
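As an illustration only: the matching performed by the set neural network can be mimicked with a nearest-neighbour lookup over precomputed clip embeddings. Everything below (the clip names, the vector values, and the cosine-similarity stand-in for the trained network) is a hypothetical sketch, not the disclosed model:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_music(first_vec, second_vec, clip_embeddings):
    # first_vec: picture features; second_vec: characteristic-information features.
    # Concatenate them and pick the clip whose embedding is most similar.
    query = list(first_vec) + list(second_vec)
    return max(clip_embeddings, key=lambda name: cosine(query, clip_embeddings[name]))

# Hypothetical clip library with 4-dimensional embeddings.
clips = {"upbeat_pop": [1.0, 0.1, 0.9, 0.0], "slow_piano": [0.0, 0.9, 0.1, 1.0]}
best = match_music([1.0, 0.0], [0.8, 0.1], clips)
```

A trained DNN/CNN would replace the lookup with a learned mapping, but the interface (feature vectors in, matching music segment out) is the same.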
- Step 130: Perform feature extraction on the music fragment to obtain beat information of the music fragment, where the beat information includes at least two beat points.
- the feature extraction is performed on the music segment to obtain the beat information of the music segment.
- The way to obtain the beat information of the music segment may be: using the Mel-Frequency Cepstral Coefficients (MFCC) algorithm to perform feature extraction on the music segment to obtain accent points satisfying a set condition; acquiring a group of accent points whose time intervals between adjacent accent points are within a set range; and determining the group of accent points as the beat information of the music segment.
- the beat information includes at least two beat points, and the at least two beat points have a one-to-one correspondence with the accent points in the group.
- the accent points satisfying the set condition may be music points whose sound frequency exceeds a preset threshold.
- a group of accent points whose time intervals between adjacent accent points are within a set range can be understood as the same or similar time intervals between adjacent accent points.
- The MFCC algorithm is used to extract the accent points in the music fragment; a group of accent points with the same or similar time intervals between adjacent accent points is then obtained, and this group of accent points is regarded as the beat information of the music fragment.
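The interval-based grouping can be sketched in isolation. This is a simplified stand-in: the accent points are given directly as timestamps in seconds (in the method they would come from MFCC feature extraction), and the longest run whose adjacent intervals stay within a tolerance of the run's first interval is kept as the beat information:

```python
def group_beats(accents, tol=0.05):
    """Find the longest run of accent points whose adjacent time
    intervals are all within `tol` seconds of the run's first interval."""
    best = []
    for i in range(len(accents) - 1):
        ref = accents[i + 1] - accents[i]   # candidate reference interval
        run = [accents[i], accents[i + 1]]
        for t in accents[i + 2:]:
            if abs((t - run[-1]) - ref) <= tol:
                run.append(t)
            else:
                break
        if len(run) > len(best):
            best = run
    return best

# Accents at roughly 0.5 s spacing, with an off-beat outlier at 1.3 s.
beats = group_beats([0.0, 0.5, 1.0, 1.3, 1.5, 2.0])
```

The outlier breaks the run, so only the evenly spaced prefix survives as beat information; real beat trackers are more robust, but the grouping criterion is the same.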
- Step 140: Generate a display video based on at least two pictures of the content to be displayed and a music segment matching the content to be displayed.
- the time point of each picture presented in the display video corresponds to the beat point in the beat information.
- the beat information of the music fragment is obtained, at least two pictures are set on the beat points in the beat information according to the set sequence, and the at least two pictures and the music fragment set on the beat points are merged to obtain a display video.
- Setting at least two pictures on the beat points in the beat information according to the set sequence can be understood as a picture corresponding to each beat point in the beat information.
- the setting sequence can be the upload sequence of the pictures or the shooting time sequence marked in the pictures, which is not limited here.
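The placement step described above amounts to pairing each picture, in the set order, with one beat point. A minimal sketch (timestamps in seconds; picture names are hypothetical):

```python
def place_pictures(pictures, beat_points):
    # Pair each picture with one beat point in the set (e.g. upload) order.
    # Assumes the counts have already been made equal by the cut/copy step.
    if len(pictures) != len(beat_points):
        raise ValueError("picture count must equal beat-point count")
    return list(zip(beat_points, pictures))

# Each entry is (time at which the picture is presented, picture).
timeline = place_pictures(["p1.jpg", "p2.jpg", "p3.jpg"], [0.0, 0.5, 1.0])
```

Merging then means rendering each picture at its timestamp over the music segment, which a video-composition tool would perform.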
- the method further includes the following step: adding a set playing special effect to the at least two pictures.
- The set playing special effects may include, for example, entering the picture from left to right, rotating into the picture, and entering the picture from top to bottom.
- a set playing special effect is added to at least two pictures, so that when the display video is played, the pictures in the display video are played according to the set playing special effect, which increases the interest of the display video.
- Before setting the at least two pictures on the beat points in the beat information in the set order, the method further includes the following steps: if the number of beat points in the beat information is greater than the number of pictures, cutting the music segment so that the number of beat points equals the number of pictures; if the number of beat points in the beat information is less than the number of pictures, copying a music sub-segment from the music segment and splicing the music sub-segment with the music segment to form a new music segment, so that the number of beat points contained in the new music segment equals the number of pictures.
- the way of cutting the music segment can be to start cutting from the beginning or the end of the music segment, and the size of the cut segment can be determined according to the number of beat points and the number of pictures.
- the length of the music sub-segment can be determined according to the number of beats and the number of pictures.
- The way of copying the music sub-segment from the music segment may be to copy a music sub-segment of a certain length from the beginning or the end of the music segment. The advantage of this is that the number of pictures matches the length of the music clip.
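At the level of beat timestamps, the cut-or-copy adjustment can be sketched as follows. This is an illustrative simplification: a real implementation cuts and splices the audio itself, while here we only trim trailing beats or append beats copied from the start of the segment, shifted by the segment's duration (assuming a single spliced sub-segment suffices):

```python
def adjust_beats(beat_points, n_pictures, segment_len):
    """Return a beat-point list whose length equals n_pictures."""
    beats = list(beat_points)
    if len(beats) >= n_pictures:
        # Cut: drop trailing beats (trimming the tail of the music segment).
        return beats[:n_pictures]
    # Copy: take the missing number of beats from the start of the segment
    # and shift them past its end, as if the sub-segment were spliced on.
    missing = n_pictures - len(beats)
    extra = [t + segment_len for t in beat_points[:missing]]
    return beats + extra
```

For example, a 2-second segment with beats at 0.0, 0.5, and 1.0 s and five pictures gains two spliced beats at 2.0 and 2.5 s.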
- In the technical solution of this embodiment, at least two pictures of the content to be displayed and/or characteristic information of the content to be displayed are first acquired; a music segment matching the content to be displayed is then determined according to the acquired pictures and/or characteristic information; feature extraction is performed on the music segment to obtain its beat information; and finally a display video is generated according to the at least two pictures of the content to be displayed and the music segment matching the content to be displayed.
- The display video generation method provided by the embodiments of the present disclosure obtains the beat information of the music segment matching the content to be displayed and generates the display video based on at least two pictures and the music segment, which can reduce the cost of display video generation and improve the quality of the display video.
- Before performing feature extraction on the at least two pictures to obtain the first feature vector, the method further includes the following steps: obtaining a display video sample set; extracting the first feature vector corresponding to the video frames of each display video in the sample set and/or the second feature vector corresponding to its characteristic information; for each display video, inputting the first feature vector and/or the second feature vector into the set neural network to obtain an initial music segment; and adjusting the parameters of the set neural network according to the loss function of the initial music segment and the music segment in the display video, so as to train the set neural network.
- the display video in the display video sample set may be a published video.
- the process of extracting the first feature vector corresponding to the video frame of each display video may be to input all or part of the video frames included in the display video into the feature extraction neural network to obtain the first feature vector of the current display video.
- the method for extracting the second feature vector corresponding to the feature information of each displayed video may be to generate the second feature vector according to feature information such as category information of the currently displayed video, owner information, and placement data.
- After the first feature vector and/or second feature vector of the current display video are input into the set neural network, the initial music segment is obtained; the loss function of the initial music segment and the music segment in the current display video is then calculated, the loss is back-propagated through the set neural network, and the parameters of the network are adjusted so as to train it.
- Training the set neural network with the display video sample set can improve the recognition accuracy of the set neural network.
- FIG. 2 is a schematic structural diagram of an apparatus for generating a display video provided by the second embodiment of the disclosure.
- the device includes: a feature information acquisition module 210, a music segment determination module 220, a beat information acquisition module 230, and a display video generation module 240.
- the feature information obtaining module 210 is configured to obtain at least two pictures of the content to be displayed and/or feature information of the content to be displayed.
- the music segment determining module 220 is configured to determine a music segment matching the content to be displayed according to the acquired at least two pictures of the content to be displayed and/or characteristic information of the content to be displayed.
- the beat information acquisition module 230 is configured to perform feature extraction on the music fragment to obtain beat information of the music fragment, and the beat information includes at least two beat points.
- the display video generation module 240 is configured to generate a display video based on at least two pictures of the content to be displayed and a music clip matching the content to be displayed, wherein the time point at which each picture is presented in the display video and the beat point in the beat information correspond.
- the characteristic information of the content to be displayed includes category information of the content to be displayed, information about the owner of the content to be displayed, and delivery data of the content to be displayed.
- the music segment determining module 220 is set to:
- the beat information acquisition module 230 is set to:
- the beat information of the music segment includes at least two beat points, and the at least two beat points correspond one-to-one with the accent points in the group.
- the display video generation module 240 is set to:
- At least two pictures are set on the beat points in the beat information according to the set sequence; at least two pictures and music clips set on the beat points are merged to obtain a display video.
- Optionally, the apparatus further includes:
- a music segment adjustment module set to:
- if the number of beat points in the beat information is greater than the number of pictures, cut the music segment so that the number of beat points equals the number of pictures; if the number of beat points in the beat information is less than the number of pictures, copy a music sub-segment from the music segment and splice the music sub-segment with the music segment to form a new music segment, so that the number of beat points contained in the new music segment equals the number of pictures.
- The apparatus further includes a set neural network training module configured to:
- obtain a display video sample set; extract the first feature vector corresponding to the video frames of each display video in the sample set and/or the second feature vector corresponding to its characteristic information; for each display video, input the first feature vector and/or the second feature vector into the set neural network to obtain an initial music segment, wherein the set neural network includes a deep neural network or a convolutional neural network; and adjust the parameters of the set neural network according to the loss function of the initial music segment and the music segment in the display video, so as to train the set neural network.
- the foregoing device can execute the methods provided in all the foregoing embodiments of the present disclosure, and has functional modules and effects corresponding to the foregoing methods. For technical details not described in this embodiment, refer to the methods provided in all the foregoing embodiments of the present disclosure.
- FIG. 3 shows a schematic structural diagram of an electronic device 300 suitable for implementing embodiments of the present disclosure.
- The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), in-vehicle terminals (for example, in-vehicle navigation terminals), digital televisions (TVs), and desktop computers, as well as servers in multiple forms, such as independent servers or server clusters.
- The electronic device 300 may include a processing device (such as a central processing unit or a graphics processor) 301, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303.
- In the RAM 303, various programs and data required for the operation of the electronic device 300 are also stored.
- the processing device 301, ROM 302, and RAM 303 are connected to each other through a bus 304.
- An input/output (Input/Output, I/O) interface 305 is also connected to the bus 304.
- The following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 307 including, for example, a liquid crystal display (LCD), speakers, and vibrators; storage devices 308 including, for example, magnetic tapes and hard disks; and a communication device 309.
- the communication device 309 may allow the electronic device 300 to perform wireless or wired communication with other devices to exchange data.
- Although FIG. 3 shows an electronic device 300 having multiple devices, it is not required to implement or have all the devices shown; more or fewer devices may alternatively be implemented or provided.
- An embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method for generating a display video.
- The computer program may be downloaded and installed from a network through the communication device 309, or installed from the storage device 308, or installed from the ROM 302.
- When the computer program is executed by the processing device 301, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
- the aforementioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
- The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
- Examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein.
- This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
- The computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
- the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, radio frequency (RF), etc., or any suitable combination of the above.
- the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
- The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the processing device, the electronic device: obtains at least two pictures of the content to be displayed and/or characteristic information of the content to be displayed; determines, according to the acquired at least two pictures of the content to be displayed and/or the characteristic information of the content to be displayed, a music segment matching the content to be displayed; performs feature extraction on the music segment to obtain the beat information of the music segment, where the beat information includes at least two beat points; and generates a display video based on the at least two pictures of the content to be displayed and the music segment matching the content to be displayed, wherein the time point at which each picture is presented in the display video corresponds to a beat point in the beat information.
- the computer program code used to perform the operations of the present disclosure may be written in one or more programming languages or a combination thereof.
- The above-mentioned programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server.
- The remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
- Each block in the flowchart or block diagram may represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function.
- The functions marked in the blocks may also occur in an order different from the order marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved.
- Each block in the block diagram and/or flowchart, and combinations of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
- The units involved in the embodiments described in the present disclosure may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Business, Economics & Management (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Accounting & Taxation (AREA)
- Acoustics & Sound (AREA)
- Development Economics (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- Finance (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Game Theory and Decision Science (AREA)
- Marketing (AREA)
- Economics (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- User Interface Of Digital Computer (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
Claims (11)
- A method for generating a display video, comprising: acquiring data, wherein the data comprises one of the following: at least two pictures of content to be displayed; or at least two pictures of the content to be displayed and feature information of the content to be displayed; determining, according to the acquired data, a music fragment matching the content to be displayed; performing feature extraction on the music fragment to obtain beat information of the music fragment, wherein the beat information comprises at least two beat points; and generating a display video according to the at least two pictures of the content to be displayed and the music fragment matching the content to be displayed, wherein the time point at which each picture is presented in the display video corresponds to a beat point in the beat information.
- The method according to claim 1, wherein the feature information of the content to be displayed comprises category information of the content to be displayed, owner information of the content to be displayed, and placement data of the content to be displayed.
- The method according to claim 1 or 2, wherein determining, according to the acquired data, the music fragment matching the content to be displayed comprises one of the following: performing feature extraction on the at least two pictures to obtain a first feature vector, and inputting the first feature vector into a set neural network model to obtain the music fragment matching the content to be displayed; generating a second feature vector according to the feature information, and inputting the second feature vector into a set neural network model to obtain the music fragment matching the content to be displayed; or performing feature extraction on the at least two pictures to obtain a first feature vector, generating a second feature vector according to the feature information, and inputting the first feature vector and the second feature vector into a set neural network model to obtain the music fragment matching the content to be displayed.
- The method according to claim 1, wherein performing feature extraction on the music fragment to obtain the beat information of the music fragment comprises: performing feature extraction on the music fragment by using a Mel-frequency cepstral coefficient algorithm to obtain accent points satisfying a set condition; and acquiring a group of accent points in which the time interval between adjacent accent points is within a set range, and determining the group of accent points as the beat information of the music fragment, wherein the at least two beat points correspond one-to-one to the accent points in the group.
- The method according to claim 1, wherein generating the display video according to the at least two pictures of the content to be displayed and the music fragment matching the content to be displayed comprises: arranging the at least two pictures, in a set order, on the beat points in the beat information; and merging the at least two pictures arranged on the beat points with the music fragment to obtain the display video.
- The method according to claim 5, further comprising, after arranging the at least two pictures in the set order on the beat points in the beat information: adding a set playback effect to the at least two pictures.
- The method according to claim 5, further comprising, before arranging the at least two pictures in the set order on the beat points in the beat information: in a case where the number of beat points in the beat information is greater than the number of the at least two pictures, cutting the music fragment so that the number of beat points equals the number of the at least two pictures; and in a case where the number of beat points in the beat information is less than the number of the at least two pictures, copying a music sub-fragment from the music fragment and splicing the music sub-fragment with the music fragment to form a new music fragment, so that the number of beat points contained in the new music fragment equals the number of the at least two pictures.
- The method according to claim 3, further comprising, before performing feature extraction on the at least two pictures to obtain the first feature vector: acquiring a display video sample set, wherein the display video sample set comprises a plurality of display videos; extracting at least one of the following: a first feature vector corresponding to video frames of each display video; a second feature vector corresponding to feature information of each display video; for each display video, inputting the extracted feature vector into a set neural network to obtain an initial music fragment, wherein the set neural network comprises a deep neural network or a convolutional neural network; and adjusting parameters in the set neural network according to a loss function obtained from the initial music fragment and the music fragment in the display video, so as to train the set neural network.
- An apparatus for generating a display video, comprising: a feature information acquisition module configured to acquire data, wherein the data comprises one of the following: at least two pictures of content to be displayed; or at least two pictures of the content to be displayed and feature information of the content to be displayed; a music fragment determination module configured to determine, according to the acquired data, a music fragment matching the content to be displayed; a beat information acquisition module configured to perform feature extraction on the music fragment to obtain beat information of the music fragment, wherein the beat information comprises at least two beat points; and a display video generation module configured to generate a display video according to the at least two pictures of the content to be displayed and the music fragment matching the content to be displayed, wherein the time point at which each picture is presented in the display video corresponds to a beat point in the beat information.
- An electronic device, comprising: at least one processing apparatus; and a storage apparatus configured to store at least one program, wherein when the at least one program is executed by the at least one processing apparatus, the at least one processing apparatus implements the method for generating a display video according to any one of claims 1-8.
- A computer-readable medium storing a computer program, wherein when the program is executed by a processing apparatus, the method for generating a display video according to any one of claims 1-8 is implemented.
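The matching step of claim 3 feeds feature vectors into a trained neural network model. As an illustration only, the following stdlib-Python sketch replaces that network with a cosine-similarity lookup over a hypothetical library of `(clip_id, embedding)` pairs; the function name, library format, and embeddings are all invented for the example and are not part of the patent.

```python
import math

def match_music(feature_vec, music_library):
    """Pick the library fragment whose stored embedding is most similar to the
    content's feature vector (cosine similarity).  A stand-in for the trained
    neural network of claim 3, not the claimed model itself."""
    def cosine(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return num / den if den else 0.0
    # music_library: list of (clip_id, embedding) pairs
    return max(music_library, key=lambda item: cosine(feature_vec, item[1]))[0]
```

For example, `match_music([1.0, 0.0], [("clip_a", [1.0, 0.1]), ("clip_b", [0.0, 1.0])])` returns `"clip_a"`, the clip whose embedding points in nearly the same direction as the content's feature vector.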
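Claim 4 obtains beat information by keeping accent points that satisfy a set strength condition and then grouping accents whose adjacent time intervals fall within a set range. A minimal sketch of that grouping step, assuming the MFCC-based accent detection has already produced `(time_sec, strength)` pairs (the function name and thresholds are hypothetical):

```python
def extract_beat_points(accents, strength_threshold, min_gap, max_gap):
    """Keep accent points at or above a set strength, then return the longest
    run of accents whose adjacent time intervals lie within [min_gap, max_gap].
    Each returned timestamp is one beat point of claim 4's beat information."""
    strong = sorted(t for t, s in accents if s >= strength_threshold)
    best, run = [], []
    for t in strong:
        if not run or min_gap <= t - run[-1] <= max_gap:
            run.append(t)
        else:
            run = [t]          # interval out of range: start a new group
        if len(run) > len(best):
            best = list(run)
    return best
```

With accents at 0.0, 0.5, 1.0 and 3.0 seconds and a gap range of 0.4-0.6 s, the isolated accent at 3.0 s is excluded and the first three become the beat points.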
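Claim 5 arranges the pictures, in a set order, on the beat points, then merges them with the music fragment. The arrangement step can be sketched as a simple pairing of pictures with beat timestamps; the helper name is invented here, and the actual merge into a video file is left to an encoder.

```python
def schedule_pictures(pictures, beat_times):
    """Pair each picture, in a set order, with one beat point; each pair is
    (presentation_time_sec, picture).  Merging the scheduled pictures with the
    audio track (and claim 6's playback effects) is left to a video encoder."""
    if len(pictures) != len(beat_times):
        raise ValueError("beat point count must equal picture count")
    return list(zip(sorted(beat_times), pictures))
```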
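Claim 7's pre-processing makes the beat count equal the picture count: the fragment is cut when there are too many beat points, or looped copies are spliced on when there are too few. A minimal sketch of that bookkeeping on beat timestamps (the function name is hypothetical; real splicing would also cut and concatenate the audio samples):

```python
def fit_beats_to_pictures(beat_times, num_pictures, fragment_len):
    """If there are more beat points than pictures, cut the fragment (keep the
    first num_pictures beats); if fewer, splice looped copies of the fragment
    so its beat points repeat, shifted by the fragment length each pass."""
    beats = sorted(beat_times)
    if not beats:
        raise ValueError("no beat points to work with")
    if len(beats) >= num_pictures:
        return beats[:num_pictures]          # cutting the music fragment
    out, offset = list(beats), fragment_len
    while len(out) < num_pictures:           # splicing copied sub-fragments
        for t in beats:
            out.append(t + offset)
            if len(out) == num_pictures:
                break
        offset += fragment_len
    return out
```

For instance, two beats at 0.5 s and 1.0 s in a 1.5 s fragment expand to five scheduling points `[0.5, 1.0, 2.0, 2.5, 3.5]` when five pictures must be placed.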
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910532395.2 | 2019-06-19 | ||
CN201910532395.2A CN110278388B (zh) | 2019-06-19 | 2019-06-19 | Display video generation method, apparatus, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020253806A1 true WO2020253806A1 (zh) | 2020-12-24 |
Family
ID=67961271
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/096969 WO2020253806A1 (zh) | Display video generation method, apparatus, device and storage medium | 2019-06-19 | 2020-06-19 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110278388B (zh) |
WO (1) | WO2020253806A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114329001A (zh) * | 2021-12-23 | 2022-04-12 | 游艺星际(北京)科技有限公司 | Dynamic picture display method and apparatus, electronic device and storage medium |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110278388B (zh) | 2019-06-19 | 2022-02-22 | 北京字节跳动网络技术有限公司 | Display video generation method, apparatus, device and storage medium |
CN112822563A (zh) | 2019-11-15 | 2021-05-18 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating video, electronic device and computer-readable medium |
CN112822541B (zh) | 2019-11-18 | 2022-05-20 | 北京字节跳动网络技术有限公司 | Video generation method and apparatus, electronic device and computer-readable medium |
CN111010611A (zh) | 2019-12-03 | 2020-04-14 | 北京达佳互联信息技术有限公司 | Method and apparatus for acquiring an electronic photo album, computer device and storage medium |
CN113223487B (zh) | 2020-02-05 | 2023-10-17 | 字节跳动有限公司 | Information recognition method and apparatus, electronic device and storage medium |
CN111432141B (zh) | 2020-03-31 | 2022-06-17 | 北京字节跳动网络技术有限公司 | Method, apparatus, device and storage medium for determining a mashup video |
CN111813970A (zh) | 2020-07-14 | 2020-10-23 | 广州酷狗计算机科技有限公司 | Multimedia content display method, apparatus, terminal and storage medium |
CN111756953A (zh) | 2020-07-14 | 2020-10-09 | 北京字节跳动网络技术有限公司 | Video processing method, apparatus, device and computer-readable medium |
CN112259062B (zh) | 2020-10-20 | 2022-11-04 | 北京字节跳动网络技术有限公司 | Special effect display method and apparatus, electronic device and computer-readable medium |
CN112489681B (zh) | 2020-11-23 | 2024-08-16 | 瑞声新能源发展(常州)有限公司科教城分公司 | Beat recognition method, apparatus and storage medium |
CN113473177B (zh) | 2021-05-27 | 2023-10-31 | 北京达佳互联信息技术有限公司 | Music recommendation method, apparatus, electronic device and computer-readable storage medium |
CN113438547B (zh) | 2021-05-28 | 2022-03-25 | 北京达佳互联信息技术有限公司 | Music generation method, apparatus, electronic device and storage medium |
CN115695899A (zh) | 2021-07-23 | 2023-02-03 | 花瓣云科技有限公司 | Video generation method, electronic device and medium thereof |
CN113655930B (zh) | 2021-08-30 | 2023-01-10 | 北京字跳网络技术有限公司 | Information publishing method, information display method, apparatus, electronic device and medium |
CN116152393A (zh) | 2021-11-18 | 2023-05-23 | 脸萌有限公司 | Video generation method, apparatus, device and storage medium |
CN116800908A (zh) | 2022-03-18 | 2023-09-22 | 北京字跳网络技术有限公司 | Video generation method, apparatus, electronic device and storage medium |
CN115243101B (zh) | 2022-06-20 | 2024-04-12 | 上海众源网络有限公司 | Method and apparatus for recognizing the dynamic-to-static ratio of a video, electronic device and storage medium |
CN115243107B (zh) | 2022-07-08 | 2023-11-21 | 华人运通(上海)云计算科技有限公司 | Short video playback method, apparatus, system, electronic device and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7904815B2 (en) * | 2003-06-30 | 2011-03-08 | Microsoft Corporation | Content-based dynamic photo-to-video methods and apparatuses |
CN104202540A (zh) * | 2014-09-28 | 2014-12-10 | 北京金山安全软件有限公司 | Method and system for generating video from pictures |
CN105072354A (zh) * | 2015-07-17 | 2015-11-18 | Tcl集团股份有限公司 | Method and system for synthesizing a video stream from multiple photos |
CN107743268A (zh) * | 2017-09-26 | 2018-02-27 | 维沃移动通信有限公司 | Video editing method and mobile terminal |
CN109618222A (zh) * | 2018-12-27 | 2019-04-12 | 北京字节跳动网络技术有限公司 | Spliced video generation method, apparatus, terminal device and storage medium |
CN110278388A (zh) * | 2019-06-19 | 2019-09-24 | 北京字节跳动网络技术有限公司 | Display video generation method, apparatus, device and storage medium |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7569761B1 (en) * | 2007-09-21 | 2009-08-04 | Adobe Systems Inc. | Video editing matched to musical beats |
CN101727943B (zh) * | 2009-12-03 | 2012-10-17 | 无锡中星微电子有限公司 | Method for adding music to images, image scoring apparatus and image playback apparatus |
CN102256030A (zh) * | 2010-05-20 | 2011-11-23 | Tcl集团股份有限公司 | Album presentation system capable of matching background music and background music matching method thereof |
CN102403011A (zh) * | 2010-09-14 | 2012-04-04 | 北京中星微电子有限公司 | Music output method and apparatus |
US20140317480A1 (en) * | 2013-04-23 | 2014-10-23 | Microsoft Corporation | Automatic music video creation from a set of photos |
CN105550251A (zh) * | 2015-12-08 | 2016-05-04 | 小米科技有限责任公司 | Picture playback method and apparatus |
CN108920648B (zh) * | 2018-07-03 | 2021-06-22 | 四川大学 | Cross-modal matching method based on music-image semantic relations |
CN109256146B (zh) * | 2018-10-30 | 2021-07-06 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio detection method, apparatus and storage medium |
CN109697236A (zh) * | 2018-11-06 | 2019-04-30 | 建湖云飞数据科技有限公司 | Multimedia data matching information processing method |
- 2019
- 2019-06-19: CN application CN201910532395.2A filed; granted as patent CN110278388B (status: Active)
- 2020
- 2020-06-19: PCT application PCT/CN2020/096969 filed as WO2020253806A1 (status: Application Filing)
Also Published As
Publication number | Publication date |
---|---|
CN110278388B (zh) | 2022-02-22 |
CN110278388A (zh) | 2019-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020253806A1 (zh) | Display video generation method, apparatus, device and storage medium | |
CN110677711B (zh) | Video soundtrack matching method and apparatus, electronic device and computer-readable medium | |
US10182095B2 (en) | Method and system for video call using two-way communication of visual or auditory effect | |
WO2021093737A1 (zh) | Method and apparatus for generating video, electronic device and computer-readable medium | |
WO2021196903A1 (zh) | Video processing method and apparatus, readable medium and electronic device | |
CN109543064B (zh) | Lyric display processing method and apparatus, electronic device and computer storage medium | |
WO2021008223A1 (zh) | Information determination method and apparatus, and electronic device | |
WO2022152064A1 (zh) | Video generation method and apparatus, electronic device and storage medium | |
WO2020082870A1 (zh) | Instant video display method and apparatus, terminal device and storage medium | |
WO2020113733A1 (zh) | Animation generation method and apparatus, electronic device and computer-readable storage medium | |
WO2020259130A1 (zh) | Highlight fragment processing method and apparatus, electronic device and readable medium | |
CN109640129B (zh) | Video recommendation method and apparatus, client device, server and storage medium | |
CN110324718B (zh) | Audio/video generation method and apparatus, electronic device and readable medium | |
WO2020207080A1 (zh) | Video shooting method and apparatus, electronic device and storage medium | |
WO2021057740A1 (zh) | Video generation method and apparatus, electronic device and computer-readable medium | |
WO2021012764A1 (zh) | Audio/video playback method and apparatus, electronic device and readable medium | |
CN113257218B (zh) | Speech synthesis method and apparatus, electronic device and storage medium | |
JP2020174339A (ja) | Method, apparatus, server, computer-readable storage medium and computer program for aligning paragraphs with video | |
CN107450874B (zh) | Dual-screen multimedia data playback method and system | |
WO2023103889A1 (zh) | Video processing method and apparatus, electronic device and storage medium | |
US20230131975A1 (en) | Music playing method and apparatus based on user interaction, and device and storage medium | |
WO2020224294A1 (zh) | System, method and apparatus for processing information | |
WO2022218109A1 (zh) | Interaction method and apparatus, electronic device and computer-readable storage medium | |
WO2024078293A1 (zh) | Image processing method and apparatus, electronic device and storage medium | |
WO2023174073A1 (zh) | Video generation method, apparatus, device, storage medium and program product | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20827465 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20827465 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.03.2022) |
|