CN114584803A - Video generation method and computer equipment

Video generation method and computer equipment

Info

Publication number
CN114584803A
Authority
CN
China
Prior art keywords
image
images
audio
starting
target
Prior art date
Legal status
Pending
Application number
CN202011387442.8A
Other languages
Chinese (zh)
Inventor
药欣
马瑞
曹芝勇
周树荣
毛明海
Current Assignee
Shenzhen TCL Digital Technology Co Ltd
Original Assignee
Shenzhen TCL Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen TCL Digital Technology Co Ltd filed Critical Shenzhen TCL Digital Technology Co Ltd
Priority to CN202011387442.8A
Publication of CN114584803A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally


Abstract

The invention provides a video generation method and computer equipment, wherein the video generation method comprises the following steps: acquiring an image set to be processed and an audio to be processed, wherein the image set comprises a plurality of images; determining a target beat point corresponding to each image in the plurality of images based on the image set and the audio; and inserting the image corresponding to the target beat point into each target beat point in the audio to generate a video. In the invention, each image is inserted into the target beat point of the audio, so that the expressive force of the video can be improved, and the video with better quality can be obtained.

Description

Video generation method and computer equipment
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a video generation method and a computer device.
Background
Video production is a complex process: to obtain a high-quality video, the images need to be sorted, appropriate audio needs to be selected, and the insertion point of each image needs to be determined. This process places relatively high demands on the technical skill of the producer.
Ordinary users without video production skills usually select images manually and play them as a slide show over background audio, so the video obtained in this way has poor expressiveness and low quality.
Therefore, the prior art is in need of improvement.
Disclosure of Invention
The invention provides a video generation method and computer equipment, which enable similar images to be played consecutively and insert each image at a beat point of the audio, so that the expressiveness of the video can be improved and a video of better quality can be obtained.
In a first aspect, an embodiment of the present invention provides a method for generating a video, including:
acquiring an image set to be processed and an audio to be processed, wherein the image set comprises a plurality of images;
determining a target beat point corresponding to each image in the plurality of images based on the image set and the audio;
and inserting the image corresponding to the target beat point into each target beat point in the audio to generate a video.
In a further improvement, the determining, based on the image set and the audio, a target beat point corresponding to each of the images specifically includes:
sequencing the plurality of images based on the similarity between any two images in the image set to obtain an image insertion sequence;
acquiring a plurality of beat points of the audio;
and determining a target beat point corresponding to each image in the image insertion sequence according to the image insertion sequence, the plurality of beat points and the audio, wherein the target beat point is a beat point used for inserting the image in the plurality of beat points of the audio.
In a further improvement, the sorting the images based on the similarity between any two images in the image set to obtain an image insertion sequence specifically includes:
selecting a starting image in the image set, and setting the insertion sequence number of the starting image as a first sequence number;
determining a non-starting image set corresponding to the starting image, wherein the non-starting image set comprises a plurality of non-starting images;
determining a candidate image corresponding to the starting image based on the similarity between each non-starting image and the starting image, and setting the insertion sequence number of the candidate image as the next sequence number of the insertion sequence number of the starting image;
taking the candidate image as a starting image, and continuing to execute the step of determining the non-starting image set corresponding to the starting image until the insertion serial numbers corresponding to all images in the image set are determined;
and determining an image insertion sequence corresponding to the image set according to the insertion sequence numbers corresponding to all the images in the image set.
In a further improvement, the determining the non-starting image set corresponding to the starting image specifically includes:
and for the starting image, selecting all images with undetermined insertion sequence numbers in the image set to obtain a non-starting image set corresponding to the starting image.
In a further improvement, the determining a candidate image corresponding to the starting image based on the similarity between each non-starting image in the non-starting image set and the starting image specifically includes:
respectively calculating the similarity between each non-starting image in the non-starting image set and the starting image to obtain a similarity set;
and selecting the maximum similarity in the similarity set, and taking the image corresponding to the maximum similarity as a candidate image corresponding to the starting image.
In a further improvement, the determining, according to the image insertion sequence, the plurality of beat points, and the audio, a target beat point corresponding to each image in the image insertion sequence specifically includes:
acquiring the audio duration corresponding to the audio and the number of the plurality of images;
determining an image insertion point corresponding to each image in the image insertion sequence according to the number of images and the audio duration, wherein the duration between every two adjacent image insertion points is determined according to the audio duration and the number of images;
and determining target beat points corresponding to the images according to the image insertion points and the beat points.
In a further improvement, the determining, according to the image insertion points and the beat points, target beat points corresponding to the images respectively specifically includes:
and for each image insertion point, determining a beat point which is closest to the image insertion point in the plurality of beat points, and taking the beat point which is closest to the image insertion point as a target beat point of the image corresponding to the image insertion point.
In a further improvement, the inserting, into each target beat point in the audio, an image corresponding to the target beat point to generate a video specifically includes:
and for each target beat point, inserting an image corresponding to the target beat point when the playing time of the audio reaches the target beat point, and taking the image as an image frame played between the target beat point and the next target beat point.
In a further improvement, before the acquiring a set of images to be processed and determining an image insertion sequence based on a similarity between any two images in the set of images, the method further includes:
acquiring an original image set, wherein the original image set comprises a plurality of original images, and the plurality of original images comprise at least one template image;
extracting a target feature map corresponding to each original image in the original image set;
dividing the original image set into original image subsets of different categories based on all the determined target feature maps;
and taking any original image subset comprising the template image as the image set to be processed.
In a second aspect, the present invention provides a video generating apparatus, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring an image set to be processed and an audio to be processed, and the image set comprises a plurality of images;
a target beat point determining unit, configured to determine a target beat point corresponding to each of the images based on the image set and the audio;
and the video generation unit is used for inserting the image corresponding to the target beat point into each target beat point in the audio to generate a video.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the following steps when executing the computer program:
acquiring an image set to be processed and an audio to be processed, wherein the image set comprises a plurality of images;
determining a target beat point corresponding to each image in the plurality of images based on the image set and the audio;
and inserting the image corresponding to the target beat point into each target beat point in the audio to generate a video.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps:
acquiring an image set to be processed and an audio to be processed, wherein the image set comprises a plurality of images;
determining a target beat point corresponding to each image in the plurality of images based on the image set and the audio;
and inserting the image corresponding to the target beat point into each target beat point in the audio to generate a video.
Compared with the prior art, the embodiment of the invention has the following advantages:
the invention provides a video generation method, which comprises the following steps: acquiring an image set to be processed and an audio to be processed, wherein the image set comprises a plurality of images; determining a target beat point corresponding to each image in the plurality of images based on the image set and the audio; and inserting the image corresponding to the target beat point into each target beat point in the audio to generate a video. In the invention, each image is inserted at the beat point of the audio, so that the expressive force of the video can be improved, and the video with better quality can be obtained.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flow chart of a video generation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a feature extraction network in an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a video generating apparatus according to an embodiment of the present invention;
fig. 4 is an internal structural diagram of a computer device in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any combination of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The inventor finds that video production is a complex process: to obtain a high-quality video, the images need to be sorted, appropriate audio needs to be selected, and the insertion point of each image needs to be determined. This process places relatively high demands on the technical skill of the producer. Ordinary users without video production skills usually select images manually and play them as a slide show over background audio, so the video obtained in this way has poor expressiveness and low quality.
In order to solve the above problem, in an embodiment of the present invention, an image set to be processed and an audio to be processed are obtained, where the image set includes a plurality of images; determining a target beat point corresponding to each image in the plurality of images based on the image set and the audio; and inserting the image corresponding to the target beat point into each target beat point in the audio to generate a video. In the invention, each image is inserted at the beat point of the audio, so that the expressive force of the video can be improved, and the video with better quality can be obtained.
The video generation method provided by the embodiment of the invention can be applied to electronic equipment, which may include: PCs, televisions, servers, cell phones, tablet computers, palmtop computers, personal digital assistants (PDAs), and the like.
The invention will be further explained by the description of the embodiments with reference to the drawings.
Referring to fig. 1, the present embodiment provides a method for generating a video, including:
s1, acquiring an image set to be processed and audio to be processed, wherein the image set comprises a plurality of images.
In the embodiment of the present invention, the image set to be processed includes a plurality of images for generating a video. The image set to be processed may be captured by the terminal that executes the video generation method, acquired from a third-party device, or obtained partly from a third-party device and partly captured by the terminal.
In the embodiment of the present invention, the audio is used to generate a video, and in order to obtain a better video effect, the audio may be an audio corresponding to music, and the audio is used as background music of the video.
In the embodiment of the present invention, in order to make the styles of the images in the image set to be processed similar and obtain a video with a more harmonious picture, the images in the image set may be limited to belong to the same category. Specifically, images belonging to the same category may be selected from the original image set, and the selected images belonging to the same category may be used as the image set to be processed.
Specifically, before step S1, the method includes:
m1, obtaining an original image set, wherein the original image set comprises a plurality of original images, and the plurality of original images comprise at least one template image.
In the embodiment of the present invention, the original image set includes a plurality of original images, which may be acquired from a network or captured by a terminal. The plurality of original images include at least one template image. When an image for generating a video is selected from the plurality of original images, a template image has higher priority than a non-template image, where a non-template image refers to any original image other than the template images; that is, between a template image and a non-template image, the template image is preferentially selected as the image for generating the video.
The template image may be user-specified, that is, the user designates the template image from the plurality of original images; it may be the user's favorite original image, the one the user most wants to appear in the video. There may be more than one template image.
M2, determining the target feature map corresponding to each original image in the original image set.
In the embodiment of the invention, the target feature map corresponding to each original image can be determined through a neural network model. The target feature map is a multi-channel image of size 1 × 1 (a single pixel). It can be represented as a vector whose dimensionality equals the number of channels of the target feature map; the value of each dimension of the vector is the pixel value of the pixel in the channel corresponding to that dimension.
An original image is input into the neural network model, and the output of the neural network model is the target feature map corresponding to the original image. The neural network model comprises a feature extraction module and a fully connected module: the original image is fed to the feature extraction module to obtain a feature map corresponding to the original image, and the feature map is then fed to the fully connected module, whose output is the target feature map.
In the embodiment of the present invention, the image size of the original image input to the neural network model needs to satisfy the input requirement of the neural network model, and the image sizes of all the original images need to be adjusted to the preset size in advance. For example, the predetermined size is 224 × 224.
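As an illustration, a minimal preprocessing sketch, assuming the Pillow library and the 224 × 224 preset size mentioned above (the function name is illustrative, not from the patent):

```python
from PIL import Image

PRESET_SIZE = (224, 224)  # preset size from the description above

def resize_to_preset(path: str) -> Image.Image:
    """Adjust an original image to the preset size required by the network input."""
    img = Image.open(path).convert("RGB")
    return img.resize(PRESET_SIZE)
```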
In particular implementations, the neural network model may be a VGG16 network model. As shown in fig. 2, the neural network model includes five feature extraction modules and a fully connected module, where the five feature extraction modules include: the device comprises a first feature extraction module, a second feature extraction module, a third feature extraction module, a fourth feature extraction module and a fifth feature extraction module.
The first feature extraction module includes: a first convolutional layer c1, a second convolutional layer c2, and a first pooling layer p 1; the sizes of convolution kernels of c1 and c2 are 3 × 3, the number of convolution kernels of c1 and c2 is 64, and the parameter of p1 is 2 × 2; the input item of the first feature extraction module is an original image, and the first feature extraction module extracts the features of the original image to obtain a first feature map.
The second feature extraction module includes: a third convolutional layer c3, a fourth convolutional layer c4, and a second pooling layer p 2; the sizes of convolution kernels of c3 and c4 are 3 × 3, the number of convolution kernels of c3 and c4 is 128, and the parameter of p2 is 2 × 2; the input item of the second feature extraction module is a first feature map, and the second feature extraction module extracts the features of the first feature map to obtain a second feature map.
The third feature extraction module includes: a fifth convolutional layer c5, a sixth convolutional layer c6, a seventh convolutional layer c7, and a third pooling layer p 3; the sizes of convolution kernels of c5, c6 and c7 are all 3 × 3, the number of convolution kernels of c5, c6 and c7 is 256, and the parameter of p3 is 2 × 2; and the third feature extraction module extracts the features of the second feature map to obtain a third feature map.
The fourth feature extraction module includes: an eighth convolutional layer c8, a ninth convolutional layer c9, a tenth convolutional layer c10, and a fourth pooling layer p 4; the sizes of convolution kernels of c8, c9 and c10 are all 3 × 3, the number of convolution kernels of c8, c9 and c10 is 512, and the parameter of p4 is 2 × 2; and the fourth feature extraction module extracts the features of the third feature map to obtain a fourth feature map.
The fifth feature extraction module includes: an eleventh convolutional layer c11, a twelfth convolutional layer c12, a thirteenth convolutional layer c13, and a fifth pooling layer p 5; the sizes of the convolution kernels of c11, c12 and c13 are all 3 × 3, the number of convolution kernels of c11, c12 and c13 is 512, and the parameter of p5 is 2 × 2; the fifth feature extraction module extracts the features of the fourth feature map to obtain a fifth feature map.
The full-connection module comprises a first full-connection layer fc1, a second full-connection layer fc2 and a third full-connection layer fc3, and the parameters of the first full-connection layer are as follows: 1 x 4096, the parameters of the second fully-connected layer are 1 x 4096, and the parameters of the third fully-connected layer are 1 x 1000. And inputting the fifth feature map into a full-connection module to obtain a target feature map corresponding to the original image. The target feature map may be represented in a vector form, for example, the target feature map may be represented as: { x1, x2, …, xn }.
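For illustration, a minimal PyTorch sketch of the network described above (five feature extraction modules in the VGG16 layout, followed by three fully connected layers); the class and helper names are illustrative, not from the patent:

```python
import torch.nn as nn

def make_block(in_ch, out_ch, n_convs):
    """One feature extraction module: n_convs 3x3 convolutions, then 2x2 max pooling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

class TargetFeatureNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Five feature extraction modules (VGG16 configuration).
        self.features = nn.Sequential(
            make_block(3, 64, 2),     # c1, c2, p1
            make_block(64, 128, 2),   # c3, c4, p2
            make_block(128, 256, 3),  # c5-c7, p3
            make_block(256, 512, 3),  # c8-c10, p4
            make_block(512, 512, 3),  # c11-c13, p5
        )
        # Three fully connected layers: 1 x 4096, 1 x 4096, 1 x 1000.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 1000),
        )

    def forward(self, x):  # x: (N, 3, 224, 224)
        return self.classifier(self.features(x))  # target feature vector {x1, ..., xn}
```

With 224 × 224 inputs, the fifth feature map is 512 × 7 × 7, which matches the first fully connected layer above.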
M3, dividing the original image set into different categories of original image subsets based on all the determined target feature maps.
In the embodiment of the invention, the original image set is divided into a plurality of original image subsets by adopting a classification method, and the corresponding categories of each original image subset are different from each other. For example, the original image set may be divided into different categories of original image subsets by a K-means algorithm. A subset of the original images comprises original images of similar styles.
Specifically, k target feature maps are randomly selected from all target feature maps as initial centroids; for example, if k is set to 3, the initial centroids are u1, u2 and u3. The remaining target feature maps are marked as feature maps to be classified. For each feature map to be classified, the distances between it and each initial centroid u1, u2 and u3 are calculated, and the feature map is divided into the class of the centroid with the minimum distance, yielding a plurality of classification sets.
For example, there are 10 target feature maps, including: t1, t2, … and t10, wherein three target feature maps are randomly selected from the 10 target feature maps: t1, t2, and t3, denote t1 as initial centroid u1, t2 as initial centroid u2, and t3 as initial centroid u 3. The characteristic graph to be classified comprises: t4, t5, … and t10, and for each feature map to be classified, calculating the distance between the feature map to be classified and each initial centroid. For example, for t4, the distance between t4 and u1 is calculated to obtain d41, the distance between t4 and u2 is calculated to obtain d42, the distance between t4 and u3 is calculated to obtain d43, and assuming that d43 is the minimum, the original image corresponding to t4 and the original image corresponding to u3 are divided into one type. And performing the above calculation on all the feature maps to be classified to obtain 3 classification sets.
In the embodiment of the invention, for each classification set of the plurality of classification sets, the classification centroid corresponding to the classification set is determined; then, for each feature map to be classified, the distance between the feature map and each classification centroid is calculated, and the feature map is divided into the class of the centroid with the minimum distance, so as to obtain a plurality of updated classification sets.
The classification centroid corresponding to the classification set can be determined by formula (1).
u_j = (1 / |C_j|) · Σ_{t ∈ C_j} t        (1)
Wherein Cj is a classification set, t is a target feature map belonging to the classification set Cj, and uj is a classification centroid.
In the embodiment of the invention, the following steps are repeatedly executed: determining the classification centroid corresponding to each classification set, and re-dividing the feature maps to be classified according to the new centroids, until each classification centroid is the same as the centroid calculated in the previous iteration; the resulting classification sets are then taken as the original image subsets of different classes.
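As an illustration, a minimal numpy sketch of this clustering step; k, the random initialization and the convergence test follow the description above, and the function name is illustrative:

```python
import numpy as np

def kmeans_feature_maps(features: np.ndarray, k: int = 3, seed: int = 0):
    """Cluster target feature vectors (one row per original image) into k subsets.

    features: (num_images, n) array of target feature maps represented as vectors.
    Returns one class label per image.
    """
    rng = np.random.default_rng(seed)
    # Randomly pick k target feature maps as the initial centroids.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    while True:
        # Assign each feature map to the centroid with the minimum distance.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each classification centroid as the mean of its members (formula (1)).
        new_centroids = np.stack([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # same as previous iteration: stop
            return labels
        centroids = new_centroids
```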
M4, taking any original image subset including the template image as the image set to be processed.
In the embodiment of the invention, the original image subsets that include a template image are identified among the plurality of original image subsets. If more than one subset includes a template image, one of them may be chosen at random as the image set to be processed. Alternatively, each template image may carry a user-set preference score, and the subset containing the template image with the highest preference score is used as the image set to be processed; or the preference scores may be combined with the number of original images in each subset to choose the image set to be processed among the subsets that include a template image.
And S2, determining a target beat point corresponding to each image in the images based on the image set and the audio.
In the embodiment of the present invention, the signal corresponding to the audio is analyzed, and a plurality of beat points of the audio can be determined; a beat point is a moment when a musical instrument plays a certain specific note, such as a drum hit. The audio contains a large number of beat points, and the playing times of any two different beat points are different. A target beat point is simply a beat point of the audio, chosen among the plurality of beat points, that is used for inserting an image.
In the embodiment of the invention, the playing order of the images is determined first, and the target beat point of each image is then determined according to that order. The playing order of the images in the video can be determined based on the similarity between any two of the plurality of images, and similar images can be arranged to play consecutively, so that the style of adjacent images does not jump abruptly during playback.
Specifically, step S2 includes:
s21, based on the similarity between any two images in the image set, sequencing the images to obtain an image insertion sequence.
In this embodiment of the present invention, the image insertion sequence includes a plurality of images, and insertion sequence numbers respectively corresponding to the images, and the images in the image insertion sequence may be arranged according to the insertion sequence numbers. When the video is generated, a plurality of images are sequentially inserted into the audio according to the sequence of the insertion sequence numbers from small to large.
In this embodiment of the present invention, for two adjacent images in the image insertion sequence, the two images are a first image and a second image, respectively, where if the first image is arranged before the second image, the similarity between the first image and the second image is greater than the similarity between any one of the images arranged after the second image and the first image.
Specifically, step S21 includes:
s211, selecting a starting image in the image set, and setting the insertion sequence number of the starting image as a first sequence number.
In the embodiment of the present invention, the start image may be randomly selected, and the insertion sequence number corresponding to the start image is set as a first sequence number, where the first sequence number may be represented by a number, for example, the first sequence number is represented by a number 1.
S212, determining a non-starting image set corresponding to the starting image, wherein the non-starting image set comprises a plurality of non-starting images.
In the embodiment of the present invention, for the starting image, all images with undetermined insertion sequence numbers are selected from the image set, the images with undetermined insertion sequence numbers are used as non-starting images, and a non-starting image set corresponding to the starting image is obtained based on all non-starting images. That is, the non-starting image set includes several non-starting images, which are images without an insertion sequence number set.
For example, the image set includes images r1, r2, r3, … and r8. If it has been determined in the foregoing step that the insertion sequence number of r1 is the first sequence number, then r2, r3, … and r8 have no insertion sequence number yet, so they are non-starting images, and the non-starting image set is: { r2, r3, …, r8 }.
S213, determining a candidate image corresponding to the starting image based on the similarity between each non-starting image and the starting image, and setting the insertion sequence number of the candidate image as the next sequence number of the insertion sequence number of the starting image.
In the embodiment of the present invention, "setting the insertion number of the candidate image to be the next number to the insertion number of the start image" means that the candidate image is located next to the start image in the image insertion sequence. For each non-starting image, the similarity between the non-starting image and the starting image is calculated.
Specifically, step S213 includes:
s2131, respectively calculating the similarity between each non-initial image in the non-initial image set and the initial image to obtain a similarity set.
In the embodiment of the invention, for each non-starting image, the target feature map of the non-starting image and the target feature map of the starting image are obtained, and the similarity between the two is calculated; this gives the similarity corresponding to each non-starting image and hence the similarity set.
Specifically, the similarity between the non-starting image and the starting image can be calculated by formula (2).
SIM(x, y) = Σ_{i=1}^{n} x_i · y_i / ( √(Σ_{i=1}^{n} x_i²) · √(Σ_{i=1}^{n} y_i²) )        (2)
Where the starting image rx is characterized by tx = {x1, x2, …, xn}, the non-starting image ry is characterized by ty = {y1, y2, …, yn}, and SIM(x, y) is the similarity between the starting image rx and the non-starting image ry.
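As an illustration, a minimal numpy sketch of this computation, on the assumption that SIM here is the standard cosine similarity of the two target feature vectors:

```python
import numpy as np

def sim(tx: np.ndarray, ty: np.ndarray) -> float:
    """Cosine similarity between two target feature vectors (formula (2))."""
    return float(np.dot(tx, ty) / (np.linalg.norm(tx) * np.linalg.norm(ty)))
```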
S2132, selecting the maximum similarity from the similarity set, and taking the image corresponding to the maximum similarity as a candidate image corresponding to the starting image.
In the embodiment of the present invention, the similarity is represented by a numerical value, and the smaller the numerical value is, the smaller the similarity is, and the larger the numerical value is, the larger the similarity is. The maximum similarity refers to the similarity with the largest value among all similarities. And taking the image corresponding to the maximum similarity as a candidate image corresponding to the initial image.
For example, r1 is the starting image, r2, r3, … and r8 are non-starting images, and the similarity between r3 and r1 is the maximum similarity in the similarity set; then r3 is taken as the candidate image of r1, and the insertion sequence number of r3 is the next one after that of r1. Assuming r1 has insertion sequence number 1, indicating that the first inserted image is r1, then r3 has insertion sequence number 2, indicating that the second inserted image is r3, where the sequence number "2" immediately follows "1". In the image insertion sequence, r3 is arranged immediately after r1.
S214, taking the candidate image as a starting image, and continuing to execute the step of determining the non-starting image set corresponding to the starting image until the insertion sequence numbers corresponding to all images in the image set are determined.
In the embodiment of the present invention, after steps S211 to S213, only the insertion numbers of the two images are determined, and the insertion numbers of other images in the image set to be processed need to be determined.
For example, after steps S211 to S213, it is determined that the insertion sequence number of r1 is 1 and that of r3 is 2; the candidate image corresponding to r3, i.e. the image played after r3, then needs to be determined. Taking r3 as the starting image, the non-starting image set corresponding to r3 is determined first. As explained above, the non-starting images are all images whose insertion sequence numbers have not been determined; in this example, since r1 and r3 already have insertion sequence numbers, the non-starting images corresponding to r3 include: r2, r4, r5, … and r8. The similarity between r3 and each non-starting image (r2, r4, r5, … and r8) is calculated, and the candidate image corresponding to r3 is determined; if the candidate image corresponding to r3 is r7, the insertion sequence number of r7 is the next one after that of r3, and r7 is arranged immediately after r3 in the image insertion sequence. Step S212 is then executed again to determine the candidate image corresponding to r7, until the insertion sequence numbers of all images in the image set are determined.
S215, determining an image insertion sequence corresponding to the image set according to the insertion sequence numbers corresponding to all the images in the image set.
In the embodiment of the invention, after the insertion sequence numbers of all images are determined, all images are sorted according to their insertion sequence numbers to obtain the image insertion sequence. For an image in the image insertion sequence, the similarity between the image and the image arranged immediately after it is greater than the similarity between the image and the image arranged two positions after it.
For example, if the image set includes images r1, r2, r3, … and r8, then through steps S211 to S214 the insertion sequence numbers of r1 to r8 are obtained, and the image insertion sequence is: r1, r3, r7, r6, r2, r8, r5 and r4. Here the similarity between r7 and r6 is greater than the similarity between r7 and any of r2, r8, r5 and r4.
In the embodiment of the present invention, the greater the similarity between two images, the more similar their styles. With the image insertion sequence determined from the similarity between any two images in the image set, playing the images in the order of that sequence lets images with similar styles play consecutively, making the video picture more harmonious, as sketched below.
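As an illustration, a minimal sketch of this greedy ordering (steps S211 to S215), reusing the `sim` function sketched above; the function name is illustrative:

```python
def order_images(features, start: int = 0):
    """Greedy nearest-neighbour ordering of images by feature similarity.

    features: list of target feature vectors, one per image.
    Returns the image insertion sequence as a list of image indices.
    """
    remaining = set(range(len(features))) - {start}
    sequence = [start]  # the starting image gets the first insertion sequence number
    while remaining:
        current = sequence[-1]
        # Candidate image: the non-starting image most similar to the current one.
        best = max(remaining, key=lambda j: sim(features[current], features[j]))
        sequence.append(best)
        remaining.remove(best)
    return sequence
```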
And S22, acquiring a plurality of beat points of the audio.
In the embodiment of the invention, the signals corresponding to the audio are analyzed, so that a plurality of beat points corresponding to the audio can be determined. The beat point is the moment when the instrument plays a particular note, such as the drum point. The audio comprises a large number of beat points, and the playing moments corresponding to any two different beat points are different.
Specifically, step S22 includes:
and S221, acquiring an initial signal corresponding to the audio.
In this embodiment of the present invention, the initial signal is a time domain signal corresponding to the audio, an abscissa of the initial signal is time, and an ordinate of the initial signal is energy of the audio signal.
In the embodiment of the present invention, some noise exists in the initial signal, so the initial signal may be preprocessed first to eliminate the interference signal and reduce the amount of data; this yields more accurate beat points in the subsequent steps and reduces the amount of data those steps must process.
The process of preprocessing the initial signal comprises the following steps:
determining a plurality of central moments in the initial signal; for each central moment, calculating the accumulated value of all energy in its neighborhood and taking that accumulated value as the amplitude corresponding to the central moment, thereby obtaining a preprocessed initial signal that replaces the initial signal. The neighborhoods of any two of the plurality of central moments may overlap.
In the embodiment of the present invention, the neighborhood of a center time may extend from one neighborhood duration before the center time to one neighborhood duration after it. For example, if the center time is t0, the neighborhood of t0 is [t0 − tr, t0 + tr], where the neighborhood duration tr may be 10 ms.
Specifically, the initial signal is preprocessed by formula (3).
W_t = Σ_{n = t − tr}^{t + tr} a_n        (3)
In the preprocessed initial signal, Wt is the amplitude corresponding to the central moment t; tr is the neighborhood duration, which may be set to 10 ms; an is the amplitude corresponding to time n; and t advances as t = t + 10 ms, i.e., an amplitude is determined every 10 ms. For example, the amplitude corresponding to 10 ms in the preprocessed initial signal is determined from the amplitudes of all sampling points in [0, 20 ms]; since t = t + 10 ms, the amplitude corresponding to 20 ms is determined next, from the amplitudes of all sampling points in [10 ms, 30 ms].
The amplitude corresponding to time n can be determined from the sampling frequency f0: the sampling point corresponding to time n is n × f0, and the amplitude of that sampling point is taken as the amplitude corresponding to time n. The sampling frequency may be 50 Hz.
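As an illustration, a minimal numpy sketch of this preprocessing; the 10 ms neighborhood duration and step follow the values above, and the function name is illustrative:

```python
import numpy as np

def preprocess(signal: np.ndarray, fs: float, tr: float = 0.010, step: float = 0.010):
    """Windowed-energy preprocessing (formula (3)): for each central moment t,
    accumulate the amplitudes in [t - tr, t + tr] and use the sum as W_t.

    signal: sampled audio amplitudes; fs: sampling frequency in Hz.
    Returns (times, W) for central moments spaced `step` seconds apart.
    """
    n_tr = int(tr * fs)      # neighborhood half-width in samples
    n_step = int(step * fs)  # central moments every `step` seconds
    centers = np.arange(n_tr, len(signal) - n_tr, n_step)
    # Amplitudes are summed as written in formula (3); absolute values
    # could be used instead for a signed signal.
    W = np.array([signal[c - n_tr:c + n_tr + 1].sum() for c in centers])
    return centers / fs, W
```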
S222, low-pass filtering processing is carried out on the initial signal to obtain a first signal corresponding to the audio.
In the present embodiment, beats in music are, in theory, generally produced by low-frequency instruments, such as percussion instruments (bass drums, tambourines, etc.). Therefore, the low-frequency signal corresponding to the audio is analyzed, which makes the beat points easier to determine. The initial signal is first low-pass filtered to filter out components above a low-frequency threshold, which may be 200 Hz; the initial signal may be processed by a Gaussian low-pass filter to obtain the first signal.
And S223, determining a plurality of target amplitudes according to the first signal.
In the embodiment of the present invention, the first signal is divided into a plurality of signal segments and the maximum amplitude of each signal segment is obtained; this step of obtaining the maximum amplitude of each signal segment is executed a preset number of times. For each signal segment, if the maximum amplitudes obtained over the preset number of executions are all the same, the maximum amplitude of the signal segment is taken as the target amplitude of that segment.
The specific process of acquiring the maximum amplitude value corresponding to each signal segment for each execution is as follows:
the method comprises the steps of determining signal segments through a window with preset time duration, specifically, determining a plurality of signal segments according to the preset time duration, wherein the time duration corresponding to each signal segment is the preset time duration. For example, if the preset time duration corresponding to the window is set to be L, the time duration corresponding to each signal segment is L.
The window is slid over the first signal according to a preset step length to determine the signal segments, and the maximum amplitude in each signal segment is determined. The preset step length is a step in the time dimension. It may be greater than or equal to the preset duration, in which case no two of the signal segments intersect; or it may be smaller than the preset duration, in which case any two adjacent signal segments intersect.
In the embodiment of the present invention, the preset number may be set to 20. That is, for each signal segment, the determination of its maximum amplitude is repeated the preset number of times. If all the maximum amplitudes so obtained are the same, that maximum amplitude is taken as the target amplitude of the signal segment; if any two of them differ, the signal segment has no target amplitude. One plausible implementation is sketched after step S224 below.
And S224, regarding each target amplitude value in the plurality of target amplitude values, taking the moment corresponding to the target amplitude value as a beat point corresponding to the target amplitude value.
In the embodiment of the present invention, for a target amplitude, the target amplitude is the maximum amplitude in a signal segment of the first signal (the low-frequency signal after the high frequency is filtered); the loudness of the beat points in the low frequency signal is maximal (amplitude is maximal), and thus the beat points can be determined according to the target amplitude. And the beat point is the time when the musical instrument plays a certain specific note, and the time corresponding to the target amplitude is taken as the beat point corresponding to the target amplitude.
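As an illustration, one plausible reading of steps S223 and S224, under the assumption that the repeated determinations correspond to overlapping sliding windows: a sample whose amplitude is the maximum of every window containing it is a stable maximum and is taken as a target amplitude, and its time is taken as a beat point. The window and step sizes are illustrative:

```python
def detect_beat_points(times, W, win: int = 8, step: int = 1):
    """Sliding-window maximum picking over the preprocessed, low-pass-filtered
    signal: a sample that is the maximum of every window containing it is a
    target amplitude, and its time is the corresponding beat point.

    times, W: output of preprocess(); win/step are in samples of W.
    Returns the list of beat point times in seconds.
    """
    stable = set(range(len(W)))
    for start in range(0, len(W) - win + 1, step):
        window = range(start, start + win)
        m = max(window, key=lambda i: W[i])  # index of this window's maximum
        # Any sample that is not this window's maximum cannot be a target amplitude.
        stable.difference_update(i for i in window if i != m)
    return [times[i] for i in sorted(stable)]
```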
S23, determining the target beat point corresponding to each image in the image insertion sequence according to the image insertion sequence, the beat points and the audio.
In the embodiment of the present invention, the image insertion sequence includes a plurality of images arranged in insertion order, and the target beat point of each image is determined according to the plurality of beat points, the audio, and the image insertion sequence. For any two adjacent images in the image insertion sequence, comprising a first image played earlier and a second image played later, the first image is inserted when the playing time of the video reaches the target beat point corresponding to the first image, and it plays until the playing time of the video reaches the target beat point corresponding to the second image, at which point the second image is inserted.
Specifically, step S23 includes:
s231, obtaining audio time corresponding to the audio and the number of the images of the plurality of images; and determining an image insertion point corresponding to each image in the image insertion sequence according to the number of the images and the audio time length, wherein the time length between every two adjacent image insertion points is determined according to the audio time length and the number of the images.
In the embodiment of the present invention, the audio duration is the playing duration of the audio; after the video is generated, the video playing duration equals the audio duration. The number of images refers to the number of images in the image insertion sequence, i.e., in the image set to be processed. The ratio of the audio duration to the number of images gives the average playing duration of each image, from which the image insertion point corresponding to each image is obtained. The image insertion point of the first image in the image insertion sequence may be set to the playing start time of the audio.
For example, the audio duration is 20 seconds, the image insertion sequence includes 4 images, and each image plays for 5 seconds. Assuming the image insertion sequence is g1, g2, g3 and g4, it can be determined that the image insertion point of g1 is t = 0 seconds, that of g2 is t = 5 seconds, that of g3 is t = 10 seconds, and that of g4 is t = 15 seconds.
S232, determining target beat points corresponding to the images according to the image insertion points and the beat points.
In the embodiment of the invention, for each image insertion point, the beat point closest to the image insertion point is determined in the plurality of beat points, and the beat point closest to the image insertion point is taken as the target beat point of the image corresponding to the image insertion point.
In the embodiment of the present invention, the term "closest to" means: the distance between the time corresponding to the target beat point and the time corresponding to the image insertion point is the shortest.
For example, the plurality of beat points is {j1, j2, j3, …, j20} and the image insertion points are {c1, c2, …, c5}. For an insertion point cm (1 ≤ m ≤ 5): if cm equals some jn (1 ≤ n ≤ 20), jn is taken as the target beat point of the image corresponding to cm; if cm ≠ jn for all n, the jn closest to cm is determined and taken as the target beat point of the image corresponding to cm.
For example, if the image insertion point of g3 is t = 10 seconds and there is a beat point at t = 10 seconds, then t = 10 seconds is set as the target beat point of the image corresponding to g3. If the image insertion point of g3 is t = 10 seconds and the beat point closest to t = 10 seconds is at t = 11 seconds, then t = 11 seconds is taken as the target beat point of the image corresponding to g3.
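As an illustration, a minimal sketch of steps S231 and S232, computing evenly spaced image insertion points and snapping each to its closest beat point; the function name is illustrative:

```python
def target_beat_points(beats, audio_duration: float, num_images: int):
    """Compute each image's insertion point (audio_duration / num_images apart,
    first image at t = 0) and snap it to the closest beat point.

    beats: non-empty list of beat point times in seconds.
    Returns one target beat point per image, in insertion-sequence order.
    """
    spacing = audio_duration / num_images  # average playing duration per image
    insertion_points = [k * spacing for k in range(num_images)]
    return [min(beats, key=lambda b: abs(b - c)) for c in insertion_points]
```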
And S3, inserting the image corresponding to the target beat point into each target beat point in the audio to generate a video.
In the embodiment of the present invention, for each target beat point, when the playing time of the audio reaches the target beat point, an image corresponding to the target beat point is inserted, and the image is used as an image frame played between the target beat point and the next target beat point.
For example, the audio duration is 20 seconds, the image insertion sequence includes 4 images, and each image plays for about 5 seconds on average. Assuming the image insertion sequence is g1, g2, g3 and g4, it may be determined that the target beat point of g1 is t = 0 seconds, that of g2 is t = 5.5 seconds, that of g3 is t = 9.5 seconds, and that of g4 is t = 16 seconds. g1 is inserted when the playing time of the audio reaches 0 seconds and fills each frame from 0 to 5.5 seconds; g2 is inserted at 5.5 seconds and fills each frame from 5.5 to 9.5 seconds; g3 is inserted at 9.5 seconds and fills each frame from 9.5 to 16 seconds; g4 is inserted at 16 seconds and fills each frame from 16 to 20 seconds.
In the embodiment of the present invention, the generated video behaves as follows: during playback, the same image plays continuously between two adjacent target beat points, and every switch between images falls on a beat point. In the above example, g1 plays continuously from 0 to 5.5 seconds and switches to g2 at 5.5 seconds, which then plays from 5.5 to 9.5 seconds; 5.5 seconds is a beat point, i.e., the image switch happens on the beat.
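For illustration, a minimal sketch of the assembly in step S3: given one target beat point per image, each video frame between a target beat point and the next shows the corresponding image. The frame rate is an assumption, not from the patent:

```python
def build_frame_schedule(target_beats, audio_duration: float, fps: int = 25):
    """Map each frame index to an image index: an image plays from its target
    beat point until the next image's target beat point is reached.

    target_beats: one target beat time per image, in insertion-sequence order.
    Returns a list where entry f is the image index shown in frame f.
    """
    n_frames = int(audio_duration * fps)
    schedule = []
    img = 0
    for f in range(n_frames):
        t = f / fps
        # Advance to the next image once its target beat point is reached.
        while img + 1 < len(target_beats) and t >= target_beats[img + 1]:
            img += 1
        schedule.append(img)
    return schedule
```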
Based on the above video generation method, referring to fig. 3, an embodiment of the present invention further provides a video generation apparatus, including:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring an image set to be processed and an audio to be processed, and the image set comprises a plurality of images;
a target beat point determining unit, configured to determine a target beat point corresponding to each of the images based on the image set and the audio;
and the video generation unit is used for inserting the image corresponding to the target beat point into each target beat point in the audio to generate a video.
In one embodiment, the invention provides a computer device, which may be a terminal, whose internal structure is shown in fig. 4. The computer device comprises a processor, a memory, a network interface, a display screen and an input device connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of generating a video. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, a trackball or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad or mouse.
Those skilled in the art will appreciate that the structure shown in fig. 4 is merely a block diagram of the portion of the structure related to the present solution and does not limit the computer devices to which the present solution is applied; a particular computer device may include more or fewer components than those shown, combine certain components, or arrange the components differently.
An embodiment of the present invention provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the following steps when executing the computer program:
acquiring an image set to be processed and an audio to be processed, wherein the image set comprises a plurality of images;
determining a target beat point corresponding to each image in the plurality of images based on the image set and the audio;
and inserting the image corresponding to the target beat point into each target beat point in the audio to generate a video.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps:
acquiring an image set to be processed and an audio to be processed, wherein the image set comprises a plurality of images;
determining a target beat point corresponding to each image in the plurality of images based on the image set and the audio;
and inserting the image corresponding to the target beat point into each target beat point in the audio to generate a video.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; nevertheless, any combination of them that contains no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (12)

1. A method for generating a video, comprising:
acquiring an image set to be processed and an audio to be processed, wherein the image set comprises a plurality of images;
determining a target beat point corresponding to each image in the plurality of images based on the image set and the audio;
and inserting the image corresponding to the target beat point into each target beat point in the audio to generate a video.
2. The method for generating a video according to claim 1, wherein the determining, based on the image set and the audio, a target beat point corresponding to each of the images specifically includes:
sequencing the plurality of images based on the similarity between any two images in the image set to obtain an image insertion sequence;
acquiring a plurality of beat points of the audio;
and determining a target beat point corresponding to each image in the image insertion sequence according to the image insertion sequence, the plurality of beat points and the audio, wherein the target beat point is a beat point used for inserting the image in the plurality of beat points of the audio.
3. The method according to claim 2, wherein the sequencing the plurality of images based on the similarity between any two images in the image set to obtain an image insertion sequence specifically comprises:
selecting a starting image in the image set, and setting the insertion sequence number of the starting image as a first sequence number;
determining a non-starting image set corresponding to the starting image, wherein the non-starting image set comprises a plurality of non-starting images;
determining a candidate image corresponding to the starting image based on the similarity between each non-starting image in the non-starting image set and the starting image, and setting the insertion sequence number of the candidate image as the next sequence number after the insertion sequence number of the starting image;
taking the candidate image as the starting image, and continuing to execute the step of determining the non-starting image set corresponding to the starting image until the insertion sequence numbers corresponding to all images in the image set are determined;
and determining an image insertion sequence corresponding to the image set according to the insertion sequence numbers corresponding to all the images in the image set.
4. The method for generating a video according to claim 3, wherein the determining the non-starting image set corresponding to the starting image specifically includes:
and for the starting image, selecting all images with undetermined insertion sequence numbers in the image set to obtain a non-starting image set corresponding to the starting image.
5. The method according to claim 3, wherein the determining the candidate image corresponding to the starting image based on the similarity between each non-starting image in the non-starting image set and the starting image specifically comprises:
respectively calculating the similarity between each non-starting image in the non-starting image set and the starting image to obtain a similarity set;
and selecting the maximum similarity from the similarity set, and taking the image corresponding to the maximum similarity as a candidate image corresponding to the starting image.
6. The method for generating a video according to claim 2, wherein the determining, according to the image insertion sequence, the plurality of beat points, and the audio, a target beat point corresponding to each image in the image insertion sequence specifically includes:
acquiring the audio duration corresponding to the audio and the number of images in the plurality of images;
determining an image insertion point corresponding to each image in the image insertion sequence according to the number of images and the audio duration, wherein the duration between every two adjacent image insertion points is determined according to the audio duration and the number of images;
and determining target beat points corresponding to the images according to the image insertion points and the beat points.
7. The method for generating a video according to claim 6, wherein the determining, according to the image insertion points and the beat points, target beat points corresponding to the images respectively comprises:
and for each image insertion point, determining a beat point which is closest to the image insertion point in the plurality of beat points, and taking the beat point which is closest to the image insertion point as a target beat point of the image corresponding to the image insertion point.
8. The method for generating a video according to claim 1, wherein the inserting, into each target beat point in the audio, an image corresponding to the target beat point to generate a video specifically comprises:
and for each target beat point, inserting an image corresponding to the target beat point when the playing time of the audio reaches the target beat point, and taking the image as an image frame played between the target beat point and the next target beat point.
9. The method according to any one of claims 1 to 8, wherein before acquiring the image set to be processed and determining the image insertion sequence based on the similarity between any two images in the image set, the method further comprises:
acquiring an original image set, wherein the original image set comprises a plurality of original images, and the plurality of original images comprise at least one template image;
determining a target feature map corresponding to each original image in the original image set;
dividing the original image set into original image subsets of different categories based on all the determined target feature maps;
and taking any original image subset comprising the template image as the image set to be processed.
10. A video generation apparatus, comprising:
an acquisition unit, configured to acquire an image set to be processed and an audio to be processed, wherein the image set comprises a plurality of images;
a target beat point determining unit, configured to determine a target beat point corresponding to each of the images based on the image set and the audio;
and a video generation unit, configured to insert the image corresponding to the target beat point into each target beat point in the audio to generate a video.
11. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps in the method of generating a video according to any one of claims 1 to 9 when executing the computer program.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps in the method for generating a video according to any one of claims 1 to 9.
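For concreteness, the following is a minimal, non-authoritative Python sketch of the ordering and beat-matching steps recited in claims 3 to 7. It assumes the images are equal-sized numpy arrays, uses cosine similarity over raw pixels as one possible similarity measure (the claims do not fix a particular measure), picks the first image as the starting image, and takes the beat points of the audio as given; every identifier is illustrative.

```python
import numpy as np

def similarity(a, b):
    """Cosine similarity between two images (one possible measure)."""
    a, b = a.ravel().astype(float), b.ravel().astype(float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def order_by_similarity(images):
    """Greedy ordering (claims 3 to 5): starting from the first image,
    repeatedly append the most similar image whose insertion sequence
    number is still undetermined."""
    remaining = list(range(len(images)))
    order = [remaining.pop(0)]   # the starting image gets the first sequence number
    while remaining:
        start = images[order[-1]]
        best = max(remaining, key=lambda j: similarity(start, images[j]))
        order.append(best)       # the candidate image becomes the new starting image
        remaining.remove(best)
    return order

def target_beat_points(num_images, audio_duration, beats):
    """Claims 6 and 7: place evenly spaced image insertion points, then
    snap each insertion point to its nearest beat point."""
    step = audio_duration / num_images
    insertion_points = [i * step for i in range(num_images)]
    return [min(beats, key=lambda b: abs(b - p)) for p in insertion_points]
```

Applied to the worked example in the description (a 20-second audio and 4 images), the evenly spaced insertion points at 0, 5, 10 and 15 seconds snap to the nearest beat points of a hypothetical beat list:

```python
beats = [0.0, 1.2, 2.8, 4.1, 5.5, 7.0, 8.2, 9.5,
         11.0, 12.6, 13.8, 16.0, 17.5, 19.0]
print(target_beat_points(4, 20.0, beats))   # -> [0.0, 5.5, 9.5, 16.0]
```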
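The preprocessing of claim 9, which divides an original image set into subsets of different categories based on target feature maps and keeps the subset containing a template image, could be sketched as below. The feature extractor is left as a placeholder, and k-means is merely one clustering choice; the claim prescribes neither.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_image_set(original_images, template_index, extract_feature, n_categories=3):
    """Cluster the original images by their target feature maps and return
    the subset that contains the template image (claim 9). extract_feature
    is a hypothetical callable mapping an image to its feature map."""
    features = np.stack([extract_feature(img).ravel() for img in original_images])
    labels = KMeans(n_clusters=n_categories, n_init=10).fit_predict(features)
    template_label = labels[template_index]
    return [img for img, lab in zip(original_images, labels) if lab == template_label]
```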
CN202011387442.8A 2020-12-01 2020-12-01 Video generation method and computer equipment Pending CN114584803A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011387442.8A CN114584803A (en) 2020-12-01 2020-12-01 Video generation method and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011387442.8A CN114584803A (en) 2020-12-01 2020-12-01 Video generation method and computer equipment

Publications (1)

Publication Number Publication Date
CN114584803A (en) 2022-06-03

Family

ID=81768334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011387442.8A Pending CN114584803A (en) 2020-12-01 2020-12-01 Video generation method and computer equipment

Country Status (1)

Country Link
CN (1) CN114584803A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108259984A (en) * 2017-12-29 2018-07-06 广州市百果园信息技术有限公司 Method of video image processing, computer readable storage medium and terminal
CN110233976A (en) * 2019-06-21 2019-09-13 广州酷狗计算机科技有限公司 The method and device of Video Composition
US20190335229A1 (en) * 2017-04-21 2019-10-31 Tencent Technology (Shenzhen) Company Limited Video data generation method, computer device, and storage medium
CN110545476A (en) * 2019-09-23 2019-12-06 广州酷狗计算机科技有限公司 Video synthesis method and device, computer equipment and storage medium
CN111010611A (en) * 2019-12-03 2020-04-14 北京达佳互联信息技术有限公司 Electronic album obtaining method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination