CN115471778A - Transition video generation method and device, electronic equipment and storage medium - Google Patents

Transition video generation method and device, electronic equipment and storage medium

Info

Publication number
CN115471778A
CN115471778A (application number CN202211237472.XA)
Authority
CN
China
Prior art keywords
video
transition
target frame
feature vector
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211237472.XA
Other languages
Chinese (zh)
Inventor
梁长辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202211237472.XA priority Critical patent/CN115471778A/en
Publication of CN115471778A publication Critical patent/CN115471778A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The application discloses a transition video generation method, belonging to the technical field of image data processing. The method comprises the following steps: acquiring a first target frame of a first video and a second target frame of a second video; respectively extracting a first feature vector of the first target frame and a second feature vector of the second target frame; fitting a plurality of transition images according to the first feature vector and the second feature vector; and generating a transition video according to the transition images.

Description

Transition video generation method and device, electronic equipment and storage medium
Technical Field
The application belongs to the technical field of image data processing, and particularly relates to a transition video generation method and device, electronic equipment and a storage medium.
Background
A video transition is the intermediate process of switching from one video to another. Usually a hard transition is adopted, that is, playback cuts from one video to the other without inserting any other material, or simple effects such as fade-in and fade-out are applied. Alternatively, a soft transition inserts other material, such as a process animation, between the two videos.
At present, a soft transition requires additional video material: suitable transition videos are recommended to the user from a corresponding video material library. Whenever transition videos are added to the material library, the recommendation algorithm has to be retrained because the number of candidates has grown, which is time-consuming and labor-intensive.
Disclosure of Invention
The embodiments of the present application aim to provide a transition video generation method and apparatus, an electronic device and a storage medium, which can solve the problem that supplementing transition videos in a video material library is cumbersome.
In a first aspect, an embodiment of the present application provides a transition video generation method, where the method includes: acquiring a first target frame of a first video and a second target frame of a second video; respectively extracting a first feature vector of the first target frame and a second feature vector of the second target frame; fitting a plurality of transition images according to the first feature vector and the second feature vector; and generating a transition video according to the transition images.
In a second aspect, an embodiment of the present application provides a transition video generation apparatus, where the apparatus includes: a target frame acquisition module, configured to acquire a first target frame of a first video and a second target frame of a second video; a feature vector extraction module, configured to respectively extract a first feature vector of the first target frame and a second feature vector of the second target frame; a transition image generation module, configured to fit a plurality of transition images according to the first feature vector and the second feature vector; and a transition video generation module, configured to generate a transition video according to the transition images.
In a third aspect, embodiments of the present application provide an electronic device, which includes a processor and a memory, where the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product, stored on a storage medium, for execution by at least one processor to implement the method according to the first aspect.
In the embodiment of the present application, a first target frame of a first video and a second target frame of a second video are first acquired. Feature vectors of the two frames are extracted, and a plurality of visually realistic transition images are fitted with the two feature vectors as references; the transition images can then be combined into a corresponding transition video that connects the two videos. Compared with recommending a transition video from a material library, the generated images fit the two videos more closely, and the storage space needed for transition material is reduced. While improving the user experience, this effectively reduces the time and cost of retraining after transition material is supplemented.
Drawings
FIG. 1 is a block flow diagram of transition video generation in an embodiment of the present application;
FIG. 2 is a block flow diagram of a transition video generation method in an embodiment of the present application;
fig. 3 is a block flow diagram of a transition video generation apparatus in an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application;
fig. 5 is a schematic hardware structure diagram of an electronic device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments derived by a person of ordinary skill in the art from the embodiments given herein fall within the scope of the present disclosure.
The terms "first", "second" and the like in the description and claims of the present application are used to distinguish between similar elements, not necessarily to describe a particular sequence or chronological order. It should be appreciated that data so used may be interchanged under appropriate circumstances, so that embodiments of the application can be practiced in sequences other than those illustrated or described herein. Moreover, "first", "second" and the like are generally used in a generic sense and do not limit the number of objects; for example, the first object can be one or more than one. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the preceding and succeeding objects.
A transition video generation method provided in the embodiments of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
The embodiment of the application provides a transition video generation method, which can be implemented by a terminal device. The terminal device can be any electronic device with an image processing function, for example: a computer, a mobile phone, a tablet, etc. In this embodiment, the execution subject is taken to be a terminal device as an example; the scheme is similar for other execution subjects and is not repeated here. Specifically, as shown in fig. 1, two videos are input into the terminal device, a frame of the first video and a frame of the second video are converted into two feature vectors by a feature extraction network, the two feature vectors are turned into a plurality of images by a feature fitting network and a generative adversarial network, and the plurality of images can be combined into a corresponding transition video, thereby reducing the maintenance time and cost caused by retraining after transition material is supplemented. The feature extraction network may be a network based mainly on a Convolutional Neural Network (CNN) and may be abbreviated as V-Net; the generation network may be a network based mainly on a Generative Adversarial Network (GAN) and may be abbreviated as G-Net; and the combination of the feature extraction network and the generative adversarial network may be called a Video Adaptive Transfer Network, abbreviated as VAT-Net.
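For illustration, the following is a minimal PyTorch sketch of this fig. 1 pipeline; the class name VATNet, the module wiring, and the default of 20 intermediate images are placeholders chosen for the example, not details published by the patent.

```python
import torch.nn as nn

class VATNet(nn.Module):
    """Sketch of the pipeline: V-Net encodes two frames, a feature-fitting step
    interpolates between the embeddings, and G-Net decodes each one to an image."""

    def __init__(self, encoder: nn.Module, generator: nn.Module, num_images: int = 20):
        super().__init__()
        self.encoder = encoder      # V-Net: frame -> feature vector
        self.generator = generator  # G-Net: feature vector -> image
        self.num_images = num_images

    def forward(self, frame_a, frame_b):
        f1, f2 = self.encoder(frame_a), self.encoder(frame_b)
        alpha = (f2 - f1) / self.num_images                 # fitting stride
        frames = [self.generator(f1 + i * alpha)
                  for i in range(1, self.num_images + 1)]   # one image per step
        return frames   # these frames are combined into the transition video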
The process flow shown in fig. 2 will be described in detail below with reference to the specific embodiments, and the contents may be as follows:
step 201, a first target frame of a first video and a second target frame of a second video are obtained.
The terminal device needs to be configured with a corresponding application program, which has a conventional video playing function as well as the above image processing function. The application program may be an APP, a video browser, etc., which is not limited herein.
In implementation, when one video is about to finish playing, that is, when the playback progress bar of the first video is about to reach the end, the user may specify another video to play, or the system may automatically jump to the next video; in either case, playback is about to jump from the first video to the second video. Here, the video being played is the first video, and the other or next video is the second video. Accordingly, either the image at the end of the first video or the image at the playback node where the user triggers the switch can serve as the first target frame of the first video, and one of them may be selected according to the user's preference or the system settings. Likewise, either the image at the beginning of the second video or the image at a node up to which the user has previously browsed that video can serve as the second target frame, determined in the same way.
In an alternative embodiment, when playback jumps automatically to the next video, a natural transition from the previous video to the next can be achieved. The first video and the second video may have the following relationship: the second video is the video played immediately after the first video. The first target frame is the last frame of the first video, and the second target frame is the first frame of the second video.
In implementation, the last frame of the currently played video may be set as the first target frame and the first frame of the next video as the second target frame, either by the user or by default. When the user does not switch to a different video, playback may jump to the next video by default. In this case, the previous and next videos usually belong to the same episode or the same genre, so the first target frame and the second target frame are strongly related in color, content, and the like; the transition from the last frame of the first video to the first frame of the second video is therefore natural, improving the user's experience.
In an alternative embodiment, when the user actively switches to the next video, the transition can also be made natural. Before step 201, the method may include: receiving a video switching instruction output by the user. Step 201 may then specifically include: in response to the video switching instruction, determining the current frame of the first video being played as the first target frame, and determining the first frame of the second video to be jumped to as the second target frame.
In implementation, while a video is playing, the user clicks a link corresponding to another video, i.e. the user outputs a video switching instruction. The terminal device receives the instruction, which may include an end request for the first video and a start request for the second video. In response, based on the end request, the current frame of the first video being played is determined and taken as the first target frame; based on the start request, the first frame of the second video to be jumped to is determined and taken as the second target frame. If the user has previously watched the second video, the frame at the node where the user last stopped may instead be used as the second target frame.
In other words, when the user switches from the first video to the second video, some frame in the first half of the first video may be playing, and the transition runs from that frame to the first frame of the second video. Compared with transitioning from the not-yet-reached last frame of the first video, this effectively reduces the chance that the generated transition video looks abrupt, further improving the user's experience.
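A small sketch of this target-frame selection logic; the SwitchEvent structure and its fields are invented for the example and are not part of the patent:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SwitchEvent:
    playing_index: int                  # frame index of the first video when the user switches
    resume_index: Optional[int] = None  # node where the user last left the second video, if any

def pick_target_frames(first_video: List, second_video: List, evt: SwitchEvent):
    """Current frame of the playing video as the first target frame; the second
    video's first frame (or the last-watched node, if previously browsed) as the second."""
    first_target = first_video[evt.playing_index]
    second_target = second_video[evt.resume_index if evt.resume_index is not None else 0]
    return first_target, second_target
```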
Step 202, respectively extracting a first feature vector of the first target frame and a second feature vector of the second target frame.
A trained convolutional neural network may be preset; the convolutional neural network can convert an image into a feature vector that represents the image.
In implementation, a feature vector of a first target frame of a first video is extracted, and a second feature vector of a second target frame of a second video is extracted. The feature vector of the first target frame is a first feature vector, and the feature vector of the second target frame is a second feature vector.
In an alternative embodiment, different types of images yield feature vectors of essentially arbitrary dimension, e.g. 128 dimensions, 512 dimensions, and so on. If images corresponding to vectors of different dimensions are displayed in sequence, the display is sometimes sharp and sometimes blurred, which hurts the user experience. To keep the displayed images at the same definition, step 202 specifically includes the following: mapping the first target frame and the second target frame, through a preset feature extraction network, to an embedding layer of a preset dimension, to obtain a first feature vector corresponding to the first target frame and a second feature vector corresponding to the second target frame.
The terminal device is preset with a feature extraction network, for example a network based mainly on a Convolutional Neural Network (CNN). The network can be connected to an embedding layer of a specific dimension; when two different images are fed through the embedding layer, two feature vectors of the same dimension are obtained. The specific dimension may be set manually.
In implementation, the first target frame is input into the preset feature extraction network, mapped to the embedding layer of the preset dimension, and the corresponding first feature vector is output. The second target frame is processed in the same way to output the corresponding second feature vector. The output first and second feature vectors have the same dimension, so the images displayed to the user keep the same definition.
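A minimal sketch of such a feature extraction network, assuming a small CNN backbone feeding a linear embedding layer; the layer sizes and the 128-dimensional embedding are illustrative assumptions, not values fixed by the patent:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """CNN backbone followed by a fixed-dimension embedding layer (V-Net sketch)."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # collapse spatial dimensions
        )
        # Embedding layer of preset dimension: every frame, whatever its
        # content, maps to a vector of the same length.
        self.embed = nn.Linear(64, embed_dim)

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        x = self.backbone(frame).flatten(1)   # (B, 64)
        return self.embed(x)                  # (B, embed_dim)

# Both target frames map to vectors of identical dimension.
encoder = FeatureExtractor()
f1 = encoder(torch.randn(1, 3, 224, 224))    # first target frame
f2 = encoder(torch.randn(1, 3, 224, 224))    # second target frame
```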
Step 203, fitting a plurality of transition images according to the first feature vector and the second feature vector.
The terminal device is preset with a feature fitting network and a generative adversarial network. Given two different feature vectors, the feature fitting network can generate a plurality of target feature vectors transitioning from one feature vector to the other. The generation network, based on a Generative Adversarial Network (GAN), can restore the target feature vectors to realistic transition images.
In implementation, after the first feature vector and the second feature vector are obtained through the feature extraction network, a corresponding number of feature vectors may be selected between them, yielding a feature vector set. Inputting each feature vector in the set into the generative adversarial network fits a plurality of transition images. In other words, the transition images fitted by the generative adversarial network are realistic yet distinct from the first and second target frames, giving the user a better viewing experience.
In an alternative embodiment, to achieve a visually smooth transition from the first target frame to the second target frame, step 203 specifically includes the following steps: fitting a plurality of target feature vectors transitioning from the first feature vector to the second feature vector based on a preset fitting stride; and generating the transition images corresponding to the target feature vectors based on a preset generator of the generative adversarial network.
The terminal device is preset with a fitting stride, which may be the quotient of the vector difference and the number of images. Specifically, the number of images may be a manually set fixed number, and the fitting stride is obtained from the vector difference between the first feature vector and the second feature vector and the number of images. As shown in fig. 1, α denotes the fitting stride from the first feature vector to the second feature vector; inputting the first feature vector, the second feature vector and the fitting stride into the generative adversarial network yields the plurality of transition images.
In implementation, the terminal device fits, based on the preset fitting stride, a plurality of target feature vectors transitioning from the first feature vector to the second feature vector, which together form the corresponding feature vector set. The set is input into the preset generator of the generative adversarial network to generate the transition image corresponding to each target feature vector. In other words, selecting the intermediate feature vectors by a fitting stride lets the first target frame change smoothly into the second target frame, reducing the chance that some image looks abrupt while the user is watching.
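A sketch of this fitting step under the assumption of plain linear interpolation in the embedding space; the Generator below is a toy stand-in for the preset GAN generator, not the patent's architecture:

```python
import torch
import torch.nn as nn

def fit_target_vectors(f1: torch.Tensor, f2: torch.Tensor, num_images: int) -> torch.Tensor:
    """Fitting stride alpha = (f2 - f1) / num_images; step from f1 toward f2."""
    alpha = (f2 - f1) / num_images
    steps = torch.arange(1, num_images + 1, dtype=f1.dtype).view(-1, 1)
    return f1 + steps * alpha                    # (num_images, embed_dim)

class Generator(nn.Module):
    """Toy GAN generator: decodes an embedding back into an RGB image."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Linear(embed_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, z):
        return self.decode(z)

f1, f2 = torch.randn(1, 128), torch.randn(1, 128)     # embeddings of the two target frames
vectors = fit_target_vectors(f1, f2, num_images=20)   # feature vector set
transition_images = Generator()(vectors)              # (20, 3, 32, 32)
```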
Further, to improve the continuity of the transition from the first target frame to the second target frame, the fitting stride may be set according to the duration of the transition interval between the first video and the second video.
In an implementation, the number of images may also be set according to the duration of the transition interval between the first video and the second video: the shorter the interval, the fewer the images. For example, when the transition interval between the first video and the second video is 5 s, the number of images may be 20; when it is 2 min, the number of images may be 480. When the transition interval is long, more images participate in the transition, making the transition video richer and further improving the user's experience.
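Both numeric examples above correspond to 4 images per second of transition interval (20 images / 5 s and 480 images / 120 s), so a small sketch under that assumed rate could be:

```python
def num_transition_images(interval_seconds: float, images_per_second: int = 4) -> int:
    """Scale the image count with the transition interval; the rate of 4 images/s
    is an assumption consistent with the examples of 20 for 5 s and 480 for 2 min."""
    return max(1, round(interval_seconds * images_per_second))

assert num_transition_images(5) == 20
assert num_transition_images(120) == 480
```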
In an alternative embodiment, to go directly from the input images, via feature vectors, to the corresponding transition images, steps 202 and 203 specifically include the following: inputting the first target frame and the second target frame into a trained transition image generation model, respectively extracting through the model a first feature vector of the first target frame and a second feature vector of the second target frame, and fitting a plurality of transition images according to the first feature vector and the second feature vector.
The terminal device is preset with a transition image generation model, which may integrate the feature extraction network, the feature fitting network and the generative adversarial network.
In implementation, the first target frame and the second target frame are input into the trained transition image generation model, which extracts the first feature vector representing the first target frame and the second feature vector representing the second target frame. From these two vectors, the model directly fits a plurality of transition images transitioning from the first target frame to the second target frame. In other words, feature extraction, feature fitting, and image generation from the fitted features form one continuous process; integrating them into one transition image generation model enables joint training of the three networks, improving the consistency of the connections between the networks and the quality of the output transition images.
And step 204, generating a transition video according to the transition image.
In implementation, a transition video transitioning from the first target frame to the second target frame is generated from the transition images.
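A minimal sketch of this assembly step using OpenCV; the codec, frame rate, and output path are arbitrary illustrative choices:

```python
from typing import List

import cv2
import numpy as np

def images_to_video(frames: List[np.ndarray], path: str = "transition.mp4", fps: int = 4) -> None:
    """Write a list of HxWx3 uint8 BGR frames out as a transition video clip."""
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in frames:
        writer.write(frame)
    writer.release()
```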
In addition, when the execution subject is a server, the server may receive the first target frame of the first video and the second target frame of the second video sent by the user, respectively extract the first feature vector of the first target frame and the second feature vector of the second target frame, fit the corresponding transition images, and generate the transition video from them. The server can return the generated transition video to the user, or splice the transition video between the first video and the second video and return the packaged result.
In an alternative embodiment, transition image generation may be implemented by a trained end-to-end transition image generation model, which may include a feature extraction network, a feature fitting network and a generative adversarial network connected in sequence. The feature extraction network extracts feature vectors from the input images; the feature fitting network produces feature vectors transitioning from one feature vector to another; and the generative adversarial network restores the input feature vectors to realistic transition images. Before step 201, the whole transition image generation model needs to be trained to improve the quality of the transition images. The corresponding training process includes: acquiring a plurality of training samples, where each training sample comprises a first sample image and a second sample image, both provided with class labels; training the transition image generation model with the plurality of training samples; and ending training when the total loss of the transition image generation model falls to a preset threshold, where the total loss of the transition image generation model comprises a weighted sum of the loss of the feature extraction network and the loss of the generative adversarial network.
Each sample image carries a class label; the class labels are used to determine the degree of similarity between sample images, and the corresponding metric loss is computed from them together with the extracted feature vectors.
In implementation, a plurality of training samples are input into the feature extraction network, which outputs the corresponding groups of feature vectors, two per group. The metric loss of each group is determined in combination with the corresponding class labels. Each group of feature vectors is input into the feature fitting network, which fits a feature vector set; the set is input into the generative adversarial network, and the corresponding generation loss is determined. With weights set by the user, the metric loss of the feature extraction network and the generation loss of the generative adversarial network are combined into the corresponding total loss. The transition image generation model is trained with the plurality of training samples, and training ends when the total loss falls to a preset threshold. The preset threshold may be set manually or be a convergence value, and is not limited here. The total loss of the transition image generation model may be expressed as: L = γM + βN, where L is the total loss, M is the metric loss, N is the generation loss, γ is the weight of the metric loss, and β is the weight of the generation loss. Training a transition image generation model with such a total loss is prior art and is not detailed here. Training the transition image generation model improves the quality of the output transition images.
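A sketch of this weighted total loss; since the patent only fixes the form L = γM + βN, the concrete triplet-style metric loss and non-saturating generator loss below are assumptions:

```python
import torch
import torch.nn.functional as F

def total_loss(metric_loss: torch.Tensor, gen_loss: torch.Tensor,
               gamma: float = 1.0, beta: float = 1.0) -> torch.Tensor:
    """L = gamma * M + beta * N, per the patent's weighted-sum formulation."""
    return gamma * metric_loss + beta * gen_loss

# Assumed instantiation of M (not specified by the patent): pull same-class
# embeddings together, push different-class embeddings apart.
def metric_loss(anchor, positive, negative, margin: float = 0.2):
    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)

# Assumed instantiation of N: generator loss against a discriminator's logits.
def generation_loss(fake_logits):
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
```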
In the transition video generation method provided by the embodiment of the application, the execution subject may be a transition video generation apparatus. In the embodiments of the present application, the transition video generation apparatus executing the transition video generation method is taken as an example to describe the apparatus provided by the embodiments of the present application.
As shown in fig. 3, the transition video generation apparatus comprises:
a target frame acquiring module 301, configured to acquire a first target frame of a first video and a second target frame of a second video; a feature vector extraction module 302, configured to extract a first feature vector of a first target frame and a second feature vector of a second target frame, respectively; a transition image generation module 303, configured to fit a plurality of transition images according to the first feature vector and the second feature vector; and a transition video generating module 304, configured to generate a transition video according to the transition image.
Optionally, an instruction receiving module is configured to receive a video switching instruction; and the target frame acquisition module is further configured to, in response to the video switching instruction, determine the current frame of the first video being played as the first target frame, and determine the first frame of the second video to be jumped to as the second target frame.
Optionally, the feature vector extraction module 302 is further configured to map the first target frame and the second target frame to an embedded layer with a preset dimension through a preset feature extraction network, so as to obtain a first feature vector corresponding to the first target frame and a second feature vector corresponding to the second target frame.
Optionally, the transition image generating module 303 is further configured to fit a plurality of target feature vectors that transition from the first feature vector to the second feature vector based on a preset fitting step; and generating a transition image corresponding to the target characteristic vector based on a preset generator for generating the countermeasure network.
Optionally, the feature vector extraction module 302 is further configured to input the first target frame and the second target frame into a trained transition image generation model, and extract a first feature vector of the first target frame and a second feature vector of the second target frame through the transition image generation model respectively; and the transition image generation module is further configured to fit a plurality of transition images according to the first feature vector and the second feature vector.
Optionally, a model obtaining module is configured to train the transition image generation model to obtain a trained transition image generation model, where the transition image generation model includes a feature extraction network, a feature fitting network, and a generative adversarial network connected in sequence. The model obtaining module is specifically configured to: obtain a plurality of training samples, each comprising a first sample image and a second sample image, both provided with class labels; train the transition image generation model using the plurality of training samples; and end training when the total loss of the transition image generation model falls to a preset threshold, where the total loss comprises a weighted sum of the loss of the feature extraction network and the loss of the generative adversarial network.
In the embodiment of the present application, a first target frame of a first video and a second target frame of a second video are first acquired. Feature vectors of the two frames are extracted, and a plurality of visually realistic transition images are fitted with the two feature vectors as references; the transition images can be combined into a corresponding transition video that connects the two videos. Compared with recommending a transition video from a material library, the generated images fit the two videos more closely, and the storage space needed for transition material is reduced. While improving the user experience, this effectively reduces the maintenance time and cost caused by retraining after transition material is supplemented.
The transition video generation apparatus in the embodiment of the present application may be an electronic device, or a component in an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. For example, it may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a Mobile Internet Device (MID), an Augmented Reality (AR)/Virtual Reality (VR) device, a robot, a wearable device, an Ultra-Mobile Personal Computer (UMPC), a netbook or a Personal Digital Assistant (PDA), or a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine, a self-service machine, and the like; the embodiments of the present application are not specifically limited.
The transition video generation apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The apparatus for generating transition video according to the embodiment of the present application can implement each process implemented by the method embodiment of fig. 2, and is not described herein again to avoid repetition.
Optionally, as shown in fig. 4, an electronic device 400 is further provided in an embodiment of the present application, and includes a processor 401 and a memory 402, where the memory 402 stores a program or an instruction that can be executed on the processor 401, and when the program or the instruction is executed by the processor 401, the steps of the transition video generation method embodiment described above are implemented, and the same technical effects can be achieved, and are not described again here to avoid repetition.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 5 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.
Those skilled in the art will appreciate that the electronic device 1000 may further comprise a power source (e.g., a battery) for supplying power to various components, and the power source may be logically connected to the processor 1010 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 5 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is omitted here.
The input unit 1004 is configured to obtain a first target frame of a first video and a second target frame of a second video.
A processor 1010, configured to extract a first feature vector of the first target frame and a second feature vector of the second target frame, respectively; fit a plurality of transition images according to the first feature vector and the second feature vector; and generate a transition video according to the transition images.
Optionally, the input unit 1004 is further configured to receive a video switching instruction, and, in response to the video switching instruction, determine the current frame of the first video being played as the first target frame, and determine the first frame of the second video to be jumped to as the second target frame.
Optionally, the processor 1010 is further configured to map the first target frame and the second target frame to the embedding layer with the preset dimensionality through a preset feature extraction network, so as to obtain a first feature vector corresponding to the first target frame and a second feature vector corresponding to the second target frame.
Optionally, the processor 1010 is further configured to fit a plurality of target feature vectors transitioning from the first feature vector to the second feature vector based on a preset fitting stride, and generate the transition image corresponding to each target feature vector based on a preset generator of the generative adversarial network.
Optionally, the processor 1010 is further configured to input the first target frame and the second target frame to a trained transition image generation model, extract a first feature vector of the first target frame and a second feature vector of the second target frame through the transition image generation model, and fit a plurality of transition images according to the first feature vector and the second feature vector.
Optionally, the processor 1010 is further configured to train the transition image generation model to obtain a trained transition image generation model; specifically, to acquire a plurality of training samples, where each training sample comprises a first sample image and a second sample image, both provided with class labels; to train the transition image generation model using the plurality of training samples; and to end training when the total loss of the transition image generation model falls to a preset threshold, where the total loss comprises a weighted sum of the loss of the feature extraction network and the loss of the generative adversarial network.
It should be understood that in the embodiment of the present application, the input Unit 1004 may include a Graphics Processing Unit (GPU) 10041 and a microphone 10042, and the Graphics Processing Unit 10041 processes image data of still pictures or videos obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1007 includes at least one of a touch panel 10071 and other input devices 10072. The touch panel 10071 is also referred to as a touch screen. The touch panel 10071 may include two parts, a touch detection device and a touch controller. Other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.
The memory 1009 may be used to store software programs as well as various data. The memory 1009 may mainly include a first storage area for storing a program or an instruction and a second storage area for storing data, where the first storage area may store an operating system and an application program or instruction required for at least one function (such as a sound playing function or an image playing function), and the like. Further, the memory 1009 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), a Static RAM (SRAM), a Dynamic RAM (DRAM), a Synchronous DRAM (SDRAM), a Double Data Rate SDRAM (DDR SDRAM), an Enhanced SDRAM (ESDRAM), a Synchronous Link DRAM (SLDRAM), or a Direct Rambus RAM (DRRAM). The memory 1009 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
Processor 1010 may include one or more processing units; optionally, the processor 1010 integrates an application processor, which primarily handles operations related to the operating system, user interface, and applications, and a modem processor, which primarily handles wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into processor 1010.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the transition video generation method in the embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a computer read only memory ROM, a random access memory RAM, a magnetic or optical disk, and the like.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the transition video generation method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the description is omitted here.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
Embodiments of the present application provide a computer program product, where the program product is stored in a storage medium, and the program product is executed by at least one processor to implement each process of the transition video generation method embodiment, where the same technical effect can be achieved, and details are not repeated here to avoid repetition.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A transition video generation method, the method comprising:
acquiring a first target frame of a first video and a second target frame of a second video;
respectively extracting a first feature vector of the first target frame and a second feature vector of the second target frame;
fitting a plurality of transition images according to the first feature vector and the second feature vector;
and generating a transition video according to the transition image.
2. The transition video generation method according to claim 1, wherein the second video is the video that follows the first video in playback order; the first target frame is the last frame of the first video, and the second target frame is the first frame of the second video.
3. The transition video generation method according to claim 1, further comprising, before said acquiring the first target frame of the first video and the second target frame of the second video:
receiving a video switching instruction;
the acquiring a first target frame of a first video and a second target frame of a second video includes:
and responding to the video switching instruction, determining a current frame of the first video being played and taking the current frame as a first target frame, determining a head frame of a second video to be jumped and taking the head frame as a second target frame.
4. The transition video generation method according to claim 1, wherein said separately extracting a first feature vector of the first target frame and a second feature vector of the second target frame comprises:
and respectively mapping the first target frame and the second target frame to an embedded layer with preset dimensionality through a preset feature extraction network to obtain a first feature vector corresponding to the first target frame and a second feature vector corresponding to the second target frame.
5. The transition video generation method of claim 1, wherein fitting a plurality of transition images according to the first feature vector and the second feature vector comprises:
fitting a plurality of target feature vectors transitioning from the first feature vector to the second feature vector based on a preset fitting stride;
and generating a transition image corresponding to the target feature vector based on a preset generator for generating the countermeasure network.
6. The transition video generation method of claim 5, wherein the fitting stride is set according to a transition interval duration between the first video and the second video.
7. The transition video generation method according to claim 1, wherein said extracting a first feature vector of the first target frame and a second feature vector of the second target frame, respectively, and fitting a plurality of transition images according to the first feature vector and the second feature vector comprises:
inputting the first target frame and the second target frame into a trained transition image generation model, respectively extracting a first feature vector of the first target frame and a second feature vector of the second target frame through the transition image generation model, and fitting a plurality of transition images according to the first feature vector and the second feature vector.
8. The transition video generating method according to claim 7, further comprising, before said obtaining the first target frame of the first video and the second target frame of the second video:
training the transition image generation model to obtain a trained transition image generation model;
the transition image generation model comprises a feature extraction network, a feature fitting network and a generation countermeasure network which are connected in sequence;
the training of the transition image generation model to obtain the trained transition image generation model comprises the following steps:
obtaining a plurality of training samples, wherein the training samples comprise a first sample image and a second sample image, and the first sample image and the second sample image are provided with class labels;
training the transition image generation model by using a plurality of training samples;
when the total loss of the transition image generation model is reduced to a preset threshold value, finishing training; wherein the total loss of the transition image generation model includes a weighted sum of the loss of the feature extraction network and the loss of the generation countermeasure network.
9. An apparatus for transition video generation, the apparatus comprising:
the target frame acquisition module is used for acquiring a first target frame of a first video and a second target frame of a second video;
a feature vector extraction module, configured to extract a first feature vector of the first target frame and a second feature vector of the second target frame, respectively;
the transition image generation module is used for fitting a plurality of transition images according to the first feature vector and the second feature vector;
and the transition video generation module is used for generating a transition video according to the transition image.
10. An electronic device comprising a processor and a memory, the memory storing a program or instructions executable on the processor, the program or instructions when executed by the processor implementing the steps of the transition video generation method according to any one of claims 1-8.
CN202211237472.XA, filed 2022-10-10 (priority 2022-10-10): Transition video generation method and device, electronic equipment and storage medium. Status: Pending. Published as CN115471778A.

Priority Applications (1)

Application Number: CN202211237472.XA · Priority Date: 2022-10-10 · Filing Date: 2022-10-10 · Title: Transition video generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number: CN202211237472.XA · Priority Date: 2022-10-10 · Filing Date: 2022-10-10 · Title: Transition video generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number: CN115471778A · Publication Date: 2022-12-13

Family

ID=84337442

Family Applications (1)

Application Number: CN202211237472.XA · Status: Pending · Priority Date: 2022-10-10 · Filing Date: 2022-10-10 · Title: Transition video generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115471778A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination