CN113473182B

CN113473182B - Video generation method and device, computer equipment and storage medium

Info

Publication number: CN113473182B
Application number: CN202111036069.6A
Authority: CN
Inventors: 林琴; 洪志鹰; 张浩鑫; 熊江丰; 姚丹; 张丹燕; 康又文; 杨秀金
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-09-06
Filing date: 2021-09-06
Publication date: 2021-12-07
Anticipated expiration: 2041-09-06
Also published as: CN113473182A

Abstract

The application discloses a video generation method and device, a computer device and a storage medium, applicable to the field of video processing. The video generation method comprises: acquiring a material to be processed and target parameters; acquiring label information based on the material to be processed; determining a parameter adjustment strategy based on multimedia data in the material to be processed and industry label information in the label information; and processing the material to be processed based on the parameter adjustment strategy and the target parameters to generate a target video. The parameter adjustment strategy includes a video duration adjustment strategy that adjusts the video duration, so the duration of the target video meets the requirement of the target parameters. In addition, the label information includes industry label information indicating the industry to which the material belongs, so the adjustments made through the parameter adjustment strategy also meet that industry's requirements. This improves the flexibility and accuracy of video generation.

Description

Video generation method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of video processing, and in particular, to a method and an apparatus for video generation, a computer device, and a storage medium.

Background

With the development of internet services, video has become an important trend. Producing new video material is costly, from shooting through editing. Directly reusing existing material is cheaper, but it yields videos that are too similar to one another. Existing material can instead be reworked to generate a new video, based on an editor's experience:

- picture, copywriting and video materials are fused, combined and transformed together with related music and templates;
- special effects and transitions are added;
- new background music is dubbed in.

However, different media industries have different requirements, so generating videos more flexibly and accurately remains a problem to be solved.

Disclosure of Invention

Embodiments of the application provide a video generation method and device, a computer device and a storage medium. Label information is obtained based on a material to be processed. A parameter adjustment strategy is then determined based on multimedia data in the material and industry label information in the label information. The material is processed using the video duration adjustment strategy and the target parameters. Because the video duration adjustment strategy adjusts the video duration, the duration of the resulting target video meets the requirement of the target parameters. Because the label information also includes industry label information indicating the industry to which the material belongs, the adjustments made through the parameter adjustment strategy also meet the industry's requirements. This improves the flexibility and accuracy of video generation.

In view of the above, a first aspect of the present application provides a method for video generation, including:

acquiring a material to be processed and target parameters, wherein the material to be processed comprises multimedia data, and the target parameters comprise target video duration of a video to be synthesized;

acquiring label information based on the material to be processed, wherein the label information comprises industry label information indicating the industry to which the material to be processed belongs;

determining a parameter adjustment strategy based on the multimedia data in the material to be processed and the industry label information in the label information, wherein the parameter adjustment strategy comprises a video duration adjustment strategy for performing video duration processing on the material to be processed;

and processing the material to be processed based on the parameter adjustment strategy and the target parameters to generate a target video, wherein the video duration of the target video is equal to the target video duration of the video to be synthesized.
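The duration branch of this method can be illustrated with a minimal Python sketch. The sketch makes several assumptions:

- durations are integer milliseconds;
- all names (`DurationAction`, `DurationStrategy`, `INDUSTRY_TEMPLATES`, `choose_duration_strategy`) and the template-set identifiers are hypothetical, not taken from the application.

The two non-trivial branches mirror the embodiments described in this document. Material shorter than the target is extended with industry templates, and longer material is clipped according to segment scores.

```python
from dataclasses import dataclass
from enum import Enum


class DurationAction(Enum):
    """Hypothetical labels for the duration cases described in this document."""
    EXTEND_WITH_TEMPLATES = "extend"  # material shorter than target: add industry templates
    TRIM_BY_SCORE = "trim"            # material longer than target: clip by segment scores
    KEEP = "keep"                     # material already has the target duration


@dataclass
class DurationStrategy:
    action: DurationAction
    template_set: str  # hypothetical identifier of an industry template set


# Hypothetical mapping from an industry label to a template-set identifier.
INDUSTRY_TEMPLATES = {
    "e-commerce": "ecommerce_default",
    "finance": "finance_default",
    "education": "education_default",
}


def choose_duration_strategy(material_ms: int, target_ms: int,
                             industry_label: str) -> DurationStrategy:
    """Pick a duration action by comparing the material length with the target length."""
    if material_ms < target_ms:
        action = DurationAction.EXTEND_WITH_TEMPLATES
    elif material_ms > target_ms:
        action = DurationAction.TRIM_BY_SCORE
    else:
        action = DurationAction.KEEP
    # Fall back to a generic set when the industry label has no dedicated templates.
    template_set = INDUSTRY_TEMPLATES.get(industry_label, "generic_default")
    return DurationStrategy(action, template_set)
```

For example, a 12-second finance material with a 15-second target is routed to the template-extension branch with the `finance_default` set.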

A second aspect of the present application provides a video generating apparatus, including:

an acquisition module, used for acquiring a material to be processed and target parameters, wherein the material to be processed comprises multimedia data, and the target parameters comprise a target video duration of a video to be synthesized;

the acquisition module is further used for acquiring label information based on the material to be processed, wherein the label information comprises industry label information indicating the industry to which the material to be processed belongs;

a determining module, used for determining a parameter adjustment strategy based on the multimedia data in the material to be processed and the industry label information in the label information, wherein the parameter adjustment strategy comprises a video duration adjustment strategy for performing video duration processing on the material to be processed;

and a processing module, used for processing the material to be processed based on the parameter adjustment strategy and the target parameters to generate a target video, wherein the video duration of the target video is equal to the target video duration of the video to be synthesized.

In one possible embodiment, the target parameters further include a target video size of the video to be synthesized;

the label information also comprises scene label information indicating a scene to which the material to be processed belongs;

the parameter adjusting strategy also comprises a video size adjusting strategy for adjusting the video size;

the video size of the target video is equal to the target video size of the video to be synthesized;

the determining module is specifically used for determining a video duration adjusting strategy based on the material to be processed and the industry label information;

determining a video size adjustment strategy based on the multimedia data in the material to be processed, the industry label information and the scene label information;

and the processing module is specifically used for processing the material to be processed based on the video duration adjustment strategy, the video size adjustment strategy, the target video duration of the video to be synthesized and the target video size of the video to be synthesized so as to generate the target video.
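One common way to realize a video size adjustment is to scale the source frame in one of two ways. It can be scaled to fit inside the target canvas, with the borders padded, or scaled to cover the canvas, with the overflow cropped. The sketch below computes that geometry.

The `pad`/`crop` modes and the function name `resize_geometry` are assumptions. The application states only that the strategy depends on the multimedia data, the industry labels and the scene labels. It does not say how those inputs select a mode.

```python
def resize_geometry(src_w: int, src_h: int, dst_w: int, dst_h: int, mode: str = "pad"):
    """Return (scaled_w, scaled_h, offset_x, offset_y) for placing a source frame on a target canvas.

    mode="pad":  scale so the whole frame fits inside the canvas (borders are padded).
    mode="crop": scale so the frame covers the canvas; overflow is cropped and the
                 offsets become negative.
    """
    if mode not in ("pad", "crop"):
        raise ValueError(f"unknown mode: {mode}")
    sx, sy = dst_w / src_w, dst_h / src_h
    scale = min(sx, sy) if mode == "pad" else max(sx, sy)
    scaled_w, scaled_h = round(src_w * scale), round(src_h * scale)
    # Centre the scaled frame on the canvas.
    return scaled_w, scaled_h, (dst_w - scaled_w) // 2, (dst_h - scaled_h) // 2
```

For instance, placing a 1920x1080 landscape clip on a 1080x1920 portrait canvas with `mode="pad"` gives `(1080, 608, 0, 656)`. The frame keeps its full width and is centred vertically.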

In one possible embodiment, the multimedia data is video data;

the acquisition module is specifically used for performing feature extraction on the multimedia data to obtain material features, wherein the material features comprise video frame features and speech sequence features, or comprise video frame features, speech sequence features and text features;

if the material features comprise video frame features and speech sequence features, aggregating the video frame features and the speech sequence features to obtain a first global feature, and obtaining the label information of the material to be processed based on the first global feature;

and if the material features comprise video frame features, speech sequence features and text features, aggregating the video frame features, the speech sequence features and the text features to obtain a second global feature, and obtaining the label information of the material to be processed based on the second global feature.
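The aggregation step can be sketched as follows. The application does not say which aggregation method or classifier it uses. This sketch substitutes three simple steps:

1. mean pooling within each modality;
2. concatenation of the pooled vectors into one global feature;
3. an independent logistic score per label (multi-label classification).

The weights, label names and threshold are hypothetical placeholders, not trained values.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def mean_pool(sequence: Sequence[Vector]) -> Vector:
    """Average a sequence of equal-length feature vectors over time."""
    if not sequence:
        raise ValueError("empty feature sequence")
    n = len(sequence)
    return [sum(column) / n for column in zip(*sequence)]


def global_feature(frame_feats: Sequence[Vector],
                   speech_feats: Sequence[Vector],
                   text_feats: Optional[Sequence[Vector]] = None) -> Vector:
    """Concatenate pooled modalities: frame + speech (first global feature),
    or frame + speech + text (second global feature) when text is present."""
    feature = mean_pool(frame_feats) + mean_pool(speech_feats)
    if text_feats:
        feature += mean_pool(text_feats)
    return feature


def predict_labels(feature: Vector,
                   weights: Dict[str, Vector],
                   biases: Dict[str, float],
                   threshold: float = 0.5) -> List[str]:
    """Keep every label whose logistic score reaches the threshold (0 < threshold < 1)."""
    # sigmoid(logit) >= t  <=>  logit >= log(t / (1 - t)); avoids overflow in exp().
    cutoff = math.log(threshold / (1.0 - threshold))
    labels = []
    for label, w in weights.items():
        if len(w) != len(feature):
            raise ValueError(f"weight size mismatch for {label}")
        logit = sum(wi * xi for wi, xi in zip(w, feature)) + biases.get(label, 0.0)
        if logit >= cutoff:
            labels.append(label)
    return labels
```

A production system would likely use learned temporal aggregation rather than plain averaging. Mean pooling is shown only because it makes the "many time steps to one global vector" idea explicit.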

In a possible implementation manner, the processing module is specifically configured to obtain RGB parameters of each pixel point in each video frame of the multimedia data;

obtaining, through a probability output model, the probability that each video frame is a picture switching frame, based on the multimedia data and the RGB parameters of each pixel in each video frame;

dividing the multimedia data into a plurality of groups of video clips based on the probability that each video frame is a picture switching frame;

and processing the plurality of groups of video clips based on the parameter adjustment strategy and the target parameters to generate the target video.
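The application obtains the switching probability from a trained probability output model. As a stand-in only, the sketch below derives a pseudo-probability from the mean absolute RGB difference between consecutive frames. It then cuts the video wherever that value reaches a threshold.

The sketch assumes the following:

- frames are equal-length sequences of `(r, g, b)` tuples;
- the function names are hypothetical;
- the threshold of 0.5 is arbitrary.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Frame = Sequence[Pixel]


def switch_probabilities(frames: Sequence[Frame]) -> List[float]:
    """Heuristic stand-in for the probability output model: the normalized mean
    absolute RGB difference to the previous frame (0.0 for the first frame)."""
    if not frames:
        return []
    probs = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        diff = sum(abs(a - b) for p, q in zip(prev, cur) for a, b in zip(p, q))
        probs.append(diff / (255 * 3 * len(cur)))
    return probs


def split_into_clips(probs: Sequence[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Cut before every frame whose switching probability reaches the threshold.
    Returns clips as (start, end) frame ranges with the end index exclusive."""
    if not probs:
        return []
    clips, start = [], 0
    for i in range(1, len(probs)):
        if probs[i] >= threshold:
            clips.append((start, i))
            start = i
    clips.append((start, len(probs)))
    return clips
```

A learned model can separate gradual transitions and camera motion from real shot changes, which a raw pixel difference cannot. The heuristic only illustrates how per-frame probabilities turn into clip ranges.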

In one possible embodiment, the video duration of the multimedia data is less than the target video duration;

the processing module is specifically used for determining an industry template set based on industry label information;

adding industry templates from the industry template set to the plurality of groups of video clips to obtain a plurality of groups of first video clips, wherein the sum of the video durations of the first video clips is equal to the target video duration of the video to be synthesized;

and combining the multiple groups of first video clips to generate the target video.
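A minimal sketch of the padding case follows. It assumes integer millisecond durations and an industry template set given as `(name, duration_ms)` pairs. The application does not say where templates are inserted relative to the clips. This sketch simply appends them after the clips and cycles through the set, truncating the last template so that the total equals the target exactly. The function and template names are hypothetical.

```python
from typing import List, Sequence, Tuple


def extend_with_templates(clip_durations_ms: Sequence[int],
                          templates: Sequence[Tuple[str, int]],
                          target_ms: int) -> List[Tuple[str, int]]:
    """Build a timeline of (segment name, duration) that sums exactly to target_ms."""
    timeline = [(f"clip{i}", d) for i, d in enumerate(clip_durations_ms)]
    remaining = target_ms - sum(clip_durations_ms)
    if remaining < 0:
        raise ValueError("material is already longer than the target duration")
    if remaining > 0 and not templates:
        raise ValueError("no industry templates available to fill the gap")
    if any(d <= 0 for _, d in templates):
        raise ValueError("template durations must be positive")
    i = 0
    while remaining > 0:
        name, dur = templates[i % len(templates)]
        used = min(dur, remaining)  # truncate the last template to land on the target
        timeline.append((name, used))
        remaining -= used
        i += 1
    return timeline
```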

In one possible embodiment, the tag information further includes feature tag information indicating features of the material to be processed;

the video duration of the multimedia data is greater than the target video duration;

the plurality of groups of video clips comprise a first group of video clips and a second group of video clips;

the processing module is specifically used for determining a score of the first group of video clips and a score of the second group of video clips based on the industry tag information and the feature tag information, wherein the score of the first group of video clips is greater than the score of the second group of video clips;

determining a video clipping ratio based on the score of the first group of video clips and the score of the second group of video clips;

clipping the first group of video clips and the second group of video clips based on the video clipping ratio to obtain a clipping result of the first group and a clipping result of the second group, wherein the sum of the video durations of the two clipping results is equal to the target video duration of the video to be synthesized;

and combining the clipping result of the first group of video clips and the clipping result of the second group of video clips to generate the target video.
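The clipping ratio can be illustrated by splitting the target duration between the two groups in proportion to their scores. Each share is capped by how much footage that group actually has. This proportional rule and the function name `allocate_by_score` are assumptions; the application states only that the ratio is derived from the scores. Durations are integer milliseconds and scores are assumed positive.

```python
from typing import Tuple


def allocate_by_score(target_ms: int,
                      first_ms: int, second_ms: int,
                      first_score: float, second_score: float) -> Tuple[int, int]:
    """Return how many milliseconds to keep from each group so the kept total equals target_ms."""
    if first_ms + second_ms < target_ms:
        raise ValueError("not enough footage to reach the target duration")
    if first_score <= 0 or second_score <= 0:
        raise ValueError("scores must be positive")
    # Proportional share for the higher-scoring first group, capped by its length.
    keep_first = min(round(target_ms * first_score / (first_score + second_score)), first_ms)
    keep_second = target_ms - keep_first
    if keep_second > second_ms:
        # The second group is too short: give the overflow back to the first group.
        keep_second = second_ms
        keep_first = target_ms - second_ms
    return keep_first, keep_second
```

Each group's clipping ratio is then its kept duration divided by its original duration. For example, keeping 24000 ms of a 40000 ms group corresponds to a ratio of 0.6.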

In one possible embodiment, the multimedia data is picture data;

the acquisition module is specifically used for performing feature extraction on the multimedia data to obtain material features, wherein the material features comprise picture features, or comprise picture features and text features;

if the material features comprise picture features and text features, aggregating the picture features and the text features to obtain a third global feature, and obtaining the label information of the material to be processed based on the third global feature;

and if the material features comprise only picture features, obtaining the label information of the material to be processed based on the picture features.

In one possible embodiment, the processing module is specifically configured to determine an industry template set based on industry tag information;

and processing the material to be processed according to the industry template set and the target parameters to obtain a target video.

In one possible embodiment, the picture data is a single picture;

the label information also comprises an interactive control label which indicates that an interactive control exists in the material to be processed;

the processing module is specifically used for determining an interactive control from the material to be processed based on the interactive control label;

performing enhancement processing on the interactive control to obtain a first material, wherein the first material comprises the enhanced interactive control, and the enhancement processing either enlarges and shrinks (zooms) the interactive control, or bolds and highlights it;

and processing the first material according to the industry template set and the target parameters to obtain a target video.
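The "enlarge and shrink" form of enhancement can be sketched as a periodic scale animation. The animation is applied to the control's bounding box, scaled about its centre. The sinusoidal curve, one-second period, 15% amplitude and function names are illustrative assumptions. The bolding-and-highlighting alternative is not shown.

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def pulse_scales(n_frames: int, fps: float = 30.0,
                 period_s: float = 1.0, amplitude: float = 0.15) -> List[float]:
    """Per-frame scale factors oscillating between 1 - amplitude and 1 + amplitude."""
    return [1.0 + amplitude * math.sin(2.0 * math.pi * (i / fps) / period_s)
            for i in range(n_frames)]


def scale_box(box: Box, scale: float) -> Box:
    """Scale a bounding box about its centre and round to whole pixels."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    new_w, new_h = w * scale, h * scale
    return round(cx - new_w / 2), round(cy - new_h / 2), round(new_w), round(new_h)
```

Applying `scale_box` to the detected control box with each value from `pulse_scales` gives the per-frame region in which the enlarged or shrunken control is redrawn.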

In one possible embodiment, the picture data is a plurality of pictures;

the target parameters also include music style;

the processing module is specifically used for determining target music based on the music style;

determining a plurality of drum point (beat) positions in the target music;

determining the display duration of each picture in the material to be processed based on the plurality of drum point positions in the target music;

and processing the material to be processed according to the industry template set, the target parameters and the display duration of each picture in the picture data to obtain the target video.
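One way to derive display durations from drum points is to place the picture boundaries on the beats nearest to an even split of the total duration. The sketch assumes three things:

- beat positions have already been detected, in milliseconds;
- the total duration comes from the target parameters;
- the function name `picture_durations` is hypothetical.

The nearest-beat rule is an assumption, not the application's stated algorithm.

```python
from typing import List, Sequence


def picture_durations(beats_ms: Sequence[int], n_pictures: int, total_ms: int) -> List[int]:
    """Split total_ms into n_pictures display durations whose boundaries fall on beats where possible."""
    if n_pictures < 1:
        raise ValueError("need at least one picture")
    cuts: List[int] = []
    prev = 0
    for k in range(1, n_pictures):
        ideal = k * total_ms / n_pictures
        # Only beats after the previous cut and before the end keep durations positive.
        candidates = [b for b in beats_ms if prev < b < total_ms]
        if candidates:
            cut = min(candidates, key=lambda b: abs(b - ideal))
        else:
            cut = max(prev + 1, round(ideal))  # no usable beat: fall back to the even split
        cuts.append(cut)
        prev = cut
    bounds = [0] + cuts + [total_ms]
    durations = [end - start for start, end in zip(bounds, bounds[1:])]
    if any(d <= 0 for d in durations):
        raise ValueError("total duration is too short for this many pictures")
    return durations
```

This greedy choice can bunch boundaries when beats are sparse. A more careful version might optimise all cuts jointly, but the greedy form keeps the beat-to-picture mapping easy to follow.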

In one possible implementation, the label information further includes an interactive control label, and the interactive control label indicates that an interactive control exists in the material to be processed;

the processing module is specifically used for determining at least one interactive control from the material to be processed based on the interactive control label;

performing enhancement processing on each interactive control to obtain a second material, wherein the second material comprises the at least one enhanced interactive control, and the enhancement processing either enlarges and shrinks (zooms) the interactive control, or bolds and highlights it;

and processing the second material according to the industry template set and the target parameters to obtain a target video.

In one possible embodiment, the target parameters further comprise at least one of a sticker element, a subtitle element, or a graphic mark element;

the processing module is specifically used for processing the material to be processed based on the parameter adjustment strategy and the target parameter so as to generate a first video;

determining preset positions of the sticker elements, the subtitle elements and the graphic mark elements in the first video;

and placing at least one of the sticker elements, the subtitle elements and the graphic mark elements at a preset position in the first video to generate a target video.
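Preset positions can be represented as anchor rules relative to the frame. The layout in the sketch below is purely hypothetical:

- subtitles bottom-centre;
- stickers top-left;
- graphic marks top-right;
- a 5% margin on each edge.

The function name is likewise invented, and the application does not specify where each element goes.

```python
from typing import Tuple


def preset_position(kind: str, frame_w: int, frame_h: int,
                    elem_w: int, elem_h: int, margin_ratio: float = 0.05) -> Tuple[int, int]:
    """Return the top-left pixel position for an overlay element (hypothetical layout rules)."""
    mx, my = round(frame_w * margin_ratio), round(frame_h * margin_ratio)
    if kind == "subtitle":        # centred horizontally, near the bottom edge
        return (frame_w - elem_w) // 2, frame_h - elem_h - my
    if kind == "sticker":         # top-left corner
        return mx, my
    if kind == "graphic_mark":    # top-right corner
        return frame_w - elem_w - mx, my
    raise ValueError(f"unknown element kind: {kind}")
```

Expressing positions as rules relative to the frame, rather than fixed pixels, lets the same layout follow whatever target video size was selected.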

In one possible embodiment, the target parameters further include a music style and a special effect element;

the processing module is specifically used for placing at least one of a sticker element, a subtitle element and a graphic mark element at a preset position in a first video to generate a second video;

determining target music based on the music style;

determining a special effect transition position in the second video based on the plurality of drum point positions in the target music and the second video;

adding the special effect element to a special effect transition position in the second video to generate the target video.
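A simple reading of this step is to snap each clip boundary in the second video to the nearest drum point when one lies within a tolerance. Otherwise the boundary itself is kept. The 200 ms tolerance, the snapping rule and the function name are assumptions for illustration.

```python
import bisect
from typing import List, Sequence


def transition_positions(boundaries_ms: Sequence[int], beats_ms: Sequence[int],
                         tolerance_ms: int = 200) -> List[int]:
    """Place each transition on the nearest beat if it is within tolerance, else on the boundary."""
    beats = sorted(beats_ms)
    positions = []
    for b in boundaries_ms:
        i = bisect.bisect_left(beats, b)
        # The nearest beat is either just before or just after the boundary.
        neighbours = beats[max(0, i - 1):i + 1]
        nearest = min(neighbours, key=lambda t: abs(t - b), default=None)
        if nearest is not None and abs(nearest - b) <= tolerance_ms:
            positions.append(nearest)
        else:
            positions.append(b)
    return positions
```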

In one possible embodiment, the video generating apparatus further comprises a display module;

the acquisition module is specifically used for displaying an input interface, wherein the input interface comprises a data input interface and a parameter selection interface, the data input interface is used for inputting materials to be processed, and the parameter selection interface is used for selecting target parameters;

responding to data selection operation of a data input interface in an input interface, and acquiring a material to be processed;

responding to the parameter selection operation of the parameter selection interface in the input interface to acquire target parameters;

and the display module is used for displaying the target video on a video display interface after the processing module has processed the material to be processed, based on the parameter adjustment strategy and the target parameters, to generate the target video.

A third aspect of the present application provides a computer-readable storage medium storing instructions. When the instructions run on a computer, they cause the computer to perform the methods of the above aspects.

Another aspect of the application provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the above aspects.

The embodiments of the application provide a video generation method with four steps:

1. A material to be processed and target parameters are obtained. The material comprises multimedia data, and the target parameters comprise the target video duration of a video to be synthesized.
2. Label information is obtained based on the material. It includes industry label information indicating the industry to which the material belongs.
3. A parameter adjustment strategy is determined based on the multimedia data and the industry label information. It includes a video duration adjustment strategy for processing the video duration of the material.
4. The material is processed based on the parameter adjustment strategy and the target parameters to generate a target video whose duration equals the target video duration.

Because the video duration adjustment strategy adjusts the video duration, the duration of the target video meets the requirement of the target parameters. Because the label information includes industry label information indicating the industry to which the material belongs, the adjustments made through the parameter adjustment strategy also meet the industry's requirements. The method therefore improves the flexibility and accuracy of video generation.

Drawings

FIG. 1 is a schematic diagram of the architecture of a video generation system according to an embodiment of the present application;

FIG. 2 is a schematic diagram of an embodiment of a video generation method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an embodiment of a video resizing strategy according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an embodiment of another video resizing strategy according to an embodiment of the present application;

FIG. 5 is a schematic diagram of an embodiment of obtaining material features according to an embodiment of the present application;

FIG. 6 is a schematic diagram of an embodiment of obtaining label information of a material to be processed according to an embodiment of the present application;

FIG. 7 is a schematic diagram of an embodiment of obtaining the probability of a picture switching frame according to an embodiment of the present application;

FIG. 8 is a schematic diagram of another embodiment of video generation according to an embodiment of the present application;

FIG. 9 is a schematic diagram of another embodiment of obtaining material features according to an embodiment of the present application;

FIG. 10 is a schematic diagram of another embodiment of obtaining label information of a material to be processed according to an embodiment of the present application;

FIG. 11 is a schematic interface diagram of enhancement processing according to an embodiment of the present application;

FIG. 12 is a schematic interface diagram of a sticker element, a subtitle element and a graphic mark element according to an embodiment of the present application;

FIG. 13 is a schematic interface diagram of generating a target video based on a sticker element, a subtitle element and a graphic mark element according to an embodiment of the present application;

FIG. 14 is a schematic interface diagram of displaying an input interface and displaying a target video according to an embodiment of the present application;

FIG. 15 is a schematic diagram of an embodiment of a video generation apparatus according to an embodiment of the present application;

FIG. 16 is a schematic diagram of an embodiment of a server according to an embodiment of the present application;

FIG. 17 is a schematic diagram of an embodiment of a terminal device according to an embodiment of the present application.

Detailed Description

The terms "first", "second", "third", "fourth" and the like (if any) in the specification, claims and drawings of the present application are used to distinguish between similar objects. They do not necessarily describe a particular sequence or chronological order. The data so used are interchangeable under appropriate circumstances, so that the embodiments described herein can, for example, be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprise", "include" and "correspond to", and any variants thereof, are intended to cover a non-exclusive inclusion. A process, method, system, article or apparatus that comprises a list of steps or elements is not necessarily limited to those expressly listed; it may include other steps or elements not expressly listed or inherent to such process, method, article or apparatus.

With the development of internet services, video has become an important trend, and producing advertising materials as videos has become a major trend in advertisement placement. Creating new video material for an advertising video is costly, from shooting through editing. Directly reusing existing material is cheaper, but it yields videos that are too similar to one another. Existing material can instead be reworked to generate a new video, based on an editor's experience:

- picture, copywriting and video materials are fused, combined and transformed together with related music and templates;
- special effects and transitions are added;
- the music is re-dubbed.

In conventional editing software, an experienced editor can devise different creative forms for advertising videos of different industries. However, different media industries have different requirements, so generating videos more flexibly and accurately is an urgent problem.

To solve this problem, an embodiment of the present application provides a video generation method in which a video duration adjustment strategy adjusts the video duration, so that the duration of the target video meets the requirement of the target parameters. The label information also includes industry label information indicating the industry to which the material to be processed belongs. The adjustments made through the parameter adjustment strategy therefore also meet the industry's requirements, improving the flexibility and accuracy of video generation.

The video generation system according to the embodiments of the present application is described below. The video generation method may be performed by a terminal device or by a server. FIG. 1 is a schematic diagram of the architecture of a video generation system in an embodiment of the present application. As shown in FIG. 1, the video generation system includes terminal devices and a server.

When the terminal device is the execution subject, it performs the following steps:

1. obtains a material to be processed and target parameters;
2. obtains label information based on the material;
3. determines a parameter adjustment strategy based on the multimedia data in the material and the industry label information in the label information;
4. processes the material based on the parameter adjustment strategy and the target parameters to generate a target video;
5. displays the target video.

When the server is the execution subject, the process runs as follows:

1. The user performs a data selection operation and a parameter selection operation on an input interface displayed by the terminal device.
2. The terminal device acquires the material to be processed and the target parameters and sends them to the server.
3. The server obtains label information based on the material and determines a parameter adjustment strategy based on the multimedia data in the material and the industry label information in the label information.
4. The server processes the material based on the parameter adjustment strategy and the target parameters to generate a target video.
5. The server sends the target video to the terminal device, which displays it.

The server in this application may be any of the following:

- an independent physical server;
- a server cluster or distributed system composed of multiple physical servers;
- a cloud server providing basic cloud computing services, such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.

The terminal device may be, but is not limited to, a smartphone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, a smart television, and the like.

The terminal device and the server may communicate through a wireless network, a wired network, or a removable storage medium. The wireless network uses standard communication technologies and/or protocols. It is typically the internet, but may be any network, including but not limited to Bluetooth, a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile network, a private network, a virtual private network, or any combination thereof. In some embodiments, custom or dedicated data communication technologies may be used in place of, or in addition to, the data communication technologies described above. The removable storage medium may be a Universal Serial Bus (USB) flash drive, a removable hard disk, or another removable storage medium.

Although only five terminal devices and one server are shown in fig. 1, it should be understood that the example in fig. 1 is only used for understanding the present solution, and the number of the specific terminal devices and the number of the servers should be flexibly determined according to actual situations.

In addition, the video generation method provided in the embodiments of the present application can be applied to advertising or entertainment scenarios. Advertisements can be divided into non-economic advertisements, which are not for profit, and economic advertisements, which are for profit. To further clarify the solution, both are introduced below:

The first type of application scenario is non-economic advertising. It includes, but is not limited to, announcements, notices and statements from governments, political parties, education, culture, municipal administration, social groups and the like.

For non-economic advertisements, the video generation method can obtain a material to be processed, such as a notice or a statement, and target parameters, such as the target video duration or video size of the video to be synthesized. For example, if the generated video is to be played in a loop on a screen at an important location, its video size should match the size the screen can display. Relevant labels of the material, for example the government, education or municipal industry, are then obtained from the material, which determines its industry label information. A parameter adjustment strategy for the video to be generated is determined based on the label information. The material is then processed through the parameter adjustment strategy so that the target parameters are met. The resulting target video satisfies both the specific industry requirements and the target parameters.

The second type of application scenario is economic advertising, usually commercial advertising. Commercial advertising is a paid means of disseminating information about goods or services to consumers or users through advertising media, in order to promote the goods or provide the services.

For economic advertisements, the method can obtain a material to be processed, such as a commodity material or a service information material, and target parameters, such as the target video duration or video size of the video to be synthesized. For example, if the generated video is to be played in a shopping mall with heavy foot traffic, its video size should match the size that the mall's screens can display. Relevant labels of the material, for example the e-commerce, network service or financial industry, are then obtained, which determines its industry label information. A parameter adjustment strategy for the video to be generated is determined based on the label information. The material is then processed through the parameter adjustment strategy so that the target parameters are met. The resulting target video satisfies the specific requirements of industries such as e-commerce, network services and finance, as well as the target parameters required by the user.

It should be understood that the foregoing application scenarios are presented only to aid understanding of this solution. In practical applications, the application scenarios of the video generation method provided by the embodiments of this application include, but are not limited to, the examples above, and they are not exhaustively listed here.

Because some steps in the embodiments of this application are implemented based on artificial intelligence (AI), some basic concepts of AI are introduced before the video generation method itself is described. Artificial intelligence is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, AI is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can respond in a manner similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines so that the machines have the capabilities of perception, reasoning, and decision making. AI technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating/interaction systems, mechatronics, and the like. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

As AI technology has been researched and advanced, it has been developed in many directions. Machine learning (ML) is an interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other subjects. It specifically studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of AI and the fundamental way to make computers intelligent, and it is applied across all fields of AI. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

With reference to the above description, and taking a terminal device as the execution subject as an example, refer to fig. 2, which is a schematic diagram of an embodiment of the video generation method according to an embodiment of this application. As shown in fig. 2, the video generation method includes the following steps:

101. Obtain a material to be processed and target parameters, where the material to be processed includes multimedia data and the target parameters include a target video duration of the video to be synthesized.

In this embodiment, the video generation apparatus obtains the material to be processed and the target parameters. The material to be processed includes multimedia data, which may be video data, picture data, or both video data and picture data.

Next, the target parameters include the target video duration of the video to be synthesized, for example 30 seconds (s), 2 minutes (min), or less than 15 s; that is, the target duration may be a specific value or a range, which is not limited herein. The target parameters indicate the parameters and other related information that the finally generated video should match. For example, the target parameters may further include a target video size of the video to be synthesized, a music style of the video to be synthesized, or sticker, subtitle, graphic sign, and special-effect elements that can be placed in the video to be synthesized.
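The target parameters described above can be held in a simple record. The sketch below is a minimal illustration, assuming a hypothetical `TargetParams` class in which the duration is either an exact value or an inclusive range in seconds; none of these names come from the original method.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TargetParams:
    """Hypothetical container for the target parameters of the video to be synthesized."""
    duration_s: Optional[float] = None                      # exact target duration, e.g. 30.0
    duration_range_s: Optional[Tuple[float, float]] = None  # inclusive range, e.g. (0.0, 15.0)
    size: Optional[Tuple[int, int]] = None                  # (width, height), e.g. (750, 1334)
    music_style: Optional[str] = None                       # e.g. "upbeat"

    def duration_ok(self, duration: float, tol: float = 1e-6) -> bool:
        """Check a candidate duration against the exact or ranged target."""
        if self.duration_s is not None:
            return abs(duration - self.duration_s) <= tol
        if self.duration_range_s is not None:
            low, high = self.duration_range_s
            return low <= duration <= high
        return True  # no duration constraint was given


# "Less than 15 s" is approximated here by an inclusive range.
short_ad = TargetParams(duration_range_s=(0.0, 15.0), size=(750, 1334))
print(short_ad.duration_ok(12.5))  # True
```

Allowing either an exact value or a range mirrors the text's statement that the target duration may be a specific time or a time range.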

102. Acquire label information based on the material to be processed, where the label information includes industry label information indicating the industry to which the material to be processed belongs.

In this embodiment, the video generation apparatus acquires tag information based on the material to be processed, where the tag information includes industry tag information indicating the industry to which the material belongs. Optionally, the tag information may further include scene tag information indicating the scene to which the material belongs, feature tag information indicating a feature of the material, or an interactive control tag indicating that the material contains an interactive control. Because corresponding tag information can be obtained for different materials, the possible content of the tag information is not exhaustively listed here.

Specifically, the industry tag information in this embodiment includes, but is not limited to, the government, education, financial, network service, and e-commerce industries. For example, if the multimedia data is picture data showing a pair of shoes or a piece of clothing, the tag information acquired by the video generation apparatus based on the material may include industry tag information indicating that the material belongs to the e-commerce industry. As another example, if the multimedia data is video data, specifically a promotional video about epidemic prevention and control, the acquired tag information may include industry tag information indicating that the material belongs to the government industry. It should be understood that these examples are only intended to aid understanding of industry tags and should not be construed as limiting this solution.

103. Determine a parameter adjustment strategy based on the multimedia data in the material to be processed and the industry label information in the label information, where the parameter adjustment strategy includes a video duration adjustment strategy for performing video duration processing on the material to be processed.

In this embodiment, the video generation device determines the parameter adjustment strategy based on the multimedia data in the material to be processed and the industry tag information. Because the tag information includes industry tag information indicating the industry to which the material belongs, a video duration adjustment strategy can be determined from the material and the industry tag information. This strategy performs video duration processing on the multimedia data in the material; that is, the parameter adjustment strategy includes the video duration adjustment strategy.

Specifically, the parameter adjustment strategy is used to process the multimedia data in the material to be processed so as to generate a target video that meets the target parameters. Methods for video duration processing include, but are not limited to, clipping, template addition, and merging, which are not specifically limited herein. For example, if the multimedia data is video data whose duration is greater than the target video duration in the target parameters, a video duration adjustment strategy that determines how to clip the material down to the target duration needs to be determined based on the material and the tag information obtained in step 102. Conversely, if the duration of the video data is less than the target video duration, a video duration adjustment strategy for extending the material to the target duration needs to be determined in the same way. As can be seen from step 102, the tag information may also include other tag information specific to the material; different tag information yields corresponding parameter adjustment strategies, each of which adjusts the material so that it meets the corresponding parameter in the target parameters. The content of the parameter adjustment strategy is therefore not exhaustively listed here.
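The clip-or-extend decision above can be sketched as follows, assuming the material has already been split into ordered segments. `Clip`, `plan_duration_adjustment`, and `clip_segments` are hypothetical names, and keeping segments in their original order is a simplifying assumption: the method itself chooses how to clip according to the tag information, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    """A hypothetical segment of the source video, in seconds."""
    start: float
    end: float

    @property
    def length(self) -> float:
        return self.end - self.start


def plan_duration_adjustment(material_len: float, target_len: float) -> dict:
    """Decide whether the material must be clipped, extended, or kept as-is."""
    if material_len > target_len:
        return {"action": "clip", "amount": material_len - target_len}
    if material_len < target_len:
        # e.g. by adding a template intro/outro or merging further material
        return {"action": "extend", "amount": target_len - material_len}
    return {"action": "keep", "amount": 0.0}


def clip_segments(segments: List[Clip], target_len: float) -> List[Clip]:
    """Keep segments in order until target_len is reached, trimming the last one."""
    kept: List[Clip] = []
    total = 0.0
    for seg in segments:
        if total >= target_len:
            break
        take = min(seg.length, target_len - total)
        kept.append(Clip(seg.start, seg.start + take))
        total += take
    return kept


segments = [Clip(0.0, 10.0), Clip(20.0, 35.0), Clip(40.0, 50.0)]  # 35 s of material
print(plan_duration_adjustment(35.0, 30.0))  # {'action': 'clip', 'amount': 5.0}
print(clip_segments(segments, 18.0))  # [Clip(start=0.0, end=10.0), Clip(start=20.0, end=28.0)]
```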

104. Process the material to be processed based on the parameter adjustment strategy and the target parameters to generate a target video, where the video duration of the target video is equal to the target video duration of the video to be synthesized.

In this embodiment, the video generation device processes the material to be processed based on the parameter adjustment strategy and the target parameters to generate the target video, whose video duration equals the target video duration of the video to be synthesized. The target video in this embodiment may be one video or multiple videos, as long as each has a duration equal to the target video duration. This embodiment should not be understood as generating only one video that matches the target parameters, and the number of target videos does not limit this solution.

Specifically, the video generation apparatus processes the material to be processed according to the parameter adjustment strategy determined in step 103, taking the target parameters obtained in step 101 as the processing target, to obtain a target video that matches the target parameters. Because this embodiment defines the target parameters only as the target video duration, the duration of the resulting target video equals the target video duration of the video to be synthesized. Following the examples of target parameters in step 101: if the target parameters also include a target video size, the video size of the target video equals that target size; if they include a music style, the music style of the target video matches it; and if they include sticker and subtitle elements that can be placed in the video, the target video also contains the sticker and subtitle elements selected by the user. Different industries may therefore have different target parameters, and different target parameters lead to different target videos, so the foregoing examples should not be construed as limiting this solution.

This embodiment of the application provides a video generation method. Tag information is obtained based on the material to be processed, a parameter adjustment strategy is determined based on the multimedia data in the material and the industry tag information, and the material is processed through the video duration adjustment strategy and the target parameters. Because the video duration adjustment strategy adjusts the video duration, the duration of the resulting target video meets the target parameters. Moreover, because the tag information includes industry tag information indicating the industry to which the material belongs, the adjustments made through the parameter adjustment strategy also meet industry requirements, which improves the flexibility and accuracy of video generation.

Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the video generation method provided in the embodiments of this application, the target parameters further include a target video size of the video to be synthesized, and the label information further includes scene label information indicating the scene to which the material to be processed belongs;

determining the parameter adjustment strategy based on the multimedia data in the material to be processed and the industry label information in the label information specifically includes:

determining a video duration adjustment strategy based on the material to be processed and the industry label information; and

determining a video size adjustment strategy based on the multimedia data in the material to be processed, the industry label information, and the scene label information;

processing the material to be processed based on the parameter adjustment strategy and the target parameters to generate the target video specifically includes:

processing the material to be processed based on the video duration adjustment strategy, the video size adjustment strategy, the target video duration of the video to be synthesized, and the target video size of the video to be synthesized, to generate the target video.

In this embodiment, as described in the foregoing embodiments, the target parameters may further include a target video size of the video to be synthesized. The target video size may be an aspect ratio, for example 16:9, 9:16, or 4:3. Alternatively, it may be a specific pixel size, for example 750 × 1334 or 512 × 384. In addition, the tag information further includes scene tag information indicating the scene to which the material belongs, for example a test paper answer scene tag or a commodity close-up scene tag. Specific video sizes and scene tags are not exhaustively listed in this solution.
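Because the target size may be given either as a ratio or as a pixel size, a sketch like the one below can normalize both forms before they are compared. The accepted spec formats and the function names are assumptions for illustration only.

```python
from math import gcd
from typing import Tuple


def parse_size(spec: str) -> Tuple[int, int]:
    """Parse '16:9', '4: 3', '750x1334' or '750 × 1334' into (width, height)."""
    cleaned = spec.replace("×", "x").replace(" ", "")
    sep = ":" if ":" in cleaned else "x"
    width, height = (int(part) for part in cleaned.split(sep))
    return width, height


def reduced_ratio(width: int, height: int) -> Tuple[int, int]:
    """Reduce a pixel size or ratio to lowest terms, e.g. 512x384 -> 4:3."""
    g = gcd(width, height)
    return width // g, height // g


def same_ratio(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """Compare two sizes by aspect ratio using exact integer cross-multiplication."""
    return a[0] * b[1] == a[1] * b[0]


print(reduced_ratio(*parse_size("512 × 384")))   # (4, 3)
print(reduced_ratio(*parse_size("750 × 1334")))  # (375, 667)
```

Cross-multiplication avoids floating-point comparison; note that 750 × 1334 is close to, but not exactly, 9:16.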

Because the target parameters may include a target video size, the resulting parameter adjustment strategy further includes a video size adjustment strategy for adjusting the video size, so that the video size of the final target video equals the target video size of the video to be synthesized. Accordingly, the video generation device determines a video duration adjustment strategy based on the material to be processed and the industry tag information, and determines a video size adjustment strategy based on the multimedia data in the material, the industry tag information, and the scene tag information.

Specifically, the parameter adjustment strategy is used to process the multimedia data in the material to be processed so as to generate a target video that meets the target parameters. The video duration adjustment strategy has been described in the foregoing embodiment and is not repeated here. Methods for video size adjustment include, but are not limited to, filling, template addition, cropping, and enlargement or reduction, which are not limited herein.

To aid understanding of the video size adjustment strategy, several strategies for specific industries and corresponding scenes are described below. If the industry tag information is an education industry tag and the scene tag information is a test paper answer scene tag, the video size adjustment strategy may be to obtain, through these two tags, an industry template set containing education industry features, and to fill the material that does not meet the video size so that it meets the video size in the target parameters. If the industry tag information is the financial industry and the scene tag information is a financial dashboard scene tag, the strategy may be to obtain, through these two tags, an industry template set containing financial industry features, and to fill the material that does not meet the video size in the same way. If the industry tag information is an e-commerce industry tag and the scene tag information is a commodity close-up scene tag, the strategy may be to obtain a selling-point display industry template set based on these two tags, and to fill a selling-point display template into the material that does not meet the video size, so that the video size in the target parameters is met.
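One way to express the tag-to-strategy mapping in these examples is a lookup table keyed by (industry tag, scene tag). The tag strings, template-set names, and the plain-scaling fallback below are hypothetical placeholders, not identifiers from the original method.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping: (industry tag, scene tag) -> (resize method, template set).
RESIZE_STRATEGIES: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    ("education", "test_paper_answer"): ("fill", "education_templates"),
    ("finance", "financial_dashboard"): ("fill", "finance_templates"),
    ("e_commerce", "product_closeup"): ("fill", "selling_point_templates"),
    ("network_service", "oral_broadcast"): ("focus_follow", "focus_person_template"),
}

# Assumed fallback when no industry/scene rule matches: plain scaling.
DEFAULT_STRATEGY: Tuple[str, Optional[str]] = ("scale", None)


def choose_resize_strategy(industry: str, scene: str) -> Tuple[str, Optional[str]]:
    """Look up the video size adjustment strategy for a pair of tags."""
    return RESIZE_STRATEGIES.get((industry, scene), DEFAULT_STRATEGY)


print(choose_resize_strategy("e_commerce", "product_closeup"))
# ('fill', 'selling_point_templates')
```

A table keeps the industry- and scene-specific rules as data, so new combinations can be added without changing the selection logic.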

Referring to fig. 3, fig. 3 is a schematic diagram of an embodiment of a video size adjustment strategy according to an embodiment of this application. As shown in fig. 3, A1 denotes the material to be processed and A2 denotes an industry template. Fig. 3 (A) shows the material A1, whose size is 5:9. If the target video size of the video to be synthesized is 16:9, the size of A1 is compared with 16:9. If an e-commerce industry tag and a commodity close-up scene tag can be obtained for A1, the selling-point display industry template set can be obtained, an industry template A2 is selected from it, and A2 is filled into the material A1 shown in fig. 3 (A) to obtain the target video shown in fig. 3 (B). This target video contains A1 and A2, and its video size is 16:9, which meets the video size requirement in the target parameters. As another example, if the target video size of the video to be synthesized is 9:16, an industry template A2 is obtained from the template set in the same way and filled into A1 shown in fig. 3 (A) to obtain the target video shown in fig. 3 (C), which contains A1 and A2 and has a video size of 9:16, meeting the video size requirement in the target parameters.
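The filling in fig. 3 needs a canvas of the target ratio that contains the material, with the remaining area given to the industry template. A minimal sketch, assuming the canvas is the smallest integer multiple of the target ratio that contains the frame and that the material is centred; `fill_layout` is a hypothetical name.

```python
import math
from fractions import Fraction
from typing import Tuple


def fill_layout(w: int, h: int, rw: int, rh: int) -> Tuple[int, int, int, int]:
    """Return (canvas_w, canvas_h, offset_x, offset_y) for centring a w x h frame
    on the smallest canvas whose size is an integer multiple of rw:rh."""
    # Smallest integer k such that k*rw >= w and k*rh >= h (exact, via Fraction).
    k = math.ceil(max(Fraction(w, rw), Fraction(h, rh)))
    canvas_w, canvas_h = k * rw, k * rh
    # The material is centred; the rest of the canvas is left for the template.
    return canvas_w, canvas_h, (canvas_w - w) // 2, (canvas_h - h) // 2


print(fill_layout(500, 900, 16, 9))  # (1600, 900, 550, 0)
print(fill_layout(500, 900, 9, 16))  # (513, 912, 6, 6)
```

For a 5:9 frame of 500 × 900, a 16:9 target leaves 1100 px of width for the template, while a 9:16 target needs only a thin border, because 5:9 is already close to 9:16.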

Alternatively, if the industry tag information is a network service industry tag and the scene tag information is an oral broadcast (presenter speaking to camera) scene tag, the video size adjustment strategy may obtain a focus-person following template based on these two tags, identify the person speaking, and determine that person as the focus person, so that when the material is processed to meet the video size in the target parameters, the focus person always stays at the focus position of the target video. For example, referring to fig. 4 and again taking a target video size of 9:16, fig. 4 is a schematic diagram of another embodiment of a video size adjustment strategy according to an embodiment of this application. As shown in fig. 4, B1 denotes the material to be processed and B2 denotes the focus person. Fig. 4 (A) shows the material B1, whose size is 7:11 and therefore does not match the 9:16 in the target parameters. If a network service tag and an oral broadcast scene tag can be obtained for B1, the focus-person following template can be obtained: the focus person B2 is identified and the material B1 is then processed to obtain the target video shown in fig. 4 (B), in which B2 always stays at the focus position and the video size is 9:16, meeting the video size requirement in the target parameters.
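The focus-person following in fig. 4 can be approximated by cropping, in every frame, the largest window of the target ratio centred on the detected person and clamped to the frame. The sketch below assumes the person's centre per frame is already known (detection is not shown) and adds exponential smoothing to reduce jitter; both the smoothing and all names are assumptions, not part of the described template.

```python
from fractions import Fraction
from typing import List, Tuple


def focus_crop(frame_w: int, frame_h: int, rw: int, rh: int,
               cx: float, cy: float) -> Tuple[float, float, int, int]:
    """Largest rw:rh window inside the frame, centred on (cx, cy) and clamped to the frame.

    Returns (x, y, width, height) of the crop window.
    """
    if Fraction(frame_w, frame_h) > Fraction(rw, rh):
        # Frame is relatively wider than the target: keep full height, narrow the width.
        crop_h, crop_w = frame_h, frame_h * rw // rh
    else:
        # Frame is relatively taller (or equal): keep full width, shorten the height.
        crop_w, crop_h = frame_w, frame_w * rh // rw
    x = min(max(cx - crop_w / 2, 0), frame_w - crop_w)
    y = min(max(cy - crop_h / 2, 0), frame_h - crop_h)
    return x, y, crop_w, crop_h


def smooth_centres(centres: List[float], alpha: float = 0.3) -> List[float]:
    """Exponential moving average so the crop window does not jitter frame to frame."""
    smoothed: List[float] = []
    for c in centres:
        smoothed.append(c if not smoothed else alpha * c + (1 - alpha) * smoothed[-1])
    return smoothed


# A 700 x 1100 (7:11) frame cropped to 9:16 around a person centred at x = 350.
print(focus_crop(700, 1100, 9, 16, 350, 550))  # (41.0, 0.0, 618, 1100)
```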

It should be understood that the examples in fig. 3 and fig. 4 are only intended to aid understanding of this solution. In practical applications, other video size adjustment strategies may be determined based on the multimedia data in the material, the industry tag information, and the scene tag information. For example, for any industry tag combined with a filling scene tag, the material may be processed with a depopulation template; for any industry tag combined with a multi-shot video scene tag, the material may be processed by highlight display or hierarchical simulcast. These strategies are not exhaustively listed here.

Further, based on the video duration adjustment policy and the video size adjustment policy determined in the foregoing embodiment, the video generation device processes the material to be processed by using the target video duration of the video to be synthesized and the target video size of the video to be synthesized as processing targets, so as to obtain a target video matched with the target parameter.

This embodiment of the application provides another video generation method. Because the acquired tag information further includes scene tag information indicating the scene to which the material belongs, the adjustments made through the parameter adjustment strategy can meet specific scene requirements in addition to industry requirements, which improves the flexibility and accuracy of video generation. In addition, a video size adjustment strategy is determined using the scene tag information, so that the material is processed through the video duration adjustment strategy and the video size adjustment strategy with the target video duration and target video size of the video to be synthesized as processing targets. The resulting target video matches both the video size and the video duration in the target parameters, which further improves the accuracy of video generation.

Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the method for generating a video according to the embodiment of the present application, the multimedia data is video data;

acquiring label information of the material to be processed based on the material to be processed, which specifically comprises the following steps:

performing characterization processing on the multimedia data to obtain material characteristics, wherein the material characteristics comprise video frame characteristics and voice sequence characteristics, or the material characteristics comprise video frame characteristics, voice sequence characteristics and text characteristics;

In this embodiment, because the material to be processed may include different kinds of multimedia data, the embodiments of this application use a multi-modal tag model to process different types of material. The multi-modal tag model accepts multimedia data as input. Because video data and picture data contain video frames, audio data, picture data, and text information, the multi-modal tag model includes an Inception V3 model, a VGGish model, a BERT model, and a ResNet50 model: video frame data is characterized by the Inception V3 model, audio data by the VGGish model, text information by the BERT model, and picture data by the ResNet50 model, yielding the corresponding features. It should be understood that these models are only examples for characterizing different data. In practical applications, video frame data may also be characterized by an Inception V2 model, audio data by the moviepy model, and so on, so the foregoing examples should not be construed as limiting the characterization processing.

When the multimedia data is video data, the video generation device performs characterization processing on the multimedia data to obtain material features. Because video data always contains video frame data and audio data, the resulting material features may include video frame features (specifically, feature vector representations) and speech sequence features. Some video data also contains text information, in which case the resulting material features may include video frame features, speech sequence features, and text features. Specifically, the video frame data is characterized by the Inception V3 model in the multi-modal tag model to obtain the video frame features, the audio data is characterized by the VGGish model to obtain the speech sequence features, and, when the video data contains text information, the text information is characterized by the BERT model to obtain the text features.

For ease of understanding, refer to fig. 5, which is a schematic diagram of an embodiment of obtaining material features according to an embodiment of this application. As shown in fig. 5, fig. 5 (A) illustrates a material to be processed that includes video frame data and audio data; these are characterized to obtain video frame features corresponding to the video frame data and speech sequence features corresponding to the audio data. Fig. 5 (B) illustrates a material that includes video frame data, audio data, and text information; these are characterized to obtain video frame features corresponding to the video frame data, speech sequence features corresponding to the audio data, and text features corresponding to the text information.

Further, characterizing the different data and information yields corresponding features, but these features are scattered. It is therefore necessary to apply context gating (CG) to all the features to dynamically adjust the weight of each modality and enhance effective features; CG captures the associations among the scattered features so that a more accurate overall result can be output. On this basis, if the material features include video frame features and speech sequence features, the video generation device aggregates them to obtain a first global feature and obtains the tag information of the material based on the first global feature. Specifically, the video frame features and speech sequence features are aggregated using the NeXtVLAD method to obtain a global feature representation, that is, the first global feature; the first global feature is then used as the input of a multi-label classification model, which outputs the tag information of the material to be processed.
Next, if the material features include video frame features, speech sequence features, and text features, the video generation device aggregates these three kinds of features to obtain a second global feature and obtains the tag information of the material based on the second global feature. Specifically, after the second global feature is obtained, it is used as the input of the multi-label classification model, which outputs the tag information of the material to be processed.
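The aggregation and gating described above can be illustrated with a simplified NetVLAD layer followed by context gating, written in plain Python with weights supplied by the caller. This is a stand-in under stated assumptions: NeXtVLAD additionally splits features into groups and reduces dimensionality, and in a real model the centres, assignment weights, and gate weights are learned. All function names here are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _softmax(z: Sequence[float]) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def _l2_normalize(v: Sequence[float]) -> Vector:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def netvlad(descriptors: Sequence[Sequence[float]], centres: Sequence[Sequence[float]],
            assign_w: Sequence[Sequence[float]], assign_b: Sequence[float]) -> Vector:
    """Simplified NetVLAD: soft-assign each frame-level descriptor to K centres
    and accumulate the weighted residuals into a fixed-length K*D vector."""
    k_count, dim = len(centres), len(centres[0])
    vlad = [[0.0] * dim for _ in range(k_count)]
    for x in descriptors:
        logits = [sum(w * xi for w, xi in zip(assign_w[k], x)) + assign_b[k]
                  for k in range(k_count)]
        a = _softmax(logits)
        for k in range(k_count):
            for d in range(dim):
                vlad[k][d] += a[k] * (x[d] - centres[k][d])
    # Intra-normalise each cluster's residual, then L2-normalise the whole vector.
    flat = [v for row in vlad for v in _l2_normalize(row)]
    return _l2_normalize(flat)


def context_gating(x: Sequence[float], w: Sequence[Sequence[float]],
                   b: Sequence[float]) -> Vector:
    """y = x * sigmoid(W x + b): each gate re-weights one dimension of x."""
    gates = [_sigmoid(sum(wj * xj for wj, xj in zip(row, x)) + bi)
             for row, bi in zip(w, b)]
    return [xi * g for xi, g in zip(x, gates)]


def global_feature(per_modality: Sequence[Vector]) -> Vector:
    """Concatenate the aggregated vectors of each modality (frames, speech, text)."""
    return [v for vec in per_modality for v in vec]


# Toy 2-D descriptors for two modalities, K = 2 clusters.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
centres = [[0.0, 0.0], [1.0, 1.0]]
assign_w, assign_b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
video_vec = netvlad(frames, centres, assign_w, assign_b)
speech_vec = netvlad(frames, centres, assign_w, assign_b)
first_global = global_feature([video_vec, speech_vec])
gated = context_gating(first_global, [[0.0] * 8 for _ in range(8)], [0.0] * 8)
```

NetVLAD turns a variable number of frame-level features into one fixed-length vector, which is what lets a single classifier consume videos of any length.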

For ease of understanding, refer to fig. 6, which is a schematic diagram of an embodiment of obtaining tag information of a material to be processed according to this application. As shown in fig. 6, fig. 6 (A) illustrates that, after the video frame features and speech sequence features are obtained as in the example of fig. 5, they are aggregated into the first global feature, which is input to the multi-label classification model to output the tag information of the material. Fig. 6 (B) illustrates that, after the video frame features, speech sequence features, and text features are obtained as in the example of fig. 5, they are aggregated into the second global feature, which is input to the multi-label classification model to output the tag information of the material.
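The multi-label classification step can be sketched as an independent sigmoid per label followed by a threshold; this is what lets one material receive both an industry tag and a scene tag at once, unlike single-label softmax classification. The label names, the `industry:`/`scene:` prefix convention, and the 0.5 threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical label vocabulary using an "<kind>:<name>" naming convention.
LABELS = ["industry:e_commerce", "industry:government",
          "scene:product_closeup", "scene:oral_broadcast"]


def classify(global_feat: Sequence[float], weights: Sequence[Sequence[float]],
             biases: Sequence[float], labels: Sequence[str] = LABELS,
             threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Multi-label output: an independent sigmoid per label, so several labels can fire."""
    result = []
    for label, w, b in zip(labels, weights, biases):
        logit = sum(wi * xi for wi, xi in zip(w, global_feat)) + b
        p = 1.0 / (1.0 + math.exp(-logit))
        if p >= threshold:
            result.append((label, p))
    return result


def group_tags(tags: Sequence[Tuple[str, float]]) -> Dict[str, List[str]]:
    """Split predicted labels into industry tags, scene tags, and so on."""
    grouped: Dict[str, List[str]] = {}
    for label, _ in tags:
        kind, name = label.split(":", 1)
        grouped.setdefault(kind, []).append(name)
    return grouped


tags = classify([1.0, 0.0], [[3.0, 0.0], [-3.0, 0.0], [2.0, 0.0], [0.0, 0.0]],
                [0.0, 0.0, 0.0, -1.0])
print(group_tags(tags))  # {'industry': ['e_commerce'], 'scene': ['product_closeup']}
```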

It should be understood that the examples in fig. 5 and fig. 6 are only intended to show how material features and tag information are obtained in this solution; specific material features and tag information are not exhaustively listed here.

This embodiment of the application provides a method for obtaining the tag information of the material to be processed: the different types of data and the text in the material are characterized separately, and the resulting features are aggregated to capture the associations between otherwise scattered features. The global feature therefore contains more effective information, a more accurate overall result can be output from it, and the accuracy of the obtained tag information is improved.

Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the video generation method provided by the embodiments of the present application, processing the material to be processed based on the parameter adjustment strategy and the target parameter to generate the target video specifically includes:

acquiring the RGB parameters of each pixel in each video frame of the multimedia data;

In this embodiment, video data is in practice formed by splicing multiple groups of video segments. The splicing may be direct splicing (a hard cut) or splicing through a transition (a soft cut). To improve processing accuracy, this solution divides the video data into multiple groups of video segments, which requires video segmentation; specifically, shot boundary detection is performed to find the boundaries at which the segments were spliced, that is, the picture switching frames. Therefore, the video generation device acquires the RGB parameters of each pixel in each video frame of the multimedia data, and also acquires the similarity between the video frames in the material to be processed.
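The text does not specify how inter-frame similarity is computed. A minimal sketch, assuming similarity is the histogram intersection of normalized per-channel RGB histograms of consecutive frames (a common shot-boundary cue, not necessarily the one used here). Frames are assumed to be `uint8` arrays of shape (T, H, W, 3), and the bin count is an arbitrary choice.

```python
import numpy as np

def rgb_histogram(frame: np.ndarray, bins: int = 8) -> np.ndarray:
    """Normalized, concatenated per-channel histogram of an (H, W, 3) uint8 frame."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    h = np.concatenate(hists).astype(np.float64)
    return h / h.sum()

def adjacent_similarity(frames: np.ndarray, bins: int = 8) -> np.ndarray:
    """Histogram-intersection similarity between each frame and the next.
    Returns T - 1 values in [0, 1]; low values hint at a picture switch."""
    hists = [rgb_histogram(f, bins) for f in frames]
    return np.array([np.minimum(a, b).sum() for a, b in zip(hists, hists[1:])])

dark = np.zeros((4, 4, 3), dtype=np.uint8)
bright = np.full((4, 4, 3), 255, dtype=np.uint8)
sims = adjacent_similarity(np.stack([dark, dark, bright, bright]))
```

Here `sims` is close to `[1, 0, 1]`: the drop at the second position marks the cut between the dark and bright shots.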

On this basis, the video generation device uses the multimedia data, the RGB parameters of each pixel in each video frame, and the similarity between the video frames in the material to be processed as the input of a probability output model, and the probability output model outputs the probability that each video frame is a picture switching frame. Because the per-pixel RGB parameters and the inter-frame similarity indicate the relationship between video frames more accurately, the accuracy of the subsequent frame-by-frame prediction is improved. In addition, the probability output model in this solution is specifically a TransNet V2 model. TransNet V2 adds batch normalization and a residual network structure, and noise is added while it is trained. The material to be processed passes through a plurality of residual dilated deep convolutional neural network (Res-DDCNN) cells, so that the model learns both image (spatial) features and temporal features, which further improves the accuracy of frame-by-frame prediction.

Furthermore, when the method of this solution is applied to an advertising scenario, the probability output model can be adapted to advertising videos during training: a training sample set is built by synthesizing videos that contain advertising characteristics, and the probability output model to be trained is trained on this sample set to obtain the model used in this solution. Videos with advertising characteristics include, but are not limited to, videos with padding and videos whose pictures are joined by synthesized transitions; the padding includes, but is not limited to, Gaussian blur padding, picture padding, and color padding. In addition, because feeding the probability output model one frame at a time lowers its operating efficiency, this solution optimizes frame extraction with a parallel frame extraction strategy: a plurality of video frames are grouped and used together as the input of the probability output model, which reduces the running time of the model and improves the efficiency of obtaining the probabilities.
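The parallel frame extraction strategy is described only as "a plurality of video frames as a group". A minimal sketch of one way to do this, assuming fixed-size overlapping windows of frame indices that can be stacked into a single batch. The window length and stride are hypothetical values, not taken from the source.

```python
import numpy as np

def group_frames(num_frames: int, window: int = 100, stride: int = 50) -> np.ndarray:
    """Return an (N, window) array of frame indices covering every frame.

    Windows overlap by (window - stride) frames so each frame has temporal
    context; the tail is padded by repeating the last index so all groups
    share one length and can be batched together."""
    if num_frames < 1:
        raise ValueError("need at least one frame")
    starts = list(range(0, max(num_frames - window, 0) + 1, stride))
    if starts[-1] + window < num_frames:  # one extra window reaches the end
        starts.append(starts[-1] + stride)
    groups = [np.minimum(np.arange(s, s + window), num_frames - 1) for s in starts]
    return np.stack(groups)

batch_indices = group_frames(260)  # 5 windows of 100 indices each
```

Each row of `batch_indices` can then be used to gather frames, so the model processes several groups per call instead of one frame at a time.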

For ease of understanding, refer to fig. 7, which is a schematic diagram of an embodiment of obtaining the probability of a picture switching frame according to an embodiment of the present application. As shown in fig. 7, C1 denotes the material to be processed, C2 the RGB parameters of each pixel in each video frame, C3 the similarity between the video frames in the material to be processed, C4 the probability output model, and C5 the probability that each video frame is a picture switching frame. The probability output model C4 includes a plurality of Res-DDCNN cells and a fully connected layer. The material to be processed C1 is used as the input of the probability output model C4, while the RGB parameters C2 and the inter-frame similarity C3 are used as inputs of the fully connected layer in C4, so that C4 outputs, for each video frame, the probability C5 that it is a picture switching frame. It should be understood that the example of fig. 7 is only intended to aid understanding; the probability of a picture switching frame can also be obtained with other model structures, which are not exhaustively listed here.
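Fig. 7 feeds the RGB parameters and the inter-frame similarity into the fully connected layer alongside the Res-DDCNN output. A minimal sketch of that fusion step only, assuming per-frame mean color as the RGB feature and a single linear layer with a sigmoid. The convolutional features, similarity features, and weights below are random stand-ins, and all names are hypothetical.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def switch_probabilities(conv_feats, rgb_feats, sim_feats, w, b):
    """Fuse per-frame features and map them to picture-switch probabilities.

    conv_feats: (T, D) outputs of the Res-DDCNN stack (stand-in here)
    rgb_feats:  (T, 3) per-frame RGB statistics (mean color, assumed)
    sim_feats:  (T, S) per-frame similarity features
    w: (D + 3 + S,) and b: scalar, weights of one fully connected layer."""
    fused = np.concatenate([conv_feats, rgb_feats, sim_feats], axis=1)
    return sigmoid(fused @ w + b)

rng = np.random.default_rng(0)
T, D, S = 6, 8, 2
frames = rng.integers(0, 256, size=(T, 4, 4, 3), dtype=np.uint8)
conv_feats = rng.normal(size=(T, D))                          # stand-in
rgb_feats = frames.reshape(T, -1, 3).mean(axis=1) / 255.0     # mean color per frame
sim_feats = rng.uniform(size=(T, S))                          # stand-in
w = rng.normal(scale=0.1, size=D + 3 + S)
probs = switch_probabilities(conv_feats, rgb_feats, sim_feats, w, 0.0)
```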

Further, the video generation device divides the multimedia data into a plurality of groups of video clips based on the probability that each video frame is a picture switching frame, and then processes these groups of video clips based on the parameter adjustment strategy and the target parameter to generate the target video. Specifically, the video generation device takes every probability that exceeds a preset probability threshold as a target probability, treats the video frames corresponding to the target probabilities as picture switching frames, and divides the video frames at these picture switching frames to obtain a plurality of groups of video clips.

Illustratively, suppose the material to be processed includes 10 video frames, video frame 1 to video frame 10, and the preset probability threshold is 75%. If the foregoing steps give a probability of 80% that video frame 4 is a picture switching frame, 85% for video frame 7, and between 10% and 20% for each of the remaining frames, then video frames 4 and 7 are used as picture switching frames and frames 1 to 10 are divided accordingly: video frames 1 to 4 form one group of video clips, video frames 5 to 7 form a second group, and video frames 8 to 10 form a third group.
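The thresholding and division in this example can be reproduced with a short sketch. It assumes, as the example implies, that a picture switching frame closes the clip it belongs to and that frames are numbered from 1; the function name is hypothetical.

```python
def split_by_switch_frames(probs, threshold=0.75):
    """Split frames 1..N into clips. A frame whose switch probability is
    strictly greater than the threshold ends the current clip."""
    clips, start = [], 1
    for idx, p in enumerate(probs, start=1):
        if p > threshold:
            clips.append((start, idx))
            start = idx + 1
    if start <= len(probs):  # remaining frames after the last switch frame
        clips.append((start, len(probs)))
    return clips

# Probabilities for frames 1..10 from the example above.
probs = [0.10, 0.15, 0.20, 0.80, 0.10, 0.20, 0.85, 0.10, 0.15, 0.20]
print(split_by_switch_frames(probs))  # [(1, 4), (5, 7), (8, 10)]
```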

This embodiment of the application provides another video generation method for the case where the multimedia data is video data. Because video data generally consists of many video frames, the video data is divided into a plurality of groups of video segments based on the RGB parameters of each pixel in each video frame and the similarity between the video frames, and each group is then processed based on the parameter adjustment strategy with the target parameter as the goal. In this way the content of the video data over a continuous time sequence is understood before duration and size conversion is performed, so the relationships between the video frames over that sequence are handled more accurately. The resulting target video therefore meets the requirement of the target parameter while describing the feature information in the video frames more accurately, which improves the reliability and information integrity of the target video.

Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the method for generating a video according to the embodiment of the present application, the video duration of the multimedia data is less than the target video duration;

processing the multiple groups of video clips based on the parameter adjustment strategy and the target parameters to generate a target video, specifically comprising:

determining an industry template set based on industry tag information;

In this embodiment, the video duration of the multimedia data may be either less than or greater than the target video duration. This embodiment first describes how the target video is generated when the video duration of the multimedia data is less than the target video duration. In this case, the video generation device determines an industry template set based on the industry tag information and adds industry templates from the set to the groups of video clips to obtain a plurality of groups of first video clips, such that the sum of the video durations of the first video clips equals the target video duration of the video to be synthesized. It should be understood that a template need not be added to every group of video clips; whether and where one is added depends on the specific video durations and the specific industry. The industry template set may include one or more industry templates, determined from the templates preset for the specific industry. Further, the video generation device combines the groups of first video clips to generate the target video, which can make the opening of the target video more attractive or improve its subsequent conversion.

Specifically, the industry templates in the industry template set here are head-and-tail frame industry templates, that is, a head frame industry template added before the head frame (first frame) of a video clip and a tail frame industry template added after the tail frame (last frame) of a video clip. Different industries yield different head frame industry templates: for example, the template corresponding to education industry tag information is an education session template, the template corresponding to game industry tag information is a decompression (stress-relief) template, and the template corresponding to e-commerce industry tag information is a call-to-action template (for example, a click-to-download template or a click-to-jump template). It should be understood that in practical applications an industry template may also be inserted among the video frames of a group of video clips, so the above should not be construed as a limitation of the present application. Because different ways of adding the industry templates yield different groups of first video clips, the groups of first video clips in this solution are in fact a collection of alternative sets, and the resulting target video may therefore comprise a plurality of videos.

Illustratively, suppose the video duration of the multimedia data is 10 s, the target video duration is 15 s, e-commerce industry tag information is obtained in the foregoing manner, and the material to be processed is divided into video clip 1 (6 s) and video clip 2 (4 s). If the click-to-download template lasts 3 s and the click-to-jump template lasts 2 s, the click-to-download template can be added before the head frame of video clip 1 to obtain first video clip 1 with a duration of 9 s, and the click-to-jump template can be added after the tail frame of video clip 2 to obtain first video clip 2 with a duration of 6 s. First video clips 1 and 2 are then combined, so the resulting target video lasts 15 s (9 s + 6 s) and meets the target parameter.
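The duration arithmetic in this example can be sketched as a small search over the available templates. This is a minimal sketch, assuming durations are whole seconds (so exact equality is safe), that "head" templates go before the first clip and "tail" templates after the last clip, and that the first combination that exactly fills the gap is taken. Template names and the selection rule are hypothetical; the source does not describe how templates are chosen.

```python
from itertools import combinations

def pick_templates(gap, templates):
    """Return the first combination of templates whose durations sum to gap.
    templates: list of (name, duration_s, position), position 'head' or 'tail'."""
    for k in range(len(templates) + 1):  # k = 0 covers a zero gap
        for combo in combinations(templates, k):
            if sum(d for _, d, _ in combo) == gap:
                return list(combo)
    return None

def pad_clips(clips, target, templates):
    """clips: clip durations in playback order. Returns the first-clip
    durations after adding templates, or None if no combination fits."""
    chosen = pick_templates(target - sum(clips), templates)
    if chosen is None:
        return None
    padded = list(clips)
    for _, dur, pos in chosen:
        if pos == "head":
            padded[0] += dur   # before the head frame of the first clip
        else:
            padded[-1] += dur  # after the tail frame of the last clip
    return padded

templates = [("click_download", 3, "head"), ("click_jump", 2, "tail")]
print(pad_clips([6, 4], 15, templates))  # [9, 6]
```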

This embodiment of the application provides another video generation method: when the video duration of the multimedia data is less than the target video duration, an industry template set is determined from the industry tag information, and at least one industry template in the set is added to at least one group of video clips, yielding a plurality of first video clips whose durations sum to the target video duration of the video to be synthesized. Synthesizing the video with the industry template set lets the video duration meet the target parameter, while the choice of where a template is added (before the head frame, after the tail frame, or among the video frames of a clip) improves the flexibility of video generation.

Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the method for generating a video provided in the embodiment of the present application, the tag information further includes feature tag information indicating a feature of the material to be processed;

determining scores of a first group of video clips and scores of a second group of video clips based on industry tag information and feature tag information, wherein the scores of the first group of video clips are greater than the scores of the second group of video clips;

In this embodiment, the tag information further includes feature tag information indicating features of the material to be processed, the video duration of the multimedia data is greater than the target video duration, and the groups of video segments include a first group of video segments and a second group of video segments. The feature tag information indicates the material features of the material to be processed and specifically includes a subject tag, a color tag, a benefit-point tag, an interface tag, and the like. For example, if the material to be processed is a game interface, the obtained feature tag information may include a game character tag (a subject tag) and a game interface tag, from which it can be known that the subject of the material is a game character and that the material comes from a game interface.

Based on the industry tag information and the feature tag information, the video generation device determines the scores of the first group and of the second group of video segments. The score of a group of video segments indicates what proportion of the material features of the material to be processed the group contains, and here the score of the first group is greater than that of the second group. For example, if the first group of video clips contains video frames composed of game characters, a game background, and other game features, while the second group contains only video frames composed of a game background, then the first group reflects a larger proportion of the material features and the second group a smaller one, so the score of the first group is greater than that of the second group.

Further, the video generation device determines a video clip ratio from the scores of the first and second groups of video segments. For example, if the first group scores 90 and the second group scores 60, the video clip ratio is 3:2 (90:60 = 3:2). Based on this ratio, the device clips the first and second groups of video segments to obtain their respective clipping results, such that the sum of the video durations of the two clipping results equals the target video duration of the video to be synthesized. For example, with a video clip ratio of 3:2, a 35 s material to be processed, a 20 s target video, a 20 s first group of video segments, and a 15 s second group, the clipping result of the first group is expected to last about 12 s (20 × 3/5) and that of the second group about 8 s (20 × 2/5).
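The proportional allocation in this example is a one-line formula: each group receives the target duration times its share of the total score. A minimal sketch, which also checks that no group is asked for more footage than it has; the source does not say what happens in that case, so raising an error is an assumption.

```python
def clip_durations(scores, segment_durations, target):
    """Split the target duration across groups in proportion to their scores."""
    total = sum(scores)
    planned = [target * s / total for s in scores]
    for p, available in zip(planned, segment_durations):
        if p > available:  # assumed handling: the source leaves this case open
            raise ValueError("planned clip is longer than its source segment")
    return planned

# Scores 90 and 60, segments of 20 s and 15 s, target video of 20 s.
print(clip_durations([90, 60], [20, 15], 20))  # [12.0, 8.0]
```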

The clipping of a video segment proceeds as follows. First, subject detection is performed on the video segment: if the material to be processed is a game interface, the game characters in the video segment are detected to determine the proportion of the segment in which they appear. Second, in an e-commerce scenario, e-commerce goods such as clothes, trousers, and shoes are identified, together with the proportion of the segment in which they appear. Third, to keep the picture continuous and free of frame skipping after clipping, the colors of the video segments are identified so that the color transitions are natural. On this basis, the preset video duration of the clipping result of each video segment is determined from the video clip ratio and the target video duration, and each video segment is clipped in this manner to obtain the clipping results of the first and second groups of video segments.
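One simple way to use the subject-detection results when clipping a segment to its preset duration is to keep the contiguous window in which the subject appears most often. The sketch below assumes per-frame boolean detection flags from some detector (not shown) and uses a cumulative sum to score every window. It ignores the color-continuity check described above, and the function name is hypothetical.

```python
import numpy as np

def best_window(subject_flags, fps, duration_s):
    """Return (start, end_exclusive) frame indices of the window of
    duration_s seconds that contains the most subject-present frames."""
    flags = np.asarray(subject_flags, dtype=np.int64)
    n = int(round(fps * duration_s))
    if n < 1 or n > len(flags):
        raise ValueError("planned clip length must be within the segment")
    csum = np.concatenate([[0], np.cumsum(flags)])
    counts = csum[n:] - csum[:-n]  # counts[i] = number of flags in [i, i + n)
    start = int(np.argmax(counts))
    return start, start + n

# 1 = game character detected in that frame (illustrative values).
print(best_window([0, 0, 1, 1, 1, 0, 1, 0], fps=1, duration_s=3))  # (2, 5)
```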

Finally, the clipping results of the first and second groups of video segments are combined to generate the target video. Because the sum of their video durations equals the target video duration of the video to be synthesized, the video duration of the resulting target video also equals the target video duration. It should be understood that, because clipping can be done in different ways, this solution can obtain several alternative sets of clipping results for the first and second groups of video segments, and the generated target video can therefore comprise a plurality of videos.

For ease of understanding, the clipping process of this solution is described below with reference to fig. 8, which is a schematic diagram of another embodiment of video generation provided by an embodiment of the present application. As shown in fig. 8, D1 denotes the material to be processed, D21 the first group of video segments, and D22 the tag information of D21; because D21 contains a product subject, its tag information D22 is specifically "background matting, subject position". D31 denotes the second group of video segments and D32 its tag information; because D31 contains only text information and no product subject, its tag information D32 is specifically "promotion page, no subject".

Further, D4 denotes the score of the first group of video segments and D5 the score of the second group. A score indicates the proportion of the material features of the material to be processed that a group contains, and the tag information D22 and D32 shows that D21 contains the product subject while D31 does not, so the score D4 should be greater than D5. D6 denotes the video clip ratio, which is determined from the scores D4 and D5. The groups D21 and D31 are clipped according to D6 to obtain the clipping results of the first and second groups, and these results are then combined as described in the foregoing embodiment to generate a target video that meets the target parameter.

This embodiment of the application provides another video generation method for the case where the video duration of the multimedia data is greater than the target video duration. Each group of video segments is scored, the score indicating the proportion of the material features of the material to be processed reflected by that group, and a clipping ratio is obtained from the scores. Clipping according to this ratio favors the more representative material, and clipping with different emphases yields different sets of clipping results, so a target video comprising a plurality of videos can be generated, which improves the flexibility of video generation.

Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the method for generating a video provided in the embodiment of the present application, the multimedia data is picture data;

performing characterization processing on the multimedia data to obtain material features, where the material features include picture features, or include picture features and text features;

In this embodiment, the multimedia data is picture data. The video generation device performs characterization processing on the multimedia data to obtain material features: the material features include picture features, or, when the picture data also contains text information, picture features and text features. Specifically, the picture data is characterized by the model described in the foregoing embodiments to obtain the picture features; when text information exists in the picture data, the text information is also characterized by the model described in the foregoing embodiments to obtain the text features. The specific models are similar to those of the foregoing embodiments and are not described again here.

For ease of understanding, refer to fig. 9, which is a schematic diagram of an embodiment of obtaining material features according to an embodiment of the present application. As shown in (a) of fig. 9, the multimedia data H1 is specifically picture data H2 and contains no other information such as text, so characterizing H2 yields the corresponding picture feature H3. As shown in (b) of fig. 9, the multimedia data H4 contains picture data H5 and text information H6, so characterizing H5 and H6 yields the picture feature H7 corresponding to H5 and the text feature H8 corresponding to H6.

Further, since different data and information are characterized into separate features, a CG method is applied to the different features to dynamically adjust the weight of each modality and enhance the effective features; CG captures the associations among the scattered features so that a more accurate overall result is output. On this basis, if the material features include picture features and text features, the two are aggregated to obtain a third global feature, which is used as the input of a multi-label classification model that outputs the tag information of the material to be processed. If the material features include only picture features, no aggregation is needed: the picture features are used directly as the input of the multi-label classification model, which outputs the tag information of the material to be processed.
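The text does not describe the multi-label classification model itself. A minimal sketch of the branching described above, assuming a linear head with an independent sigmoid per label and a 0.5 decision threshold. It also assumes picture and text features share one dimension and uses a plain average as a stand-in for the CG-based aggregation. The label names and weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tag_material(picture_feat, w, b, labels, text_feat=None, threshold=0.5):
    """Multi-label tagging: every label gets its own probability.

    picture_feat, text_feat: (D,) features assumed to share dimension D.
    w: (D, L) and b: (L,) are the weights of a linear multi-label head."""
    if text_feat is None:
        x = picture_feat                       # picture only: no aggregation
    else:
        x = (picture_feat + text_feat) / 2.0   # stand-in for the aggregation
    probs = sigmoid(x @ w + b)                 # independent per-label scores
    return [lab for lab, p in zip(labels, probs) if p > threshold]

w = np.array([[5.0, -5.0, 0.0], [0.0, 0.0, 5.0]])
b = np.zeros(3)
labels = ["education", "game", "interactive_control"]
print(tag_material(np.array([1.0, 0.0]), w, b, labels))
print(tag_material(np.array([1.0, 0.0]), w, b, labels, text_feat=np.array([0.0, 2.0])))
```

Unlike single-label softmax classification, the sigmoids are independent, so one material can receive several tags at once, such as an industry tag and an interactive control tag.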

For ease of understanding, refer to fig. 10, which is a schematic diagram of an embodiment of obtaining tag information of a material to be processed according to an embodiment of the present application. As shown in (a) of fig. 10, after the picture feature I1 is obtained as in the example of fig. 9, it is used as the input of a multi-label classification model I2, and I2 outputs the tag information I3 of the material to be processed. As shown in (b) of fig. 10, after the picture feature I4 and the text feature I5 are obtained as in the example of fig. 9, they are aggregated to obtain a third global feature I6. I6 is then used as the input of a multi-label classification model I7, and I7 outputs the tag information I8 of the material to be processed.

It should be understood that the examples in fig. 9 and fig. 10 are provided only to help explain how the material features and the tag information of the material to be processed are obtained in this solution; the material features and tag information shown are not exhaustive.

This embodiment of the application provides another method for obtaining the tag information of the material to be processed: the picture data and the text information in the material are characterized, and the resulting features are aggregated to capture the associations between otherwise scattered features. The global feature therefore contains more effective information, a more accurate overall result can be output from it, and the accuracy of the obtained tag information is improved.

determining an industry template set based on industry tag information;

In this embodiment, the video generation device determines an industry template set based on the industry tag information and processes the material to be processed according to the industry template set and the target parameter to obtain the target video. Because the picture data may be a single picture or a plurality of pictures, the processing differs between the two cases. For a single picture, the selected industry template may be a video template; the selected template and the single picture are combined by clipping, with the target video duration as the goal, and once the target video duration is met the target video is generated. For a plurality of pictures, other ways of applying industry templates exist. For example, at least one piece of music is selected according to the music style corresponding to the pictures, the drum beat points of the music are determined, and the pictures are displayed in time with those beat points. The video generation methods for a single picture and for a plurality of pictures are described below.
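Displaying several pictures "based on the music drum points" can be read as cutting from one picture to the next at each beat. A minimal sketch of such a schedule, assuming the beat timestamps have already been produced by some beat tracker (not shown) and that pictures repeat in order if there are more beat intervals than pictures. The function name and cycling rule are hypothetical.

```python
def beat_schedule(beats, num_pictures, target_s):
    """Assign pictures to the intervals between consecutive beat points.

    beats: sorted beat timestamps in seconds (assumed from a beat tracker).
    Returns (picture_index, start_s, end_s) tuples covering [0, target_s]."""
    cuts = [0.0] + [t for t in beats if 0.0 < t < target_s] + [float(target_s)]
    return [(i % num_pictures, start, end)
            for i, (start, end) in enumerate(zip(cuts, cuts[1:]))]

# Three pictures, beats every second, 5 s target video.
for item in beat_schedule([1.0, 2.0, 3.0, 4.0], num_pictures=3, target_s=5.0):
    print(item)
```

Each picture change then coincides with a beat, and the last interval is cut at the target duration so the video length matches the target parameter.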

Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the method for generating a video according to the embodiment of the present application, the picture data is a single picture;

according to the industry template set and the target parameters, processing the material to be processed to obtain a target video, which specifically comprises the following steps:

determining an interactive control from the material to be processed based on the interactive control label;

This embodiment first describes the case where the picture data is a single picture. The tag information further includes an interactive control tag, which indicates that an interactive control exists in the material to be processed; in specific applications the interactive control may be a click button or another kind of button, and its specific form is not limited here. If the obtained tag information includes an interactive control tag, the video generation device determines the interactive control from the material to be processed. Because interactive controls take different forms in different pictures, and picture backgrounds are complex and diverse, the interactive control must be located with pixel-level accuracy. To ensure this accuracy, this embodiment identifies the interactive control by semantic segmentation, training an interactive control segmentation model based on an image network. The model uses a multi-scale context-aware feature extraction module (MCFEM), which extracts multi-scale context-aware features of the picture, and adds a gating mechanism (GBMP) to remove picture noise. In addition, foreground and background weighting is introduced in the embodiments of the present application to address missed or incomplete identification, and a contour loss is introduced to address inaccurate edges of the interactive controls.
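The text names foreground/background weighting and a contour loss without defining them. The following is a minimal sketch of one common interpretation, not the patent's formulation. It uses binary cross-entropy in which control (foreground) pixels are up-weighted, plus an extra penalty on the mask's contour pixels. The contour pixels are found as foreground pixels with a background 4-neighbor. The weights `fg_weight` and `contour_weight` are hypothetical.

```python
import numpy as np

def boundary_map(mask: np.ndarray) -> np.ndarray:
    """Foreground pixels of a binary mask that touch background (4-neighbourhood)."""
    m = mask.astype(bool)
    p = np.pad(m, 1, mode="edge")
    interior = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                & p[1:-1, :-2] & p[1:-1, 2:])
    return m & ~interior

def segmentation_loss(pred, mask, fg_weight=2.0, contour_weight=4.0, eps=1e-7):
    """Foreground-weighted BCE plus an extra BCE term on contour pixels.
    pred: (H, W) predicted control probabilities; mask: (H, W) 0/1 labels."""
    p = np.clip(pred, eps, 1.0 - eps)
    y = mask.astype(np.float64)
    bce = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    weights = np.where(y > 0, fg_weight, 1.0)  # up-weight control pixels
    contour = boundary_map(mask)
    edge = bce[contour].mean() if contour.any() else 0.0
    return (weights * bce).mean() + contour_weight * edge

mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:4, 1:4] = 1  # a 3x3 "button"
perfect = segmentation_loss(mask.astype(float), mask)
unsure = segmentation_loss(np.full((5, 5), 0.5), mask)
```

A perfect prediction gives a loss near zero, while an uncertain prediction is penalized both overall and, more heavily, along the control's edge.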

Next, the video generation apparatus performs enhancement processing on the interactive control to obtain a first material, which contains the enhanced interactive control. The enhancement processing may, for example, zoom the interactive control in and out, or bold and highlight it. Finally, the first material is processed according to the industry template set and the target parameters to obtain the target video.

For example, fig. 11 is an interface schematic diagram of the enhancement processing provided in an embodiment of the present application. In fig. 11, E1 denotes an interactive control, E2 denotes the interactive control after zooming in and out, and E3 denotes the interactive control after bolding and highlighting. Panel (a) of fig. 11 shows picture data (a single picture) containing the interactive control E1. Zooming E1 in and out yields the first material shown in panel (b), which contains the zoomed interactive control E2. Bolding and highlighting E1 yields the first material shown in panel (c), which contains the bolded and highlighted interactive control E3.
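The zoom-in/zoom-out enhancement can be pictured as an animation of the control's bounding box. Below is a minimal sketch, assuming the segmentation step has already produced a bounding box for the control and that the zoom is a sinusoidal pulse about the box centre. The names `pulse_scale`, `scale_box` and `zoom_keyframes`, the 25 fps frame rate and the 15 % amplitude are hypothetical choices for illustration only.

```python
import math
from typing import List, Tuple

# Hypothetical sketch of a "zoom in and out" enhancement; parameters are assumptions.
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def pulse_scale(t: float, period: float = 1.0, amplitude: float = 0.15) -> float:
    """Scale factor at time t (seconds), oscillating between 1 - amplitude and 1 + amplitude."""
    return 1.0 + amplitude * math.sin(2.0 * math.pi * t / period)


def scale_box(box: Box, s: float) -> Box:
    """Scale a control's bounding box by s about its centre, so it zooms in place."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    hw, hh = (x1 - x0) * s / 2.0, (y1 - y0) * s / 2.0
    return (cx - hw, cy - hh, cx + hw, cy + hh)


def zoom_keyframes(box: Box, duration: float, fps: int = 25,
                   period: float = 1.0, amplitude: float = 0.15) -> List[Box]:
    """One bounding box per output frame for the zoom-in/zoom-out animation."""
    n = int(round(duration * fps))
    return [scale_box(box, pulse_scale(k / fps, period, amplitude)) for k in range(n)]
```

Because the box is always scaled about its centre, the control stays anchored where it was detected while it pulses. Bolding and highlighting could be handled analogously by modifying pixels inside the same box.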

This embodiment of the application provides another video generation method. When the picture data is a single picture, the interactive control is enhanced so that it stands out in the picture. The generated video therefore highlights the interactive control, and users can perform interactive operations through the video, which improves the practicability and interactivity of the generated video.

Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the method for generating a video according to the embodiment of the present application, the picture data consists of a plurality of pictures;

the target parameters further include a music style;

determining target music based on the music style;

This embodiment describes the case where the picture data consists of a plurality of pictures and the target parameters further include a music style. The video generation apparatus determines target music based on the music style, determines a plurality of drum points (beat positions) in the target music, and determines the display duration of each picture in the material to be processed from these drum points. It then processes the material to be processed according to the industry template set, the target parameters and the display duration of each picture to obtain the target video. Specifically, each picture is displayed together with an industry template from the industry template set for its display duration, so that the resulting target video meets the target video duration in the target parameters.

For example, suppose the target music is music A, the picture data comprises pictures A, B, C and D, and the target video duration is 15 s. If the drum points of music A lie at 5 s, 8 s, 12 s and 18 s, only the drum points at 5 s, 8 s and 12 s are needed, because only 4 pictures are used and the target video lasts 15 s. The display time of picture A is therefore 0 s to 5 s, picture B 5 s to 8 s, picture C 8 s to 12 s, and picture D 12 s to 15 s. Picture A is displayed together with an industry template until 5 s, then the video switches to picture B with an industry template, and so on, and the result is composited into a target video lasting 15 s.
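The example above can be reproduced with a short function that turns drum points into per-picture display intervals. This is a minimal sketch, assuming the first `num_pictures - 1` drum points inside the target duration are used as cut points and the last picture runs until the target duration. The function name is hypothetical. Raising an error when there are too few drum points is also an assumed policy, since the patent does not cover that case.

```python
from typing import List, Sequence, Tuple


def picture_display_spans(drum_points: Sequence[float], num_pictures: int,
                          target_duration: float) -> List[Tuple[float, float]]:
    """Hypothetical helper: (start, end) display interval for each picture, cut on drum points."""
    if num_pictures < 1:
        raise ValueError("need at least one picture")
    # Only drum points strictly inside the target video can serve as cuts.
    usable = sorted(t for t in drum_points if 0 < t < target_duration)
    if len(usable) < num_pictures - 1:
        raise ValueError("not enough drum points for the requested number of pictures")
    cuts = usable[:num_pictures - 1]
    bounds = [0.0] + cuts + [target_duration]
    return list(zip(bounds[:-1], bounds[1:]))
```

With drum points `[5, 8, 12, 18]`, 4 pictures and a 15 s target, this yields the intervals (0, 5), (5, 8), (8, 12) and (12, 15), matching the example. The 18 s drum point lies beyond the target duration and is ignored.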

It should be understood that the target music is any music that matches the music style, so it may comprise several pieces of music. The drum-point positions of each piece are therefore determined separately, and the display duration of each picture is derived from them. Different pieces of music yield different display durations and therefore different target videos.

This embodiment of the application provides another video generation method. Because the picture data consists of a plurality of pictures and the target parameters further include a music style, the music selected according to that style meets the user's requirements. The display duration of each picture is then calculated from the drum-point positions of that music, so the pictures change in rhythm with it, which improves the flexibility and interest of video generation.

Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the method for generating a video provided in the embodiment of the present application, the label information further includes an interactive control label, and the interactive control label indicates that an interactive control exists in the material to be processed;

determining at least one interactive control from the material to be processed based on the interactive control label;

In this embodiment, the label information further includes an interactive control label, which indicates that an interactive control exists in the material to be processed. In a specific application, the interactive control may be a clickable button or a similar element; its specific form is not limited here. If the obtained label information includes an interactive control label, that is, an interactive control exists in the material to be processed, the video generation apparatus needs to locate the interactive control in the material to be processed. Because interactive controls take different forms in different pictures and backgrounds are complex and varied, the interactive control is located in the material to be processed using the method described in the foregoing embodiment, which is not repeated here.

Next, the video generation apparatus performs enhancement processing on the interactive control to obtain a second material, which contains the enhanced interactive control. The enhancement processing may, for example, zoom the interactive control in and out, or bold and highlight it. Finally, the second material is processed according to the industry template set and the target parameters to obtain the target video. The second material is similar to the first material described in the foregoing embodiment; for its interface display, refer again to fig. 11. Details are not repeated here.

It can be understood that the foregoing describes separately the case where the multimedia data is video data and the case where it is picture data. In practical applications, the material to be processed may include both video data and picture data.

This embodiment of the application provides another video generation method. When the multimedia data is video data, the interactive control is enhanced so that it stands out in the picture. The generated video therefore highlights the interactive control, and users can perform interactive operations through the video, which improves the practicability and interactivity of the generated video.

Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the method for generating a video according to an embodiment of the present application, the target parameters further include at least one of a sticker element, a subtitle element or a graphic mark element;

processing the material to be processed based on the parameter adjustment strategy and the target parameter to generate a first video;

In this embodiment, the target parameters further include at least one of a sticker element, a subtitle element or a graphic mark element. The video generation apparatus first processes the material to be processed based on the parameter adjustment strategy and the target parameters to generate a first video. If the target parameters include the target video duration and the target video size, the first video satisfies both. The first video is generated in the same way as the target video described in the foregoing embodiments, so details are not repeated here.

The video generation apparatus then determines preset positions for the sticker element, the subtitle element and the graphic mark element in the first video. These elements are added on two principles. First, they must not occlude the main elements of the first video, such as people, products and text. Second, they must match the overall color tone of the first video. The video generation apparatus therefore identifies the positions of people, products and text in the first video and, from these, determines the preset positions at which the elements can be placed, completing the position estimation for the sticker, subtitle and graphic mark elements. The video generation apparatus may also identify the dominant hue of the first video and choose suitable styles for the sticker, subtitle and graphic mark elements based on it. For example, fig. 12 is an interface schematic diagram of a sticker element, a subtitle element and a graphic mark element provided in an embodiment of the present application. Panel (a) of fig. 12 shows a style of the sticker element, panel (b) shows the color and style of the subtitle element, and panel (c) shows a style of the graphic mark element.
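Both steps, choosing a position that avoids the main elements and reading the dominant hue, can be sketched in a few lines. This is a minimal sketch, assuming the detector has already returned bounding boxes for people, products and text. It also assumes candidate positions are limited to the four frame corners and that hue is summarized as a 12-bin histogram over saturated, reasonably bright pixels. The corner order, margin, bin count and saturation/value cut-offs are hypothetical choices, not taken from the patent.

```python
import colorsys
from typing import Iterable, List, Optional, Sequence, Tuple

# Hypothetical sketch; corner candidates and thresholds are assumptions.
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_element(frame_w: int, frame_h: int, elem_w: int, elem_h: int,
                  occupied: Sequence[Box], margin: int = 10) -> Optional[Box]:
    """Try top-left, top-right, bottom-left, bottom-right in that order and return
    the first box that covers no detected main element (None if all are blocked)."""
    xs = (margin, frame_w - elem_w - margin)
    ys = (margin, frame_h - elem_h - margin)
    for y in ys:
        for x in xs:
            box = (x, y, x + elem_w, y + elem_h)
            if not any(overlaps(box, o) for o in occupied):
                return box
    return None


def dominant_hue_bin(pixels: Iterable[Tuple[int, int, int]], bins: int = 12) -> Optional[int]:
    """Index of the most common hue bin among sufficiently saturated, bright pixels."""
    counts: List[int] = [0] * bins
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < 0.2 or v < 0.2:
            continue  # greys and near-black pixels carry little hue information
        counts[min(int(h * bins), bins - 1)] += 1
    if not any(counts):
        return None
    return max(range(bins), key=counts.__getitem__)
```

For a 1080 x 1920 frame in which a person occupies the top-left quadrant, a 200 x 100 sticker is pushed to the top-right corner. The dominant hue bin could then be used to look up a matching element style.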

Accordingly, at least one of the sticker element, the subtitle element and the graphic mark element is placed at the preset position in the first video to generate the target video. For example, fig. 13 is a schematic diagram of an interface for generating a target video based on a sticker element, a subtitle element and a graphic mark element according to an embodiment of the present application. In fig. 13, F1 denotes the sticker element, F2 the subtitle element and F3 the graphic mark element. The target video contains the sticker element F1, the subtitle element F2 and the graphic mark element F3, and none of them occludes the people in the target video.

Further, in practical applications, text-to-speech (TTS) technology can also be used to convert the subtitle element into a speech element that is played in sync with the subtitle element. The detailed steps are not described here.

This embodiment of the application provides another video generation method, in which more elements are added to a generated video that already meets the target video duration and the target video size. The added elements do not occlude important content in the video and do not change its duration or size. This makes the target video more engaging and improves the flexibility of video generation.

Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the method for generating a video provided in the embodiment of the present application,

the target parameters also comprise music style and special effect elements;

placing at least one of the sticker element, the subtitle element and the graphic mark element at the preset position in the first video to generate the target video specifically comprises:

placing at least one of a sticker element, a subtitle element and a graphic mark element at a preset position in a first video to generate a second video;

determining target music based on the music style;

In this embodiment, the target parameters further include a music style and a special effect element. The video generation apparatus places at least one of the sticker element, the subtitle element and the graphic mark element at the preset position in the first video, as described in the foregoing embodiment, to generate a second video. It then determines target music based on the music style and, as in the previous embodiment, determines a plurality of drum points in the target music. Based on these drum points and the second video, it determines the special effect transition positions in the second video and adds the special effect element at those positions to generate the target video.

For example, suppose the target music is music A and the target video must be shorter than 15 s. The second video lasts 13 s (less than 15 s), and the drum points of music A lie at 5 s, 8 s, 12 s and 18 s. The special effect transition positions in the second video are then determined to be 5 s, 8 s and 12 s, and special effect elements are added at those positions. Because special effect elements are usually short, the resulting target video is still shorter than 15 s. If a special effect element is long, no special effect element is added at 12 s.
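The selection rule in this example can be written as a one-line filter. This is a minimal sketch, assuming a transition position is accepted only if the special effect starting at that drum point finishes within the second video. That assumed rule reproduces both outcomes of the example, with the 12 s position kept for a short effect and dropped for a long one. The function name is hypothetical.

```python
from typing import List, Sequence


def transition_positions(drum_points: Sequence[float], video_duration: float,
                         effect_duration: float) -> List[float]:
    """Hypothetical rule: keep drum points (seconds) at which a special effect of
    `effect_duration` seconds can start and still end inside the second video."""
    return sorted(t for t in drum_points
                  if t > 0 and t + effect_duration <= video_duration)
```

For the 13 s second video and drum points at 5 s, 8 s, 12 s and 18 s, a 0.5 s effect is inserted at 5 s, 8 s and 12 s. A 2 s effect is inserted only at 5 s and 8 s, because starting it at 12 s would run past the end of the video.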

It should be understood that the target music is any music that matches the music style, so it may comprise several pieces of music. The drum-point positions of each piece are therefore determined separately, and the special effect transition positions in the second video are derived from them. Different pieces of music yield different transition positions and therefore different target videos.

This embodiment of the application provides another video generation method in which the special effect transition positions are determined from the drum-point positions of music selected according to the music style, and special effects are inserted at those positions. The special effect transitions in the video therefore follow the rhythm of the music, which improves the flexibility and interest of video generation.

Optionally, on the basis of the embodiment corresponding to fig. 2, in an optional embodiment of the method for generating a video according to the embodiment of the present application, the obtaining the material to be processed and the target parameter specifically includes:

displaying an input interface, wherein the input interface comprises a data input interface and a parameter selection interface, the data input interface is used for inputting materials to be processed, and the parameter selection interface is used for selecting target parameters;

after the material to be processed is processed based on the parameter adjustment strategy and the target parameter to generate the target video, the method for generating the video further comprises the following steps:

and displaying the target video on the video display interface.

This embodiment is described with the terminal device as the execution subject, so the video generation apparatus is a terminal device capable of displaying the relevant interfaces. The video generation apparatus displays an input interface that includes a data input interface, used for inputting the material to be processed, and a parameter selection interface, used for selecting the target parameters. The user selects the material to be processed through a data selection operation on the data input interface, and the video generation apparatus obtains the material to be processed in response to that operation. Similarly, the user selects the target parameters through a parameter selection operation on the parameter selection interface, and the video generation apparatus obtains the target parameters in response to that operation. After the target video has been generated as in the foregoing embodiments, the video generation apparatus may also display it on the video display interface.

It should be understood that if the server is the execution subject instead, the terminal device obtains the material to be processed and the target parameters in the foregoing manner and sends them to the server. The server generates the target video from the received material and target parameters and returns it to the terminal device, which then displays the target video on the video display interface.
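The patent does not define the format in which the terminal device sends the material and target parameters to the server. As a minimal sketch, assuming a JSON body that carries material identifiers plus the target parameters named in this description, the exchange could look like the code below. The field names, the `TargetParameters` dataclass and both helper functions are hypothetical, and no real service API is implied.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class TargetParameters:
    """Hypothetical container for the target parameters described in the patent."""
    target_duration_s: float                 # target video duration
    target_size: Tuple[int, int]             # target video size (width, height)
    music_style: Optional[str] = None
    sticker_elements: List[str] = field(default_factory=list)
    subtitle_elements: List[str] = field(default_factory=list)
    graphic_mark_elements: List[str] = field(default_factory=list)
    special_effect_elements: List[str] = field(default_factory=list)


def build_generation_request(material_ids: Sequence[str], params: TargetParameters) -> str:
    """Terminal side: serialize material references and target parameters as JSON."""
    return json.dumps({"materials": list(material_ids),
                       "target_parameters": asdict(params)}, sort_keys=True)


def parse_generation_request(body: str) -> Tuple[List[str], TargetParameters]:
    """Server side: inverse of build_generation_request (JSON turns tuples into lists)."""
    data = json.loads(body)
    p = data["target_parameters"]
    p["target_size"] = tuple(p["target_size"])
    return data["materials"], TargetParameters(**p)
```

Keeping the parameters in one typed structure makes it easy for the terminal device to resend a modified copy when the user changes a single target parameter and asks for a new target video.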

In addition, if the target video comprises a plurality of videos, the user can select one of them, and the video selected by the user is finally displayed. If the user wants to change any of the target parameters or the material to be processed after the target video has been generated, the user can modify it, and the video generation apparatus obtains a new target video in the manner described above and displays it on the video display interface. The specific operations and display manner can be determined flexibly according to the actual situation and are not limited here.

For example, fig. 14 is an interface schematic diagram of displaying an input interface and displaying a target video according to an embodiment of the present application. Panel (a) of fig. 14 shows the input interface G1, which includes a data input interface G2, a parameter selection interface G3 and a music selection interface. Panel (b) shows the target video displayed on the video display interface G4, where the target video includes videos G51 to G53. This example is only intended to aid understanding of the solution; in practical applications, the input interface may include other target parameter selection interfaces, such as a music selection interface, a special effect element selection interface, a sticker element selection interface, a subtitle element selection interface or a graphic mark element selection interface.

In this video generation method, the material to be processed and the target parameters are obtained from the user's operations, so the generated target video meets the user's requirements. The target video is then displayed on the display interface so that the user can check whether it meets those requirements, which improves the reliability of video generation.

Referring to fig. 15, fig. 15 is a schematic view of an embodiment of a video generating apparatus according to the present application, and as shown in the drawing, the video generating apparatus 1500 includes:

the acquisition module 1501 is configured to acquire a material to be processed and a target parameter, where the material to be processed includes multimedia data, and the target parameter includes a target video duration of a video to be synthesized;

the acquisition module 1501 is further configured to obtain tag information based on the material to be processed, where the tag information includes industry tag information indicating an industry to which the material to be processed belongs;

a determining module 1502, configured to determine a parameter adjustment policy based on multimedia data in a material to be processed and industry tag information in the tag information, where the parameter adjustment policy includes a video duration adjustment policy for performing video duration processing on the material to be processed;

the processing module 1503 is configured to process the material to be processed based on the parameter adjustment policy and the target parameter to generate a target video, where a video duration of the target video is equal to a target video duration of the video to be synthesized.

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the video generating apparatus 1500 provided in the embodiment of the present application, the target parameter further includes a target video size of the video to be synthesized;

a determining module 1502, specifically configured to determine a video duration adjustment policy and a video size adjustment policy based on the material to be processed and the industry tag information;

the processing module 1503 is specifically configured to process the material to be processed based on the video duration adjustment policy, the video size adjustment policy, the target video duration of the video to be synthesized, and the target video size of the video to be synthesized, so as to generate the target video.

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the video generating apparatus 1500 provided in the embodiment of the present application, the multimedia data is video data;

the acquisition module 1501 is specifically configured to perform characterization processing on the multimedia data to obtain material characteristics, where the material characteristics include video frame characteristics and voice sequence characteristics, or include video frame characteristics, voice sequence characteristics and text characteristics;

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the video generating apparatus 1500 provided in this embodiment of the present application, the processing module 1503 is specifically configured to obtain RGB parameters of each pixel in each video frame of the multimedia data;

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the video generating apparatus 1500 provided in the embodiment of the present application, the video duration of the multimedia data is less than the target video duration;

the processing module 1503 is specifically configured to determine an industry template set based on the industry tag information;

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the video generating apparatus 1500 provided in the embodiment of the present application, the tag information further includes feature tag information indicating a feature of the material to be processed;

the processing module 1503 is specifically configured to determine scores of a first group of video segments and scores of a second group of video segments based on the industry tag information and the feature tag information, where the scores of the first group of video segments are greater than the scores of the second group of video segments;

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the video generating apparatus 1500 provided in the embodiment of the present application, the multimedia data is picture data;

the acquisition module 1501 is specifically configured to perform characterization processing on the multimedia data to obtain material characteristics, where the material characteristics include picture characteristics, or include picture characteristics and text characteristics;

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the video generating apparatus 1500 provided in the embodiment of the present application, the processing module 1503 is specifically configured to determine an industry template set based on industry tag information;

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the video generating apparatus 1500 provided in this embodiment of the application, the picture data is a single picture;

the processing module 1503 is specifically configured to determine an interactive control from the material to be processed based on the interactive control label;

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the video generating apparatus 1500 provided in the embodiment of the present application, the picture data is a plurality of pictures;

the target parameters also include music style;

a processing module 1503, specifically configured to determine target music based on a music style;

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the video generating apparatus 1500 provided in the embodiment of the present application, the label information further includes an interactive control label, and the interactive control label indicates that an interactive control exists in the material to be processed;

the processing module 1503 is specifically configured to determine at least one interactive control from the material to be processed based on the interactive control label;

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the video generating apparatus 1500 provided in this embodiment of the application, the target parameter further includes at least one of a sticker element, a subtitle element or a graphic mark element;

the processing module 1503 is specifically configured to process the material to be processed based on the parameter adjustment policy and the target parameter to generate a first video;

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the video generating apparatus 1500 provided in the embodiment of the present application, the target parameters further include a music style and a special effect element;

the processing module 1503 is specifically configured to place at least one of a sticker element, a subtitle element and a graphic mark element at a preset position in the first video to generate a second video;

determining target music based on the music style;

Optionally, on the basis of the embodiment corresponding to fig. 15, in another embodiment of the video generating apparatus 1500 provided in the embodiment of the present application, the video generating apparatus 1500 further includes a display module 1504;

the acquisition module 1501 is specifically configured to display an input interface, where the input interface includes a data input interface and a parameter selection interface, the data input interface is used to input a material to be processed, and the parameter selection interface is used to select a target parameter;

the display module 1504 is configured to display the target video on the video display interface after the processing module 1503 has processed the material to be processed based on the parameter adjustment policy and the target parameter to generate the target video.

An embodiment of the application also provides another video generation apparatus, which can be deployed in a server or a terminal device. Fig. 16 is a schematic diagram of an embodiment of a server in an embodiment of the present application. As shown in the figure, the server 1000 may vary considerably with configuration or performance. It may include one or more central processing units (CPUs) 1022 (e.g., one or more processors), a memory 1032, and one or more storage media 1030 (e.g., one or more mass storage devices) storing an application program 1042 or data 1044. The memory 1032 and the storage medium 1030 may be transient or persistent storage. The program stored on the storage medium 1030 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 1022 may be configured to communicate with the storage medium 1030 and to execute, on the server 1000, the series of instruction operations stored in the storage medium 1030.

The server 1000 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.

The steps performed by the server in the above embodiment may be based on the server structure shown in fig. 16.

The CPU 1022 of the server is configured to perform the method of the embodiment shown in fig. 2 and of the embodiments corresponding to fig. 2.

The present application further provides a terminal device configured to perform the steps performed by the video generation apparatus in the embodiment shown in fig. 2 and in the embodiments corresponding to fig. 2. As shown in fig. 17, for convenience of explanation, only the parts related to the embodiments of the present application are shown; for specific technical details that are not disclosed, refer to the method part of the embodiments of the present application. A mobile phone is taken as an example of the terminal device:

Fig. 17 is a block diagram of a partial structure of a mobile phone related to the terminal provided in an embodiment of the present application. Referring to fig. 17, the mobile phone includes: radio frequency (RF) circuitry 1110, a memory 1120, an input unit 1130, a display unit 1140, sensors 1150, audio circuitry 1160, a wireless fidelity (WiFi) module 1170, a processor 1180 and a power supply 1190. Those skilled in the art will appreciate that the structure shown in fig. 17 does not limit the mobile phone, which may include more or fewer components than shown, combine some components, or arrange the components differently.

The following describes each component of the mobile phone in detail with reference to fig. 17:

The RF circuitry 1110 may be used to receive and transmit signals during message transmission or a call. In particular, it receives downlink messages from a base station and passes them to the processor 1180 for processing, and it transmits uplink data to the base station. Generally, the RF circuitry 1110 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer and the like. The RF circuitry 1110 may also communicate with networks and other devices via wireless communication, which may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email and Short Messaging Service (SMS).

The memory 1120 may be used to store software programs and modules, and the processor 1180 runs the software programs and modules stored in the memory 1120 to execute the various functional applications and data processing of the mobile phone. The memory 1120 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the mobile phone (such as audio data or a phonebook), and the like. In addition, the memory 1120 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

The input unit 1130 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 1130 may include a touch panel 1131 and other input devices 1132. The touch panel 1131, also referred to as a touch screen, can collect touch operations performed by the user on or near it (for example, operations performed on or near the touch panel 1131 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program. Optionally, the touch panel 1131 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 1180, and can receive and execute commands sent by the processor 1180. In addition, the touch panel 1131 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 1131, the input unit 1130 may include other input devices 1132, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, a joystick, and the like.

The display unit 1140 may be used to display information input by the user or provided to the user, as well as the various menus of the mobile phone. The display unit 1140 may include a display panel 1141, which may optionally be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 1131 may cover the display panel 1141. When the touch panel 1131 detects a touch operation on or near it, it transmits the operation to the processor 1180 to determine the type of the touch event, and the processor 1180 then provides a corresponding visual output on the display panel 1141 according to that type. Although in fig. 17 the touch panel 1131 and the display panel 1141 are two independent components that implement the input and output functions of the mobile phone, in some embodiments they may be integrated to implement those functions.

The mobile phone may further include at least one sensor 1150, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor may adjust the brightness of the display panel 1141 according to the ambient light level, and the proximity sensor may turn off the display panel 1141 and/or its backlight when the mobile phone is moved to the ear. As one type of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally along three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the orientation of the mobile phone (such as switching between landscape and portrait modes, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that may be configured on the mobile phone, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described further here.

The audio circuitry 1160, the speaker 1161, and the microphone 1162 may provide an audio interface between the user and the mobile phone. The audio circuitry 1160 may convert received audio data into an electrical signal and transmit it to the speaker 1161, which converts it into a sound signal for output. Conversely, the microphone 1162 converts collected sound signals into electrical signals, which the audio circuitry 1160 receives and converts into audio data; the audio data is then output to the processor 1180 for processing and afterwards sent, for example, to another mobile phone via the RF circuitry 1110, or output to the memory 1120 for further processing.

WiFi is a short-range wireless transmission technology. Through the WiFi module 1170, the mobile phone can help the user send and receive e-mails, browse web pages, access streaming media, and the like, providing the user with wireless broadband Internet access. Although fig. 17 shows the WiFi module 1170, it is understood that the module is not an essential component of the mobile phone.

The processor 1180 is the control center of the mobile phone. It connects the various parts of the whole mobile phone through various interfaces and lines, and performs the various functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 1120 and invoking the data stored in the memory 1120, thereby monitoring the mobile phone as a whole. Optionally, the processor 1180 may include one or more processing units; preferably, the processor 1180 may integrate an application processor, which mainly handles the operating system, user interface, application programs, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 1180.

The mobile phone further includes a power supply 1190 (such as a battery) that powers the various components. Preferably, the power supply may be logically connected to the processor 1180 through a power management system, so that charging, discharging, and power consumption are managed by the power management system.

Although not shown, the mobile phone may further include a camera, a Bluetooth module, and the like, which are not described here.

In this embodiment of the present application, the processor 1180 included in the terminal is configured to execute the method of the embodiment shown in fig. 2 and the embodiments corresponding to fig. 2.

An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the steps executed by the server in the method described in the foregoing embodiment shown in fig. 2 and the corresponding embodiment.

Also provided in embodiments of the present application is a computer program product comprising a program, which when run on a computer, causes the computer to perform the steps performed by the server in the method as described in the embodiment of fig. 2 and its corresponding embodiments.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, at least two units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may also be distributed on at least two network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A method of video generation, comprising:

acquiring a material to be processed and target parameters, wherein the material to be processed comprises multimedia data, and the target parameters comprise a target video duration of a video to be synthesized and a target video size of the video to be synthesized;

acquiring label information based on the material to be processed, wherein the label information comprises industry label information indicating the industry to which the material to be processed belongs and scene label information indicating the scene to which the material to be processed belongs;

determining a video duration adjustment strategy based on the multimedia data in the material to be processed and the industry tag information in the tag information, wherein the video duration adjustment strategy is used for performing video duration processing on the material to be processed;

determining a video size adjustment strategy based on the multimedia data, the industry label information and the scene label information in the material to be processed, wherein the video size adjustment strategy is used for adjusting the video size;

and processing the material to be processed based on the video duration adjustment strategy, the video size adjustment strategy, the target video duration of the video to be synthesized and the target video size of the video to be synthesized to generate a target video, wherein the video duration of the target video is equal to the target video duration of the video to be synthesized, and the video size of the target video is equal to the target video size of the video to be synthesized.
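Claim 1 does not fix how the video size adjustment strategy maps the source frame onto the target video size. The following is a minimal sketch, assuming two hypothetical modes (scale to cover the target and crop the overflow, or scale to fit inside the target and pad the rest) and a hypothetical lookup from industry and scene labels to a mode. The names `plan_resize`, `choose_mode`, and `MODE_BY_LABELS` are illustrative and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class ResizePlan:
    scale: float   # uniform scale applied to the source frame
    out_w: int     # frame width after scaling, before cropping or padding
    out_h: int     # frame height after scaling, before cropping or padding
    offset_x: int  # left offset of the crop window ("crop") or of the frame in the canvas ("pad")
    offset_y: int  # top offset, same meaning as offset_x
    mode: str


def plan_resize(src_w: int, src_h: int, dst_w: int, dst_h: int, mode: str) -> ResizePlan:
    """Plan an aspect-preserving resize whose final output is exactly dst_w x dst_h."""
    if mode == "crop":
        scale = max(dst_w / src_w, dst_h / src_h)  # cover the target, crop the overflow
    elif mode == "pad":
        scale = min(dst_w / src_w, dst_h / src_h)  # fit inside the target, pad the rest
    else:
        raise ValueError(f"unknown mode: {mode}")
    out_w, out_h = round(src_w * scale), round(src_h * scale)
    return ResizePlan(scale, out_w, out_h,
                      abs(out_w - dst_w) // 2, abs(out_h - dst_h) // 2, mode)


# Hypothetical (industry, scene) -> mode table; unknown label pairs default to cropping.
MODE_BY_LABELS = {("e-commerce", "product_display"): "pad"}


def choose_mode(industry: str, scene: str) -> str:
    return MODE_BY_LABELS.get((industry, scene), "crop")
```

For example, converting a 1920x1080 landscape source into a 1080x1920 portrait target with `plan_resize(1920, 1080, 1080, 1920, choose_mode("gaming", "gameplay"))` scales the height to 1920 and crops the excess width around the centre. Padding preserves the whole frame at the cost of borders, which is why a label-dependent choice between the two is plausible.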

2. The method of claim 1, wherein the multimedia data is video data;

the obtaining of the tag information of the material to be processed based on the material to be processed includes:

if the material characteristics comprise the video frame characteristics and the voice sequence characteristics, performing aggregation processing on the video frame characteristics and the voice sequence characteristics to obtain first global characteristics, and obtaining label information of the material to be processed based on the first global characteristics;

if the material characteristics comprise the video frame characteristics, the voice sequence characteristics and the text characteristics, performing aggregation processing on the video frame characteristics, the voice sequence characteristics and the text characteristics to obtain second global characteristics, and obtaining label information of the material to be processed based on the second global characteristics.
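Claim 2 does not specify the aggregation operator that produces the global features. The following is a minimal sketch, assuming mean pooling within each modality followed by concatenation across modalities. This is one common choice and not necessarily the one used in the application. `mean_pool` and `aggregate_global_feature` are hypothetical names, and the step that turns the global feature into label information is not shown.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(features: Sequence[Sequence[float]]) -> Vector:
    """Average a sequence of per-frame (or per-token) feature vectors."""
    if not features:
        raise ValueError("empty feature sequence")
    dim = len(features[0])
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


def aggregate_global_feature(*modalities: Sequence[Sequence[float]]) -> Vector:
    """Mean-pool each modality separately, then concatenate the pooled vectors."""
    global_feature: Vector = []
    for modality in modalities:
        global_feature.extend(mean_pool(modality))
    return global_feature
```

With video frame features `[[1.0, 2.0], [3.0, 4.0]]` and speech sequence features `[[0.0, 1.0]]`, the result is `[2.0, 3.0, 0.0, 1.0]`. Passing text features as a third argument extends the vector, and passing picture and text features would cover the picture-data case of claim 6 in the same way.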

3. The method according to claim 2, wherein the processing the material to be processed based on the video duration adjustment policy, the video resizing policy, and the target parameter to generate a target video comprises:

acquiring RGB parameters of each pixel point in each video frame of the multimedia data;

based on the multimedia data and the RGB parameters of each pixel point in each video frame, acquiring the probability of each video frame being a picture switching frame through a probability output model;

and processing the plurality of groups of video clips based on the video duration adjustment strategy, the video size adjustment strategy and the target parameter to generate the target video.
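Claim 3 obtains, for each video frame, the probability of being a picture switching frame from a trained probability output model, and then works with groups of video clips. The sketch below swaps in a simple heuristic for that model: the normalized mean absolute RGB difference between consecutive frames. It also assumes that a clip boundary is placed wherever the probability exceeds a hypothetical threshold of 0.5. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]        # (R, G, B), each in 0..255
Frame = Sequence[Sequence[Pixel]]   # rows of pixels


def mean_abs_rgb_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-channel difference of two equally sized frames, scaled to [0, 1]."""
    total = count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += sum(abs(x - y) for x, y in zip(pa, pb))
            count += 3
    return total / (255 * count)


def switch_probabilities(frames: Sequence[Frame]) -> List[float]:
    """Heuristic stand-in for the probability output model; the first frame gets 0.0."""
    if not frames:
        return []
    return [0.0] + [mean_abs_rgb_diff(frames[i - 1], frames[i]) for i in range(1, len(frames))]


def split_into_clips(probs: Sequence[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Cut before every frame whose probability exceeds threshold; returns half-open (start, end) ranges."""
    clips: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(probs)):
        if probs[i] > threshold:
            clips.append((start, i))
            start = i
    if probs:
        clips.append((start, len(probs)))
    return clips
```

For a sequence black, black, white, white, black, the probabilities are `[0.0, 0.0, 1.0, 0.0, 1.0]` and the clips are `[(0, 2), (2, 4), (4, 5)]`. A learned model would be expected to ignore large pixel changes that are not cuts (fast motion, flashes), which this heuristic cannot.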

4. The method of claim 3, wherein the video duration of the multimedia data is less than the target video duration;

the processing the plurality of groups of video segments based on the video duration adjustment strategy, the video size adjustment strategy and the target parameter to generate the target video comprises:

determining an industry template set based on the industry tag information;

adding the industry templates in the industry template set to the multiple groups of video clips to obtain multiple groups of first video clips, wherein the sum of the video durations of the first video clips is equal to the target video duration of the video to be synthesized;
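Claim 4 extends material that is shorter than the target by adding industry templates until the durations sum to the target video duration. The following is a minimal sketch, assuming each template is a hypothetical `(name, max_duration, position)` triple, with `position` being `"head"` or `"tail"`. Templates are used in order, and the last one used is shortened to absorb the remaining deficit. `pad_with_templates` is an illustrative name.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[str, float]


def pad_with_templates(clip_durations: Sequence[float], target: float,
                       templates: Sequence[Tuple[str, float, str]]) -> List[Segment]:
    """Surround the clips with template segments so the total duration equals target."""
    deficit = target - sum(clip_durations)
    if deficit < 0:
        raise ValueError("material already exceeds target; trim instead of padding")
    head: List[Segment] = []
    tail: List[Segment] = []
    for name, max_duration, position in templates:
        if deficit <= 0:
            break
        used = min(max_duration, deficit)  # the last template used may be shortened
        (head if position == "head" else tail).append((name, used))
        deficit -= used
    if deficit > 1e-9:
        raise ValueError("templates are too short to reach the target duration")
    clips = [(f"clip{i}", d) for i, d in enumerate(clip_durations)]
    return head + clips + tail
```

For clips of 3 s and 4 s and a 15 s target, an intro of up to 3 s and an outro of up to 10 s yield `[("intro", 3), ("clip0", 3), ("clip1", 4), ("outro", 5)]`, totalling 15 s.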

5. The method of claim 3, wherein the tag information further comprises feature tag information indicative of a feature of the material to be processed;

the plurality of sets of video segments comprises a first set of video segments and a second set of video segments;

determining scores of a first set of video segments and scores of a second set of video segments based on the industry tag information and the feature tag information, wherein the scores of the first set of video segments are greater than the scores of the second set of video segments;

clipping the first group of video segments and the second group of video segments based on the video clipping proportion to obtain the clipping result of the first group of video segments and the clipping result of the second group of video segments, wherein the sum of the video duration of the clipping result of the first group of video segments and the video duration of the clipping result of the second group of video segments is equal to the target video duration of the video to be synthesized;
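Claim 5 does not define how the video clipping proportion is derived from the scores. One possible rule, offered only as an assumption: each group keeps a fraction of its duration proportional to its score, capped at 100%, and a common factor is solved so that the kept durations sum exactly to the target video duration. `trim_by_score` is a hypothetical name.

```python
from typing import List, Sequence


def trim_by_score(durations: Sequence[float], scores: Sequence[float],
                  target: float) -> List[float]:
    """Return the kept duration of each group after trimming to `target`.

    Assumed rule: kept_i = min(durations[i], k * scores[i] * durations[i]), with a
    common factor k chosen so the kept durations sum to target. A higher score
    therefore keeps a larger fraction of its group.
    """
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be positive")
    if not 0 <= target <= sum(durations):
        raise ValueError("target must lie between 0 and the total duration")
    kept = [0.0] * len(durations)
    active = set(range(len(durations)))
    remaining = target
    while active:
        weight = sum(durations[i] * scores[i] for i in active)
        if weight == 0:
            break  # only zero-length groups are left
        k = remaining / weight
        capped = [i for i in active if k * scores[i] >= 1.0]
        if not capped:
            for i in active:
                kept[i] = k * scores[i] * durations[i]
            break
        # Groups whose kept fraction would reach 100% are kept whole; re-solve for the rest.
        for i in capped:
            kept[i] = float(durations[i])
            remaining -= durations[i]
            active.remove(i)
    return kept
```

With two 10 s groups scored 2 and 1 and a 15 s target, the higher-scored group is kept whole and the other is trimmed to 5 s (`[10.0, 5.0]`). Re-solving after capping never un-caps a group, because removing capped groups can only raise the common factor.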

6. The method of claim 1, wherein the multimedia data is picture data;

if the material characteristics comprise the picture characteristics and the text characteristics, performing aggregation processing on the picture characteristics and the text characteristics to obtain third global characteristics, and obtaining label information of the material to be processed based on the third global characteristics;

7. The method according to claim 6, wherein the processing the material to be processed based on the video duration adjustment policy, the video resizing policy, and the target parameter to generate a target video comprises:

determining an industry template set based on the industry tag information;

and processing the material to be processed according to the industry template set and the target parameters to obtain the target video.

8. The method of claim 7, wherein the picture data is a single picture;

the label information also comprises an interactive control label, and the interactive control label represents that an interactive control exists in the material to be processed;

the processing the material to be processed according to the industry template set and the target parameters to obtain the target video comprises:

and processing the first material according to the industry template set and the target parameters to obtain the target video.

9. The method according to claim 7, wherein the picture data is a plurality of pictures;

the target parameters also comprise music style;

determining target music based on the music style;

10. The method of claim 9, wherein the label information further comprises an interactive control label indicating the presence of an interactive control in the material to be processed;

performing enhancement processing on each interactive control to obtain a second material, wherein the second material comprises at least one enhanced interactive control, and the enhancement processing either enlarges and shrinks the interactive control, or bolds and highlights the interactive control;

and processing the second material according to the industry template set and the target parameters to obtain the target video.
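For the enlarge-and-shrink variant of the enhancement processing in claim 10, the following is a minimal sketch assuming a sinusoidal pulse about the control's centre. The period and amplitude are hypothetical defaults, the function names are illustrative, and the bold-and-highlight variant is not shown.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def pulse_scale(t: float, period: float = 1.0, amplitude: float = 0.1) -> float:
    """Scale factor that oscillates between 1 - amplitude and 1 + amplitude every `period` seconds."""
    return 1.0 + amplitude * math.sin(2 * math.pi * t / period)


def scaled_box(box: Box, scale: float) -> Box:
    """Scale a bounding box about its centre, so the control grows and shrinks in place."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    new_w, new_h = w * scale, h * scale
    return (cx - new_w / 2, cy - new_h / 2, new_w, new_h)
```

At 25 frames per second, the control's box for frame `n` would be `scaled_box(control_box, pulse_scale(n / 25))`. Scaling about the centre keeps the control visually anchored while it pulses.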

11. The method of claim 1, wherein the target parameters further comprise at least one of a sticker element, a subtitle element, or a graphic mark element;

the processing the material to be processed based on the video duration adjustment strategy, the video size adjustment strategy and the target parameter to generate a target video includes:

processing the material to be processed based on the video duration adjustment strategy, the video size adjustment strategy and the target parameter to generate a first video;

and placing at least one of the sticker element, the subtitle element and the graphic mark element in a preset position in the first video to generate the target video.
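Claim 11 leaves the preset positions unspecified. The following is a minimal sketch, assuming a hypothetical table of normalized anchors and a pixel margin that keeps each sticker, subtitle, or graphic mark element inside the frame. `PRESETS` and `place` are illustrative names.

```python
from typing import Dict, Tuple

# Hypothetical preset anchors: (fraction across, fraction down) of the free space.
PRESETS: Dict[str, Tuple[float, float]] = {
    "top_left": (0.0, 0.0),
    "top_right": (1.0, 0.0),
    "center": (0.5, 0.5),
    "bottom_center": (0.5, 1.0),
}


def place(elem_w: int, elem_h: int, frame_w: int, frame_h: int,
          preset: str, margin: int = 0) -> Tuple[int, int]:
    """Top-left pixel position of an element at a preset anchor, kept `margin` px inside the frame."""
    ax, ay = PRESETS[preset]
    x = ax * (frame_w - elem_w)
    y = ay * (frame_h - elem_h)
    # Clamp so the element stays at least `margin` pixels away from every edge.
    x = min(max(x, margin), frame_w - elem_w - margin)
    y = min(max(y, margin), frame_h - elem_h - margin)
    return round(x), round(y)
```

A 200x60 subtitle at `"bottom_center"` in a 1080x1920 frame with a 40 px margin lands at `(440, 1820)`. Expressing anchors as fractions lets the same preset work for any target video size.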

12. The method of claim 11, wherein the target parameters further include a music style and a special effects element;

the placing at least one of the sticker element, the subtitle element, and the graphic mark element at the preset position in the first video to generate the target video includes:

placing at least one of the sticker elements, the subtitle elements and the graphic mark elements at the preset position in the first video to generate a second video;

determining target music based on the music style;

13. The method of claim 1, wherein the obtaining the material to be processed and the target parameters comprises:

displaying an input interface, wherein the input interface comprises a data input interface and a parameter selection interface, the data input interface is used for inputting the material to be processed, and the parameter selection interface is used for selecting the target parameter;

in response to a data selection operation on the data input interface in the input interface, acquiring the material to be processed;

in response to a parameter selection operation on the parameter selection interface in the input interface, acquiring the target parameter;

after the to-be-processed material is processed based on the video duration adjustment strategy, the video size adjustment strategy and the target parameter to generate a target video, the method further includes:

and displaying the target video on a video display interface.

14. A video generation apparatus, characterized in that the video generation apparatus comprises:

an acquisition module, configured to acquire a material to be processed and target parameters, wherein the material to be processed comprises multimedia data, and the target parameters comprise a target video duration of a video to be synthesized and a target video size of the video to be synthesized;

the acquisition module is further configured to acquire tag information based on the material to be processed, where the tag information includes industry tag information indicating an industry to which the material to be processed belongs and scene tag information indicating a scene to which the material to be processed belongs;

a determining module, configured to determine a video duration adjustment policy based on the multimedia data in the material to be processed and the industry tag information in the tag information, where the video duration adjustment policy is used to perform video duration processing on the material to be processed;

the determining module is further configured to determine a video resizing policy based on the multimedia data in the material to be processed, the industry tag information, and the scene tag information, where the video resizing policy is used to resize a video;

and the processing module is used for processing the material to be processed based on the video duration adjusting strategy, the video size adjusting strategy, the target video duration of the video to be synthesized and the target video size of the video to be synthesized so as to generate a target video, wherein the video duration of the target video is equal to the target video duration of the video to be synthesized, and the video size of the target video is equal to the target video size of the video to be synthesized.

15. A computer device, comprising: a memory, a transceiver, a processor, and a bus system;

wherein the memory is used for storing programs;

the processor is configured to execute a program in the memory to implement the method of any one of claims 1 to 13;

the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.

16. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 13.