WO2020259449A1 - Method and device for generating short video - Google Patents


Publication number
WO2020259449A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
information
source video
metadata information
user
Prior art date
Application number
PCT/CN2020/097520
Other languages
French (fr)
Chinese (zh)
Inventor
李汤锁
吴珮华
陈绍君
汪新建
周胜丰
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2020259449A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/44012 Processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments

Definitions

  • the embodiments of the present application relate to the field of video processing technology, and in particular to a method and device for generating short videos.
  • the embodiments of the present application provide a short video generation method and device, which are used to generate short videos corresponding to the video clips that users care about when browsing and sharing videos, reduce the time users spend browsing and sharing videos, meet user needs, and improve user experience.
  • the specific technical solutions are as follows:
  • an embodiment of the present application provides a short video generation method, including: analyzing the video content in the source video to obtain metadata information of the source video; analyzing the characteristics of the content shot by the user to obtain user portrait data; and, according to the metadata information of the source video and the user portrait data, extracting video content from the source video to generate a short video.
  • a user portrait refers to an understanding of the pictures and videos shot by the user, learned in order to determine the types of content the user shoots (people, scenery, food, parties, etc.), the user's preferences (frequently shot people, composition styles, etc.), and the user's shooting habits.
  • it is easy to understand that the metadata information of the source video is obtained by analyzing the video content of the source video itself, while the user portrait data is obtained by analyzing the characteristics of the content shot by the user.
  • by combining the analysis of the video content itself with the analysis of the user's shooting characteristics, the content that the user cares about in the source video can be identified with high probability, and the corresponding video clips can then be extracted from the source video to generate a short video.
  • on the one hand, the short video contains the content that the user cares about; on the other hand, the short video is shorter than the source video. Therefore, browsing and sharing the source video through the short video not only meets user needs but also greatly improves the user experience.
  • the source video may be one or more videos.
  • the aforementioned metadata information includes but is not limited to at least one of the following: portrait interval information, human voice interval information, object classification label information, video optical flow information, and aesthetic rating information.
  • the video content in the source video can be comprehensively analyzed from multiple dimensions, thereby increasing the probability of obtaining the content that the user cares about, better meeting user needs, and improving user experience.
  • the portrait interval information includes, but is not limited to, the face interval information.
  • the foregoing analysis of the video content in the source video to obtain the metadata information of the source video may specifically include: analyzing the video stream in the source video to extract the metadata information in the video frames; and analyzing the audio stream in the source video to extract the metadata information in the audio frames. The metadata information of the source video includes the metadata information in the video frames and the metadata information in the audio frames.
  • the metadata information in the video frames may specifically include but is not limited to at least one of the following: portrait interval information, object classification label information, video optical flow information, and aesthetic rating information.
  • the metadata information in the audio frames may specifically include, but is not limited to, at least one of the following: human voice interval information and background music interval information.
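The split between video-frame and audio-frame metadata described above can be pictured as a small data model. The following Python sketch is purely illustrative; the class and field names are assumptions for exposition, not structures defined in this application:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical containers mirroring the metadata dimensions named above.
# Intervals are (start, end) timestamps in seconds.

@dataclass
class VideoFrameMetadata:
    portrait_intervals: List[Tuple[float, float]] = field(default_factory=list)      # face/portrait spans
    object_labels: List[str] = field(default_factory=list)                           # object classification labels
    optical_flow_intervals: List[Tuple[float, float]] = field(default_factory=list)  # fast/slow-motion spans
    aesthetic_scores: List[float] = field(default_factory=list)                      # aesthetic ratings

@dataclass
class AudioFrameMetadata:
    voice_intervals: List[Tuple[float, float]] = field(default_factory=list)         # human voice spans
    music_intervals: List[Tuple[float, float]] = field(default_factory=list)         # background music spans

@dataclass
class SourceVideoMetadata:
    video: VideoFrameMetadata
    audio: AudioFrameMetadata
```

A downstream selection step would then read both sub-records of one `SourceVideoMetadata` object rather than two unrelated analysis outputs.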
  • the foregoing method for analyzing video content in the source video includes, but is not limited to, a deep learning algorithm.
  • the source video is analyzed based on both the audio stream and the video stream, which improves the analysis of the source video, yields more accurate metadata information, better meets user needs, and improves the user experience.
  • the analysis dimensions of the video content of the source video in this implementation may include, but are not limited to, the video stream and the audio stream, and may also include aspects such as video theme and/or video style; this application imposes no restriction on this.
  • the above-mentioned analysis of the characteristics of the content shot by the user to obtain user portrait data may specifically include: analyzing the pictures and videos stored in the user's album to extract the metadata information in the pictures and videos; and, according to that metadata information, analyzing the characteristics of the content shot by the user to obtain the user portrait data.
  • the user portrait data may include but is not limited to: preference information corresponding to the person and/or object that the user prefers when shooting.
  • the foregoing method for analyzing the characteristics of the user's shooting content includes, but is not limited to, a deep learning algorithm.
  • in this way, the user portrait data can be obtained more accurately and the user's shooting preferences can be precisely analyzed, so as to better meet user needs and improve user experience.
  • the foregoing extraction of video content from the source video to generate a short video according to the metadata information of the source video and the user portrait data may specifically include: using the metadata information of the source video and the user portrait data to adjust the weight of each item of metadata in the source video; and, according to those weights, selecting from the source video a segment interval that meets a preset duration to generate the short video.
  • specifically, this may be: using the metadata information obtained by analyzing the video content of the source video, adjusting the weight of each item of metadata in combination with the user portrait data, and selecting a short video that meets the target duration through an optimization strategy, where the optimization strategy is derived from the user's shooting preferences in the user portrait data and is used to filter video segments.
  • the above scheme can be used in scenarios where the highlight-segment duration takes a default value or is set interactively by the user.
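As a rough illustration of the weight-adjustment step above, the following Python sketch boosts a segment's weight when its labels match the user's preferences. The function name, the additive scoring scheme, and the data layout are all assumptions made for illustration, not the patent's actual algorithm:

```python
def adjust_weights(segment_metadata, portrait_preferences, base_weight=1.0):
    """Combine per-segment metadata with user portrait data.

    segment_metadata: dict mapping segment id -> list of metadata labels
                      found in that segment (portraits, object tags, ...).
    portrait_preferences: dict mapping label -> preference score derived
                          from the user portrait data.
    Returns a dict mapping segment id -> adjusted weight.
    """
    weights = {}
    for seg_id, labels in segment_metadata.items():
        w = base_weight
        for label in labels:
            # Segments containing content the user prefers gain weight.
            w += portrait_preferences.get(label, 0.0)
        weights[seg_id] = w
    return weights
```

For example, a segment showing a frequently photographed person would end up with a larger weight than a generic scenery segment, so the duration-constrained selection step would favor it.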
  • the foregoing short video generation method may further include: performing video rendering effect processing on the short video according to the metadata information of the short-video portion of the source video.
  • by performing video rendering effect processing on the short video, the video effects can be enhanced, yielding a short video with a better user experience.
  • an embodiment of the present application provides a short video generation device.
  • the video generation device may include an entity such as a terminal device or a chip.
  • the video generation device includes a processor and a memory; the memory is used to store instructions;
  • the processor is configured to execute the instructions in the memory, so that the video generation device executes the method described in the foregoing first aspect.
  • an embodiment of the present application provides a computer-readable storage medium having instructions stored in the computer-readable storage medium, which when run on a computer, cause the computer to execute the method described in the first aspect.
  • the embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the method described in the first aspect.
  • FIG. 1 is a flowchart of an embodiment of a short video generation method provided by an embodiment of the application
  • FIG. 2 is a schematic diagram of an embodiment of selecting a video priority interval based on a result of video content analysis provided in an embodiment of the application;
  • FIG. 3 is a schematic structural diagram of a short video generating device provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of another structure of a short video generation device provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of another structure of the short video generation device provided by an embodiment of the application.
  • the embodiments of the application provide a short video generation method and device, which are used to generate short videos corresponding to the video clips that users care about when browsing and sharing videos, reducing the time users spend browsing and sharing videos, meeting user needs, and improving user experience.
  • Fig. 1 is a flowchart of an embodiment of a short video generation method provided in an embodiment of the application.
  • the short video generation method in the embodiment of the present application includes:
  • the metadata information may specifically include at least one of the following: portrait interval information, human voice interval information, object classification label information, video optical flow information, aesthetic rating information, and the like.
  • the portrait interval information includes, but is not limited to, the face interval information.
  • the source video can be one or more videos.
  • the specific operation of analyzing the video content in the source video may include, but is not limited to: analyzing the video stream in the source video to extract the metadata information in the video frames;
  • and analyzing the audio stream in the source video to extract the metadata information in the audio frames.
  • the metadata information of the source video includes the metadata information in the video frames and the metadata information in the audio frames.
  • the metadata information in the video frames may include at least one of the following: portrait interval information, object classification label information, video optical flow information, and aesthetic rating information;
  • the metadata information in the audio frames may specifically include, but is not limited to, at least one of the following: human voice interval information, background music interval information, and the like.
  • metadata information obtained by analysis in the source video can also be stored.
  • the foregoing intelligent analysis of the video content in the source video may be specifically implemented by a deep learning algorithm.
  • the video stream may be analyzed as follows: extract video frames from the source video; analyze the extracted frames with deep learning algorithms such as face detection, face clustering, object detection, aesthetic scoring, and optical flow analysis to obtain recognition results; and organize and merge those recognition results to obtain the metadata information of the video stream, such as face interval information, object classification label information, video optical flow information (which may also be called fast/slow-motion interval information), and aesthetic rating information;
  • the audio stream may be analyzed as follows: extract the metadata information of the audio stream, such as human voice interval information and background music interval information, through audio processing algorithms such as natural language processing (NLP).
  • the source video needs to be preprocessed to separate the video stream and the audio stream in the source video; the duration and frame rate of the source video may also be extracted at this stage, and this application imposes no restriction on this.
  • the number of source videos in the embodiment of the present application may be one or more, which imposes no limitation on the embodiments of the present application.
  • the above-mentioned analysis dimensions of the video content of the source video may include, but are not limited to, the video stream and the audio stream, and may also include aspects such as video theme and/or video style; this application imposes no restriction on this.
  • the specific video content dimension classification may include: 1) video theme information, such as birthday, party, graduation, night tour, sports, travel, parent-child, performance, etc.; 2) video style information, such as joyful, nostalgic, brisk, playful, etc.; 3) video stream information, such as the aforementioned portrait interval information, object classification label information, video optical flow information, and aesthetic rating information; 4) audio stream information, such as the aforementioned human voice interval information and background music interval information.
  • the metadata information described in this application may also include: the above-mentioned video theme information and video style information.
  • Analyze the characteristics of the user's shooting content (for example, through intelligent analysis) to obtain user portrait data, where the user portrait refers to an understanding of the pictures and videos shot by the user, learned in order to determine the types of content the user shoots (people, landscapes, food, parties, etc.), the user's preferences (frequently shot people, composition styles, etc.), and the user's habits. For example, if a specific person A appears in the most pictures, person A is the person the user cares about most; similarly, if a specific object B appears in the most pictures, object B is the object the user cares about most.
  • analyzing the characteristics of the content shot by the user may specifically mean analyzing the pictures and videos stored in the user's album: extract the metadata information in the pictures and videos, such as portraits (that is, the above-mentioned faces) and tags (such as the above-mentioned object classification labels);
  • then, based on the metadata information extracted from the pictures and videos, analyze the characteristics of the user's shooting content to obtain the user portrait data, for example, by analyzing the user's shooting preferences from the extracted portrait and tag information.
  • the acquired user portrait data can also be stored.
  • the method for analyzing the characteristics of the content shot by the user may also be, but is not limited to, a deep learning algorithm.
  • the analysis of the user's shooting characteristics may include: analyzing the existing pictures and videos in the user's album through deep learning algorithms to extract portraits, tags, and other information from them; then aggregating and sorting the extracted metadata information to derive the user's shooting preferences.
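The "aggregate and sort" step above can be sketched with simple frequency counting, matching the earlier example that the most photographed person is the one the user cares about most. The function name, normalization, and `top_k` cutoff are illustrative assumptions:

```python
from collections import Counter

def build_user_portrait(album_metadata, top_k=3):
    """Derive a crude preference profile from extracted album metadata.

    album_metadata: list of per-picture/per-video label lists produced by
                    the extraction step (portraits, object tags, ...).
    Returns the top_k most frequent labels mapped to normalized scores.
    """
    counts = Counter(label for labels in album_metadata for label in labels)
    total = sum(counts.values())
    # Normalize raw counts into preference scores in [0, 1].
    return {label: n / total for label, n in counts.most_common(top_k)}
```

A real system would likely weight recency, cluster faces before counting, and so on; this only shows the counting-and-ranking idea.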
  • according to the metadata information of the source video and the user portrait data, video content is extracted from the source video to generate a short video. Specifically, the metadata information of the source video can be combined with the user portrait data,
  • and an optimization strategy can then extract the key or essential content from the source video and intelligently generate a selected short video.
  • the optimization strategy may include a strategy derived from the aforementioned user preference information and used to screen videos.
  • the overall strategy (such as the optimization strategy) for selecting video highlights from the source video includes: 1) preferentially selecting the video segments with the largest total weight across the video content dimensions; 2) sorting the candidate video clips by weight according to the required duration of the output video, and selecting the clips that fit within that duration.
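The two-step strategy above (prefer the largest total weight, then fit the output duration) can be sketched as a greedy selection. The tie-breaking and overlap handling here are simplifying assumptions, not the patent's specified procedure:

```python
def select_highlights(clips, target_duration):
    """Pick high-weight clips that fit within target_duration.

    clips: list of (start, end, weight) tuples, times in seconds.
    Sorts by weight descending, greedily keeps clips while the total
    duration fits, then returns the kept clips in timeline order.
    """
    chosen, used = [], 0.0
    for start, end, weight in sorted(clips, key=lambda c: c[2], reverse=True):
        length = end - start
        if used + length <= target_duration:
            chosen.append((start, end, weight))
            used += length
    return sorted(chosen)  # restore chronological order for playback
```

A production implementation would more likely solve this as a knapsack-style optimization, but the greedy form captures the sort-by-weight, fill-to-duration behavior described above.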
  • the foregoing extraction of video content from the source video to generate a short video according to the metadata information and user portrait data may specifically include: using the metadata information of the source video and the user portrait data to adjust the weight of each item of metadata in the source video; then, according to those weights, selecting from the source video a segment that meets the preset duration to generate the short video.
  • specifically: use the metadata information obtained by analyzing the video content of the source video, adjust the weight of each item of metadata in combination with the user portrait data, and, in scenarios where the highlight duration takes a default value or is set interactively by the user, select the best segment interval matching that duration through the optimization strategy to obtain the aforementioned short video.
  • the selection of video highlights can be performed through the following multi-dimensional selection steps.
  • FIG. 2 is a schematic diagram of an embodiment of selecting a video priority interval based on a result of video content analysis provided in an embodiment of the application.
  • the source video is analyzed to obtain the corresponding video content analysis results; for example, face segment recognition, highlight segment recognition, fast/slow-motion segment recognition, and human voice segment recognition are performed on the source video to obtain the corresponding recognition results.
  • the video priority interval selected according to the above recognition results is shown in FIG. 2, where the "original video" described in FIG. 2 is the above-mentioned source video.
  • the metadata information of the source video is obtained by analyzing the video content of the source video, and the user portrait data is obtained by analyzing the characteristics of the content shot by the user.
  • combining the above analysis of the video content itself with the analysis of the user's shooting characteristics makes it possible to identify, with high probability, the content in the source video that the user cares about, and the corresponding video clips can then be extracted from the source video to generate a short video.
  • on the one hand, the short video contains the content that the user cares about; on the other hand, the short video is shorter than the source video. Therefore, browsing and sharing the source video through the short video not only meets user needs but also greatly improves the user experience.
  • the short video generation method in the embodiment of the present application may further include the following optional step 104.
  • the video rendering effect processing includes but is not limited to: 1) using the portrait interval information to zoom in on faces in the video and/or apply filters to faces in the video; 2) using the human voice interval information to add background music on top of the video's original sound; 3) using the video optical flow information (that is, the fast/slow-motion interval information) to add fast or slow motion playback effects to the video.
  • the above step 104 can be implemented by, but is not limited to, a video playback editor; this application imposes no limitation on this.
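The three rendering cases listed above amount to a mapping from metadata intervals to effects. The following sketch only illustrates that dispatch; the effect names and the metadata dictionary layout are assumptions, and actual frame processing would be done by a video editor or rendering engine:

```python
def plan_render_effects(metadata):
    """Translate short-video metadata into a list of (effect, interval) pairs.

    metadata: dict with optional keys "portrait_intervals",
    "voice_intervals", and "optical_flow_intervals", each a list of
    (start, end) spans in seconds.
    """
    effects = []
    for interval in metadata.get("portrait_intervals", []):
        effects.append(("zoom_face", interval))            # case 1: enlarge/filter faces
    for interval in metadata.get("voice_intervals", []):
        effects.append(("mix_background_music", interval)) # case 2: music over original sound
    for interval in metadata.get("optical_flow_intervals", []):
        effects.append(("speed_ramp", interval))           # case 3: fast/slow-motion playback
    return effects
```

The returned plan could then be handed to the video playback editor mentioned above to apply the effects per interval.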
  • the video effect can be enhanced, and a short video with a better user experience effect can be obtained.
  • the number of source videos as described above can be one or more.
  • the following describes the embodiments of the present application in combination with the application scenarios of generating short videos from a single video and generating short videos from multiple videos. The details are as follows:
  • 1. Generating a short video from a single video
  • the user is allowed to interactively select the length of the short video to meet the user's requirements for browsing and sharing time. Case one: when the user does not set a duration, all selected clips generate the selected short video content by default. Case two: when the user sets a short video duration according to different sharing requirements, the selected video clips are sorted by weight and the clips that fit the total duration are selected.
  • FIG. 3 is a schematic structural diagram of a short video generation device provided by an embodiment of the application.
  • the apparatus 300 for generating short videos in the embodiment of the present application includes a processing module 301, and the processing module 301 is configured to perform the following steps: analyze the video content in the source video to obtain the metadata information of the source video; analyze the characteristics of the content shot by the user to obtain user portrait data; and, according to the metadata information of the source video and the user portrait data, extract video content from the source video to generate a short video.
  • the processing module 301 is specifically configured to: analyze the video stream in the source video to extract the metadata information in the video frames; and analyze the audio stream in the source video to extract the metadata information in the audio frames, where the metadata information of the source video includes the metadata information in the video frames and the metadata information in the audio frames.
  • the processing module 301 is specifically configured to: analyze the pictures and videos stored in the user's photo album and extract the metadata information in the pictures and videos; and, according to that metadata information, analyze the characteristics of the content shot by the user to obtain the user portrait data.
  • the processing module 301 is specifically configured to: use the metadata information of the source video and the user portrait data to adjust the weight of each item of metadata in the source video; and, according to those weights, select from the source video a segment that meets the preset duration to generate the short video.
  • the metadata information includes at least one of the following: portrait interval information, human voice interval information, background music interval information, object classification label information, video optical flow information, and aesthetic rating information.
  • the processing module 301 is further configured to: perform video rendering effect processing on the short video according to the metadata information of the short video part in the source video.
  • FIG. 3 introduced one schematic structural diagram of the short video generating device; another schematic structural diagram is described below in conjunction with FIG. 4.
  • FIG. 4 is a schematic diagram of another structure of a short video generation device provided by an embodiment of the application.
  • the short video generation device 400 in the embodiment of the present application includes: a video preprocessing module 401, a video content analysis module 402, a user shooting content feature analysis module 403, a video content priority module 404, and a metadata information storage module 405, a video preview module 406, and a video storage module 407.
  • the video preprocessing module 401 is used to preprocess the source video to separate the video stream and the audio stream in the source video, and also to extract the duration and frame rate of the source video;
  • the video content analysis module 402 is used to: perform the above step 101 to analyze the video content in the source video, and obtain the operation corresponding to the metadata information in the source video;
  • the user shooting content feature analysis module 403 is used to perform the operation corresponding to the above step 102: analyzing the characteristics of the content shot by the user to obtain user portrait data;
  • the video content priority module 404 is used to perform the operation corresponding to the above step 103: extracting video content from the source video according to the metadata information of the source video and the user portrait data to generate a short video;
  • the metadata information storage module 405 is used to store metadata information and user portrait data in the source video;
  • the video preview module 406 is used to perform the operation corresponding to the above step 104: performing video rendering effect processing on the short video according to the metadata information of the short-video portion of the source video, and previewing the short video;
  • An embodiment of the present application further provides a computer storage medium, wherein the computer storage medium stores a program, and the program executes a part or all of the steps recorded in the foregoing method embodiment.
  • the device may be a terminal or a chip set in the terminal.
  • the terminal 500 in the embodiment of the present application includes: a receiver 501, a transmitter 502, a processor 503, and a memory 504 (the number of processors 503 in the terminal 500 may be one or more; one processor is taken as an example in FIG. 5).
  • the receiver 501, the transmitter 502, the processor 503, and the memory 504 may be connected through a bus or other methods, wherein the connection through a bus is taken as an example in FIG. 5.
  • the memory 504 may include a read-only memory and a random access memory, and provides instructions and data to the processor 503. A part of the memory 504 may also include a non-volatile random access memory (NVRAM).
  • the memory 504 stores an operating system and operating instructions, executable modules or data structures, or a subset of them, or an extended set of them, where the operating instructions may include various operating instructions for implementing various operations.
  • the operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
  • the processor 503 controls the operation of the terminal, and the processor 503 may also be referred to as a central processing unit (CPU).
  • the various components of the terminal are coupled together through a bus system.
  • the bus system may also include a power bus, a control bus, and a status signal bus.
  • various buses are referred to as bus systems in the figure.
  • the method disclosed in the foregoing embodiments of the present application may be applied to the processor 503 or implemented by the processor 503.
  • the processor 503 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by hardware integrated logic circuits in the processor 503 or instructions in the form of software.
  • the aforementioned processor 503 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 504, and the processor 503 reads the information in the memory 504, and completes the steps of the foregoing method in combination with its hardware.
  • the receiver 501 can be used to receive input digital or character information, and to generate signal input related to terminal settings and function control.
  • the transmitter 502 can include display devices such as a display screen.
  • the transmitter 502 can be used to output digital or character information through an external interface.
  • the processor 503 may specifically be the processing module 301 in FIG. 3, and is configured to perform all operations in the method embodiment described in FIG. 1.
  • the short video generation device is a chip
  • the chip includes: a processing unit and a communication unit
  • the processing unit may be, for example, a processor
  • the communication unit may be, for example, an input/output interface, pins, or circuits.
  • the processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip in the terminal executes the method of any one of the foregoing first aspect.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit in the terminal located outside the chip, such as a read-only memory (ROM) or a random access memory (RAM).
  • the processor mentioned in any of the above can be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the program of the method in the first aspect.
  • the device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units
  • the physical units can be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which can be specifically implemented as one or more communication buses or signal lines.
  • this application can be implemented by means of software plus the necessary general-purpose hardware.
  • it can also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like.
  • in general, any function completed by a computer program can easily be implemented with corresponding hardware.
  • the specific hardware structures used to achieve the same function can also be diverse, such as analog circuits, digital circuits, or dedicated circuits.
  • however, in most cases, implementation by a software program is the better implementation.
  • the technical solutions of this application, in essence, or the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions to enable a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) manner.
  • the computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Disclosed are a method and device for generating a short video, for use in generating a short video corresponding to the video segments that a user cares about when browsing and sharing a video, thereby reducing the time the user spends browsing and sharing the video, meeting user needs, and improving user experience. The method for generating a short video comprises: analyzing video content in a source video to acquire metadata information of the source video; analyzing features of content photographed by a user to acquire user portrait data; and extracting video content from the source video on the basis of the metadata information of the source video and the user portrait data to generate a short video.

Description

Method and device for generating short video
This application claims priority to Chinese Patent Application No. 201910549540.8, filed with the State Intellectual Property Office of China on June 24, 2019 and entitled "Method and Device for Generating Short Video", which is incorporated herein by reference in its entirety.
Technical Field
The embodiments of this application relate to the field of video processing technologies, and in particular, to a method and device for generating a short video.
Background
With the popularization of various mobile terminals and the development of mobile social media, shooting, browsing, and sharing videos through the camera carried in a mobile terminal has become one of the more frequent activities of terminal users.
Usually, a large number of pictures and videos are stored in the mobile terminal used by a user. When browsing a video, what the user really cares about is only one or more video segments in the entire video, while the remaining content is not of interest to the user. The user has to browse the entire video to reach the segments that the user really cares about, which consumes a lot of time and energy. Similarly, video sharing is performed on the basis of video browsing and also consumes a lot of time and energy. Therefore, both video browsing and video sharing greatly affect user experience.
Summary
To resolve the foregoing technical problems, the embodiments of this application provide a method and device for generating a short video, which are used to generate a short video corresponding to the video segments that a user cares about when browsing and sharing a video, reduce the time the user spends browsing and sharing the video, meet user needs, and improve user experience. The specific technical solutions are as follows:
In a first aspect, an embodiment of this application provides a method for generating a short video, including: analyzing video content in a source video to obtain metadata information of the source video; analyzing features of content photographed by a user to obtain user portrait data; and extracting video content from the source video based on the metadata information of the source video and the user portrait data to generate a short video. Optionally, a user portrait refers to the type (people, scenery, food, parties, and the like), preferences (for example, a particular person appearing frequently, or favored composition styles), and habits of the content photographed by the user, learned by understanding the pictures and videos taken by the user.
As can be seen from the technical solution of the first aspect, the metadata information of the source video is obtained by analyzing the video content of the source video itself, and the user portrait data is obtained by analyzing the features of the content photographed by the user. It is easy to understand that combining the analysis of the video content itself with the feature analysis of the user's photographed content (that is, an analysis of the user's shooting preferences) makes it possible to identify the content in the source video that the user cares about, and then to extract the corresponding video segments from the source video to generate a short video. On the one hand, the short video contains the content that the user cares about; on the other hand, its duration is shorter than that of the source video. Therefore, browsing and sharing the source video by means of the short video not only meets user needs but also greatly improves user experience. Optionally, the source video may be one or more videos.
In a possible implementation of the first aspect, the foregoing metadata information includes, but is not limited to, at least one of the following: portrait interval information, human voice interval information, object classification label information, video optical flow information, and aesthetic score information. With this implementation, the video content in the source video can be comprehensively analyzed from multiple dimensions, which increases the probability of obtaining the content that the user cares about, better meets user needs, and improves user experience. The portrait interval information includes, but is not limited to, face interval information.
In a possible implementation of the first aspect, analyzing the video content in the source video to obtain the metadata information of the source video may specifically include: analyzing a video stream in the source video to extract metadata information of video frames; and analyzing an audio stream in the source video to extract metadata information of audio frames, where the metadata information of the source video includes the metadata information of the video frames and the metadata information of the audio frames. Optionally, the metadata information of the video frames may specifically include, but is not limited to, at least one of the following: portrait interval information, object classification label information, video optical flow information, and aesthetic score information. Optionally, the metadata information of the audio frames may specifically include, but is not limited to, at least one of the following: human voice interval information and background music interval information. Optionally, the method for analyzing the video content in the source video includes, but is not limited to, a deep learning algorithm. With this implementation, the source video is analyzed in terms of both the audio stream and the video stream, which improves the analysis of the source video, yields more accurate metadata information, better meets user needs, and improves user experience.
It should be noted that, in this implementation, the analysis dimensions of the video content of the source video may include, but are not limited to, the video stream and the audio stream, and may further include the video theme and/or the video style, which is not limited in this application.
In a possible implementation of the first aspect, analyzing the features of the content photographed by the user to obtain the user portrait data may specifically include: analyzing the pictures and videos stored in the user's album to extract metadata information of the pictures and videos; and analyzing, based on the metadata information of the pictures and videos, the features of the content photographed by the user to obtain the user portrait data. Optionally, the user portrait data may include, but is not limited to, preference information corresponding to the people and/or objects that the user prefers to photograph. Optionally, the method for analyzing the features of the content photographed by the user includes, but is not limited to, a deep learning algorithm. In this implementation, by performing big-data analysis on the pictures and videos stored in the user's album and extracting their metadata information, the user portrait data can be obtained more accurately and the user's shooting preferences can be analyzed precisely, thereby better meeting user needs and improving user experience.
In a possible implementation of the first aspect, extracting video content from the source video based on the metadata information of the source video and the user portrait data to generate the short video may specifically include: adjusting the weight of each piece of metadata in the source video by using the metadata information of the source video and the user portrait data; and selecting, based on the weights of the metadata in the source video, segment intervals that meet a preset duration from the source video to generate the short video. In other words, this may specifically be: using the metadata information obtained by analyzing the video content of the source video, adjusting the weight of each piece of metadata in the source video in combination with the user portrait data, and selecting a short video that meets the duration through a preference strategy, where the preference strategy is a strategy that is derived from the user's shooting preferences in the user portrait data and is used to screen videos. It should be noted that the foregoing solution may be specifically used in a scenario with a default highlight-segment duration or a duration set through user interaction.
In this implementation, by adjusting the weight of each piece of metadata in the source video and using the weights as one of the bases for generating the short video, the content that the user cares about can be selected, improving the accuracy of short video content selection.
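The weight adjustment and duration-constrained selection described above can be sketched as follows. This is a minimal illustration rather than the implementation of this application: the segment records, metadata dimensions, user-portrait weights, and the greedy selection policy are all assumptions made for the example.

```python
# Minimal sketch (assumed data model): re-weight source-video segments
# with user-portrait weights, then greedily keep the highest-scoring
# segments that fit a target duration, preserving timeline order.

def score_segment(metadata, user_weights):
    """Weighted sum of a segment's per-dimension metadata scores."""
    return sum(user_weights.get(dim, 1.0) * value
               for dim, value in metadata.items())

def select_segments(segments, user_weights, target_duration):
    """Greedily pick the highest-scoring segments within the duration."""
    ranked = sorted(segments,
                    key=lambda s: score_segment(s["metadata"], user_weights),
                    reverse=True)
    chosen, total = [], 0.0
    for seg in ranked:
        if total + seg["duration"] <= target_duration:
            chosen.append(seg)
            total += seg["duration"]
    return sorted(chosen, key=lambda s: s["start"])  # timeline order

segments = [
    {"start": 0,  "duration": 5, "metadata": {"face": 0.9, "aesthetic": 0.6}},
    {"start": 5,  "duration": 8, "metadata": {"face": 0.1, "aesthetic": 0.9}},
    {"start": 13, "duration": 4, "metadata": {"face": 0.8, "aesthetic": 0.7}},
]
user_weights = {"face": 2.0, "aesthetic": 1.0}  # portrait favors faces
clip = select_segments(segments, user_weights, target_duration=10)
print([s["start"] for s in clip])  # → [0, 13]
```

In the actual method, the per-dimension weights would come from the user portrait data and the segment scores from the content analysis of step 101; here both are hard-coded for the sketch.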
In a possible implementation of the first aspect, the foregoing method for generating a short video may further include: performing video rendering effect processing on the short video based on the metadata information of the part of the source video corresponding to the short video. Performing video rendering effect processing on the short video can enhance the video effect and produce a short video with a better user experience.
In a second aspect, an embodiment of this application provides a short video generation device. The video generation device may include an entity such as a terminal device or a chip, and includes a processor and a memory, where the memory is configured to store instructions, and the processor is configured to execute the instructions in the memory, so that the video generation device performs the method described in the first aspect.
In a third aspect, an embodiment of this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method described in the first aspect.
In a fourth aspect, an embodiment of this application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method described in the first aspect.
Brief Description of the Drawings
FIG. 1 is a flowchart of an embodiment of a method for generating a short video according to an embodiment of this application;
FIG. 2 is a schematic diagram of an embodiment of selecting a video priority interval based on a video content analysis result according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of a short video generation device according to an embodiment of this application;
FIG. 4 is another schematic structural diagram of a short video generation device according to an embodiment of this application;
FIG. 5 is still another schematic structural diagram of a short video generation device according to an embodiment of this application.
Detailed Description
The embodiments of this application provide a method and device for generating a short video, which are used to generate a short video corresponding to the video segments that a user cares about when browsing and sharing a video, reduce the time the user spends browsing and sharing the video, meet user needs, and improve user experience.
The following describes the embodiments of this application with reference to the accompanying drawings.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects, and are not necessarily used to describe a particular order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely a manner of distinguishing between objects with the same attributes in the description of the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device including a series of units is not necessarily limited to those units, but may include other units that are not explicitly listed or that are inherent to the process, method, product, or device.
FIG. 1 is a flowchart of an embodiment of a method for generating a short video according to an embodiment of this application.
As shown in FIG. 1, the method for generating a short video in this embodiment of the application includes the following steps.
101. Analyze the video content in a source video to obtain metadata information of the source video.
The video content of the source video is analyzed (for example, through intelligent analysis), and various kinds of metadata information are extracted from the source video. Optionally, the metadata information may specifically include at least one of the following: portrait interval information, human voice interval information, object classification label information, video optical flow information, aesthetic score information, and the like. The portrait interval information includes, but is not limited to, face interval information. Optionally, the source video may be one or more videos.
Specifically, in an embodiment, the specific operations of analyzing the video content in the source video may include, but are not limited to: analyzing a video stream in the source video to extract metadata information of video frames; and analyzing an audio stream in the source video to extract metadata information of audio frames, where the metadata information of the source video includes the metadata information of the video frames and the metadata information of the audio frames. Optionally, the metadata information of the video frames may include at least one of the following: portrait interval information, object classification label information, video optical flow information, aesthetic score information, and the like; the metadata information of the audio frames may specifically include, but is not limited to, at least one of the following: human voice interval information, background music interval information, and the like. In some application scenarios, the metadata information obtained by analyzing the source video may also be stored.
In an embodiment, the foregoing intelligent analysis of the video content in the source video may be specifically implemented through deep learning algorithms. Specifically, the video stream may be analyzed as follows: video frames are extracted from the source video; the extracted video frames are analyzed with deep learning algorithms such as face detection, face clustering, object detection, aesthetic scoring, and optical flow analysis to obtain recognition results; and the recognition results are sorted and merged to obtain the metadata information of the video stream, such as face interval information, object classification label information, video optical flow information, and aesthetic score information, where the video optical flow information may also be referred to as fast/slow motion interval information. The audio stream may be analyzed as follows: metadata information of the audio stream, such as human voice interval information and background music interval information, is extracted through audio processing algorithms such as natural language processing (NLP) algorithms.
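As one hedged example of the "sort and merge the recognition results into interval metadata" step, per-frame detections can be collapsed into time intervals. The frame indices, frame rate, and gap-bridging rule below are assumptions for illustration, not the specific algorithm of this application.

```python
# Sketch: collapse sorted frame indices with positive detections
# (e.g. frames where a face or a voice was recognized) into
# (start_sec, end_sec) intervals; small gaps are bridged.

def frames_to_intervals(detections, fps=10.0, min_gap=1):
    """detections: sorted frame indices with a positive recognition.
    Gaps of up to min_gap missing frames are bridged into one interval."""
    intervals = []
    start = prev = None
    for idx in detections:
        if start is None:
            start = prev = idx
        elif idx - prev <= min_gap + 1:
            prev = idx
        else:
            intervals.append((start / fps, (prev + 1) / fps))
            start = prev = idx
    if start is not None:
        intervals.append((start / fps, (prev + 1) / fps))
    return intervals

face_frames = [0, 1, 2, 3, 30, 31, 32, 90, 91]
print(frames_to_intervals(face_frames))  # → [(0.0, 0.4), (3.0, 3.3), (9.0, 9.2)]
```

The same interval representation works for face intervals, human voice intervals, or fast/slow motion intervals; only the upstream detector changes.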
Before the video stream and the audio stream in the source video are analyzed as described above, the source video needs to be preprocessed to separate the video stream and the audio stream; the duration, frame rate, and the like of the source video may also be obtained at this point, which is not limited in this application. It should also be noted that the number of source videos in this embodiment of the application may be one or more, which is not limited in this embodiment of the application. It should further be noted that the analysis dimensions of the video content of the source video may include, but are not limited to, the video stream and the audio stream, and may further include the video theme and/or the video style, which is not limited in this application.
The specific video content dimension classification may include: 1) video theme information, such as birthday, party, graduation, night tour, sports, travel, parent-child, and performance; 2) video style information, such as joyful, nostalgic, brisk, and playful; 3) video stream information, such as the foregoing portrait interval information, object classification label information, video optical flow information, and aesthetic score information; and 4) audio stream information, such as the foregoing human voice interval information and background music interval information. It should be noted that the metadata information described in this application may further include the foregoing video theme information, video style information, and the like.
102. Analyze the features of content photographed by the user to obtain user portrait data.
The features of the content photographed by the user are analyzed (for example, through intelligent analysis) to obtain user portrait data, where a user portrait refers to the type (people, scenery, food, parties, and the like), preferences (for example, a particular person appearing frequently, or favored composition styles), and habits of the content photographed by the user, learned by understanding the pictures and videos taken by the user. For example, if a specific portrait A appears in the most pictures, portrait A is the person the user cares about most; similarly, if a specific object B appears in the most pictures, object B is the object the user cares about most.
Specifically, in an embodiment, analyzing the features of the content photographed by the user may specifically be analyzing the pictures and videos stored in the user's album. Specifically, the pictures and videos stored in the user's album are analyzed, and metadata information of the pictures and videos, such as portraits (that is, the foregoing faces) and labels (such as the foregoing object classification labels), is extracted; based on the metadata information extracted from the pictures and videos, the features of the content photographed by the user are analyzed to obtain the user portrait data. For example, the user's shooting preferences are analyzed based on the portrait and label information extracted from the pictures and videos to obtain the corresponding user portrait data. In some application scenarios, the obtained user portrait data may also be stored.
In an embodiment, similar to the deep learning algorithm used to analyze the video content of the source video in step 101, the method for analyzing the features of the content photographed by the user may also adopt, but is not limited to, a deep learning algorithm. Specifically, the analysis may include: analyzing the existing pictures and videos in the user's album through deep learning algorithms to extract information such as portraits and labels; and classifying and sorting the set of extracted metadata information to identify the people the user prefers to photograph. In some application scenarios, a content weight rule library may also be updated based on the foregoing preference information; the content weight rule library is then used to further improve the strategy for selecting the user's preferred video content.
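The classify-and-sort step above can be illustrated with simple frequency counting over album metadata. The album records and the normalization scheme below are assumptions made for the example; the actual method relies on deep-learning extraction and a content weight rule library.

```python
# Sketch: derive user-portrait preference weights from album metadata
# by counting how often each face/label appears, then normalizing.
from collections import Counter

def build_user_portrait(album_items):
    """Return per-label preference weights in (0, 1] from label counts."""
    counts = Counter(label for item in album_items for label in item["labels"])
    if not counts:
        return {}
    max_count = max(counts.values())
    return {label: count / max_count for label, count in counts.items()}

album = [
    {"labels": ["person_A", "food"]},
    {"labels": ["person_A", "scenery"]},
    {"labels": ["person_A"]},
    {"labels": ["person_B", "food"]},
]
portrait = build_user_portrait(album)
print(portrait["person_A"])  # the most frequent face gets weight 1.0
```

Such weights could then feed the weight adjustment of step 103: labels that dominate the user's album pull the corresponding metadata dimensions up when segments are scored.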
103. Extract video content from the source video based on the metadata information of the source video and the user portrait data to generate a short video.
In some application scenarios, extracting video content from the source video based on the metadata information of the source video and the user portrait data to generate a short video may specifically be: combining the metadata information of the source video with the user portrait data, extracting the key or highlight content of the source video according to a preference strategy, and intelligently generating a curated short video. The preference strategy may include a strategy that is derived from the foregoing user preference information and is used to screen videos.
Specifically, in combination with the video content dimension classification described in step 101, the overall strategy (such as the preference strategy) for selecting highlight segments of the source video includes: 1) preferentially selecting the video segment with the largest total weight across the video content dimensions; and 2) sorting the weight values of the video segments according to the required duration of the output video, and selecting the video segments that meet the output duration.
In an embodiment, the foregoing extraction of video content from the source video according to the metadata information in the source video and the user portrait data to generate a short video may specifically include: adjusting the weight of each piece of metadata in the source video by using the metadata information in the source video and the user portrait data; and, based on these weights, selecting segment intervals from the source video that fit a preset duration to generate the short video. In other words, it may specifically be: using the metadata information obtained by analyzing the video content of the source video, adjusting the weight of each piece of metadata in the source video in combination with the user portrait data, and, under either a default highlight duration or a duration set through user interaction, selecting highlight segment intervals that fit the duration according to the selection strategy, so as to obtain the above short video.
Specifically, in combination with the video content dimension classification described in step 101 above and the overall strategy for selecting video highlight segments (e.g., the selection strategy), the multi-dimensional selection of video highlight segments may proceed as follows: 1) set a weight for the recognition result of each dimension, where the weight of each dimension can be set and updated through the user portrait data; 2) scan along the timeline and select the interval covered by the most dimensions; 3) among the candidates, select the interval with the largest boundary span as the video priority interval; 4) if the timeline scan in step 2) yields multiple intervals covered by the same number of dimensions, compute a weighted score per dimension and take the interval with the largest score as the video priority interval.
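Steps 1)–4) above can be sketched as a sweep-line scan over the per-dimension recognition intervals. The dimension names, weights, and input layout below are hypothetical examples, not values defined by this application:

```python
def video_priority_interval(dimension_intervals, weights):
    """Sweep the timeline and return the sub-interval covered by the most
    dimensions; ties between coverage counts are broken by the summed
    dimension weights (steps 2)-4)). Input layout is hypothetical:
    dimension name -> (start_s, end_s)."""
    events = []
    for dim, (start, end) in dimension_intervals.items():
        events.append((start, 1, dim))   # dimension opens
        events.append((end, -1, dim))    # dimension closes
    events.sort()

    active = set()
    best_score, best_interval = (0, 0.0), None
    prev_t = None
    for t, delta, dim in events:
        # Score the stretch between consecutive events while coverage is stable.
        if prev_t is not None and active and t > prev_t:
            score = (len(active), sum(weights[d] for d in active))
            if score > best_score:
                best_score, best_interval = score, (prev_t, t)
        if delta == 1:
            active.add(dim)
        else:
            active.discard(dim)
        prev_t = t
    return best_interval

intervals = {"face": (2, 12), "voice": (5, 15), "highlight": (8, 10)}
dim_weights = {"face": 0.5, "voice": 0.3, "highlight": 0.4}
print(video_priority_interval(intervals, dim_weights))  # → (8, 10)
```

In this example the stretch 8–10 s is covered by all three dimensions and is therefore chosen as the video priority interval, matching the timeline-scan behavior illustrated in FIG. 2.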
FIG. 2 is a schematic diagram of an embodiment, provided in an embodiment of this application, of selecting a video priority interval based on video content analysis results.
As shown in FIG. 2, the source video is analyzed to obtain corresponding video content analysis results; for example, face interval recognition, highlight segment interval recognition, fast/slow motion interval recognition, and human voice interval recognition are performed on the source video to obtain the corresponding recognition results. Scanning along the timeline, the video priority interval selected according to the above recognition results is shown in FIG. 2, where the "original video" in FIG. 2 is the above source video.
In the embodiments of this application, the metadata information in the source video is obtained by analyzing the video content of the source video itself, and the user portrait data is obtained by analyzing the characteristics of the content captured by the user. Combining the analysis of the video content itself with the analysis of the characteristics of the user's captured content (that is, the analysis of the user's shooting preferences) makes it possible to capture, to a great extent, the content in the source video that the user cares about, and then to extract the corresponding video segments from the source video to generate a short video. On the one hand, the short video contains the content the user cares about; on the other hand, its duration is shorter than that of the source video. Therefore, browsing and sharing the source video through this short video can not only meet the user's needs but also greatly improve the user experience.
The short video generation method in the embodiments of this application may further include the following optional step 104.
104. Perform video rendering effect processing on the short video according to the metadata information of the short video portion of the source video.
The metadata information of the short video portion of the source video is used to perform video rendering effect processing on the short video. The video rendering effect processing includes, but is not limited to: 1) using the portrait interval information to enlarge the faces in the video, and/or applying filters to the faces in the video; 2) using the human voice interval information to add background music on top of the video's original sound; 3) using the video optical flow information (that is, the fast/slow motion interval information) to add fast/slow motion playback effects to the video. It should be noted that step 104 may be implemented by, but is not limited to, a video playback editor; this application does not impose any limitation on this.
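The three effect rules above can be sketched as a metadata-to-effect dispatch whose output a video editor could consume. The metadata keys and effect names are hypothetical placeholders, not an actual editor API:

```python
def plan_render_effects(metadata):
    """Turn interval metadata into an ordered list of (interval, effect)
    operations. Keys and effect names are hypothetical placeholders."""
    rules = [
        ("portrait_intervals", "zoom_face"),          # rule 1): enlarge faces
        ("voice_intervals", "add_background_music"),  # rule 2): BGM over voice
        ("optical_flow_intervals", "speed_ramp"),     # rule 3): fast/slow motion
    ]
    plan = []
    for key, effect in rules:
        for interval in metadata.get(key, []):
            plan.append((interval, effect))
    return plan

meta = {"portrait_intervals": [(3, 9)], "optical_flow_intervals": [(12, 20)]}
print(plan_render_effects(meta))
# → [((3, 9), 'zoom_face'), ((12, 20), 'speed_ramp')]
```

Keeping the plan as plain data, rather than calling the editor directly, lets the same metadata drive different rendering back ends.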
In the embodiments of this application, by performing video rendering effect processing on the short video, the video effect can be enhanced, and a short video with a better user experience can be obtained.
As described above, the number of source videos may be one or more. To deepen understanding of the short video generation method in the embodiments of this application, the embodiments are described below in combination with two application scenarios: generating a short video from a single video and generating a short video from multiple videos. The details are as follows:
1. Generating a short video from a single video
First, each video segment is selected using the metadata information of the source video and of the curated segments obtained through video analysis; then, based on the selected video segments and the metadata information obtained by analyzing the source video content, post-processing effects are applied to the video segments using that metadata information, ultimately generating a curated short video with enhanced rendering effects.
2. Generating a short video from multiple videos
When a user shoots multiple videos on a single trip, they need to be summarized into a single highlight short video for easy browsing and sharing. Because the total duration of multiple videos is long, the user is allowed to interactively choose the length of the short video to meet browsing and sharing requirements. In the first case, when the user does not set a duration for the curated video, all curated segments are used by default to generate the curated short video content. In the second case, when the user sets a short video duration according to different sharing requirements, the curated video segments are sorted by weight value, and segments that fit within the total duration are selected.
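The two cases above can be sketched as one merge step over the curated segments of all source videos. The `(start, end, weight)` segment layout is a hypothetical representation used only for illustration:

```python
def compile_trip_video(videos, target_duration=None):
    """Merge curated segments from several source videos into one cut.
    Each video is a list of (start_s, end_s, weight) curated segments
    (a hypothetical layout). With no user-set duration, every curated
    segment is kept; otherwise the highest-weight segments that fit
    within the total duration are chosen."""
    all_segments = [seg for video in videos for seg in video]
    if target_duration is None:          # case 1: no user-set duration
        return all_segments
    chosen, remaining = [], target_duration
    for seg in sorted(all_segments, key=lambda s: -s[2]):  # case 2: by weight
        length = seg[1] - seg[0]
        if length <= remaining:
            chosen.append(seg)
            remaining -= length
    return chosen

trip = [[(0, 5, 0.9)], [(0, 8, 0.6), (10, 14, 0.8)]]
print(compile_trip_video(trip, target_duration=10))
# → [(0, 5, 0.9), (10, 14, 0.8)]
```

With no `target_duration`, all three curated segments would be returned, matching the default behavior described for the first case.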
The foregoing description has described in detail the short video generation method in the embodiments of this application; the apparatus for generating a short video provided in the embodiments of this application is described in detail below.
FIG. 3 is a schematic structural diagram of a short video generation apparatus provided by an embodiment of this application.
As shown in FIG. 3, the apparatus 300 for generating a short video in the embodiment of this application includes a processing module 301, where the processing module 301 is configured to perform the following steps: analyzing the video content in a source video to obtain metadata information in the source video; analyzing the characteristics of the content captured by the user to obtain user portrait data; and extracting video content from the source video according to the metadata information in the source video and the user portrait data, to generate a short video.
In a possible implementation, the processing module 301 is specifically configured to: analyze the video stream in the source video to extract metadata information from the video frames; and analyze the audio stream in the source video to extract metadata information from the audio frames, where the metadata information of the source video includes the metadata information in the video frames and the metadata information in the audio frames.
In a possible implementation, the processing module 301 is specifically configured to: analyze the pictures and videos stored in the user's album to extract metadata information from the pictures and videos; and analyze the characteristics of the content captured by the user according to the metadata information in the pictures and videos, to obtain the user portrait data.
In a possible implementation, the processing module 301 is specifically configured to: adjust the weight of each piece of metadata in the source video by using the metadata information in the source video and the user portrait data; and, based on these weights, select segment intervals from the source video that fit a preset duration to generate the short video.
In a possible implementation, the metadata information includes at least one of the following: portrait interval information, human voice interval information, background music interval information, object classification tag information, video optical flow information, and aesthetic score information.
In a possible implementation, the processing module 301 is further configured to: perform video rendering effect processing on the short video according to the metadata information of the short video portion of the source video.
It should be noted that all operations in the short video generation method described in FIG. 1 above can be delegated to the processing module 301 described in FIG. 3; in other words, the processing module 301 described in FIG. 3 can perform all operations in the short video generation method described in FIG. 1.
FIG. 3 above presents one schematic structural diagram of the short video generation apparatus; another schematic structural diagram of the short video generation apparatus is described below with reference to FIG. 4.
FIG. 4 is another schematic structural diagram of a short video generation apparatus provided by an embodiment of this application.
As shown in FIG. 4, the short video generation apparatus 400 in the embodiment of this application includes: a video preprocessing module 401, a video content analysis module 402, a user captured content feature analysis module 403, a video content priority module 404, a metadata information storage module 405, a video preview module 406, and a video storage module 407.
The video preprocessing module 401 is configured to preprocess the source video to separate the video stream and the audio stream of the source video, and may also separate out the duration, frame rate, and the like of the source video. The video content analysis module 402 is configured to perform the operations corresponding to step 101 above: analyzing the video content in the source video to obtain the metadata information in the source video. The user captured content feature analysis module 403 is configured to perform all operations corresponding to step 102 above: analyzing the characteristics of the content captured by the user to obtain the user portrait data. The video content priority module 404 is configured to perform the operations corresponding to step 103 above: extracting video content from the source video according to the metadata information in the source video and the user portrait data, to generate a short video. The metadata information storage module 405 is configured to store the metadata information in the source video, the user portrait data, and the like. The video preview module 406 is configured to perform the operations corresponding to step 104 above: performing video rendering effect processing on the short video according to the metadata information of the short video portion of the source video, and previewing the short video. The video storage module 407 is configured to store the generated short video so that it can subsequently be provided to the user for browsing and sharing. The metadata information storage module 405 and the video storage module 407 may be implemented with the same physical storage medium or with different physical storage media; the embodiments of this application impose no limitation on this.
It should be noted that, because the information exchange between and the execution processes of the modules/units of the above apparatus are based on the same concept as the method embodiments of this application, the technical effects they bring are the same as those of the method embodiments of this application; for specific details, refer to the description in the method embodiments shown earlier in this application, which will not be repeated here.
An embodiment of this application further provides a computer storage medium, where the computer storage medium stores a program, and the program, when executed, performs some or all of the steps recorded in the foregoing method embodiments.
Next, another short video generation apparatus provided in an embodiment of this application is introduced; the apparatus may be a terminal or a chip disposed in a terminal.
Taking a terminal as an example, another short video generation apparatus in an embodiment of this application is described with reference to FIG. 5.
As shown in FIG. 5, the terminal 500 in the embodiment of this application includes: a receiver 501, a transmitter 502, a processor 503, and a memory 504 (the terminal 500 may include one or more processors 503; one processor is taken as an example in FIG. 5). In some embodiments of this application, the receiver 501, the transmitter 502, the processor 503, and the memory 504 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 5.
The memory 504 may include a read-only memory and a random access memory, and provides instructions and data to the processor 503. A part of the memory 504 may also include a non-volatile random access memory (NVRAM). The memory 504 stores an operating system and operation instructions, executable modules or data structures, or a subset or extended set thereof, where the operation instructions may include various operation instructions for implementing various operations. The operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
The processor 503 controls the operation of the terminal; the processor 503 may also be referred to as a central processing unit (CPU). In a specific application, the components of the terminal are coupled together by a bus system, where in addition to a data bus, the bus system may also include a power bus, a control bus, a status signal bus, and so on. For clarity of description, however, the various buses are all referred to as the bus system in the figure.
The methods disclosed in the foregoing embodiments of this application may be applied to the processor 503 or implemented by the processor 503. The processor 503 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by integrated logic circuits of hardware in the processor 503 or by instructions in the form of software. The above processor 503 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 504, and the processor 503 reads the information in the memory 504 and completes the steps of the foregoing methods in combination with its hardware.
The receiver 501 may be configured to receive input digital or character information and to generate signal inputs related to the settings and function control of the terminal; the transmitter 502 may include a display device such as a display screen, and the transmitter 502 may be configured to output digital or character information through an external interface.
In the embodiment of this application, the processor 503 may specifically be the processing module 301 in FIG. 3 above, and is configured to perform all operations in the method embodiment described in FIG. 1 above.
In another possible design, the short video generation apparatus is a chip, and the chip includes a processing unit and a communication unit, where the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, a circuit, or the like. The processing unit may execute computer-executable instructions stored in a storage unit, so that the chip in the terminal performs the method of any one of the foregoing first aspect. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit in the terminal located outside the chip, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM), or the like.
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control program execution of the method of the foregoing first aspect.
In addition, it should be noted that the apparatus embodiments described above are merely illustrative, where the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the drawings of the apparatus embodiments provided in this application, the connection relationships between modules indicate that they have communication connections between them, which may specifically be implemented as one or more communication buses or signal lines.
From the description of the above implementations, those skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, and of course may also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and so on. In general, any function completed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function may also be diverse, such as analog circuits, digital circuits, or dedicated circuits. For this application, however, a software program implementation is the better implementation in most cases. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
In the above embodiments, implementation may be carried out entirely or partly by software, hardware, firmware, or any combination thereof. When software is used, implementation may be carried out entirely or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device such as a server or a data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.

Claims (13)

  1. A method for generating a short video, comprising:
    analyzing video content in a source video to obtain metadata information in the source video;
    analyzing characteristics of content captured by a user to obtain user portrait data; and
    extracting video content from the source video according to the metadata information in the source video and the user portrait data, to generate a short video.
  2. The method according to claim 1, wherein the analyzing video content in a source video to obtain metadata information of the source video comprises:
    analyzing a video stream in the source video and extracting metadata information in video frames; and
    analyzing an audio stream in the source video and extracting metadata information in audio frames, wherein the metadata information of the source video comprises: the metadata information in the video frames and the metadata information in the audio frames.
  3. The method according to claim 1 or 2, wherein the analyzing characteristics of content captured by a user to obtain user portrait data comprises:
    analyzing pictures and videos stored in a user album, and extracting metadata information in the pictures and videos; and
    analyzing characteristics of the content captured by the user according to the metadata information in the pictures and videos, to obtain the user portrait data.
  4. The method according to any one of claims 1 to 3, wherein the extracting video content from the source video according to the metadata information in the source video and the user portrait data to generate a short video comprises:
    adjusting a weight of each piece of metadata in the source video by using the metadata information in the source video and the user portrait data; and
    selecting, based on the weight of each piece of metadata in the source video, segment intervals from the source video that fit a preset duration to generate the short video.
  5. The method according to any one of claims 1 to 4, wherein the metadata information comprises at least one of the following: portrait interval information, human voice interval information, background music interval information, object classification tag information, video optical flow information, and aesthetic score information.
  6. The method according to any one of claims 1 to 5, wherein the method further comprises: performing video rendering effect processing on the short video according to the metadata information of the short video portion of the source video.
  7. A short video generation apparatus, comprising:
    a processing module, configured to: analyze video content in a source video to obtain metadata information in the source video; analyze characteristics of content captured by a user to obtain user portrait data; and extract video content from the source video according to the metadata information in the source video and the user portrait data, to generate a short video.
  8. The apparatus according to claim 7, wherein the processing module is specifically configured to:
    analyze a video stream in the source video and extract metadata information in video frames; and
    analyze an audio stream in the source video and extract metadata information in audio frames, wherein the metadata information of the source video comprises: the metadata information in the video frames and the metadata information in the audio frames.
  9. The apparatus according to claim 7 or 8, wherein the processing module is specifically configured to:
    analyze pictures and videos stored in the user's photo album and extract metadata information from the pictures and videos; and
    analyze characteristics of the content shot by the user according to the metadata information of the pictures and videos to obtain the user portrait data.
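The user-portrait step in claim 9 could be sketched as a simple tag aggregation; the tag vocabulary and the normalization into preference weights are illustrative assumptions, not the patent's actual profiling method.

```python
from collections import Counter

def build_user_portrait(album_items):
    """Aggregate hypothetical content tags (e.g. 'portrait', 'pet',
    'landscape') from the metadata of the user's album pictures and
    videos into normalized preference weights."""
    counts = Counter(tag for item in album_items for tag in item["tags"])
    total = sum(counts.values()) or 1  # avoid division by zero on an empty album
    return {tag: n / total for tag, n in counts.items()}
```

A user whose album is mostly portraits would then get a high `portrait` weight, which later steps can use to bias segment selection.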
  10. The apparatus according to any one of claims 7 to 9, wherein the processing module is specifically configured to:
    adjust the weight of each piece of metadata in the source video using the metadata information of the source video and the user portrait data; and
    select, according to the weight of each piece of metadata in the source video, a segment interval of a preset duration from the source video to generate the short video.
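The weighting-and-selection step in claim 10 could be sketched as follows. The weight-adjustment formula (boosting a tag's base weight by the user's preference for it) and the sliding-window selection are illustrative assumptions; the claim itself does not fix a particular scoring or search strategy.

```python
def score_frames(frame_tags, base_weights, user_portrait):
    """Claim 10, step 1 (sketch): adjust each metadata tag's weight using
    the user portrait, then score every frame as the sum of its adjusted
    tag weights."""
    adjusted = {t: w * (1.0 + user_portrait.get(t, 0.0))
                for t, w in base_weights.items()}
    return [sum(adjusted.get(t, 0.0) for t in tags) for tags in frame_tags]

def select_segment(frame_scores, clip_len):
    """Claim 10, step 2 (sketch): slide a window of `clip_len` frames (the
    preset duration) over the per-frame scores and return the (start, end)
    of the highest-scoring window."""
    best_start = 0
    best_sum = window = sum(frame_scores[:clip_len])
    for start in range(1, len(frame_scores) - clip_len + 1):
        # Update the running window sum incrementally: drop the frame that
        # leaves the window, add the frame that enters it.
        window += frame_scores[start + clip_len - 1] - frame_scores[start - 1]
        if window > best_sum:
            best_sum, best_start = window, start
    return best_start, best_start + clip_len
```

Under this sketch, a user portrait that favours portraits doubles the contribution of portrait frames, so the selected window gravitates toward people rather than scenery.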
  11. The apparatus according to any one of claims 7 to 10, wherein the metadata information includes at least one of the following: portrait interval information, human voice interval information, background music interval information, object classification label information, video optical flow information, and aesthetic score information.
  12. The apparatus according to any one of claims 7 to 11, wherein the processing module is further configured to:
    perform video rendering effect processing on the short video according to the metadata information of the portion of the source video corresponding to the short video.
  13. An apparatus for generating a short video, comprising:
    a processing unit and a storage unit, wherein the storage unit is configured to store computer operation instructions; and
    the processing unit is configured to execute the short video generation method according to any one of claims 1 to 6 by invoking the computer operation instructions.
PCT/CN2020/097520 2019-06-24 2020-06-22 Method and device for generating short video WO2020259449A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910549540.8A CN110418191A (en) 2019-06-24 2019-06-24 Method and device for generating short video
CN201910549540.8 2019-06-24

Publications (1)

Publication Number Publication Date
WO2020259449A1 true WO2020259449A1 (en) 2020-12-30

Family

ID=68359639

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097520 WO2020259449A1 (en) 2019-06-24 2020-06-22 Method and device for generating short video

Country Status (2)

Country Link
CN (1) CN110418191A (en)
WO (1) WO2020259449A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110418191A (en) * 2019-06-24 2019-11-05 华为技术有限公司 A kind of generation method and device of short-sighted frequency
CN111083138B (en) * 2019-12-13 2022-07-12 北京秀眼科技有限公司 Short video production system, method, electronic device and readable storage medium
CN111083525B (en) * 2019-12-27 2022-01-11 恒信东方文化股份有限公司 Method and system for automatically generating intelligent image
CN111327968A (en) * 2020-02-27 2020-06-23 北京百度网讯科技有限公司 Short video generation method, short video generation platform, electronic equipment and storage medium
CN113259708A (en) * 2021-04-06 2021-08-13 阿里健康科技(中国)有限公司 Method, computer device and medium for introducing commodities based on short video
CN115243107B (en) * 2022-07-08 2023-11-21 华人运通(上海)云计算科技有限公司 Method, device, system, electronic equipment and medium for playing short video

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090003712A1 (en) * 2007-06-28 2009-01-01 Microsoft Corporation Video Collage Presentation
CN103813215A (en) * 2012-11-13 2014-05-21 联想(北京)有限公司 Information collection method and electronic device
CN109565613A (en) * 2016-06-24 2019-04-02 谷歌有限责任公司 The interesting moment pieces together in video
CN110418191A (en) * 2019-06-24 2019-11-05 华为技术有限公司 A kind of generation method and device of short-sighted frequency

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100708337B1 (en) * 2003-06-27 2007-04-17 주식회사 케이티 Apparatus and method for automatic video summarization using fuzzy one-class support vector machines
US9313535B2 (en) * 2011-02-03 2016-04-12 Ericsson Ab Generating montages of video segments responsive to viewing preferences associated with a video terminal
CN102184221B (en) * 2011-05-06 2012-12-19 北京航空航天大学 Real-time video abstract generation method based on user preferences
US20160189753A1 (en) * 2013-06-07 2016-06-30 Robert William Mangold System and process for creating multiple unique versions of a video for placement on unique generated web pages and video-sharing web sites
US10171843B2 (en) * 2017-01-19 2019-01-01 International Business Machines Corporation Video segment manager
CN107436921B (en) * 2017-07-03 2020-10-16 李洪海 Video data processing method, device, equipment and storage medium
CN107566907B (en) * 2017-09-20 2019-08-30 Oppo广东移动通信有限公司 Video clipping method, device, storage medium and terminal
CN108038161A (en) * 2017-12-06 2018-05-15 北京奇虎科技有限公司 Information recommendation method, device and computing device based on photograph album

Also Published As

Publication number Publication date
CN110418191A (en) 2019-11-05

Similar Documents

Publication Publication Date Title
WO2020259449A1 (en) Method and device for generating short video
WO2021088510A1 (en) Video classification method and apparatus, computer, and readable storage medium
US10885380B2 (en) Automatic suggestion to share images
WO2021232978A1 (en) Video processing method and apparatus, electronic device and computer readable medium
WO2020119350A1 (en) Video classification method and apparatus, and computer device and storage medium
US10685460B2 (en) Method and apparatus for generating photo-story based on visual context analysis of digital content
US10621755B1 (en) Image file compression using dummy data for non-salient portions of images
US8634603B2 (en) Automatic media sharing via shutter click
WO2018214772A1 (en) Media data processing method and apparatus, and storage medium
EP3477506A1 (en) Video detection method, server and storage medium
JP4228320B2 (en) Image processing apparatus and method, and program
CN113010703B (en) Information recommendation method and device, electronic equipment and storage medium
JP2022523606A (en) Gating model for video analysis
CN111368141B (en) Video tag expansion method, device, computer equipment and storage medium
US9449027B2 (en) Apparatus and method for representing and manipulating metadata
JP2012530287A (en) Method and apparatus for selecting representative images
US20130343618A1 (en) Searching for Events by Attendants
US9754157B2 (en) Method and apparatus for summarization based on facial expressions
CN111382281B (en) Recommendation method, device, equipment and storage medium for content based on media object
KR102313338B1 (en) Apparatus and method for searching image
KR20100018070A (en) Method and apparatus for automatically generating summaries of a multimedia file
US20240127406A1 (en) Image quality adjustment method and apparatus, device, and medium
CN109167939B (en) Automatic text collocation method and device and computer storage medium
US20140379704A1 (en) Method, Apparatus and Computer Program Product for Management of Media Files
CN111046232A (en) Video classification method, device and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20831244

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20831244

Country of ref document: EP

Kind code of ref document: A1