WO2020259449A1 - Method and device for generating short video - Google Patents


Publication number
WO2020259449A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
information
source video
metadata information
user
Prior art date
Application number
PCT/CN2020/097520
Other languages
French (fr)
Chinese (zh)
Inventor
李汤锁
吴珮华
陈绍君
汪新建
周胜丰
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2020259449A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/44012 Processing of video elementary streams involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments

Definitions

  • the embodiments of the present application relate to the field of video processing technology, and in particular to a method and device for generating short videos.
  • the embodiments of the present application provide a short video generation method and device, which are used to generate short videos corresponding to the video clips that users care about when browsing and sharing videos, reduce the time users spend browsing and sharing videos, meet user needs, and improve user experience.
  • the specific technical solutions are as follows:
  • an embodiment of the present application provides a short video generation method, including: analyzing the video content in the source video to obtain metadata information of the source video; analyzing the characteristics of the content shot by the user to obtain user portrait data; and, according to the metadata information of the source video and the user portrait data, extracting video content from the source video to generate a short video.
  • a user portrait refers to an understanding of the pictures and videos shot by the user, learned in order to determine the types of content the user shoots (people, scenery, food, parties, etc.), the user's preferences (frequently shot people, composition styles, etc.), and the user's shooting habits.
  • it is easy to understand that the metadata information of the source video is obtained by analyzing the video content of the source video itself, while the user portrait data is obtained by analyzing the characteristics of the content shot by the user.
  • by combining the analysis of the video content itself with the analysis of the user's shooting characteristics, the content that the user cares about in the source video can be identified with high probability, and the corresponding video clips can then be extracted from the source video to generate a short video.
  • on the one hand, the short video contains the content that the user cares about; on the other hand, the short video is shorter than the source video. Therefore, browsing and sharing the source video through the short video not only meets user needs but also greatly improves the user experience.
  • the source video may be one or more videos.
  • the aforementioned metadata information includes but is not limited to at least one of the following: portrait interval information, human voice interval information, object classification label information, video optical flow information, and aesthetic rating information.
  • the video content in the source video can be comprehensively analyzed from multiple dimensions, thereby increasing the probability of obtaining the content that the user cares about, better meeting user needs, and improving user experience.
  • the portrait interval information includes, but is not limited to, the face interval information.
  • the foregoing analysis of the video content in the source video to obtain the metadata information of the source video may specifically include: analyzing the video stream in the source video to extract the metadata information in the video frames; and analyzing the audio stream in the source video to extract the metadata information in the audio frames. The metadata information of the source video includes the metadata information in the video frames and the metadata information in the audio frames.
  • the metadata information in the video frames may specifically include but is not limited to at least one of the following: portrait interval information, object classification label information, video optical flow information, and aesthetic rating information.
  • the metadata information in the audio frames may specifically include, but is not limited to, at least one of the following: human voice interval information and background music interval information.
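The split between video-frame and audio-frame metadata described above can be pictured as a small data model. The following Python sketch is purely illustrative; the class and field names are assumptions for exposition, not structures defined in this application:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical containers mirroring the metadata dimensions named above.
# Intervals are (start, end) timestamps in seconds.

@dataclass
class VideoFrameMetadata:
    portrait_intervals: List[Tuple[float, float]] = field(default_factory=list)      # face/portrait spans
    object_labels: List[str] = field(default_factory=list)                           # object classification labels
    optical_flow_intervals: List[Tuple[float, float]] = field(default_factory=list)  # fast/slow-motion spans
    aesthetic_scores: List[float] = field(default_factory=list)                      # aesthetic ratings

@dataclass
class AudioFrameMetadata:
    voice_intervals: List[Tuple[float, float]] = field(default_factory=list)         # human voice spans
    music_intervals: List[Tuple[float, float]] = field(default_factory=list)         # background music spans

@dataclass
class SourceVideoMetadata:
    video: VideoFrameMetadata
    audio: AudioFrameMetadata
```

A downstream selection step would then read both sub-records of one `SourceVideoMetadata` object rather than two unrelated analysis outputs.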
  • the foregoing method for analyzing video content in the source video includes, but is not limited to, a deep learning algorithm.
  • the source video is analyzed based on both the audio stream and the video stream, which improves the analysis of the source video, yields more accurate metadata information, better meets user needs, and improves the user experience.
  • the analysis dimensions of the video content of the source video in this implementation may include, but are not limited to, the video stream and the audio stream, and may also include aspects such as video theme and/or video style; this application imposes no restriction on this.
  • the above-mentioned analysis of the characteristics of the content shot by the user to obtain user portrait data may specifically include: analyzing the pictures and videos stored in the user's album to extract the metadata information in the pictures and videos; and, according to that metadata information, analyzing the characteristics of the content shot by the user to obtain the user portrait data.
  • the user portrait data may include but is not limited to: preference information corresponding to the person and/or object that the user prefers when shooting.
  • the foregoing method for analyzing the characteristics of the user's shooting content includes, but is not limited to, a deep learning algorithm.
  • in this way, the user portrait data can be obtained more accurately and the user's shooting preferences can be precisely analyzed, so as to better meet user needs and improve user experience.
  • the foregoing extraction of video content from the source video to generate a short video according to the metadata information of the source video and the user portrait data may specifically include: using the metadata information of the source video and the user portrait data to adjust the weight of each item of metadata in the source video; and, according to those weights, selecting from the source video a segment interval that meets a preset duration to generate the short video.
  • specifically, this may be: using the metadata information obtained by analyzing the video content of the source video, adjusting the weight of each item of metadata in combination with the user portrait data, and selecting a short video that meets the target duration through an optimization strategy, where the optimization strategy is derived from the user's shooting preferences in the user portrait data and is used to filter video segments.
  • the above scheme can be used in scenarios where the highlight-segment duration takes a default value or is set interactively by the user.
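As a rough illustration of the weight-adjustment step above, the following Python sketch boosts a segment's weight when its labels match the user's preferences. The function name, the additive scoring scheme, and the data layout are all assumptions made for illustration, not the patent's actual algorithm:

```python
def adjust_weights(segment_metadata, portrait_preferences, base_weight=1.0):
    """Combine per-segment metadata with user portrait data.

    segment_metadata: dict mapping segment id -> list of metadata labels
                      found in that segment (portraits, object tags, ...).
    portrait_preferences: dict mapping label -> preference score derived
                          from the user portrait data.
    Returns a dict mapping segment id -> adjusted weight.
    """
    weights = {}
    for seg_id, labels in segment_metadata.items():
        w = base_weight
        for label in labels:
            # Segments containing content the user prefers gain weight.
            w += portrait_preferences.get(label, 0.0)
        weights[seg_id] = w
    return weights
```

For example, a segment showing a frequently photographed person would end up with a larger weight than a generic scenery segment, so the duration-constrained selection step would favor it.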
  • the foregoing short video generation method may further include: performing video rendering effect processing on the short video according to the metadata information of the short-video portion of the source video.
  • by performing video rendering effect processing on the short video, the video effects can be enhanced, yielding a short video with a better user experience.
  • an embodiment of the present application provides a short video generation device.
  • the video generation device may include an entity such as a terminal device or a chip.
  • the video generation device includes a processor and a memory; the memory is used to store instructions;
  • the processor is configured to execute the instructions in the memory, so that the video generation device executes the method described in the foregoing first aspect.
  • an embodiment of the present application provides a computer-readable storage medium having instructions stored in the computer-readable storage medium, which when run on a computer, cause the computer to execute the method described in the first aspect.
  • the embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the method described in the first aspect.
  • FIG. 1 is a flowchart of an embodiment of a short video generation method provided by an embodiment of the application
  • FIG. 2 is a schematic diagram of an embodiment of selecting a video priority interval based on a result of video content analysis provided in an embodiment of the application;
  • FIG. 3 is a schematic structural diagram of a short video generating device provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of another structure of a short video generation device provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of another structure of the short video generation device provided by an embodiment of the application.
  • the embodiments of the application provide a short video generation method and device, which are used to generate short videos corresponding to the video clips that users care about when browsing and sharing videos, reducing the time users spend browsing and sharing videos, meeting user needs, and improving user experience.
  • Fig. 1 is a flowchart of an embodiment of a short video generation method provided in an embodiment of the application.
  • the short video generation method in the embodiment of the present application includes:
  • the metadata information may specifically include at least one of the following: portrait interval information, human voice interval information, object classification label information, video optical flow information, aesthetic rating information, and the like.
  • the portrait interval information includes, but is not limited to, the face interval information.
  • the source video can be one or more videos.
  • the specific operation of analyzing the video content in the source video may include, but is not limited to: analyzing the video stream in the source video to extract the metadata information in the video frames;
  • and analyzing the audio stream in the source video to extract the metadata information in the audio frames.
  • the metadata information of the source video includes the metadata information in the video frames and the metadata information in the audio frames.
  • the metadata information in the video frames may include at least one of the following: portrait interval information, object classification label information, video optical flow information, and aesthetic rating information;
  • the metadata information in the audio frames may specifically include, but is not limited to, at least one of the following: human voice interval information, background music interval information, and the like.
  • metadata information obtained by analysis in the source video can also be stored.
  • the foregoing intelligent analysis of the video content in the source video may be specifically implemented by a deep learning algorithm.
  • the video stream may be analyzed as follows: extract video frames from the source video; analyze the extracted frames with deep learning algorithms such as face detection, face clustering, object detection, aesthetic scoring, and optical flow analysis to obtain recognition results; and organize and merge those recognition results to obtain the metadata information of the video stream, such as face interval information, object classification label information, video optical flow information (which may also be called fast/slow-motion interval information), and aesthetic rating information;
  • the audio stream may be analyzed as follows: extract the metadata information of the audio stream, such as human voice interval information and background music interval information, through audio processing algorithms such as natural language processing (NLP).
  • the source video needs to be preprocessed to separate the video stream and the audio stream in the source video; the duration and frame rate of the source video may also be extracted at this stage, and this application imposes no restriction on this.
  • the number of source videos in the embodiment of the present application may be one or more, which imposes no limitation on the embodiments of the present application.
  • the above-mentioned analysis dimensions of the video content of the source video may include, but are not limited to, the video stream and the audio stream, and may also include aspects such as video theme and/or video style; this application imposes no restriction on this.
  • the specific video content dimension classification may include: 1) video theme information, such as birthday, party, graduation, night tour, sports, travel, parent-child, performance, etc.; 2) video style information, such as joyful, nostalgic, brisk, playful, etc.; 3) video stream information, such as the aforementioned portrait interval information, object classification label information, video optical flow information, and aesthetic rating information; 4) audio stream information, such as the aforementioned human voice interval information and background music interval information.
  • the metadata information described in this application may also include: the above-mentioned video theme information and video style information.
  • Analyze the characteristics of the user's shooting content (for example, through intelligent analysis) to obtain user portrait data, where the user portrait refers to an understanding of the pictures and videos shot by the user, learned in order to determine the types of content the user shoots (people, landscapes, food, parties, etc.), the user's preferences (frequently shot people, composition styles, etc.), and the user's habits. For example, if a specific person A appears in the most pictures, person A is the person the user cares about most; similarly, if a specific object B appears in the most pictures, object B is the object the user cares about most.
  • analyzing the characteristics of the content shot by the user may specifically mean analyzing the pictures and videos stored in the user's album: extract the metadata information in the pictures and videos, such as portraits (that is, the above-mentioned faces) and tags (such as the above-mentioned object classification labels);
  • then, based on the metadata information extracted from the pictures and videos, analyze the characteristics of the user's shooting content to obtain the user portrait data, for example, by analyzing the user's shooting preferences from the extracted portrait and tag information.
  • the acquired user portrait data can also be stored.
  • the method for analyzing the characteristics of the content shot by the user may also be, but is not limited to, a deep learning algorithm.
  • the analysis of the user's shooting characteristics may include: analyzing the existing pictures and videos in the user's album through deep learning algorithms to extract portraits, tags, and other information from them; then aggregating and sorting the extracted metadata information to derive the user's shooting preferences.
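The "aggregate and sort" step above can be sketched with simple frequency counting, matching the earlier example that the most photographed person is the one the user cares about most. The function name, normalization, and `top_k` cutoff are illustrative assumptions:

```python
from collections import Counter

def build_user_portrait(album_metadata, top_k=3):
    """Derive a crude preference profile from extracted album metadata.

    album_metadata: list of per-picture/per-video label lists produced by
                    the extraction step (portraits, object tags, ...).
    Returns the top_k most frequent labels mapped to normalized scores.
    """
    counts = Counter(label for labels in album_metadata for label in labels)
    total = sum(counts.values())
    # Normalize raw counts into preference scores in [0, 1].
    return {label: n / total for label, n in counts.most_common(top_k)}
```

A real system would likely weight recency, cluster faces before counting, and so on; this only shows the counting-and-ranking idea.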
  • according to the metadata information of the source video and the user portrait data, video content is extracted from the source video to generate a short video. Specifically, the metadata information of the source video can be combined with the user portrait data,
  • and an optimization strategy can then extract the key or essential content from the source video and intelligently generate a selected short video.
  • the optimization strategy may include a strategy derived from the aforementioned user preference information and used to screen videos.
  • the overall strategy (such as the optimization strategy) for selecting video highlights from the source video includes: 1) preferentially selecting the video segments with the largest total weight across the video content dimensions; 2) sorting the candidate video clips by weight according to the required duration of the output video, and selecting the clips that fit within that duration.
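The two-step strategy above (prefer the largest total weight, then fit the output duration) can be sketched as a greedy selection. The tie-breaking and overlap handling here are simplifying assumptions, not the patent's specified procedure:

```python
def select_highlights(clips, target_duration):
    """Pick high-weight clips that fit within target_duration.

    clips: list of (start, end, weight) tuples, times in seconds.
    Sorts by weight descending, greedily keeps clips while the total
    duration fits, then returns the kept clips in timeline order.
    """
    chosen, used = [], 0.0
    for start, end, weight in sorted(clips, key=lambda c: c[2], reverse=True):
        length = end - start
        if used + length <= target_duration:
            chosen.append((start, end, weight))
            used += length
    return sorted(chosen)  # restore chronological order for playback
```

A production implementation would more likely solve this as a knapsack-style optimization, but the greedy form captures the sort-by-weight, fill-to-duration behavior described above.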
  • the foregoing extraction of video content from the source video to generate a short video according to the metadata information and user portrait data may specifically include: using the metadata information of the source video and the user portrait data to adjust the weight of each item of metadata in the source video; then, according to those weights, selecting from the source video a segment that meets the preset duration to generate the short video.
  • specifically: use the metadata information obtained by analyzing the video content of the source video, adjust the weight of each item of metadata in combination with the user portrait data, and, in scenarios where the highlight duration takes a default value or is set interactively by the user, select the best segment interval matching that duration through the optimization strategy to obtain the aforementioned short video.
  • the selection of video highlights can be performed through the following multi-dimensional selection steps.
  • FIG. 2 is a schematic diagram of an embodiment of selecting a video priority interval based on a result of video content analysis provided in an embodiment of the application.
  • the source video is analyzed to obtain the corresponding video content analysis results; for example, face segment recognition, highlight segment recognition, fast/slow-motion segment recognition, and human voice segment recognition are performed on the source video to obtain the corresponding recognition results.
  • the video priority interval selected according to the above recognition results is shown in FIG. 2, where the "original video" described in FIG. 2 is the above-mentioned source video.
  • the metadata information of the source video is obtained by analyzing the video content of the source video, and the user portrait data is obtained by analyzing the characteristics of the content shot by the user.
  • combining the above analysis of the video content itself with the analysis of the user's shooting characteristics makes it possible to identify, with high probability, the content in the source video that the user cares about, and the corresponding video clips can then be extracted from the source video to generate a short video.
  • on the one hand, the short video contains the content that the user cares about; on the other hand, the short video is shorter than the source video. Therefore, browsing and sharing the source video through the short video not only meets user needs but also greatly improves the user experience.
  • the short video generation method in the embodiment of the present application may further include the following optional step 104.
  • the video rendering effect processing includes but is not limited to: 1) using the portrait interval information to zoom in on faces in the video and/or apply filters to faces in the video; 2) using the human voice interval information to add background music on top of the video's original sound; 3) using the video optical flow information (that is, the fast/slow-motion interval information) to add fast or slow motion playback effects to the video.
  • the above step 104 can be implemented by, but is not limited to, a video playback editor; this application imposes no limitation on this.
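The three rendering cases listed above amount to a mapping from metadata intervals to effects. The following sketch only illustrates that dispatch; the effect names and the metadata dictionary layout are assumptions, and actual frame processing would be done by a video editor or rendering engine:

```python
def plan_render_effects(metadata):
    """Translate short-video metadata into a list of (effect, interval) pairs.

    metadata: dict with optional keys "portrait_intervals",
    "voice_intervals", and "optical_flow_intervals", each a list of
    (start, end) spans in seconds.
    """
    effects = []
    for interval in metadata.get("portrait_intervals", []):
        effects.append(("zoom_face", interval))            # case 1: enlarge/filter faces
    for interval in metadata.get("voice_intervals", []):
        effects.append(("mix_background_music", interval)) # case 2: music over original sound
    for interval in metadata.get("optical_flow_intervals", []):
        effects.append(("speed_ramp", interval))           # case 3: fast/slow-motion playback
    return effects
```

The returned plan could then be handed to the video playback editor mentioned above to apply the effects per interval.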
  • the video effect can be enhanced, and a short video with a better user experience effect can be obtained.
  • the number of source videos as described above can be one or more.
  • the following describes the embodiments of the present application in combination with the application scenarios of generating short videos from a single video and generating short videos from multiple videos. The details are as follows:
  • 1. Generating a short video from a single video
  • the user is allowed to interactively select the length of the short video to meet the user's requirements for browsing and sharing time. Case one: when the user does not set a duration, all selected clips generate the selected short video content by default. Case two: when the user sets a short video duration according to different sharing requirements, the selected video clips are sorted by weight and the clips that fit the total duration are selected.
  • FIG. 3 is a schematic structural diagram of a short video generation device provided by an embodiment of the application.
  • the apparatus 300 for generating short videos in the embodiment of the present application includes a processing module 301, and the processing module 301 is configured to perform the following steps: analyze the video content in the source video to obtain the metadata information of the source video; analyze the characteristics of the content shot by the user to obtain user portrait data; and, according to the metadata information of the source video and the user portrait data, extract video content from the source video to generate a short video.
  • the processing module 301 is specifically configured to: analyze the video stream in the source video to extract the metadata information in the video frames; and analyze the audio stream in the source video to extract the metadata information in the audio frames, where the metadata information of the source video includes the metadata information in the video frames and the metadata information in the audio frames.
  • the processing module 301 is specifically configured to: analyze the pictures and videos stored in the user's photo album and extract the metadata information in the pictures and videos; and, according to that metadata information, analyze the characteristics of the content shot by the user to obtain the user portrait data.
  • the processing module 301 is specifically configured to: use the metadata information of the source video and the user portrait data to adjust the weight of each item of metadata in the source video; and, according to those weights, select from the source video a segment that meets the preset duration to generate the short video.
  • the metadata information includes at least one of the following: portrait interval information, human voice interval information, background music interval information, object classification label information, video optical flow information, and aesthetic rating information.
  • the processing module 301 is further configured to: perform video rendering effect processing on the short video according to the metadata information of the short video part in the source video.
  • FIG. 3 introduced one schematic structural diagram of the short video generating device; another schematic structural diagram is described below in conjunction with FIG. 4.
  • FIG. 4 is a schematic diagram of another structure of a short video generation device provided by an embodiment of the application.
  • the short video generation device 400 in the embodiment of the present application includes: a video preprocessing module 401, a video content analysis module 402, a user shooting content feature analysis module 403, a video content priority module 404, and a metadata information storage module 405, a video preview module 406, and a video storage module 407.
  • the video preprocessing module 401 is used to preprocess the source video to separate the video stream and the audio stream in the source video, and also to extract the duration and frame rate of the source video;
  • the video content analysis module 402 is used to: perform the above step 101 to analyze the video content in the source video, and obtain the operation corresponding to the metadata information in the source video;
  • the user shooting content feature analysis module 403 is used to perform the operation corresponding to the above step 102: analyzing the characteristics of the content shot by the user to obtain user portrait data;
  • the video content priority module 404 is used to perform the operation corresponding to the above step 103: extracting video content from the source video according to the metadata information of the source video and the user portrait data to generate a short video;
  • the metadata information storage module 405 is used to store metadata information and user portrait data in the source video;
  • the video preview module 406 is used to perform the operation corresponding to the above step 104: performing video rendering effect processing on the short video according to the metadata information of the short-video portion of the source video, and previewing the short video;
  • An embodiment of the present application further provides a computer storage medium, wherein the computer storage medium stores a program, and the program executes a part or all of the steps recorded in the foregoing method embodiment.
  • the device may be a terminal or a chip set in the terminal.
  • the terminal 500 in the embodiment of the present application includes: a receiver 501, a transmitter 502, a processor 503, and a memory 504 (the number of processors 503 in the terminal 500 may be one or more; one processor is taken as an example in FIG. 5).
  • the receiver 501, the transmitter 502, the processor 503, and the memory 504 may be connected through a bus or other methods, wherein the connection through a bus is taken as an example in FIG. 5.
  • the memory 504 may include a read-only memory and a random access memory, and provides instructions and data to the processor 503. A part of the memory 504 may also include a non-volatile random access memory (NVRAM).
  • the memory 504 stores an operating system and operating instructions, executable modules or data structures, or a subset of them, or an extended set of them, where the operating instructions may include various operating instructions for implementing various operations.
  • the operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
  • the processor 503 controls the operation of the terminal, and the processor 503 may also be referred to as a central processing unit (CPU).
  • the various components of the terminal are coupled together through a bus system.
  • the bus system may also include a power bus, a control bus, and a status signal bus.
  • various buses are referred to as bus systems in the figure.
  • the method disclosed in the foregoing embodiments of the present application may be applied to the processor 503 or implemented by the processor 503.
  • the processor 503 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by hardware integrated logic circuits in the processor 503 or instructions in the form of software.
  • the aforementioned processor 503 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 504, and the processor 503 reads the information in the memory 504, and completes the steps of the foregoing method in combination with its hardware.
  • the receiver 501 can be used to receive input digital or character information, and to generate signal input related to terminal settings and function control.
  • the transmitter 502 can include display devices such as a display screen.
  • the transmitter 502 can be used to output digital or character information through an external interface.
  • the processor 503 may specifically be the processing module 301 in FIG. 3, and is configured to perform all operations in the method embodiment described in FIG. 1.
  • the short video generation device is a chip
  • the chip includes: a processing unit and a communication unit
  • the processing unit may be, for example, a processor
  • the communication unit may be, for example, an input/output interface, pins, or circuits.
  • the processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip in the terminal executes the method of any one of the foregoing first aspect.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit in the terminal located outside the chip, such as a read-only memory (ROM) or a random access memory (RAM).
  • the processor mentioned in any of the above can be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the program of the method in the first aspect.
  • the device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units
  • the physical units can be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which can be specifically implemented as one or more communication buses or signal lines.
  • this application can be implemented by means of software plus the necessary general-purpose hardware.
  • it can also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like.
  • in general, any function completed by a computer program can easily be implemented with corresponding hardware.
  • the specific hardware structures used to achieve the same function can also be diverse, such as analog circuits, digital circuits, or dedicated circuits.
  • however, in most cases, implementation by a software program is the better implementation.
  • the technical solutions of this application, in essence, or the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions to enable a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) manner.
  • the computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Disclosed are a method and device for generating a short video, for use in generating a short video corresponding to the video segments that a user cares about when browsing and sharing a video, thereby reducing the time the user spends browsing and sharing the video, meeting user needs, and improving user experience. The method for generating a short video comprises: analyzing video content in a source video to acquire metadata information of the source video; analyzing features of content photographed by a user to acquire user portrait data; and extracting video content from the source video on the basis of the metadata information of the source video and the user portrait data to generate a short video.

Description

Method and device for generating short video
This application claims priority to Chinese Patent Application No. 201910549540.8, filed with the State Intellectual Property Office of China on June 24, 2019 and entitled "Method and Device for Generating Short Video", which is incorporated herein by reference in its entirety.
Technical Field
The embodiments of this application relate to the field of video processing technologies, and in particular, to a method and device for generating a short video.
Background
With the popularization of various mobile terminals and the development of mobile social media, shooting, browsing, and sharing videos through the camera carried in a mobile terminal has become one of the more frequent activities of terminal users.
Usually, a large number of pictures and videos are stored in the mobile terminal used by a user. When browsing a video, what the user really cares about is only one or more video segments in the entire video, while the remaining content is not of interest to the user. The user has to browse the entire video to reach the segments that the user really cares about, which consumes a lot of time and energy. Similarly, video sharing is performed on the basis of video browsing and also consumes a lot of time and energy. Therefore, both video browsing and video sharing greatly affect user experience.
Summary
To resolve the foregoing technical problems, the embodiments of this application provide a method and device for generating a short video, which are used to generate a short video corresponding to the video segments that a user cares about when browsing and sharing a video, reduce the time the user spends browsing and sharing the video, meet user needs, and improve user experience. The specific technical solutions are as follows:
In a first aspect, an embodiment of this application provides a method for generating a short video, including: analyzing video content in a source video to obtain metadata information of the source video; analyzing features of content photographed by a user to obtain user portrait data; and extracting video content from the source video based on the metadata information of the source video and the user portrait data to generate a short video. Optionally, a user portrait refers to the type (people, scenery, food, parties, and the like), preferences (for example, a particular person appearing frequently, or favored composition styles), and habits of the content photographed by the user, learned by understanding the pictures and videos taken by the user.
As can be seen from the technical solution of the first aspect, the metadata information of the source video is obtained by analyzing the video content of the source video itself, and the user portrait data is obtained by analyzing the features of the content photographed by the user. It is easy to understand that combining the analysis of the video content itself with the feature analysis of the user's photographed content (that is, an analysis of the user's shooting preferences) makes it possible to identify the content in the source video that the user cares about, and then to extract the corresponding video segments from the source video to generate a short video. On the one hand, the short video contains the content that the user cares about; on the other hand, its duration is shorter than that of the source video. Therefore, browsing and sharing the source video by means of the short video not only meets user needs but also greatly improves user experience. Optionally, the source video may be one or more videos.
In a possible implementation of the first aspect, the foregoing metadata information includes, but is not limited to, at least one of the following: portrait interval information, human voice interval information, object classification label information, video optical flow information, and aesthetic score information. With this implementation, the video content in the source video can be comprehensively analyzed from multiple dimensions, which increases the probability of obtaining the content that the user cares about, better meets user needs, and improves user experience. The portrait interval information includes, but is not limited to, face interval information.
In a possible implementation of the first aspect, analyzing the video content in the source video to obtain the metadata information of the source video may specifically include: analyzing a video stream in the source video to extract metadata information of video frames; and analyzing an audio stream in the source video to extract metadata information of audio frames, where the metadata information of the source video includes the metadata information of the video frames and the metadata information of the audio frames. Optionally, the metadata information of the video frames may specifically include, but is not limited to, at least one of the following: portrait interval information, object classification label information, video optical flow information, and aesthetic score information. Optionally, the metadata information of the audio frames may specifically include, but is not limited to, at least one of the following: human voice interval information and background music interval information. Optionally, the method for analyzing the video content in the source video includes, but is not limited to, a deep learning algorithm. With this implementation, the source video is analyzed in terms of both the audio stream and the video stream, which improves the analysis of the source video, yields more accurate metadata information, better meets user needs, and improves user experience.
It should be noted that, in this implementation, the analysis dimensions of the video content of the source video may include, but are not limited to, the video stream and the audio stream, and may further include the video theme and/or the video style, which is not limited in this application.
In a possible implementation of the first aspect, analyzing the features of the content photographed by the user to obtain the user portrait data may specifically include: analyzing the pictures and videos stored in the user's album to extract metadata information of the pictures and videos; and analyzing, based on the metadata information of the pictures and videos, the features of the content photographed by the user to obtain the user portrait data. Optionally, the user portrait data may include, but is not limited to, preference information corresponding to the people and/or objects that the user prefers to photograph. Optionally, the method for analyzing the features of the content photographed by the user includes, but is not limited to, a deep learning algorithm. In this implementation, by performing big-data analysis on the pictures and videos stored in the user's album and extracting their metadata information, the user portrait data can be obtained more accurately and the user's shooting preferences can be analyzed precisely, thereby better meeting user needs and improving user experience.
In a possible implementation of the first aspect, extracting video content from the source video based on the metadata information of the source video and the user portrait data to generate the short video may specifically include: adjusting the weight of each piece of metadata in the source video by using the metadata information of the source video and the user portrait data; and selecting, based on the weights of the metadata in the source video, segment intervals that meet a preset duration from the source video to generate the short video. In other words, this may specifically be: using the metadata information obtained by analyzing the video content of the source video, adjusting the weight of each piece of metadata in the source video in combination with the user portrait data, and selecting a short video that meets the duration through a preference strategy, where the preference strategy is a strategy that is derived from the user's shooting preferences in the user portrait data and is used to screen videos. It should be noted that the foregoing solution may be specifically used in a scenario with a default highlight-segment duration or a duration set through user interaction.
In this implementation, by adjusting the weight of each piece of metadata in the source video and using the weights as one of the bases for generating the short video, the content that the user cares about can be selected, improving the accuracy of short video content selection.
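The weight adjustment and duration-constrained selection described above can be sketched as follows. This is a minimal illustration rather than the implementation of this application: the segment records, metadata dimensions, user-portrait weights, and the greedy selection policy are all assumptions made for the example.

```python
# Minimal sketch (assumed data model): re-weight source-video segments
# with user-portrait weights, then greedily keep the highest-scoring
# segments that fit a target duration, preserving timeline order.

def score_segment(metadata, user_weights):
    """Weighted sum of a segment's per-dimension metadata scores."""
    return sum(user_weights.get(dim, 1.0) * value
               for dim, value in metadata.items())

def select_segments(segments, user_weights, target_duration):
    """Greedily pick the highest-scoring segments within the duration."""
    ranked = sorted(segments,
                    key=lambda s: score_segment(s["metadata"], user_weights),
                    reverse=True)
    chosen, total = [], 0.0
    for seg in ranked:
        if total + seg["duration"] <= target_duration:
            chosen.append(seg)
            total += seg["duration"]
    return sorted(chosen, key=lambda s: s["start"])  # timeline order

segments = [
    {"start": 0,  "duration": 5, "metadata": {"face": 0.9, "aesthetic": 0.6}},
    {"start": 5,  "duration": 8, "metadata": {"face": 0.1, "aesthetic": 0.9}},
    {"start": 13, "duration": 4, "metadata": {"face": 0.8, "aesthetic": 0.7}},
]
user_weights = {"face": 2.0, "aesthetic": 1.0}  # portrait favors faces
clip = select_segments(segments, user_weights, target_duration=10)
print([s["start"] for s in clip])  # → [0, 13]
```

In the actual method, the per-dimension weights would come from the user portrait data and the segment scores from the content analysis of step 101; here both are hard-coded for the sketch.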
In a possible implementation of the first aspect, the foregoing method for generating a short video may further include: performing video rendering effect processing on the short video based on the metadata information of the part of the source video corresponding to the short video. Performing video rendering effect processing on the short video can enhance the video effect and produce a short video with a better user experience.
In a second aspect, an embodiment of this application provides a short video generation device. The video generation device may include an entity such as a terminal device or a chip, and includes a processor and a memory, where the memory is configured to store instructions, and the processor is configured to execute the instructions in the memory, so that the video generation device performs the method described in the first aspect.
In a third aspect, an embodiment of this application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method described in the first aspect.
In a fourth aspect, an embodiment of this application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method described in the first aspect.
Brief Description of the Drawings
FIG. 1 is a flowchart of an embodiment of a method for generating a short video according to an embodiment of this application;
FIG. 2 is a schematic diagram of an embodiment of selecting a video priority interval based on a video content analysis result according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of a short video generation device according to an embodiment of this application;
FIG. 4 is another schematic structural diagram of a short video generation device according to an embodiment of this application;
FIG. 5 is still another schematic structural diagram of a short video generation device according to an embodiment of this application.
Detailed Description
The embodiments of this application provide a method and device for generating a short video, which are used to generate a short video corresponding to the video segments that a user cares about when browsing and sharing a video, reduce the time the user spends browsing and sharing the video, meet user needs, and improve user experience.
The following describes the embodiments of this application with reference to the accompanying drawings.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects, and are not necessarily used to describe a particular order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely a manner of distinguishing between objects with the same attributes in the description of the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device including a series of units is not necessarily limited to those units, but may include other units that are not explicitly listed or that are inherent to the process, method, product, or device.
FIG. 1 is a flowchart of an embodiment of a method for generating a short video according to an embodiment of this application.
As shown in FIG. 1, the method for generating a short video in this embodiment of the application includes the following steps.
101. Analyze the video content in a source video to obtain metadata information of the source video.
The video content of the source video is analyzed (for example, through intelligent analysis), and various kinds of metadata information are extracted from the source video. Optionally, the metadata information may specifically include at least one of the following: portrait interval information, human voice interval information, object classification label information, video optical flow information, aesthetic score information, and the like. The portrait interval information includes, but is not limited to, face interval information. Optionally, the source video may be one or more videos.
Specifically, in an embodiment, the specific operations of analyzing the video content in the source video may include, but are not limited to: analyzing a video stream in the source video to extract metadata information of video frames; and analyzing an audio stream in the source video to extract metadata information of audio frames, where the metadata information of the source video includes the metadata information of the video frames and the metadata information of the audio frames. Optionally, the metadata information of the video frames may include at least one of the following: portrait interval information, object classification label information, video optical flow information, aesthetic score information, and the like; the metadata information of the audio frames may specifically include, but is not limited to, at least one of the following: human voice interval information, background music interval information, and the like. In some application scenarios, the metadata information obtained by analyzing the source video may also be stored.
In an embodiment, the foregoing intelligent analysis of the video content in the source video may be specifically implemented through deep learning algorithms. Specifically, the video stream may be analyzed as follows: video frames are extracted from the source video; the extracted video frames are analyzed with deep learning algorithms such as face detection, face clustering, object detection, aesthetic scoring, and optical flow analysis to obtain recognition results; and the recognition results are sorted and merged to obtain the metadata information of the video stream, such as face interval information, object classification label information, video optical flow information, and aesthetic score information, where the video optical flow information may also be referred to as fast/slow motion interval information. The audio stream may be analyzed as follows: metadata information of the audio stream, such as human voice interval information and background music interval information, is extracted through audio processing algorithms such as natural language processing (NLP) algorithms.
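As one hedged example of the "sort and merge the recognition results into interval metadata" step, per-frame detections can be collapsed into time intervals. The frame indices, frame rate, and gap-bridging rule below are assumptions for illustration, not the specific algorithm of this application.

```python
# Sketch: collapse sorted frame indices with positive detections
# (e.g. frames where a face or a voice was recognized) into
# (start_sec, end_sec) intervals; small gaps are bridged.

def frames_to_intervals(detections, fps=10.0, min_gap=1):
    """detections: sorted frame indices with a positive recognition.
    Gaps of up to min_gap missing frames are bridged into one interval."""
    intervals = []
    start = prev = None
    for idx in detections:
        if start is None:
            start = prev = idx
        elif idx - prev <= min_gap + 1:
            prev = idx
        else:
            intervals.append((start / fps, (prev + 1) / fps))
            start = prev = idx
    if start is not None:
        intervals.append((start / fps, (prev + 1) / fps))
    return intervals

face_frames = [0, 1, 2, 3, 30, 31, 32, 90, 91]
print(frames_to_intervals(face_frames))  # → [(0.0, 0.4), (3.0, 3.3), (9.0, 9.2)]
```

The same interval representation works for face intervals, human voice intervals, or fast/slow motion intervals; only the upstream detector changes.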
Before the video stream and the audio stream in the source video are analyzed as described above, the source video needs to be preprocessed to separate the video stream and the audio stream; the duration, frame rate, and the like of the source video may also be obtained at this point, which is not limited in this application. It should also be noted that the number of source videos in this embodiment of the application may be one or more, which is not limited in this embodiment of the application. It should further be noted that the analysis dimensions of the video content of the source video may include, but are not limited to, the video stream and the audio stream, and may further include the video theme and/or the video style, which is not limited in this application.
The specific video content dimension classification may include: 1) video theme information, such as birthday, party, graduation, night tour, sports, travel, parent-child, and performance; 2) video style information, such as joyful, nostalgic, brisk, and playful; 3) video stream information, such as the foregoing portrait interval information, object classification label information, video optical flow information, and aesthetic score information; and 4) audio stream information, such as the foregoing human voice interval information and background music interval information. It should be noted that the metadata information described in this application may further include the foregoing video theme information, video style information, and the like.
102. Analyze the features of content photographed by the user to obtain user portrait data.
The features of the content photographed by the user are analyzed (for example, through intelligent analysis) to obtain user portrait data, where a user portrait refers to the type (people, scenery, food, parties, and the like), preferences (for example, a particular person appearing frequently, or favored composition styles), and habits of the content photographed by the user, learned by understanding the pictures and videos taken by the user. For example, if a specific portrait A appears in the most pictures, portrait A is the person the user cares about most; similarly, if a specific object B appears in the most pictures, object B is the object the user cares about most.
Specifically, in an embodiment, analyzing the features of the content photographed by the user may specifically be analyzing the pictures and videos stored in the user's album. Specifically, the pictures and videos stored in the user's album are analyzed, and metadata information of the pictures and videos, such as portraits (that is, the foregoing faces) and labels (such as the foregoing object classification labels), is extracted; based on the metadata information extracted from the pictures and videos, the features of the content photographed by the user are analyzed to obtain the user portrait data. For example, the user's shooting preferences are analyzed based on the portrait and label information extracted from the pictures and videos to obtain the corresponding user portrait data. In some application scenarios, the obtained user portrait data may also be stored.
In an embodiment, similar to the deep learning algorithm used to analyze the video content of the source video in step 101, the method for analyzing the features of the content photographed by the user may also adopt, but is not limited to, a deep learning algorithm. Specifically, the analysis may include: analyzing the existing pictures and videos in the user's album through deep learning algorithms to extract information such as portraits and labels; and classifying and sorting the set of extracted metadata information to identify the people the user prefers to photograph. In some application scenarios, a content weight rule library may also be updated based on the foregoing preference information; the content weight rule library is then used to further improve the strategy for selecting the user's preferred video content.
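The classify-and-sort step above can be illustrated with simple frequency counting over album metadata. The album records and the normalization scheme below are assumptions made for the example; the actual method relies on deep-learning extraction and a content weight rule library.

```python
# Sketch: derive user-portrait preference weights from album metadata
# by counting how often each face/label appears, then normalizing.
from collections import Counter

def build_user_portrait(album_items):
    """Return per-label preference weights in (0, 1] from label counts."""
    counts = Counter(label for item in album_items for label in item["labels"])
    if not counts:
        return {}
    max_count = max(counts.values())
    return {label: count / max_count for label, count in counts.items()}

album = [
    {"labels": ["person_A", "food"]},
    {"labels": ["person_A", "scenery"]},
    {"labels": ["person_A"]},
    {"labels": ["person_B", "food"]},
]
portrait = build_user_portrait(album)
print(portrait["person_A"])  # the most frequent face gets weight 1.0
```

Such weights could then feed the weight adjustment of step 103: labels that dominate the user's album pull the corresponding metadata dimensions up when segments are scored.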
103. Extract video content from the source video based on the metadata information of the source video and the user portrait data to generate a short video.
In some application scenarios, extracting video content from the source video based on the metadata information of the source video and the user portrait data to generate a short video may specifically be: combining the metadata information of the source video with the user portrait data, extracting the key or highlight content of the source video according to a preference strategy, and intelligently generating a curated short video. The preference strategy may include a strategy that is derived from the foregoing user preference information and is used to screen videos.
Specifically, in combination with the video content dimension classification described in step 101, the overall strategy (such as the preference strategy) for selecting highlight segments of the source video includes: 1) preferentially selecting the video segment with the largest total weight across the video content dimensions; and 2) sorting the weight values of the video segments according to the required duration of the output video, and selecting the video segments that meet the output duration.
In an embodiment, the foregoing extraction of video content from the source video according to the metadata information in the source video and the user portrait data to generate a short video may specifically include: adjusting the weight of each piece of metadata in the source video by using the metadata information in the source video and the user portrait data; and, based on these weights, selecting segment intervals from the source video that fit a preset duration to generate the short video. In other words, it may specifically be: using the metadata information obtained by analyzing the video content of the source video, adjusting the weight of each piece of metadata in the source video in combination with the user portrait data, and, under either a default highlight duration or a duration set through user interaction, selecting highlight segment intervals that fit the duration according to the selection strategy, so as to obtain the above short video.
Specifically, in combination with the video content dimension classification described in step 101 above and the overall strategy for selecting video highlight segments (e.g., the selection strategy), the multi-dimensional selection of video highlight segments may proceed as follows: 1) set a weight for the recognition result of each dimension, where the weight of each dimension can be set and updated through the user portrait data; 2) scan along the timeline and select the interval covered by the most dimensions; 3) among the candidates, select the interval with the largest boundary span as the video priority interval; 4) if the timeline scan in step 2) yields multiple intervals covered by the same number of dimensions, compute a weighted score per dimension and take the interval with the largest score as the video priority interval.
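Steps 1)–4) above can be sketched as a sweep-line scan over the per-dimension recognition intervals. The dimension names, weights, and input layout below are hypothetical examples, not values defined by this application:

```python
def video_priority_interval(dimension_intervals, weights):
    """Sweep the timeline and return the sub-interval covered by the most
    dimensions; ties between coverage counts are broken by the summed
    dimension weights (steps 2)-4)). Input layout is hypothetical:
    dimension name -> (start_s, end_s)."""
    events = []
    for dim, (start, end) in dimension_intervals.items():
        events.append((start, 1, dim))   # dimension opens
        events.append((end, -1, dim))    # dimension closes
    events.sort()

    active = set()
    best_score, best_interval = (0, 0.0), None
    prev_t = None
    for t, delta, dim in events:
        # Score the stretch between consecutive events while coverage is stable.
        if prev_t is not None and active and t > prev_t:
            score = (len(active), sum(weights[d] for d in active))
            if score > best_score:
                best_score, best_interval = score, (prev_t, t)
        if delta == 1:
            active.add(dim)
        else:
            active.discard(dim)
        prev_t = t
    return best_interval

intervals = {"face": (2, 12), "voice": (5, 15), "highlight": (8, 10)}
dim_weights = {"face": 0.5, "voice": 0.3, "highlight": 0.4}
print(video_priority_interval(intervals, dim_weights))  # → (8, 10)
```

In this example the stretch 8–10 s is covered by all three dimensions and is therefore chosen as the video priority interval, matching the timeline-scan behavior illustrated in FIG. 2.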
FIG. 2 is a schematic diagram of an embodiment, provided in an embodiment of this application, of selecting a video priority interval based on video content analysis results.
As shown in FIG. 2, the source video is analyzed to obtain corresponding video content analysis results; for example, face interval recognition, highlight segment interval recognition, fast/slow motion interval recognition, and human voice interval recognition are performed on the source video to obtain the corresponding recognition results. Scanning along the timeline, the video priority interval selected according to the above recognition results is shown in FIG. 2, where the "original video" in FIG. 2 is the above source video.
In the embodiments of this application, the metadata information in the source video is obtained by analyzing the video content of the source video itself, and the user portrait data is obtained by analyzing the characteristics of the content captured by the user. Combining the analysis of the video content itself with the analysis of the characteristics of the user's captured content (that is, the analysis of the user's shooting preferences) makes it possible to capture, to a great extent, the content in the source video that the user cares about, and then to extract the corresponding video segments from the source video to generate a short video. On the one hand, the short video contains the content the user cares about; on the other hand, its duration is shorter than that of the source video. Therefore, browsing and sharing the source video through this short video can not only meet the user's needs but also greatly improve the user experience.
The short video generation method in the embodiments of this application may further include the following optional step 104.
104. Perform video rendering effect processing on the short video according to the metadata information of the short video portion of the source video.
The metadata information of the short video portion of the source video is used to perform video rendering effect processing on the short video. The video rendering effect processing includes, but is not limited to: 1) using the portrait interval information to enlarge the faces in the video, and/or applying filters to the faces in the video; 2) using the human voice interval information to add background music on top of the video's original sound; 3) using the video optical flow information (that is, the fast/slow motion interval information) to add fast/slow motion playback effects to the video. It should be noted that step 104 may be implemented by, but is not limited to, a video playback editor; this application does not impose any limitation on this.
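The three effect rules above can be sketched as a metadata-to-effect dispatch whose output a video editor could consume. The metadata keys and effect names are hypothetical placeholders, not an actual editor API:

```python
def plan_render_effects(metadata):
    """Turn interval metadata into an ordered list of (interval, effect)
    operations. Keys and effect names are hypothetical placeholders."""
    rules = [
        ("portrait_intervals", "zoom_face"),          # rule 1): enlarge faces
        ("voice_intervals", "add_background_music"),  # rule 2): BGM over voice
        ("optical_flow_intervals", "speed_ramp"),     # rule 3): fast/slow motion
    ]
    plan = []
    for key, effect in rules:
        for interval in metadata.get(key, []):
            plan.append((interval, effect))
    return plan

meta = {"portrait_intervals": [(3, 9)], "optical_flow_intervals": [(12, 20)]}
print(plan_render_effects(meta))
# → [((3, 9), 'zoom_face'), ((12, 20), 'speed_ramp')]
```

Keeping the plan as plain data, rather than calling the editor directly, lets the same metadata drive different rendering back ends.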
In the embodiments of this application, by performing video rendering effect processing on the short video, the video effect can be enhanced, and a short video with a better user experience can be obtained.
As described above, the number of source videos may be one or more. To deepen understanding of the short video generation method in the embodiments of this application, the embodiments are described below in combination with two application scenarios: generating a short video from a single video and generating a short video from multiple videos. The details are as follows:
1. Generating a short video from a single video
First, each video segment is selected using the metadata information of the source video and of the curated segments obtained through video analysis; then, based on the selected video segments and the metadata information obtained by analyzing the source video content, post-processing effects are applied to the video segments using that metadata information, ultimately generating a curated short video with enhanced rendering effects.
2. Generating a short video from multiple videos
When a user shoots multiple videos on a single trip, they need to be summarized into a single highlight short video for easy browsing and sharing. Because the total duration of multiple videos is long, the user is allowed to interactively choose the length of the short video to meet browsing and sharing requirements. In the first case, when the user does not set a duration for the curated video, all curated segments are used by default to generate the curated short video content. In the second case, when the user sets a short video duration according to different sharing requirements, the curated video segments are sorted by weight value, and segments that fit within the total duration are selected.
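The two cases above can be sketched as one merge step over the curated segments of all source videos. The `(start, end, weight)` segment layout is a hypothetical representation used only for illustration:

```python
def compile_trip_video(videos, target_duration=None):
    """Merge curated segments from several source videos into one cut.
    Each video is a list of (start_s, end_s, weight) curated segments
    (a hypothetical layout). With no user-set duration, every curated
    segment is kept; otherwise the highest-weight segments that fit
    within the total duration are chosen."""
    all_segments = [seg for video in videos for seg in video]
    if target_duration is None:          # case 1: no user-set duration
        return all_segments
    chosen, remaining = [], target_duration
    for seg in sorted(all_segments, key=lambda s: -s[2]):  # case 2: by weight
        length = seg[1] - seg[0]
        if length <= remaining:
            chosen.append(seg)
            remaining -= length
    return chosen

trip = [[(0, 5, 0.9)], [(0, 8, 0.6), (10, 14, 0.8)]]
print(compile_trip_video(trip, target_duration=10))
# → [(0, 5, 0.9), (10, 14, 0.8)]
```

With no `target_duration`, all three curated segments would be returned, matching the default behavior described for the first case.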
The foregoing description has described in detail the short video generation method in the embodiments of this application; the apparatus for generating a short video provided in the embodiments of this application is described in detail below.
FIG. 3 is a schematic structural diagram of a short video generation apparatus provided by an embodiment of this application.
As shown in FIG. 3, the apparatus 300 for generating a short video in the embodiment of this application includes a processing module 301, where the processing module 301 is configured to perform the following steps: analyzing the video content in a source video to obtain metadata information in the source video; analyzing the characteristics of the content captured by the user to obtain user portrait data; and extracting video content from the source video according to the metadata information in the source video and the user portrait data, to generate a short video.
In a possible implementation, the processing module 301 is specifically configured to: analyze the video stream in the source video to extract metadata information from the video frames; and analyze the audio stream in the source video to extract metadata information from the audio frames, where the metadata information of the source video includes the metadata information in the video frames and the metadata information in the audio frames.
In a possible implementation, the processing module 301 is specifically configured to: analyze the pictures and videos stored in the user's album to extract metadata information from the pictures and videos; and analyze the characteristics of the content captured by the user according to the metadata information in the pictures and videos, to obtain the user portrait data.
In a possible implementation, the processing module 301 is specifically configured to: adjust the weight of each piece of metadata in the source video by using the metadata information in the source video and the user portrait data; and, based on these weights, select segment intervals from the source video that fit a preset duration to generate the short video.
In a possible implementation, the metadata information includes at least one of the following: portrait interval information, human voice interval information, background music interval information, object classification tag information, video optical flow information, and aesthetic score information.
In a possible implementation, the processing module 301 is further configured to: perform video rendering effect processing on the short video according to the metadata information of the short video portion of the source video.
It should be noted that all operations in the short video generation method described in FIG. 1 above can be delegated to the processing module 301 described in FIG. 3; in other words, the processing module 301 described in FIG. 3 can perform all operations in the short video generation method described in FIG. 1.
FIG. 3 above presents one schematic structural diagram of the short video generation apparatus; another schematic structural diagram of the short video generation apparatus is described below with reference to FIG. 4.
FIG. 4 is another schematic structural diagram of a short video generation apparatus provided by an embodiment of this application.
As shown in FIG. 4, the short video generation apparatus 400 in the embodiment of this application includes: a video preprocessing module 401, a video content analysis module 402, a user captured content feature analysis module 403, a video content priority module 404, a metadata information storage module 405, a video preview module 406, and a video storage module 407.
The video preprocessing module 401 is configured to preprocess the source video to separate the video stream and the audio stream of the source video, and may also separate out the duration, frame rate, and the like of the source video. The video content analysis module 402 is configured to perform the operations corresponding to step 101 above: analyzing the video content in the source video to obtain the metadata information in the source video. The user captured content feature analysis module 403 is configured to perform all operations corresponding to step 102 above: analyzing the characteristics of the content captured by the user to obtain the user portrait data. The video content priority module 404 is configured to perform the operations corresponding to step 103 above: extracting video content from the source video according to the metadata information in the source video and the user portrait data, to generate a short video. The metadata information storage module 405 is configured to store the metadata information in the source video, the user portrait data, and the like. The video preview module 406 is configured to perform the operations corresponding to step 104 above: performing video rendering effect processing on the short video according to the metadata information of the short video portion of the source video, and previewing the short video. The video storage module 407 is configured to store the generated short video so that it can subsequently be provided to the user for browsing and sharing. The metadata information storage module 405 and the video storage module 407 may be implemented with the same physical storage medium or with different physical storage media; the embodiments of this application impose no limitation on this.
It should be noted that, because the information exchange between and the execution processes of the modules/units of the above apparatus are based on the same concept as the method embodiments of this application, the technical effects they bring are the same as those of the method embodiments of this application; for specific details, refer to the description in the method embodiments shown earlier in this application, which will not be repeated here.
An embodiment of this application further provides a computer storage medium, where the computer storage medium stores a program, and the program, when executed, performs some or all of the steps recorded in the foregoing method embodiments.
Next, another short video generation apparatus provided in an embodiment of this application is introduced; the apparatus may be a terminal or a chip disposed in a terminal.
Taking a terminal as an example, another short video generation apparatus in an embodiment of this application is described with reference to FIG. 5.
As shown in FIG. 5, the terminal 500 in the embodiment of this application includes: a receiver 501, a transmitter 502, a processor 503, and a memory 504 (the terminal 500 may include one or more processors 503; one processor is taken as an example in FIG. 5). In some embodiments of this application, the receiver 501, the transmitter 502, the processor 503, and the memory 504 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 5.
The memory 504 may include a read-only memory and a random access memory, and provides instructions and data to the processor 503. A part of the memory 504 may also include a non-volatile random access memory (NVRAM). The memory 504 stores an operating system and operation instructions, executable modules or data structures, or a subset or extended set thereof, where the operation instructions may include various operation instructions for implementing various operations. The operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
The processor 503 controls the operation of the terminal; the processor 503 may also be referred to as a central processing unit (CPU). In a specific application, the components of the terminal are coupled together by a bus system, where in addition to a data bus, the bus system may also include a power bus, a control bus, a status signal bus, and so on. For clarity of description, however, the various buses are all referred to as the bus system in the figure.
The methods disclosed in the foregoing embodiments of this application may be applied to the processor 503 or implemented by the processor 503. The processor 503 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by integrated logic circuits of hardware in the processor 503 or by instructions in the form of software. The above processor 503 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 504, and the processor 503 reads the information in the memory 504 and completes the steps of the foregoing methods in combination with its hardware.
The receiver 501 may be configured to receive input digital or character information and to generate signal inputs related to the settings and function control of the terminal; the transmitter 502 may include a display device such as a display screen, and the transmitter 502 may be configured to output digital or character information through an external interface.
In the embodiment of this application, the processor 503 may specifically be the processing module 301 in FIG. 3 above, and is configured to perform all operations in the method embodiment described in FIG. 1 above.
In another possible design, the short video generation apparatus is a chip, and the chip includes a processing unit and a communication unit, where the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, a circuit, or the like. The processing unit may execute computer-executable instructions stored in a storage unit, so that the chip in the terminal performs the method of any one of the foregoing first aspect. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit in the terminal located outside the chip, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM), or the like.
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control program execution of the method of the foregoing first aspect.
In addition, it should be noted that the apparatus embodiments described above are merely illustrative, where the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. In addition, in the drawings of the apparatus embodiments provided in this application, the connection relationships between modules indicate that they have communication connections between them, which may specifically be implemented as one or more communication buses or signal lines.
From the description of the above implementations, those skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, and of course may also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and so on. In general, any function completed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function may also be diverse, such as analog circuits, digital circuits, or dedicated circuits. For this application, however, a software program implementation is the better implementation in most cases. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.
In the above embodiments, implementation may be carried out entirely or partly by software, hardware, firmware, or any combination thereof. When software is used, implementation may be carried out entirely or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device such as a server or a data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.

Claims (13)

  1. A method for generating a short video, comprising:
    analyzing video content in a source video to obtain metadata information in the source video;
    analyzing characteristics of content captured by a user to obtain user portrait data; and
    extracting video content from the source video according to the metadata information in the source video and the user portrait data, to generate a short video.
  2. The method according to claim 1, wherein the analyzing video content in a source video to obtain metadata information of the source video comprises:
    analyzing a video stream in the source video and extracting metadata information in video frames; and
    analyzing an audio stream in the source video and extracting metadata information in audio frames, wherein the metadata information of the source video comprises: the metadata information in the video frames and the metadata information in the audio frames.
  3. The method according to claim 1 or 2, wherein the analyzing characteristics of content captured by a user to obtain user portrait data comprises:
    analyzing pictures and videos stored in a user album, and extracting metadata information in the pictures and videos; and
    analyzing characteristics of the content captured by the user according to the metadata information in the pictures and videos, to obtain the user portrait data.
  4. The method according to any one of claims 1 to 3, wherein the extracting video content from the source video according to the metadata information in the source video and the user portrait data to generate a short video comprises:
    adjusting a weight of each piece of metadata in the source video by using the metadata information in the source video and the user portrait data; and
    selecting, based on the weight of each piece of metadata in the source video, segment intervals from the source video that fit a preset duration to generate the short video.
  5. The method according to any one of claims 1 to 4, wherein the metadata information comprises at least one of the following: portrait interval information, human voice interval information, background music interval information, object classification tag information, video optical flow information, and aesthetic score information.
  6. The method according to any one of claims 1 to 5, wherein the method further comprises: performing video rendering effect processing on the short video according to the metadata information of the short video portion of the source video.
  7. A short video generation apparatus, comprising:
    a processing module, configured to: analyze video content in a source video to obtain metadata information in the source video; analyze characteristics of content captured by a user to obtain user portrait data; and extract video content from the source video according to the metadata information in the source video and the user portrait data, to generate a short video.
  8. The apparatus according to claim 7, wherein the processing module is specifically configured to:
    analyze a video stream in the source video and extract metadata information in video frames; and
    analyze an audio stream in the source video and extract metadata information in audio frames, wherein the metadata information of the source video comprises: the metadata information in the video frames and the metadata information in the audio frames.
  9. The apparatus according to claim 7 or 8, wherein the processing module is specifically configured to:
    analyze pictures and videos stored in the user's photo album and extract metadata information from the pictures and videos; and
    analyze characteristics of the content shot by the user according to the metadata information of the pictures and videos to obtain the user portrait data.
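The user-portrait step in claim 9 could be sketched as a simple tag aggregation; the tag vocabulary and the normalization into preference weights are illustrative assumptions, not the patent's actual profiling method.

```python
from collections import Counter

def build_user_portrait(album_items):
    """Aggregate hypothetical content tags (e.g. 'portrait', 'pet',
    'landscape') from the metadata of the user's album pictures and
    videos into normalized preference weights."""
    counts = Counter(tag for item in album_items for tag in item["tags"])
    total = sum(counts.values()) or 1  # avoid division by zero on an empty album
    return {tag: n / total for tag, n in counts.items()}
```

A user whose album is mostly portraits would then get a high `portrait` weight, which later steps can use to bias segment selection.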
  10. The apparatus according to any one of claims 7 to 9, wherein the processing module is specifically configured to:
    adjust the weight of each piece of metadata in the source video using the metadata information of the source video and the user portrait data; and
    select, according to the weight of each piece of metadata in the source video, a segment interval of a preset duration from the source video to generate the short video.
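The weighting-and-selection step in claim 10 could be sketched as follows. The weight-adjustment formula (boosting a tag's base weight by the user's preference for it) and the sliding-window selection are illustrative assumptions; the claim itself does not fix a particular scoring or search strategy.

```python
def score_frames(frame_tags, base_weights, user_portrait):
    """Claim 10, step 1 (sketch): adjust each metadata tag's weight using
    the user portrait, then score every frame as the sum of its adjusted
    tag weights."""
    adjusted = {t: w * (1.0 + user_portrait.get(t, 0.0))
                for t, w in base_weights.items()}
    return [sum(adjusted.get(t, 0.0) for t in tags) for tags in frame_tags]

def select_segment(frame_scores, clip_len):
    """Claim 10, step 2 (sketch): slide a window of `clip_len` frames (the
    preset duration) over the per-frame scores and return the (start, end)
    of the highest-scoring window."""
    best_start = 0
    best_sum = window = sum(frame_scores[:clip_len])
    for start in range(1, len(frame_scores) - clip_len + 1):
        # Update the running window sum incrementally: drop the frame that
        # leaves the window, add the frame that enters it.
        window += frame_scores[start + clip_len - 1] - frame_scores[start - 1]
        if window > best_sum:
            best_sum, best_start = window, start
    return best_start, best_start + clip_len
```

Under this sketch, a user portrait that favours portraits doubles the contribution of portrait frames, so the selected window gravitates toward people rather than scenery.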
  11. The apparatus according to any one of claims 7 to 10, wherein the metadata information includes at least one of the following: portrait interval information, human voice interval information, background music interval information, object classification label information, video optical flow information, and aesthetic score information.
  12. The apparatus according to any one of claims 7 to 11, wherein the processing module is further configured to:
    perform video rendering effect processing on the short video according to the metadata information of the portion of the source video corresponding to the short video.
  13. An apparatus for generating a short video, comprising:
    a processing unit and a storage unit, wherein the storage unit is configured to store computer operation instructions; and
    the processing unit is configured to execute the short video generation method according to any one of claims 1 to 6 by invoking the computer operation instructions.
PCT/CN2020/097520 2019-06-24 2020-06-22 Method and device for generating short video WO2020259449A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910549540.8A CN110418191A (en) 2019-06-24 2019-06-24 Method and device for generating short video
CN201910549540.8 2019-06-24

Publications (1)

Publication Number Publication Date
WO2020259449A1 true WO2020259449A1 (en) 2020-12-30

Family

ID=68359639

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097520 WO2020259449A1 (en) 2019-06-24 2020-06-22 Method and device for generating short video

Country Status (2)

Country Link
CN (1) CN110418191A (en)
WO (1) WO2020259449A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110418191A (en) * 2019-06-24 2019-11-05 华为技术有限公司 A kind of generation method and device of short-sighted frequency
CN111083138B (en) * 2019-12-13 2022-07-12 北京秀眼科技有限公司 Short video production system, method, electronic device and readable storage medium
CN111083525B (en) * 2019-12-27 2022-01-11 恒信东方文化股份有限公司 Method and system for automatically generating intelligent image
CN111327968A (en) * 2020-02-27 2020-06-23 北京百度网讯科技有限公司 Short video generation method, short video generation platform, electronic equipment and storage medium
CN113259708A (en) * 2021-04-06 2021-08-13 阿里健康科技(中国)有限公司 Method, computer device and medium for introducing commodities based on short video
CN115243107B (en) * 2022-07-08 2023-11-21 华人运通(上海)云计算科技有限公司 Method, device, system, electronic equipment and medium for playing short video

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090003712A1 (en) * 2007-06-28 2009-01-01 Microsoft Corporation Video Collage Presentation
CN103813215A (en) * 2012-11-13 2014-05-21 联想(北京)有限公司 Information collection method and electronic device
CN109565613A (en) * 2016-06-24 2019-04-02 谷歌有限责任公司 The interesting moment pieces together in video
CN110418191A (en) * 2019-06-24 2019-11-05 华为技术有限公司 A kind of generation method and device of short-sighted frequency

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100708337B1 (en) * 2003-06-27 2007-04-17 주식회사 케이티 Apparatus and method for automatic video summarization using fuzzy one-class support vector machines
US9313535B2 (en) * 2011-02-03 2016-04-12 Ericsson Ab Generating montages of video segments responsive to viewing preferences associated with a video terminal
CN102184221B (en) * 2011-05-06 2012-12-19 北京航空航天大学 Real-time video abstract generation method based on user preferences
US20160189753A1 (en) * 2013-06-07 2016-06-30 Robert William Mangold System and process for creating multiple unique versions of a video for placement on unique generated web pages and video-sharing web sites
US10171843B2 (en) * 2017-01-19 2019-01-01 International Business Machines Corporation Video segment manager
CN107436921B (en) * 2017-07-03 2020-10-16 李洪海 Video data processing method, device, equipment and storage medium
CN107566907B (en) * 2017-09-20 2019-08-30 Oppo广东移动通信有限公司 Video clipping method, device, storage medium and terminal
CN108038161A (en) * 2017-12-06 2018-05-15 北京奇虎科技有限公司 Information recommendation method, device and computing device based on photograph album

Also Published As

Publication number Publication date
CN110418191A (en) 2019-11-05

Similar Documents

Publication Publication Date Title
WO2020259449A1 (en) Method and device for generating short video
WO2021088510A1 (en) Video classification method and apparatus, computer, and readable storage medium
US10885380B2 (en) Automatic suggestion to share images
WO2021232978A1 (en) Video processing method and apparatus, electronic device and computer readable medium
WO2020119350A1 (en) Video classification method and apparatus, and computer device and storage medium
US10685460B2 (en) Method and apparatus for generating photo-story based on visual context analysis of digital content
US10621755B1 (en) Image file compression using dummy data for non-salient portions of images
US8634603B2 (en) Automatic media sharing via shutter click
WO2018214772A1 (en) Media data processing method and apparatus, and storage medium
EP3477506A1 (en) Video detection method, server and storage medium
JP4228320B2 (en) Image processing apparatus and method, and program
CN113010703B (en) Information recommendation method and device, electronic equipment and storage medium
JP2022523606A (en) Gating model for video analysis
CN111368141B (en) Video tag expansion method, device, computer equipment and storage medium
US9449027B2 (en) Apparatus and method for representing and manipulating metadata
JP2012530287A (en) Method and apparatus for selecting representative images
US20130343618A1 (en) Searching for Events by Attendants
US9754157B2 (en) Method and apparatus for summarization based on facial expressions
CN111382281B (en) Recommendation method, device, equipment and storage medium for content based on media object
KR102313338B1 (en) Apparatus and method for searching image
KR20100018070A (en) Method and apparatus for automatically generating summaries of a multimedia file
US20240127406A1 (en) Image quality adjustment method and apparatus, device, and medium
CN109167939B (en) Automatic text collocation method and device and computer storage medium
US20140379704A1 (en) Method, Apparatus and Computer Program Product for Management of Media Files
CN111046232A (en) Video classification method, device and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20831244

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20831244

Country of ref document: EP

Kind code of ref document: A1