CN117014680A - Video generation method, device, equipment, medium and program product

Video generation method, device, equipment, medium and program product

Info

Publication number
CN117014680A
Authority
CN
China
Prior art keywords
video
picture
emotion
segment
time period
Prior art date
Legal status
Pending
Application number
CN202310844602.4A
Other languages
Chinese (zh)
Inventor
陈勃霖
龙良曲
Current Assignee
Insta360 Innovation Technology Co Ltd
Original Assignee
Insta360 Innovation Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Insta360 Innovation Technology Co Ltd
Priority to CN202310844602.4A
Publication of CN117014680A

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Child & Adolescent Psychology (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present application relates to a video generation method, apparatus, device, medium and program product. The method comprises the following steps: acquiring an original video; identifying elements of interest in the original video to obtain a plurality of video segments of interest, each video segment of interest comprising a preset emotion word; and generating a target video corresponding to the original video according to the plurality of video segments of interest. By adopting the method, the efficiency of video editing can be improved.

Description

Video generation method, device, equipment, medium and program product
Technical Field
The present application relates to the field of video editing technologies, and in particular, to a method, an apparatus, a device, a medium, and a program product for generating a video.
Background
With the continuous development of self-media technology, short videos have become an important way of obtaining information owing to their short duration, intuitive content, rich forms and other characteristics.
A short video is usually a clip edited from an originally captured video. In the related art, the captured original video is mainly analyzed through manual editing: segments of interest in the original video are determined, and editing is performed based on those segments to generate the required video.
However, the related art suffers from low efficiency when editing videos.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a video generation method, apparatus, device, medium, and program product that can improve video editing efficiency.
In a first aspect, the present application provides a method for generating a video, the method comprising:
acquiring an original video;
identifying elements of interest in the original video to obtain a plurality of video segments of interest; each video segment of interest comprises a preset emotion word;
and generating a target video corresponding to the original video according to the plurality of video segments of interest.
In one embodiment, the elements of interest include preset emotion words and preset visual elements;
identifying the elements of interest in the original video to obtain a plurality of video segments of interest comprises:
identifying preset emotion words in the original video to obtain a plurality of emotion video segments; and identifying preset visual elements in the original video to obtain a plurality of picture video segments;
and determining each video segment of interest based on each emotion video segment and each picture video segment.
In one embodiment, identifying preset emotion words in an original video to obtain a plurality of emotion video segments includes:
identifying preset emotion words in the original video by using a preset emotion word identification model to obtain a plurality of initial emotion video fragments;
each emotional video segment is determined based on each initial emotional video segment.
In one embodiment, determining each emotional video segment based on each initial emotional video segment includes:
acquiring emotion time intervals of adjacent initial emotion video clips in each initial emotion video clip;
if the emotion time interval is smaller than the preset interval, merging the adjacent initial emotion video segments to obtain each emotion video segment.
In one embodiment, the preset visual elements include expression elements and action elements;
identifying preset visual elements in an original video to obtain a plurality of picture video clips, wherein the method comprises the following steps:
recognizing expression elements in an original video by using a preset expression recognition model to obtain a plurality of expression picture video clips; and/or identifying action elements in the original video by using a preset action identification model to obtain a plurality of action picture video clips;
Each picture video clip is determined based on each expression picture video clip and/or each action picture video clip.
In one embodiment, determining each picture video clip based on each expression picture video clip and/or each action picture video clip includes:
combining the expression picture video clips and the action picture video clips to obtain each picture video clip under the condition that the expression picture video clips and the action picture video clips have overlapping time periods;
in the case where there is no overlapping period of time between the expression picture video clip and the action picture video clip, both the expression picture video clip and the action picture video clip are determined as each picture video clip.
In one embodiment, determining each video segment of interest based on each emotion video segment and each picture video segment comprises:
acquiring an audio time period corresponding to each emotion video segment, and acquiring a picture time period corresponding to each picture video segment;
and determining each video segment of interest based on each audio time period and each picture time period. In one embodiment, determining each video segment of interest based on each audio time period and each picture time period comprises:
Under the condition that an overlapping time period exists between the audio time period and the picture time period, fusing the emotion video segments corresponding to the audio time period and the picture video segments corresponding to the picture time period to obtain each interested video segment;
and under the condition that the overlapping time period does not exist in the audio time period and the picture time period, performing expansion processing on the emotion video segments corresponding to the audio time period, and determining the emotion video segments after expansion as the video segments of interest.
In one embodiment, generating a target video corresponding to the original video according to the plurality of video segments of interest includes:
acquiring a quality quantization value of each video segment of interest;
determining the video segment of interest with the largest quality quantization value as the opening segment of the target video;
and generating the target video corresponding to the original video based on the opening segment and each video segment of interest.
In one embodiment, generating the target video corresponding to the original video based on the opening segment and each video segment of interest includes:
screening out a preset number of candidate video segments of interest from the video segments of interest based on the quality quantization values of the video segments of interest;
and splicing the opening segment and each candidate video segment of interest according to the time sequence corresponding to each candidate video segment of interest, to generate the target video corresponding to the original video.
In a second aspect, the present application also provides a video generating apparatus, where the apparatus includes:
the acquisition module is used for acquiring an original video;
the identification module is used for identifying the elements of interest in the original video to obtain a plurality of video segments of interest; each video segment of interest comprises a preset emotion word;
and the generation module is used for generating a target video corresponding to the original video according to the plurality of video segments of interest.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, implements the steps of the video generation method of any one of the above first aspects.
In a fourth aspect, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the video generation method of any one of the above first aspects.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the video generation method of any one of the above first aspects.
The video generation method, apparatus, device, medium and program product acquire an original video; identify elements of interest in the original video to obtain a plurality of video segments of interest, each containing a preset emotion word; and generate a target video corresponding to the original video according to the plurality of video segments of interest. By identifying the elements of interest in the original video, the video segments of interest containing the preset emotion words can be obtained accurately, so that the target video generated from these segments also contains the preset emotion words; meanwhile, the recognition and generation processes are completed automatically, which reduces the time spent editing the original video and improves the editing efficiency of the target video.
Drawings
FIG. 1 is an application environment diagram of a method of generating video in one embodiment;
FIG. 2 is a flow chart of a method of generating video in one embodiment;
FIG. 3 is a schematic diagram of a video clip corresponding to an emotion word in one embodiment;
FIG. 4 is a schematic diagram of a video clip corresponding to a visual element in one embodiment;
FIG. 5 is a flow chart of a method of generating video in one embodiment;
FIG. 6 is a flow chart of a method of generating video in one embodiment;
FIG. 7 is a flow chart of a method of generating video in one embodiment;
FIG. 8 is a schematic representation of emotional time intervals of adjacent initial emotional video segments in one embodiment;
FIG. 9 is a schematic diagram of an emotional video clip in one embodiment;
FIG. 10 is a flow chart of a method of generating video in one embodiment;
FIG. 11 is a flow chart of a method of generating video in one embodiment;
FIG. 12 is a flow chart of a method of generating video in one embodiment;
FIG. 13 is a flow chart of a method of generating video in one embodiment;
FIG. 14 is a schematic diagram of a video clip of interest in one embodiment;
FIG. 15 is a flow chart of a method of generating video in one embodiment;
FIG. 16 is a flow chart of a method of generating video in one embodiment;
FIG. 17 is a flow chart of a method of generating video in one embodiment;
fig. 18 is a block diagram showing the structure of a video generating apparatus according to an embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The video generation method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. The memory in the internal structure of the computer device includes a nonvolatile storage medium storing an operating system, a computer program, and a database, and an internal memory; the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database is used for storing data in the generation process of the video. The network interface is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements the video generation method provided by the present application.
In one embodiment, as shown in fig. 2, a method for generating video is provided, and the method is applied to the computer device in fig. 1 for illustration, and includes the following steps:
s201, acquiring an original video.
The original video is a video of any scene captured by a shooting device; there may be one or more segments of original video, and the lengths of the segments may be the same or different.
In this embodiment, the computer device may search, according to the original video identification information, an original video corresponding to the identification information from an original video database, and use at least one original video obtained by searching as an original video carried in the video generation instruction. Or, the computer device may also send a video acquisition instruction to the image capturing device, and after receiving the video acquisition instruction, the image capturing device acquires an original video in the target scene and sends the original video to the computer device. The present embodiment does not limit the manner in which the original video is acquired.
S202, identifying elements of interest in the original video to obtain a plurality of video segments of interest; each video segment of interest includes a preset emotion word.
The elements of interest refer to the highlight content in the original video, and include highlight emotion words and highlight pictures, namely preset emotion words and preset visual elements. The preset emotion words may be words expressing a cheering emotion, words expressing a sad emotion, and the like, and the preset visual elements may be expression elements, action elements, text elements and the like. For example, words expressing a cheering emotion may be "good ball", "excellent", etc., and words expressing a sad emotion may be "we are falling behind", "we missed the final", etc.; the expression elements may be smiling or crying, the action elements may be clapping, waving and the like, and the text elements may be names of people, team names and the like.
Optionally, the computer device may input the original video into a preset element-of-interest recognition model, use the model to recognize the original video, determine the emotion words and visual elements in the original video, obtain the video segments corresponding to each emotion word and visual element, and determine these segments as the plurality of video segments of interest. Optionally, the computer device may also obtain historical videos containing emotion words and visual elements, obtain the similarity between a historical video and the original video by sliding comparison, and determine a historical video whose similarity is greater than a preset threshold as a video segment of interest. The present embodiment does not limit the manner in which the plurality of video segments of interest are obtained. For example, fig. 3 shows a schematic diagram of a video segment corresponding to an emotion word: the segment covers seconds 2-4 of the original video, and the emotion word "good ball" in fig. 3 occurs at second 3. Fig. 4 is a schematic diagram of a video segment corresponding to a visual element: the segment covers seconds 8-10 of the original video, and the picture in fig. 4 is at second 9. The video segment in which an emotion word and a visual element completely overlap covers seconds 15-20 of the original video. Seconds 2-4, 8-10 and 15-20 of the original video are all taken as video segments of interest.
It should be noted that each obtained video segment of interest contains a preset emotion word, so that the target video generated from these segments is guaranteed to be a video with a certain emotion, which improves the quality of the target video.
And S203, generating a target video corresponding to the original video according to the plurality of video segments of interest.
In this embodiment, after the computer device obtains a plurality of video segments of interest, each video segment of interest may be scored, and the plurality of video segments of interest may be combined according to the scoring sequence, so as to obtain a target video corresponding to the original video. Or, the computer device may further combine the multiple video segments of interest according to the time sequence of the multiple video segments of interest in the original video, so as to obtain the target video corresponding to the original video. Or, the computer device may further select a portion of the video segments of interest from the plurality of video segments of interest, and combine the portion of the video segments of interest to obtain the target video corresponding to the original video. In this embodiment, the method for generating the target video corresponding to the original video according to the multiple interesting video segments is not limited.
In the above video generation method, an original video is acquired; elements of interest in the original video are identified to obtain a plurality of video segments of interest, each containing a preset emotion word; and a target video corresponding to the original video is generated according to the plurality of video segments of interest. By identifying the elements of interest in the original video, the video segments of interest containing the preset emotion words can be obtained accurately, so that the target video generated from these segments also contains the preset emotion words; meanwhile, the recognition and generation processes are completed automatically, which reduces the time spent editing the original video and improves the editing efficiency of the target video.
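Viewed as data flow, steps S201-S203 reduce to a small pipeline over time intervals. The sketch below is illustrative only: the Segment structure and all function names are assumptions, and the recognition step simply returns the example segments of fig. 3 and fig. 4 (seconds 2-4, 8-10 and 15-20) instead of running the recognition models described in the later embodiments.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Segment:
        start: float  # start time in the original video, in seconds
        end: float    # end time in the original video, in seconds
        kind: str     # "emotion", "picture", or "fused"

    def recognize_segments_of_interest(video_path: str) -> List[Segment]:
        # Stand-in for S202: in the application this is done by the emotion word
        # and visual element recognition described in the later embodiments.
        return [Segment(2, 4, "emotion"), Segment(8, 10, "picture"), Segment(15, 20, "fused")]

    def generate_target_video(video_path: str) -> List[Segment]:
        segments = recognize_segments_of_interest(video_path)   # S202
        return sorted(segments, key=lambda s: s.start)          # S203: splice in time order

    print(generate_target_video("original.mp4"))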
On the basis of the embodiment of fig. 2, the elements of interest include preset emotion words and preset visual elements. This embodiment describes step S202 in fig. 2, "identifying elements of interest in the original video to obtain a plurality of video segments of interest", in detail. As shown in fig. 5, step S202 may include the following:
S301, identifying preset emotion words in an original video to obtain a plurality of emotion video clips; and identifying preset visual elements in the original video to obtain a plurality of picture video clips.
In this embodiment, the computer device may train a preset neural network model using preset emotion words to obtain a trained emotion word recognition model, and recognize the emotion words in the original video using the trained emotion word recognition model to obtain a plurality of emotion video segments corresponding to the emotion words. And training a preset neural network model by using the preset visual elements to obtain a trained visual element identification model. And identifying the visual elements in the original video by using the trained visual element identification model to obtain a plurality of picture video fragments corresponding to the visual elements.
S302, determining each interested video segment based on each emotion video segment and each picture video segment.
In this embodiment, the computer device may analyze each emotion video segment and each picture video segment, fuse the emotion video segments and the picture video segments that belong to, or can be merged into, the same time period, and determine the fused video segments as video segments of interest. If an emotion video segment and a picture video segment neither belong to nor can be merged into the same time period, the emotion video segment alone is determined to be a video segment of interest.
In the above method, the preset emotion words in the original video are recognized to obtain a plurality of emotion video segments; the preset visual elements in the original video are recognized to obtain a plurality of picture video segments; and each video segment of interest is determined based on each emotion video segment and each picture video segment. Because the preset emotion words and the preset visual elements in the original video are recognized separately, the preset emotion words in the audio segments can be recognized more accurately and the plurality of emotion video segments acquired accurately; likewise, the preset visual elements in the picture segments can be recognized more accurately and the plurality of picture video segments acquired accurately, so that more accurate video segments of interest are obtained based on each emotion video segment and each picture video segment.
Based on the embodiment of fig. 5, this embodiment describes step S301 in fig. 5, "identifying preset emotion words in the original video to obtain a plurality of emotion video segments", in detail. As shown in fig. 6, step S301 may include the following:
S401, recognizing preset emotion words in the original video by using a preset emotion word recognition model to obtain a plurality of initial emotion video segments.
The preset emotion word recognition model is obtained by training a neural network model through a large number of preset emotion words.
In this embodiment, the computer device may obtain the audio segment of the original video and input it into the preset emotion word recognition model; the model extracts frequency-domain filter-bank (FBank) features from the audio segment, recognizes the emotion words in the audio segment according to these features, and outputs each emotion word together with the video segment in which it is located, that is, an initial emotion video segment. For example, if a recognized emotion word expresses a cheering emotion and the corresponding time period is seconds 3-5 of the video, then seconds 3-5 of the video are taken as an initial emotion video segment.
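As an illustration of S401, the sketch below extracts FBank features with torchaudio and passes them to a hypothetical emotion_word_model; the model interface (returning word/start/end detections) is an assumption, since the application does not disclose the structure of the preset emotion word recognition model.

    import torchaudio
    import torchaudio.compliance.kaldi as kaldi

    def detect_initial_emotion_segments(audio_path, emotion_word_model):
        # Load the audio track of the original video and downmix to mono.
        waveform, sample_rate = torchaudio.load(audio_path)
        waveform = waveform.mean(dim=0, keepdim=True)
        # 80-dimensional log-Mel filter-bank (FBank) features, the frequency-domain
        # features referred to in the description.
        fbank = kaldi.fbank(waveform, num_mel_bins=80, sample_frequency=sample_rate)
        # The hypothetical model is assumed to return (word, start_sec, end_sec)
        # detections; each detection becomes an initial emotion video segment,
        # e.g. a cheering word at seconds 3-5 yields the segment (3, 5).
        return [(start, end, word) for word, start, end in emotion_word_model(fbank)]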
S402, determining each emotion video segment based on each initial emotion video segment.
In this embodiment, after obtaining each initial emotion video segment, the computer device may add background music to each initial emotion video segment according to its corresponding emotion, so as to build up the emotional atmosphere of the segment; with the background music added, the emotion of the initial emotion video segment is stronger, and its quality is improved. Alternatively, after acquiring the initial emotion video segments, the computer device may merge the initial emotion video segments belonging to the same emotion and determine each emotion video segment based on the merged segments.
In the video generation method, the preset emotion words in the original video are identified by using the preset emotion word identification model, a plurality of initial emotion video segments are obtained, and each emotion video segment is determined based on each initial emotion video segment. According to the method, the original video can be accurately identified through the trained emotion word identification model, and the initial emotion video segments corresponding to the emotion words in the original video can be accurately obtained, so that each emotion video segment can be accurately determined based on the initial emotion video segments.
On the basis of the embodiment of fig. 6, the present embodiment describes the relevant content of step S402 "determine each emotion video segment based on each initial emotion video segment" in fig. 6. As shown in fig. 7, the step S402 may include the following:
S501, acquiring the emotion time interval of adjacent initial emotion video segments in the initial emotion video segments.
In this embodiment, when a user in the video encounters a highlight moment, the user may utter more than one emotion word, and these emotion words may be related to each other, so the initial emotion video segments corresponding to emotion words with the same emotion need to be combined into the same emotion video segment. Thus, the computer device may obtain the times of two adjacent initial emotion video segments, calculate the time difference between the start time of the later segment and the end time of the earlier segment, and determine this time difference as the emotion time interval of the adjacent initial emotion video segments. For example, if the earlier of two adjacent initial emotion video segments corresponds to seconds 3-5 and the later one to seconds 7-10, the time interval between them is the difference between second 7 and second 5, namely 2 seconds. Fig. 8 is a schematic diagram of the emotion time interval of adjacent initial emotion video segments: the picture shown corresponds to the end time of the first initial emotion video segment, at second 5 of the original video; the start time of the second, adjacent initial emotion video segment is at second 7 of the original video, so the emotion time interval of the adjacent initial emotion video segments is 2 seconds.
S502, if the emotion time interval is smaller than the preset interval, merging the adjacent initial emotion video segments to obtain each emotion video segment.
In this embodiment, the computer device may compare the emotion time interval with a preset interval. If the emotion time interval is greater than or equal to the preset interval, it is determined that the interval between the adjacent initial emotion video segments is too long, and the adjacent initial emotion video segments are not merged; if the emotion time interval is smaller than the preset interval, it is determined that the interval between the adjacent initial emotion video segments is short, and the adjacent initial emotion video segments are merged to obtain each emotion video segment. It should be noted that, after determining that the emotion time interval is smaller than the preset interval, it is further required to determine whether the emotion types of the two adjacent initial emotion video segments are the same; if so, the adjacent initial emotion video segments may be merged to obtain each emotion video segment. For example, referring again to fig. 8, if the first initial emotion video segment and the second initial emotion video segment express the same emotion, the first covers seconds 3-5 and the second covers seconds 7-10, the interval between them is 2 seconds, which is short; the two are therefore merged into the same emotion video segment, whose time period is seconds 3-10.
Further, the initial emotion video segments corresponding to emotion words of the same emotion may not only be close to each other but may also overlap. Fig. 9 is a schematic diagram of an emotion video segment: the third initial emotion video segment covers seconds 5-8 and the fourth covers seconds 7-9, so there is an overlapping portion between them; the third and fourth initial emotion video segments are therefore merged into the same emotion video segment, whose time period is seconds 5-9.
In the above method, the emotion time interval of adjacent initial emotion video segments is obtained, and if the emotion time interval is smaller than the preset interval, the adjacent initial emotion video segments are merged to obtain each emotion video segment. By comparing the emotion time interval of adjacent initial emotion video segments with the preset interval, whether the interval is short enough can be determined more accurately, and the adjacent initial emotion video segments are merged only when the interval is determined to be short, so that the obtained emotion video segments are more accurate.
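The merging rule of S501-S502 can be expressed as a single pass over the initial emotion video segments sorted by start time. This is only a sketch: it assumes each initial segment carries an emotion label, and the preset interval of 3 seconds is an assumed value chosen so that the 2-second gap of fig. 8 is merged (the application does not fix the threshold).

    def merge_emotion_segments(initial_segments, preset_interval=3.0):
        # initial_segments: list of (start_sec, end_sec, emotion_label) tuples.
        merged = []
        for start, end, emotion in sorted(initial_segments):
            if merged:
                prev_start, prev_end, prev_emotion = merged[-1]
                gap = start - prev_end  # negative when the two segments overlap
                if gap < preset_interval and emotion == prev_emotion:
                    # Adjacent segments of the same emotion with a short gap (or an
                    # overlap) are combined into one emotion video segment.
                    merged[-1] = (prev_start, max(prev_end, end), emotion)
                    continue
            merged.append((start, end, emotion))
        return merged

    # Fig. 8: segments at 3-5 s and 7-10 s (2 s gap) merge into 3-10 s.
    print(merge_emotion_segments([(3, 5, "cheer"), (7, 10, "cheer")]))
    # Fig. 9: overlapping segments at 5-8 s and 7-9 s merge into 5-9 s.
    print(merge_emotion_segments([(5, 8, "cheer"), (7, 9, "cheer")]))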
Based on the embodiment of fig. 5, the preset visual elements include expression elements and action elements. This embodiment describes step S301 in fig. 5, "identifying preset visual elements in the original video to obtain a plurality of picture video segments", in detail. As shown in fig. 10, step S301 may include the following:
S601, recognizing expression elements in the original video by using a preset expression recognition model to obtain a plurality of expression picture video segments; and/or recognizing action elements in the original video by using a preset action recognition model to obtain a plurality of action picture video segments.
The preset expression recognition model is a model obtained by training a neural network model through a large number of preset expression elements; the preset action recognition model is a model obtained by training a neural network model through a large number of preset action elements.
In this embodiment, the computer device may obtain the picture segment of the original video and input it into the preset expression recognition model and the preset action recognition model. The expression recognition model extracts expression features from the picture segment at a preset time interval to obtain a plurality of expression picture video segments; and/or the action recognition model extracts action features from frames sampled at a fixed interval to obtain a plurality of action picture video segments.
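As a sketch of the frame-level processing in S601, the snippet below samples frames from the picture track at a fixed time interval with OpenCV; expression_model and action_model in the usage comment are hypothetical stand-ins for the preset expression and action recognition models, whose internals the application does not describe.

    import cv2

    def sample_frames(video_path, interval_sec=1.0):
        # Yield (timestamp_sec, frame) pairs at a fixed interval, corresponding to
        # the preset time interval / fixed-interval frames mentioned above.
        cap = cv2.VideoCapture(video_path)
        fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
        step = max(1, int(round(fps * interval_sec)))
        index = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % step == 0:
                yield index / fps, frame
            index += 1
        cap.release()

    # Hypothetical usage: frames whose recognized expression or action is not None
    # are collected into expression picture / action picture video segments.
    # for t, frame in sample_frames("original.mp4"):
    #     expression = expression_model(frame)   # e.g. "smile", "cry" or None
    #     action = action_model(frame)           # e.g. "clap", "wave" or None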
S602, each picture video clip is determined based on each expression picture video clip and each action picture video clip.
In this embodiment, since the user may have a corresponding expression element while performing an action in the initial video clip, the expression picture video clip and the action picture video clip belonging to the same time period are determined as one picture video clip, and the expression picture video clip and the action picture video clip not belonging to the same time period are respectively used as picture video clips, so as to obtain a plurality of picture video clips.
In the video generation method, the preset expression recognition model is utilized to recognize the expression elements in the original video, and a plurality of expression picture video clips are obtained; and identifying action elements in the original video by using a preset action identification model to obtain a plurality of action picture video clips, and determining each picture video clip based on each expression picture video clip and each action picture video clip. According to the method, the picture segments in the original video are respectively identified through the expression identification model and the action identification model, so that a plurality of expression picture video segments and a plurality of action picture video segments are accurately obtained, and the expression picture video segments and the action picture video segments are combined, so that the picture video segments can be more accurately obtained from the original video.
On the basis of the embodiment of fig. 10, the present embodiment describes the related content of step S602 "determine each picture video clip based on each expression picture video clip and each action picture video clip" in fig. 10. As shown in fig. 11, the step S602 may include the following:
And S701, combining the expression picture video segments and the action picture video segments to obtain each picture video segment under the condition that the expression picture video segments and the action picture video segments have an overlapping time period.
In this embodiment, the computer device may determine whether the time period of the expression picture video segment and the time period of the action picture video segment overlap. There are two cases of overlap. In the first case, the two time periods overlap completely; either the time period of the expression picture video segment or that of the action picture video segment is then taken as the picture video segment. In the second case, the two time periods partially overlap; they are combined to obtain the picture video segment. For example, when the time period of the expression picture video segment is seconds 2-5 and the time period of the action picture video segment is seconds 4-7, the combined picture video segment covers seconds 2-7.
S702, determining both the expression picture video segment and the action picture video segment as picture video segments in the case that the expression picture video segment and the action picture video segment have no overlapping time period.
In this embodiment, a highlight picture may be a combination of an expression picture and an action picture, or a single expression picture or action picture. Thus, in the case that the expression picture video segment and the action picture video segment have no overlapping time period, the computer device determines that the single expression picture video segment and the single action picture video segment are each a picture video segment.
In the video generation method, under the condition that the expression picture video clips and the action picture video clips have overlapping time periods, the expression picture video clips and the action picture video clips are combined to obtain each picture video clip; in the case where there is no overlapping period of time between the expression picture video clip and the action picture video clip, both the expression picture video clip and the action picture video clip are determined as each picture video clip. According to the method, the picture time periods corresponding to the expression picture video clips and the action picture video clips are respectively obtained, the expression picture time periods and the action picture time periods are compared, whether the two picture time periods overlap or not is accurately determined, and the overlapped expression picture time periods and the action picture time periods are combined, so that the phenomenon that the expression picture video clips and the action picture video clips overlap is avoided. Meanwhile, for the case where there is no overlap, both a single expression picture video clip and a single action picture video clip are determined as each picture video clip. In this way, each frame of video clip can be acquired more accurately.
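The overlap handling of S701-S702 amounts to taking the union of the two time periods when they intersect and keeping both segments otherwise. The sketch below compares one expression picture segment with one action picture segment, each represented as a plain (start, end) tuple; the function name is an assumption.

    def combine_picture_segments(expression_seg, action_seg):
        e_start, e_end = expression_seg
        a_start, a_end = action_seg
        if e_start <= a_end and a_start <= e_end:
            # Complete or partial overlap: merge into one picture video segment,
            # e.g. (2, 5) and (4, 7) become (2, 7) as in the example above.
            return [(min(e_start, a_start), max(e_end, a_end))]
        # No overlapping time period: both are kept as separate picture video segments.
        return [expression_seg, action_seg]

    print(combine_picture_segments((2, 5), (4, 7)))   # [(2, 7)]
    print(combine_picture_segments((2, 3), (8, 10)))  # [(2, 3), (8, 10)]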
On the basis of the embodiment of fig. 5, the present embodiment describes the related content of step S302 "determine each video segment of interest based on each emotion video segment and each picture video segment" in fig. 5. As shown in fig. 12, the step S302 may include the following:
S801, acquiring an audio time period corresponding to each emotion video segment, and acquiring a picture time period corresponding to each picture video segment.
In this embodiment, after each emotional video segment and each frame video segment are acquired, for any one emotional video segment, the computer device may acquire a start time and an end time of the emotional video segment, and determine a time period corresponding to the start time and the end time as an audio time period corresponding to the emotional video segment. Also, the computer device may also acquire the picture period corresponding to each picture video clip in the manner described above.
S802, each video clip of interest is determined based on each audio period and each picture period.
In this embodiment, the computer device may analyze each audio time period and each picture time period to determine whether they overlap, and decide whether to combine them according to the determination result, so as to obtain a plurality of video segments of interest each including a preset emotion word.
In the video generation method, an audio time period corresponding to each emotion video segment is acquired; acquiring a picture time period corresponding to each picture video clip; each video segment of interest is determined based on each audio time period and each picture time period. According to the method, the time periods in the emotion video clips and the picture video clips are respectively acquired, so that a plurality of interested video clips can be accurately acquired based on the audio time period and the picture time period.
On the basis of the embodiment of fig. 12, the present embodiment describes the related content of step S802 "determine each video segment of interest based on each audio time period and each picture time period" in fig. 12. As shown in fig. 13, the step S802 may include the following:
S901, under the condition that an overlapping time period exists between the audio time period and the picture time period, fusing the emotion video segment corresponding to the audio time period and the picture video segment corresponding to the picture time period to obtain each video segment of interest.
In this embodiment, the computer device may obtain the audio time period corresponding to each emotion video segment and the picture time period corresponding to each picture video segment, compare the audio time period with the picture time period, and determine whether there is an overlapping time period between the two. If there is, two cases arise: in the first, the audio time period and the picture time period overlap completely, and the video segment corresponding to either the audio time period or the picture time period is used as the fused video segment; in the second, the audio time period and the picture time period partially overlap, the two time periods are fused, and the video segment corresponding to the fused time period is used as the fused video segment.
And S902, under the condition that the audio time period and the picture time period do not have an overlapping time period, performing expansion processing on the emotion video segment corresponding to the audio time period, and determining the expanded emotion video segment as a video segment of interest.
In this embodiment, if there is no overlapping time period between the audio time period and any picture time period, the highlight corresponding to the audio time period was not captured by the visual element recognition process; in that case the user in the original video most likely saw the highlight first and then spoke the emotion word, so the computer device may expand the audio segment more towards the front and less towards the back, and obtain the expanded video segment corresponding to the expanded audio time period. Fig. 14 is a schematic diagram of a video segment of interest obtained through expansion: the audio time period is seconds 5-7, the segment is expanded forward by 4 seconds and backward by 2 seconds, and the resulting expanded segment covers seconds 1-9 of the original video, that is, the video segment of interest has a time period of seconds 1-9.
In the video generation method, under the condition that the overlapping time period exists between the audio time period and the picture time period, the emotion video segments corresponding to the audio time period and the picture video segments corresponding to the picture time period are fused to obtain the video segments of interest; and under the condition that the overlapping time period does not exist in the audio time period and the picture time period, performing expansion processing on the emotion video segments corresponding to the audio time period, and determining the emotion video segments after expansion as the video segments of interest. The method judges whether the overlapping time period exists between the audio time period and the picture time period, and different operations are carried out according to different judging results so that the obtained video clips of interest are more accurate.
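S901-S902 can be sketched as a fuse-or-expand decision per emotion segment. The forward 4 s / backward 2 s expansion follows the example of fig. 14; the application does not fix these amounts, so they are parameters here, and the function name is an assumption.

    def fuse_or_expand(audio_period, picture_periods, expand_before=4.0, expand_after=2.0):
        # audio_period: (start_sec, end_sec) of one emotion video segment.
        # picture_periods: list of (start_sec, end_sec) of the picture video segments.
        a_start, a_end = audio_period
        for p_start, p_end in picture_periods:
            if a_start <= p_end and p_start <= a_end:
                # Overlapping time periods: fuse the emotion segment and the
                # picture segment into one video segment of interest.
                return (min(a_start, p_start), max(a_end, p_end))
        # No overlapping picture period: expand the emotion segment, more towards
        # the front than the back, since the highlight usually precedes the words.
        return (max(0.0, a_start - expand_before), a_end + expand_after)

    # Fig. 14: an audio period of 5-7 s with no overlapping picture period
    # expands to the video segment of interest at 1-9 s.
    print(fuse_or_expand((5, 7), [(20, 25)]))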
Based on the embodiment of fig. 2, the present embodiment describes the content related to step S203 "generate the target video corresponding to the original video according to the plurality of interesting video segments" in fig. 2. As shown in fig. 15, the step S203 may include the following:
S1001, obtaining the quality quantization value of each video segment of interest.
In this embodiment, since emotion words are best able to build up the emotional atmosphere of the target video, and the opening segment of a target video needs that atmosphere most, the computer device may analyze the emotion words in each video segment of interest and determine a quality quantization value for each video segment of interest in which emotion words exist.
S1002, determining the video segment of interest with the largest quality quantization value as the opening segment of the target video.
In this embodiment, the computer device may determine, from the quality quantization values of the video segments of interest, the video segment of interest with the largest value. This segment carries the strongest emotion, and placing it at the beginning of the target video best builds up the atmosphere, so the video segment of interest with the largest quality quantization value is determined as the opening segment of the target video.
S1003, generating the target video corresponding to the original video based on the opening segment and each video segment of interest.
In this embodiment, after the computer device obtains the opening segment of the target video, the opening segment may be spliced with the other video segments of interest to obtain the target video, with its opening segment, corresponding to the original video.
In the above method, the quality quantization value of each video segment of interest is obtained; the video segment of interest with the largest quality quantization value is determined as the opening segment of the target video; and the target video corresponding to the original video is generated based on the opening segment and each video segment of interest. By obtaining the quality quantization values of the video segments of interest, the opening segment can be screened out accurately, so that a higher-quality target video corresponding to the original video can be obtained.
On the basis of the embodiment of fig. 15, this embodiment describes step S1003 in fig. 15, "generating the target video corresponding to the original video based on the opening segment and each video segment of interest", in detail. As shown in fig. 16, step S1003 may include the following:
S1101, screening out a preset number of candidate video segments of interest from the video segments of interest based on the quality quantization values of the video segments of interest.
In this embodiment, the computer device may determine, according to the video size or the video duration, the preset number of segments for the generated target video, arrange the quality quantization values in descending order, and select the top-ranked video segments of interest, up to the preset number, as the candidate video segments of interest. For example, if the preset number of segments is 10, the video segments of interest whose quality quantization values rank in the top 10 are used as the candidate video segments of interest.
S1102, splicing the opening segment and each candidate video segment of interest according to the time sequence corresponding to each candidate video segment of interest, to generate the target video corresponding to the original video.
In this embodiment, after the computer device obtains each candidate video segment of interest, the order of the candidate video segments of interest in the original video may be determined according to their corresponding time sequence; adjacent candidate video segments of interest are spliced in that order, and the opening segment is spliced at the initial position, so as to obtain the target video corresponding to the original video.
In the above method, a preset number of candidate video segments of interest are screened out from the video segments of interest based on their quality quantization values; and the opening segment and each candidate video segment of interest are spliced according to the time sequence corresponding to each candidate video segment of interest, to generate the target video corresponding to the original video. Based on the quality quantization values and the preset number of segments, higher-quality candidate video segments of interest can be screened out accurately from the video segments of interest, and all the candidate video segments of interest are spliced in time order; since each candidate video segment of interest is of higher quality, the quality of the spliced target video is also higher.
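Steps S1001-S1102 can be sketched as a top-N selection followed by a time-ordered splice. The sketch assumes each video segment of interest already carries a quality quantization value; how that value is computed beyond the emotion word analysis above is not specified by the application, and the preset number of 10 segments follows the example given earlier.

    def assemble_target_video(segments, preset_count=10):
        # segments: list of (start_sec, end_sec, quality) tuples.
        if not segments:
            return []
        by_quality = sorted(segments, key=lambda s: s[2], reverse=True)
        opening = by_quality[0]                 # S1002: largest quality value -> opening segment
        candidates = by_quality[:preset_count]  # S1101: keep the preset number of segments
        body = sorted((s for s in candidates if s is not opening), key=lambda s: s[0])
        return [opening] + body                 # S1102: opening segment first, rest in time order

    segs = [(2, 4, 0.7), (8, 10, 0.9), (15, 20, 0.8)]
    print(assemble_target_video(segs, preset_count=3))
    # -> [(8, 10, 0.9), (2, 4, 0.7), (15, 20, 0.8)]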
In one embodiment, the following describes the method for generating video in detail, as shown in fig. 17, the method may include:
S1201, acquiring an original video;
S1202, recognizing preset emotion words in the original video by using a preset emotion word recognition model to obtain a plurality of initial emotion video segments;
S1203, acquiring the emotion time interval of adjacent initial emotion video segments in the initial emotion video segments;
S1204, if the emotion time interval is smaller than the preset interval, merging the adjacent initial emotion video segments to obtain each emotion video segment;
S1205, recognizing expression elements in the original video by using a preset expression recognition model to obtain a plurality of expression picture video segments; and/or recognizing action elements in the original video by using a preset action recognition model to obtain a plurality of action picture video segments;
S1206, combining the expression picture video segment and the action picture video segment to obtain each picture video segment in the case that the expression picture video segment and the action picture video segment have an overlapping time period;
S1207, determining both the expression picture video segment and the action picture video segment as picture video segments in the case that the expression picture video segment and the action picture video segment have no overlapping time period;
S1208, acquiring an audio time period corresponding to each emotion video segment, and acquiring a picture time period corresponding to each picture video segment;
S1209, in the case that the audio time period and the picture time period have an overlapping time period, fusing the emotion video segment corresponding to the audio time period and the picture video segment corresponding to the picture time period to obtain each video segment of interest;
S1210, in the case that the audio time period and the picture time period have no overlapping time period, performing expansion processing on the emotion video segment corresponding to the audio time period, and determining the expanded emotion video segment as a video segment of interest;
S1211, obtaining the quality quantization value of each video segment of interest;
S1212, determining the video segment of interest with the largest quality quantization value as the opening segment of the target video;
S1213, screening out a preset number of candidate video segments of interest from the video segments of interest based on their quality quantization values;
S1214, splicing the opening segment and each candidate video segment of interest according to the time sequence corresponding to each candidate video segment of interest, to generate the target video corresponding to the original video.
It should be understood that, although the steps in the flowcharts related to the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated. Unless explicitly stated herein, the steps are not strictly limited to that order and may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; and these sub-steps or stages are not necessarily performed sequentially, but may be performed in turn or alternately with at least some of the other steps, sub-steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a video generation device for realizing the video generation method. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so the specific limitation in the embodiments of the apparatus for generating one or more videos provided below may refer to the limitation of the method for generating videos described above, which is not repeated herein.
In one embodiment, as shown in fig. 18, there is provided a video generating apparatus, including: an acquisition module 11, an identification module 12 and a generation module 13, wherein:
an acquisition module 11, configured to acquire an original video;
an identification module 12, configured to identify elements of interest in the original video to obtain a plurality of video segments of interest, each video segment of interest comprising a preset emotion word;
a generation module 13, configured to generate a target video corresponding to the original video according to the plurality of video segments of interest (a minimal structural sketch of this wiring is given below).
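As a structural sketch only, the division of labour among the three modules could be wired as follows; the class and callable names are hypothetical and do not appear in the application.

    class VideoGenerator:
        """Hypothetical wiring of the acquisition, identification and generation modules."""

        def __init__(self, acquire, identify, generate):
            self.acquire = acquire    # acquisition module 11: returns the original video
            self.identify = identify  # identification module 12: returns segments of interest
            self.generate = generate  # generation module 13: assembles the target video

        def run(self, source):
            original = self.acquire(source)
            segments = self.identify(original)  # each segment contains a preset emotion word
            return self.generate(original, segments)

    # Example with trivial stand-ins for the three modules.
    pipeline = VideoGenerator(
        acquire=lambda src: f"frames-of-{src}",
        identify=lambda video: [(2.4, 7.5), (20.0, 22.0)],
        generate=lambda video, segs: {"source": video, "segments": segs},
    )
    print(pipeline.run("demo.mp4"))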
In one embodiment, the identification module 12 includes: an identification unit and a first determination unit, wherein:
the identification unit is configured to identify preset emotion words in the original video to obtain a plurality of emotion video segments, and to identify preset visual elements in the original video to obtain a plurality of picture video clips;
the first determination unit is configured to determine each video segment of interest based on each emotion video segment and each picture video clip.
In one embodiment, the identification unit is further configured to identify preset emotion words in the original video by using a preset emotion word identification model to obtain a plurality of initial emotion video segments, and to determine each emotion video segment based on the initial emotion video segments.
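For illustration, and assuming the preset emotion word identification model exposes a word-level transcript with timestamps (an assumption; the application does not fix the form of the model), the initial emotion video segments could be located as follows; the word list and padding are hypothetical.

    EMOTION_WORDS = {"wow", "awesome", "amazing"}  # hypothetical preset emotion words

    def initial_emotion_segments(words, pad=1.0):
        """words: (text, start, end) tuples from a speech recognizer with timestamps.
        Returns one initial segment per detected emotion word, padded by `pad` seconds."""
        segments = []
        for text, start, end in words:
            if text.lower() in EMOTION_WORDS:
                segments.append((max(0.0, start - pad), end + pad))
        return segments

    transcript = [("that", 3.0, 3.2), ("was", 3.2, 3.4), ("awesome", 3.4, 4.0)]
    print(initial_emotion_segments(transcript))  # [(2.4, 5.0)]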
In one embodiment, the identification unit is further configured to acquire the emotion time interval between adjacent initial emotion video segments among the initial emotion video segments, and, if the emotion time interval is smaller than the preset interval, to merge the adjacent initial emotion video segments to obtain each emotion video segment.
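A sketch of this merging rule, assuming segments are represented as (start, end) pairs in seconds; the 2-second preset interval is an assumed value.

    def merge_adjacent(segments, preset_interval=2.0):
        """Merge consecutive segments whose gap is smaller than the preset interval."""
        merged = []
        for start, end in sorted(segments):
            if merged and start - merged[-1][1] < preset_interval:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # extend previous segment
            else:
                merged.append((start, end))
        return merged

    print(merge_adjacent([(2.4, 5.0), (6.0, 7.5), (20.0, 22.0)]))
    # [(2.4, 7.5), (20.0, 22.0)] -- the 1 s gap is below the preset interval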
In one embodiment, the identification unit is further configured to recognize expression elements in the original video by using a preset expression recognition model to obtain a plurality of expression picture video clips, and/or to recognize action elements in the original video by using a preset action recognition model to obtain a plurality of action picture video clips, and to determine each picture video clip based on the expression picture video clips and/or the action picture video clips.
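Assuming the expression and action recognition models emit per-frame detections (an assumption not stated in the application), the frame-level hits could be grouped into picture video clips by joining consecutive positive frames, roughly as follows.

    def detections_to_clips(flags, fps=30.0):
        """flags: one boolean per frame, True where the model detects the element.
        Returns (start, end) times of maximal runs of positive frames."""
        clips, run_start = [], None
        for i, hit in enumerate(flags):
            if hit and run_start is None:
                run_start = i
            elif not hit and run_start is not None:
                clips.append((run_start / fps, i / fps))
                run_start = None
        if run_start is not None:
            clips.append((run_start / fps, len(flags) / fps))
        return clips

    print(detections_to_clips([False, True, True, True, False, False, True, True], fps=2.0))
    # [(0.5, 2.0), (3.0, 4.0)]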
In one embodiment, the identification unit is further configured to combine the expression picture video clip and the action picture video clip to obtain a picture video clip when there is an overlapping time period between the expression picture video clip and the action picture video clip, and to determine both the expression picture video clip and the action picture video clip as picture video clips when there is no overlapping time period between them.
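A sketch of this combination rule, assuming clips are (start, end) pairs: overlapping expression and action clips are merged into a single picture video clip, while non-overlapping clips from either list are kept unchanged. The helper names are hypothetical.

    def overlaps(a, b):
        """True when the two (start, end) intervals share an overlapping time period."""
        return a[0] < b[1] and b[0] < a[1]

    def combine_picture_clips(expression_clips, action_clips):
        """Merge every expression clip with the action clips it overlaps;
        clips without any overlap are kept as separate picture video clips."""
        picture, used = [], set()
        for e in expression_clips:
            merged = e
            for i, a in enumerate(action_clips):
                if overlaps(merged, a):
                    merged = (min(merged[0], a[0]), max(merged[1], a[1]))
                    used.add(i)
            picture.append(merged)
        picture += [a for i, a in enumerate(action_clips) if i not in used]
        return sorted(picture)

    print(combine_picture_clips([(1.0, 3.0)], [(2.5, 4.0), (10.0, 12.0)]))
    # [(1.0, 4.0), (10.0, 12.0)]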
In one embodiment, the first determination unit is further configured to acquire an audio time period corresponding to each emotion video segment, to acquire a picture time period corresponding to each picture video clip, and to determine each video segment of interest based on the audio time periods and the picture time periods.
In one embodiment, the first determination unit is further configured to, when there is an overlapping time period between the audio time period and the picture time period, fuse the emotion video segment corresponding to the audio time period with the picture video segment corresponding to the picture time period to obtain a video segment of interest; and, when there is no overlapping time period between the audio time period and the picture time period, to perform expansion processing on the emotion video segment corresponding to the audio time period and determine the expanded emotion video segment as a video segment of interest.
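A sketch of this fusion/expansion rule, again over (start, end) pairs in seconds: an emotion segment whose audio period overlaps a picture period is fused with it, and an emotion segment with no overlapping picture period is expanded by a margin. The margin value is an assumption; the application does not specify the amount of expansion.

    def fuse_or_expand(emotion_segments, picture_segments, margin=1.5):
        """Fuse each emotion segment with the picture segments it overlaps,
        or expand it by `margin` seconds when nothing overlaps."""
        interest = []
        for start, end in emotion_segments:
            hits = [p for p in picture_segments if p[0] < end and start < p[1]]
            if hits:  # overlapping time period: fuse the audio and picture spans
                lo = min([start] + [p[0] for p in hits])
                hi = max([end] + [p[1] for p in hits])
                interest.append((lo, hi))
            else:     # no overlap: expand the emotion segment on both sides
                interest.append((max(0.0, start - margin), end + margin))
        return interest

    print(fuse_or_expand([(2.4, 7.5), (20.0, 22.0)], [(1.0, 4.0), (10.0, 12.0)]))
    # [(1.0, 7.5), (18.5, 23.5)]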
In one embodiment, the generation module further includes: an acquisition unit, a second determination unit and a generation unit, wherein:
the acquisition unit is configured to acquire a quality quantization value of each video segment of interest;
the second determination unit is configured to determine the video segment of interest with the largest quality quantization value as the slice header of the target video;
and the generation unit is configured to generate the target video corresponding to the original video based on the slice header and each video segment of interest.
In one embodiment, the generation unit is further configured to screen out a preset number of candidate video segments of interest from the video segments of interest based on the quality quantization value of each video segment of interest, and to splice the slice header and each candidate video segment of interest according to the time sequence corresponding to each candidate video segment of interest, so as to generate the target video corresponding to the original video.
Each of the modules in the above video generation apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or be independent of, a processor in a computer device, or may be stored in software form in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, including a memory and a processor, where the memory stores a computer program, and the processor, when executing the computer program, implements the steps of any one of the video generation method embodiments described above.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the video generation method embodiments described above.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of any one of the video generation method embodiments described above.
The user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the computer program may include the flows of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processor referred to in the embodiments provided in the present application may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, or a data processing logic unit based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered to be within the scope of this specification.
The foregoing examples merely illustrate several embodiments of the application and are described in detail, but they are not to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the application, and these all fall within the protection scope of the application. Accordingly, the protection scope of the application shall be subject to the appended claims.

Claims (14)

1. A method of generating video, the method comprising:
acquiring an original video;
identifying elements of interest in the original video to obtain a plurality of video segments of interest, wherein each video segment of interest comprises a preset emotion word;
and generating a target video corresponding to the original video according to the plurality of video segments of interest.
2. The method of claim 1, wherein the elements of interest comprise a preset emotion word and a preset visual element;
the identifying elements of interest in the original video to obtain a plurality of video segments of interest comprises:
identifying preset emotion words in the original video to obtain a plurality of emotion video segments; and identifying preset visual elements in the original video to obtain a plurality of picture video clips;
and determining each of the video segments of interest based on each of the emotion video segments and each of the picture video clips.
3. The method according to claim 2, wherein the identifying preset emotion words in the original video to obtain a plurality of emotion video segments comprises:
identifying preset emotion words in the original video by using a preset emotion word identification model to obtain a plurality of initial emotion video segments;
and determining each emotion video segment based on each initial emotion video segment.
4. The method of claim 3, wherein the determining each of the emotion video segments based on each of the initial emotion video segments comprises:
acquiring an emotion time interval between adjacent initial emotion video segments in the initial emotion video segments;
and if the emotion time interval is smaller than a preset interval, merging the adjacent initial emotion video segments to obtain each emotion video segment.
5. The method of claim 2, wherein the preset visual elements include an expression element and an action element;
the identifying the preset visual elements in the original video to obtain a plurality of picture video clips includes:
recognizing expression elements in the original video by using a preset expression recognition model to obtain a plurality of expression picture video clips; and/or recognizing action elements in the original video by using a preset action recognition model to obtain a plurality of action picture video clips;
and determining each picture video clip based on each expression picture video clip and/or each action picture video clip.
6. The method of claim 5, wherein the determining each of the picture video clips based on each of the expression picture video clips and/or each of the action picture video clips comprises:
combining the expression picture video clip and the action picture video clip to obtain each picture video clip under the condition that an overlapping time period exists between the expression picture video clip and the action picture video clip;
and determining both the expression picture video clip and the action picture video clip as picture video clips under the condition that no overlapping time period exists between the expression picture video clip and the action picture video clip.
7. The method of any of claims 2-6, wherein the determining each of the video segments of interest based on each of the emotion video segments and each of the picture video clips comprises:
acquiring an audio time period corresponding to each emotion video segment; acquiring a picture time period corresponding to each picture video clip;
and determining each of the video segments of interest based on each of the audio time periods and each of the picture time periods.
8. The method of claim 7, wherein the determining each video segment of interest based on each audio time period and each picture time period comprises:
under the condition that the audio time period and the picture time period have an overlapping time period, fusing the emotion video segment corresponding to the audio time period with the picture video segment corresponding to the picture time period to obtain each video segment of interest;
and under the condition that the audio time period and the picture time period do not have an overlapping time period, performing expansion processing on the emotion video segment corresponding to the audio time period, and determining the expanded emotion video segment as the video segment of interest.
9. The method of any of claims 1-6, wherein the generating a target video corresponding to the original video according to the plurality of video segments of interest comprises:
acquiring a quality quantization value of each video segment of interest;
determining the video segment of interest with the largest quality quantization value as a slice header of the target video;
and generating a target video corresponding to the original video based on the slice header and each video segment of interest.
10. The method of claim 9, wherein generating a target video corresponding to the original video based on the slice header and each of the video segments of interest comprises:
screening out a preset number of candidate video segments of interest from the video segments of interest based on the quality quantization value of each video segment of interest;
and splicing the slice header and each candidate video segment of interest according to the time sequence corresponding to each candidate video segment of interest, to generate a target video corresponding to the original video.
11. A video generation apparatus, the apparatus comprising:
the acquisition module is used for acquiring an original video;
the identification module is used for identifying elements of interest in the original video to obtain a plurality of video segments of interest; each video segment of interest comprises a preset emotion word;
and the generation module is used for generating a target video corresponding to the original video according to the plurality of video segments of interest.
12. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 10 when the computer program is executed.
13. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 10.
14. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 10.
CN202310844602.4A 2023-07-11 2023-07-11 Video generation method, device, equipment, medium and program product Pending CN117014680A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310844602.4A CN117014680A (en) 2023-07-11 2023-07-11 Video generation method, device, equipment, medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310844602.4A CN117014680A (en) 2023-07-11 2023-07-11 Video generation method, device, equipment, medium and program product

Publications (1)

Publication Number Publication Date
CN117014680A true CN117014680A (en) 2023-11-07

Family

ID=88575424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310844602.4A Pending CN117014680A (en) 2023-07-11 2023-07-11 Video generation method, device, equipment, medium and program product

Country Status (1)

Country Link
CN (1) CN117014680A (en)

Similar Documents

Publication Publication Date Title
US10714145B2 (en) Systems and methods to associate multimedia tags with user comments and generate user modifiable snippets around a tag time for efficient storage and sharing of tagged items
CN107707931B (en) Method and device for generating interpretation data according to video data, method and device for synthesizing data and electronic equipment
CN110119711B (en) Method and device for acquiring character segments of video data and electronic equipment
US10108709B1 (en) Systems and methods for queryable graph representations of videos
CN113709561B (en) Video editing method, device, equipment and storage medium
WO2018107914A1 (en) Video analysis platform, matching method, and accurate advertisement push method and system
US8107689B2 (en) Apparatus, method and computer program for processing information
JP5510167B2 (en) Video search system and computer program therefor
JP4965250B2 (en) Automatic video summarization apparatus and method using fuzzy base characteristic support vector
CN101202864A (en) Player for movie contents
CN111314732A (en) Method for determining video label, server and storage medium
KR20180087970A (en) apparatus and method for tracking image content context trend using dynamically generated metadata
CN114041165A (en) Video similarity detection method, device and equipment
KR101812103B1 (en) Method and program for setting thumbnail image
WO2019128724A1 (en) Method and device for data processing
CN114363695B (en) Video processing method, device, computer equipment and storage medium
CN114117120A (en) Video file intelligent index generation system and method based on content analysis
CN114845149B (en) Video clip method, video recommendation method, device, equipment and medium
KR20150089598A (en) Apparatus and method for creating summary information, and computer readable medium having computer program recorded therefor
CN117201715A (en) Video generation method and device and readable storage medium
CN117014680A (en) Video generation method, device, equipment, medium and program product
CN115665508A (en) Video abstract generation method and device, electronic equipment and storage medium
CN113012723B (en) Multimedia file playing method and device and electronic equipment
CN111444386A (en) Video information retrieval method and device, computer equipment and storage medium
JP5894852B2 (en) Representative still image extraction apparatus and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination