CN111770359B - Event video clipping method, system and computer readable storage medium - Google Patents

Event video clipping method, system and computer readable storage medium

Info

Publication number
CN111770359B
CN111770359B (application number CN202010493124.3A)
Authority
CN
China
Prior art keywords
video
event
playback
playback segment
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010493124.3A
Other languages
Chinese (zh)
Other versions
CN111770359A (en)
Inventor
赵筠
尹东芹
吴双龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Biying Technology Co ltd
Jiangsu Suning Cloud Computing Co ltd
Original Assignee
Suning Cloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Cloud Computing Co Ltd filed Critical Suning Cloud Computing Co Ltd
Priority to CN202010493124.3A priority Critical patent/CN111770359B/en
Publication of CN111770359A publication Critical patent/CN111770359A/en
Application granted granted Critical
Publication of CN111770359B publication Critical patent/CN111770359B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses an event video clipping method, system and computer readable storage medium, wherein the method comprises the following steps: separating the video track and the audio track of an event video to be processed and storing them in correspondence; identifying playback segments and non-playback segments in the video track; analyzing the audio track corresponding to the non-playback segment to obtain the commentary speech endpoint, and clipping the non-playback segment according to the commentary speech endpoint to obtain non-playback segment material; filtering the playback special-effect frames out of the playback segment to obtain playback segment material; clipping the audio track according to the playback segment material and the non-playback segment material to obtain the corresponding audio material; merging the playback segment material and the non-playback segment material to obtain a material video, and synthesizing the audio material onto the material video to obtain the target clip video. The invention supports multiple event types, automatically extracts and retains important playback segments while accurately shortening the video duration, and can quickly produce large numbers of highlight videos of different dimensions with high accuracy.

Description

Event video clipping method, system and computer readable storage medium
Technical Field
The invention relates to the field of video data clipping and processing, and in particular to an event video clipping method, system and computer readable storage medium.
Background
In traditional licensed-event operation, broadcasters edit highlight segments or highlight reels during the live broadcast so that large audiences can quickly browse and share them. This process usually relies heavily on manual video editing and has the following problems: 1. Poor timeliness: producing a full-match highlight video often requires an editor to spend a great deal of time repeatedly watching the match to find highlight passages and to position cut points by trial and error; the production process is tedious and inefficient, the video is usually delivered long after the match has ended, and the viewing experience suffers. 2. Low content output: limited by operating resources, content production can only cover key events, the output for non-headline events is particularly affected, and the number of videos produced per event is limited.
As sports events have become more plentiful, manual editing can no longer meet the demand for fast, professional clipping and output across a large number of matches. Automatic video data screening methods and devices have begun to appear, but the accuracy of the videos they produce is poor.
Disclosure of Invention
The invention aims to provide an event video clipping method, system and computer readable storage medium that can produce and clip event videos quickly and accurately.
The technical scheme of the invention is as follows:
In a first aspect, an event video clipping method is provided, the method comprising:
separating and correspondingly storing a video track and an audio track of the event video to be processed;
identifying playback segments and non-playback segments in the video track;
analyzing the audio track corresponding to the non-playback segment to obtain the commentary speech endpoint, and clipping the non-playback segment according to the commentary speech endpoint to obtain non-playback segment material;
filtering the playback special-effect frames out of the playback segment to obtain playback segment material;
clipping the audio track according to the playback segment material and the non-playback segment material to obtain the corresponding audio material;
and merging the playback segment material and the non-playback segment material to obtain a material video, and synthesizing the audio material onto the material video to obtain a target clip video.
Further, the identifying playback segments and non-playback segments in the video track specifically includes:
identifying the event logo picture in the video frames of the video track by using a ResNet50 neural network to classify playback special-effect frames, and recording the classification results in real time;
and identifying playback segments and non-playback segments in the classification result.
Further, the analyzing to obtain the commentary speech endpoint in the audio track corresponding to the non-playback segment specifically includes:
setting a threshold T for the retained duration range after an event occurs, according to the event type;
and analyzing, with a voice endpoint detection method based on short-time energy and short-time average zero-crossing rate, the endpoint of the commentator's speech that is closest to the boundary of the duration-range threshold T in the corresponding audio track.
Further, the clipping the non-playback segment according to the commentary speech endpoint to obtain non-playback segment material specifically includes:
clipping the non-playback segment with the commentary speech endpoint as the segment end time point to obtain the non-playback segment material.
Further, the clipping the audio track according to the playback segment material and the non-playback segment material to obtain the corresponding audio material specifically includes:
clipping the audio track from its starting position for a duration equal to the sum of the durations of the non-playback segment material and the playback segment material to obtain the audio material.
Further, before the separating and correspondingly storing the video track and the audio track of the event video to be processed, the method comprises the following steps:
acquiring an event video to be processed; the method specifically comprises the following steps:
acquiring N video frames in a to-be-processed event video stream and a display timestamp corresponding to each video frame;
identifying an event time identifier in each video frame picture in the N video frames, and performing first time axis matching on the event time identifier and a display time stamp to position an effective event video;
analyzing the event data corresponding to the event video stream to be processed to obtain structured data for all events, wherein the structured data includes the occurrence time of each event in the match, and performing second time axis matching between the occurrence times and the display time stamps;
acquiring, from all events, a plurality of associated events that form any target event, determining the start display time stamp and the end display time stamp of the target event according to the start and end points of the plurality of associated events, locating and extracting all video frames of the target event in the effective event video, and clipping and encoding them into the event video to be processed.
Further, the identifying an event time identifier in each video frame picture of the N video frames, and performing first time axis matching on the event time identifier and a display timestamp to locate an effective event video specifically includes:
and identifying and verifying the event time identifier in each video frame picture in the N video frames by utilizing a deep learning algorithm based on the Faster R-CNN neural network through an AI identification module, and performing first time axis matching on the verified event time identifier and a display time stamp to position an effective event video.
Further, the method also comprises: merging the target clip videos according to their occurrence times to obtain a collection video.
In a second aspect, there is provided an event video clip system, the system comprising:
the separation storage module is used for separating and correspondingly storing the video track and the audio track of the event video to be processed;
the filtering module is used for filtering the playback special-effect frames out of the playback segment to obtain playback segment material;
the recognition and analysis module is used for recognizing playback segments and non-playback segments in the video track, and analyzing the audio track corresponding to the non-playback segment to obtain the commentary speech endpoint;
the clipping module is used for clipping the non-playback segment according to the commentary speech endpoint to obtain non-playback segment material, and clipping the audio track according to the playback segment material and the non-playback segment material to obtain the corresponding audio material;
and the merging module is used for merging the playback segment material and the non-playback segment material to obtain a material video, and synthesizing the audio material onto the material video to obtain a target clip video.
In a third aspect, a computer-readable storage medium is provided, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any one of the first aspects.
The invention has the advantages that: the method supports multiple event types, automatically extracts and retains important playback segments from event videos while accurately removing the playback special-effect frames to shorten the video duration, and quickly produces large numbers of clipped videos from massive video resources with high accuracy; this helps users quickly grasp the summary and essence of an event, and provides useful support for the clipping, production, sharing and spreading of increasingly popular short videos.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required by the embodiments are briefly described below. Obviously, the drawings described below are only some embodiments of the present application, and other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
FIG. 1 is a flow chart illustrating a method for editing a video clip of an event according to an embodiment of the present invention;
FIG. 2 is a block diagram of a highlight cutting system for a soccer game according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating transition effects achieved by a video editing method for an event according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating goal coordinates and range in a video editing method for an event according to an embodiment of the present invention;
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments that can be derived from the embodiments given herein by a person of ordinary skill in the art are intended to be within the scope of the present disclosure.
In this application, ResNet, also called a residual neural network, adds residual learning to the traditional convolutional neural network. This alleviates the problems of gradient vanishing and accuracy degradation (on the training set) in deep networks, so that the network can be made deeper while accuracy is maintained and speed remains under control.
Block ID: the identifier of a fragment.
CDN: abbreviation of Content Delivery Network, a layer of intelligent virtual network on top of the existing internet, formed by placing node servers throughout the network; it avoids, as far as possible, the bottlenecks and links on the internet that may affect data transmission speed and stability, making content delivery faster and more stable.
VAR: abbreviation of Video Assistant Referee; an assistant referee provides information to the on-field referee by replaying video, helping the referee correct missed incidents or clear and obvious errors that change the course of the match, thereby improving refereeing accuracy.
An event video clip needs to preserve the integrity of each event, and the video duration must be compressed on the premise of that integrity. To solve the problems of inefficient, costly manual clipping and inaccurate automatic clipping (where the video duration cannot be effectively compressed, or inaccurate event cuts leave events incomplete), this application provides an event video clipping method that performs fine-grained clipping automatically, compresses the display duration of a single event, highlights the event focus, and effectively shortens the overall video duration.
Embodiment 1: an event video clipping method, as shown in FIG. 1, includes:
101. separating and correspondingly storing a video track and an audio track of the event video to be processed;
specifically, the video track and the audio track of the event video to be processed are separated and correspondingly stored, and the picture sequence and the audio sequence are correspondingly stored.
Prior to this step, the method further comprises acquiring the event video to be processed, which specifically comprises the following steps:
101-1, acquiring N video frames in a video stream of the event to be processed and a display time stamp corresponding to each video frame;
the display time stamp is mainly used for measuring when the decoded video frame is displayed, that is, for marking the display time point of each frame in the manufactured target video.
The event video stream to be processed is a live event video stream provided by an event data provider or an on-demand event video stream downloaded through an on-demand video playing address, and the specific type of the event video stream is not limited in this embodiment.
N is at least 4. Take a soccer game as an example:
When the event video stream is a live event video stream, the video frames are obtained from the CDN: the CDN intercepts N video frames from the live stream at a preset fixed frequency, and extracts the Block ID of the TS fragment containing each video frame together with the corresponding display time stamp. After the first or second half of a football match starts, the system obtains from the CDN the first 6 video frames after each kick-off in the event video stream to be processed, the Block IDs of the corresponding TS fragments, and the display time stamp of each video frame.
When the event video stream is an on-demand event video stream, the on-demand file is read directly by the AI identification module. Since the total duration of an on-demand file is fixed, relative time points for the first half and the second half are chosen at random within the total duration, frames are extracted at those time points, and the corresponding display time stamps are obtained by decoding. Here N is 4, i.e. 2 frames each for the first half and the second half; if recognition fails, 2 further frames are extracted at random. If there is extra time, the first and second halves of extra time are also processed by frame extraction as described above.
101-2, identifying the event time identifier in each video frame picture in the N video frames, and carrying out first time axis matching on the event time identifier and the display time stamp to position an effective event video.
Specifically, an AI identification module identifies the event time identifier in each of the N video frame pictures by using a deep learning algorithm. In this embodiment, the event time identifier in each video frame is preferably obtained by the AI identification module recognizing the displayed match time on the scoreboard in each video frame of the event video stream with a deep learning algorithm based on the Faster R-CNN neural network, and then verifying the result.
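A minimal sketch of the detection part of this step, assuming a Faster R-CNN fine-tuned to localize the scoreboard game-clock overlay; the checkpoint name and the two-class setup are assumptions for illustration, not details from the patent:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Assumed fine-tuned detector: background + clock overlay = 2 classes.
# "clock_detector.pth" is a hypothetical checkpoint name.
detector = fasterrcnn_resnet50_fpn(weights=None, num_classes=2)
detector.load_state_dict(torch.load("clock_detector.pth"))
detector.eval()

def detect_clock_box(frame: torch.Tensor):
    """Return the highest-scoring [x1, y1, x2, y2] box for the clock
    overlay, or None. The crop would then go to a digit recognizer, and
    the recognized time would be verified across neighbouring frames."""
    with torch.no_grad():
        out = detector([frame])[0]  # frame: float tensor, CHW, in [0, 1]
    if out["boxes"].shape[0] == 0:
        return None
    best = out["scores"].argmax()
    return out["boxes"][best].tolist()

box = detect_clock_box(torch.rand(3, 720, 1280))  # placeholder frame
```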
Because video segments from before the match starts exist in the stream, the event time identifier and the display time stamp are matched on a first time axis to locate the match portion, i.e. the effective event video; determining the match starting point from the effective match time improves positioning precision.
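A minimal sketch of this first time-axis matching, assuming the recognized (and verified) game-clock readings are already available for several frames; with a constant offset between the two time axes, the kick-off point follows directly:

```python
def locate_kickoff_pts(samples: list[tuple[float, float]]) -> float:
    """Estimate the display timestamp (PTS) at which the game clock
    reads 0:00, i.e. the start of the effective event video.

    samples: (pts_seconds, clock_seconds) pairs, one per verified frame:
    the frame's display timestamp and the game clock recognized from its
    scoreboard. With a constant offset between the two axes, the
    kick-off PTS is pts - clock; averaging smooths recognition noise.
    """
    offsets = [pts - clock for pts, clock in samples]
    return sum(offsets) / len(offsets)

# e.g. frames whose scoreboards read 12:03, 12:05 and 12:08 of match time
samples = [(1923.0, 723.0), (1925.1, 725.0), (1928.2, 728.0)]
print(f"effective event video starts at PTS {locate_kickoff_pts(samples):.1f}s")
```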
101-3, analyzing the event data corresponding to the event video stream to be processed to obtain structured data for all events, wherein the structured data includes the occurrence time of each event in the match, and performing second time axis matching between the occurrence times and the display time stamps;
specifically, when the event video stream is a live event video stream provided by an event data provider, the event data is obtained by the event data provider through feedback or through query;
and when the event video stream is the on-demand event video stream downloaded through the on-demand video playing address, the event data is obtained by directly inquiring the detailed event data of the historical events.
Since a match may be interrupted or extended, it is necessary to extract the occurrence time of each event in the match and perform second time-axis matching between the occurrence time and the display time stamp, aligning occurrence times with display times.
101-4, acquiring, from all events, a plurality of associated events that form any target event, determining the start display time stamp and the end display time stamp of the target event according to the start and end points of the plurality of associated events, locating and extracting all video frames of the target event in the effective event video, and clipping and encoding them into the event video to be processed.
Specifically, a core event is determined, and the events related to the core event are obtained through preset association rules, for example: when the core event is a goal, the related events include the passing, dribbling and centre-circle kick-off events that may occur before and after the goal. The core event and its related events together constitute the associated events.
In this embodiment, the preset association rule is not specifically limited.
After the event video to be processed is obtained, it can be screened according to preset weighting rules. For example, when the event is a football match, the preset weighting rules at least include the start of the first half and the end of the second half, plus at least one of a goal, a Highlight event given by the event data provider, a VAR review, a threatening shot, a red or yellow card, and a penalty kick. For a match without goals, the preset weighting rules include the start of the first half and the end of the second half and at least threatening shots, and the degree of threat must be defined: referring to the diagram of goal coordinates and range in FIG. 4, if the ball's trajectory after a shot falls within the Close area ring around the outer frame of the goal, the shot is judged to be a threatening shot; other trajectories far from the goal are not screened or recorded.
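As an illustration only, the Close-area test could look like the following sketch, assuming the shot trajectory's endpoint is available in goal-plane coordinates; the margin and the coordinate convention are hypothetical (the standard 7.32 m × 2.44 m goal frame is used), not values taken from FIG. 4:

```python
# Hedged sketch of the "Close area" test: x runs along the goal line,
# y is height above the ground; the 1.0 m margin is an assumption.
def is_threatening_shot(end_x: float, end_y: float,
                        goal_left: float = 0.0, goal_right: float = 7.32,
                        goal_top: float = 2.44, margin: float = 1.0) -> bool:
    """True if the shot trajectory's endpoint lands inside the Close band
    around the outer frame of the goal (goal mouth plus the margin)."""
    within_x = (goal_left - margin) <= end_x <= (goal_right + margin)
    within_y = 0.0 <= end_y <= (goal_top + margin)
    return within_x and within_y

print(is_threatening_shot(7.8, 2.1))   # just wide of the post -> True
print(is_threatening_shot(12.0, 5.0))  # far from the goal -> False, not recorded
```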
It should be noted that the preset weighting rule is set according to the event content, the existing editing habit and experience, and the preset weighting rule is not specifically limited in this embodiment.
102. Identifying playback segments and non-playback segments in the video track; the method specifically comprises the following steps:
identifying the event logo picture in the video frames of the video track by using a ResNet50 neural network to classify playback special-effect frames, and recording the classification results in real time;
and identifying playback segments and non-playback segments from the classification results.
More specifically, the classification results recorded in real time are analyzed: if the inter-frame distance between playback special-effect frames is smaller than a preset value, they are judged to belong to the same playback special-effect transition, and the video content between two such transitions is judged to be a playback segment; the playback segments are thereby identified.
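A minimal sketch of this classification-and-grouping step, assuming a ResNet50 fine-tuned as a two-class logo/non-logo classifier; the checkpoint name, the class index and the gap threshold are illustrative assumptions:

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Assumed fine-tuned classifier: does a frame show the playback
# special-effect (event logo) picture? Checkpoint name is hypothetical.
model = models.resnet50()
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model.load_state_dict(torch.load("logo_frame_classifier.pth"))
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def is_effect_frame(frame: Image.Image) -> bool:
    """Classify one video frame; class 1 = playback special-effect frame."""
    with torch.no_grad():
        logits = model(preprocess(frame).unsqueeze(0))
    return logits.argmax(dim=1).item() == 1

def playback_segments(effect_indices: list[int], max_gap: int = 30):
    """Group recorded effect-frame indices into transitions (inter-frame
    distance below max_gap = same transition), then mark the span between
    consecutive transitions as one playback segment."""
    groups, current = [], [effect_indices[0]]
    for idx in effect_indices[1:]:
        if idx - current[-1] < max_gap:
            current.append(idx)
        else:
            groups.append(current)
            current = [idx]
    groups.append(current)
    return [(a[-1] + 1, b[0] - 1) for a, b in zip(groups, groups[1:])]
```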
103. Analyzing the audio track corresponding to the non-playback segment to obtain the commentary speech endpoint, and clipping the non-playback segment according to the commentary speech endpoint to obtain non-playback segment material;
the analyzing to obtain the commentary speech endpoint in the audio track corresponding to the non-playback segment specifically includes:
setting a threshold T for the retained duration range after an event occurs, according to the event type;
and analyzing, with a voice endpoint detection method based on short-time energy and short-time average zero-crossing rate, the endpoint of the commentator's speech that is closest to the boundary of the duration-range threshold T in the corresponding audio track.
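A minimal sketch of energy/ZCR endpoint detection, assuming the audio track is available as a mono numpy array; the thresholds are illustrative and would be tuned on real commentary, and in the method above the candidate endpoint closest to the boundary of the duration-range threshold T would then be selected:

```python
import numpy as np

def last_speech_endpoint(signal: np.ndarray, sr: int,
                         frame_ms: int = 25, hop_ms: int = 10) -> float:
    """Return the time (in seconds) at which commentary speech ends.

    Classic endpoint detection: a frame counts as speech if its
    short-time energy is high (voiced speech) OR its short-time average
    zero-crossing rate is high (unvoiced consonants). The 2x-median
    energy threshold and the 0.25 ZCR threshold are assumptions.
    """
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    x = signal.astype(np.float64)
    starts = range(0, len(x) - frame, hop)
    energy = np.array([np.mean(x[s:s + frame] ** 2) for s in starts])
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(x[s:s + frame])))) / 2
                    for s in starts])
    is_speech = (energy > 2.0 * np.median(energy)) | (zcr > 0.25)
    speech = np.flatnonzero(is_speech)
    if speech.size == 0:
        return 0.0
    # end of the last speech frame, converted back to seconds
    return float(speech[-1] * hop + frame) / sr
```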
The clipping the non-playback segment according to the commentary speech endpoint to obtain non-playback segment material specifically includes:
clipping the non-playback segment with the commentary speech endpoint as the segment end time point to obtain the non-playback segment material.
104. Filtering the playback special-effect frames out of the playback segment to obtain the playback segment material;
105. Clipping the audio track according to the playback segment material and the non-playback segment material to obtain the corresponding audio material; this specifically comprises:
clipping the audio track from its starting position for a duration equal to the sum of the durations of the non-playback segment material and the playback segment material to obtain the audio material. More specifically, the purpose of clipping the audio track is to keep sound and picture synchronized over the non-playback portion; whether the audio accompanying the playback segment pictures corresponds to the original picture period in the event video to be processed is not limited in this embodiment of the invention.
106. Merging the playback segment material and the non-playback segment material to obtain a material video, and synthesizing the audio material onto the material video to obtain the target clip video.
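Steps 105 and 106 can be sketched with ffmpeg as follows, assuming the segment materials have already been written out as files in playback order; all file names and the summed duration are hypothetical:

```python
import subprocess

materials = ["non_playback_01.mp4", "playback_01.mp4", "non_playback_02.mp4"]

# Concatenate the segment materials into the material video using the
# ffmpeg concat demuxer (the inputs must share codec parameters).
with open("list.txt", "w") as f:
    for m in materials:
        f.write(f"file '{m}'\n")
subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                "-i", "list.txt", "-c", "copy", "material_video.mp4"],
               check=True)

# Step 105: trim the audio track from its start to the summed duration
# of the materials (assumed here to be 95.0 s), giving the audio material.
subprocess.run(["ffmpeg", "-y", "-i", "match_audio.aac", "-t", "95.0",
                "-c", "copy", "audio_material.aac"], check=True)

# Step 106: synthesize the audio material onto the material video by
# muxing the video stream of the concat result with the trimmed audio.
subprocess.run(["ffmpeg", "-y", "-i", "material_video.mp4",
                "-i", "audio_material.aac",
                "-map", "0:v:0", "-map", "1:a:0", "-c", "copy",
                "target_clip.mp4"], check=True)
```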
The method further comprises: merging the target clip videos according to their occurrence times to obtain a collection video.
The event video clipping method provided by this embodiment supports multiple event types, automatically extracts and retains important playback segments from event videos while accurately removing the playback special-effect frames to shorten the duration, and quickly produces large numbers of videos from massive video resources with high accuracy; it helps users quickly grasp the summary and essence of an event and provides useful support for the clipping, production, sharing and spreading of increasingly popular short videos.
Embodiment 2: this embodiment provides an event video clipping system which, as shown in FIG. 2, includes:
a separation storage module 21, configured to separate and correspondingly store the video track and the audio track of the event video to be processed;
a filtering module 22, configured to filter the playback special-effect frames out of the playback segment to obtain playback segment material;
a recognition and analysis module 23, configured to recognize playback segments and non-playback segments in the video track, and to analyze the audio track corresponding to the non-playback segment to obtain the commentary speech endpoint;
a clipping module 24, configured to clip the non-playback segment according to the commentary speech endpoint to obtain non-playback segment material, and to clip the audio track according to the playback segment material and the non-playback segment material to obtain the audio material;
and a merging module 25, configured to merge the playback segment material and the non-playback segment material to obtain a material video, and to synthesize the audio material onto the material video.
The beneficial effects of the event video clipping system provided in this embodiment for implementing the event video clipping method provided in embodiment 1 are the same as those of the event video clipping method provided in embodiment 1, and are not described herein again.
It should be noted that: in the event video clipping system provided in the above embodiment, when executing an event video clipping method, only the division of the above function modules is used for illustration, and in practical applications, the function distribution may be completed by different function modules according to needs, that is, the internal structure of the device is divided into different function modules to complete all or part of the functions described above. In addition, the event video editing system and the event video editing method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Embodiment 3: this embodiment provides a computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the following steps:
acquiring an event video to be processed;
separating and correspondingly storing a video track and an audio track of the event video to be processed;
identifying playback segments and non-playback segments in the video track;
analyzing the audio track corresponding to the non-playback segment to obtain the commentary speech endpoint, and clipping the non-playback segment according to the commentary speech endpoint to obtain non-playback segment material;
filtering the playback special-effect frames out of the playback segment to obtain playback segment material;
clipping the audio track according to the playback segment material and the non-playback segment material to obtain the corresponding audio material;
merging the playback segment material and the non-playback segment material to obtain a material video, and synthesizing the audio material onto the material video to obtain a target clip video;
and merging the target clip videos according to their occurrence times to obtain a collection video.
The beneficial effects of the computer-readable storage medium provided in this embodiment, which executes the steps of the event video clipping method provided in Embodiment 1, are the same as those of that method and are not repeated here.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be, but is not limited to, a read-only memory, a magnetic or optical disk, and the like.
It should be understood that the above-mentioned embodiments are only illustrative of the technical concepts and features of the present invention, and are intended to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the scope of the present invention. All modifications made according to the spirit of the main technical scheme of the invention are covered in the protection scope of the invention.

Claims (8)

1. An event video clipping method, the method comprising:
separating and correspondingly storing a video track and an audio track of the event video to be processed;
identifying playback segments and non-playback segments in the video track, comprising: identifying the event logo picture in the video frames of the video track by using a ResNet50 neural network to classify playback special-effect frames, and recording the classification results in real time;
identifying playback segments and non-playback segments in the classification result;
analyzing the audio track corresponding to the non-playback segment to obtain the commentary speech endpoint, and clipping the non-playback segment according to the commentary speech endpoint to obtain non-playback segment material;
wherein the analyzing to obtain the commentary speech endpoint in the audio track corresponding to the non-playback segment comprises: setting a threshold T for the retained duration range after an event occurs, according to the event type;
analyzing, with a voice endpoint detection method based on short-time energy and short-time average zero-crossing rate, the endpoint of the commentator's speech that is closest to the boundary of the duration-range threshold T in the corresponding audio track;
filtering the playback special-effect frames out of the playback segment to obtain playback segment material;
clipping the audio track according to the playback segment material and the non-playback segment material to obtain the corresponding audio material;
and merging the playback segment material and the non-playback segment material to obtain a material video, and synthesizing the audio material onto the material video to obtain a target clip video.
2. The event video clipping method according to claim 1, wherein the clipping the non-playback segment according to the commentary speech endpoint to obtain non-playback segment material specifically comprises:
clipping the non-playback segment with the commentary speech endpoint as the segment end time point to obtain the non-playback segment material.
3. The event video clipping method according to claim 1, wherein the clipping the audio track according to the playback segment material and the non-playback segment material to obtain the corresponding audio material specifically comprises:
clipping the audio track from its starting position for a duration equal to the sum of the durations of the non-playback segment material and the playback segment material to obtain the audio material.
4. An event video clipping method as claimed in claim 1, further comprising, before said separating and correspondingly storing the video track and the audio track of the event video to be processed:
acquiring an event video to be processed; the method specifically comprises the following steps:
acquiring N video frames in a video stream of an event to be processed and a display timestamp corresponding to each video frame;
identifying an event time identifier in each video frame picture in the N video frames, and performing first time axis matching on the event time identifier and a display time stamp to position an effective event video;
analyzing the event data corresponding to the event video stream to be processed to obtain structured data for all events, wherein the structured data includes the occurrence time of each event in the match, and performing second time axis matching between the occurrence times and the display time stamps;
acquiring, from all events, a plurality of associated events that form any target event, determining the start display time stamp and the end display time stamp of the target event according to the start and end points of the plurality of associated events, locating and extracting all video frames of the target event in the effective event video, and clipping and encoding them into the event video to be processed.
5. The event video clipping method according to claim 4, wherein the identifying the event time identifier in each of the N video frames, and performing a first time axis matching between the event time identifier and a display time stamp to locate a valid event video specifically comprises:
and identifying and verifying the event time identifier in each video frame picture in the N video frames by utilizing a deep learning algorithm based on the Faster R-CNN neural network through an AI identification module, and performing first time axis matching on the verified event time identifier and a display time stamp to position an effective event video.
6. The event video clipping method as claimed in claim 1, the method further comprising: merging the target clip videos according to their occurrence times to obtain a collection video.
7. An event video clipping system, the system comprising:
the separation storage module is used for separating and correspondingly storing a video track and an audio track of the event video to be processed;
the recognition and analysis module is used for recognizing playback segments and non-playback segments in the video track and analyzing the audio track corresponding to the non-playback segment to obtain the commentary speech endpoint;
the recognition and analysis module comprises: a classification recording unit, for identifying the event logo picture in the video frames of the video track by using a ResNet50 neural network to classify playback special-effect frames, and recording the classification results in real time;
a recognition unit, for identifying playback segments and non-playback segments from the classification results;
a setting unit, for setting a threshold T for the retained duration range after an event occurs, according to the event type;
an analysis unit, for analyzing, with a voice endpoint detection method based on short-time energy and short-time average zero-crossing rate, the endpoint of the commentator's speech that is closest to the boundary of the duration-range threshold T in the corresponding audio track;
the filtering module is used for filtering the playback special-effect frames out of the playback segment to obtain playback segment material;
the clipping module is used for clipping the audio track according to the playback segment material and the non-playback segment material to obtain the corresponding audio material;
and the merging module is used for merging the playback segment material and the non-playback segment material to obtain a material video, and synthesizing the audio material onto the material video to obtain a target clip video.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN202010493124.3A 2020-06-03 2020-06-03 Event video clipping method, system and computer readable storage medium Active CN111770359B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010493124.3A CN111770359B (en) 2020-06-03 2020-06-03 Event video clipping method, system and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010493124.3A CN111770359B (en) 2020-06-03 2020-06-03 Event video clipping method, system and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111770359A CN111770359A (en) 2020-10-13
CN111770359B (en) 2022-10-11

Family

ID=72720600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010493124.3A Active CN111770359B (en) 2020-06-03 2020-06-03 Event video clipping method, system and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111770359B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515997B (en) * 2020-12-28 2024-01-19 腾讯科技(深圳)有限公司 Video data processing method and device and readable storage medium
CN112839236A (en) * 2020-12-31 2021-05-25 北京达佳互联信息技术有限公司 Video processing method, device, server and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017177859A1 (en) * 2016-04-11 2017-10-19 腾讯科技(深圳)有限公司 Video playing method and device, and computer readable storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599179B (en) * 2009-07-17 2011-06-01 北京邮电大学 Method for automatically generating highlight collections of exciting scenes in field sports
US20120017153A1 (en) * 2010-07-15 2012-01-19 Ken Matsuda Dynamic video editing
US20160037217A1 (en) * 2014-02-18 2016-02-04 Vidangel, Inc. Curating Filters for Audiovisual Content
CN109194978A (en) * 2018-10-15 2019-01-11 广州虎牙信息科技有限公司 Live video clipping method, device and electronic equipment
CN109862388A * 2019-04-02 2019-06-07 网宿科技股份有限公司 Method, device, server and storage medium for generating live video highlight collections
CN110188241B (en) * 2019-06-04 2023-07-25 成都索贝数码科技股份有限公司 Intelligent manufacturing system and manufacturing method for events
CN110012348B * 2019-06-04 2019-09-10 成都索贝数码科技股份有限公司 Automatic highlight-collection system and method for event programs
CN110087097B (en) * 2019-06-05 2021-08-03 西安邮电大学 Method for automatically removing invalid video clips based on electronic endoscope
CN110572722B (en) * 2019-09-26 2021-04-16 腾讯科技(深圳)有限公司 Video clipping method, device, equipment and readable storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017177859A1 (en) * 2016-04-11 2017-10-19 腾讯科技(深圳)有限公司 Video playing method and device, and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sawitchaya Tippaya; Suchada Sitjongsataporn; Tele Tan; Masood Me. Multi-Modal Visual Features-Based Video Shot Boundary Detection. IEEE Access, 2017, vol. 5. *

Also Published As

Publication number Publication date
CN111770359A (en) 2020-10-13

Similar Documents

Publication Publication Date Title
CN102547141B (en) Method and device for screening video data based on sports event video
CN106162223B (en) News video segmentation method and device
Hanjalic Adaptive extraction of highlights from a sport video based on excitement modeling
AU2021200219A1 (en) System and method for creating and distributing multimedia content
US8195038B2 (en) Brief and high-interest video summary generation
CN102222103B (en) Method and device for processing matching relationship of video content
CN111770359B (en) Event video clipping method, system and computer readable storage medium
CN108235141A (en) Live video turns method, apparatus, server and the storage medium of fragmentation program request
CN111757147B (en) Method, device and system for event video structuring
US9426411B2 (en) Method and apparatus for generating summarized information, and server for the same
CN104185088B (en) A kind of method for processing video frequency and device
CN103165151A (en) Method and device for playing multi-media file
KR20200023013A (en) Video Service device for supporting search of video clip and Method thereof
Merler et al. Automatic curation of golf highlights using multimodal excitement features
CN104320670A (en) Summary information extracting method and system for network video
CN106210773B (en) The method and system of barrage are played in local video
CN114845149A (en) Editing method of video clip, video recommendation method, device, equipment and medium
CN114782879B (en) Video identification method and device, computer equipment and storage medium
CN114339451B (en) Video editing method, device, computing equipment and storage medium
CN111741333B (en) Live broadcast data acquisition method and device, computer equipment and storage medium
CN111615008B (en) Intelligent abstract generation and subtitle reading system based on multi-device experience
US8214854B2 (en) Method and system for facilitating analysis of audience ratings data for content
CN115022663A (en) Live stream processing method and device, electronic equipment and medium
Liu et al. Brief and high-interest video summary generation: evaluating the AT&T labs rushes summarizations
Wang et al. Automatic Set List Identification and Song Segmentation for Full-Length Concert Videos.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000

Patentee after: Jiangsu Suning cloud computing Co.,Ltd.

Country or region after: China

Address before: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000

Patentee before: Suning Cloud Computing Co.,Ltd.

Country or region before: China

CP03 Change of name, title or address
TR01 Transfer of patent right

Effective date of registration: 20240709

Address after: Room 3104, Building A5, No. 3 Gutan Avenue, Economic Development Zone, Gaochun District, Nanjing City, Jiangsu Province, 210000

Patentee after: Jiangsu Biying Technology Co.,Ltd.

Country or region after: China

Address before: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000

Patentee before: Jiangsu Suning cloud computing Co.,Ltd.

Country or region before: China

TR01 Transfer of patent right