CN110913240A - Video interception method, device, server and computer readable storage medium - Google Patents


Info

Publication number: CN110913240A
Application number: CN201911215289.8A
Other versions: CN110913240B
Authority: CN (China)
Prior art keywords: video, song, time, frame, data
Legal status: Granted; currently active
Original language: Chinese (zh)
Inventor: 吴瑞平
Assignee (current and original): Guangzhou Kugou Computer Technology Co Ltd
Application filed by Guangzhou Kugou Computer Technology Co Ltd; publication of CN110913240A; application granted; publication of CN110913240B

Classifications

All classifications fall under H04N21/00 (Selective content distribution, e.g. interactive television or video on demand [VOD]):

    • H04N21/231 Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion
    • H04N21/233 Processing of audio elementary streams
    • H04N21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/25866 Management of end-user data
    • H04N21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/4508 Management of client data or end-user data
    • H04N21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • H04N21/8547 Content authoring involving timestamps for synchronizing content

Abstract

The application discloses a video interception method, apparatus, server, and computer-readable storage medium, belonging to the technical field of video. The method performs temporal feature recognition on the video data of a video to be processed to obtain the initial time and termination time of the video segment of each song, and intercepts the video data of each song's segment based on those times. Because no song ID needs to be carried in the video stream, a song's video segment can be intercepted from any video to be processed, so the interception process provided by the method is universally applicable.

Description

Video interception method, device, server and computer readable storage medium
Technical Field
The present application relates to the field of network technologies, and in particular, to a video capture method, apparatus, server, and computer-readable storage medium.
Background
With the ever-faster pace of life, when watching a video users tend to view only the highlight segments or the parts they are interested in rather than the whole video. For example, in a concert video a user may only be interested in the video segment of a particular song, and the server can intercept the video to extract that song's segment.
Currently, the video interception process may work as follows: for a video stream composed of the video segments of multiple songs, the client marks the messages in the video stream. Specifically, the client adds the same song identity (ID) to the messages carrying video data of the same song, while messages of different songs carry different song IDs. The client then sends the video stream carrying the song IDs to the server. The server intercepts the messages sharing a song ID and obtains the video clip of that song from the video data those messages carry.
In this interception process, the server can only extract the messages with the same song ID if the client has marked the messages in the video stream. Once the client does not add song IDs to the messages of the video stream, the server cannot group messages by song ID and therefore cannot intercept any song's video clip from the video.
Disclosure of Invention
The application provides a video interception method, apparatus, server, and computer-readable storage medium, which address the problem that the above interception process lacks universality. The technical scheme is as follows:
in a first aspect, a video capture method is provided, where the method includes:
acquiring video data of a video to be processed, wherein the video to be processed comprises a video clip of at least one song;
performing time characteristic identification on the video data to obtain the initial time and the termination time of the video clip of each song in the video to be processed;
and intercepting the video data based on the initial time and the termination time of the video clip of each song to obtain the video data of the video clip of each song.
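The three steps of the first aspect can be sketched as follows. The (timestamp, frame) data layout and the stub recognizer are illustrative assumptions, since the source does not fix a data format:

```python
from typing import Callable, List, Tuple

# Assumed layout: video data is a list of (timestamp_seconds, frame_payload)
# pairs; `recognize` stands in for the temporal feature recognition that
# yields each song's (initial time, termination time).
VideoData = List[Tuple[float, bytes]]

def intercept_songs(video_data: VideoData,
                    recognize: Callable[[VideoData], List[Tuple[float, float]]]
                    ) -> List[VideoData]:
    # Step 2: temporal feature recognition -> one interval per song.
    segments = recognize(video_data)
    # Step 3: cut the frames whose timestamps fall inside each interval.
    return [[(t, f) for (t, f) in video_data if start <= t <= end]
            for start, end in segments]

# Usage with a stub recognizer that "detects" one song from 180 s to 480 s
# in a 10-minute video sampled once per minute.
frames = [(float(t), b"frame") for t in range(0, 600, 60)]
clips = intercept_songs(frames, lambda v: [(180.0, 480.0)])
```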
Optionally, the performing temporal feature recognition on the video data includes:
acquiring the time characteristics of the video to be processed based on the video data;
and processing the time characteristics by using a song recognition model to obtain the initial time and the termination time of the video clip of each song.
Optionally, the acquiring video data of the video to be processed includes:
based on the target condition, inquiring the stored video data of each video to obtain the video data of at least one video meeting the target condition;
and determining the video data of each video in the at least one video as the video data of one video to be processed.
Optionally, the target condition comprises at least one of:
the storage time of the video data of the video is within the target time period;
the singer of the song in the video is the target user;
the song type of the song in the video is the target song type.
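As an illustration of how such a query filter might be applied, the sketch below checks a hypothetical video-information record against whichever conditions are supplied; the record and its field names are assumptions, not the patent's data model:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical record mirroring the "video information" the target
# conditions are checked against.
@dataclass
class VideoInfo:
    store_time: datetime
    singers: List[str]
    song_types: List[str]

def meets_target(info: VideoInfo,
                 period: Optional[Tuple[datetime, datetime]] = None,
                 singer: Optional[str] = None,
                 song_type: Optional[str] = None) -> bool:
    """A video matches only when every condition that was supplied holds."""
    if period is not None and not (period[0] <= info.store_time <= period[1]):
        return False
    if singer is not None and singer not in info.singers:
        return False
    if song_type is not None and song_type not in info.song_types:
        return False
    return True

# Usage: query for rock songs stored in 2019 and sung by "xxx".
info = VideoInfo(datetime(2019, 12, 2), ["xxx"], ["rock"])
ok = meets_target(info,
                  period=(datetime(2019, 1, 1), datetime(2019, 12, 31)),
                  singer="xxx", song_type="rock")
```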
Optionally, the intercepting the video data based on the initial time and the end time of the video clip of each song includes:
analyzing frame data corresponding to each video frame in the video data to obtain a timestamp of each video frame;
determining at least one group of video frames based on the timestamp of each video frame and the initial time and the ending time of the video clip of each song, wherein each group of video frames corresponds to the video clip of one song;
and respectively storing frame data of each group of video frames in the video data into a song video file, wherein the song video file is used for storing video data of a video clip of a song.
Optionally, the determining at least one group of video frames based on the timestamp of each video frame, the initial time and the end time of the video clip of each song comprises:
and for the video segment of any song of the at least one song, when the timestamp of a video frame is greater than or equal to the initial time of that song's video segment and less than or equal to its termination time, determining the video frame to be one frame of the group of video frames corresponding to that song's video segment.
Optionally, the determining at least one group of video frames based on the timestamp of each video frame, the initial time and the end time of the video clip of each song further comprises:
when the timestamp of a video frame is smaller than the initial time of a song's video segment but the difference between the timestamp and the initial time is smaller than a preset value, determining the video frame to be one frame of that song's group of video frames;
and when the timestamp of a video frame is greater than the termination time of the song's video segment but the difference between the timestamp and the termination time is smaller than the preset value, likewise determining the video frame to be one frame of that song's group of video frames.
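The boundary rules in the two cases above reduce to a single predicate. A minimal sketch, with a one-second tolerance standing in for the preset value, which the source leaves unspecified:

```python
def belongs_to_group(ts: float, initial: float, termination: float,
                     preset: float) -> bool:
    """A frame joins a song's group when its timestamp lies inside
    [initial, termination], or misses that interval by less than the
    preset value on either side."""
    if initial <= ts <= termination:
        return True
    if ts < initial and initial - ts < preset:
        return True
    if ts > termination and ts - termination < preset:
        return True
    return False

# Frames just outside the interval are still captured within the tolerance.
picked = [ts for ts in (178.0, 179.5, 300.0, 480.5, 490.0)
          if belongs_to_group(ts, 180.0, 480.0, 1.0)]
```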
In a second aspect, there is provided a video capture apparatus, the apparatus comprising:
an acquisition module, configured to acquire video data of a video to be processed, where the video to be processed includes the video clip of at least one song;
the identification module is used for identifying the time characteristics of the video data to obtain the initial time and the termination time of the video clip of each song in the video to be processed;
and the intercepting module is used for intercepting the video data based on the initial time and the termination time of the video clip of each song to obtain the video data of the video clip of each song.
Optionally, the identification module is configured to:
acquiring the time characteristics of the video to be processed based on the video data;
and processing the time characteristics by using a song recognition model to obtain the initial time and the termination time of the video clip of each song.
Optionally, the obtaining module is configured to:
based on the target condition, inquiring the stored video data of each video to obtain the video data of at least one video meeting the target condition;
and determining the video data of each video in the at least one video as the video data of one video to be processed.
Optionally, the target condition comprises at least one of:
the storage time of the video data of the video is within the target time period;
the singer of the song in the video is the target user;
the song type of the song in the video is the target song type.
Optionally, the intercepting module includes:
the analysis unit is used for analyzing frame data corresponding to each video frame in the video data to obtain a time stamp of each video frame;
a determining unit, configured to determine at least one group of video frames based on the timestamp of each video frame and the initial time and the end time of the video clip of each song, where each group of video frames corresponds to a video clip of a song;
and the storage unit is used for respectively storing the frame data of each group of video frames in the video data into one song video file, and the song video file is used for storing the video data of a video clip of one song.
Optionally, the determining unit is configured to:
and for the video segment of any song of the at least one song, when the timestamp of a video frame is greater than or equal to the initial time of that song's video segment and less than or equal to its termination time, determine the video frame to be one frame of the group of video frames corresponding to that song's video segment.
Optionally, the determining unit is further configured to:
when the timestamp of a video frame is smaller than the initial time of a song's video segment but the difference between the timestamp and the initial time is smaller than a preset value, determine the video frame to be one frame of that song's group of video frames;
and when the timestamp of a video frame is greater than the termination time of the song's video segment but the difference between the timestamp and the termination time is smaller than the preset value, likewise determine the video frame to be one frame of that song's group of video frames.
In a third aspect, a server is provided, which includes a processor and a memory, where at least one instruction is stored in the memory, and the instruction is loaded and executed by the processor to implement the operations performed by the above video capture method.
In a fourth aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, and the instruction is loaded and executed by a processor to implement the operations performed by the above-mentioned video capture method.
In a fifth aspect, a computer program product is provided, comprising one or more instructions executable by a processor of a server to perform the method steps of any of the video interception methods of the first aspect. The method steps may include:
acquiring video data of a video to be processed, wherein the video to be processed comprises a video clip of at least one song;
performing time characteristic identification on the video data to obtain the initial time and the termination time of the video clip of each song in the video to be processed;
and intercepting the video data based on the initial time and the termination time of the video clip of each song to obtain the video data of the video clip of each song.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
the method comprises the steps of carrying out time characteristic identification on video data of a video to be processed to obtain initial time and ending time of a video clip of each song, and intercepting the video data of the video clip of each song in the video data based on the initial time and the ending time of the video clip of each song, so that the video to be processed can be intercepted without the need of song ID in a video stream, and the video clip of the song can be intercepted from any video to be processed.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a video capture system provided in an embodiment of the present application;
fig. 2 is a flowchart of a video capture method provided in an embodiment of the present application;
fig. 3 is a flowchart of a video capture method provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a server provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a video capture device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a video interception system provided in an embodiment of the present application. Referring to fig. 1, the video interception system 100 includes at least one terminal 101 and a server 102. Each terminal 101 is adapted to push the video stream of a video to the server 102; the video may be one that includes the video segment of at least one song, e.g. a concert video or a live video. The terminal may be any computer device, such as a mobile phone or a computer; the embodiment of the present application does not specifically limit the terminal 101.
The server 102 is configured to receive the video streams pushed by the terminals 101; the received video stream of any video may be composed of that video's video data. The server 102 is further configured to perform temporal feature recognition on the video data in the stream, obtain the initial time and termination time of each song's video segment in that video, and intercept each song's segment from the video based on those times.
In some possible implementations, the video interception system 100 further includes a target server 103. The target server 103 is configured to receive and store the video streams sent by the terminals 101, and also provides functions for querying and downloading video data. Of course, the target server may also provide other services; the embodiment of the present application does not specifically limit these other services.
The server 102 may also download video data to be processed from the target server, store the video data of each video locally, and store the storage information of the video data of each video in a database; then, based on the query and download function provided by the database, querying and downloading the video data of the video to be processed from the stored video data, wherein the database can be a local database of the server or a cloud database; finally, the server 102 performs temporal feature analysis on the video data of the video to be processed, so as to intercept and process the video to be processed subsequently.
To illustrate the process by which the server intercepts the video data to be processed, refer to the flowchart of the video interception method provided by the embodiment of the present application shown in fig. 2. The flow of the method may include the following steps 201 to 206.
201. The server receives a video stream of a plurality of videos, each video including a video clip of at least one song.
The video streams of the multiple videos may be pushed to the server by one or more terminals. Since one video may be composed of multiple video frames, the video data in a video stream may include the frame data of multiple video frames.
Any of the multiple videos may include the video segment of a song. For example, a 10-minute video may include a song with a duration of 5 minutes: the segment from the 3rd minute to the 8th minute is the song's video segment, and the remaining parts of the video are not video segments of songs.
It should be noted that, for convenience of description, the initial time of the video is taken as 0, and each time point in the video is relative to 0. For example, the segment between time 0 and the 3rd minute is the first 3 minutes of the video, and the segment between the 3rd and 8th minutes is the part of the first 8 minutes that excludes the first 3 minutes.
In some embodiments, a video may also include the video segments of multiple songs. For example, in a 1-hour concert video, the 20th to 25th minutes may be the video segment of a first song, the 30th to 40th minutes that of a second song, and the 50th to 55th minutes that of a third song.
It should be noted that a segment of the video that is not a song may be a segment of the singer interacting with the audience, or a segment used to promote the video; the embodiment of the present application does not limit the specific content of the non-song segments.
202. The server stores the video data in the video stream of each video.
The server may store the video data of each video in association with the video information of each video. In one possible implementation manner, the server may store the video data of each video in the local disk, and store the storage address of the video data of each video in the local disk and the video information of each video in association in the database, so that the subsequent server may obtain the video data of each video in the local disk based on the correspondence between the storage address of the video data of each video and the video information of each video.
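A minimal sketch of this storage scheme, with a plain dict standing in for the database and raw files standing in for the video data on the local disk:

```python
import os
import tempfile

def store_video(db: dict, video_id: str, data: bytes, info: dict,
                root: str) -> None:
    """Write video data to the local disk and associate its storage
    address with the video information in the database."""
    path = os.path.join(root, f"{video_id}.bin")
    with open(path, "wb") as f:
        f.write(data)                                   # video data -> disk
    db[video_id] = {"address": path, "info": info}      # address + info -> DB

def load_video(db: dict, video_id: str) -> bytes:
    """Later retrieval follows the stored address, as described above."""
    with open(db[video_id]["address"], "rb") as f:
        return f.read()

# Usage
db: dict = {}
root = tempfile.mkdtemp()
store_video(db, "concert-001", b"\x00\x01", {"duration": 3600}, root)
data = load_video(db, "concert-001")
```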
The storage address of a video's data in the local disk is the storage information of that video's data described above. The video information of a video describes its basic features and may include a user identifier, a video identifier, a video duration, and the target time of the video data. The user identifier may include a first user identifier, indicating the user who uploaded (and made) the video, and a second user identifier, indicating the singer of each song in the video; for example, a second user identifier "1.xxx" indicates that the singer of the first song in the video is xxx. The embodiment of the present application does not specifically limit the representation of the user identifier. The video identifier uniquely indicates the video, and the video stream may also carry it, so that the server can determine, from the video identifier carried in the stream, which video the stream belongs to; that is, the video identifier also uniquely indicates one video stream.
The video duration indicates the playing duration of the video, and the server may use the difference between the timestamp of the last video frame and the timestamp of the first video frame as the video duration. The target time of the video data may be the time at which the server stored the video data in the local disk, the time at which the server received the video stream, or the time at which the user uploaded the video stream.
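The duration computation amounts to subtracting the first frame's timestamp from the last frame's; a sketch assuming timestamps in seconds:

```python
def video_duration(frame_timestamps):
    """Video duration = last frame's timestamp minus the first frame's."""
    return frame_timestamps[-1] - frame_timestamps[0]

# A stream whose frames run from 12.0 s to 612.0 s is a 600-second video.
duration = video_duration([12.0, 72.0, 312.0, 612.0])
```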
In some embodiments, the video information may also include the number of video segments of songs in the video and the song type of each song. Still taking the concert video in step 201 as an example, the concert video includes video segments of 3 songs, and thus the number of video segments of songs in the concert video is 3. The song type of each song is used to indicate the song genre of a song, such as a soothing type song, a rock type song, etc.
203. And the server inquires the video data of each stored video based on the target condition to obtain the video data of at least one video meeting the target condition.
The target conditions include at least one of: the storage time of the video data of the video is within the target time period; the singer of the song in the video is the target user; the song type in the video is the target song type.
This step 203 may be triggered by a user operation. In one possible implementation, the server displays a query interface of the database; the user enters at least one of a target time period, a target user, and a target song type in the interface and clicks a query button. After detecting the click, the server is triggered to execute step 203 based on the content the user entered in the query interface.
The server may determine whether the video data of each video meets the target condition from the video information stored in the database. When the target condition is that the storage time of the video data is within the target time period, the video data of a video meets the condition if the target time in its video information is within the target time period. When the target condition is that the singer of the song is the target user, the video data meets the condition if the user indicated by the second user identifier in its video information is the target user. When the target condition is that the song type is the target song type, the video data meets the condition if the song type in its video information is the target song type.
When the target condition combines more than one of the above conditions, the video data of a video meets the target condition only if its video information satisfies all of them. For example, suppose the target condition is that the storage time of the video data is within the target time period and that the singer of the song is the target user. If the target time in a video's information is within the target time period but the user indicated by the second user identifier is not the target user, the video data does not meet the target condition; only when both conditions hold does the video data meet the target condition.
204. The server determines the video data of each video in at least one video as the video data of one video to be processed.
The video data of the at least one video all meets the target condition, and the server can directly regard each video in the at least one video as a video to be processed; the video data of each such video is the video data of one video to be processed. After determining the video data of the video to be processed, the server may obtain it from the local disk based on the storage address stored in the database. It should be noted that the process shown in steps 203-204 is also a process of acquiring video data of the video to be processed.
205. And the server identifies the time characteristics of the video data to obtain the initial time and the termination time of the video clip of each song in the video to be processed.
The server can obtain the initial time and the termination time of the video clip of each song in the video to be processed based on the song recognition model and the video data of the video to be processed. The song recognition model may implement an artificial intelligence algorithm, and may be a DNN (deep neural network) model, such as any one of an RNN (recurrent neural network) model, an LSTM (long short-term memory) model, and a CNN (convolutional neural network) model.
In one possible implementation, this step 205 may be implemented by the following process shown in steps 2051-2052.
Step 2051, the server obtains the time characteristic of the video to be processed based on the video data.
The time characteristic of the video data indicates how the video to be processed varies over time, and may be represented by a time characteristic matrix. For example, in the video data each time point corresponds to one frame of image and each image corresponds to one matrix; the server arranges the matrices corresponding to all the time points in time order to obtain one time characteristic matrix.
The server may obtain the temporal feature matrix from the video data in various ways; for example, it may be obtained through a CNN. Because the video to be processed comprises multiple frames of images, the feature matrix of one frame can be obtained by inputting that frame into the CNN and extracting the hidden-layer output; the server then arranges the feature matrices corresponding to each frame of the video to be processed in time order, obtaining the time characteristic matrix of the video to be processed.
The server can also obtain the difference image between each frame and its previous frame in the video to be processed, input the difference images into the CNN, and extract the hidden-layer output, thereby obtaining the time characteristic matrix of the video to be processed.
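The two extraction variants above (per-frame features, or features of frame differences) can be sketched as follows. The "CNN" here is a toy stand-in that pools pixel statistics; a real system would use hidden-layer activations of a trained network, which the patent does not specify:

```python
# Toy sketch: one feature vector per frame (or per frame difference),
# stacked in time order to form the temporal feature matrix.
# toy_cnn is a stand-in assumption for a real CNN hidden layer.

def toy_cnn(image):
    # image: 2-D list of pixel values -> tiny 3-element feature vector
    flat = [p for row in image for p in row]
    return [sum(flat) / len(flat), max(flat), min(flat)]

def frame_diff(a, b):
    # pixel-wise difference between two frames of equal size
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def temporal_features(frames, use_diff=False):
    if use_diff:
        inputs = [frame_diff(frames[i], frames[i - 1])
                  for i in range(1, len(frames))]
    else:
        inputs = frames
    return [toy_cnn(img) for img in inputs]  # rows ordered by time

frames = [[[0, 0], [0, 0]], [[1, 1], [1, 1]], [[3, 1], [1, 3]]]
feat = temporal_features(frames)                # 3 rows, one per frame
diff_feat = temporal_features(frames, use_diff=True)  # 2 rows of diffs
```

The difference-image variant yields one fewer row than there are frames, since each row encodes the change from the previous frame.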
Step 2052, the server processes the time characteristic by using the song recognition model to obtain the initial time and the termination time of the video segment of each song in the video to be processed.
The server may input a temporal feature vector indicating the temporal feature into the song recognition model, perform temporal feature recognition and analysis on the input temporal feature vector by the song recognition model, and output an initial time and an end time of a video clip of each song of the to-be-processed video.
The song recognition model can be obtained by training an initial recognition model; during training, the initial recognition model learns to recognize the temporal features of input time feature vectors, so that the trained song recognition model can perform correct temporal feature recognition on an input time feature vector.
The training process of the initial recognition model may be as follows. The server obtains a sample time feature vector of each sample video based on the sample video data of a plurality of sample videos. For the sample time feature vector of any sample video, the server marks the initial time and the ending time of the video segment of each song in that sample video, and takes the marked times as the expected start time and expected end time of the video segment of each song. The server then inputs the time feature vector of each sample video into the initial recognition model, which performs temporal feature recognition based on its current model parameters to obtain the start time and the end time of the video clip of each song in each sample video. Based on an evaluation function, the initial recognition model compares the recognized start and end times with the marked expected times to obtain an evaluation value for this round of training; the evaluation value indicates the accuracy of the start and end times recognized under the current model parameters. When the evaluation value is smaller than a preset value, the initial recognition model updates its model parameters based on the evaluation value and performs the next round of training with the updated parameters; training finishes when the evaluation value is greater than or equal to the preset value, and the finally trained model is the song recognition model. The preset value is not specifically limited in the embodiment of the present application.
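The training loop just described (predict, evaluate against expected times, update parameters until the evaluation value reaches the preset threshold) can be sketched with a deliberately simplified one-parameter "model". The linear bias model and the tolerance-based scoring rule are toy assumptions, not the patent's actual network or evaluation function:

```python
# Minimal sketch of the training loop: recognize start/end times, score
# them against the marked expected times, and update parameters until the
# evaluation value reaches the preset value.

def predict(params, features):
    # toy "model": shifts each (start, end) feature by a learned bias
    bias = params["bias"]
    return [(s + bias, e + bias) for s, e in features]

def evaluate(pred, expected):
    # fraction of (start, end) pairs within 1 s of the expected times
    hits = sum(1 for (ps, pe), (es, ee) in zip(pred, expected)
               if abs(ps - es) <= 1 and abs(pe - ee) <= 1)
    return hits / len(expected)

def train(features, expected, preset=1.0, max_rounds=50):
    params = {"bias": 10.0}          # deliberately bad initial parameter
    for _ in range(max_rounds):
        score = evaluate(predict(params, features), expected)
        if score >= preset:          # evaluation value reached preset value
            break
        params["bias"] -= 1.0        # stand-in for a real parameter update
    return params, evaluate(predict(params, features), expected)

features = [(0.0, 30.0), (60.0, 95.0)]   # per-song feature "times"
expected = [(0.0, 30.0), (60.0, 95.0)]   # marked expected start/end times
params, score = train(features, expected)
```

The structure (evaluate, compare to preset, update, repeat) is what the paragraph above specifies; everything inside the three functions is an illustrative placeholder.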
In one possible implementation, the functions of obtaining the temporal features and of the song recognition model may be implemented by one target recognition model, which outputs the start time and the end time of each song directly from the input video data. The server may input sample video data of a plurality of sample videos to an initial target recognition model and train it until the target recognition model is obtained; the training process of the initial target recognition model is the same as that of the initial recognition model described above and is not repeated in the embodiments of the present application.
In a possible implementation manner, the server may input the time feature matrix of the at least one to-be-processed video to the song recognition model, and the song recognition model may perform time feature recognition on the time feature matrix of the at least one to-be-processed video to obtain a start time and an end time of a video segment of each song in each to-be-processed video. The server may input video data of at least one to-be-processed video to the target recognition model, and the target recognition model may output a start time and an end time of a video clip of each song in each to-be-processed video based on the video data of each to-be-processed video.
206. And the server intercepts the video data based on the initial time and the termination time of the video segment of each song to obtain the video data of the video segment of each song.
The server may determine the video frames constituting the video clip of each song based on the initial time and the end time of the video clip of each song, and intercept frame data of the video frames belonging to the video clip of each song from the video data, where the frame data of the video frames belonging to the video clip of each song is the video data of the video clip of each song.
In one possible implementation, this step 206 may be implemented by the process shown in steps 61-63 described below.
Step 61, the server parses the frame data corresponding to each video frame in the video data to obtain the timestamp of each video frame.
The frame data of a video frame may be carried by a plurality of messages, and the server may obtain the timestamp of each video frame by parsing the headers of those messages. The timestamp of a video frame indicates its playing time point in the video; for example, if the timestamp of a video frame is 5 seconds (s), the server plays that frame at the 5th second when playing the video.
The embodiment of the present application does not specifically limit the parsing process of the frame data.
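Since the patent leaves the parsing process open, the sketch below only illustrates the shape of step 61: collecting one timestamp per frame from the headers of the messages that carry it. The message layout (a dict with `header` and `payload` fields) is an invented assumption:

```python
# Sketch of step 61: recover a per-frame timestamp from message headers.
# Several messages may carry one frame's data; they share its timestamp.

def parse_timestamps(messages):
    """Map each frame number to the timestamp carried in its headers."""
    stamps = {}
    for msg in messages:
        hdr = msg["header"]
        stamps.setdefault(hdr["frame_no"], hdr["timestamp"])
    return stamps

messages = [
    {"header": {"frame_no": 0, "timestamp": 0.0}, "payload": b"..."},
    {"header": {"frame_no": 0, "timestamp": 0.0}, "payload": b"..."},
    {"header": {"frame_no": 1, "timestamp": 5.0}, "payload": b"..."},
]
timestamps = parse_timestamps(messages)
```

In a real container format (FLV, MP4, etc.) the header fields and their byte layout differ; only the frame-number-to-timestamp mapping matters for the following steps.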
Step 62, the server determines at least one group of video frames based on the time stamp of each video frame, the initial time and the termination time of the video clip of each song, and each group of video frames corresponds to the video clip of one song.
Each group of video frames may include a plurality of video frames, and each group may be used to compose the video clip of one song. For the video clip of any song, the server may compare the initial time and the termination time of that video clip with the timestamp of each video frame to determine which video frames belong to the group corresponding to that video clip.
In one possible implementation manner, for the video clip of any one of the at least one song, when the timestamp of any video frame is greater than or equal to the initial time of that video clip and less than or equal to its termination time, that video frame is determined as one video frame of the group corresponding to that video clip. For example, if the video segment of a song spans the 3rd minute to the 8th minute of the video, its initial time is the 180th s and its termination time is the 480th s; for any video frame with timestamp t, if 180 ≤ t ≤ 480, that video frame is taken as one video frame of the corresponding group.
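The comparison in step 62 amounts to a range test per song segment. In the sketch below, the frame-timestamp mapping and segment list are illustrative:

```python
# Step 62 as code: a frame joins a song's group when its timestamp lies
# inside [initial_time, termination_time] of that song's video clip.

def group_frames(frame_stamps, segments):
    """frame_stamps: {frame_no: timestamp}; segments: [(start, end), ...].
    Returns one list of frame numbers per song segment, in frame order."""
    groups = []
    for start, end in segments:
        group = [fn for fn, t in sorted(frame_stamps.items())
                 if start <= t <= end]
        groups.append(group)
    return groups

stamps = {0: 100, 1: 180, 2: 300, 3: 480, 4: 500}
groups = group_frames(stamps, [(180, 480)])   # frames 1, 2, 3 qualify
```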
In some embodiments, the video frames in the data stream are key frames, and the duration of a key frame may be longer than 1 s, for example 2 s. The video data of one key frame may then carry the frame data of a plurality of video frames; for example, it may include video data 1 and video data 2, where video data 1 is the frame data of video frame 1 and video data 2 is the frame data of video frame 2, while the timestamp of the key frame serves as the timestamp of both. If video frame 2 is the first video frame of the video clip of a song, then because the timestamp of the key frame is less than the initial time of that video clip, the server would not include the key frame in the corresponding group of video frames.
In order to avoid this, in one possible implementation, when the timestamp of any video frame is less than the initial time of the video segment of any song and the difference between them is less than a preset value, that video frame is determined as one video frame in the corresponding group; and when the timestamp of any video frame is greater than the termination time of the video segment of any song and the difference between them is less than the preset value, that video frame is likewise determined as one video frame in the corresponding group.
The preset value can be the difference between the timestamps of two adjacent video frames. When the timestamp of a video frame is smaller than the initial time of a song's video segment and the difference between them is smaller than the preset value, that video frame is the first video frame of the video segment, so the first video frame is prevented from being lost. When the timestamp of a video frame is greater than the termination time of the video segment and the difference between them is less than the preset value, that video frame is the last video frame of the video segment, so the last video frame is prevented from being lost. This finally guarantees that each group of video frames can accurately compose the video segment of its song.
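The relaxed membership test can be written as a single predicate. The value of `preset` (here the assumed gap between adjacent frame timestamps) is illustrative:

```python
# Relaxed membership test: a frame just outside the segment still joins
# the group when it is within `preset` of the boundary, so boundary key
# frames stamped slightly early or late are not lost.

def in_group(t, start, end, preset):
    if start <= t <= end:
        return True
    if t < start and start - t < preset:
        return True   # first frame of the clip, stamped slightly early
    if t > end and t - end < preset:
        return True   # last frame of the clip, stamped slightly late
    return False

preset = 2.0   # assumed interval between adjacent frame timestamps
early_hit = in_group(179.0, 180.0, 480.0, preset)   # within tolerance
far_miss = in_group(170.0, 180.0, 480.0, preset)    # too far before start
```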
A group of video frames may include the frame numbers of the video frames belonging to the song's video clip, so that the server can intercept the video data according to those frame numbers; a group of video frames may also directly include the frame data of the video frames belonging to the song's video clip.
In some embodiments, after the server obtains the start time and the end time of the video segment of a song, it generates an empty video frame group corresponding to that video segment. Since the local disk stores the video data in the order of the video frames, after the server determines the video frames belonging to the song's video clip, it stores their frame data from the stored video data into the group, finally obtaining the group of video frames that composes the video clip of that song.
Step 63, the server respectively stores the frame data of each group of video frames in the video data into one song video file, where one song video file is used for storing the video data of the video clip of one song.
When a group of video frames includes the frame numbers of the video frames belonging to a song's video clip, the server may acquire the frame data of those video frames from the locally stored video data based on the frame numbers, and store the acquired frame data in a song video file, so that the song video file contains the video data of the song's video clip. The server may also store the song video file in the database.
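Step 63 can be sketched in memory by concatenating each group's frame data, in frame order, into the contents of one song video file. Writing the resulting blobs to disk or a database is omitted, and the frame-data dictionary is an illustrative stand-in for the locally stored video data:

```python
# Step 63 sketched in memory: one byte blob per song, built by
# concatenating that song's frame data in frame-number order.

def build_song_files(frame_data, groups):
    """frame_data: {frame_no: bytes}; groups: one frame-number list per song.
    Returns the contents of one song video file per group."""
    return [b"".join(frame_data[fn] for fn in sorted(group))
            for group in groups]

frame_data = {0: b"A", 1: b"B", 2: b"C", 3: b"D"}
groups = [[1, 2], [3]]                    # two songs' frame groups
song_files = build_song_files(frame_data, groups)
```

Sorting by frame number preserves playback order regardless of the order in which frames were added to a group.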
When a user needs to watch the video clip of a song, the user can send a video acquisition request to the server through a terminal, the request being used to request the video data of that video clip. After receiving the request, the server can return the song video file of that song stored in the database to the terminal, so that the terminal can play the video clip based on the song video file.
According to the method provided by the embodiment of the application, the initial time and the termination time of the video segment of each song are obtained by performing temporal feature recognition on the video data of the video to be processed, and the video data of each song's video segment is intercepted based on those times. The video to be processed can thus be intercepted without a song ID in the video stream, and the video segment of a song can be intercepted from any video to be processed; the video intercepting process provided by the method therefore has universality.
To further explain the interaction between the target server and the server during video interception, refer to the flow chart of a video interception method shown in fig. 3, provided by the embodiment of the present application; the flow of the method includes steps 301-305.
301. The target server stores the video stream sent by the terminal as a video file.
The target server is a cloud platform that distributes the video stream sent by the client through a CDN (content delivery network), and is further configured to store the video stream sent by the client as a video file for subsequent query and download. The target server may also be a network platform.
The video file of a video comprises frame data of each video frame of the video, and the frame data of each video frame are sequentially arranged in the video file according to the size of the frame number.
302. And the server downloads the video files stored in the target server within a preset time period.
The preset time period may be one day; for example, the server periodically downloads the video files that the target server stored the day before. The server obtains the video data of the video to be processed from the video files downloaded from the target server.
303. And the server identifies the time characteristics of the video data in the downloaded video file to obtain the time range of the video clip of each song in the video file.
The time range of the video segments for each song may be represented by the initial time and the ending time of the video segments for each song. The process in step 303 is the same as the process in step 205, and this step 303 is not described in detail in this embodiment of the present application.
304. The server stores the time ranges of the video clips for each song.
The server can store, in an associated manner, the time range of the video clip of each song, the video information of the video corresponding to the video file, and the local storage address of the video. The server can then determine the source of each song's video clip based on the video information, and can acquire the stored video file according to the local storage address of the video.
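The associated storage in step 304 can be sketched as one record per video linking the three pieces of data. The record layout, field names, and the example path are all illustrative assumptions:

```python
# Sketch of step 304: each record associates the song time ranges with the
# video's information and its local storage address.

records = {}

def store_ranges(video_id, song_ranges, video_info, storage_path):
    records[video_id] = {
        "song_ranges": song_ranges,     # [(start, end), ...] per song
        "video_info": video_info,       # identifies the clip's source
        "storage_path": storage_path,   # where the video file lives locally
    }

store_ranges("v1", [(180, 480)], {"singer_id": "u1"}, "/data/videos/v1.flv")
entry = records["v1"]   # step 305 reads this to extract each song's clip
```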
305. The server extracts the video data of the video clip of each song from the video file according to the time range of the video clip of each song.
The process shown in step 305 is the same as the process shown in step 206, and details of step 305 are not described herein in this embodiment of the present application.
To further embody the hardware structure of the server, referring to fig. 4, fig. 4 is a schematic structural diagram of a server provided in the embodiment of the present application. The server 400 may vary considerably in configuration or performance, and may include one or more CPUs (central processing units) 401 and one or more memories 402, where at least one instruction is stored in the memory 402 and is loaded and executed by the processor 401 to implement the methods provided by the above method embodiments. Of course, the server 400 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server 400 may also include other components for implementing device functions, which are not described herein again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including instructions, is also provided; the instructions are executable by a processor in a terminal to perform the video interception method in the above embodiments. For example, the computer-readable storage medium may be a ROM (read-only memory), a RAM (random-access memory), a CD-ROM (compact disc read-only memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 5 is a schematic structural diagram of a video capture apparatus provided in an embodiment of the present application, where the apparatus includes:
an obtaining module 501, configured to obtain video data of a video to be processed, where the video to be processed includes a video segment of at least one song;
an identification module 502, configured to perform time feature identification on the video data to obtain an initial time and a termination time of a video segment of each song in the video to be processed;
an intercepting module 503, configured to intercept the video data based on the initial time and the termination time of the video segment of each song, to obtain the video data of the video segment of each song.
Optionally, the identifying module 502 is configured to:
acquiring the time characteristics of the video to be processed based on the video data;
and processing the time characteristics by using a song recognition model to obtain the initial time and the termination time of the video clip of each song.
Optionally, the obtaining module 501 is configured to:
based on the target condition, inquiring the stored video data of each video to obtain the video data of at least one video meeting the target condition;
and determining the video data of each video in the at least one video as the video data of one video to be processed.
Optionally, the target condition comprises at least one of:
the storage time of the video data of the video is within the target time period;
the singer of the song in the video is the target user;
the song type of the song in the video is the target song type.
Optionally, the intercepting module 503 includes:
the analysis unit is used for analyzing frame data corresponding to each video frame in the video data to obtain a time stamp of each video frame;
a determining unit, configured to determine at least one group of video frames based on the timestamp of each video frame and the initial time and the end time of the video clip of each song, where each group of video frames corresponds to a video clip of a song;
and the storage unit is used for respectively storing the frame data of each group of video frames in the video data into one song video file, and the song video file is used for storing the video data of a video clip of one song.
Optionally, the determining unit is configured to:
and for the video segment of any one of the at least one song, when the time stamp of any video frame is greater than or equal to the initial time of the video segment of any one song and the time stamp of any video frame is less than or equal to the termination time of the video segment of any one song, determining any video frame as one video frame in any group of video frames, wherein any group of video frames corresponds to the video segment of any one song.
Optionally, the determining unit is further configured to:
when the time stamp of any video frame is smaller than the initial time of the video segment of any song and the difference value between the time stamp of any video frame and the initial time of the video segment of any song is smaller than a preset value, determining any video frame as one video frame in any group of video frames;
and when the time stamp of any video frame is greater than the termination time of the video segment of any song and the difference value between the time stamp of any video frame and the termination time of the video segment of any song is less than the preset value, determining any video frame as one video frame in any group of video frames.
It should be noted that: in the video capture device provided in the above embodiment, when capturing a video, only the division of each functional module is illustrated, and in practical applications, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the video capture device and the video capture method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail and are not described herein again.
Illustratively, there is also provided a computer program product comprising one or more instructions executable by a processor of a server to perform the method steps of the video capturing method provided in the above embodiments, which method steps may comprise:
acquiring video data of a video to be processed, wherein the video to be processed comprises a video clip of at least one song;
performing time characteristic identification on the video data to obtain the initial time and the termination time of the video clip of each song in the video to be processed;
and intercepting the video data based on the initial time and the termination time of the video clip of each song to obtain the video data of the video clip of each song.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method for video capture, the method comprising:
acquiring video data of a video to be processed, wherein the video to be processed comprises a video clip of at least one song;
performing time characteristic identification on the video data to obtain the initial time and the termination time of the video clip of each song in the video to be processed;
and intercepting the video data based on the initial time and the termination time of the video clip of each song to obtain the video data of the video clip of each song.
2. The method of claim 1, wherein the temporally characterizing the video data comprises:
acquiring the time characteristics of the video to be processed based on the video data;
and processing the time characteristics by using a song recognition model to obtain the initial time and the termination time of the video clip of each song.
3. The method of claim 1, wherein the obtaining video data of the video to be processed comprises:
based on the target condition, inquiring the stored video data of each video to obtain the video data of at least one video meeting the target condition;
and determining the video data of each video in the at least one video as the video data of one video to be processed.
4. The method of claim 3, wherein the target condition comprises at least one of:
the storage time of the video data of the video is within the target time period;
the singer of the song in the video is the target user;
the song type of the song in the video is the target song type.
5. The method of claim 1, wherein the intercepting the video data based on the initial time and the ending time of the video segment of each song comprises:
analyzing frame data corresponding to each video frame in the video data to obtain a timestamp of each video frame;
determining at least one group of video frames based on the timestamp of each video frame and the initial time and the ending time of the video clip of each song, wherein each group of video frames corresponds to the video clip of one song;
and respectively storing frame data of each group of video frames in the video data into a song video file, wherein the song video file is used for storing video data of a video clip of a song.
6. The method of claim 5, wherein determining at least one group of video frames based on the timestamp of each video frame, the initial time and the end time of the video clip of each song comprises:
and for the video segment of any one of the at least one song, when the time stamp of any video frame is greater than or equal to the initial time of the video segment of any one song and the time stamp of any video frame is less than or equal to the termination time of the video segment of any one song, determining any video frame as one video frame in any group of video frames, wherein any group of video frames corresponds to the video segment of any one song.
7. The method of claim 6, wherein determining at least one group of video frames based on the timestamp of each video frame, the initial time and the end time of the video clip of each song further comprises:
when the time stamp of any video frame is smaller than the initial time of the video segment of any song and the difference value between the time stamp of any video frame and the initial time of the video segment of any song is smaller than a preset value, determining any video frame as one video frame in any group of video frames;
and when the time stamp of any video frame is greater than the termination time of the video segment of any song and the difference value between the time stamp of any video frame and the termination time of the video segment of any song is less than the preset value, determining any video frame as one video frame in any group of video frames.
8. A video capture device, the device comprising:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring video data of a video to be processed, and the video to be processed comprises a video clip of at least one song;
the identification module is used for identifying the time characteristics of the video data to obtain the initial time and the termination time of the video clip of each song in the video to be processed;
and the intercepting module is used for intercepting the video data based on the initial time and the termination time of the video clip of each song to obtain the video data of the video clip of each song.
9. A server, comprising one or more processors and one or more memories having stored therein at least one instruction, which is loaded and executed by the one or more processors to perform operations performed by the video interception method of any one of claims 1 to 8.
10. A computer-readable storage medium having stored therein at least one instruction which is loaded and executed by a processor to perform operations performed by a video capture method as claimed in any one of claims 1 to 8.
CN201911215289.8A 2019-12-02 2019-12-02 Video interception method, device, server and computer readable storage medium Active CN110913240B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911215289.8A CN110913240B (en) 2019-12-02 2019-12-02 Video interception method, device, server and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN110913240A true CN110913240A (en) 2020-03-24
CN110913240B CN110913240B (en) 2022-02-22

Family

ID=69821751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911215289.8A Active CN110913240B (en) 2019-12-02 2019-12-02 Video interception method, device, server and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110913240B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114007100A (en) * 2021-10-28 2022-02-01 深圳市商汤科技有限公司 Video processing method, video processing device, computer equipment and storage medium

Citations (7)

Publication number Priority date Publication date Assignee Title
US20150271598A1 (en) * 2014-03-19 2015-09-24 David S. Thompson Radio to Tune Multiple Stations Simultaneously and Select Programming Segments
CN105898588A (en) * 2015-12-07 2016-08-24 乐视云计算有限公司 Video positioning method and device
CN106708990A (en) * 2016-12-15 2017-05-24 腾讯音乐娱乐(深圳)有限公司 Music clip extraction method and device
US20170309311A1 (en) * 2015-10-16 2017-10-26 Google Inc. Generating videos of media items associated with a user
CN109218746A (en) * 2018-11-09 2019-01-15 北京达佳互联信息技术有限公司 Obtain the method, apparatus and storage medium of video clip
CN110070891A (en) * 2019-04-12 2019-07-30 Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. Song recognition method, apparatus and storage medium
CN110516104A (en) * 2019-08-27 2019-11-29 Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. Song recommendation method, apparatus and computer storage medium

Non-Patent Citations (1)

Title
HAN NING: "Research on Music Auto-Tagging Technology Based on Deep Neural Networks", China Master's Theses Full-text Database, Information Science and Technology *

Also Published As

Publication number Publication date
CN110913240B (en) 2022-02-22

Similar Documents

Publication Publication Date Title
US9948965B2 (en) Manifest re-assembler for a streaming video channel
CN104869439B (en) 2018-03-27 Video pushing method and device
CN108848393B (en) Method, device and equipment for showing entrance and storage medium
CN106874273B (en) Channel information statistical method, device and system
CN103997662A (en) Program pushing method and system
CN108415908B (en) Multimedia data processing method and server
CN111339495B (en) Method and device for counting number of people online in live broadcast room, electronic equipment and storage medium
CN110913240B (en) Video interception method, device, server and computer readable storage medium
CN113301386B (en) Video processing method, device, server and storage medium
US20240064357A1 (en) Content-modification system with probability-based selection feature
CN112148920B (en) Data management method
US20200389687A1 (en) Content-Modification System with Content-Presentation Device Grouping Feature
CN111371882B (en) Data sharing and releasing control method and device, electronic equipment and computer readable medium
CN108494875A (en) 2018-09-04 Method and apparatus for feeding back resource files
CN113067913B (en) Positioning method, device, server, medium and product
CN112565904B (en) Video clip pushing method, device, server and storage medium
CN112764988A (en) Data segmentation acquisition method and device
CN113297417B (en) Video pushing method, device, electronic equipment and storage medium
CN109982143B (en) Method, device, medium and equipment for determining video playing time delay
CN110049348B (en) Video analysis method and system and video analysis server
CN113840157A (en) Access detection method, system and device
CN112616073B (en) Video pushing method, device, server and storage medium
US11494705B1 (en) Software path prediction via machine learning
CN116955830B (en) Smoking cabin-based information pushing method, computer equipment and readable storage medium
CN111163327B (en) Method and device for counting number of online accounts

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant