CN110324657A - Model generation method, video processing method, apparatus, electronic device and storage medium - Google Patents
- Publication number
- CN110324657A CN110324657A CN201910459442.5A CN201910459442A CN110324657A CN 110324657 A CN110324657 A CN 110324657A CN 201910459442 A CN201910459442 A CN 201910459442A CN 110324657 A CN110324657 A CN 110324657A
- Authority
- CN
- China
- Prior art keywords
- video
- processed
- unit
- sample
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/233—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/23424—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
Abstract
The present invention provides a model generation method, a video processing method, corresponding apparatuses, an electronic device and a storage medium. The model generation method includes: obtaining a training sample, where the training sample includes a sample video and annotation information of the sample video, and the annotation information indicates whether the sample video belongs to the theme-song category; dividing the sample video into multiple unit sample videos; for each unit sample video, obtaining an audio feature vector corresponding to the unit sample video; and training a preset initial model by using the audio feature vectors corresponding to at least two consecutive unit sample videos as input and the annotation information of the sample video as the output target, the trained model being determined as the video processing model. Because detection is based on audio feature vectors, videos are not required to belong to the same series; videos of multiple types can share one video processing model, so the approach is more adaptive.
Description
Technical field
The present invention relates to the field of Internet technologies, and in particular to a model generation method, a video processing method, corresponding apparatuses, an electronic device and a storage medium.
Background
Film and television works use carriers such as film, tape and digital storage and are presented on screens for audio-visual viewing; they are a comprehensive form of modern art that covers movies, TV series, animation and similar content. A film or television video generally contains theme songs, including an opening theme and an ending theme. In practice there is a need to detect the theme songs in such videos; for example, to save the viewer's time, the theme songs can be detected during playback so that they can be skipped and the main content played directly.

In the prior art, theme songs in film and television videos are usually detected by template matching. The specific practice is as follows: a template is generated for the videos belonging to the same series, and the template may include features of the theme songs of that series; when theme-song detection is performed on a video of the series, the template corresponding to the series is matched against the video to be detected, and the segment of the video to be detected whose features match the theme-song features in the template is taken as the theme song.

However, since the theme-song features of different series differ, the templates corresponding to different series also differ; for example, one template is generated per TV series. Moreover, this approach is only applicable to TV series containing multiple episodes, where all episodes can share the same template; for a movie that does not consist of multiple episodes, one template must be generated per movie. Therefore, the existing theme-song detection method adapts poorly and is unsuitable for massive, diverse film and television video libraries.
Summary of the invention
Embodiments of the present invention provide a model generation method, a video processing method, corresponding apparatuses, an electronic device and a storage medium, to solve the problem that the existing theme-song detection method adapts poorly and is unsuitable for massive, diverse film and television video libraries.
In a first aspect, an embodiment of the present invention provides a model generation method, the method including:

obtaining a training sample, where the training sample includes a sample video and annotation information of the sample video, and the annotation information indicates whether the sample video belongs to the theme-song category;

dividing the sample video into multiple unit sample videos;

for each unit sample video, obtaining an audio feature vector corresponding to the unit sample video;

training a preset initial model by using the audio feature vectors corresponding to at least two consecutive unit sample videos as input and the annotation information of the sample video as the output target, and determining the trained model as the video processing model.
Optionally, obtaining the audio feature vector corresponding to the unit sample video includes: generating a spectrogram corresponding to the audio signal in the unit sample video; inputting the spectrogram into a preset neural network model, and determining the audio feature vector output by the neural network model as the audio feature vector corresponding to the unit sample video.
Optionally, generating the spectrogram corresponding to the audio signal in the unit sample video includes: performing frame division on the audio signal in the unit sample video to obtain multiple audio signal frames; performing windowing and Fourier transform on each audio signal frame to obtain an initial spectrogram corresponding to the audio signal in the unit sample video; and performing Mel conversion on the initial spectrogram to obtain a Mel spectrogram, which is used as the spectrogram corresponding to the audio signal in the unit sample video.
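The frame division, windowing, Fourier transform and Mel conversion steps above can be sketched end to end as follows. This is a minimal NumPy-only illustration; the frame length, frame shift, number of Mel bands and triangular filter-bank construction are common assumed choices, not values fixed by the patent.

```python
import numpy as np

def mel_scale(f):
    # Map frequency in Hz to the Mel scale: Mel(f) = 2595 * log10(1 + f/700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with edges spaced evenly on the Mel scale (assumed design)
    f_max = sr / 2.0
    mel_points = np.linspace(mel_scale(0.0), mel_scale(f_max), n_mels + 2)
    hz_points = 700.0 * (10.0 ** (mel_points / 2595.0) - 1.0)  # inverse mapping
    bins = np.floor((n_fft + 1) * hz_points / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mel_spectrogram(signal, sr, frame_len=400, frame_shift=160, n_mels=40):
    # Frame division with overlap (frame shift smaller than frame length)
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    frames = np.stack([signal[i * frame_shift: i * frame_shift + frame_len]
                       for i in range(n_frames)])
    # Windowing: multiply each frame by a Hamming window
    frames = frames * np.hamming(frame_len)
    # Fourier transform: power spectrum of each frame (the initial spectrogram)
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Mel conversion: apply the Mel filter bank
    return spec @ mel_filterbank(n_mels, frame_len, sr).T
```

At a 16 kHz sample rate, one second of audio with these assumed parameters yields a 98-frame by 40-band Mel spectrogram, a compact input for the neural network model mentioned above.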
Optionally, training the preset initial model by using the audio feature vectors corresponding to at least two unit sample videos as input and the annotation information of the sample video as the output target includes: randomly selecting at least two consecutive unit sample videos, concatenating the audio feature vectors corresponding to the selected unit sample videos and inputting the result into the initial model to obtain a predicted probability that the sample video belongs to the theme-song category; computing a loss value for the sample video from the predicted probability and the annotation information of the sample video; and determining that training is complete when the loss value is less than a set loss threshold.
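The patent leaves the initial model unspecified. As a sketch under that assumption, the loop below trains a simple logistic-regression classifier on the concatenation of two randomly selected consecutive unit feature vectors and stops once the per-sample loss falls below the set threshold; the feature dimension, learning rate and threshold are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(samples, dim=8, lr=0.5, loss_threshold=0.1, max_steps=5000):
    # samples: list of (feature_vectors, label); feature_vectors is the list of
    # per-unit audio feature vectors of one sample video, label is 1 for the
    # theme-song category and 0 otherwise.
    w = np.zeros(2 * dim)  # weights for two concatenated unit vectors
    b = 0.0
    for _ in range(max_steps):
        feats, label = samples[rng.integers(len(samples))]
        # Randomly select two consecutive unit sample videos and concatenate
        i = rng.integers(len(feats) - 1)
        x = np.concatenate([feats[i], feats[i + 1]])
        p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # predicted theme-song probability
        loss = -(label * np.log(p + 1e-9) + (1 - label) * np.log(1 - p + 1e-9))
        if loss < loss_threshold:
            return w, b  # training complete: loss below the set loss threshold
        grad = p - label
        w -= lr * grad * x
        b -= lr * grad
    return w, b
```

In practice the initial model would be a neural network consuming the spectrogram-derived feature vectors, but the input construction and the threshold-based stopping criterion follow the same pattern.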
In a second aspect, an embodiment of the present invention provides a video processing method, the method including:

obtaining a video to be processed;

extracting an opening segment and an ending segment from the video to be processed;

dividing the opening segment and the ending segment into multiple unit videos to be processed;

for each unit video to be processed, obtaining an audio feature vector corresponding to the unit video to be processed;

inputting the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the current unit video, into a pre-generated video processing model, and determining from the output of the video processing model whether the unit video to be processed belongs to the theme-song category, where the video processing model is generated by any of the methods described above;

splicing the consecutive unit videos that belong to the theme-song category to obtain the opening-theme segment and the ending-theme segment of the video to be processed.
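The per-unit classification step can be sketched as follows. The model is abstracted as any callable returning a theme-song probability; the two-unit window and the 0.5 threshold are illustrative assumptions, not requirements of the patent.

```python
import numpy as np

def classify_units(unit_features, model, threshold=0.5):
    # For each unit video to be processed, feed the audio feature vectors of
    # two consecutive units (the unit and a neighbouring unit) into the model
    # and compare the predicted theme-song probability with the threshold.
    flags = []
    for i, feat in enumerate(unit_features):
        j = i + 1 if i + 1 < len(unit_features) else i - 1  # neighbouring unit
        x = np.concatenate([feat, unit_features[j]])
        flags.append(model(x) >= threshold)
    return flags
```

The resulting per-unit flags are then spliced into opening-theme and ending-theme segments as described below.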
Optionally, obtaining the audio feature vector corresponding to the unit video to be processed includes: generating a spectrogram corresponding to the audio signal in the unit video to be processed; inputting the spectrogram into a preset neural network model, and determining the audio feature vector output by the neural network model as the audio feature vector corresponding to the unit video to be processed.
Optionally, generating the spectrogram corresponding to the audio signal in the unit video to be processed includes: performing frame division on the audio signal in the unit video to be processed to obtain multiple audio signal frames; performing windowing and Fourier transform on each audio signal frame to obtain an initial spectrogram corresponding to the audio signal; and performing Mel conversion on the initial spectrogram to obtain a Mel spectrogram, which is used as the spectrogram corresponding to the audio signal in the unit video to be processed.
Optionally, obtaining, for each unit video to be processed, the audio feature vector corresponding to the unit video includes: invoking a preset first process and a preset second process simultaneously; for each unit video obtained by dividing the opening segment, obtaining the corresponding audio feature vector using the first process; and for each unit video obtained by dividing the ending segment, obtaining the corresponding audio feature vector using the second process.
Optionally, determining from the output of the video processing model whether the unit video to be processed belongs to the theme-song category includes: checking whether the predicted probability, output by the video processing model, that the unit video to be processed belongs to the theme-song category is greater than or equal to a set probability threshold; and if so, determining that the unit video to be processed belongs to the theme-song category.
Optionally, splicing the consecutive unit videos that belong to the theme-song category to obtain the opening-theme segment and the ending-theme segment of the video to be processed includes: splicing the consecutive theme-song unit videos obtained by dividing the opening segment to obtain the opening-theme segment of the video to be processed; and splicing the consecutive theme-song unit videos obtained by dividing the ending segment to obtain the ending-theme segment of the video to be processed.
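Splicing consecutive units that belong to the theme-song category amounts to grouping runs of consecutive flagged unit indices; a minimal sketch:

```python
def splice_theme_units(flags):
    # Given per-unit theme-song flags in playback order, return the
    # (start_index, end_index) pair of each run of consecutive theme-song units.
    runs, start = [], None
    for i, is_theme in enumerate(flags):
        if is_theme and start is None:
            start = i                    # a new run of theme-song units begins
        elif not is_theme and start is not None:
            runs.append((start, i - 1))  # the run ended at the previous unit
            start = None
    if start is not None:
        runs.append((start, len(flags) - 1))
    return runs
```

Applied separately to the flags of the opening segment and of the ending segment, this yields the opening-theme and ending-theme runs respectively.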
Optionally, after the opening segment and the ending segment are each divided into multiple unit videos to be processed, the method further includes: marking the start time and end time of each unit video to be processed. After the consecutive theme-song unit videos are spliced to obtain the opening-theme segment and the ending-theme segment of the video to be processed, the method further includes: taking the start time of the first unit video in the opening-theme segment as the start time of the opening-theme segment, and the end time of the last unit video in the opening-theme segment as the end time of the opening-theme segment; and taking the start time of the first unit video in the ending-theme segment as the start time of the ending-theme segment, and the end time of the last unit video in the ending-theme segment as the end time of the ending-theme segment.
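Converting a spliced run of unit videos into segment timestamps then reduces to taking the start of the first unit and the end of the last unit; the 1-second unit duration below is an assumption drawn from the example given later in the description.

```python
def segment_times(run, unit_duration=1.0):
    # run: (first_unit_index, last_unit_index) of a spliced theme-song segment.
    # Returns (start_time, end_time) in seconds: the start time of the first
    # unit video and the end time of the last unit video in the segment.
    first, last = run
    return first * unit_duration, (last + 1) * unit_duration
```

For example, a run covering unit indices 0 through 2 corresponds to the interval from 0 s to 3 s of the segment.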
In a third aspect, an embodiment of the present invention provides a model generation apparatus, the apparatus including:

a sample obtaining module, configured to obtain a training sample, where the training sample includes a sample video and annotation information of the sample video, and the annotation information indicates whether the sample video belongs to the theme-song category;

a first division module, configured to divide the sample video into multiple unit sample videos;

a first vector obtaining module, configured to obtain, for each unit sample video, the audio feature vector corresponding to the unit sample video;

a training module, configured to train a preset initial model by using the audio feature vectors corresponding to at least two consecutive unit sample videos as input and the annotation information of the sample video as the output target, and to determine the trained model as the video processing model.
In a fourth aspect, an embodiment of the present invention provides a video processing apparatus, the apparatus including:

a video obtaining module, configured to obtain a video to be processed;

a segment extraction module, configured to extract an opening segment and an ending segment from the video to be processed;

a second division module, configured to divide the opening segment and the ending segment into multiple unit videos to be processed;

a second vector obtaining module, configured to obtain, for each unit video to be processed, the audio feature vector corresponding to the unit video to be processed;

a category determination module, configured to input the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the current unit video, into a pre-generated video processing model, and to determine from the output of the video processing model whether the unit video to be processed belongs to the theme-song category, where the video processing model is generated by the apparatus described above;

a segment determination module, configured to splice the consecutive unit videos that belong to the theme-song category to obtain the opening-theme segment and the ending-theme segment of the video to be processed.
Optionally, the second vector obtaining module includes: an invoking unit, configured to invoke a preset first process and a preset second process simultaneously; an opening obtaining unit, configured to obtain, for each unit video divided from the opening segment, the corresponding audio feature vector using the first process; and an ending obtaining unit, configured to obtain, for each unit video divided from the ending segment, the corresponding audio feature vector using the second process.
Optionally, the segment determination module includes: an opening-theme determination unit, configured to splice the consecutive theme-song unit videos divided from the opening segment to obtain the opening-theme segment of the video to be processed; and an ending-theme determination unit, configured to splice the consecutive theme-song unit videos divided from the ending segment to obtain the ending-theme segment of the video to be processed.
Optionally, the apparatus further includes: a marking module, configured to mark the start time and end time of each unit video to be processed after the second division module divides the opening segment and the ending segment into multiple unit videos; and a time determination module, configured to take the start time of the first unit video in the opening-theme segment as the start time of the opening-theme segment, the end time of the last unit video in the opening-theme segment as the end time of the opening-theme segment, the start time of the first unit video in the ending-theme segment as the start time of the ending-theme segment, and the end time of the last unit video in the ending-theme segment as the end time of the ending-theme segment.
In a fifth aspect, an embodiment of the present invention provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; where the processor is configured to perform any of the model generation methods described above, and/or any of the video processing methods described above.
In a sixth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by the processor of an electronic device, the electronic device is enabled to perform any of the model generation methods described above, and/or any of the video processing methods described above.
In embodiments of the present invention, a training sample is obtained, where the training sample includes a sample video and annotation information indicating whether the sample video belongs to the theme-song category; the sample video is divided into multiple unit sample videos; for each unit sample video, the corresponding audio feature vector is obtained; a preset initial model is trained by using the audio feature vectors corresponding to at least two consecutive unit sample videos as input and the annotation information of the sample video as the output target, and the trained model is determined as the video processing model. Since the audio of the theme-song portion of a video differs markedly from the audio of the main content, the embodiments use multiple sample videos that belong to the theme-song category and multiple that do not, training the video processing model for detecting theme songs from the audio feature vectors of the sample videos; the video processing model then detects theme-song segments from the audio feature vectors of a video to be detected. Because detection is based on audio feature vectors, videos are not required to belong to the same series; videos of multiple types can share one video processing model, so the approach is more adaptive.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of a model generation method according to an embodiment of the present invention;
Fig. 2 is a flowchart of the steps of a video processing method according to an embodiment of the present invention;
Fig. 3 is a flowchart of the steps of another video processing method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a video processing procedure according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of a model generation apparatus according to an embodiment of the present invention;
Fig. 6 is a structural block diagram of a video processing apparatus according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, a flowchart of the steps of a model generation method according to an embodiment of the present invention is shown. The model generation method of the embodiment includes the following steps.
Step 101: obtain a training sample.
When training the model, a large number of sample videos taken from film and television videos can first be obtained from the Internet. The sample videos may include theme-song videos and non-theme-song videos; a theme-song video may be an opening-theme video or an ending-theme video of a film or television video, while a non-theme-song video may contain speech, cheering, applause and the like. Annotators label the sample videos to produce the annotation information, which indicates whether a sample video belongs to the theme-song category. For example, annotation information of "1" indicates that the sample video belongs to the theme-song category, and "0" indicates that it does not. A sample video together with its annotation information forms one training sample, and a large number of training samples form the training sample set. Each training sample is processed in the same way, so this embodiment mainly describes the processing of a single training sample.
In the embodiment of the present invention, sample diversity can be guaranteed by collecting sample videos from multiple different types of film and television videos, and sample balance can be guaranteed by collecting equal numbers of theme-song videos and non-theme-song videos. For example, 2000 sample videos are taken from TV series, of which 1000 are theme-song videos and 1000 are not; 2000 sample videos are taken from movies, of which 1000 are theme-song videos and 1000 are not; and 2000 sample videos are taken from animations, of which 1000 are theme-song videos and 1000 are not. These 6000 sample videos and their annotation information form the training sample set.
The specific duration of each sample video may be any suitable value chosen by those skilled in the art based on practical experience, for example 3 s, 4 s or 5 s.
Step 102: divide the sample video into multiple unit sample videos.
The embodiment of the present invention trains a video processing model for detecting theme-song segments in videos. Considering that the audio within a theme-song segment is consistent, i.e. the audio in a theme-song segment itself belongs to the theme-song category, whether a segment belongs to the theme-song category can be determined from its audio feature vector; therefore the video processing model in the embodiment detects the theme-song category mainly based on audio feature vectors. A sample video is divided into multiple unit sample videos for analysis.
In an optional implementation, the sample video may be divided into multiple unit sample videos using a set duration as the unit. The specific value of the set duration may be chosen by those skilled in the art based on practical experience. For example, if a neural network model that processes 1 s of audio signal is used to obtain the audio feature vectors, the set duration may be set to 1 s.
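Dividing the audio of a sample video into unit sample videos of the set duration can be sketched as simple slicing; the 16 kHz sample rate below is an assumed value for illustration.

```python
import numpy as np

def split_into_units(audio, sr=16000, unit_seconds=1):
    # Split an audio signal into consecutive units of the set duration,
    # discarding any trailing remainder shorter than one full unit.
    unit_len = sr * unit_seconds
    n_units = len(audio) // unit_len
    return [audio[i * unit_len:(i + 1) * unit_len] for i in range(n_units)]
```

A 5-second sample video thus yields five 1-second unit sample videos, matching the example below.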
Step 103: for each unit sample video, obtain the audio feature vector corresponding to the unit sample video.
For each unit sample video, the corresponding audio feature vector is obtained separately. For example, sample video A has a duration of 5 s; divided in units of 1 s, it yields unit sample videos 1, 2, 3, 4 and 5, five in total. Accordingly, the audio feature vectors corresponding to unit sample videos 1 through 5 are obtained respectively.
In an optional implementation, obtaining the audio feature vector corresponding to a unit sample video may include steps A1 to A2.
Step A1: generate the spectrogram corresponding to the audio signal in the unit sample video.
Step A1 may further include steps A11 to A13.
Step A11: perform frame division on the audio signal in the unit sample video to obtain multiple audio signal frames. The audio signal is extracted from the unit sample video and then divided into frames.
An audio signal is non-stationary macroscopically but stationary microscopically, exhibiting short-term stationarity (within 10-30 ms the signal can be regarded as approximately unchanged), so it can be divided into short sections for processing; this is frame division, and each short section after division is called an audio signal frame. For example, an overlapping segmentation method may be used: instead of cutting frames back-to-back, adjacent frames overlap by a portion. The overlapping part of the previous frame and the next frame is called the frame shift, and the ratio of the frame shift to the frame length is generally 0 to 0.5. The specific frame length can be set according to actual conditions; the number of frames per second can be set to 33 to 100.
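The overlapping frame division described above can be sketched as follows; the 25 ms frame length and 10 ms frame shift are common illustrative choices, not values fixed by the patent.

```python
import numpy as np

def frame_signal(signal, frame_len, frame_shift):
    # Divide a 1-D audio signal into overlapping frames: consecutive frames
    # share frame_len - frame_shift samples.
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    return np.stack([signal[i * frame_shift: i * frame_shift + frame_len]
                     for i in range(n_frames)])
```

At a 16 kHz sample rate, a 25 ms frame (400 samples) with a 10 ms shift (160 samples) gives 100 frames per second, at the upper end of the 33-100 range mentioned above, with a shift-to-length ratio of 0.4.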
Step A12: apply windowing and a Fourier transform to each audio signal frame to obtain the initial spectrogram corresponding to the audio signal in the unit sample video.
Audio changes continuously over long stretches, and a signal without fixed characteristics cannot be processed directly, so each audio signal frame is windowed: the frame is multiplied by a window function. The purpose of windowing is to eliminate the discontinuities that may arise at both ends of each audio signal frame, making the signal more continuous globally. The cost of windowing is that the two ends of a frame are attenuated, which is why adjacent frames must overlap during framing. In practical applications, common window functions for windowing audio signal frames include the rectangular window, the Hamming window, the Hanning window, and so on. Given its frequency-domain characteristics, the Hamming window is often preferred.
Since it is generally difficult to read a signal's characteristics from its time-domain waveform, the signal is usually transformed into its energy distribution in the frequency domain for observation, and different energy distributions can represent the characteristics of different speech. Therefore, after windowing, a Fourier transform is applied to each windowed audio signal frame to obtain the energy distribution on the spectrum; this yields the spectrum of each audio signal frame and, in turn, the initial spectrogram corresponding to the audio signal in the unit sample video.
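The windowing and Fourier transform of step A12 can be sketched as follows (a minimal sketch; the Hamming window, 512-point FFT, and random frames standing in for real framed audio are illustrative assumptions):

```python
import numpy as np

def initial_spectrogram(frames, n_fft=512):
    """Window each frame with a Hamming window and take the magnitude FFT.

    `frames` is a (num_frames, frame_length) array produced by a framing
    step; the result is a (num_frames, n_fft // 2 + 1) energy spectrogram,
    i.e. the "initial spectrogram" described above.
    """
    window = np.hamming(frames.shape[1])
    spectrum = np.fft.rfft(frames * window, n=n_fft)  # per-frame Fourier transform
    return np.abs(spectrum) ** 2                      # energy distribution on the spectrum

frames = np.random.randn(98, 400)
spec = initial_spectrogram(frames)
print(spec.shape)  # (98, 257)
```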
Step A13: apply a Mel transformation to the initial spectrogram to obtain a Mel spectrogram, and use the Mel spectrogram as the spectrogram corresponding to the audio signal in the unit sample video.
The initial spectrogram is often a rather large representation. To obtain audio features of a suitable size, the initial spectrogram can be passed through a Mel filter bank, transforming it into a Mel spectrogram.
The unit of frequency is the hertz (Hz), and the range audible to the human ear is 20-20000 Hz, but the ear's perception is not linear on the Hz scale. For example, if we have adapted to a tone of 1000 Hz and then raise the pitch to 2000 Hz, the ear perceives only a slight increase in frequency rather than a doubling. Ordinary frequency is therefore converted to Mel frequency, with the mapping shown below:
mel(f) = 2595 * log10(1 + f/700)
where f is the ordinary frequency and mel(f) is the Mel frequency.
Under this formula, the ear's perception of frequency becomes approximately linear. That is, on the Mel scale, if the Mel frequencies of two audio segments differ by a factor of two, the pitch perceived by the human ear also differs by roughly a factor of two.
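The mapping above can be written directly as a pair of helper functions (the inverse mapping is a standard consequence of the formula and is included only for illustration):

```python
import math

def hz_to_mel(f):
    """Convert ordinary frequency f (Hz) to Mel frequency: mel(f) = 2595*log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping, useful when placing Mel filter centre frequencies."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# By construction, 1000 Hz maps to approximately 1000 on the Mel scale.
print(round(hz_to_mel(1000)))  # 1000
```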
According to the sensitivity of the human ear, the frequency range is divided into multiple Mel filters, which together form a Mel filter bank; the bank may contain 20~40 Mel filters. On the Mel scale, the centre frequencies of the filters are linearly distributed at equal intervals, but they are not equally spaced on the Hz scale. The initial spectrogram is filtered with the Mel filter bank to obtain a Mel spectrogram, which is determined as the spectrogram corresponding to the audio signal in the unit sample video.
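A triangular Mel filter bank of the kind described can be sketched as follows (26 filters, a 512-point FFT, and a 16 kHz sample rate are illustrative assumptions; real implementations such as VGGish's preprocessing differ in details):

```python
import numpy as np

def mel_filter_bank(num_filters=26, n_fft=512, sr=16000):
    """Build triangular Mel filters: centre frequencies equally spaced
    on the Mel scale, then mapped back to FFT bin indices, so the
    spacing in Hz is not uniform, as described above."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(mel(0), mel(sr / 2), num_filters + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_points) / sr).astype(int)
    bank = np.zeros((num_filters, n_fft // 2 + 1))
    for i in range(1, num_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):                 # rising slope
            bank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):                # falling slope
            bank[i - 1, k] = (right - k) / max(right - centre, 1)
    return bank

bank = mel_filter_bank()
spec = np.abs(np.random.randn(98, 257))  # initial spectrogram (frames x bins)
mel_spec = spec @ bank.T                 # Mel spectrogram
print(mel_spec.shape)  # (98, 26)
```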
Step A2: input the spectrogram corresponding to the audio signal in the unit sample video into a preset neural network model, and determine the audio feature vector output by the neural network model as the audio feature vector corresponding to the unit sample video.
In the embodiment of the present invention, a neural network model can be used: the spectrogram corresponding to the audio signal in the unit sample video is input into the neural network model, which performs feature extraction internally and outputs an audio feature vector; this audio feature vector is the audio feature vector corresponding to the unit sample video.
In an optional embodiment, audio feature vectors can be extracted with the VGGish model (based on VGG, Visual Geometry Group), built on the TensorFlow open-source deep learning framework. The VGGish model may include convolutional layers, fully connected layers, and so on, where the convolutional layers can be used to extract features and the fully connected layers can be used to map the extracted features to a corresponding feature vector. Accordingly, the spectrogram corresponding to the audio signal in the unit sample video is input into the VGGish model: the convolutional layers extract the audio features from the spectrogram and pass them to the fully connected layers, which map them into a 128-dimensional audio feature vector that the fully connected layers output.
In the embodiment of the present invention, the audio feature vector corresponding to each unit sample video can be saved in TFRecord format. Data in TFRecord format is stored in binary, occupies less disk space, and is faster to read.
Step 104: take the audio feature vectors corresponding to at least two consecutive unit sample videos as the input, take the annotation information of the sample video as the target output, train a preset initial model, and determine the trained model as the video processing model.
If a sample video were represented by the feature vector of a single unit sample video during training, then, because the duration of one unit sample video is short, the corresponding feature vector might not represent the whole sample video accurately and comprehensively. Therefore, in the embodiment of the present invention, the audio feature vectors corresponding to at least two consecutive unit sample videos are used to represent one sample video for training.
For a sample video, the audio feature vectors corresponding to at least two consecutive unit sample videos divided from that sample video are used as the input, and the annotation information of the sample video is used as the target output, to train the preset initial model.
The process of training the preset initial model may include steps B1~B3:
Step B1: randomly select at least two consecutive unit sample videos, concatenate the audio feature vectors corresponding to the selected unit sample videos, input the result into the initial model, and obtain the predicted probability that the sample video belongs to the theme song category.
The initial model is a model with classification capability that has not yet been trained. The initial model can analyze the input audio feature vectors and output the predicted probability that the sample video belongs to the theme song category, but this prediction is usually inaccurate; the initial model therefore needs to be trained to obtain an accurate video processing model.
From the unit sample videos divided from a sample video, at least two consecutive unit sample videos are randomly selected; their corresponding audio feature vectors are concatenated and input into the initial model, and the initial model outputs the predicted probability that the sample video belongs to the theme song category.
For example, sample video A is divided, in units of 1 s, into unit sample videos 1, 2, 3, 4, and 5, i.e., 5 unit sample videos in total. Three consecutive unit sample videos are randomly selected from the 5; each unit sample video corresponds to a 128-dimensional audio feature vector, so the feature vectors of the 3 unit sample videos are concatenated into a 128*3=384-dimensional audio feature vector and input into the initial model. The initial model outputs the predicted probability that sample video A belongs to the theme song category.
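The concatenation in this example can be sketched as follows (the random vectors are placeholders standing in for real VGGish outputs):

```python
import numpy as np

# Hypothetical 128-dim feature vectors for the 5 unit sample videos of video A.
unit_features = [np.random.randn(128).astype(np.float32) for _ in range(5)]

# Randomly pick a window of 3 consecutive units and concatenate them
# into the 128*3 = 384-dimensional model input described above.
start = np.random.randint(0, len(unit_features) - 3 + 1)
model_input = np.concatenate(unit_features[start : start + 3])
print(model_input.shape)  # (384,)
```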
Step B2: calculate the loss value corresponding to the sample video from the predicted probability that the sample video belongs to the theme song category and the annotation information of the sample video.
The predicted probability that the sample video belongs to the theme song category is the actual output of the initial model, and the annotation information of the sample video is the target output; the loss value corresponding to the selected sample video is calculated from the actual output and the target output. The loss value can indicate the degree of deviation between the predicted probability that the sample video belongs to the theme song category and the annotation information of the sample video.
In an optional embodiment, the difference between the annotation information of the sample video and the predicted probability that the sample video belongs to the theme song category can be used as the loss value. For example, if the predicted probability that a sample video belongs to the theme song category is 0.8 and the annotation information of the sample video is 1, the loss value can be 0.2.
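The loss of this optional embodiment can be sketched as an absolute difference (a minimal illustration of the example above; the document says only that the difference can be used as the loss, so this is one simple choice):

```python
def sample_loss(label, predicted_prob):
    """Absolute difference between the annotation (0 or 1) and the
    predicted theme-song probability, as in the example above."""
    return abs(label - predicted_prob)

print(round(sample_loss(1, 0.8), 6))  # 0.2
```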
Step B3: determine that training is complete when the loss value is smaller than a set loss threshold.
The smaller the loss value, the better the robustness of the model. In the embodiment of the present invention, a loss threshold is preset for judging whether training is complete. If the loss value is smaller than the set loss threshold, the deviation between the predicted probability that the sample video belongs to the theme song category and the annotation information of the sample video is small, and training can be considered complete. If the loss value is greater than or equal to the set loss threshold, the deviation between the predicted probability that the sample video belongs to the theme song category and the annotation information of the sample video is large; in this case the parameters of the model can be adjusted and training continues with the next training sample.
As for the specific value of the set loss threshold, those skilled in the art may select any suitable value based on practical experience; for example, it can be set to 0.1, 0.2, 0.3, and so on.
The trained model can be used as the video processing model and subsequently used to detect theme song segments in videos.
In addition, in the embodiment of the present invention, a test sample set can also be obtained when the training sample set is obtained. The test sample set is similar to the training sample set: each test sample includes a test video and the annotation information of the test video. After the video processing model is obtained by training, it is tested with the test sample set. The test process may include: dividing a test video into multiple unit test videos; for each unit test video, obtaining the audio feature vector corresponding to that unit test video; and inputting the audio feature vectors corresponding to at least two consecutive unit test videos into the video processing model. The video processing model outputs the predicted probability that the test video belongs to the theme song category, which is compared with the annotation information of the test video to check whether the video processing model is accurate.
In the embodiment of the present invention, considering that the audio of the theme song portion of a video differs from the audio of the feature content portion, multiple sample videos that belong to the theme song category and multiple sample videos that do not are used, and a video processing model for detecting video theme songs is trained from the audio feature vectors corresponding to the sample videos. The video processing model can then be used to detect theme song segments from the audio feature vectors corresponding to a video to be detected. Because detection is based on audio feature vectors, there is no restriction that the videos belong to the same series; videos of multiple types can share the same video processing model, which is therefore more adaptive.
Referring to Fig. 2, a flow chart of the steps of a video processing method of an embodiment of the present invention is shown.
The video processing method of the embodiment of the present invention includes the following steps:
Step 201: obtain a video to be processed.
A video to be processed is a film or television video for which theme song segments need to be detected. For example, for a TV series, the theme song segment of each episode can be detected and skipped during playback so that the feature content plays directly, saving the user's viewing time; the video of each episode can therefore be treated as one video to be processed.
Step 202: extract a head segment and a tail segment from the video to be processed.
A theme song includes an opening theme and an ending theme: the opening theme is located at the beginning of the video to be processed, and the ending theme at its end. Therefore, to save processing time, a head segment and a tail segment can be extracted from the video to be processed, and only the head segment where the opening theme lies and the tail segment where the ending theme lies are detected.
In an optional embodiment, a segment can be extracted from the beginning of the video to be processed, according to a set percentage, as the head segment of the video to be processed, and a segment can be extracted from the end of the video to be processed, according to the set percentage, as the tail segment of the video to be processed. As for the specific value of the set percentage, those skilled in the art may set any suitable value according to the actual situation; for example, the percentage can be set to 10%, 15%, 20%, and so on.
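The percentage-based extraction can be sketched in terms of time ranges (the 10% setting and the 2700 s episode length are illustrative assumptions):

```python
def extract_head_and_tail(duration_s, percentage=0.1):
    """Return (start, end) time ranges, in seconds, of the head and tail
    segments of a video, extracted according to a set percentage."""
    span = duration_s * percentage
    head = (0.0, span)
    tail = (duration_s - span, duration_s)
    return head, tail

# A 45-minute (2700 s) episode with a 10% setting yields a 270 s head
# segment and a 270 s tail segment.
head, tail = extract_head_and_tail(2700, 0.1)
print(head, tail)  # (0.0, 270.0) (2430.0, 2700.0)
```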
Step 203: divide the head segment and the tail segment into multiple unit videos to be processed.
Similarly to step 102 above, based on the consistency in audio of the theme song segments within videos to be processed, whether a segment belongs to the theme song category can be determined from audio feature vectors.
For one video to be processed, the head segment and the tail segment are each divided into multiple unit videos to be processed for analysis. For example, the head segment and the tail segment can each be divided into multiple unit videos to be processed in units of a set duration. The set duration involved in step 203 can be the same as the set duration involved in step 102 above.
Step 204: for each unit video to be processed, obtain the audio feature vector corresponding to that unit video to be processed.
Obtaining the audio feature vector corresponding to a unit video to be processed may include: generating the spectrogram corresponding to the audio signal in the unit video to be processed; inputting the spectrogram corresponding to the audio signal in the unit video to be processed into a preset neural network model; and determining the audio feature vector output by the neural network model as the audio feature vector corresponding to the unit video to be processed.
Generating the spectrogram corresponding to the audio signal in the unit video to be processed may include: performing framing on the audio signal in the unit video to be processed to obtain multiple audio signal frames; applying windowing and a Fourier transform to each audio signal frame to obtain the initial spectrogram corresponding to the audio signal in the unit video to be processed; and applying a Mel transformation to the initial spectrogram to obtain a Mel spectrogram, which is used as the spectrogram corresponding to the audio signal in the unit video to be processed.
Step 204 is similar to step 103 above; see the related description of step 103 for details, which the embodiment of the present invention does not repeat here.
For example, a video to be processed is divided, in units of 1 s, into unit videos to be processed 1, 2, 3, and so on, and the audio feature vector corresponding to each unit video to be processed is obtained.
Step 205: input the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the unit video to be processed in question, into a pre-generated video processing model, and determine from the output of the video processing model whether that unit video to be processed belongs to the theme song category.
If the feature vector of a single unit video to be processed were used directly to detect whether that unit video belongs to the theme song category, then, because the duration of one unit video to be processed is short, the corresponding feature vector might not accurately determine whether the unit video to be processed really belongs to the theme song category. Therefore, in the embodiment of the present invention, the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the unit video in question, are used to determine whether that unit video to be processed belongs to the theme song category.
For one unit video to be processed, the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including that unit video, are input into the video processing model generated in the embodiment shown in Fig. 1 above. After the video processing model analyzes the audio feature vectors, it outputs the predicted probability that the unit video to be processed belongs to the theme song category. After the output of the video processing model is obtained, the predicted probability that the unit video to be processed belongs to the theme song category is compared with a set probability threshold; if it is greater than or equal to the threshold, the unit video to be processed is determined to belong to the theme song category.
As for the specific value of the set probability threshold, those skilled in the art may select any suitable value based on practical experience; for example, it can be set to 0.7, 0.8, 0.9, and so on.
For example, for unit video to be processed 3, the 3 consecutive unit videos to be processed that include unit video 3 can be unit videos 1, 2, 3; or unit videos 2, 3, 4; or unit videos 3, 4, 5. Among these, the scheme using unit videos 2, 3, 4 considers both the audio feature vector before unit video 3 and the audio feature vector after it; therefore, using the audio feature vectors corresponding to the 3 consecutive unit videos 2, 3, 4 yields a more accurate result for unit video 3 than the other two schemes.
Taking the 3 consecutive unit videos including unit video 3, namely unit videos 2, 3, and 4, as an example: the 128-dimensional audio feature vectors corresponding to unit videos 2, 3, and 4 are concatenated into a 128*3=384-dimensional audio feature vector and input into the video processing model, which outputs the predicted probability that unit video 3 belongs to the theme song category; if this predicted probability is greater than the set probability threshold, unit video 3 is determined to belong to the theme song category.
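The windowed classification of step 205 can be sketched as follows; `predict_prob` is a hypothetical stand-in for the trained video processing model, and the toy model, edge-clamping strategy, and 0.7 threshold are illustrative assumptions:

```python
import numpy as np

def classify_units(unit_features, predict_prob, threshold=0.7):
    """For each unit, concatenate its 128-dim feature vector with those
    of its neighbours (a centred window of 3, clamped at the edges) and
    threshold the model's predicted theme-song probability."""
    labels = []
    for i in range(len(unit_features)):
        lo = min(max(i - 1, 0), len(unit_features) - 3)  # keep the window in range
        window = np.concatenate(unit_features[lo:lo + 3])  # 384-dim input
        labels.append(predict_prob(window) >= threshold)
    return labels

features = [np.full(128, i, dtype=np.float32) for i in range(5)]
# Toy stand-in model: "theme song" when the window's mean value is small.
fake_model = lambda x: 1.0 if x.mean() < 2.0 else 0.0
print(classify_units(features, fake_model))  # [True, True, False, False, False]
```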
Step 206: splice together the consecutive unit videos to be processed, among those that belong to the theme song category, to obtain the opening theme segment and the ending theme segment of the video to be processed.
After it has been determined whether each unit video to be processed belongs to the theme song category: if a unit video to be processed belongs to the theme song category, it can be determined to be part of a theme song segment; if a unit video to be processed does not belong to the theme song category, it can be determined to be part of a non-theme-song segment. Thus, if multiple consecutive unit videos to be processed belong to the theme song category, the consecutive unit videos belonging to the theme song category are spliced together to obtain the opening theme segment and the ending theme segment of the video to be processed.
Under normal circumstances, the theme song of a video to be processed may include an opening theme and an ending theme, so both an opening theme segment and an ending theme segment can be determined from the video to be processed. Among the unit videos to be processed divided from the head segment that belong to the theme song category, the consecutive unit videos are spliced to obtain the opening theme segment of the video to be processed; among the unit videos to be processed divided from the tail segment that belong to the theme song category, the consecutive unit videos are spliced to obtain the ending theme segment of the video to be processed.
When the video to be processed is divided into multiple unit videos to be processed, the start time and end time corresponding to each unit video to be processed can also be recorded. Therefore, after the consecutive unit videos that belong to the theme song category are spliced to obtain the opening theme segment and the ending theme segment of the video to be processed, the start time of the first unit video in the opening theme segment can be used as the start time of the opening theme segment, and the end time of the last unit video in the opening theme segment as the end time of the opening theme segment; likewise, the start time of the first unit video in the ending theme segment is used as the start time of the ending theme segment, and the end time of the last unit video in the ending theme segment as the end time of the ending theme segment.
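The splicing of step 206, including the start-time and end-time bookkeeping, can be sketched as follows (a 1 s unit duration, as in the earlier example, is assumed):

```python
def splice_segments(labels, unit_duration=1.0):
    """Merge runs of consecutive units labelled as theme song into
    (start_time, end_time) segments in seconds; the first and last unit
    of each run supply the segment's start and end times, as above."""
    segments, run_start = [], None
    for i, is_theme in enumerate(labels + [False]):  # sentinel closes the last run
        if is_theme and run_start is None:
            run_start = i
        elif not is_theme and run_start is not None:
            segments.append((run_start * unit_duration, i * unit_duration))
            run_start = None
    return segments

# Units 0-2 are theme song, units 5-6 are theme song.
print(splice_segments([True, True, True, False, False, True, True]))
# [(0.0, 3.0), (5.0, 7.0)]
```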
In the embodiment of the present invention, based on the consistency in audio of theme song segments within videos, the video processing model detects theme song segments from audio feature vectors; the detection results are more accurate, and the video processing model is more adaptive.
Referring to Fig. 3, a flow chart of the steps of another video processing method of an embodiment of the present invention is shown.
The video processing method of the embodiment of the present invention includes the following steps:
Step 301: obtain a video to be processed.
Step 302: extract a head segment and a tail segment from the video to be processed.
Step 303: divide the head segment and the tail segment into multiple unit videos to be processed.
Fig. 4 is a schematic diagram of a video processing procedure of an embodiment of the present invention. The long video in Fig. 4 is the video to be processed; the long video is divided to obtain multiple unit videos to be processed.
Step 304: call a preset first process and a preset second process simultaneously.
In the embodiment of the present invention, if a single process were used to handle the multiple unit videos to be processed divided from the head segment and the tail segment, processing efficiency would be low. Therefore, a first process and a second process can be set up and called simultaneously to handle, respectively, the multiple unit videos to be processed divided from the head segment and the multiple unit videos to be processed divided from the tail segment, so as to improve processing efficiency. The first process and the second process can be stored in a process pool.
In Fig. 4, the process pool includes the first process process1 and the second process process2.
Step 305: for each unit video to be processed divided from the head segment, use the first process to obtain the audio feature vector corresponding to that unit video to be processed.
In the first process, for each unit video to be processed divided from the head segment, the audio feature vector corresponding to that unit video is obtained.
The first process contains a neural network model, which in Fig. 4 is specifically Audio VGGish. The unit videos to be processed divided from the head segment are input into the Audio VGGish in the first process, and Audio VGGish is used to obtain the 128-dimensional audio feature vector corresponding to each unit video to be processed.
Step 305 is similar to step 204 above; see the related description of step 204 for details, which the embodiment of the present invention does not repeat here.
Step 306: use the first process to input the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the unit video in question, into the pre-generated video processing model, and determine from the output of the video processing model whether that unit video to be processed belongs to the theme song category.
In the first process, the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the unit video in question, are input into the pre-trained video processing model. The first process in Fig. 4 contains a video processing model, specifically FCs (fully connected layers).
The video processing model outputs the predicted probability that the unit video to be processed belongs to the theme song category, and the unit video to be processed is determined to belong to the theme song category when the predicted probability is greater than or equal to a set probability threshold. In Fig. 4, the confidence indicates the predicted probability, and a unit is determined to belong to the theme song category when the confidence is greater than or equal to 0.7.
Step 307: use the first process to splice the consecutive unit videos to be processed that belong to the theme song category, obtaining the opening theme segment of the video to be processed.
In the first process, the consecutive unit videos to be processed that belong to the theme song category are spliced to obtain the opening theme segment of the video to be processed, such as the opening theme obtained as the detection result in Fig. 4.
Step 308: for each unit video to be processed divided from the tail segment, use the second process to obtain the audio feature vector corresponding to that unit video to be processed.
In the second process, for each unit video to be processed divided from the tail segment, the audio feature vector corresponding to that unit video is obtained.
The second process contains a neural network model, which in Fig. 4 is specifically Audio VGGish. The unit videos to be processed divided from the tail segment are input into the Audio VGGish in the second process, and Audio VGGish is used to obtain the audio feature vector corresponding to each unit video to be processed.
Step 308 is similar to step 204 above; see the related description of step 204 for details, which the embodiment of the present invention does not repeat here.
Step 309: use the second process to input the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the unit video in question, into the pre-generated video processing model, and determine from the output of the video processing model whether that unit video to be processed belongs to the theme song category.
In the second process, the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the unit video in question, are input into the pre-trained video processing model. The second process in Fig. 4 contains a video processing model, specifically FCs.
The video processing model outputs the predicted probability that the unit video to be processed belongs to the ending theme category, and the unit video to be processed is determined to belong to the theme song category when the predicted probability is greater than or equal to a set probability threshold. In Fig. 4, the confidence indicates the predicted probability, and a unit is determined to belong to the theme song category when the confidence is greater than or equal to 0.7.
Step 310: use the second process to splice the consecutive unit videos to be processed that belong to the theme song category, obtaining the ending theme segment of the video to be processed.
In the second process, the consecutive unit videos to be processed that belong to the theme song category are spliced to obtain the theme song segment of the video to be processed, such as the ending theme obtained as the detection result in Fig. 4.
In the embodiment of the present invention, the use of the process pool technique greatly improves processing efficiency.
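The two concurrent workers of Fig. 4 can be sketched as follows; threads stand in for the first and second processes to keep the sketch self-contained, `detect_segment` is a simplified stand-in for the per-segment pipeline, and the assumption of at most one contiguous theme-song run per segment is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def detect_segment(unit_probs, threshold=0.7):
    """Threshold per-unit theme-song probabilities and splice the
    consecutive positive units into one (start, end) range in seconds."""
    hits = [i for i, p in enumerate(unit_probs) if p >= threshold]
    return (hits[0], hits[-1] + 1) if hits else None

# Hypothetical per-unit probabilities for a head segment and a tail segment.
head_probs = [0.9, 0.8, 0.95, 0.1, 0.2]
tail_probs = [0.1, 0.3, 0.85, 0.9]

# A pool of two workers handles the head and tail segments concurrently,
# mirroring the first and second processes of the process pool in Fig. 4.
with ThreadPoolExecutor(max_workers=2) as pool:
    opening = pool.submit(detect_segment, head_probs)
    ending = pool.submit(detect_segment, tail_probs)
    print(opening.result(), ending.result())  # (0, 3) (2, 4)
```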
It should be noted that, for simplicity of description, the method embodiments are stated as a series of action combinations; however, those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 5, a structural block diagram of a model generating apparatus of an embodiment of the present invention is shown.
The model generating apparatus of the embodiment of the present invention includes a sample acquisition module 501, a first division module 502, a first vector acquisition module 503, and a training module 504.
The sample acquisition module 501 is configured to obtain training samples. A training sample includes a sample video and the annotation information of the sample video; the annotation information indicates whether the sample video belongs to the theme song category.
The first division module 502 is configured to divide the sample video into multiple unit sample videos.
The first vector acquisition module 503 is configured to obtain, for each unit sample video, the audio feature vector corresponding to that unit sample video.
The training module 504 is configured to take the audio feature vectors corresponding to at least two consecutive unit sample videos as the input and the annotation information of the sample video as the target output, train a preset initial model, and determine the trained model as the video processing model.
In an optional embodiment, the first vector acquisition module 503 includes: a first generation unit, configured to generate the spectrogram corresponding to the audio signal in the unit sample video; and a first determination unit, configured to input the spectrogram corresponding to the audio signal in the unit sample video into a preset neural network model, and determine the audio feature vector output by the neural network model as the audio feature vector corresponding to the unit sample video.
In an optional embodiment, the first generation unit includes: a first framing subunit, configured to perform framing on the audio signal in the unit sample video to obtain multiple audio signal frames; a first processing subunit, configured to perform windowing and Fourier transform on each audio signal frame to obtain an initial spectrogram corresponding to the audio signal in the unit sample video; and a first transform subunit, configured to perform a Mel transform on the initial spectrogram to obtain a Mel spectrogram, and use the Mel spectrogram as the spectrogram corresponding to the audio signal in the unit sample video.
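The framing, windowing, Fourier transform and Mel transform pipeline described above can be sketched in NumPy as follows. The sample rate, frame length, hop size and number of Mel bands are illustrative assumptions, not values from the patent:

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular Mel filterbank mapping an FFT power spectrum to Mel bands."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ctr):               # rising slope of triangle m
            fb[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):               # falling slope of triangle m
            fb[m - 1, k] = (hi - k) / max(hi - ctr, 1)
    return fb

def mel_spectrogram(signal, sr=16000, frame_len=400, hop=160, n_mels=40):
    """Framing -> Hann window -> FFT power spectrum -> Mel transform."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=frame_len, axis=1)) ** 2  # initial spectrogram
    mel = power @ mel_filterbank(n_mels, frame_len, sr).T          # Mel spectrogram
    return np.log(mel + 1e-10)                 # log-Mel, shape (n_frames, n_mels)
```

The resulting log-Mel image is what the preset neural network model would consume to produce the per-unit audio feature vector.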
In an optional embodiment, the training module 504 includes: a probability acquisition unit, configured to randomly select at least two consecutive unit sample videos, concatenate the audio feature vectors corresponding to the selected unit sample videos, input the result into the initial model, and obtain the predicted probability that the sample video belongs to the theme song category; a loss acquisition unit, configured to calculate the loss value corresponding to the sample video according to the predicted probability and the annotation information of the sample video; and a training detection unit, configured to determine that training is completed when the loss value is less than a set loss threshold.
Referring to Fig. 6, a structural block diagram of a video processing apparatus according to an embodiment of the present invention is shown.
The video processing apparatus of the embodiment of the present invention includes a video acquisition module 601, a segment extraction module 602, a second division module 603, a second vector acquisition module 604, a category determination module 605 and a segment determination module 606.
The video acquisition module 601 is configured to obtain a video to be processed.
The segment extraction module 602 is configured to extract an opening segment and an ending segment from the video to be processed.
The second division module 603 is configured to divide the opening segment and the ending segment each into multiple unit videos to be processed.
The second vector acquisition module 604 is configured to obtain, for each unit video to be processed, the audio feature vector corresponding to that unit video.
The category determination module 605 is configured to input the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the current unit video, into the pre-generated video processing model, and determine according to the output of the video processing model whether the unit video belongs to the theme song category. The video processing model is generated by the model generating apparatus shown in Fig. 5.
The segment determination module 606 is configured to splice the consecutive unit videos that belong to the theme song category, so as to obtain the opening theme segment and the ending theme segment in the video to be processed.
In an optional embodiment, the second vector acquisition module 604 includes: a second generation unit, configured to generate the spectrogram corresponding to the audio signal in the unit video to be processed; and a second determination unit, configured to input the spectrogram corresponding to the audio signal in the unit video to be processed into the preset neural network model, and determine the audio feature vector output by the neural network model as the audio feature vector corresponding to the unit video to be processed.
In an optional embodiment, the second generation unit includes: a second framing subunit, configured to perform framing on the audio signal in the unit video to be processed to obtain multiple audio signal frames; a second processing subunit, configured to perform windowing and Fourier transform on each audio signal frame to obtain an initial spectrogram corresponding to the audio signal in the unit video to be processed; and a second transform subunit, configured to perform a Mel transform on the initial spectrogram to obtain a Mel spectrogram, and use the Mel spectrogram as the spectrogram corresponding to the audio signal in the unit video to be processed.
In an optional embodiment, the second vector acquisition module 604 includes: a calling unit, configured to call a preset first process and a preset second process simultaneously; an opening acquisition unit, configured to obtain, for each unit video to be processed divided from the opening segment, the corresponding audio feature vector using the first process; and an ending acquisition unit, configured to obtain, for each unit video to be processed divided from the ending segment, the corresponding audio feature vector using the second process.
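A rough sketch of dispatching the opening units to one worker and the ending units to another, so both halves are featurised concurrently. The patent calls two preset OS processes; a `ProcessPoolExecutor` with two workers would be the direct analogue, while this self-contained sketch uses a two-worker thread pool, and `extract_features` is a hypothetical stand-in for the per-unit feature extraction:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def extract_features(unit_videos):
    """Hypothetical stand-in: one scalar 'feature' per unit video."""
    return [np.asarray(u, dtype=float).mean() for u in unit_videos]

def extract_head_and_tail(head_units, tail_units):
    """Submit the opening units to one worker and the ending units to the
    other, mirroring the calling unit's two preset workers."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        head_future = pool.submit(extract_features, head_units)
        tail_future = pool.submit(extract_features, tail_units)
        return head_future.result(), tail_future.result()
```

Since feature extraction for the opening and ending segments is independent, the two halves can proceed in parallel, which is the source of the efficiency gain the embodiment claims.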
In an optional embodiment, the category determination module 605 includes: a comparison unit, configured to compare whether the predicted probability, output by the video processing model, that the unit video to be processed belongs to the theme song category is greater than or equal to a set probability threshold; and a result determination unit, configured to determine that the unit video to be processed belongs to the theme song category when the predicted probability is greater than or equal to the threshold.
In an optional embodiment, the segment determination module 606 includes: an opening theme determination unit, configured to splice the consecutive unit videos that belong to the theme song category among the unit videos divided from the opening segment, so as to obtain the opening theme segment in the video to be processed; and an ending theme determination unit, configured to splice the consecutive unit videos that belong to the theme song category among the unit videos divided from the ending segment, so as to obtain the ending theme segment in the video to be processed.
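Splicing the consecutive theme-song units amounts to finding runs of positively classified units; a minimal sketch (helper name assumed, returning index ranges over the unit videos):

```python
def splice_theme_segments(flags):
    """flags[i] is True when unit video i was classified as theme song.
    Consecutive True units are spliced into one (first_idx, last_idx) run."""
    segments, run_start = [], None
    for i, is_theme in enumerate(flags):
        if is_theme and run_start is None:
            run_start = i                     # a new run begins
        elif not is_theme and run_start is not None:
            segments.append((run_start, i - 1))  # close the current run
            run_start = None
    if run_start is not None:                 # run extends to the last unit
        segments.append((run_start, len(flags) - 1))
    return segments
```

Running this over the per-unit classifications of the opening segment yields the opening theme segment, and over those of the ending segment the ending theme segment.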
In an optional embodiment, the apparatus further includes: an annotation module, configured to mark the start time and end time of each unit video to be processed after the second division module divides the opening segment and the ending segment each into multiple unit videos to be processed; and a time determination module, configured to take the start time of the first unit video in the opening theme segment as the start time of the opening theme segment, take the end time of the last unit video in the opening theme segment as the end time of the opening theme segment, take the start time of the first unit video in the ending theme segment as the start time of the ending theme segment, and take the end time of the last unit video in the ending theme segment as the end time of the ending theme segment.
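Given each unit's marked (start, end) times, the segment boundaries fall out directly as the time determination module describes; a minimal sketch with a hypothetical helper:

```python
def segment_times(unit_times, segment):
    """unit_times[i] = (start_s, end_s) marked for unit video i when it was
    divided out; segment = (first_idx, last_idx) of a spliced theme run.
    The segment starts at its first unit's start time and ends at its last
    unit's end time."""
    first, last = segment
    return unit_times[first][0], unit_times[last][1]

# Example: four 2-second units; a theme run covering units 1..2
unit_times = [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0), (6.0, 8.0)]
start, end = segment_times(unit_times, (1, 2))
```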
In the embodiments of the present invention, in view of the difference between the audio of the theme song portion and the audio of the main content portion of a video, a video processing model for detecting theme songs is trained on the audio feature vectors of multiple sample videos that belong to the theme song category and multiple sample videos that do not. The video processing model can subsequently detect the theme song segments in a video to be detected according to its audio feature vectors. Because the detection is based on audio feature vectors, it does not require the videos to belong to the same album; videos of many types can share one video processing model, so the approach is highly adaptive.
As the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
In an embodiment of the present invention, an electronic device is also provided. For example, the electronic device may be provided as a server. The electronic device may include one or more processors, and a memory for storing processor-executable instructions, such as an application program. The processor is configured to execute the above model generating method and/or video processing method.
In an embodiment of the present invention, a non-transitory computer-readable storage medium including instructions is also provided, for example a memory including instructions, where the instructions can be executed by the processor of the electronic device to complete the above model generating method and/or video processing method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can refer to each other.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, an apparatus or a computer program product. Therefore, the embodiments of the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical memory) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, terminal device (system) and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing terminal device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, such that a series of operation steps are executed on the computer or other programmable terminal device to produce computer-implemented processing; the instructions executed on the computer or other programmable terminal device thereby provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they grasp the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or terminal device including a series of elements not only includes those elements, but also includes other elements not explicitly listed, or further includes elements inherent to the process, method, article or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or terminal device that includes the element.
The model generating method, video processing method, apparatus, electronic device and storage medium provided by the present invention are described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the descriptions of the above embodiments are only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.
Claims (18)
1. A model generating method, characterized in that the method includes:
obtaining a training sample, wherein the training sample includes a sample video and annotation information of the sample video, and the annotation information indicates whether the sample video belongs to the theme song category;
dividing the sample video into multiple unit sample videos;
for each unit sample video, obtaining the audio feature vector corresponding to the unit sample video;
taking the audio feature vectors corresponding to at least two consecutive unit sample videos as input and the annotation information of the sample video as the training target, training a preset initial model; and
determining the trained model as a video processing model.
2. The method according to claim 1, characterized in that obtaining the audio feature vector corresponding to the unit sample video includes:
generating the spectrogram corresponding to the audio signal in the unit sample video; and
inputting the spectrogram corresponding to the audio signal in the unit sample video into a preset neural network model, and determining the audio feature vector output by the neural network model as the audio feature vector corresponding to the unit sample video.
3. The method according to claim 2, characterized in that generating the spectrogram corresponding to the audio signal in the unit sample video includes:
performing framing on the audio signal in the unit sample video to obtain multiple audio signal frames;
performing windowing and Fourier transform on each audio signal frame to obtain an initial spectrogram corresponding to the audio signal in the unit sample video; and
performing a Mel transform on the initial spectrogram to obtain a Mel spectrogram, and using the Mel spectrogram as the spectrogram corresponding to the audio signal in the unit sample video.
4. The method according to claim 1, characterized in that taking the audio feature vectors corresponding to at least two unit sample videos as input and the annotation information of the sample video as the training target, training the preset initial model includes:
randomly selecting at least two consecutive unit sample videos, concatenating the audio feature vectors corresponding to the selected unit sample videos, inputting the result into the initial model, and obtaining the predicted probability that the sample video belongs to the theme song category;
calculating the loss value corresponding to the sample video according to the predicted probability and the annotation information of the sample video; and
determining that training is completed when the loss value is less than a set loss threshold.
5. A video processing method, characterized in that the method includes:
obtaining a video to be processed;
extracting an opening segment and an ending segment from the video to be processed;
dividing the opening segment and the ending segment each into multiple unit videos to be processed;
for each unit video to be processed, obtaining the audio feature vector corresponding to the unit video to be processed;
inputting the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the unit video to be processed, into a pre-generated video processing model, and determining according to the output of the video processing model whether the unit video to be processed belongs to the theme song category, wherein the video processing model is generated by the method according to any one of claims 1 to 4; and
splicing the consecutive unit videos to be processed that belong to the theme song category, so as to obtain the opening theme segment and the ending theme segment in the video to be processed.
6. The method according to claim 5, characterized in that obtaining the audio feature vector corresponding to the unit video to be processed includes:
generating the spectrogram corresponding to the audio signal in the unit video to be processed; and
inputting the spectrogram corresponding to the audio signal in the unit video to be processed into a preset neural network model, and determining the audio feature vector output by the neural network model as the audio feature vector corresponding to the unit video to be processed.
7. The method according to claim 6, characterized in that generating the spectrogram corresponding to the audio signal in the unit video to be processed includes:
performing framing on the audio signal in the unit video to be processed to obtain multiple audio signal frames;
performing windowing and Fourier transform on each audio signal frame to obtain an initial spectrogram corresponding to the audio signal in the unit video to be processed; and
performing a Mel transform on the initial spectrogram to obtain a Mel spectrogram, and using the Mel spectrogram as the spectrogram corresponding to the audio signal in the unit video to be processed.
8. The method according to claim 5, characterized in that, for each unit video to be processed, obtaining the audio feature vector corresponding to the unit video to be processed includes:
calling a preset first process and a preset second process simultaneously;
for each unit video to be processed divided from the opening segment, obtaining the corresponding audio feature vector using the first process; and
for each unit video to be processed divided from the ending segment, obtaining the corresponding audio feature vector using the second process.
9. The method according to claim 5, characterized in that determining according to the output of the video processing model whether the unit video to be processed belongs to the theme song category includes:
comparing whether the predicted probability, output by the video processing model, that the unit video to be processed belongs to the theme song category is greater than or equal to a set probability threshold; and
determining that the unit video to be processed belongs to the theme song category when the predicted probability is greater than or equal to the probability threshold.
10. The method according to claim 5, characterized in that splicing the consecutive unit videos to be processed that belong to the theme song category to obtain the opening theme segment and the ending theme segment in the video to be processed includes:
splicing the consecutive unit videos that belong to the theme song category among the unit videos divided from the opening segment, so as to obtain the opening theme segment in the video to be processed; and
splicing the consecutive unit videos that belong to the theme song category among the unit videos divided from the ending segment, so as to obtain the ending theme segment in the video to be processed.
11. The method according to claim 5, characterized in that:
after dividing the opening segment and the ending segment each into multiple unit videos to be processed, the method further includes: marking the start time and end time of each unit video to be processed; and
after splicing the consecutive unit videos to be processed that belong to the theme song category to obtain the opening theme segment and the ending theme segment in the video to be processed, the method further includes:
taking the start time of the first unit video in the opening theme segment as the start time of the opening theme segment, and taking the end time of the last unit video in the opening theme segment as the end time of the opening theme segment; and
taking the start time of the first unit video in the ending theme segment as the start time of the ending theme segment, and taking the end time of the last unit video in the ending theme segment as the end time of the ending theme segment.
12. A model generating apparatus, characterized in that the apparatus includes:
a sample acquisition module, configured to obtain a training sample, wherein the training sample includes a sample video and annotation information of the sample video, and the annotation information indicates whether the sample video belongs to the theme song category;
a first division module, configured to divide the sample video into multiple unit sample videos;
a first vector acquisition module, configured to obtain, for each unit sample video, the audio feature vector corresponding to the unit sample video; and
a training module, configured to train a preset initial model by taking the audio feature vectors corresponding to at least two consecutive unit sample videos as input and the annotation information of the sample video as the training target, and to determine the trained model as a video processing model.
13. A video processing apparatus, characterized in that the apparatus includes:
a video acquisition module, configured to obtain a video to be processed;
a segment extraction module, configured to extract an opening segment and an ending segment from the video to be processed;
a second division module, configured to divide the opening segment and the ending segment each into multiple unit videos to be processed;
a second vector acquisition module, configured to obtain, for each unit video to be processed, the audio feature vector corresponding to the unit video to be processed;
a category determination module, configured to input the audio feature vectors corresponding to at least two consecutive unit videos to be processed, including the unit video to be processed, into a pre-generated video processing model, and determine according to the output of the video processing model whether the unit video to be processed belongs to the theme song category, wherein the video processing model is generated by the apparatus according to claim 12; and
a segment determination module, configured to splice the consecutive unit videos to be processed that belong to the theme song category, so as to obtain the opening theme segment and the ending theme segment in the video to be processed.
14. The apparatus according to claim 13, characterized in that the second vector acquisition module includes:
a calling unit, configured to call a preset first process and a preset second process simultaneously;
an opening acquisition unit, configured to obtain, for each unit video to be processed divided from the opening segment, the corresponding audio feature vector using the first process; and
an ending acquisition unit, configured to obtain, for each unit video to be processed divided from the ending segment, the corresponding audio feature vector using the second process.
15. The apparatus according to claim 13, characterized in that the segment determination module includes:
an opening theme determination unit, configured to splice the consecutive unit videos that belong to the theme song category among the unit videos divided from the opening segment, so as to obtain the opening theme segment in the video to be processed; and
an ending theme determination unit, configured to splice the consecutive unit videos that belong to the theme song category among the unit videos divided from the ending segment, so as to obtain the ending theme segment in the video to be processed.
16. The apparatus according to claim 13, characterized in that the apparatus further includes:
an annotation module, configured to mark the start time and end time of each unit video to be processed after the second division module divides the opening segment and the ending segment each into multiple unit videos to be processed; and
a time determination module, configured to take the start time of the first unit video in the opening theme segment as the start time of the opening theme segment, take the end time of the last unit video in the opening theme segment as the end time of the opening theme segment, take the start time of the first unit video in the ending theme segment as the start time of the ending theme segment, and take the end time of the last unit video in the ending theme segment as the end time of the ending theme segment.
17. An electronic device, characterized by including:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the model generating method according to any one of claims 1 to 4, and/or the video processing method according to any one of claims 5 to 11.
18. A non-transitory computer-readable storage medium, characterized in that, when the instructions in the storage medium are executed by the processor of an electronic device, the electronic device is enabled to execute the model generating method according to any one of claims 1 to 4, and/or the video processing method according to any one of claims 5 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910459442.5A CN110324657A (en) | 2019-05-29 | 2019-05-29 | Model generation, method for processing video frequency, device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910459442.5A CN110324657A (en) | 2019-05-29 | 2019-05-29 | Model generation, method for processing video frequency, device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110324657A true CN110324657A (en) | 2019-10-11 |
Family
ID=68119305
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910459442.5A Pending CN110324657A (en) | 2019-05-29 | 2019-05-29 | Model generation, method for processing video frequency, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110324657A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112182301A (en) * | 2020-09-30 | 2021-01-05 | 北京百度网讯科技有限公司 | Method and device for extracting video clip |
CN113569740A (en) * | 2021-07-27 | 2021-10-29 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Video recognition model training method and device and video recognition method and device |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102497594A (en) * | 2011-12-16 | 2012-06-13 | 乐视网信息技术(北京)股份有限公司 | Play method of serial video files |
CN103325403A (en) * | 2013-06-20 | 2013-09-25 | 富泰华工业(深圳)有限公司 | Electronic device and video playing method thereof |
CN105227999A (en) * | 2015-09-29 | 2016-01-06 | 北京奇艺世纪科技有限公司 | A kind of method and apparatus of video cutting |
US20180102136A1 (en) * | 2016-10-11 | 2018-04-12 | Cirrus Logic International Semiconductor Ltd. | Detection of acoustic impulse events in voice applications using a neural network |
CN108024142A (en) * | 2017-12-05 | 2018-05-11 | 深圳市茁壮网络股份有限公司 | A kind of video flow detection method and system |
CN108122562A (en) * | 2018-01-16 | 2018-06-05 | 四川大学 | A kind of audio frequency classification method based on convolutional neural networks and random forest |
CN108924586A (en) * | 2018-06-20 | 2018-11-30 | 北京奇艺世纪科技有限公司 | A kind of detection method of video frame, device and electronic equipment |
CN108989882A (en) * | 2018-08-03 | 2018-12-11 | 百度在线网络技术(北京)有限公司 | Method and apparatus for exporting the snatch of music in video |
CN109166593A (en) * | 2018-08-17 | 2019-01-08 | 腾讯音乐娱乐科技(深圳)有限公司 | audio data processing method, device and storage medium |
2019-05-29: CN application CN201910459442.5A filed; publication CN110324657A, status pending.
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102497594A (en) * | 2011-12-16 | 2012-06-13 | 乐视网信息技术(北京)股份有限公司 | Playback method for serial video files |
CN103325403A (en) * | 2013-06-20 | 2013-09-25 | 富泰华工业(深圳)有限公司 | Electronic device and video playing method thereof |
CN105227999A (en) * | 2015-09-29 | 2016-01-06 | 北京奇艺世纪科技有限公司 | Video segmentation method and apparatus |
US20180102136A1 (en) * | 2016-10-11 | 2018-04-12 | Cirrus Logic International Semiconductor Ltd. | Detection of acoustic impulse events in voice applications using a neural network |
CN108024142A (en) * | 2017-12-05 | 2018-05-11 | 深圳市茁壮网络股份有限公司 | Video stream detection method and system |
CN108122562A (en) * | 2018-01-16 | 2018-06-05 | 四川大学 | Audio classification method based on convolutional neural networks and random forests |
CN108924586A (en) * | 2018-06-20 | 2018-11-30 | 北京奇艺世纪科技有限公司 | Video frame detection method, apparatus and electronic device |
CN108989882A (en) * | 2018-08-03 | 2018-12-11 | 百度在线网络技术(北京)有限公司 | Method and apparatus for outputting music clips in a video |
CN109166593A (en) * | 2018-08-17 | 2019-01-08 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio data processing method, device and storage medium |
Non-Patent Citations (4)
Title |
---|
Ji Zhong et al., "A Survey of Story Unit Segmentation Techniques for News Video", Journal of Image and Graphics * |
Li Minghao, "Research on Continuous Speech Recognition Based on Deep Neural Networks", China Master's Theses Full-text Database, Information Science and Technology * |
Guo Chaoyuan, "Design and Implementation of an Audio Data Acquisition System", China Master's Theses Full-text Database, Information Science and Technology * |
Han Ning, "Research on Automatic Music Annotation Based on Deep Neural Networks", China Master's Theses Full-text Database, Information Science and Technology * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112182301A (en) * | 2020-09-30 | 2021-01-05 | 北京百度网讯科技有限公司 | Method and device for extracting video clip |
EP3836141A3 (en) * | 2020-09-30 | 2021-10-20 | Beijing Baidu Netcom Science And Technology Co. Ltd. | Method and apparatus for extracting video clip |
US11646050B2 (en) | 2020-09-30 | 2023-05-09 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for extracting video clip |
CN113569740A (en) * | 2021-07-27 | 2021-10-29 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Video recognition model training method and device and video recognition method and device |
CN113569740B (en) * | 2021-07-27 | 2023-11-21 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Video recognition model training method and device, and video recognition method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110213670A (en) | Video processing method and apparatus, electronic device, and storage medium | |
Schlüter | Learning to Pinpoint Singing Voice from Weakly Labeled Examples. | |
CN110324726A (en) | Model generation and video processing method and apparatus, electronic device, and storage medium | |
CN110148400A (en) | Pronunciation type recognition method, model training method, apparatus and device | |
Stein et al. | Automatic detection of audio effects in guitar and bass recordings | |
CN107086040A (en) | Speech recognition capability testing method and device | |
Krijnders et al. | Sound event recognition through expectancy-based evaluation of signal-driven hypotheses | |
CN110473525A (en) | Method and apparatus for obtaining voice training samples | |
CN104810025A (en) | Audio similarity detecting method and device | |
Khan et al. | A novel audio forensic data-set for digital multimedia forensics | |
CN109979485B (en) | Audio evaluation method and device | |
CN113257283B (en) | Audio signal processing method and device, electronic equipment and storage medium | |
Müller et al. | Interactive fundamental frequency estimation with applications to ethnomusicological research | |
CN110324657A (en) | Model generation, method for processing video frequency, device, electronic equipment and storage medium | |
CN111724770A (en) | Audio keyword identification method for generating confrontation network based on deep convolution | |
CN107885845B (en) | Audio classification method and device, computer equipment and storage medium | |
CN108877779A (en) | Method and apparatus for detecting voice tail point | |
CN104700831B (en) | The method and apparatus for analyzing the phonetic feature of audio file | |
Goldstein et al. | Guitar Music Transcription from Silent Video. | |
Felipe et al. | Acoustic scene classification using spectrograms | |
Pilia et al. | Time scaling detection and estimation in audio recordings | |
US9445210B1 (en) | Waveform display control of visual characteristics | |
CN116721675A (en) | Audio event detection method and device | |
KR101382356B1 (en) | Apparatus for forgery detection of audio file | |
Bhatia et al. | Analysis of audio features for music representation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 2019-10-11 |