CN108307229A - Method and device for processing audio-video data - Google Patents
Method and device for processing audio-video data
- Publication number
- CN108307229A CN108307229A CN201810107188.8A CN201810107188A CN108307229A CN 108307229 A CN108307229 A CN 108307229A CN 201810107188 A CN201810107188 A CN 201810107188A CN 108307229 A CN108307229 A CN 108307229A
- Authority
- CN
- China
- Prior art keywords
- video
- content
- audio
- subobject
- feature information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
Abstract
This application provides a method and device for processing audio-video data. The scheme first segments an audio-video data object into multiple sub-objects, then extracts video feature information about the video content and audio feature information about the audio content of each sub-object, and, according to the video feature information and audio feature information, determines a content tag for each sub-object. The content tags identify the specific content that each sub-object contributes to the audio-video data object, while the associations among content tags express the relationships among the parts, so the audio-video content in the audio-video data object can be used effectively, enabling unified scheduling and reuse of video and audio material.
Description
Technical field
This application relates to the field of information technology, and in particular to a method and device for processing audio-video data.
Background technology
With the development of smart devices and audio-video technology, audio-video data objects containing both audio content and video content, such as films and television series, are generated and disseminated ever faster. These objects, however, are generally stored independently, and a unified means and channel for identifying and using their content is lacking. Current technology mainly identifies video or audio through video/audio fingerprints matched against a corresponding video/audio library; it is difficult to determine the specific content contained in an audio-video data object and the relationships among its parts, so the audio-video content in the object cannot be used effectively.
Summary of the application
The purpose of this application is to provide a method and device for processing audio-video data, to solve the prior-art problem that it is difficult to determine the specific content contained in an audio-video data object and the relationships among its parts.
To achieve the above object, this application provides a method for processing audio-video data, the method comprising:
segmenting an audio-video data object into multiple sub-objects;
extracting video feature information about the video content of each sub-object and audio feature information about the audio content of each sub-object;
determining a content tag for each sub-object according to the video feature information and the audio feature information.
In another aspect, this application further provides a device for processing audio-video data, the device comprising:
a segmentation module, configured to segment an audio-video data object into multiple sub-objects;
a feature extraction module, configured to extract video feature information about the video content of each sub-object and audio feature information about the audio content of each sub-object;
a tag matching module, configured to determine a content tag for each sub-object according to the video feature information and the audio feature information.
In addition, this application provides a device for processing audio-video data, wherein the device comprises:
a processor; and
one or more machine-readable media storing machine-readable instructions which, when executed by the processor, cause the device to carry out the aforementioned method for processing audio-video data.
In the processing scheme for audio-video data provided by this application, an audio-video data object is first segmented into multiple sub-objects; video feature information about the video content and audio feature information about the audio content of each sub-object are then extracted; and, according to the video feature information and audio feature information, a content tag for each sub-object is determined. The content tags identify the specific content that each sub-object contributes to the audio-video data object, while the associations among content tags express the relationships among the parts, so the audio-video content in the audio-video data object can be used effectively, enabling unified scheduling and reuse of video and audio material.
Description of the drawings
Other features, objects and advantages of this application will become more apparent from the following detailed description of non-restrictive embodiments, read in conjunction with the accompanying drawings:
Fig. 1 shows a flowchart of a method for processing audio-video data provided by an embodiment of this application;
Fig. 2 shows a schematic diagram of the overall flow when an audio-video data object is processed with the method provided by an embodiment of this application;
Fig. 3 shows a structural diagram of a device for processing audio-video data provided by an embodiment of this application;
Fig. 4 shows a structural diagram of another device for processing audio-video data provided by an embodiment of this application.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description of embodiments
The application is described in further detail below in conjunction with the accompanying drawings.
In a typical configuration of this application, the terminals and the devices of the service network each include one or more processors (CPUs), an input/output interface, a network interface and memory.
The memory may include computer-readable media in volatile form, such as random access memory (RAM), and/or in non-volatile form, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
An embodiment of this application provides a method for processing audio-video data. The method can determine the specific content contained in each sub-object of an audio-video data object, so that the audio-video content in the object can be used effectively, enabling unified scheduling and reuse of audio-video material. The executing entity of the method may be a user device, a network device, a device formed by integrating a user device and a network device over a network, or an application program running on such devices. The user device includes, but is not limited to, terminal devices such as computers, mobile phones and tablet computers; the network device includes, but is not limited to, implementations such as a network host, a single network server, a set of multiple network servers, or a cloud-computing-based collection of computers. Here, the cloud consists of a large number of hosts or network servers based on cloud computing (Cloud Computing), where cloud computing is a kind of distributed computing: a virtual machine composed of a group of loosely coupled computers.
Fig. 1 shows a method for processing audio-video data provided by an embodiment of this application. The method comprises the following steps:
Step S101: segment an audio-video data object into multiple sub-objects. In the embodiments of this application, an audio-video data object is a file or data stream containing audio and video data; in concrete terms it may be a film, a television series, and so on. A sub-object is one part of the content of the audio-video data object; for example, a 120-minute film may be evenly divided by duration into multiple segments, each segment being one sub-object.
In some embodiments of this application, the audio-video data object may be segmented by means of spatio-temporal slice clustering: according to the video content of the audio-video data object, spatio-temporal slice clustering is performed on the object, and multiple sub-objects are determined based on the clustering result. A spatio-temporal slice is an image formed, in temporal order, from the strip of pixels at the same position in successive frames of a video image sequence. Since pictures with similar content have a certain visual similarity, segmenting the audio-video data object by spatio-temporal slice clustering ensures that the audio-video data within each segmented sub-object belongs to similar content.
For example, suppose the pictures in a piece of video comprise three parts: the first part shows two people talking in an indoor scene, the second part shows garden scenery in an outdoor scene, and the third part shows an explosion in an outdoor scene. Since these three parts differ greatly in appearance, spatio-temporal slice clustering can accurately divide the video into three parts; the video frames contained in each part form one cluster, and the corresponding video and audio form one sub-object.
In a real scene, since the actual conditions of each picture can be more complicated, clustering based on spatio-temporal slices may produce errors. For example, the first part, the pictures of the two people talking indoors, may change considerably as the people move, so that the first part is split into two clusters; or the pictures of the second and third parts may be merged into a single cluster. Therefore, when determining the sub-objects from the clustering result, the clusters may be adjusted dynamically according to the similarity between them. For example, by setting a dynamic threshold, the similarity threshold used during clustering can be adjusted dynamically, so that preliminary clusters are merged or further split, making the final clustering result more accurate.
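As a rough illustration of this segmentation idea, the following sketch extracts a one-column spatio-temporal strip from each frame and cuts the sequence where adjacent strips differ sharply. It is a deliberately minimal stand-in: the patent describes clustering of slices with a dynamically adjusted similarity threshold, whereas here a fixed difference threshold and invented function names are used.

```python
import numpy as np

def temporal_slice(frames, col=None):
    """Stack one pixel column from each frame into a spatio-temporal slice."""
    col = frames.shape[2] // 2 if col is None else col
    return frames[:, :, col]          # shape: (num_frames, height)

def segment_by_slice(frames, threshold=0.5):
    """Split a frame sequence into sub-objects where adjacent slice
    strips differ sharply (a coarse stand-in for slice clustering)."""
    sl = temporal_slice(frames).astype(float)
    boundaries = [0]
    for t in range(1, len(sl)):
        # normalized mean difference between consecutive strips
        diff = np.abs(sl[t] - sl[t - 1]).mean() / 255.0
        if diff > threshold:
            boundaries.append(t)
    boundaries.append(len(sl))
    return [(boundaries[i], boundaries[i + 1]) for i in range(len(boundaries) - 1)]

# synthetic clip: 10 dark frames (indoor scene) then 10 bright frames (outdoor)
frames = np.concatenate([np.zeros((10, 4, 4), np.uint8),
                         np.full((10, 4, 4), 200, np.uint8)])
print(segment_by_slice(frames))   # → [(0, 10), (10, 20)]
```

A production version would cluster the strips rather than threshold pairwise differences, which is what allows the merge/re-split adjustment described above.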
Step S102: extract video feature information about the video content of each sub-object and audio feature information about the audio content of each sub-object.
When processing the part concerning video, the processing is based on the video content of each sub-object. For example, after a film has been divided into multiple segments, feature extraction is performed on the video content of each segment to obtain its feature information. In some embodiments of this application, key frames may first be extracted from the video content of a sub-object; the key frames are then processed to obtain their video feature information, which serves as the video feature information about the video content of that sub-object.
Here, a key frame is a frame at a key action in the motion or change of the image, and can reflect the content actually expressed by the video image sequence. For example, for video content about an explosion, the key frames may be the frame showing the cause of the explosion (such as the moment of an impact), the frame when the explosion flame appears, the frame when the flame is at its largest, and the frame when the flame disappears. Since key frames already reflect the actual meaning of the video content well, using the video feature information of the key frames as the video feature information about the video content of the sub-object reduces the processing workload and increases processing speed.
The video feature information may be image features such as texture, color, shape or spatial relationships. In a real scene, one or more image features suited to the current scene may be selected as the video feature information according to the needs of the scene, to improve the accuracy of processing. The extracted video feature information may be recorded in the form of a set of multi-dimensional vectors.
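To make the keyframe-plus-feature-vector idea concrete, the sketch below picks, as a crude proxy for a key frame, the frame farthest from the segment's mean image, and records a normalized color histogram as the multi-dimensional feature vector. Both heuristics and all names are illustrative assumptions, not the patent's actual algorithm.

```python
import numpy as np

def pick_keyframe(frames):
    """Pick the frame farthest from the segment's mean image,
    a crude proxy for the frame at the 'key action'."""
    mean = frames.mean(axis=0)
    dists = [np.abs(f - mean).sum() for f in frames.astype(float)]
    return int(np.argmax(dists))

def color_histogram(frame, bins=8):
    """Normalized color histogram: one multi-dimensional feature vector."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

# synthetic segment: 9 black frames and 1 white "flash" frame
frames = np.concatenate([np.zeros((9, 4, 4), np.uint8),
                         np.full((1, 4, 4), 255, np.uint8)])
k = pick_keyframe(frames)
feat = color_histogram(frames[k])
print(k, feat.shape)   # → 9 (8,)
```

In practice richer descriptors (texture, shape, spatial relationships, as listed above) would be concatenated into the vector set.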
When processing the part concerning audio, the processing is based on the audio content of each sub-object. For example, after a film has been divided into multiple segments, feature extraction is performed on the audio content of each segment to obtain its feature information. For a typical audio-video data object, the audio content comprises multiple types, for example human voices, sound effects, ambient sound and background music. Taking the video content of two people talking in an indoor scene as an example, the corresponding audio content may include the voices of the two people, their footsteps as they walk about, the sound of a vehicle passing outside the room, and background music; these audio contents correspond to different waveforms in different wavebands. Thus, in some embodiments of this application, when extracting audio features, waveform recognition may be performed in different wavebands to extract different types of audio sets from the audio content of the sub-object; these audio sets may be voice/sound-effect sets, ambient-sound sets, background-music sets, and so on. From each of these audio sets, audio feature information can be extracted and used as the audio feature information about the audio content of the sub-object. The extracted audio feature information may likewise be recorded in the form of a set of multi-dimensional vectors.
In a real scene, when processing the audio content of a sub-object, the audio content may first be separated from the sub-object. In addition, to improve the accuracy of audio feature extraction, noise reduction may be applied to the audio content of the sub-object before waveform recognition is performed in the different wavebands.
Step S103: determine the content tag of each sub-object according to the video feature information and the audio feature information. A content tag is information representing what the sub-object actually contains; it can describe the video content from various angles according to the needs of the user, for example the content shown, the scene it takes place in, or the corresponding emotion.
In some embodiments of this application, the recognition of content tags may be accomplished by means of deep learning. Before the audio-video data is processed, a deep learning model may be built and trained with audio content and video content whose content tags have already been annotated, as the training set, so that the model can recognize the content tags of sub-objects. For example, if the scheme provided by this embodiment is required to recognize whether a segment of a film is about an explosion, all kinds of videos and audios about explosions can be provided as a training set; the training set contains the video feature information of these videos and the audio feature information of these audios, annotated with the content tag "explosion". Provided there are enough training samples, the deep learning model can then examine input video feature information or audio feature information that has not been annotated, determine whether its content tag should be "explosion", and thereby determine the content of the film segment.
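A minimal sketch of this tag-recognition step, assuming fused video+audio feature vectors and a binary "explosion" label: plain logistic regression stands in for the deep learning model, and the synthetic features are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy training set: concatenated [video_feat | audio_feat] vectors,
# label 1 = "explosion", 0 = anything else (stand-ins for real features)
X = np.vstack([rng.normal(1.0, 0.3, (50, 8)),    # explosion-like samples
               rng.normal(-1.0, 0.3, (50, 8))])  # other samples
y = np.array([1] * 50 + [0] * 50)

w, b = np.zeros(8), 0.0
for _ in range(200):                     # plain logistic regression,
    p = 1 / (1 + np.exp(-(X @ w + b)))   # standing in for a deep model
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * (p - y).mean()

def content_tag(feat):
    """Return 'explosion' if the model scores the fused feature high."""
    return "explosion" if 1 / (1 + np.exp(-(feat @ w + b))) > 0.5 else "other"

print(content_tag(np.full(8, 1.0)), content_tag(np.full(8, -1.0)))
# → explosion other
```

The point is the workflow, annotated features in, tag out, not the model class; the patent leaves the network architecture open.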
In another embodiment of this application, after the content tags of the sub-objects have been determined, the sub-objects of the audio-video data object may be classified according to their content tags to generate classified object sets. For example, for a film, all segments about explosions can be grouped into a set of explosion segments, and all segments in which characters fight can likewise be grouped into a separate set.
In a real scene, the classification of sub-objects may be based on external input or on preset classification conditions. For example, a keyword entered by the user can be obtained, matching content tags can be selected according to the keyword, and a suitable content set can then be obtained. Taking a film as an example, if a trailer of the film is to be generated, the film can first be divided into multiple segments with the scheme provided by this embodiment, and the content tag corresponding to each segment can then be generated. The user can enter keywords according to actual needs to select the segments required for generating the trailer. For example, a user who wants to generate a trailer in a tender style can select the segments whose content tags match that style as the material for generating the trailer, forming a set of segments. Similarly, if the user wants to generate a trailer with more fighting, the segments with the corresponding content tags can be selected.
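The keyword-to-content-tag matching described above might look like the following sketch; the tag vocabulary and the segment table are invented examples.

```python
# hypothetical tag table: segment index -> content tags (names invented)
segment_tags = {
    0: {"dialogue", "indoor"},
    1: {"garden", "scenery", "tender"},
    2: {"explosion", "outdoor"},
    3: {"fight", "outdoor"},
    4: {"tender", "dialogue"},
}

def select_segments(keywords):
    """Return the segments whose content tags match any user keyword."""
    return sorted(i for i, tags in segment_tags.items() if tags & set(keywords))

print(select_segments(["tender"]))              # → [1, 4]
print(select_segments(["fight", "explosion"]))  # → [2, 3]
```

The selected indices form the set of segments that would serve as trailer material.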
Tags may also be set separately for the audio content and the video content, that is, divided into video content tags and audio content tags; the two correspond one to one and are associated with the sub-objects obtained by segmenting the audio-video data object. When classifying by content tags, one can therefore classify by audio alone, by video alone, or by audio and video combined, to generate the sets the user needs: according to the video content tags and/or audio content tags of the sub-objects, the video content and/or audio content of the sub-objects in the audio-video data object is classified to obtain video content sets and/or audio content sets.
Fig. 2 shows a schematic diagram of the overall flow when an audio-video data object is processed with the method provided by an embodiment of this application. The overall flow includes the following processing steps:
S201: segment the object based on the video content, dividing it into multiple sub-objects.
S202: perform video feature extraction on the segmented video content to obtain video feature information.
S203: at the same time, separate the audio from the video to obtain the audio content corresponding to the segmented video.
S204: apply noise reduction to the audio content to eliminate noise.
S205: recognize waveforms in different wavebands and separate out different types of audio, for example voices and sound effects.
S206: perform audio feature extraction on the different types of audio to obtain audio feature information.
S207: feed the video feature information and the audio feature information into the deep learning model for processing.
S208: recognize the content tags according to the result of the deep learning model, and classify the sub-objects into multiple video content sets and audio content sets.
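The S201–S208 flow can be sketched end to end as a pipeline of stubs, one per step; every function body here is a placeholder (all names invented) standing in for the processing described above.

```python
def split_video(obj):                       # S201: segment by video content
    return [f"{obj}-clip{i}" for i in range(3)]

def video_features(clip):                   # S202: video feature extraction
    return {"video": clip}

def separate_audio(clip):                   # S203: audio/video separation
    return f"audio-of-{clip}"

def denoise(audio):                         # S204: noise reduction
    return audio

def split_bands(audio):                     # S205: per-waveband separation
    return {"voice": audio, "music": audio}

def audio_features(bands):                  # S206: audio feature extraction
    return {"audio": sorted(bands)}

def classify(vf, af):                       # S207/S208: model + tagging
    return "explosion" if vf["video"].endswith("clip0") else "other"

def process(obj):
    """Run one audio-video object through the whole S201-S208 pipeline."""
    tags = {}
    for clip in split_video(obj):
        vf = video_features(clip)
        af = audio_features(split_bands(denoise(separate_audio(clip))))
        tags[clip] = classify(vf, af)
    return tags

print(process("movie"))
# → {'movie-clip0': 'explosion', 'movie-clip1': 'other', 'movie-clip2': 'other'}
```

The structure mirrors Fig. 2: video features are computed directly from the segmented clips, while audio passes through separation, denoising and band splitting before feature extraction, and both feature streams meet at the classifier.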
Based on the same inventive concept, an embodiment of this application further provides a device for processing audio-video data. The method corresponding to the device is the method of the foregoing embodiment, and its principle for solving the problem is similar to that method.
An embodiment of this application provides a device for processing audio-video data. The device can determine the specific content contained in each sub-object of an audio-video data object, so that the audio-video content in the object can be used effectively, enabling unified scheduling and reuse of audio-video material. The specific implementation of the device may be a user device, a network device, a device formed by integrating a user device and a network device over a network, or an application program running on such devices. The user device includes, but is not limited to, terminal devices such as computers, mobile phones and tablet computers; the network device includes, but is not limited to, implementations such as a network host, a single network server, a set of multiple network servers, or a cloud-computing-based collection of computers. Here, the cloud consists of a large number of hosts or network servers based on cloud computing (Cloud Computing), where cloud computing is a kind of distributed computing: a virtual machine composed of a group of loosely coupled computers.
Fig. 3 shows a device for processing audio-video data provided by an embodiment of this application. The device comprises a segmentation module 310, a feature extraction module 320 and a tag matching module 330. The segmentation module 310 is configured to segment an audio-video data object into multiple sub-objects. In the embodiments of this application, an audio-video data object is a file or data stream containing audio and video data; in concrete terms it may be a film, a television series, and so on. A sub-object is one part of the content of the audio-video data object; for example, a 120-minute film may be evenly divided by duration into multiple segments, each segment being one sub-object.
In some embodiments of this application, the segmentation module 310 may segment the audio-video data object by means of spatio-temporal slice (spatio-temporal slice) clustering: according to the video content of the audio-video data object, spatio-temporal slice clustering is performed on the object, and multiple sub-objects are determined based on the clustering result. A spatio-temporal slice is an image formed, in temporal order, from the strip of pixels at the same position in successive frames of a video image sequence. Since pictures with similar content have a certain visual similarity, segmenting the audio-video data object by spatio-temporal slice clustering ensures that the audio-video data within each segmented sub-object belongs to similar content.
For example, suppose the pictures in a piece of video comprise three parts: the first part shows two people talking in an indoor scene, the second part shows garden scenery in an outdoor scene, and the third part shows an explosion in an outdoor scene. Since these three parts differ greatly in appearance, spatio-temporal slice clustering can accurately divide the video into three parts; the video frames contained in each part form one cluster, and the corresponding video and audio form one sub-object.
In a real scene, since the actual conditions of each picture can be more complicated, clustering based on spatio-temporal slices may produce errors. For example, the first part, the pictures of the two people talking indoors, may change considerably as the people move, so that the first part is split into two clusters; or the pictures of the second and third parts may be merged into a single cluster. Therefore, when determining the sub-objects from the clustering result, the clusters may be adjusted dynamically according to the similarity between them. For example, by setting a dynamic threshold, the similarity threshold used during clustering can be adjusted dynamically, so that preliminary clusters are merged or further split, making the final clustering result more accurate.
The feature extraction module 320 is configured to extract video feature information about the video content of each sub-object and audio feature information about the audio content of each sub-object. Since the processing of both video and audio is involved, the feature extraction module may comprise a video feature extraction submodule and an audio feature extraction submodule.
When processing the part concerning video, the processing is based on the video content of each sub-object. For example, after a film has been divided into multiple segments, feature extraction is performed on the video content of each segment to obtain its feature information. In some embodiments of this application, key frames may first be extracted from the video content of a sub-object; the key frames are then processed to obtain their video feature information, which serves as the video feature information about the video content of that sub-object.
Here, a key frame is a frame at a key action or change in the motion of the image, capable of reflecting the content actually expressed by the video image sequence. For example, for a piece of video content about an explosion, the key frames may be the frame that triggers the explosion (such as the frame of an impact), the frame where the explosive flame appears, the frame where the explosive flame is at its largest, the frame where the explosive flame disappears, and so on. Since key frames already reflect the physical meaning of the video content well, using the video feature information of the key frames as the video feature information about the video content in the sub-object reduces the processing workload and improves processing speed.
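A minimal sketch of one common key-frame selection strategy: keep frames whose difference from the previous key frame exceeds a threshold. The frame representation (flat grayscale pixel lists) and the threshold value are illustrative assumptions, not the patent's method:

```python
def frame_diff(a, b):
    """Mean absolute pixel difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def extract_key_frames(frames, threshold=30.0):
    """Return indices of frames that differ strongly from the last key frame."""
    if not frames:
        return []
    keys = [0]                      # the first frame is always a key frame
    for i in range(1, len(frames)):
        if frame_diff(frames[i], frames[keys[-1]]) > threshold:
            keys.append(i)          # large change: a new key frame begins
    return keys
```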
The video feature information may be image features such as texture, color, shape, or spatial relationships. In a practical scenario, one or more image features suited to the current scene may be selected as the video feature information according to the needs of the scene, to improve processing accuracy. The extracted video feature information may be recorded in the form of a multi-dimensional vector set.
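For example, a grayscale histogram is one simple image feature that can be recorded as a fixed-length vector; the bin count here is an arbitrary illustrative choice, not specified by the patent:

```python
def gray_histogram(pixels, bins=8, max_value=256):
    """Normalized histogram of grayscale pixel values in [0, max_value)."""
    hist = [0] * bins
    for p in pixels:
        # Map each pixel value to its bin; clamp the top edge into the last bin.
        hist[min(p * bins // max_value, bins - 1)] += 1
    total = len(pixels)
    return [count / total for count in hist]
```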
When processing the audio part, the processing is based on the audio content in each sub-object. For example, after a film is divided into multiple segments, feature extraction is performed on the audio content in each segment to obtain its feature information. For a typical video-audio data object, the audio content includes multiple types, such as character voices, sound effects, ambient sound, and background music. Taking the video content of two people talking in an indoor scene as an example, the corresponding audio content may include the voices of the two characters, their footsteps as they walk around, the sound of a vehicle passing outside the room, background music, and so on; these audio contents correspond to different waveforms in different bands. Therefore, in some embodiments of the present application, when extracting audio features, waveform recognition may be carried out in different bands to extract different types of audio sets from the audio content of the sub-object, such as a voice/sound-effect set, an ambient sound set, or a background music set. For each of these audio sets, the audio feature information therein may be extracted separately, serving as the audio feature information about the audio content in the sub-object. The extracted audio feature information may be recorded in the form of a multi-dimensional vector set.
In a practical scenario, the device provided by the embodiments of the present application may further include a noise reduction module, an audio/video separation module, and the like. When the audio content in a sub-object is processed, the audio/video separation module may first separate the audio content from the sub-object. Meanwhile, to improve the accuracy of audio feature extraction, the noise reduction module may perform noise reduction on the audio content of the sub-object before waveform recognition is carried out in different bands.
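As a placeholder for the noise-reduction step, a minimal noise gate zeroes samples below an amplitude threshold. Real noise reduction (e.g. spectral subtraction) is far more involved; the function name and threshold are arbitrary assumptions for illustration:

```python
def noise_gate(samples, threshold=0.05):
    """Zero out samples whose amplitude falls below the noise threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]
```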
The category matching module 330 determines the content label of each sub-object according to the video feature information and the audio feature information. The content label is information indicating what the video content contained in a sub-object actually is; it may describe the video content from various angles according to the user's needs, such as the content included, the scene in which it takes place, or the corresponding emotion.
In some embodiments of the present application, the category matching module 330 may use deep learning to identify content labels. Before the video-audio data is processed, a deep learning model may be built and trained with audio content and video content annotated with content labels as the training set, so that the model can identify the content labels of sub-objects. For example, if the scheme provided by the embodiments of the present application is required to identify whether a segment of a certain film contains content about an explosion, various videos and audios about explosions may be provided as the training set; the training set includes the video feature information of these videos and the audio feature information of these audios, and their content labels are annotated as "explosion". Given enough training samples, the deep learning model can then identify input video feature information or audio feature information without content labels, determine whether its content label should be "explosion", and thereby determine the content corresponding to the film segment.
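The training and labelling flow can be sketched with a nearest-centroid classifier standing in for the deep learning model; the feature vectors and the "explosion"/"dialogue" labels are invented for illustration only:

```python
def train(samples):
    """samples: list of (feature_vector, label) pairs -> per-label centroids."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x                        # accumulate per-label sums
        counts[label] = counts.get(label, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}

def predict(centroids, vec):
    """Assign the label whose centroid is closest to the unlabelled vector."""
    def dist(lab):
        return sum((a - b) ** 2 for a, b in zip(centroids[lab], vec))
    return min(centroids, key=dist)
```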
In another embodiment of the present application, after determining the content labels of the sub-objects, the category matching module 330 may classify the sub-objects in the video-audio data object according to their content labels to generate classified object sets. For example, for a film, all segments about explosions may be grouped into an explosion-segment set, and all segments about character fights may likewise be grouped into a separate set.
In a practical scenario, when sub-objects are classified, the classification may be based on external input or preset classification conditions. For example, a keyword entered by the user may be obtained, matching content labels may be selected according to the keyword, and a suitable content set may then be obtained. Taking a film as an example, if a trailer of the film is to be generated, the film may be divided into multiple segments using the scheme provided by the embodiments of the present application, and the content label corresponding to each segment may then be generated. The user may enter corresponding keywords according to actual needs to select the segments required for generating the trailer. For example, if the user needs to generate a trailer with a warm, sentimental style, the segments whose content labels match that style may be selected as the material for generating the trailer, forming a segment set. Similarly, if the user needs to generate a trailer with more fighting content, segments with the corresponding content labels may be selected.
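Selecting trailer material by keyword can be sketched as matching the user's keywords against the per-segment content labels; the segment identifiers and labels below are invented for illustration:

```python
def select_segments(labelled_segments, keywords):
    """labelled_segments: {segment_id: set of content labels}.
    Return the ids of segments whose labels match any requested keyword."""
    wanted = set(keywords)
    return [seg for seg, labels in labelled_segments.items() if labels & wanted]
```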
For the audio content and the video content, labels may also be set separately, that is, divided into video content labels and audio content labels, which correspond one-to-one and are associated with the sub-objects obtained by segmenting the video-audio data object. Thus, when classifying based on content labels, the classification may be performed according to the audio alone, the video alone, or both audio and video combined, to generate the sets the user needs. That is, according to the video content labels and/or audio content labels of the sub-objects, the video content and/or audio content of the sub-objects in the video-audio data object may be classified to obtain video content sets and/or audio content sets.
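Grouping sub-objects by their video or audio content labels into content sets can be sketched as follows; the data shapes and field names are assumptions, not from the patent:

```python
def build_sets(subobjects, by="video"):
    """subobjects: list of dicts with 'id', 'video_label', 'audio_label'.
    Group sub-object ids into sets keyed by the chosen label type."""
    key = "video_label" if by == "video" else "audio_label"
    sets = {}
    for sub in subobjects:
        sets.setdefault(sub[key], []).append(sub["id"])
    return sets
```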
In summary, in the processing scheme for video-audio data provided by the present application, the video-audio data object is first segmented into multiple sub-objects; then the video feature information about the video content in each sub-object and the audio feature information about the audio content in each sub-object are extracted; and the content label of each sub-object is determined according to the video feature information and the audio feature information. Through the content labels, the specific content contained in each sub-object of the video-audio data object can be determined, while the associations between content labels can indicate the relationships between the various parts of the content. The audio-video content in the video-audio data object can thereby be used effectively, realizing unified scheduling and use of video and audio material.
In addition, part of the present application may be implemented as a computer program product, such as computer program instructions which, when executed by a computer, can invoke or provide the methods and/or technical solutions according to the present application through the operation of the computer. The program instructions that invoke the methods of the present application may be stored in a fixed or removable recording medium, and/or transmitted via broadcast or a data stream in other signal-bearing media, and/or stored in the working memory of a computer device running according to the program instructions. Here, an embodiment of the present application includes a device as shown in Fig. 4, which includes one or more machine-readable media 410 storing machine-readable instructions and a processor 420 for executing the machine-readable instructions, wherein, when the machine-readable instructions are executed by the processor, the device is caused to execute the methods and/or technical solutions based on the aforementioned embodiments of the present application.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware; for example, it may be realized using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present application may be executed by a processor to realize the steps or functions described above. Likewise, the software program of the present application (including related data structures) may be stored in a computer-readable recording medium, such as RAM, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some steps or functions of the present application may be realized in hardware, for example, as a circuit that cooperates with a processor to execute each step or function.
It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that the present application can be realized in other specific forms without departing from the spirit or essential characteristics of the application. Therefore, from any point of view, the embodiments should be regarded as exemplary and non-restrictive, and the scope of the present application is defined by the appended claims rather than the above description; it is therefore intended that all changes falling within the meaning and scope of equivalents of the claims be included in the present application. No reference sign in the claims should be construed as limiting the claim involved. Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in a device claim may also be realized by one unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Claims (21)
1. A processing method for video-audio data, wherein the method comprises:
segmenting a video-audio data object into multiple sub-objects;
extracting video feature information about video content in the sub-objects and audio feature information about audio content in the sub-objects;
determining the content label of each sub-object according to the video feature information and the audio feature information.
2. The method according to claim 1, wherein segmenting the video-audio data object into multiple sub-objects comprises:
performing spatio-temporal slice clustering on the video-audio data object according to the video content in the video-audio data object;
determining multiple sub-objects based on the clustering results.
3. The method according to claim 2, wherein determining multiple sub-objects based on the clustering results comprises:
dynamically adjusting the clustering results according to the similarity between the clustering results, to determine multiple sub-objects.
4. The method according to claim 1, wherein extracting the video feature information about the video content in the sub-objects comprises:
extracting key frames from the video content of the sub-objects;
obtaining the video feature information of the key frames as the video feature information about the video content in the sub-objects.
5. The method according to claim 1, wherein extracting the audio feature information about the audio content in the sub-objects comprises:
performing waveform recognition in different bands to extract different types of audio sets from the audio content of the sub-objects;
extracting the audio feature information in the audio sets separately, as the audio feature information about the audio content in the sub-objects.
6. The method according to claim 5, wherein before performing waveform recognition in different bands to extract different types of audio sets from the audio content of the sub-objects, the method further comprises:
performing noise reduction on the audio content of the sub-objects.
7. The method according to claim 1, wherein before extracting the audio feature information about the audio content in the sub-objects, the method further comprises:
separating the audio content from the sub-objects.
8. The method according to claim 1, wherein determining the content label of each sub-object according to the video feature information and the audio feature information comprises:
inputting the video feature information and the audio feature information into a deep learning model to obtain the content label of each sub-object, wherein the deep learning model is trained based on audio content and video content annotated with content labels.
9. The method according to claim 1, wherein the method further comprises:
classifying the sub-objects in the video-audio data object according to the content labels of the sub-objects, to generate classified object sets.
10. The method according to claim 9, wherein the content labels include video content labels and audio content labels;
classifying the sub-objects in the video-audio data object according to the content labels of the sub-objects to obtain classified object sets comprises:
classifying the video content and/or audio content of the sub-objects in the video-audio data object according to the video content labels and/or audio content labels of the sub-objects, to obtain video content sets and/or audio content sets.
11. A processing device for video-audio data, wherein the device comprises:
a segmentation module, for segmenting a video-audio data object into multiple sub-objects;
a feature extraction module, for extracting video feature information about video content in the sub-objects and audio feature information about audio content in the sub-objects;
a category matching module, for determining the content label of each sub-object according to the video feature information and the audio feature information.
12. The device according to claim 11, wherein the segmentation module is used to perform spatio-temporal slice clustering on the video-audio data object according to the video content in the video-audio data object, and to determine multiple sub-objects based on the clustering results.
13. The device according to claim 12, wherein the segmentation module is used to dynamically adjust the clustering results according to the similarity between the clustering results, to determine multiple sub-objects.
14. The device according to claim 11, wherein the feature extraction module is used to extract key frames from the video content of the sub-objects, and to obtain the video feature information of the key frames as the video feature information about the video content in the sub-objects.
15. The device according to claim 11, wherein the feature extraction module is used to perform waveform recognition in different bands to extract different types of audio sets from the audio content of the sub-objects, and to extract the audio feature information in the audio sets separately as the audio feature information about the audio content in the sub-objects.
16. The device according to claim 15, wherein the device further comprises:
a noise reduction module, for performing noise reduction on the audio content of the sub-objects before waveform recognition is performed in different bands to extract different types of audio sets from the audio content of the sub-objects.
17. The device according to claim 11, wherein the device further comprises:
an audio/video separation module, for separating the audio content from the sub-objects.
18. The device according to claim 11, wherein determining the content label of each sub-object according to the video feature information and the audio feature information comprises:
inputting the video feature information and the audio feature information into a deep learning model to obtain the content label of each sub-object, wherein the deep learning model is trained based on audio content and video content annotated with content labels.
19. The device according to claim 11, wherein the category matching module is further used to classify the sub-objects in the video-audio data object according to the content labels of the sub-objects, to generate classified object sets.
20. The device according to claim 19, wherein the content labels include video content labels and audio content labels;
the category matching module is used to classify the video content and/or audio content of the sub-objects in the video-audio data object according to the video content labels and/or audio content labels of the sub-objects, to obtain video content sets and/or audio content sets.
21. A processing device for video-audio data, wherein the device comprises:
a processor; and
one or more machine-readable media storing machine-readable instructions which, when executed by the processor, cause the device to execute the method according to any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810107188.8A CN108307229B (en) | 2018-02-02 | 2018-02-02 | Video and audio data processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108307229A true CN108307229A (en) | 2018-07-20 |
CN108307229B CN108307229B (en) | 2023-12-22 |
Family
ID=62850942
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810107188.8A Active CN108307229B (en) | 2018-02-02 | 2018-02-02 | Video and audio data processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108307229B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109101920A (en) * | 2018-08-07 | 2018-12-28 | 石家庄铁道大学 | Video time domain unit partioning method |
CN109120996A (en) * | 2018-08-31 | 2019-01-01 | 深圳市万普拉斯科技有限公司 | Video information recognition methods, storage medium and computer equipment |
CN109257622A (en) * | 2018-11-01 | 2019-01-22 | 广州市百果园信息技术有限公司 | A kind of audio/video processing method, device, equipment and medium |
CN109587568A (en) * | 2018-11-01 | 2019-04-05 | 北京奇艺世纪科技有限公司 | Video broadcasting method, device, computer readable storage medium |
CN110213670A (en) * | 2019-05-31 | 2019-09-06 | 北京奇艺世纪科技有限公司 | Method for processing video frequency, device, electronic equipment and storage medium |
CN110234038A (en) * | 2019-05-13 | 2019-09-13 | 特斯联(北京)科技有限公司 | A kind of user management method based on distributed storage |
CN110324726A (en) * | 2019-05-29 | 2019-10-11 | 北京奇艺世纪科技有限公司 | Model generation, method for processing video frequency, device, electronic equipment and storage medium |
CN110677716A (en) * | 2019-08-20 | 2020-01-10 | 咪咕音乐有限公司 | Audio processing method, electronic device, and storage medium |
CN110930997A (en) * | 2019-12-10 | 2020-03-27 | 四川长虹电器股份有限公司 | Method for labeling audio by using deep learning model |
CN111008287A (en) * | 2019-12-19 | 2020-04-14 | Oppo(重庆)智能科技有限公司 | Audio and video processing method and device, server and storage medium |
CN111770375A (en) * | 2020-06-05 | 2020-10-13 | 百度在线网络技术(北京)有限公司 | Video processing method and device, electronic equipment and storage medium |
CN112487248A (en) * | 2020-12-01 | 2021-03-12 | 深圳市易平方网络科技有限公司 | Video file label generation method and device, intelligent terminal and storage medium |
CN113095231A (en) * | 2021-04-14 | 2021-07-09 | 上海西井信息科技有限公司 | Video identification method, system, device and storage medium based on classified object |
CN113163272A (en) * | 2020-01-07 | 2021-07-23 | 海信集团有限公司 | Video editing method, computer device and storage medium |
Citations (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20040041127A (en) * | 2004-04-23 | 2004-05-14 | 학교법인 한국정보통신학원 | An intelligent agent system for providing viewer-customized video skims in digital TV broadcasting |
US6829781B1 (en) * | 2000-05-24 | 2004-12-07 | At&T Corp. | Network-based service to provide on-demand video summaries of television programs |
CN1938714A (en) * | 2004-03-23 | 2007-03-28 | 英国电讯有限公司 | Method and system for semantically segmenting scenes of a video sequence |
CN100538698C (en) * | 2004-01-14 | 2009-09-09 | 三菱电机株式会社 | Summary transcriber and summary reproducting method |
JP2010039877A (en) * | 2008-08-07 | 2010-02-18 | Nippon Telegr & Teleph Corp <Ntt> | Apparatus and program for generating digest content |
US20100104261A1 (en) * | 2008-10-24 | 2010-04-29 | Zhu Liu | Brief and high-interest video summary generation |
US20120201519A1 (en) * | 2011-02-03 | 2012-08-09 | Jennifer Reynolds | Generating montages of video segments responsive to viewing preferences associated with a video terminal |
US20120281969A1 (en) * | 2011-05-03 | 2012-11-08 | Wei Jiang | Video summarization using audio and visual cues |
CN103299324A (en) * | 2010-11-11 | 2013-09-11 | 谷歌公司 | Learning tags for video annotation using latent subtags |
US20140082663A1 (en) * | 2009-05-29 | 2014-03-20 | Cognitive Media Networks, Inc. | Methods for Identifying Video Segments and Displaying Contextually Targeted Content on a Connected Television |
CN103854014A (en) * | 2014-02-25 | 2014-06-11 | 中国科学院自动化研究所 | Terror video identification method and device based on sparse representation of context |
US20140270699A1 (en) * | 2013-03-14 | 2014-09-18 | Centurylink Intellectual Property Llc | Auto-Summarizing Video Content System and Method |
US9002175B1 (en) * | 2013-03-13 | 2015-04-07 | Google Inc. | Automated video trailer creation |
US20150134673A1 (en) * | 2013-10-03 | 2015-05-14 | Minute Spoteam Ltd. | System and method for creating synopsis for multimedia content |
CN105279495A (en) * | 2015-10-23 | 2016-01-27 | 天津大学 | Video description method based on deep learning and text summarization |
CN105611413A (en) * | 2015-12-24 | 2016-05-25 | 小米科技有限责任公司 | Method and device for adding video clip class markers |
US9635337B1 (en) * | 2015-03-27 | 2017-04-25 | Amazon Technologies, Inc. | Dynamically generated media trailers |
CN106649713A (en) * | 2016-12-21 | 2017-05-10 | 中山大学 | Movie visualization processing method and system based on content |
CN106779073A (en) * | 2016-12-27 | 2017-05-31 | 西安石油大学 | Media information sorting technique and device based on deep neural network |
CN106878632A (en) * | 2017-02-28 | 2017-06-20 | 北京知慧教育科技有限公司 | A kind for the treatment of method and apparatus of video data |
CN107077595A (en) * | 2014-09-08 | 2017-08-18 | 谷歌公司 | Selection and presentation representative frame are for video preview |
CN107436921A (en) * | 2017-07-03 | 2017-12-05 | 李洪海 | Video data handling procedure, device, equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
B.L. TSENG et al.: "Personalized video summary using visual semantic annotations and automatic speech transcriptions", IEEE *
LAN YIJIE: "Research on emotion-based video summarization", China Excellent Master's Theses Electronic Journal *
XIE YUXIANG, LUAN XIDAO, WU LINGDA, LAO SONGYANG: "NVPS: a multimodal news video processing system", Journal of the China Society for Scientific and Technical Information, no. 04 *
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||