CN109992679A - Multimedia data classification method and apparatus - Google Patents

Multimedia data classification method and apparatus Download PDF

Info

Publication number
CN109992679A
CN109992679A
Authority
CN
China
Prior art keywords
medium data
multiframe
data
characteristic information
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910218914.8A
Other languages
Chinese (zh)
Inventor
唐永毅 (Tang Yongyi)
马林 (Ma Lin)
刘威 (Liu Wei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910218914.8A priority Critical patent/CN109992679A/en
Publication of CN109992679A publication Critical patent/CN109992679A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a multimedia data classification method and apparatus, applied in the field of information processing technology. In the method of this embodiment, the multimedia data classification apparatus divides multiple frames of multimedia data to be processed into several groups in chronological order, and extracts the combined feature information corresponding to each group of multimedia data; finally, according to the combined feature information corresponding to each group, it determines the overall feature information of the multi-frame multimedia data, so as to classify the multi-frame multimedia data. In this way, when building a feature description of the multi-frame multimedia data, the classification apparatus takes into account the temporal characteristics across the frames, so that the finally obtained overall feature information better reflects the multi-frame multimedia data and the classification of the multi-frame multimedia data is more accurate.

Description

Multimedia data classification method and apparatus
Technical field
The present invention relates to the field of information processing technology, and in particular to a classification method and apparatus for multimedia data.
Background technique
When classifying a video to be processed, existing approaches usually first extract feature information from the video, and then determine the probability that the video belongs to each category according to the extracted feature information and a video classification model.
Under normal circumstances, the extracted feature information of the video to be processed is described by feature vectors, which may specifically include: Histogram of Gradient, Histogram of Optical Flow, Bag of Visual Words, Fisher Vector, Vector of Locally Aggregated Descriptors (VLAD), and Vector of Network Locally Aggregated Descriptors (NetVLAD). Different feature-vector description methods over different features yield different classification performance for video or image classification models.
The feature information currently extracted from the video to be processed mainly accounts for single-frame-level features in the video, which is not very comprehensive, so that the result finally obtained through the video classification model is not very accurate.
Summary of the invention
The embodiments of the present invention provide a classification method and apparatus for multimedia data, which determine the overall feature information of multi-frame multimedia data according to the combined feature information corresponding to each of several groups of multimedia data.
A first aspect of the embodiments of the present invention provides a classification method for multimedia data, comprising:
obtaining multiple frames of multimedia data to be processed;
dividing the multi-frame multimedia data into several groups of multimedia data in chronological order, each group containing at least one frame of temporally continuous multimedia data;
extracting the combined feature information corresponding to each group of multimedia data;
determining the overall feature information of the multi-frame multimedia data according to the combined feature information corresponding to each group of multimedia data, so as to classify the multi-frame multimedia data.
A second aspect of the embodiments of the present invention provides a classification apparatus for multimedia data, comprising:
a data acquisition unit for obtaining multiple frames of multimedia data to be processed;
a division unit for dividing the multi-frame multimedia data into several groups of multimedia data in chronological order, each group containing at least one frame of temporally continuous multimedia data;
a feature extraction unit for extracting the combined feature information corresponding to each group of multimedia data;
a feature determination unit for determining the overall feature information of the multi-frame multimedia data according to the combined feature information corresponding to each group of multimedia data, so as to classify the multi-frame multimedia data.
A third aspect of the embodiments of the present invention provides a storage medium storing a plurality of instructions adapted to be loaded and executed by a processor so as to perform the multimedia data classification method described in the first aspect of the embodiments of the present invention.
A fourth aspect of the embodiments of the present invention provides a server comprising a processor and a storage medium, the processor being configured to implement each instruction, and the storage medium being configured to store a plurality of instructions to be loaded and executed by the processor so as to perform the multimedia data classification method described in the first aspect of the embodiments of the present invention.
It can be seen that in the method of this embodiment, the multimedia data classification apparatus divides the multi-frame multimedia data to be processed into several groups in chronological order, and extracts the combined feature information corresponding to each group; finally, according to the combined feature information corresponding to each group, it determines the overall feature information of the multi-frame multimedia data, so as to classify the multi-frame multimedia data. In this way, when building a feature description of the multi-frame multimedia data, the classification apparatus takes into account the temporal characteristics across the frames, so that the finally obtained overall feature information better reflects the multi-frame multimedia data and the classification of the multi-frame multimedia data is more accurate.
Detailed description of the invention
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without any creative labor.
Fig. 1 is a schematic diagram of a multimedia data classification method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a multimedia data classification method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a multimedia data classification method provided by an application embodiment of the present invention;
Fig. 4 shows recommendation information displayed by the application terminal in an application embodiment of the present invention, sent by the application server according to the video file uploaded by the user;
Fig. 5 is a schematic diagram of obtaining the first overall feature information in an application embodiment of the present invention;
Fig. 6 is a structural schematic diagram of a multimedia data classification apparatus provided by an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of a server provided by an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", and the like (if present) in the description, the claims, and the above drawings are used to distinguish similar objects, and are not used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present invention described herein can, for example, be implemented in orders other than those illustrated or described herein. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to the process, method, product, or device.
An embodiment of the present invention provides a classification method for multimedia data, as shown in Fig. 1, which is mainly implemented by the multimedia data classification apparatus through the following steps:
obtaining multiple frames of multimedia data to be processed; dividing the multi-frame multimedia data into several groups of multimedia data in chronological order, each group containing at least one frame of temporally continuous multimedia data; extracting the combined feature information corresponding to each group of multimedia data; and determining the overall feature information of the multi-frame multimedia data according to the combined feature information corresponding to each group, so as to classify the multi-frame multimedia data.
The method in the embodiment of the present invention can be applied in a multimedia data recommendation system, and can also be applied to services such as filtering and classifying videos uploaded by users.
In this way, when building a feature description of the multi-frame multimedia data, the classification apparatus takes into account the temporal characteristics across the frames, so that the finally obtained overall feature information better reflects the multi-frame multimedia data and the classification of the multi-frame multimedia data is more accurate.
An embodiment of the present invention provides a classification method for multimedia data, mainly performed by the classification apparatus described above; its flowchart is shown in Fig. 2 and comprises:
Step 101: obtain multiple frames of multimedia data to be processed.
It can be understood that after a user uploads multi-frame multimedia data to an application server through an application terminal, the classification apparatus may initiate the process of this embodiment for the multi-frame multimedia data uploaded by the application terminal. The classification apparatus may be the application server itself, or a device independent of the application server; the multi-frame multimedia data is multimedia data, such as image data and audio data, corresponding to multiple moments in time, for example video data.
Step 102: divide the multi-frame multimedia data into several groups of multimedia data in chronological order, each group containing at least one frame of temporally continuous multimedia data.
Specifically, the classification apparatus may divide the multi-frame multimedia data into mutually disjoint groups in chronological order. The numbers of frames contained in different groups may be the same or different; for example, if one group contains n frames of multimedia data and another group contains m frames, m may or may not equal n.
For example, suppose the multi-frame multimedia data consists of the multimedia data at moments T1, T2, ..., Tn. The classification apparatus may, in chronological order, put the multimedia data at moments T1 and T2 into one group, the multimedia data at moments T3, T4, and T5 into another group, and so on, until the multimedia data at moments Tn-1 and Tn form the last group. In this way, multimedia data over different time periods is obtained.
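The chronological grouping of step 102 can be sketched as follows. This is only an illustrative sketch: the function name, the list-of-group-sizes interface, and the handling of leftover frames are our assumptions, not part of the patent.

```python
def group_frames(frames, group_sizes):
    """Split a time-ordered frame list into disjoint, contiguous groups.

    Group sizes may differ, matching the T1..Tn example in the text
    (a group of 2 frames, then a group of 3, and so on); any trailing
    frames not covered by group_sizes form one final group.
    """
    groups, start = [], 0
    for size in group_sizes:
        groups.append(frames[start:start + size])
        start += size
    if start < len(frames):
        groups.append(frames[start:])
    return groups

frames = [f"T{i}" for i in range(1, 8)]  # multimedia data at moments T1..T7
print(group_frames(frames, [2, 3]))
# [['T1', 'T2'], ['T3', 'T4', 'T5'], ['T6', 'T7']]
```

Note that the groups are mutually disjoint and preserve the original temporal order, as the step requires.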
Step 103: extract the combined feature information corresponding to each group of multimedia data.
In one case, when extracting the combined feature information of any group of multimedia data, the classification apparatus may first extract the feature information corresponding to every frame of multimedia data included in the group, then splice the per-frame feature information of the group together in chronological order, and use the spliced feature information as the combined feature information corresponding to the group.
Specifically, when extracting the feature information of a certain frame of multimedia data, if the frame is image data, the classification apparatus may extract its feature information using a convolutional neural network image classification model, such as Inception-V4; if the frame is audio data, the classification apparatus may extract its feature information using a convolutional neural network audio classification model, such as VGGish.
In other cases, when extracting the combined feature information of any group of multimedia data, the classification apparatus need not obtain it through the splicing described above. For example, all frames included in a group may be fed directly into a convolutional neural network or a recurrent neural network, which then outputs the group's combined feature information. Other approaches may also be used; no limitation is imposed here.
Step 104: according to the combined feature information corresponding to each group of multimedia data, determine the overall feature information of the multi-frame multimedia data, so as to classify the multi-frame multimedia data.
Specifically, the classification apparatus may determine the overall feature information from the combined feature information of the groups by methods such as NetVLAD, NeXtVLAD, or NL-NetVLAD. For example, the overall feature information can be represented by the differences between each group's combined feature information and its nearest cluster centre, e.g., by adding these differences and using the resulting sum as the overall feature information.
Further, after the overall feature information of the multi-frame multimedia data has been determined, the multi-frame multimedia data may be classified according to the determined overall feature information and a preset classification model, obtaining the type information of the multi-frame multimedia data. Specifically, the type information of the multi-frame multimedia data can be obtained by directly inputting the determined overall feature information into the preset classification model.
It should be noted that if the multi-frame multimedia data is video data comprising multiple frames of image data and multiple frames of audio data, then when performing step 103 the classification apparatus may extract the combined feature information of each group of image data and of each group of audio data; when performing step 104, it may determine the first overall feature information of the multi-frame image data according to the combined feature information of the image-data groups, and the second overall feature information of the multi-frame audio data according to the combined feature information of the audio-data groups. When performing the classification step, the video data may then be classified according to the obtained first overall feature information and/or second overall feature information together with the preset classification model.
In addition, it should be noted that steps 101 to 104 above obtain the overall feature information from a single division of the multi-frame multimedia data. In other embodiments, the classification apparatus may perform steps 102 to 104 multiple times, that is, repeat the steps of dividing into groups, extracting combined feature information, and determining overall feature information; each execution of steps 102 to 104 yields one overall feature information, so multiple candidate overall feature informations are obtained. The final overall feature information of the multi-frame multimedia data is then determined according to these candidates, for example as the weighted sum of the candidate overall feature informations. The groups of multimedia data obtained in each division of the multi-frame multimedia data differ from one another.
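The fusion of the candidate overall feature informations described above can be sketched as a weighted sum. This is a minimal illustration under the assumption that all candidates have the same length; the function and parameter names are ours.

```python
import numpy as np

def fuse_candidates(candidates, weights=None):
    """Weighted sum of J candidate overall feature vectors.

    Each pass over steps 102-104 yields one candidate; when no weights
    are given, equal weights 1/J are used.
    """
    candidates = np.stack([np.asarray(c, dtype=float) for c in candidates])  # (J, L)
    if weights is None:
        weights = np.full(len(candidates), 1.0 / len(candidates))
    return np.asarray(weights, dtype=float) @ candidates
```

The resulting vector is then what the embodiment passes to the preset classification model.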
In this case, after determining the final overall feature information of the multi-frame multimedia data, the classification apparatus may classify the multi-frame multimedia data according to the final overall feature information and the preset classification model.
It can be seen that in the method of this embodiment, the multimedia data classification apparatus divides the multi-frame multimedia data to be processed into several groups in chronological order, and extracts the combined feature information corresponding to each group; finally, according to the combined feature information corresponding to each group, it determines the overall feature information of the multi-frame multimedia data, so as to classify the multi-frame multimedia data. In this way, when building a feature description of the multi-frame multimedia data, the classification apparatus takes into account the temporal characteristics across the frames, so that the finally obtained overall feature information better reflects the multi-frame multimedia data and the classification of the multi-frame multimedia data is more accurate.
The classification method for multimedia data in the present invention is illustrated below with a specific application example. In this embodiment the method is applied in an application system comprising an application terminal and an application server, the application server being the multimedia data classification apparatus described above; the multimedia data is video data, and the finally obtained overall feature information is a Temporal Relation based NetVLAD (TR-NetVLAD) descriptor vector. The classification method of this embodiment can then be implemented through the following steps, illustrated schematically in Fig. 3:
Step 201: the user operates the application terminal, which uploads a video file to the application server.
Step 202: after the application server receives the video file, it can sample the file by a video decoding method at a certain sampling frequency (for example, 1 frame per second) to obtain the multi-frame image data and multi-frame audio data contained in the video file.
Step 203: the application server computes feature representations of the multi-frame image data and multi-frame audio data obtained in step 202, respectively.
For example, for a video of T frames, the feature information of the T frames of image data $\{x_t^{video}\}_{t=1}^{T}$ and of the T frames of audio data $\{x_t^{audio}\}_{t=1}^{T}$ are obtained.
Step 204: according to the feature information of the multi-frame image data $\{x_t^{video}\}_{t=1}^{T}$, the application server determines the first overall feature information of the multi-frame image data, i.e. the TR-NetVLAD descriptor vector, denoted $V_{video}$; according to the feature information of the multi-frame audio data $\{x_t^{audio}\}_{t=1}^{T}$, it determines the second overall feature information of the multi-frame audio data, denoted $V_{audio}$.
Step 205: the application server may input the first overall feature information $V_{video}$ and the second overall feature information $V_{audio}$, or either one of them, into a preset classification model to obtain a C-dimensional video category vector expressed as probabilities, where C is the number of preset video categories. The value at each position represents the probability that the video belongs to the corresponding category; finally, the video category vector is converted into categories, giving the classification result of the video file.
To illustrate with C = 3: a video category vector [0.1, 0.9, 0.7] indicates that the probability of the video file belonging to the first type is 0.1, to the second type 0.9, and to the third type 0.7. Each type can occur independently here, and the values in the video category vector are not required to sum to 1; however, this scheme does not exclude single-category classification, i.e., the case where the values in the video category vector sum to 1.
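The conversion from the C-dimensional probability vector of step 205 to category labels might look like the following sketch. The 0.5 decision threshold is our assumption; the patent does not fix how the vector is converted into categories.

```python
def probs_to_labels(probs, names, threshold=0.5):
    """Return the names of all categories whose probability meets the
    threshold; multiple labels are allowed, since each type can occur
    independently and the probabilities need not sum to 1."""
    return [name for p, name in zip(probs, names) if p >= threshold]

names = ["entertainment", "sports", "current events"]
print(probs_to_labels([0.1, 0.9, 0.7], names))
# ['sports', 'current events']
```

With the C = 3 example from the text, the vector [0.1, 0.9, 0.7] thus yields the second and third types, matching the multi-label behavior described above.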
Step 206: according to the classification result of the video file obtained in step 205, which includes at least one type, the application server sends the information of this at least one type to the application terminal as recommendation information for display.
For example, Fig. 4 shows a recommendation interface in which the application server recommends information to the user of the application terminal according to the multiple types in the classification result. If the classification result includes the entertainment type, the sports type, and the current-affairs type, the recommendation interface displayed by the application terminal includes entertainment news, sports news, current-affairs news, and so on.
It should be noted that, given T frames of D-dimensional feature information $\{x_t\}_{t=1}^{T}$, the corresponding TR-NetVLAD descriptor vector $v$, i.e. the overall feature information, is obtained; the length of the vector $v$ is D × K, where K is the number of preset cluster centres. The method for obtaining the overall feature information is an improvement of the NetVLAD feature description method; therefore, the VLAD feature description method is introduced first below, then the NetVLAD feature description method, and finally the TR-NetVLAD feature description method of this embodiment:
(1) VLAD feature description method
For N items of D-dimensional feature information $\{x_n\}_{n=1}^{N}$, the VLAD feature description method seeks K D-dimensional cluster centres $\{c_k\}_{k=1}^{K}$ and describes the features by the difference between each feature and its nearest cluster centre. The D × K-dimensional VLAD descriptor vector $V_{VLAD}$ can be expressed by the following Equation 1:

$$V_{VLAD}(k) = \sum_{n=1}^{N} a_k(x_n)\,(x_n - c_k), \quad k = 1, \dots, K \qquad (1)$$

where $a_k(x_n)$ is an indicator function: $a_k(x_n)$ is 1 when $c_k$ is the cluster centre nearest to $x_n$, and 0 otherwise. Specifically, when computing the VLAD descriptor vector, the D-dimensional residual vector between each feature $x_n$ and its nearest cluster centre $c_k$ is computed, and the resulting residual vector is added to the corresponding position of the matrix $V_{VLAD}$.
In this process, L2 regularization may be applied per cluster centre; L2 regularization may also be applied to $V_{VLAD}$ as a whole, specifically by first expanding $V_{VLAD}$ into a D × K-dimensional vector and then applying the regularization. In this way the feature values are optimized, making subsequent classification more accurate.
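Equation 1 together with the two L2 regularization steps can be sketched in NumPy as follows. This is a minimal illustration of the standard VLAD computation, with hard nearest-centre assignment, per-centre (intra) normalization, and a final global L2 step; all names are ours.

```python
import numpy as np

def vlad(features, centres):
    """VLAD descriptor per Equation 1: accumulate the residual of each
    feature against its nearest cluster centre, then intra-normalize
    per centre and L2-normalize the flattened D*K vector."""
    N, D = features.shape
    K = centres.shape[0]
    V = np.zeros((K, D))
    # hard assignment a_k(x_n): index of the nearest centre for each feature
    dists = np.linalg.norm(features[:, None, :] - centres[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    for n in range(N):
        V[nearest[n]] += features[n] - centres[nearest[n]]  # residual x_n - c_k
    # per-centre (intra) L2 regularization
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    V = np.where(norms > 0, V / np.maximum(norms, 1e-12), V)
    # expand into a D*K vector and apply global L2 regularization
    v = V.reshape(-1)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

The double normalization mirrors the two regularization options named in the text; either can be dropped independently.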
(2) NetVLAD feature description method
The NetVLAD feature description method is similar to the VLAD feature description method, except that the definitions of the indicator function $a_k(x_n)$ and the cluster centres $\{c_k\}_{k=1}^{K}$ differ. In NetVLAD, the cluster centres $\{c_k\}_{k=1}^{K}$ are parameterized and can be updated, and the indicator function $a_k(x_n)$ is "softened" into a value $\bar{a}_k(x_n)$ between 0 and 1. The difference between each feature $x_n$ and each cluster centre $c_k$ is computed and finally normalized into a relative weight; $\bar{a}_k(x_n)$ can therefore be understood as the relative importance of cluster centre $c_k$ for the feature $x_n$. Specifically, $\bar{a}_k(x_n)$ can be expressed by the normalization of the following Equation 2:

$$\bar{a}_k(x_n) = \frac{e^{-\alpha\|x_n - c_k\|^2}}{\sum_{k'} e^{-\alpha\|x_n - c_{k'}\|^2}} \qquad (2)$$

Here the value of $\bar{a}_k(x_n)$ ranges between 0 and 1 and can be regarded as a relative weight.
Therefore, the NetVLAD feature description vector $V_{NetVLAD}$ can be expressed by the following Equation 3:

$$V_{NetVLAD}(k) = \sum_{n=1}^{N} \bar{a}_k(x_n)\,(x_n - c_k), \quad k = 1, \dots, K \qquad (3)$$
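A minimal NumPy sketch of Equations 2 and 3 follows. The soft weight is parameterized here as a softmax over negative scaled squared distances to the centres, which is one common NetVLAD parameterization; the exact learnable form is not fixed by the text, so alpha and the score function are our assumptions.

```python
import numpy as np

def netvlad(features, centres, alpha=1.0):
    """NetVLAD descriptor per Equation 3, with the soft assignment of
    Equation 2 taken as a softmax over -alpha * ||x_n - c_k||^2."""
    # soft weights a_bar_k(x_n): shape (N, K), each row sums to 1
    sq_dists = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    scores = -alpha * sq_dists
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    a_bar = np.exp(scores)
    a_bar /= a_bar.sum(axis=1, keepdims=True)
    # weighted residuals accumulated per centre: (K, D), flattened to D*K
    residuals = features[:, None, :] - centres[None, :, :]
    V = (a_bar[:, :, None] * residuals).sum(axis=0)
    return V.reshape(-1)
```

In a trained model the centres (and alpha, or an equivalent linear score) would be learned parameters, which is the key difference from VLAD's fixed hard assignment.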
(3) TR-NetVLAD feature description method
In the TR-NetVLAD feature description method of this embodiment, a temporal-relation representation of the feature information is built first. Specifically, the N items of feature information $\{x_n\}_{n=1}^{N}$ can be divided, in chronological order, into N/τ mutually disjoint groups, each group containing τ consecutive D-dimensional features; the features in each group can be spliced in chronological order into one τ·D-dimensional feature, i.e. each group's combined feature information, denoted $y_i^{\tau}$. According to each group's combined feature information and Equation 3 above, the overall feature information, specifically the TR-NetVLAD descriptor vector, denoted $v^{\tau}$, can be obtained; this is the overall feature information determined under one temporal scale τ.
In the same manner, the overall feature information $v^{\tau}$ under different temporal scales τ can be obtained; the final overall feature information is then determined according to the overall feature information under the different temporal scales, specifically as their weighted sum, which can be expressed by the following Equation 4:

$$v = \sum_{\tau} w_{\tau}\, v^{\tau} \qquad (4)$$

where $w_{\tau}$ is an adjustable parameter; for example, for J temporal scales, $w_{\tau}$ can be set to 1/J.
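The TR-NetVLAD construction (grouping, splicing, per-scale encoding, and the weighted sum of Equation 4) can be sketched as follows. The per-scale encoder is passed in as a function and is only a stand-in: in the patent it would be the NetVLAD encoding of Equation 3 over τ·D-dimensional centres, and we assume every scale produces a descriptor of the same length so that the weighted sum is well defined; dropping any trailing remainder frames is also our assumption.

```python
import numpy as np

def time_groups(features, tau):
    """Split N time-ordered D-dim features into N//tau mutually disjoint
    contiguous groups and splice each group into one tau*D-dim combined
    feature (any trailing remainder is dropped)."""
    N, D = features.shape
    usable = (N // tau) * tau
    return features[:usable].reshape(N // tau, tau * D)

def tr_descriptor(features, taus, encode, weights=None):
    """Equation 4: v = sum_tau w_tau * v_tau, where v_tau is the per-scale
    descriptor encode(groups, tau); default weights are w_tau = 1/J."""
    if weights is None:
        weights = [1.0 / len(taus)] * len(taus)
    return sum(w * encode(time_groups(features, tau), tau)
               for w, tau in zip(weights, taus))
```

For example, with `encode` set to a NetVLAD encoder whose centres have dimension τ·D, this reproduces the per-scale descriptors $v^{\tau}$ and their weighted combination described above.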
Therefore, as shown in Fig. 5, when performing step 204 above, the application server may determine the first overall feature information $V_{video}$ through the following steps:
Step 301: at a temporal scale τ, the application server divides the feature information of the T frames of image data $\{x_t^{video}\}_{t=1}^{T}$ into several groups of feature information.
Step 302: the application server splices the feature information in each group in chronological order into the group's combined feature information, denoted $y_i^{\tau}$.
Step 303: according to the combined feature information of each group and Equation 3 above, the application server determines the overall feature information of the T frames of image data, denoted $v^{\tau}$.
Step 304: by executing steps 301 to 303 in a loop, the application server can obtain the overall feature information $v^{\tau}$ under different temporal scales τ, and, according to Equation 4 above, can obtain the final first overall feature information $V_{NetVLAD}$ of the T frames of image data, i.e. the above-mentioned $V_{video}$.
Following the method of steps 301 to 304, the application server can likewise obtain the second overall feature information $V_{audio}$ of the T frames of audio data.
It can be seen that in this embodiment the features of the video data are described by the TR-NetVLAD feature description method. The classification results finally obtained from the TR-NetVLAD descriptor vector achieve about 1.7% higher GAP@20 and about 2% higher first-class hit rate than classification results obtained under similar conditions from the NetVLAD descriptor vector. Here GAP@20 is a multi-class video classification performance metric, and a first-class hit means that the category with the highest classification confidence matches the true category of the video.
The embodiment of the present invention also provides a kind of sorter of multi-medium data, and structural schematic diagram is as shown in fig. 6, specific May include:
Data capture unit 10, for obtaining multiframe multi-medium data to be processed.
Division unit 11, the multiframe multi-medium data for obtaining the data capture unit 10 are drawn sequentially in time It is divided into multiple groups multi-medium data, includes an at least frame multi-medium data for Time Continuous in every group of multi-medium data.Wherein, described The frame number for the multi-medium data for mutually disjointing between multiple groups multi-medium data, and including in different group multi-medium data is identical or not It is identical.
A feature extraction unit 12, configured to extract the combined feature information corresponding to each group of multimedia data obtained by the division unit 11.
The feature extraction unit 12 is specifically configured to: extract the feature information corresponding to every frame of multimedia data included in a group of multimedia data; concatenate the feature information corresponding to each frame of multimedia data in the group; and use the concatenated feature information as the combined feature information corresponding to that group of multimedia data.
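The behavior of the division unit 11 and the feature extraction unit 12 can be sketched as follows (a minimal illustration assuming fixed-size groups and NumPy feature vectors; the function names are hypothetical):

```python
import numpy as np

def divide_into_groups(frames, group_size):
    """Division unit 11 sketch: split the frames, in time order, into disjoint
    groups of consecutive frames. A fixed group size is assumed here; the
    patent also allows groups of unequal sizes."""
    return [frames[i:i + group_size] for i in range(0, len(frames), group_size)]

def combined_feature(group):
    """Feature extraction unit 12 sketch: concatenate ("splice") the per-frame
    feature vectors of one group into that group's combined feature."""
    return np.concatenate(group)
```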
A feature determination unit 13, configured to determine, according to the combined feature information corresponding to each group of multimedia data extracted by the feature extraction unit 12, the overall feature information of the multiple frames of multimedia data, so as to classify the multiple frames of multimedia data.
Specifically, the feature determination unit 13 is configured to represent the overall feature information by the differences between the combined feature information of each group of multimedia data and its respective nearest cluster center.
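The residual representation used by the feature determination unit 13 can be sketched as follows, assuming hard assignment of each group's combined feature to its single nearest cluster center (the exact aggregation formula is not reproduced in this excerpt):

```python
import numpy as np

def residual_encoding(group_features, centers):
    """Feature determination unit 13 sketch: for each group's combined feature,
    find the nearest cluster center and accumulate the difference (residual);
    the flattened residual matrix serves as the overall feature information."""
    n_centers, dim = centers.shape
    out = np.zeros((n_centers, dim))
    for feat in group_features:
        nearest = int(np.argmin(np.linalg.norm(centers - feat, axis=1)))
        out[nearest] += feat - centers[nearest]
    return out.flatten()
```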
It should be noted that the division unit 11, the feature extraction unit 12 and the feature determination unit 13 may perform the steps of dividing into multiple groups of multimedia data, extracting combined feature information and determining overall feature information multiple times, obtaining multiple candidate overall feature information; the feature determination unit 13 is then further configured to determine the final overall feature information of the multiple frames of multimedia data according to the multiple candidate overall feature information.
Further, the multimedia data classification apparatus may also include a classification unit 14, configured to classify the multiple frames of multimedia data according to the overall feature information determined by the feature determination unit 13 and a preset classification model, obtaining the type information of the multiple frames of multimedia data.
As it can be seen that division unit 11 can be treated sequentially in time in the sorter of the multi-medium data of the present embodiment The multiframe multi-medium data of processing is divided, and is divided into multiple groups multi-medium data, and more by the extraction of feature extraction unit 12 each group The corresponding assemblage characteristic information of media data;Last characteristics determining unit 13 is respectively corresponded further according to each group multi-medium data Assemblage characteristic information, the global feature information of multiframe multi-medium data is determined, to classify to multiframe multi-medium data.This Sample, the sorter of multi-medium data is when describing multiframe multi-medium data progress feature, it is contemplated that multiframe multimedia number Temporal characteristics between enable finally obtained global feature information preferably to reflect multiframe multi-medium data, to make It obtains more acurrate to the classification of multiframe multi-medium data.
An embodiment of the present invention further provides a server, whose structural schematic diagram is shown in FIG. 7. The server may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 20 (for example, one or more processors), a memory 21, and one or more storage media 22 (such as one or more mass storage devices) storing application programs 221 or data 222. The memory 21 and the storage media 22 may provide transient or persistent storage. A program stored in a storage medium 22 may include one or more modules (not shown in the figure), each of which may include a series of instruction operations on the server. Further, the central processing unit 20 may be configured to communicate with the storage medium 22 and to execute, on the server, the series of instruction operations in the storage medium 22.
Specifically, the application programs 221 stored in the storage medium 22 include an application program for multimedia data classification, and this program may include the data acquisition unit 10, the division unit 11, the feature extraction unit 12, the feature determination unit 13 and the classification unit 14 of the above multimedia data classification apparatus, which are not described again here. Further, the central processing unit 20 may be configured to communicate with the storage medium 22 and to execute, on the server, the series of operations corresponding to the multimedia data classification application program stored in the storage medium 22.
The server may also include one or more power supplies 23, one or more wired or wireless network interfaces 24, and/or one or more operating systems 223, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps performed by the multimedia data classification apparatus described in the above method embodiments may be based on the server structure shown in FIG. 7.
An embodiment of the present invention further provides a storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor to execute the multimedia data classification method performed by the above multimedia data classification apparatus.
An embodiment of the present invention further provides a terminal device, including a processor and a storage medium, the processor being configured to implement each instruction, and the storage medium being configured to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the multimedia data classification method performed by the above multimedia data classification apparatus.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, and the storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The multimedia data classification method and apparatus provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those skilled in the art, there will be changes in the specific implementation and application scope according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

1. A multimedia data classification method, characterized by comprising:
acquiring multiple frames of multimedia data to be processed;
dividing the multiple frames of multimedia data into multiple groups of multimedia data in time order, each group of multimedia data containing at least one frame of temporally continuous multimedia data;
extracting the combined feature information corresponding to each group of multimedia data;
determining, according to the combined feature information corresponding to each group of multimedia data, the overall feature information of the multiple frames of multimedia data, so as to classify the multiple frames of multimedia data.
2. The method according to claim 1, characterized in that the multiple groups of multimedia data are mutually disjoint, and the numbers of frames of multimedia data contained in different groups of multimedia data are the same or different.
3. The method according to claim 1, characterized in that extracting the combined feature information corresponding to one group of multimedia data specifically comprises:
extracting the feature information corresponding to every frame of multimedia data included in the group of multimedia data;
concatenating the feature information corresponding to each frame of multimedia data in the group of multimedia data, and using the concatenated feature information as the combined feature information corresponding to the group of multimedia data.
4. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
classifying the multiple frames of multimedia data according to the determined overall feature information and a preset classification model, obtaining the type information of the multiple frames of multimedia data.
5. The method according to any one of claims 1 to 3, characterized in that the method further comprises:
performing the steps of dividing into multiple groups of multimedia data, extracting combined feature information and determining overall feature information multiple times, obtaining multiple candidate overall feature information;
determining the final overall feature information of the multiple frames of multimedia data according to the multiple candidate overall feature information.
6. The method according to claim 5, characterized in that determining the final overall feature information of the multiple frames of multimedia data according to the multiple candidate overall feature information specifically comprises:
using the weighted sum of the multiple candidate overall feature information as the final overall feature information of the multiple frames of multimedia data.
7. The method according to any one of claims 1 to 3, characterized in that determining the overall feature information of the multiple frames of multimedia data according to the combined feature information corresponding to each group of multimedia data specifically comprises:
representing the overall feature information by the differences between the combined feature information of each group of multimedia data and its respective nearest cluster center.
8. The method according to any one of claims 1 to 3, characterized in that the multiple frames of multimedia data are video data comprising multiple frames of image data and multiple frames of audio data, and the multiple groups of multimedia data comprise multiple groups of image data and multiple groups of audio data;
extracting the combined feature information corresponding to each group of multimedia data then specifically comprises: extracting the combined feature information of each group of image data and the combined feature information of each group of audio data;
determining the overall feature information of the multiple frames of multimedia data according to the combined feature information corresponding to each group of multimedia data specifically comprises: determining the first overall feature information of the multiple frames of image data according to the combined feature information of each group of image data, and determining the second overall feature information of the multiple frames of audio data according to the combined feature information of each group of audio data.
9. A multimedia data classification apparatus, characterized by comprising:
a data acquisition unit, configured to acquire multiple frames of multimedia data to be processed;
a division unit, configured to divide the multiple frames of multimedia data into multiple groups of multimedia data in time order, each group of multimedia data containing at least one frame of temporally continuous multimedia data;
a feature extraction unit, configured to extract the combined feature information corresponding to each group of multimedia data;
a feature determination unit, configured to determine, according to the combined feature information corresponding to each group of multimedia data, the overall feature information of the multiple frames of multimedia data, so as to classify the multiple frames of multimedia data.
10. A server, characterized by comprising a processor and a storage medium, the processor being configured to implement each instruction;
the storage medium being configured to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the multimedia data classification method according to any one of claims 1 to 8.
CN201910218914.8A 2019-03-21 2019-03-21 A kind of classification method and device of multi-medium data Pending CN109992679A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910218914.8A CN109992679A (en) 2019-03-21 2019-03-21 A kind of classification method and device of multi-medium data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910218914.8A CN109992679A (en) 2019-03-21 2019-03-21 A kind of classification method and device of multi-medium data

Publications (1)

Publication Number Publication Date
CN109992679A true CN109992679A (en) 2019-07-09

Family

ID=67130720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910218914.8A Pending CN109992679A (en) 2019-03-21 2019-03-21 A kind of classification method and device of multi-medium data

Country Status (1)

Country Link
CN (1) CN109992679A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093183A (en) * 2011-10-27 2013-05-08 索尼公司 Classifier generating device and method thereof, video detecting device and method thereof and video monitoring system
CN103336795A (en) * 2013-06-09 2013-10-02 华中科技大学 Video indexing method based on multiple features
CN106570466A (en) * 2016-11-01 2017-04-19 金鹏电子信息机器有限公司 Video classification method and system
CN107341462A (en) * 2017-06-28 2017-11-10 电子科技大学 A kind of video classification methods based on notice mechanism
CN109189950A (en) * 2018-09-03 2019-01-11 腾讯科技(深圳)有限公司 Multimedia resource classification method, device, computer equipment and storage medium
CN109271876A (en) * 2018-08-24 2019-01-25 南京理工大学 Video actions detection method based on temporal evolution modeling and multi-instance learning
CN109376696A (en) * 2018-11-28 2019-02-22 北京达佳互联信息技术有限公司 Method, apparatus, computer equipment and the storage medium of video actions classification

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489574A (en) * 2019-08-05 2019-11-22 东软集团股份有限公司 A kind of multimedia messages recommended method, device and relevant device
CN110489574B (en) * 2019-08-05 2022-06-03 东软集团股份有限公司 Multimedia information recommendation method and device and related equipment
CN110751030A (en) * 2019-09-12 2020-02-04 厦门网宿有限公司 Video classification method, device and system
WO2021046957A1 (en) * 2019-09-12 2021-03-18 厦门网宿有限公司 Video classification method, device and system
CN111813996A (en) * 2020-07-22 2020-10-23 四川长虹电器股份有限公司 Video searching method based on sampling parallelism of single frame and continuous multi-frame

Similar Documents

Publication Publication Date Title
CN113255694B (en) Training image feature extraction model and method and device for extracting image features
CN109325148A (en) The method and apparatus for generating information
CN109117777A (en) The method and apparatus for generating information
CA3066029A1 (en) Image feature acquisition
CN107545038B (en) Text classification method and equipment
CN109992679A (en) A kind of classification method and device of multi-medium data
CN113742488B (en) Embedded knowledge graph completion method and device based on multitask learning
CN109271516A (en) Entity type classification method and system in a kind of knowledge mapping
CN110990563A (en) Artificial intelligence-based traditional culture material library construction method and system
CN113569895A (en) Image processing model training method, processing method, device, equipment and medium
CN115130536A (en) Training method of feature extraction model, data processing method, device and equipment
CN116665083A (en) Video classification method and device, electronic equipment and storage medium
CN112884569A (en) Credit assessment model training method, device and equipment
CN115131698A (en) Video attribute determination method, device, equipment and storage medium
CN112259078A (en) Method and device for training audio recognition model and recognizing abnormal audio
CN111488813A (en) Video emotion marking method and device, electronic equipment and storage medium
CN113870863A (en) Voiceprint recognition method and device, storage medium and electronic equipment
CN111984842B (en) Bank customer data processing method and device
CN109447112A (en) A kind of portrait clustering method, electronic equipment and storage medium
CN115114469A (en) Picture identification method, device and equipment and storage medium
CN113392867A (en) Image identification method and device, computer equipment and storage medium
WO2024051146A1 (en) Methods, systems, and computer-readable media for recommending downstream operator
CN116956117A (en) Method, device, equipment, storage medium and program product for identifying label
CN116955788A (en) Method, device, equipment, storage medium and program product for processing content
CN114566160A (en) Voice processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20190709