CN108197282A - Classification method, apparatus, terminal, server, and storage medium for file data - Google Patents
- Publication number
- CN108197282A (application number CN201810023498.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- audio
- classification
- training
- class categories
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/685—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Abstract
The embodiments of the invention disclose a classification method, apparatus, terminal, and server for file data. The method includes: obtaining text data associated with the audio data, and obtaining audio feature data of the audio data; performing classification recognition on the audio data according to a classifier and the audio feature data, determining the classification category of the audio data, and obtaining first category information; performing classification analysis on the words contained in the text content of the text data, determining the classification category to which the text data belongs, and obtaining second category information; and, if the first category information and the second category information indicate the same classification category, determining that identical classification category as the category of the audio data. With the embodiments of the invention, the correctness of audio-data classification can be better guaranteed, so that in certain application scenarios, such as music recommendation, music can be recommended to users accurately.
Description
Technical field
The present invention relates to the field of computer application technology, and in particular to a classification method, apparatus, terminal, server, and storage medium for file data.
Background art
With the continuous improvement of living standards, people's interests have become increasingly broad. Music, as one of the most popular of those interests, is closely tied to daily life. Meanwhile, at the present stage, smart speakers of all kinds are emerging, and the quality of music playback is receiving attention from more and more people.

Today's music spans a large number of types and genres, and the quantity of music produced over the years is enormous. How to classify the audio data corresponding to music has therefore become a hot research topic for music service providers.
Summary of the invention
The embodiments of the present invention provide a classification method, apparatus, terminal, and server for file data, which can determine the category of audio data relatively accurately.
In one aspect, an embodiment of the present invention provides a classification method for file data, the file data including audio data, the method including:

obtaining text data associated with the audio data, and obtaining audio feature data of the audio data;

performing classification recognition on the audio data according to a classifier and the audio feature data, determining the classification category of the audio data, and obtaining first category information;

performing classification analysis on the words contained in the text content of the text data, determining the classification category to which the text data belongs, and obtaining second category information;

if the first category information and the second category information indicate the same classification category, determining the identical classification category as the category of the audio data.
In another aspect, an embodiment of the present invention provides a classification apparatus for file data, the file data including audio data, the apparatus including:

an acquisition module, configured to obtain text data associated with the audio data and to obtain audio feature data of the audio data;

a feature classification module, configured to perform classification recognition on the audio data according to a classifier and the audio feature data, determine the classification category of the audio data, and obtain first category information;

a text classification module, configured to perform classification analysis on the words contained in the text content of the text data, determine the classification category to which the text data belongs, and obtain second category information;

a determining module, configured to, if the first category information and the second category information indicate the same classification category, determine the identical classification category as the category of the audio data.
In yet another aspect, an embodiment of the present invention further provides a server, including a processor and a storage device; the storage device stores program instructions, and the processor calls the program instructions stored in the storage device to perform the classification method for file data described above.

Correspondingly, an embodiment of the present invention further provides a computer storage medium in which program instructions are stored; when the program instructions are executed, they are used to implement the classification method for file data described above.
The embodiments of the present invention can perform classification recognition simultaneously on the feature data of the audio data and on the associated text data such as the lyrics, and determine the classification category of the audio data only when the two recognition results are identical. In this way, the correctness of the audio-data classification can be guaranteed, so that in some application scenarios, such as music recommendation, music can be recommended to the user accurately.
Description of the drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention or of the prior art, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram of the classifier generation process according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of classifying unclassified audio data in an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an application system according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a user interface according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of another user interface according to an embodiment of the present invention;

Fig. 6 is a schematic flowchart of a classification method for file data according to an embodiment of the present invention;

Fig. 7 is a schematic flowchart of a classified query method for file data according to an embodiment of the present invention;

Fig. 8 is a schematic flowchart of a method for training a classifier according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of a classification apparatus for file data according to an embodiment of the present invention;

Fig. 10 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed description of embodiments
In the embodiments of the present invention, the audio data can be music files accompanied by text data such as lyrics or song comments. The classification category of the audio data is determined comprehensively from audio features contained in the audio data itself together with the text data of the audio data. The audio features of the audio data can be classified by a classifier trained in advance, and the text data can be classified by a category dictionary. If the classification result of the classifier and the classification result of the category dictionary are identical — the target classification category — then the target classification category can be taken as the category of the audio data.
In one embodiment, in order to improve the classification accuracy of the classifier, the classifier can on the one hand be generated by the support vector machine (SVM) algorithm, and on the other hand a clustering algorithm can be combined to process the training data, so as to train the classifier better. A large amount of audio training data can first be clustered by the clustering algorithm; the audio feature training data of each piece of audio training data is then converted based on the cluster centres, and the converted data is input into the classifier so that the classifier performs classification. The classification result is then compared with the annotation category marked when the audio training data was manually annotated: if they are identical, the classifier is considered to have classified that audio data successfully; if not, parameter optimisation can be performed on the classifier as needed, in order to complete the optimisation training of the classifier.
Refer to Fig. 1, which is a schematic diagram of the classifier generation process according to an embodiment of the present invention. The classifier can be an initial SVM classifier generated on the basis of the SVM algorithm. A large amount of audio data can be collected as audio training data, whose main function is to train the SVM classifier; the training process of the SVM classifier includes the following flow.

First, the collected audio training data is manually annotated according to the specified classification categories, directly marking the classification category to which each piece of audio training data belongs; the manually annotated categories are referred to as annotation categories. In the embodiment of the present invention, the specified classification categories can be categories used to express emotion, as needed, for example emotion categories such as "happy" and "sad".
After the manual annotation is completed, audio feature training data is extracted from the audio training data. In one embodiment, the extracted audio feature training data mainly includes: the Mel-frequency cepstral coefficients (MFCC) of the audio, constant-Q transform (CQT) feature data, and audio rhythm (beat) features. The reason for selecting these audio features is that experiments show these three kinds of feature data can express the emotional tendency of the audio relatively clearly, and thus represent the user's emotion. In other embodiments, to ensure that the emotional tendency is better reflected, other audio features can be extracted and their corresponding data added to the audio feature training data. In one embodiment, when extracting audio feature training data, feature extraction can be performed only on the audio data within 20 ms windows of the audio training data, obtaining the corresponding audio feature training data; this avoids feature extraction over the entire audio training data and effectively reduces the amount of computation. And in one embodiment, feature extraction can be performed on the audio data within a specified time range; for example, in music data, the period whose playback time lies in the middle of the track can be extracted, because under normal circumstances the intermediate period is the climax part of a piece of music and can better reflect its emotional tendency.
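The windowing strategy described above — take a segment from the middle of the track, where the climax tends to fall, and split it into short 20 ms frames for feature extraction — can be sketched as follows. The sample rate, segment length, and silent toy signal are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def middle_segment(signal: np.ndarray, sample_rate: int, seconds: float) -> np.ndarray:
    """Return a slice of `seconds` length centred on the middle of the track,
    since the climax of a song usually falls in the intermediate period."""
    n = int(seconds * sample_rate)
    mid = len(signal) // 2
    start = max(0, mid - n // 2)
    return signal[start:start + n]

def frames_20ms(segment: np.ndarray, sample_rate: int) -> np.ndarray:
    """Split the segment into non-overlapping 20 ms frames, each of which
    would then be fed to the feature extractors (MFCC, CQT, beat)."""
    frame_len = int(0.020 * sample_rate)
    usable = len(segment) - len(segment) % frame_len
    return segment[:usable].reshape(-1, frame_len)

# toy example: 10 s of silence at 16 kHz, keep 2 s around the middle
sig = np.zeros(10 * 16000)
seg = middle_segment(sig, 16000, 2.0)
frames = frames_20ms(seg, 16000)
print(frames.shape)  # (100, 320): 100 frames of 20 ms each
```

Framing only a short central segment, rather than the whole song, is what keeps the per-track computation bounded.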
After the audio feature training data has been extracted, clustering can be performed on the audio training data of each specified classification category according to the extracted audio feature training data; the clustering can be implemented with the K-means method. After clustering is completed, the cluster centre data of each cluster category is determined. In one embodiment, the number of cluster centre data in the positive and negative classes can be set to be equal, i.e. the clustering information of the positive and negative classes is given the same weight; for example, "happy" is a positive-class category and "sad" a negative-class category, and one positive-class category needs one corresponding negative-class category. Of course, in other embodiments, one positive-class category can correspond to multiple opposite negative-class categories.
The audio feature training data is then converted into similarity training data based on the cluster centre data. The cluster centre data is used to map the original audio feature training data into a new attribute space carrying the intrinsic structural information of the data. In one embodiment, the new attribute space can be measured with the Euclidean distance, converting the original audio feature training data into similarity training data, i.e. Euclidean distance data. The similarity training data serves as the new audio features of the corresponding audio training data, and can be regarded as the generic attributes of the label corresponding to that audio training data.
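The conversion step can be sketched with NumPy. The patent's worked example speaks of 72-dimensional distance data; the sketch below uses the simpler reading — an assumption on our part — in which each sample is replaced by one scalar Euclidean distance per category centre, and the toy dimensions and centres are invented:

```python
import numpy as np

def to_similarity(features: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Map raw audio feature vectors (n_samples, n_dims) to similarity
    training data: the Euclidean distance from each sample to each
    category's cluster centre."""
    # (n_samples, 1, n_dims) - (1, n_classes, n_dims) -> (n_samples, n_classes)
    return np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)

# toy example: 2-dimensional features, two category centres
centers = np.array([[0.0, 0.0], [3.0, 4.0]])
x = np.array([[3.0, 4.0]])
print(to_similarity(x, centers))  # [[5. 0.]]
```

The transformed vector — small where the sample sits near a category's centre — is what the SVM is actually trained on.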
Finally, the similarity training data obtained after conversion is input into the initial SVM classifier, which is trained to obtain the SVM classification model finally used by the present invention.
In one embodiment, take as an example the case where the specified classification categories are five categories such as "happy" and "sad", and the audio training data is determined to be 10,000 songs. Based on the five classification categories, the 10,000 songs are divided into the five specified categories by manual annotation, so that the classification category to which each song belongs is manually marked. After the manual annotation is completed, the audio feature training data of each song can be extracted. In the embodiment of the present invention, each song can be represented by audio feature training data formed of a 72-dimensional numerical vector, in which the CQT features account for 24 dimensions, the MFCC features account for 13 dimensions, the beat features account for 6 dimensions, and other features account for 29 dimensions; the audio feature training data can take a form such as (0.1, 0.11, 0.15, ..., 1.1). Based on the audio feature training data, the 10,000 songs are clustered by the k-means clustering algorithm, and the cluster centre data of the five classification categories is calculated according to the five specified categories. Suppose 1,000 songs are clustered into the "happy" classification category; by calculating an average value for each of the 72 dimensions over these 1,000 songs, the 72-dimensional cluster centre data of the "happy" classification category is obtained. After the cluster centre data is obtained, any song in the "happy" classification category is taken as the target song, and the Euclidean distance between the audio feature training data of the target song and the 72-dimensional cluster centre data of the "happy" classification category is calculated, yielding 72-dimensional Euclidean distance data (i.e. similarity training data); the similarity training data between the target song and the cluster centre data of each of the other classification categories is calculated in the same way. Each set of 72-dimensional Euclidean distance data is input into the initial SVM classifier to be trained. Based on each piece of similarity training data of the target song, the initial SVM classifier determines the probability that the target song belongs to each classification category. For example, the classification probabilities output by the initial SVM classifier for the target song might be: a probability of 50% of belonging to the "happy" classification category, and only 10% of belonging to "sad"; it may also belong to other classification categories. Since the probability of belonging to the "happy" classification category is the largest and exceeds the preset probability threshold, the target song is considered to belong to the "happy" classification category. The classification result obtained by the initial SVM classifier for the target song is compared with the annotation category marked during manual annotation; if they are identical, the classification succeeds, otherwise the classification fails.
Each of the 1,000 "happy" songs is taken in turn as the target song. After the above processing has been carried out for the 1,000 "happy" songs, the accuracy of the classification categories identified by the initial SVM classifier for these 1,000 songs is obtained. If the accuracy reaches 95% (or the error is below 5%), the initial SVM classifier is considered able to identify songs of the "happy" classification category well; otherwise, parameter optimisation is performed on the initial SVM classifier before the above training and learning is continued on the 1,000 songs. The same training processing as for the "happy" classification category is also carried out for the other classification categories such as "sad". If the average classification accuracy over all classification categories reaches a preset accuracy threshold, the SVM classifier can be considered able to classify songs well according to each specified classification category, and the SVM classifier can be deployed, so that songs of unknown category can be classified according to each specified classification category and given an emotion category label.
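Under the assumptions above, the train-and-verify loop might look like the following scikit-learn sketch. The synthetic Gaussian data stands in for the 72-dimensional song features, the class count and the 95% threshold come from the text, and everything else (random seed, blob spacing, sample counts) is invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_classes = 5  # e.g. "happy", "sad", ...

# synthetic stand-in for manually annotated 72-dimensional song features
X = np.vstack([rng.normal(loc=3.0 * k, scale=0.5, size=(200, 72))
               for k in range(n_classes)])
y = np.repeat(np.arange(n_classes), 200)

# step 1: cluster the feature data and keep the per-category cluster centres
centers = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit(X).cluster_centers_

# step 2: convert features to similarity data (Euclidean distance to each centre)
sims = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

# step 3: train the SVM on the similarity data; probability=True enables the
# per-category probability output used at prediction time
clf = SVC(probability=True, random_state=0).fit(sims, y)
accuracy = clf.score(sims, y)
print(accuracy >= 0.95)  # below threshold -> tune the SVM's parameters and retrain
```

If `accuracy` fell short of the threshold, the loop would adjust the SVM's hyperparameters and repeat, which is the "parameter optimisation" step the text describes.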
After the final SVM classifier has been obtained, in one embodiment, refer to Fig. 2, which is a schematic flowchart of classifying unclassified audio data according to an embodiment of the present invention. The method of the embodiment of the present invention can be implemented in a server capable of classifying audio data.
In S201, audio data is input; this audio data is original audio data that has not yet undergone classification processing into the aforementioned categories such as "happy" and "sad". In S202, audio features are extracted from the audio data to obtain the audio feature data of the audio data; the audio feature data can be 72-dimensional data including multiple numerical values representing audio features, as described above. In S203, cluster analysis is performed on the audio data; specifically, the k-means algorithm can be used to complete the clustering. In S204, the similarity data of the audio data is calculated: specifically, the Euclidean distance between the audio feature data and the cluster centre data of each classification category is calculated, obtaining the similarity data of the audio data under each classification category; that is, five pieces of similarity data of the audio data can be obtained, completing the construction of the classification-category attributes. The cluster centre data of each specified classification category was calculated during the classifier-training process described above. In S205, the obtained similarity data is input into the SVM classifier, which performs classification recognition and obtains a recognition result; the classification category with the largest probability that also exceeds a predetermined threshold is taken as the classification category of the audio data.
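The decision rule at the end of S205 — accept the most probable category only if its probability clears the predetermined threshold — can be sketched in a few lines. The 0.4 threshold and the label list are illustrative stand-ins; the patent does not specify their values:

```python
import numpy as np

def decide(probs: np.ndarray, labels: list, threshold: float):
    """Pick the most probable classification category, but only if its
    probability exceeds the predetermined threshold; otherwise report
    that recognition failed (None)."""
    best = int(np.argmax(probs))
    return labels[best] if probs[best] > threshold else None

labels = ["happy", "sad", "calm", "sweet", "inspirational"]
print(decide(np.array([0.50, 0.10, 0.15, 0.15, 0.10]), labels, 0.4))  # happy
print(decide(np.array([0.22, 0.20, 0.20, 0.19, 0.19]), labels, 0.4))  # None
```

The `None` branch corresponds to the failure case in which no single emotion dominates the probability output.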
In S206, the text data of the audio data is obtained. In the embodiment of the present invention, the text data refers to the lyrics data of the audio data, which can be obtained by searching the web. In the embodiment of the present invention, the classification-category prediction stage based on lyrics features is an unsupervised process and does not require training a classification model for the lyrics. In S207, the obtained lyrics are preprocessed; the preprocessing mainly includes removing punctuation marks and some unrecognisable symbols. In S208, word segmentation is performed on the lyrics to obtain multiple individual words; various effective word-segmentation tools can be used, producing the word list of the lyrics. In S209, each word obtained by segmentation is scored based on a preset category dictionary. In one embodiment, scoring can be done by judging positivity and negativity: all words in the word list are matched against the emotion dictionary. The matching rule is that if a word in the word list falls into a dictionary of positive emotion words, the song's positive emotion value is increased by 1 — for example, if it falls into the "happy" classification category, the score of the "happy" classification category is increased by 1 — and conversely, if a word falls into a dictionary of negative emotion words, the song's negative emotion value is increased by 1 — for example, if it falls into the "sad" classification category, the score of the "sad" classification category is increased by 1. Finally, the song's positive and negative emotion values are compared, i.e. the scores of the classification categories such as "happy" and "sad" are judged, and the classification category with the highest score is taken as the classification category of the text data.
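The dictionary-based scoring of S209 is a simple tally and needs no trained model. A minimal pure-Python sketch follows; the tiny word sets here are invented examples, not the patent's actual emotion lexicon:

```python
def score_lyrics(words, category_dict):
    """Count, per emotion category, how many lyric words fall in that
    category's dictionary; the highest-scoring category wins."""
    scores = {cat: 0 for cat in category_dict}
    for w in words:
        for cat, vocab in category_dict.items():
            if w in vocab:
                scores[cat] += 1
    return max(scores, key=scores.get), scores

# toy emotion dictionaries (illustrative only)
category_dict = {
    "happy": {"sunshine", "smile", "dance"},
    "sad": {"tears", "goodbye", "rain"},
}
words = ["smile", "dance", "rain", "smile"]  # output of word segmentation
best, scores = score_lyrics(words, category_dict)
print(best, scores)  # happy {'happy': 3, 'sad': 1}
```

In practice the word list would come from a segmentation tool run over the preprocessed lyrics, and the dictionaries would cover each specified emotion category.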
In S210, it is judged whether the classification category of the audio data and the classification category of the text data are identical. If the first category information and the second category information indicate the same classification category — the target classification category — then in S211 the target classification category is taken as the category of the audio data, and an emotion category label of the target classification category is set for the audio data for convenient subsequent use. The emotion category label can serve as an attribute of the audio data, with the emotion category expressed by setting the attribute value. If the attribute value of the emotion category label of some audio data is empty, this indicates that the classification category of that audio data could not be identified and that classification recognition failed.
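S210–S211 reduce to a comparison of the two category predictions, with an empty attribute value marking a failed recognition. A sketch of that final step, using an empty string for the empty label (an assumption about the representation):

```python
def final_label(audio_category: str, text_category: str) -> str:
    """Return the agreed classification category as the emotion label, or
    an empty string (recognition failure) when the audio-based and
    text-based classifiers disagree."""
    return audio_category if audio_category == text_category else ""

print(final_label("happy", "happy"))  # happy
print(final_label("happy", "sad"))    # (empty: classification failed)
```

Requiring agreement between two independent signals is what gives the scheme its accuracy guarantee: a label is only committed when both views of the song concur.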
Through the process corresponding to Fig. 2 above, a large amount of audio data can be classified accurately, labels of classification categories can be set for the audio data, and the audio data can be stored in an audio database. In one embodiment, scenarios in which labelled audio data is used are described in detail with reference to Fig. 3, Fig. 4 and Fig. 5: Fig. 3 is a schematic structural diagram of the application system of the embodiment of the present invention, Fig. 4 is a schematic diagram of a user interface of the embodiment of the present invention, and Fig. 5 is a schematic diagram of another user interface of the embodiment of the present invention.
As shown in Fig. 3, the application scenario of the embodiment of the present invention includes user A and the intelligent terminal 301 used by user A, and user B and the intelligent terminal 302 used by user B. The intelligent terminals of the two users are connected to a server 303 on the network side. The network-side server 303 may comprise multiple servers or a single server; for convenience, the embodiment of the present invention describes it as a single server.
In one embodiment, any user can communicate with the server 303 through an intelligent terminal, sending query information to the server 303 to query for required audio data. As shown in Fig. 4, a user interface can be displayed on the intelligent terminal 301 or the intelligent terminal 302 to realise interaction with the user. In one embodiment, the user can initiate a search query for audio data on the user interface through voice or text input, so that the server 303 finds, in the audio database, audio data whose emotion category label has been set to the corresponding classification category. For example, if the word "happy" is entered, the server 303 can search the audio database for audio data carrying the emotion category label corresponding to the "happy" classification category. If multiple pieces of audio data carry the emotion category label corresponding to "happy", the server 303 can determine one piece of audio data by random selection, or determine the most recently stored piece according to the storage order, and send the determined audio data to the user as query feedback data.
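The server-side lookup just described — filter the audio database by emotion label, then fall back to either the most recently stored match or a random one — might look like the following sketch; the record layout (`emotion`, `stored_at`, `title` fields) is an assumption for illustration:

```python
import random

def query_by_emotion(database, label, newest=True, rng=None):
    """Return one audio record whose emotion category label matches `label`,
    chosen either as the most recently stored match or at random."""
    matches = [rec for rec in database if rec["emotion"] == label]
    if not matches:
        return None  # no audio data carries the requested label
    if newest:
        return max(matches, key=lambda rec: rec["stored_at"])
    return (rng or random).choice(matches)

db = [
    {"title": "Song A", "emotion": "happy", "stored_at": 1},
    {"title": "Song B", "emotion": "happy", "stored_at": 9},
    {"title": "Song C", "emotion": "sad", "stored_at": 5},
]
print(query_by_emotion(db, "happy")["title"])  # Song B
```

A real deployment would push the filter into a database query rather than scanning in memory, but the selection policy is the same.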
In one embodiment, as shown in Fig. 5, the user interface can be a session interface for chatting with a virtual robot. Based on the chat messages sent by the user on the session interface, the virtual robot automatically queries and recommends to the user audio data that has been set with the emotion category label of the corresponding classification category. For the query result, the final audio data can be determined in the random manner mentioned above or according to the storage order. Of course, other strategies for determining the audio data can also be added; for example, based on the user's historical search data, behavioural data, or user attributes, a piece of audio data suitable for the user can be determined from the multiple queried pieces of audio data — for instance, one or more pieces of audio data determined based on the user's age.
In one embodiment, two intelligent terminals can carry out instant messaging through the server. The server can be an instant messaging application server, which can establish a connection with a server that stores the audio database and provides classified queries of audio data. The audio database includes multiple pieces of audio data that have been set with emotion category labels. During a chat between user A and user B, the current emotion of user A can be determined according to one or more chat messages sent by user A; based on that emotion, the audio database is queried for audio data set with the corresponding emotion category label, and after the query result is obtained, one or more pieces of the queried audio data are displayed on the chat interface of user A. The same processing can be carried out for user B. In one embodiment, the common emotion category of user A and user B can also be determined based on the chat messages of both users; based on that emotion, the audio database is queried for audio data set with the corresponding emotion category label, and the one or more pieces of queried audio data are displayed on the session interface of user A and user B. If multiple pieces of audio data meet the requirements, one or more pieces can be selected from them by random selection or by other screening rules and presented to user A and/or user B.
The embodiment of the present invention can perform classification recognition simultaneously on the feature data of the audio data and on the associated text data such as the lyrics, and determines the classification category of the audio data only when the two recognition results are identical. In this way, the correctness of the audio-data classification can be guaranteed, so that in some application scenarios, such as music recommendation, music can be recommended to the user accurately. Moreover, the MFCC, CQT and beat features in the audio data are selected as the audio features that express emotion, which makes it possible to classify audio data by emotion relatively well. When training the classifier, instead of optimising the classifier training by learning directly from the audio features, k-means cluster analysis is first performed to obtain the cluster centre of each category; the audio feature data is then converted, based on the cluster centres, into input parameters, and the classifier is trained and optimised on those input parameters, so that a more accurate classifier can be obtained. Experiments show that, using this scheme to predict the emotion classification categories of more than 100,000 songs, the classification accuracy of categories such as "inspirational", "happy" and "sweet" reaches more than 80%, and the accuracy of the other emotion category labels is around 75%, greatly improving the classification accuracy of emotion-class music.
Referring again to Fig. 6, which is a schematic flowchart of a method for classifying file data according to an embodiment of the present invention, the method of this embodiment can be implemented by a server that handles audio data such as songs, for example the application server of a music application. In embodiments of the present invention, the file data can be audio data such as a song, or a video file that includes audio data, for example a file of a type such as a music video (Music Video, MV). The method of this embodiment includes the following steps.
S601: Obtain the text data associated with the audio data, and obtain the audio feature data of the audio data. The text data associated with the audio data can be the lyrics of the audio data, the subtitles of video data such as the MV corresponding to the audio data, or evaluation content such as comments on the audio data. It can be obtained by a web search based on the title of the audio data, obtained and saved together with the audio data itself, or recognized from the audio data by means such as speech recognition to obtain text data such as the lyrics.
In embodiments of the present invention, audio data is classified mainly according to the emotion of the user, and multiple classification categories relating to emotion are determined. On this basis, the audio feature data of the audio data mainly comprises the feature data corresponding to the MFCC, CQT and Beat features of the audio data. To ensure that the audio data can subsequently be classified by emotion more accurately, other audio features can also be added. In one embodiment, the audio feature data can be a 72-dimensional data set, which can also be called a 72-dimensional audio feature vector; this feature data set is used to represent the features of the audio data. In other embodiments, data sets of other dimensions can also be used: the more dimensions, the more accurate the description of the audio data's features; the fewer dimensions, the faster the classification and the higher the classification efficiency.
In one embodiment, only part of the audio data may be selected and the audio feature data extracted from it. Given the playing duration M of the audio data, the audio within N seconds before and after the midpoint M/2 can be selected and the audio feature data extracted from it. For example, if the playing duration is 100 seconds, the middle segment from 50-10=40 seconds to 50+10=60 seconds can be selected and the audio feature data extracted from it. Analyzing only part of the audio data effectively reduces computation time, and under normal circumstances the middle segment is the climax of the whole track and better reflects the emotional expression of the audio data.
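The middle-segment selection described above can be sketched as follows. This is a minimal illustration on raw samples; all names are hypothetical, and the actual MFCC, CQT and Beat extraction, which would typically use an audio library such as librosa, is not reproduced here.

```python
def middle_segment(samples, sample_rate, half_window_s=10):
    """Select the samples within half_window_s seconds on either side of
    the track midpoint M/2, e.g. seconds 40-60 of a 100-second track."""
    duration_s = len(samples) / sample_rate            # playing duration M
    mid_s = duration_s / 2                             # midpoint M/2
    start = int(max(0.0, mid_s - half_window_s) * sample_rate)
    end = int(min(duration_s, mid_s + half_window_s) * sample_rate)
    return samples[start:end]

# A mock 100-second track at 8 kHz: the selected segment covers 40 s to 60 s.
sr = 8000
track = list(range(100 * sr))
segment = middle_segment(track, sr)
print(segment[0] // sr, (segment[-1] + 1) // sr)  # -> 40 60
```

Feature extraction would then be run only on `segment`, which is what keeps the computation time down.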
S602: Perform classification and identification on the audio data according to a classifier and the audio feature data, determine the classification category of the audio data, and obtain first category information. The classifier can be an SVM classifier generated with the SVM algorithm, obtained in advance by training and optimization on a large amount of audio data and the specified classification categories. In this embodiment of the invention, the classifier can take as input the similarity data obtained from the audio feature data and the cluster center data, output the probability that the audio data belongs to each category, and determine the classification category of the audio data accordingly, obtaining the first category information.
In one embodiment, S602 can include: calculating the similarity data between the audio feature data and the cluster center data corresponding to each specified classification category; calling the classifier to classify the calculated similarity data and determine the probability that the audio data belongs to each specified classification category; and taking the classification category with the largest probability value, provided it exceeds a preset probability threshold, as the classification category of the audio data.
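A minimal sketch of this step, with hypothetical numbers: the similarity data is taken as the Euclidean distance from the feature vector to each category's cluster center (shown here in 3 dimensions rather than 72), and a simple distance-based scoring function stands in for the trained SVM classifier, which is not reproduced here.

```python
import math

def distances_to_centers(feature_vec, centers):
    """Similarity data: Euclidean distance to each category's cluster center."""
    return {cat: math.dist(feature_vec, c) for cat, c in centers.items()}

def classify(feature_vec, centers, prob_threshold=0.5):
    """Stand-in for the trained SVM of S602: turn the distances into
    pseudo-probabilities (closer center -> higher score) and keep the best
    category only if its probability exceeds the preset threshold."""
    dists = distances_to_centers(feature_vec, centers)
    scores = {cat: math.exp(-d) for cat, d in dists.items()}
    total = sum(scores.values())
    probs = {cat: s / total for cat, s in scores.items()}
    best = max(probs, key=probs.get)
    return best if probs[best] > prob_threshold else None

centers = {"happy": [1.0, 0.0, 0.0], "sorrow": [0.0, 1.0, 1.0]}  # hypothetical
print(classify([0.9, 0.1, 0.0], centers))  # -> happy
```

The threshold check is what lets the method refuse a label when no category is a clearly better fit than the others.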
S603: Perform classification analysis on the words included in the text content of the text data, determine the classification category to which the text data belongs, and obtain second category information. The text content of the text data can be preprocessed by deleting unrecognizable symbols and punctuation marks, after which the remaining text content is segmented into words to obtain a word list containing multiple words. Each word in the word list is then identified against a preset classification dictionary, and the classification category of the text data is determined according to the number of words included in each category, obtaining the second category information.
In one embodiment, S603 can specifically include: performing word segmentation on the text content of the text data to obtain a word set; looking up, in the classification dictionary, the category to which each word in the word set belongs; scoring each category according to the number of words it includes; and determining the classification category to which the text data belongs according to the scoring result, obtaining the second category information.
The classification dictionary can take the form shown in Table 1 below.

Word | Category |
Happy | "happy" |
Joyful | "happy" |
Anxious | "sorrow" |
Gloomy | "sorrow" |
…… | …… |
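The dictionary lookup and scoring can be sketched as follows. The entries mirror Table 1 and are illustrative; the word segmentation itself, which for Chinese text would require a segmentation tool, is assumed to have been done already.

```python
from collections import Counter

# Classification dictionary in the form of Table 1 (illustrative entries).
CLASS_DICT = {"happy": "happy", "joyful": "happy",
              "anxious": "sorrow", "gloomy": "sorrow"}

def classify_text(words):
    """S603: score each category by the number of its dictionary words that
    appear in the segmented text, and return the highest-scoring category."""
    scores = Counter(CLASS_DICT[w] for w in words if w in CLASS_DICT)
    return scores.most_common(1)[0][0] if scores else None

print(classify_text(["so", "happy", "and", "joyful", "today"]))  # -> happy
```

Words that do not appear in the dictionary simply contribute nothing to any category's score.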
S604: If the first category information and the second category information indicate the same classification category, determine that classification category as the classification of the audio data. Only when the category represented by the first category information and the category represented by the second category information are identical can the classification category of the audio data be uniquely determined. Based on this classification category, an emotional category label can be set for the audio data, and the labeled audio data is stored in an audio database. The emotional category label can be recorded in the audio data as an attribute of the audio data. In one embodiment, if the category represented by the first category information differs from the category represented by the second category information, the audio data can be further classified by other classification methods so that a corresponding emotional category label can still be set; alternatively, the classification of the audio data is set directly to unknown, that is, the value of the emotional category label is empty.
It should be noted that, in some embodiments, "the same classification category" means that the category indicated by the first category information and the category indicated by the second category information can be understood as the same category. For example, if the category indicated by the first category information is "sorrow" and the category indicated by the second category information is "sad", the two can still be considered to express the same classification category; that classification category can be determined as "sorrow" or "sad", and the final classification of the audio file can accordingly be determined as "sorrow" or "sad".
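The agreement check of S604 can be sketched as follows, including the equivalence of nominally different labels that express the same category; the equivalence table shown is illustrative.

```python
# Nominally different labels that express the same classification category.
EQUIVALENT = {"sad": "sorrow"}

def final_category(first_info, second_info):
    """S604: return the shared category when both recognition results agree,
    otherwise None (i.e. the emotional category label is left empty)."""
    a = EQUIVALENT.get(first_info, first_info)
    b = EQUIVALENT.get(second_info, second_info)
    return a if a == b else None

print(final_category("sorrow", "sad"))    # -> sorrow
print(final_category("happy", "sorrow"))  # -> None
```

Returning `None` on disagreement corresponds to setting the classification to unknown, or to falling back to another classification method as described above.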
Embodiments of the present invention perform classification and identification on both the characteristic data of audio data and associated text data such as lyrics, and determine the classification category of the audio data only when the two recognition results are identical. This ensures the correctness of the audio data classification, so that in certain application scenarios, such as music recommendation, music can be recommended to users accurately.
Referring again to Fig. 7, which is a schematic flowchart of a method for classified querying of file data according to an embodiment of the present invention: after the classification of the audio data has been determined by the embodiment corresponding to Fig. 6, an emotional category label representing that classification is set for the audio data, and the labeled audio data is stored in the audio database. The method of this embodiment includes the following steps.
S701: After a chat message is received from a session interface, determine the classification category of the chat message. The chat message can be an interactive message between two users based on an instant messaging application, or a chat message exchanged between a user and a robot in a music application. In embodiments of the present invention, a practical music application is realized, implemented by the user's intelligent terminal together with a server on the network side. The above-mentioned audio database, storing audio data labeled with emotional category labels, is provided in the network-side server; the classification categories of the various audio data in the audio database can be determined as described in the above embodiments. The server stores the audio database and provides a query service to the user side; after the music application client has been installed, the intelligent terminal can query and receive audio data through various feasible user interfaces. In one embodiment, the network-side server can also provide the audio data query service to other application servers, for example providing the query function to an instant messaging application server.
The classification category of the chat message can likewise be determined with respect to the specified classification categories described above. In one embodiment, one or more chat messages can first be preprocessed to remove unrecognizable characters and punctuation marks, then segmented with a word segmentation tool to obtain the words of the chat; the category of each word is then determined based on the classification dictionary mentioned above, and the classification category of the chat message is determined according to the number of words included in each category. The more chat messages are analyzed, the more accurate the sentiment analysis of the chatting user.
S702: Search the audio database for target audio data, where the category represented by the label of the target audio data is identical to the classification category of the chat message. Based on the emotional category labels of the audio data in the audio database, a query is made with the classification category of the chat message, and one or more audio data items are found. If there is only one such audio data item, it is used directly as the target audio data. If there are several, one audio data item can be selected from them as the target audio data according to a screening rule; the screening rule can be, for example, random selection, a rule based on the order in which the emotional category labels were set for the audio data, or a rule based on user attributes.
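A minimal sketch of this query-and-screen step, using random selection as the screening rule; the mock database and its field names are illustrative.

```python
import random

AUDIO_DB = [  # labeled audio data stored by the Fig. 6 embodiment (mock)
    {"id": "song-1", "label": "happy"},
    {"id": "song-2", "label": "sorrow"},
    {"id": "song-3", "label": "happy"},
]

def find_target(chat_category, rng=random):
    """S702: return one audio item whose emotional label matches the chat
    category - the only match directly, or, when there are several,
    one match chosen by the screening rule (here: random selection)."""
    matches = [a for a in AUDIO_DB if a["label"] == chat_category]
    if not matches:
        return None
    return matches[0] if len(matches) == 1 else rng.choice(matches)

print(find_target("sorrow")["id"])  # -> song-2
```

Replacing `rng.choice` with a sort on label-creation time or a user-attribute score gives the other screening rules mentioned above.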
S703: Display the identification information of the target audio data on the session interface. Only the identification information used to represent the target audio data is displayed on the session interface, for example the identification content shown on the interfaces corresponding to Fig. 4 and Fig. 5.
S704: If a selection event on the identification information is received, find the target audio data and call an audio player to play the target audio data. The identification information displayed on the session interface is configured with click-response logic: after the user's click operation is detected, a selection event is received and the target audio data is found according to the identification information. Alternatively, the identification information can further include, without displaying it, the storage address of the target audio data; after the user's click operation is detected, the target audio data can be opened directly according to the storage address and played in the audio player.
Through the above classification training and classification identification of audio data, embodiments of the present invention can classify audio data accurately, and can quickly and accurately provide the user, for example while the user is chatting, with music that matches the emotion currently expressed through the chat, facilitating the promotion of music.
Referring again to Fig. 8, which is a schematic flowchart of a method for training a classifier according to an embodiment of the present invention, the method of this embodiment can likewise be performed by a server. The method includes the following steps.
S801: Obtain an audio training data set, and obtain the audio feature training data of the audio training data included in the audio training data set. A large amount of audio data can be obtained as audio training data to form the audio training data set. The audio training data can be obtained from other audio databases or downloaded from large music websites, and can itself correspond to the classification categories specified by the embodiment of the present invention. For example, the embodiment of the present invention divides categories mainly by emotion, including categories such as "happy" and "sorrow"; the audio training data obtained can accordingly be "happy", brisk audio and "sorrowful", sad audio, so that the classifier can be trained and optimized better. A classifier trained with such audio training data can classify subsequent audio data into the specified emotion categories better and more accurately.
The audio feature training data obtained mainly refers to any one or more of the mel-frequency cepstral coefficient feature data, the constant-Q transform discrete feature data and the audio rhythm feature data of the audio training data. The audio feature training data can be the 72-dimensional (or other-dimensional) data set mentioned above.
S802: Perform cluster calculation on the obtained audio feature training data according to at least two specified classification categories, and obtain the audio feature training data set of a target category among the at least two classification categories. The algorithm on which the cluster calculation is based can be the k-means algorithm: clustering is performed with k-means and cluster center data is calculated, where the cluster center data can likewise be a data set of the corresponding 72 (or other) dimensions.
S803: Train an initial classifier according to the audio feature training data included in the audio feature training data set, and obtain the classifier used for classifying audio data.
In one embodiment, S803 can specifically include: obtaining the cluster center data of the target category according to the audio feature training data included in the audio feature training data set; determining the similarity training data of target audio feature training data, where the similarity training data is used to represent the similarity between the target audio feature training data in the audio feature training data set and the cluster center data; calling the initial classifier to classify the similarity training data and determine the training category of the audio training data corresponding to the target audio feature training data; and updating the initial classifier according to the training category, obtaining the classifier used for classifying audio data. In one embodiment, the similarity training data is formed from the Euclidean distance data between the target audio feature training data and the cluster center data. In one embodiment, each dimensional value in the cluster center data can be an average value. For example, if 1000 audio feature training data items are clustered under the "happy" category, the value of the first dimension of the cluster center data is the average of the first-dimension values of those 1000 audio feature training data items, and so on, giving the corresponding N-dimensional cluster center data.
In one embodiment, at least two audio feature training data items in the audio feature training data set are each taken as target audio feature training data, and the training categories of the audio training data corresponding to the at least two audio feature training data items are obtained. Updating the initial classifier according to the training category then includes: determining the recognition success rate of the initial classifier according to the obtained training categories, and updating the initial classifier if the recognition success rate is lower than a preset threshold. The recognition success rate is determined by comparing the training category with the labeled category marked on the audio training data corresponding to the target audio feature training data: if the training category is identical to the labeled category, recognition succeeds; if not, recognition fails. The labeled categories can be marked manually, labeling each audio training data item with one of the specified classification categories, which facilitates the subsequent statistics of the success rate.
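The success-rate check that drives the update decision can be sketched as follows; the threshold value used here is illustrative.

```python
def recognition_success_rate(training_categories, labeled_categories):
    """Fraction of training items whose predicted training category
    matches the manually labeled category."""
    hits = sum(t == m for t, m in zip(training_categories, labeled_categories))
    return hits / len(labeled_categories)

def needs_update(training_categories, labeled_categories, threshold=0.8):
    """Update the initial classifier when the recognition success rate
    falls below the preset threshold, as described above."""
    return recognition_success_rate(training_categories,
                                    labeled_categories) < threshold

predicted = ["happy", "sorrow", "happy", "happy"]
labeled   = ["happy", "sorrow", "sorrow", "happy"]
print(recognition_success_rate(predicted, labeled))  # -> 0.75
print(needs_update(predicted, labeled))              # -> True
```

Training would iterate: classify the similarity training data, measure the success rate, and update the classifier until the rate reaches the threshold.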
When training the classifier, embodiments of the present invention do not optimize the classifier training by learning directly from the audio features; instead, cluster analysis is first performed with the k-means algorithm to obtain the cluster center of each category, the audio feature data is then converted into input parameters based on the cluster centers, and the classifier is trained and optimized on those input parameters, so that a more accurate classifier can be obtained.
Referring again to Fig. 9, which is a schematic structural diagram of an apparatus for classifying file data according to an embodiment of the present invention, the apparatus of this embodiment can be provided in a server, for example a server capable of providing classification analysis and querying of audio data. The file data includes audio data, which can be, for example, MP3 data or MV data. The apparatus includes the following modules.
An acquisition module 901, configured to obtain the text data associated with the audio data and obtain the audio feature data of the audio data;
A feature classification module 902, configured to perform classification and identification on the audio data according to a classifier and the audio feature data, determine the classification category of the audio data, and obtain first category information;
A text classification module 903, configured to perform classification analysis on the words included in the text content of the text data, determine the classification category to which the text data belongs, and obtain second category information;
A determining module 904, configured to determine, if the first category information and the second category information indicate the same classification category, that classification category as the classification of the audio data.
In one embodiment, the apparatus can also include:
A training module 905, configured to obtain an audio training data set and obtain the audio feature training data of the audio training data included in the audio training data set; perform cluster calculation on the obtained audio feature training data according to at least two specified classification categories, obtaining the audio feature training data set of a target category among the at least two classification categories; and train an initial classifier according to the audio feature training data included in the audio feature training data set, obtaining the classifier used for classifying audio data.
In one embodiment, when training the initial classifier according to the audio feature training data included in the audio feature training data set, the training module 905 is configured to obtain the cluster center data of the target category according to the audio feature training data included in the audio feature training data set; determine the similarity training data of target audio feature training data, where the similarity training data is used to represent the similarity between the target audio feature training data in the audio feature training data set and the cluster center data; call the initial classifier to classify the similarity training data and determine the training category of the audio training data corresponding to the target audio feature training data; and update the initial classifier according to the training category, obtaining the classifier used for classifying audio data.
In one embodiment, at least two audio feature training data items in the audio feature training data set can each be taken as target audio feature training data, and the training categories of the audio training data corresponding to the at least two audio feature training data items are obtained. When updating the initial classifier according to the training category, the training module 905 is configured to determine the recognition success rate of the initial classifier according to the obtained training categories, and to update the initial classifier if the recognition success rate is lower than a preset threshold; the recognition success rate is determined by comparing the training category with the labeled category marked on the audio training data corresponding to the target audio feature training data: if the training category is identical to the labeled category, recognition succeeds; if not, recognition fails.
In one embodiment, the similarity training data is formed from the Euclidean distance data between the target audio feature training data and the cluster center data.
In one embodiment, the feature classification module 902 is configured to calculate the similarity data between the audio feature data and the cluster center data corresponding to each specified classification category; call the classifier to classify the calculated similarity data and determine the probability that the audio data belongs to each specified classification category; and take the classification category with the largest probability value, provided it exceeds a preset probability threshold, as the classification category of the audio data.
In one embodiment, the text classification module 903 is configured to perform word segmentation on the text content of the text data to obtain a word set; look up, in the classification dictionary, the category to which each word in the word set belongs; score each category according to the number of words it includes; and determine the classification category to which the text data belongs according to the scoring result, obtaining second category information.
In one embodiment, classifying the file data includes classifying the audio data according to specified categories used to represent emotion, and the obtained audio feature data of the audio data includes any one or more of the selected mel-frequency cepstral coefficient feature data, the constant-Q transform discrete feature data and the audio rhythm feature data.
In one embodiment, after the classification of the audio data has been determined, a label representing that category can be set for the audio data, and the labeled audio data is stored in the audio database. The apparatus can also include: an interaction module 906, configured to determine, after a chat message is received from a session interface, the classification category of the chat message; search the audio database for target audio data, where the category represented by the label of the target audio data is identical to the classification category of the chat message; and display the identification information of the target audio data on the session interface.
In one embodiment, the interaction module 906 is further configured to, if a selection event on the identification information is received, find the target audio data and call an audio player to play the target audio data.
Embodiments of the present invention can perform classification and identification on both the characteristic data of audio data and associated text data such as lyrics, which effectively ensures the correctness of the audio data classification, so that in certain application scenarios, such as music recommendation, music can be recommended to users accurately. Moreover, the special feature extraction and classifier training methods used can yield a more accurate classifier. Experiments show that when this scheme was used to predict the emotion classification categories of more than 100,000 songs, the accuracy for categories such as inspirational, happy and sweet reached more than 80%, and the accuracy for the other emotional category labels reached around 75%, greatly improving the classification accuracy of emotional music.
Referring again to Figure 10, which is a schematic structural diagram of a server according to an embodiment of the present invention, the server of this embodiment can be a server that provides processing related to audio data classification and/or, as needed, functions such as classified storage and querying of audio data. The server includes the necessary housing structures, as well as a power supply, communication interfaces and the like. The server further includes a processor 1001, a storage device 1002, an input interface 1003 and an output interface 1004.
The input interface 1003 can be a user interface provided for the user to input data such as the audio data to be classified or the audio training data used to train and optimize the classifier. The output interface 1004 can be a network interface through which the audio data found is sent to the user in response to the user's audio data request; the output interface 1004 can also be a storage interface through which the audio data labeled with the corresponding emotional category labels is stored on some other server.
The storage device 1002 can include volatile memory, for example random-access memory (RAM); the storage device 1002 can also include non-volatile memory, for example flash memory or a solid-state drive (SSD); the storage device 1002 can also include a combination of the above kinds of memory.
The processor 1001 can be a central processing unit (CPU). The processor 1001 can further include a hardware chip. In one embodiment, the above hardware chip can be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or the like. The above PLD can be a field-programmable gate array (FPGA), generic array logic (GAL) or the like.
In one embodiment, the storage device 1002 stores program instructions, and the processor 1001 calls the program instructions stored in the storage device 1002 to perform the related methods and steps mentioned in each of the above embodiments.
In one embodiment, the processor 1001 calls the program instructions stored in the storage device 1002 to: obtain the text data associated with the audio data and obtain the audio feature data of the audio data; perform classification and identification on the audio data according to a classifier and the audio feature data, determine the classification category of the audio data, and obtain first category information; perform classification analysis on the words included in the text content of the text data, determine the classification category to which the text data belongs, and obtain second category information; and, if the first category information and the second category information indicate the same classification category, determine that classification category as the classification of the audio data.
In one embodiment, the processor 1001 is further configured to obtain an audio training data set and obtain the audio feature training data of the audio training data included in the audio training data set; perform cluster calculation on the obtained audio feature training data according to at least two specified classification categories, obtaining the audio feature training data set of a target category among the at least two classification categories; and train an initial classifier according to the audio feature training data included in the audio feature training data set, obtaining the classifier used for classifying audio data.
In one embodiment, when training the initial classifier according to the audio feature training data included in the audio feature training data set, the processor 1001 is configured to obtain the cluster center data of the target category according to the audio feature training data included in the audio feature training data set; determine the similarity training data of target audio feature training data, where the similarity training data is used to represent the similarity between the target audio feature training data in the audio feature training data set and the cluster center data; call the initial classifier to classify the similarity training data and determine the training category of the audio training data corresponding to the target audio feature training data; and update the initial classifier according to the training category, obtaining the classifier used for classifying audio data.
In one embodiment, at least two pieces of audio feature training data in the audio feature training data set may each be taken as target audio feature training data, to obtain training categories of the audio training data corresponding to the at least two pieces of audio feature training data. When updating the initial classifier according to the training categories, the processor 1001 is configured to: determine a recognition success rate of the initial classifier according to the obtained training categories; and update the initial classifier if the recognition success rate is below a preset threshold. The recognition success rate is determined by comparing each training category with the annotated category marked for the corresponding audio training data: if the training category is identical to the annotated category, recognition succeeds; otherwise, recognition fails.
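The recognition-success-rate check can be sketched directly. The threshold value `0.9` and the toy category lists are illustrative assumptions only.

```python
import numpy as np

def recognition_success_rate(train_cats, marked_cats):
    """Fraction of training categories that match the annotated categories."""
    train_cats = np.asarray(train_cats)
    marked_cats = np.asarray(marked_cats)
    return float((train_cats == marked_cats).mean())

train_cats  = [0, 1, 1, 0, 1]    # categories predicted during training
marked_cats = [0, 1, 0, 0, 1]    # human-annotated categories

rate = recognition_success_rate(train_cats, marked_cats)
THRESHOLD = 0.9                  # hypothetical preset threshold
needs_update = rate < THRESHOLD  # update the initial classifier if below threshold
```

Here four of five predictions match, so the 0.8 success rate falls below the threshold and the initial classifier would be updated and re-evaluated.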
In one embodiment, the similarity training data consists of Euclidean distance data between the target audio feature training data and the cluster center data.
In one embodiment, when performing classification recognition on the audio data according to the classifier and the audio feature data to determine the classification category of the audio data, the processor 1001 is configured to: calculate similarity data between the audio feature data and the cluster center data corresponding to the specified classification categories; invoke the classifier to classify the calculated similarity data, to determine the probability that the audio data belongs to each specified classification category; and take the classification category whose probability value is the largest and greater than a preset probability threshold as the classification category of the audio data.
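A sketch of this inference step, under stated assumptions: the patent does not specify how distances become probabilities, so a softmax over negative distances stands in for the trained classifier, and the 0.6 probability threshold is hypothetical.

```python
import numpy as np

def classify(feature, centers, prob_threshold=0.6):
    """Turn distances to the per-category cluster centers into category
    probabilities and keep the top category only if it clears the threshold."""
    d = np.linalg.norm(centers - feature, axis=1)       # similarity data
    # Hypothetical classifier: softmax over negative distances.
    logits = -d
    p = np.exp(logits - logits.max())
    p /= p.sum()
    best = int(p.argmax())
    return best if p[best] > prob_threshold else None   # None: no confident category

centers = np.array([[0.0, 0.0], [5.0, 5.0]])            # e.g. "sad" vs "happy"
cat = classify(np.array([0.2, -0.1]), centers)          # near the first center
```

A feature vector equidistant from all centers yields near-uniform probabilities, so no category clears the threshold and the audio is left unclassified.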
In one embodiment, when performing classification analysis on the words included in the text content of the text data to determine the classification category of the text data and obtain the second category information, the processor 1001 is configured to: perform word segmentation on the text content of the text data to obtain a word set; look up, in a classification dictionary, the categories to which the words included in the word set belong; and score each category according to the number of words it includes, determine the classification category of the text data according to the scores, and obtain the second category information.
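The dictionary-scoring step can be sketched as follows. Whitespace splitting is a crude stand-in for a real word segmenter (e.g. for Chinese lyrics), and the small `category_dict` is purely illustrative.

```python
def classify_text(text, category_dict):
    """Score each category by how many segmented words fall under it."""
    words = text.lower().split()          # crude stand-in for word segmentation
    scores = {}
    for w in words:
        cat = category_dict.get(w)
        if cat is not None:
            scores[cat] = scores.get(cat, 0) + 1
    if not scores:
        return None                       # no dictionary word found
    return max(scores, key=scores.get)    # category with the highest score

# Hypothetical classification dictionary mapping lyric words to emotion categories.
category_dict = {"tears": "sad", "cry": "sad", "smile": "happy", "sunshine": "happy"}
second_category = classify_text("tears and cry under sunshine", category_dict)
```

Here "sad" scores 2 and "happy" scores 1, so the second category information is "sad".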
In one embodiment, classifying the file data includes classifying the audio data according to specified categories representing emotions, and the obtained audio feature data of the audio data includes any one or more of: selected mel-frequency cepstral coefficient (MFCC) feature data, constant-Q transform harmonic discrete feature data, and audio rhythm feature data.
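To make the MFCC-style feature concrete, here is a deliberately rough, numpy-only sketch of the classic pipeline (framing, power spectrum, mel filterbank, log, DCT). All parameter values are illustrative; production systems use a tuned library implementation rather than this hand-rolled version.

```python
import numpy as np

def mfcc_like(signal, sr=8000, n_fft=256, hop=128, n_mels=10, n_coef=5):
    """Very rough MFCC-style features: framing -> power spectrum ->
    mel filterbank -> log -> DCT-II. A sketch, not a reference implementation."""
    # Frame the signal with a Hann window and take each frame's power spectrum.
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (frames, n_fft//2+1)

    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10 ** (m / 2595.0) - 1.0)
    pts = inv(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for j in range(1, n_mels + 1):
        l, c, r = bins[j - 1], bins[j], bins[j + 1]
        fb[j - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[j - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    logmel = np.log(power @ fb.T + 1e-10)

    # DCT-II to decorrelate the log-mel energies; keep the first n_coef coefficients.
    k = np.arange(n_coef)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi / n_mels * (n + 0.5) * k)            # (n_coef, n_mels)
    return logmel @ dct.T                                   # (frames, n_coef)

# One second of a 440 Hz tone as a toy "audio file".
t = np.arange(8000) / 8000.0
feats = mfcc_like(np.sin(2 * np.pi * 440.0 * t))
```

Each row of `feats` is one frame's compact feature vector; rows like these are what the cluster calculation and classifier above operate on.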
In one embodiment, after the classification of the audio data is determined, a label representing the category may be set for the audio data, and the labeled audio data stored in an audio database. The processor 1001 is further configured to: after a chat message is received on a session interface, determine the classification category of the chat message; search the audio database for target audio data, wherein the category represented by the label of the target audio data is identical to the classification category of the chat message; and display identification information of the target audio data on the session interface.
In one embodiment, the processor 1001 is further configured to, upon receiving a selection event on the identification information, look up the target audio data and invoke an audio player to play the target audio data.
The embodiments of the present invention can perform classification recognition on both the feature data of audio data and associated text data such as lyrics, which effectively ensures the correctness of audio data classification, so that in some application scenarios, such as music recommendation, music can be accurately recommended to users. Moreover, the particular feature extraction and classifier training approach used yields a more accurate classifier. Experiments show that, when this scheme was used to predict emotion-related classification categories for more than 100,000 songs, the accuracy for categories such as "determined", "happy", and "sweet" exceeded 80%, and the accuracy for other emotion category labels was around 75%, greatly improving the classification accuracy of emotion-class music.
Those of ordinary skill in the art will appreciate that all or part of the flows in the above embodiment methods may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The above disclosure is merely some embodiments of the present invention and certainly cannot limit the scope of the claims of the present invention. Those of ordinary skill in the art will appreciate that all or part of the flows implementing the above embodiments, and equivalent variations made according to the claims of the present invention, still fall within the scope covered by the invention.
Claims (13)
1. A method for classifying file data, wherein the file data comprises audio data, the method comprising:
obtaining text data associated with the audio data, and obtaining audio feature data of the audio data;
performing classification recognition on the audio data according to a classifier and the audio feature data, determining a classification category of the audio data, and obtaining first category information;
performing classification analysis on words included in text content of the text data, determining a classification category of the text data, and obtaining second category information; and
if the first category information and the second category information indicate an identical classification category, determining the identical classification category as the category of the audio data.
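The decision rule of claim 1 — adopt a category only when the audio-feature classifier and the text classifier agree — can be sketched in a few lines; the function name and category strings are illustrative.

```python
def final_category(first_info, second_info):
    """Adopt a category only when the audio-feature classifier (first category
    information) and the lyrics/text classifier (second category information)
    indicate the same classification category."""
    return first_info if first_info == second_info else None

agreed = final_category("happy", "happy")   # both classifiers agree
disputed = final_category("happy", "sad")   # disagreement: no category assigned
```

When the two classifiers disagree, the audio data is left without a final category rather than trusting either classifier alone.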
2. The method according to claim 1, further comprising:
obtaining an audio training data set, and obtaining audio feature training data of audio training data included in the audio training data set;
performing cluster calculation on the obtained audio feature training data according to at least two specified classification categories, to obtain an audio feature training data set of a target category among the at least two classification categories; and
training an initial classifier according to the audio feature training data included in the audio feature training data set, to obtain the classifier for classifying audio data.
3. The method according to claim 2, wherein training the initial classifier according to the audio feature training data included in the audio feature training data set comprises:
obtaining cluster center data of the target category according to the audio feature training data included in the audio feature training data set;
determining similarity training data of target audio feature training data, the similarity training data representing a similarity between the target audio feature training data in the audio feature training data set and the cluster center data;
invoking the initial classifier to classify the similarity training data, and determining a training category of the audio training data corresponding to the target audio feature training data; and
updating the initial classifier according to the training category, to obtain the classifier for classifying audio data.
4. The method according to claim 3, wherein at least two pieces of audio feature training data in the audio feature training data set are each taken as target audio feature training data, to obtain training categories of the audio training data corresponding to the at least two pieces of audio feature training data; and
updating the initial classifier according to the training category comprises:
determining a recognition success rate of the initial classifier according to the obtained training categories; and
updating the initial classifier if the recognition success rate is below a preset threshold,
wherein the recognition success rate is determined according to the training categories and annotated categories marked for the corresponding audio training data: recognition succeeds if a training category is identical to the annotated category, and fails otherwise.
5. The method according to claim 3, wherein the similarity training data consists of Euclidean distance data between the target audio feature training data and the cluster center data.
6. The method according to claim 1, wherein performing classification recognition on the audio data according to the classifier and the audio feature data, determining the classification category of the audio data, and obtaining the first category information comprises:
calculating similarity data between the audio feature data and cluster center data corresponding to specified classification categories;
invoking the classifier to classify the calculated similarity data, and determining probabilities that the audio data belongs to the specified classification categories; and
taking the classification category whose probability value is the largest and greater than a preset probability threshold as the classification category of the audio data.
7. The method according to claim 1, wherein performing classification analysis on the words included in the text content of the text data, determining the classification category of the text data, and obtaining the second category information comprises:
performing word segmentation on the text content of the text data to obtain a word set;
looking up, in a classification dictionary, categories to which the words included in the word set belong; and
scoring each category according to the number of words it includes, determining the classification category of the text data according to the scores, and obtaining the second category information.
8. The method according to claim 1, wherein classifying the file data comprises classifying the audio data according to specified categories representing emotions, and the obtained audio feature data of the audio data comprises any one or more of: selected mel-frequency cepstral coefficient feature data, constant-Q transform harmonic discrete feature data, and audio rhythm feature data.
9. The method according to any one of claims 1-8, wherein after the category of the audio data is determined, a label representing the category is set for the audio data, and the labeled audio data is stored in an audio database, the method further comprising:
after a chat message is received on a session interface, determining a classification category of the chat message;
searching the audio database for target audio data, wherein the category represented by the label of the target audio data is identical to the classification category of the chat message; and
displaying identification information of the target audio data on the session interface.
10. The method according to claim 9, further comprising:
upon receiving a selection event on the identification information, looking up the target audio data and invoking an audio player to play the target audio data.
11. An apparatus for classifying file data, wherein the file data comprises audio data, the apparatus comprising:
an acquisition module, configured to obtain text data associated with the audio data and obtain audio feature data of the audio data;
a feature classification module, configured to perform classification recognition on the audio data according to a classifier and the audio feature data, determine a classification category of the audio data, and obtain first category information;
a text classification module, configured to perform classification analysis on words included in text content of the text data, determine a classification category of the text data, and obtain second category information; and
a determining module, configured to, if the first category information and the second category information indicate an identical classification category, determine the identical classification category as the category of the audio data.
12. A server, comprising a processor and a storage device, wherein the storage device stores program instructions, and the processor invokes the program instructions stored in the storage device to perform the method for classifying file data according to any one of claims 1-10.
13. A computer storage medium, wherein the computer storage medium stores program instructions, and the program instructions, when executed, implement the method for classifying file data according to any one of claims 1-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810023498.1A CN108197282B (en) | 2018-01-10 | 2018-01-10 | File data classification method and device, terminal, server and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810023498.1A CN108197282B (en) | 2018-01-10 | 2018-01-10 | File data classification method and device, terminal, server and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108197282A true CN108197282A (en) | 2018-06-22 |
CN108197282B CN108197282B (en) | 2020-07-14 |
Family
ID=62588599
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810023498.1A Active CN108197282B (en) | 2018-01-10 | 2018-01-10 | File data classification method and device, terminal, server and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108197282B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109065071A (en) * | 2018-08-31 | 2018-12-21 | 电子科技大学 | A kind of song clusters method based on Iterative k-means Algorithm |
CN110719525A (en) * | 2019-08-28 | 2020-01-21 | 咪咕文化科技有限公司 | Bullet screen expression package generation method, electronic equipment and readable storage medium |
CN110738561A (en) * | 2019-10-15 | 2020-01-31 | 上海云从企业发展有限公司 | service management method, system, equipment and medium based on characteristic classification |
WO2020024396A1 (en) * | 2018-08-02 | 2020-02-06 | 平安科技(深圳)有限公司 | Music style recognition method and apparatus, computer device, and storage medium |
CN111142794A (en) * | 2019-12-20 | 2020-05-12 | 北京浪潮数据技术有限公司 | Method, device and equipment for classified storage of data and storage medium |
CN111339348A (en) * | 2018-12-19 | 2020-06-26 | 北京京东尚科信息技术有限公司 | Information service method, device and system |
CN111428074A (en) * | 2020-03-20 | 2020-07-17 | 腾讯科技(深圳)有限公司 | Audio sample generation method and device, computer equipment and storage medium |
CN111435369A (en) * | 2019-01-14 | 2020-07-21 | 腾讯科技(深圳)有限公司 | Music recommendation method, device, terminal and storage medium |
WO2021103401A1 (en) * | 2019-11-25 | 2021-06-03 | 深圳壹账通智能科技有限公司 | Data object classification method and apparatus, computer device and storage medium |
CN113449123A (en) * | 2021-06-28 | 2021-09-28 | 深圳市英骏利智慧照明科技有限公司 | Multi-LED display control method, system, terminal and medium |
CN113813609A (en) * | 2021-06-02 | 2021-12-21 | 腾讯科技(深圳)有限公司 | Game music style classification method and device, readable medium and electronic equipment |
CN113821630A (en) * | 2020-06-19 | 2021-12-21 | 菜鸟智能物流控股有限公司 | Data clustering method and device |
CN115910042A (en) * | 2023-01-09 | 2023-04-04 | 百融至信(北京)科技有限公司 | Method and apparatus for identifying information type of formatted audio file |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101364222A (en) * | 2008-09-02 | 2009-02-11 | 浙江大学 | Two-stage audio search method |
CN102637251A (en) * | 2012-03-20 | 2012-08-15 | 华中科技大学 | Face recognition method based on reference features |
CN105843931A (en) * | 2016-03-30 | 2016-08-10 | 广州酷狗计算机科技有限公司 | Classification method and device |
CN106060043A (en) * | 2016-05-31 | 2016-10-26 | 北京邮电大学 | Abnormal flow detection method and device |
US20170372725A1 (en) * | 2016-06-28 | 2017-12-28 | Pindrop Security, Inc. | System and method for cluster-based audio event detection |
CN107562850A (en) * | 2017-08-28 | 2018-01-09 | 百度在线网络技术(北京)有限公司 | Music recommends method, apparatus, equipment and storage medium |
2018-01-10: CN application CN201810023498.1A filed; granted as CN108197282B (active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101364222A (en) * | 2008-09-02 | 2009-02-11 | 浙江大学 | Two-stage audio search method |
CN102637251A (en) * | 2012-03-20 | 2012-08-15 | 华中科技大学 | Face recognition method based on reference features |
CN105843931A (en) * | 2016-03-30 | 2016-08-10 | 广州酷狗计算机科技有限公司 | Classification method and device |
CN106060043A (en) * | 2016-05-31 | 2016-10-26 | 北京邮电大学 | Abnormal flow detection method and device |
US20170372725A1 (en) * | 2016-06-28 | 2017-12-28 | Pindrop Security, Inc. | System and method for cluster-based audio event detection |
CN107562850A (en) * | 2017-08-28 | 2018-01-09 | 百度在线网络技术(北京)有限公司 | Music recommends method, apparatus, equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
曾向阳: "Intelligent Recognition of Underwater Targets", National Defense Industry Press, 31 March 2016 *
杨可明: "Principles and Applications of Remote Sensing", China University of Mining and Technology Press, 30 September 2016 *
陶凯云: "Research on Music Emotion Classification Based on Audio and Lyrics", China Masters' Theses Full-text Database, Information Science and Technology *
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020024396A1 (en) * | 2018-08-02 | 2020-02-06 | 平安科技(深圳)有限公司 | Music style recognition method and apparatus, computer device, and storage medium |
CN109065071A (en) * | 2018-08-31 | 2018-12-21 | 电子科技大学 | A kind of song clusters method based on Iterative k-means Algorithm |
CN109065071B (en) * | 2018-08-31 | 2021-05-14 | 电子科技大学 | Song clustering method based on iterative k-means algorithm |
CN111339348A (en) * | 2018-12-19 | 2020-06-26 | 北京京东尚科信息技术有限公司 | Information service method, device and system |
CN111435369B (en) * | 2019-01-14 | 2024-04-09 | 腾讯科技(深圳)有限公司 | Music recommendation method, device, terminal and storage medium |
CN111435369A (en) * | 2019-01-14 | 2020-07-21 | 腾讯科技(深圳)有限公司 | Music recommendation method, device, terminal and storage medium |
CN110719525A (en) * | 2019-08-28 | 2020-01-21 | 咪咕文化科技有限公司 | Bullet screen expression package generation method, electronic equipment and readable storage medium |
CN110738561A (en) * | 2019-10-15 | 2020-01-31 | 上海云从企业发展有限公司 | service management method, system, equipment and medium based on characteristic classification |
WO2021103401A1 (en) * | 2019-11-25 | 2021-06-03 | 深圳壹账通智能科技有限公司 | Data object classification method and apparatus, computer device and storage medium |
CN111142794A (en) * | 2019-12-20 | 2020-05-12 | 北京浪潮数据技术有限公司 | Method, device and equipment for classified storage of data and storage medium |
CN111428074B (en) * | 2020-03-20 | 2023-08-08 | 腾讯科技(深圳)有限公司 | Audio sample generation method, device, computer equipment and storage medium |
CN111428074A (en) * | 2020-03-20 | 2020-07-17 | 腾讯科技(深圳)有限公司 | Audio sample generation method and device, computer equipment and storage medium |
CN113821630A (en) * | 2020-06-19 | 2021-12-21 | 菜鸟智能物流控股有限公司 | Data clustering method and device |
CN113821630B (en) * | 2020-06-19 | 2023-10-17 | 菜鸟智能物流控股有限公司 | Data clustering method and device |
CN113813609A (en) * | 2021-06-02 | 2021-12-21 | 腾讯科技(深圳)有限公司 | Game music style classification method and device, readable medium and electronic equipment |
CN113813609B (en) * | 2021-06-02 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Game music style classification method and device, readable medium and electronic equipment |
CN113449123A (en) * | 2021-06-28 | 2021-09-28 | 深圳市英骏利智慧照明科技有限公司 | Multi-LED display control method, system, terminal and medium |
CN115910042A (en) * | 2023-01-09 | 2023-04-04 | 百融至信(北京)科技有限公司 | Method and apparatus for identifying information type of formatted audio file |
CN115910042B (en) * | 2023-01-09 | 2023-05-05 | 百融至信(北京)科技有限公司 | Method and device for identifying information type of formatted audio file |
Also Published As
Publication number | Publication date |
---|---|
CN108197282B (en) | 2020-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108197282A (en) | File data classification method, apparatus, terminal, server, and storage medium | |
Costa et al. | An evaluation of convolutional neural networks for music classification using spectrograms | |
CN110209844B (en) | Multimedia data matching method, device and storage medium | |
CN110209897B (en) | Intelligent dialogue method, device, storage medium and equipment | |
CN108597541A (en) | A kind of speech-emotion recognition method and system for enhancing indignation and happily identifying | |
EP1840764A1 (en) | Hybrid audio-visual categorization system and method | |
CN107481720A (en) | A kind of explicit method for recognizing sound-groove and device | |
WO2015021937A1 (en) | Method and device for user recommendation | |
CN111666400B (en) | Message acquisition method, device, computer equipment and storage medium | |
CN114117213A (en) | Recommendation model training and recommendation method, device, medium and equipment | |
CN112364168A (en) | Public opinion classification method based on multi-attribute information fusion | |
WO2021017300A1 (en) | Question generation method and apparatus, computer device, and storage medium | |
CN111178081B (en) | Semantic recognition method, server, electronic device and computer storage medium | |
CN111429157A (en) | Method, device and equipment for evaluating and processing complaint work order and storage medium | |
CN114492423A (en) | False comment detection method, system and medium based on feature fusion and screening | |
CN115293817A (en) | Advertisement text generation method and device, equipment, medium and product thereof | |
CN114461804A (en) | Text classification method, classifier and system based on key information and dynamic routing | |
CN108810625A (en) | A kind of control method for playing back of multi-medium data, device and terminal | |
CN114328800A (en) | Text processing method and device, electronic equipment and computer readable storage medium | |
CN113486143A (en) | User portrait generation method based on multi-level text representation and model fusion | |
TWI734085B (en) | Dialogue system using intention detection ensemble learning and method thereof | |
CN112446219A (en) | Chinese request text intention analysis method | |
CN115617974B (en) | Dialogue processing method, device, equipment and storage medium | |
CN110517672A (en) | User's intension recognizing method, method for executing user command, system and equipment | |
Thuseethan et al. | Multimodal deep learning framework for sentiment analysis from text-image web Data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||