CN108197282A - Classification method, apparatus, terminal, server, and storage medium for file data - Google Patents
- Publication number
- CN108197282A (application number CN201810023498.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- audio
- classification
- training
- class categories
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/685—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Abstract
The embodiments of the invention disclose a classification method, apparatus, terminal, and server for file data. The method includes: obtaining text data associated with the audio data, and obtaining audio feature data of the audio data; performing classification recognition on the audio data according to a classifier and the audio feature data, determining the classification category of the audio data, and obtaining first category information; performing classification analysis on the words contained in the text content of the text data, determining the classification category to which the text data belongs, and obtaining second category information; and, if the first category information and the second category information indicate the same classification category, determining that identical classification category as the category of the audio data. With the embodiments of the invention, the correctness of audio-data classification can be better guaranteed, so that in certain application scenarios, such as music recommendation, music can be recommended to users accurately.
Description
Technical field
The present invention relates to the field of computer application technology, and in particular to a classification method, apparatus, terminal, server, and storage medium for file data.
Background art
With the continuous improvement of living standards, people's interests have become increasingly broad. Music, as one of the most popular of those interests, is closely tied to daily life. Meanwhile, at the present stage, smart speakers of all kinds are emerging, and the quality of music playback is receiving attention from more and more people.

Today's music spans a large number of types and genres, and the quantity of music produced over the years is enormous. How to classify the audio data corresponding to music has therefore become a hot research topic for music service providers.
Summary of the invention
The embodiments of the present invention provide a classification method, apparatus, terminal, and server for file data, which can determine the category of audio data relatively accurately.
In one aspect, an embodiment of the present invention provides a classification method for file data, the file data including audio data, the method including:

obtaining text data associated with the audio data, and obtaining audio feature data of the audio data;

performing classification recognition on the audio data according to a classifier and the audio feature data, determining the classification category of the audio data, and obtaining first category information;

performing classification analysis on the words contained in the text content of the text data, determining the classification category to which the text data belongs, and obtaining second category information;

if the first category information and the second category information indicate the same classification category, determining the identical classification category as the category of the audio data.
In another aspect, an embodiment of the present invention provides a classification apparatus for file data, the file data including audio data, the apparatus including:

an acquisition module, configured to obtain text data associated with the audio data and to obtain audio feature data of the audio data;

a feature classification module, configured to perform classification recognition on the audio data according to a classifier and the audio feature data, determine the classification category of the audio data, and obtain first category information;

a text classification module, configured to perform classification analysis on the words contained in the text content of the text data, determine the classification category to which the text data belongs, and obtain second category information;

a determining module, configured to, if the first category information and the second category information indicate the same classification category, determine the identical classification category as the category of the audio data.
In yet another aspect, an embodiment of the present invention further provides a server, including a processor and a storage device; the storage device stores program instructions, and the processor calls the program instructions stored in the storage device to perform the classification method for file data described above.

Correspondingly, an embodiment of the present invention further provides a computer storage medium in which program instructions are stored; when the program instructions are executed, they are used to implement the classification method for file data described above.
The embodiments of the present invention can perform classification recognition simultaneously on the feature data of the audio data and on the associated text data such as the lyrics, and determine the classification category of the audio data only when the two recognition results are identical. In this way, the correctness of the audio-data classification can be guaranteed, so that in some application scenarios, such as music recommendation, music can be recommended to the user accurately.
Description of the drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention or of the prior art, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram of the classifier generation process according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of classifying unclassified audio data in an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an application system according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a user interface according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of another user interface according to an embodiment of the present invention;

Fig. 6 is a schematic flowchart of a classification method for file data according to an embodiment of the present invention;

Fig. 7 is a schematic flowchart of a classified query method for file data according to an embodiment of the present invention;

Fig. 8 is a schematic flowchart of a method for training a classifier according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of a classification apparatus for file data according to an embodiment of the present invention;

Fig. 10 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed description of embodiments
In the embodiments of the present invention, the audio data can be music files accompanied by text data such as lyrics or song comments. The classification category of the audio data is determined comprehensively from audio features contained in the audio data itself together with the text data of the audio data. The audio features of the audio data can be classified by a classifier trained in advance, and the text data can be classified by a category dictionary. If the classification result of the classifier and the classification result of the category dictionary are identical — the target classification category — then the target classification category can be taken as the category of the audio data.
In one embodiment, in order to improve the classification accuracy of the classifier, the classifier can on the one hand be generated by the support vector machine (SVM) algorithm, and on the other hand a clustering algorithm can be combined to process the training data, so as to train the classifier better. A large amount of audio training data can first be clustered by the clustering algorithm; the audio feature training data of each piece of audio training data is then converted based on the cluster centres, and the converted data is input into the classifier so that the classifier performs classification. The classification result is then compared with the annotation category marked when the audio training data was manually annotated: if they are identical, the classifier is considered to have classified that audio data successfully; if not, parameter optimisation can be performed on the classifier as needed, in order to complete the optimisation training of the classifier.
Refer to Fig. 1, which is a schematic diagram of the classifier generation process according to an embodiment of the present invention. The classifier can be an initial SVM classifier generated on the basis of the SVM algorithm. A large amount of audio data can be collected as audio training data, whose main function is to train the SVM classifier; the training process of the SVM classifier includes the following flow.

First, the collected audio training data is manually annotated according to the specified classification categories, directly marking the classification category to which each piece of audio training data belongs; the manually annotated categories are referred to as annotation categories. In the embodiment of the present invention, the specified classification categories can be categories used to express emotion, as needed, for example emotion categories such as "happy" and "sad".
After the manual annotation is completed, audio feature training data is extracted from the audio training data. In one embodiment, the extracted audio feature training data mainly includes: the Mel-frequency cepstral coefficients (MFCC) of the audio, constant-Q transform (CQT) feature data, and audio rhythm (beat) features. The reason for selecting these audio features is that experiments show these three kinds of feature data can express the emotional tendency of the audio relatively clearly, and thus represent the user's emotion. In other embodiments, to ensure that the emotional tendency is better reflected, other audio features can be extracted and their corresponding data added to the audio feature training data. In one embodiment, when extracting audio feature training data, feature extraction can be performed only on the audio data within 20 ms windows of the audio training data, obtaining the corresponding audio feature training data; this avoids feature extraction over the entire audio training data and effectively reduces the amount of computation. And in one embodiment, feature extraction can be performed on the audio data within a specified time range; for example, in music data, the period whose playback time lies in the middle of the track can be extracted, because under normal circumstances the intermediate period is the climax part of a piece of music and can better reflect its emotional tendency.
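The windowing strategy described above — take a segment from the middle of the track, where the climax tends to fall, and split it into short 20 ms frames for feature extraction — can be sketched as follows. The sample rate, segment length, and silent toy signal are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def middle_segment(signal: np.ndarray, sample_rate: int, seconds: float) -> np.ndarray:
    """Return a slice of `seconds` length centred on the middle of the track,
    since the climax of a song usually falls in the intermediate period."""
    n = int(seconds * sample_rate)
    mid = len(signal) // 2
    start = max(0, mid - n // 2)
    return signal[start:start + n]

def frames_20ms(segment: np.ndarray, sample_rate: int) -> np.ndarray:
    """Split the segment into non-overlapping 20 ms frames, each of which
    would then be fed to the feature extractors (MFCC, CQT, beat)."""
    frame_len = int(0.020 * sample_rate)
    usable = len(segment) - len(segment) % frame_len
    return segment[:usable].reshape(-1, frame_len)

# toy example: 10 s of silence at 16 kHz, keep 2 s around the middle
sig = np.zeros(10 * 16000)
seg = middle_segment(sig, 16000, 2.0)
frames = frames_20ms(seg, 16000)
print(frames.shape)  # (100, 320): 100 frames of 20 ms each
```

Framing only a short central segment, rather than the whole song, is what keeps the per-track computation bounded.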
After the audio feature training data has been extracted, clustering can be performed on the audio training data of each specified classification category according to the extracted audio feature training data; the clustering can be implemented with the K-means method. After clustering is completed, the cluster centre data of each cluster category is determined. In one embodiment, the number of cluster centre data in the positive and negative classes can be set to be equal, i.e. the clustering information of the positive and negative classes is given the same weight; for example, "happy" is a positive-class category and "sad" a negative-class category, and one positive-class category needs one corresponding negative-class category. Of course, in other embodiments, one positive-class category can correspond to multiple opposite negative-class categories.
The audio feature training data is then converted into similarity training data based on the cluster centre data. The cluster centre data is used to map the original audio feature training data into a new attribute space carrying the intrinsic structural information of the data. In one embodiment, the new attribute space can be measured with the Euclidean distance, converting the original audio feature training data into similarity training data, i.e. Euclidean distance data. The similarity training data serves as the new audio features of the corresponding audio training data, and can be regarded as the generic attributes of the label corresponding to that audio training data.
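The conversion step can be sketched with NumPy. The patent's worked example speaks of 72-dimensional distance data; the sketch below uses the simpler reading — an assumption on our part — in which each sample is replaced by one scalar Euclidean distance per category centre, and the toy dimensions and centres are invented:

```python
import numpy as np

def to_similarity(features: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Map raw audio feature vectors (n_samples, n_dims) to similarity
    training data: the Euclidean distance from each sample to each
    category's cluster centre."""
    # (n_samples, 1, n_dims) - (1, n_classes, n_dims) -> (n_samples, n_classes)
    return np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)

# toy example: 2-dimensional features, two category centres
centers = np.array([[0.0, 0.0], [3.0, 4.0]])
x = np.array([[3.0, 4.0]])
print(to_similarity(x, centers))  # [[5. 0.]]
```

The transformed vector — small where the sample sits near a category's centre — is what the SVM is actually trained on.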
Finally, the similarity training data obtained after conversion is input into the initial SVM classifier, which is trained to obtain the SVM classification model finally used by the present invention.
In one embodiment, take as an example the case where the specified classification categories are five categories such as "happy" and "sad", and the audio training data is determined to be 10,000 songs. Based on the five classification categories, the 10,000 songs are divided into the five specified categories by manual annotation, so that the classification category to which each song belongs is manually marked. After the manual annotation is completed, the audio feature training data of each song can be extracted. In the embodiment of the present invention, each song can be represented by audio feature training data formed of a 72-dimensional numerical vector, in which the CQT features account for 24 dimensions, the MFCC features account for 13 dimensions, the beat features account for 6 dimensions, and other features account for 29 dimensions; the audio feature training data can take a form such as (0.1, 0.11, 0.15, ..., 1.1). Based on the audio feature training data, the 10,000 songs are clustered by the k-means clustering algorithm, and the cluster centre data of the five classification categories is calculated according to the five specified categories. Suppose 1,000 songs are clustered into the "happy" classification category; by calculating an average value for each of the 72 dimensions over these 1,000 songs, the 72-dimensional cluster centre data of the "happy" classification category is obtained. After the cluster centre data is obtained, any song in the "happy" classification category is taken as the target song, and the Euclidean distance between the audio feature training data of the target song and the 72-dimensional cluster centre data of the "happy" classification category is calculated, yielding 72-dimensional Euclidean distance data (i.e. similarity training data); the similarity training data between the target song and the cluster centre data of each of the other classification categories is calculated in the same way. Each set of 72-dimensional Euclidean distance data is input into the initial SVM classifier to be trained. Based on each piece of similarity training data of the target song, the initial SVM classifier determines the probability that the target song belongs to each classification category. For example, the classification probabilities output by the initial SVM classifier for the target song might be: a probability of 50% of belonging to the "happy" classification category, and only 10% of belonging to "sad"; it may also belong to other classification categories. Since the probability of belonging to the "happy" classification category is the largest and exceeds the preset probability threshold, the target song is considered to belong to the "happy" classification category. The classification result obtained by the initial SVM classifier for the target song is compared with the annotation category marked during manual annotation; if they are identical, the classification succeeds, otherwise the classification fails.
Each of the 1,000 "happy" songs is taken in turn as the target song. After the above processing has been carried out for the 1,000 "happy" songs, the accuracy of the classification categories identified by the initial SVM classifier for these 1,000 songs is obtained. If the accuracy reaches 95% (or the error is below 5%), the initial SVM classifier is considered able to identify songs of the "happy" classification category well; otherwise, parameter optimisation is performed on the initial SVM classifier before the above training and learning is continued on the 1,000 songs. The same training processing as for the "happy" classification category is also carried out for the other classification categories such as "sad". If the average classification accuracy over all classification categories reaches a preset accuracy threshold, the SVM classifier can be considered able to classify songs well according to each specified classification category, and the SVM classifier can be deployed, so that songs of unknown category can be classified according to each specified classification category and given an emotion category label.
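Under the assumptions above, the train-and-verify loop might look like the following scikit-learn sketch. The synthetic Gaussian data stands in for the 72-dimensional song features, the class count and the 95% threshold come from the text, and everything else (random seed, blob spacing, sample counts) is invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_classes = 5  # e.g. "happy", "sad", ...

# synthetic stand-in for manually annotated 72-dimensional song features
X = np.vstack([rng.normal(loc=3.0 * k, scale=0.5, size=(200, 72))
               for k in range(n_classes)])
y = np.repeat(np.arange(n_classes), 200)

# step 1: cluster the feature data and keep the per-category cluster centres
centers = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit(X).cluster_centers_

# step 2: convert features to similarity data (Euclidean distance to each centre)
sims = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

# step 3: train the SVM on the similarity data; probability=True enables the
# per-category probability output used at prediction time
clf = SVC(probability=True, random_state=0).fit(sims, y)
accuracy = clf.score(sims, y)
print(accuracy >= 0.95)  # below threshold -> tune the SVM's parameters and retrain
```

If `accuracy` fell short of the threshold, the loop would adjust the SVM's hyperparameters and repeat, which is the "parameter optimisation" step the text describes.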
After the final SVM classifier has been obtained, in one embodiment, refer to Fig. 2, which is a schematic flowchart of classifying unclassified audio data according to an embodiment of the present invention. The method of the embodiment of the present invention can be implemented in a server capable of classifying audio data.
In S201, audio data is input; this audio data is original audio data that has not yet undergone classification processing into the aforementioned categories such as "happy" and "sad". In S202, audio features are extracted from the audio data to obtain the audio feature data of the audio data; the audio feature data can be 72-dimensional data including multiple numerical values representing audio features, as described above. In S203, cluster analysis is performed on the audio data; specifically, the k-means algorithm can be used to complete the clustering. In S204, the similarity data of the audio data is calculated: specifically, the Euclidean distance between the audio feature data and the cluster centre data of each classification category is calculated, obtaining the similarity data of the audio data under each classification category; that is, five pieces of similarity data of the audio data can be obtained, completing the construction of the classification-category attributes. The cluster centre data of each specified classification category was calculated during the classifier-training process described above. In S205, the obtained similarity data is input into the SVM classifier, which performs classification recognition and obtains a recognition result; the classification category with the largest probability that also exceeds a predetermined threshold is taken as the classification category of the audio data.
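The decision rule at the end of S205 — accept the most probable category only if its probability clears the predetermined threshold — can be sketched in a few lines. The 0.4 threshold and the label list are illustrative stand-ins; the patent does not specify their values:

```python
import numpy as np

def decide(probs: np.ndarray, labels: list, threshold: float):
    """Pick the most probable classification category, but only if its
    probability exceeds the predetermined threshold; otherwise report
    that recognition failed (None)."""
    best = int(np.argmax(probs))
    return labels[best] if probs[best] > threshold else None

labels = ["happy", "sad", "calm", "sweet", "inspirational"]
print(decide(np.array([0.50, 0.10, 0.15, 0.15, 0.10]), labels, 0.4))  # happy
print(decide(np.array([0.22, 0.20, 0.20, 0.19, 0.19]), labels, 0.4))  # None
```

The `None` branch corresponds to the failure case in which no single emotion dominates the probability output.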
In S206, the text data of the audio data is obtained. In the embodiment of the present invention, the text data refers to the lyrics data of the audio data, which can be obtained by searching the web. In the embodiment of the present invention, the classification-category prediction stage based on lyrics features is an unsupervised process and does not require training a classification model for the lyrics. In S207, the obtained lyrics are preprocessed; the preprocessing mainly includes removing punctuation marks and some unrecognisable symbols. In S208, word segmentation is performed on the lyrics to obtain multiple individual words; various effective word-segmentation tools can be used, producing the word list of the lyrics. In S209, each word obtained by segmentation is scored based on a preset category dictionary. In one embodiment, scoring can be done by judging positivity and negativity: all words in the word list are matched against the emotion dictionary. The matching rule is that if a word in the word list falls into a dictionary of positive emotion words, the song's positive emotion value is increased by 1 — for example, if it falls into the "happy" classification category, the score of the "happy" classification category is increased by 1 — and conversely, if a word falls into a dictionary of negative emotion words, the song's negative emotion value is increased by 1 — for example, if it falls into the "sad" classification category, the score of the "sad" classification category is increased by 1. Finally, the song's positive and negative emotion values are compared, i.e. the scores of the classification categories such as "happy" and "sad" are judged, and the classification category with the highest score is taken as the classification category of the text data.
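The dictionary-based scoring of S209 is a simple tally and needs no trained model. A minimal pure-Python sketch follows; the tiny word sets here are invented examples, not the patent's actual emotion lexicon:

```python
def score_lyrics(words, category_dict):
    """Count, per emotion category, how many lyric words fall in that
    category's dictionary; the highest-scoring category wins."""
    scores = {cat: 0 for cat in category_dict}
    for w in words:
        for cat, vocab in category_dict.items():
            if w in vocab:
                scores[cat] += 1
    return max(scores, key=scores.get), scores

# toy emotion dictionaries (illustrative only)
category_dict = {
    "happy": {"sunshine", "smile", "dance"},
    "sad": {"tears", "goodbye", "rain"},
}
words = ["smile", "dance", "rain", "smile"]  # output of word segmentation
best, scores = score_lyrics(words, category_dict)
print(best, scores)  # happy {'happy': 3, 'sad': 1}
```

In practice the word list would come from a segmentation tool run over the preprocessed lyrics, and the dictionaries would cover each specified emotion category.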
In S210, it is judged whether the classification category of the audio data and the classification category of the text data are identical. If the first category information and the second category information indicate the same classification category — the target classification category — then in S211 the target classification category is taken as the category of the audio data, and an emotion category label of the target classification category is set for the audio data for convenient subsequent use. The emotion category label can serve as an attribute of the audio data, with the emotion category expressed by setting the attribute value. If the attribute value of the emotion category label of some audio data is empty, this indicates that the classification category of that audio data could not be identified and that classification recognition failed.
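S210–S211 reduce to a comparison of the two category predictions, with an empty attribute value marking a failed recognition. A sketch of that final step, using an empty string for the empty label (an assumption about the representation):

```python
def final_label(audio_category: str, text_category: str) -> str:
    """Return the agreed classification category as the emotion label, or
    an empty string (recognition failure) when the audio-based and
    text-based classifiers disagree."""
    return audio_category if audio_category == text_category else ""

print(final_label("happy", "happy"))  # happy
print(final_label("happy", "sad"))    # (empty: classification failed)
```

Requiring agreement between two independent signals is what gives the scheme its accuracy guarantee: a label is only committed when both views of the song concur.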
Through the process corresponding to Fig. 2 above, a large amount of audio data can be classified accurately, labels of classification categories can be set for the audio data, and the audio data can be stored in an audio database. In one embodiment, scenarios in which labelled audio data is used are described in detail with reference to Fig. 3, Fig. 4 and Fig. 5: Fig. 3 is a schematic structural diagram of the application system of the embodiment of the present invention, Fig. 4 is a schematic diagram of a user interface of the embodiment of the present invention, and Fig. 5 is a schematic diagram of another user interface of the embodiment of the present invention.
As shown in Fig. 3, the application scenario of the embodiment of the present invention includes user A and the intelligent terminal 301 used by user A, and user B and the intelligent terminal 302 used by user B. The intelligent terminals of the two users are connected to a server 303 on the network side. The network-side server 303 may comprise multiple servers or a single server; for convenience, the embodiment of the present invention describes it as a single server.
In one embodiment, any user can communicate with the server 303 through an intelligent terminal, sending query information to the server 303 to query for required audio data. As shown in Fig. 4, a user interface can be displayed on the intelligent terminal 301 or the intelligent terminal 302 to realise interaction with the user. In one embodiment, the user can initiate a search query for audio data on the user interface through voice or text input, so that the server 303 finds, in the audio database, audio data whose emotion category label has been set to the corresponding classification category. For example, if the word "happy" is entered, the server 303 can search the audio database for audio data carrying the emotion category label corresponding to the "happy" classification category. If multiple pieces of audio data carry the emotion category label corresponding to "happy", the server 303 can determine one piece of audio data by random selection, or determine the most recently stored piece according to the storage order, and send the determined audio data to the user as query feedback data.
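The server-side lookup just described — filter the audio database by emotion label, then fall back to either the most recently stored match or a random one — might look like the following sketch; the record layout (`emotion`, `stored_at`, `title` fields) is an assumption for illustration:

```python
import random

def query_by_emotion(database, label, newest=True, rng=None):
    """Return one audio record whose emotion category label matches `label`,
    chosen either as the most recently stored match or at random."""
    matches = [rec for rec in database if rec["emotion"] == label]
    if not matches:
        return None  # no audio data carries the requested label
    if newest:
        return max(matches, key=lambda rec: rec["stored_at"])
    return (rng or random).choice(matches)

db = [
    {"title": "Song A", "emotion": "happy", "stored_at": 1},
    {"title": "Song B", "emotion": "happy", "stored_at": 9},
    {"title": "Song C", "emotion": "sad", "stored_at": 5},
]
print(query_by_emotion(db, "happy")["title"])  # Song B
```

A real deployment would push the filter into a database query rather than scanning in memory, but the selection policy is the same.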
In one embodiment, as shown in Fig. 5, the user interface can be a session interface for chatting with a virtual robot. Based on the chat messages sent by the user on the session interface, the virtual robot automatically queries and recommends to the user audio data that has been set with the emotion category label of the corresponding classification category. For the query result, the final audio data can be determined in the random manner mentioned above or according to the storage order. Of course, other strategies for determining the audio data can also be added; for example, based on the user's historical search data, behavioural data, or user attributes, a piece of audio data suitable for the user can be determined from the multiple queried pieces of audio data — for instance, one or more pieces of audio data determined based on the user's age.
In one embodiment, two intelligent terminals can carry out instant messaging through the server. The server can be an instant messaging application server, which can establish a connection with a server that stores the audio database and provides classified queries of audio data. The audio database includes multiple pieces of audio data that have been set with emotion category labels. During a chat between user A and user B, the current emotion of user A can be determined according to one or more chat messages sent by user A; based on that emotion, the audio database is queried for audio data set with the corresponding emotion category label, and after the query result is obtained, one or more pieces of the queried audio data are displayed on the chat interface of user A. The same processing can be carried out for user B. In one embodiment, the common emotion category of user A and user B can also be determined based on the chat messages of both users; based on that emotion, the audio database is queried for audio data set with the corresponding emotion category label, and the one or more pieces of queried audio data are displayed on the session interface of user A and user B. If multiple pieces of audio data meet the requirements, one or more pieces can be selected from them by random selection or by other screening rules and presented to user A and/or user B.
The embodiment of the present invention can perform classification recognition simultaneously on the feature data of the audio data and on the associated text data such as the lyrics, and determines the classification category of the audio data only when the two recognition results are identical. In this way, the correctness of the audio-data classification can be guaranteed, so that in some application scenarios, such as music recommendation, music can be recommended to the user accurately. Moreover, the MFCC, CQT and beat features in the audio data are selected as the audio features that express emotion, which makes it possible to classify audio data by emotion relatively well. When training the classifier, instead of optimising the classifier training by learning directly from the audio features, k-means cluster analysis is first performed to obtain the cluster centre of each category; the audio feature data is then converted, based on the cluster centres, into input parameters, and the classifier is trained and optimised on those input parameters, so that a more accurate classifier can be obtained. Experiments show that, using this scheme to predict the emotion classification categories of more than 100,000 songs, the classification accuracy of categories such as "inspirational", "happy" and "sweet" reaches more than 80%, and the accuracy of the other emotion category labels is around 75%, greatly improving the classification accuracy of emotion-class music.
Referring again to Fig. 6, which is a schematic flowchart of a method for classifying file data according to an embodiment of the present invention, the method of this embodiment can be implemented by a server that handles audio data such as songs, for example the application server of a music application. In embodiments of the present invention, the file data can be audio data such as a song, or a video file that includes audio data, for example a file of a type such as a music video (Music Video, MV). The method of this embodiment includes the following steps.
S601: Obtain the text data associated with the audio data, and obtain the audio feature data of the audio data. The text data associated with the audio data can be the lyrics of the audio data, the subtitles of video data such as the MV corresponding to the audio data, or evaluation content such as comments on the audio data. It can be obtained by a web search based on the title of the audio data, obtained and saved together with the audio data itself, or recognized from the audio data by means such as speech recognition to obtain text data such as the lyrics.
In embodiments of the present invention, audio data is classified mainly according to the emotion of the user, and multiple classification categories relating to emotion are determined. On this basis, the audio feature data of the audio data mainly comprises the feature data corresponding to the MFCC, CQT and Beat features of the audio data. To ensure that the audio data can subsequently be classified by emotion more accurately, other audio features can also be added. In one embodiment, the audio feature data can be a 72-dimensional data set, which can also be called a 72-dimensional audio feature vector; this feature data set is used to represent the features of the audio data. In other embodiments, data sets of other dimensions can also be used: the more dimensions, the more accurate the description of the audio data's features; the fewer dimensions, the faster the classification and the higher the classification efficiency.
In one embodiment, only part of the audio data may be selected and the audio feature data extracted from it. Given the playing duration M of the audio data, the audio within N seconds before and after the midpoint M/2 can be selected and the audio feature data extracted from it. For example, if the playing duration is 100 seconds, the middle segment from 50-10=40 seconds to 50+10=60 seconds can be selected and the audio feature data extracted from it. Analyzing only part of the audio data effectively reduces computation time, and under normal circumstances the middle segment is the climax of the whole track and better reflects the emotional expression of the audio data.
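The middle-segment selection described above can be sketched as follows. This is a minimal illustration on raw samples; all names are hypothetical, and the actual MFCC, CQT and Beat extraction, which would typically use an audio library such as librosa, is not reproduced here.

```python
def middle_segment(samples, sample_rate, half_window_s=10):
    """Select the samples within half_window_s seconds on either side of
    the track midpoint M/2, e.g. seconds 40-60 of a 100-second track."""
    duration_s = len(samples) / sample_rate            # playing duration M
    mid_s = duration_s / 2                             # midpoint M/2
    start = int(max(0.0, mid_s - half_window_s) * sample_rate)
    end = int(min(duration_s, mid_s + half_window_s) * sample_rate)
    return samples[start:end]

# A mock 100-second track at 8 kHz: the selected segment covers 40 s to 60 s.
sr = 8000
track = list(range(100 * sr))
segment = middle_segment(track, sr)
print(segment[0] // sr, (segment[-1] + 1) // sr)  # -> 40 60
```

Feature extraction would then be run only on `segment`, which is what keeps the computation time down.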
S602: Perform classification and identification on the audio data according to a classifier and the audio feature data, determine the classification category of the audio data, and obtain first category information. The classifier can be an SVM classifier generated with the SVM algorithm, obtained in advance by training and optimization on a large amount of audio data and the specified classification categories. In this embodiment of the invention, the classifier can take as input the similarity data obtained from the audio feature data and the cluster center data, output the probability that the audio data belongs to each category, and determine the classification category of the audio data accordingly, obtaining the first category information.
In one embodiment, S602 can include: calculating the similarity data between the audio feature data and the cluster center data corresponding to each specified classification category; calling the classifier to classify the calculated similarity data and determine the probability that the audio data belongs to each specified classification category; and taking the classification category with the largest probability value, provided it exceeds a preset probability threshold, as the classification category of the audio data.
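A minimal sketch of this step, with hypothetical numbers: the similarity data is taken as the Euclidean distance from the feature vector to each category's cluster center (shown here in 3 dimensions rather than 72), and a simple distance-based scoring function stands in for the trained SVM classifier, which is not reproduced here.

```python
import math

def distances_to_centers(feature_vec, centers):
    """Similarity data: Euclidean distance to each category's cluster center."""
    return {cat: math.dist(feature_vec, c) for cat, c in centers.items()}

def classify(feature_vec, centers, prob_threshold=0.5):
    """Stand-in for the trained SVM of S602: turn the distances into
    pseudo-probabilities (closer center -> higher score) and keep the best
    category only if its probability exceeds the preset threshold."""
    dists = distances_to_centers(feature_vec, centers)
    scores = {cat: math.exp(-d) for cat, d in dists.items()}
    total = sum(scores.values())
    probs = {cat: s / total for cat, s in scores.items()}
    best = max(probs, key=probs.get)
    return best if probs[best] > prob_threshold else None

centers = {"happy": [1.0, 0.0, 0.0], "sorrow": [0.0, 1.0, 1.0]}  # hypothetical
print(classify([0.9, 0.1, 0.0], centers))  # -> happy
```

The threshold check is what lets the method refuse a label when no category is a clearly better fit than the others.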
S603: Perform classification analysis on the words included in the text content of the text data, determine the classification category to which the text data belongs, and obtain second category information. The text content of the text data can be preprocessed by deleting unrecognizable symbols and punctuation marks, after which the remaining text content is segmented into words to obtain a word list containing multiple words. Each word in the word list is then identified against a preset classification dictionary, and the classification category of the text data is determined according to the number of words included in each category, obtaining the second category information.
In one embodiment, S603 can specifically include: performing word segmentation on the text content of the text data to obtain a word set; looking up, in the classification dictionary, the category to which each word in the word set belongs; scoring each category according to the number of words it includes; and determining the classification category to which the text data belongs according to the scoring result, obtaining the second category information.
The classification dictionary can take the form shown in Table 1 below.

Word | Category |
Happy | "happy" |
Joyful | "happy" |
Anxious | "sorrow" |
Gloomy | "sorrow" |
…… | …… |
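The dictionary lookup and scoring can be sketched as follows. The entries mirror Table 1 and are illustrative; the word segmentation itself, which for Chinese text would require a segmentation tool, is assumed to have been done already.

```python
from collections import Counter

# Classification dictionary in the form of Table 1 (illustrative entries).
CLASS_DICT = {"happy": "happy", "joyful": "happy",
              "anxious": "sorrow", "gloomy": "sorrow"}

def classify_text(words):
    """S603: score each category by the number of its dictionary words that
    appear in the segmented text, and return the highest-scoring category."""
    scores = Counter(CLASS_DICT[w] for w in words if w in CLASS_DICT)
    return scores.most_common(1)[0][0] if scores else None

print(classify_text(["so", "happy", "and", "joyful", "today"]))  # -> happy
```

Words that do not appear in the dictionary simply contribute nothing to any category's score.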
S604: If the first category information and the second category information indicate the same classification category, determine that classification category as the classification of the audio data. Only when the category represented by the first category information and the category represented by the second category information are identical can the classification category of the audio data be uniquely determined. Based on this classification category, an emotional category label can be set for the audio data, and the labeled audio data is stored in an audio database. The emotional category label can be recorded in the audio data as an attribute of the audio data. In one embodiment, if the category represented by the first category information differs from the category represented by the second category information, the audio data can be further classified by other classification methods so that a corresponding emotional category label can still be set; alternatively, the classification of the audio data is set directly to unknown, that is, the value of the emotional category label is empty.
It should be noted that, in some embodiments, "the same classification category" means that the category indicated by the first category information and the category indicated by the second category information can be understood as the same category. For example, if the category indicated by the first category information is "sorrow" and the category indicated by the second category information is "sad", the two can still be considered to express the same classification category; that classification category can be determined as "sorrow" or "sad", and the final classification of the audio file can accordingly be determined as "sorrow" or "sad".
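The agreement check of S604 can be sketched as follows, including the equivalence of nominally different labels that express the same category; the equivalence table shown is illustrative.

```python
# Nominally different labels that express the same classification category.
EQUIVALENT = {"sad": "sorrow"}

def final_category(first_info, second_info):
    """S604: return the shared category when both recognition results agree,
    otherwise None (i.e. the emotional category label is left empty)."""
    a = EQUIVALENT.get(first_info, first_info)
    b = EQUIVALENT.get(second_info, second_info)
    return a if a == b else None

print(final_category("sorrow", "sad"))    # -> sorrow
print(final_category("happy", "sorrow"))  # -> None
```

Returning `None` on disagreement corresponds to setting the classification to unknown, or to falling back to another classification method as described above.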
Embodiments of the present invention perform classification and identification on both the characteristic data of audio data and associated text data such as lyrics, and determine the classification category of the audio data only when the two recognition results are identical. This ensures the correctness of the audio data classification, so that in certain application scenarios, such as music recommendation, music can be recommended to users accurately.
Referring again to Fig. 7, which is a schematic flowchart of a method for classified querying of file data according to an embodiment of the present invention: after the classification of the audio data has been determined by the embodiment corresponding to Fig. 6, an emotional category label representing that classification is set for the audio data, and the labeled audio data is stored in the audio database. The method of this embodiment includes the following steps.
S701: After a chat message is received from a session interface, determine the classification category of the chat message. The chat message can be an interactive message between two users based on an instant messaging application, or a chat message exchanged between a user and a robot in a music application. In embodiments of the present invention, a practical music application is realized, implemented by the user's intelligent terminal together with a server on the network side. The above-mentioned audio database, storing audio data labeled with emotional category labels, is provided in the network-side server; the classification categories of the various audio data in the audio database can be determined as described in the above embodiments. The server stores the audio database and provides a query service to the user side; after the music application client has been installed, the intelligent terminal can query and receive audio data through various feasible user interfaces. In one embodiment, the network-side server can also provide the audio data query service to other application servers, for example providing the query function to an instant messaging application server.
The classification category of the chat message can likewise be determined with respect to the specified classification categories described above. In one embodiment, one or more chat messages can first be preprocessed to remove unrecognizable characters and punctuation marks, then segmented with a word segmentation tool to obtain the words of the chat; the category of each word is then determined based on the classification dictionary mentioned above, and the classification category of the chat message is determined according to the number of words included in each category. The more chat messages are analyzed, the more accurate the sentiment analysis of the chatting user.
S702: Search the audio database for target audio data, where the category represented by the label of the target audio data is identical to the classification category of the chat message. Based on the emotional category labels of the audio data in the audio database, a query is made with the classification category of the chat message, and one or more audio data items are found. If there is only one such audio data item, it is used directly as the target audio data. If there are several, one audio data item can be selected from them as the target audio data according to a screening rule; the screening rule can be, for example, random selection, a rule based on the order in which the emotional category labels were set for the audio data, or a rule based on user attributes.
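A minimal sketch of this query-and-screen step, using random selection as the screening rule; the mock database and its field names are illustrative.

```python
import random

AUDIO_DB = [  # labeled audio data stored by the Fig. 6 embodiment (mock)
    {"id": "song-1", "label": "happy"},
    {"id": "song-2", "label": "sorrow"},
    {"id": "song-3", "label": "happy"},
]

def find_target(chat_category, rng=random):
    """S702: return one audio item whose emotional label matches the chat
    category - the only match directly, or, when there are several,
    one match chosen by the screening rule (here: random selection)."""
    matches = [a for a in AUDIO_DB if a["label"] == chat_category]
    if not matches:
        return None
    return matches[0] if len(matches) == 1 else rng.choice(matches)

print(find_target("sorrow")["id"])  # -> song-2
```

Replacing `rng.choice` with a sort on label-creation time or a user-attribute score gives the other screening rules mentioned above.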
S703: Display the identification information of the target audio data on the session interface. Only the identification information used to represent the target audio data is displayed on the session interface, for example the identification content shown on the interfaces corresponding to Fig. 4 and Fig. 5.
S704: If a selection event on the identification information is received, find the target audio data and call an audio player to play the target audio data. The identification information displayed on the session interface is configured with click-response logic: after the user's click operation is detected, a selection event is received and the target audio data is found according to the identification information. Alternatively, the identification information can further include, without displaying it, the storage address of the target audio data; after the user's click operation is detected, the target audio data can be opened directly according to the storage address and played in the audio player.
Through the above classification training and classification identification of audio data, embodiments of the present invention can classify audio data accurately, and can quickly and accurately provide the user, for example while the user is chatting, with music that matches the emotion currently expressed through the chat, facilitating the promotion of music.
Referring again to Fig. 8, which is a schematic flowchart of a method for training a classifier according to an embodiment of the present invention, the method of this embodiment can likewise be performed by a server. The method includes the following steps.
S801: Obtain an audio training data set, and obtain the audio feature training data of the audio training data included in the audio training data set. A large amount of audio data can be obtained as audio training data to form the audio training data set. The audio training data can be obtained from other audio databases or downloaded from large music websites, and can itself correspond to the classification categories specified by the embodiment of the present invention. For example, the embodiment of the present invention divides categories mainly by emotion, including categories such as "happy" and "sorrow"; the audio training data obtained can accordingly be "happy", brisk audio and "sorrowful", sad audio, so that the classifier can be trained and optimized better. A classifier trained with such audio training data can classify subsequent audio data into the specified emotion categories better and more accurately.
The audio feature training data obtained mainly refers to any one or more of the mel-frequency cepstral coefficient feature data, the constant-Q transform discrete feature data and the audio rhythm feature data of the audio training data. The audio feature training data can be the 72-dimensional (or other-dimensional) data set mentioned above.
S802: Perform cluster calculation on the obtained audio feature training data according to at least two specified classification categories, and obtain the audio feature training data set of a target category among the at least two classification categories. The algorithm on which the cluster calculation is based can be the k-means algorithm: clustering is performed with k-means and cluster center data is calculated, where the cluster center data can likewise be a data set of the corresponding 72 (or other) dimensions.
S803: Train an initial classifier according to the audio feature training data included in the audio feature training data set, and obtain the classifier used for classifying audio data.
In one embodiment, S803 can specifically include: obtaining the cluster center data of the target category according to the audio feature training data included in the audio feature training data set; determining the similarity training data of target audio feature training data, where the similarity training data is used to represent the similarity between the target audio feature training data in the audio feature training data set and the cluster center data; calling the initial classifier to classify the similarity training data and determine the training category of the audio training data corresponding to the target audio feature training data; and updating the initial classifier according to the training category, obtaining the classifier used for classifying audio data. In one embodiment, the similarity training data is formed from the Euclidean distance data between the target audio feature training data and the cluster center data. In one embodiment, each dimensional value in the cluster center data can be an average value. For example, if 1000 audio feature training data items are clustered under the "happy" category, the value of the first dimension of the cluster center data is the average of the first-dimension values of those 1000 audio feature training data items, and so on, giving the corresponding N-dimensional cluster center data.
In one embodiment, at least two audio feature training data items in the audio feature training data set are each taken as target audio feature training data, and the training categories of the audio training data corresponding to the at least two audio feature training data items are obtained. Updating the initial classifier according to the training category then includes: determining the recognition success rate of the initial classifier according to the obtained training categories, and updating the initial classifier if the recognition success rate is lower than a preset threshold. The recognition success rate is determined by comparing the training category with the labeled category marked on the audio training data corresponding to the target audio feature training data: if the training category is identical to the labeled category, recognition succeeds; if not, recognition fails. The labeled categories can be marked manually, labeling each audio training data item with one of the specified classification categories, which facilitates the subsequent statistics of the success rate.
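The success-rate check that drives the update decision can be sketched as follows; the threshold value used here is illustrative.

```python
def recognition_success_rate(training_categories, labeled_categories):
    """Fraction of training items whose predicted training category
    matches the manually labeled category."""
    hits = sum(t == m for t, m in zip(training_categories, labeled_categories))
    return hits / len(labeled_categories)

def needs_update(training_categories, labeled_categories, threshold=0.8):
    """Update the initial classifier when the recognition success rate
    falls below the preset threshold, as described above."""
    return recognition_success_rate(training_categories,
                                    labeled_categories) < threshold

predicted = ["happy", "sorrow", "happy", "happy"]
labeled   = ["happy", "sorrow", "sorrow", "happy"]
print(recognition_success_rate(predicted, labeled))  # -> 0.75
print(needs_update(predicted, labeled))              # -> True
```

Training would iterate: classify the similarity training data, measure the success rate, and update the classifier until the rate reaches the threshold.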
When training the classifier, embodiments of the present invention do not optimize the classifier training by learning directly from the audio features; instead, cluster analysis is first performed with the k-means algorithm to obtain the cluster center of each category, the audio feature data is then converted into input parameters based on the cluster centers, and the classifier is trained and optimized on those input parameters, so that a more accurate classifier can be obtained.
Referring again to Fig. 9, which is a schematic structural diagram of an apparatus for classifying file data according to an embodiment of the present invention, the apparatus of this embodiment can be provided in a server, for example a server capable of providing classification analysis and querying of audio data. The file data includes audio data, which can be, for example, MP3 data or MV data. The apparatus includes the following modules.
An acquisition module 901, configured to obtain the text data associated with the audio data and obtain the audio feature data of the audio data;
A feature classification module 902, configured to perform classification and identification on the audio data according to a classifier and the audio feature data, determine the classification category of the audio data, and obtain first category information;
A text classification module 903, configured to perform classification analysis on the words included in the text content of the text data, determine the classification category to which the text data belongs, and obtain second category information;
A determining module 904, configured to determine, if the first category information and the second category information indicate the same classification category, that classification category as the classification of the audio data.
In one embodiment, the apparatus can also include:
A training module 905, configured to obtain an audio training data set and obtain the audio feature training data of the audio training data included in the audio training data set; perform cluster calculation on the obtained audio feature training data according to at least two specified classification categories, obtaining the audio feature training data set of a target category among the at least two classification categories; and train an initial classifier according to the audio feature training data included in the audio feature training data set, obtaining the classifier used for classifying audio data.
In one embodiment, when training the initial classifier according to the audio feature training data included in the audio feature training data set, the training module 905 is configured to obtain the cluster center data of the target category according to the audio feature training data included in the audio feature training data set; determine the similarity training data of target audio feature training data, where the similarity training data is used to represent the similarity between the target audio feature training data in the audio feature training data set and the cluster center data; call the initial classifier to classify the similarity training data and determine the training category of the audio training data corresponding to the target audio feature training data; and update the initial classifier according to the training category, obtaining the classifier used for classifying audio data.
In one embodiment, at least two audio feature training data items in the audio feature training data set can each be taken as target audio feature training data, and the training categories of the audio training data corresponding to the at least two audio feature training data items are obtained. When updating the initial classifier according to the training category, the training module 905 is configured to determine the recognition success rate of the initial classifier according to the obtained training categories, and to update the initial classifier if the recognition success rate is lower than a preset threshold; the recognition success rate is determined by comparing the training category with the labeled category marked on the audio training data corresponding to the target audio feature training data: if the training category is identical to the labeled category, recognition succeeds; if not, recognition fails.
In one embodiment, the similarity training data is formed from the Euclidean distance data between the target audio feature training data and the cluster center data.
In one embodiment, the feature classification module 902 is configured to calculate the similarity data between the audio feature data and the cluster center data corresponding to each specified classification category; call the classifier to classify the calculated similarity data and determine the probability that the audio data belongs to each specified classification category; and take the classification category with the largest probability value, provided it exceeds a preset probability threshold, as the classification category of the audio data.
In one embodiment, the text classification module 903 is configured to perform word segmentation on the text content of the text data to obtain a word set; look up, in the classification dictionary, the category to which each word in the word set belongs; score each category according to the number of words it includes; and determine the classification category to which the text data belongs according to the scoring result, obtaining second category information.
In one embodiment, classifying the file data includes classifying the audio data according to specified categories used to represent emotion, and the obtained audio feature data of the audio data includes any one or more of the selected mel-frequency cepstral coefficient feature data, the constant-Q transform discrete feature data and the audio rhythm feature data.
In one embodiment, after the classification of the audio data has been determined, a label representing that category can be set for the audio data, and the labeled audio data is stored in the audio database. The apparatus can also include: an interaction module 906, configured to determine, after a chat message is received from a session interface, the classification category of the chat message; search the audio database for target audio data, where the category represented by the label of the target audio data is identical to the classification category of the chat message; and display the identification information of the target audio data on the session interface.
In one embodiment, the interaction module 906 is further configured to, if a selection event on the identification information is received, find the target audio data and call an audio player to play the target audio data.
Embodiments of the present invention can perform classification and identification on both the characteristic data of audio data and associated text data such as lyrics, which effectively ensures the correctness of the audio data classification, so that in certain application scenarios, such as music recommendation, music can be recommended to users accurately. Moreover, the special feature extraction and classifier training methods used can yield a more accurate classifier. Experiments show that when this scheme was used to predict the emotion classification categories of more than 100,000 songs, the accuracy for categories such as inspirational, happy and sweet reached more than 80%, and the accuracy for the other emotional category labels reached around 75%, greatly improving the classification accuracy of emotional music.
Referring again to Figure 10, which is a schematic structural diagram of a server according to an embodiment of the present invention, the server of this embodiment can be a server that provides processing related to audio data classification and/or, as needed, functions such as classified storage and querying of audio data. The server includes the necessary housing structures, as well as a power supply, communication interfaces and the like. The server further includes a processor 1001, a storage device 1002, an input interface 1003 and an output interface 1004.
The input interface 1003 can be a user interface provided for the user to input data such as the audio data to be classified or the audio training data used to train and optimize the classifier. The output interface 1004 can be a network interface through which the audio data found is sent to the user in response to the user's audio data request; the output interface 1004 can also be a storage interface through which the audio data labeled with the corresponding emotional category labels is stored on some other server.
The storage device 1002 can include volatile memory, for example random-access memory (RAM); the storage device 1002 can also include non-volatile memory, for example flash memory or a solid-state drive (SSD); the storage device 1002 can also include a combination of the above kinds of memory.
The processor 1001 can be a central processing unit (CPU). The processor 1001 can further include a hardware chip. In one embodiment, the above hardware chip can be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or the like. The above PLD can be a field-programmable gate array (FPGA), generic array logic (GAL) or the like.
In one embodiment, the storage device 1002 stores program instructions, and the processor 1001 calls the program instructions stored in the storage device 1002 to perform the related methods and steps mentioned in each of the above embodiments.
In one embodiment, the processor 1001 calls the program instructions stored in the storage device 1002 to: obtain the text data associated with the audio data and obtain the audio feature data of the audio data; perform classification and identification on the audio data according to a classifier and the audio feature data, determine the classification category of the audio data, and obtain first category information; perform classification analysis on the words included in the text content of the text data, determine the classification category to which the text data belongs, and obtain second category information; and, if the first category information and the second category information indicate the same classification category, determine that classification category as the classification of the audio data.
In one embodiment, the processor 1001 is further configured to obtain an audio training data set and obtain the audio feature training data of the audio training data included in the audio training data set; perform cluster calculation on the obtained audio feature training data according to at least two specified classification categories, obtaining the audio feature training data set of a target category among the at least two classification categories; and train an initial classifier according to the audio feature training data included in the audio feature training data set, obtaining the classifier used for classifying audio data.
In one embodiment, when training the initial classifier according to the audio feature training data included in the audio feature training data set, the processor 1001 is configured to obtain the cluster center data of the target category according to the audio feature training data included in the audio feature training data set; determine the similarity training data of target audio feature training data, where the similarity training data is used to represent the similarity between the target audio feature training data in the audio feature training data set and the cluster center data; call the initial classifier to classify the similarity training data and determine the training category of the audio training data corresponding to the target audio feature training data; and update the initial classifier according to the training category, obtaining the classifier used for classifying audio data.
In one embodiment, at least two pieces of audio feature training data in the audio feature training data set may each be taken as target audio feature training data, to obtain training categories of the audio training data corresponding to the at least two pieces of audio feature training data. When updating the initial classifier according to the training categories, the processor 1001 is configured to: determine a recognition success rate of the initial classifier according to the obtained training categories; and update the initial classifier if the recognition success rate is below a preset threshold. The recognition success rate is determined by comparing each training category with the annotated category marked for the corresponding audio training data: if the training category is identical to the annotated category, recognition succeeds; otherwise, recognition fails.
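The recognition-success-rate check can be sketched directly. The threshold value `0.9` and the toy category lists are illustrative assumptions only.

```python
import numpy as np

def recognition_success_rate(train_cats, marked_cats):
    """Fraction of training categories that match the annotated categories."""
    train_cats = np.asarray(train_cats)
    marked_cats = np.asarray(marked_cats)
    return float((train_cats == marked_cats).mean())

train_cats  = [0, 1, 1, 0, 1]    # categories predicted during training
marked_cats = [0, 1, 0, 0, 1]    # human-annotated categories

rate = recognition_success_rate(train_cats, marked_cats)
THRESHOLD = 0.9                  # hypothetical preset threshold
needs_update = rate < THRESHOLD  # update the initial classifier if below threshold
```

Here four of five predictions match, so the 0.8 success rate falls below the threshold and the initial classifier would be updated and re-evaluated.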
In one embodiment, the similarity training data consists of Euclidean distance data between the target audio feature training data and the cluster center data.
In one embodiment, when performing classification recognition on the audio data according to the classifier and the audio feature data to determine the classification category of the audio data, the processor 1001 is configured to: calculate similarity data between the audio feature data and the cluster center data corresponding to the specified classification categories; invoke the classifier to classify the calculated similarity data, to determine the probability that the audio data belongs to each specified classification category; and take the classification category whose probability value is the largest and greater than a preset probability threshold as the classification category of the audio data.
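A sketch of this inference step, under stated assumptions: the patent does not specify how distances become probabilities, so a softmax over negative distances stands in for the trained classifier, and the 0.6 probability threshold is hypothetical.

```python
import numpy as np

def classify(feature, centers, prob_threshold=0.6):
    """Turn distances to the per-category cluster centers into category
    probabilities and keep the top category only if it clears the threshold."""
    d = np.linalg.norm(centers - feature, axis=1)       # similarity data
    # Hypothetical classifier: softmax over negative distances.
    logits = -d
    p = np.exp(logits - logits.max())
    p /= p.sum()
    best = int(p.argmax())
    return best if p[best] > prob_threshold else None   # None: no confident category

centers = np.array([[0.0, 0.0], [5.0, 5.0]])            # e.g. "sad" vs "happy"
cat = classify(np.array([0.2, -0.1]), centers)          # near the first center
```

A feature vector equidistant from all centers yields near-uniform probabilities, so no category clears the threshold and the audio is left unclassified.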
In one embodiment, when performing classification analysis on the words included in the text content of the text data to determine the classification category of the text data and obtain the second category information, the processor 1001 is configured to: perform word segmentation on the text content of the text data to obtain a word set; look up, in a classification dictionary, the categories to which the words included in the word set belong; and score each category according to the number of words it includes, determine the classification category of the text data according to the scores, and obtain the second category information.
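The dictionary-scoring step can be sketched as follows. Whitespace splitting is a crude stand-in for a real word segmenter (e.g. for Chinese lyrics), and the small `category_dict` is purely illustrative.

```python
def classify_text(text, category_dict):
    """Score each category by how many segmented words fall under it."""
    words = text.lower().split()          # crude stand-in for word segmentation
    scores = {}
    for w in words:
        cat = category_dict.get(w)
        if cat is not None:
            scores[cat] = scores.get(cat, 0) + 1
    if not scores:
        return None                       # no dictionary word found
    return max(scores, key=scores.get)    # category with the highest score

# Hypothetical classification dictionary mapping lyric words to emotion categories.
category_dict = {"tears": "sad", "cry": "sad", "smile": "happy", "sunshine": "happy"}
second_category = classify_text("tears and cry under sunshine", category_dict)
```

Here "sad" scores 2 and "happy" scores 1, so the second category information is "sad".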
In one embodiment, classifying the file data includes classifying the audio data according to specified categories representing emotions, and the obtained audio feature data of the audio data includes any one or more of: selected mel-frequency cepstral coefficient (MFCC) feature data, constant-Q transform harmonic discrete feature data, and audio rhythm feature data.
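To make the MFCC-style feature concrete, here is a deliberately rough, numpy-only sketch of the classic pipeline (framing, power spectrum, mel filterbank, log, DCT). All parameter values are illustrative; production systems use a tuned library implementation rather than this hand-rolled version.

```python
import numpy as np

def mfcc_like(signal, sr=8000, n_fft=256, hop=128, n_mels=10, n_coef=5):
    """Very rough MFCC-style features: framing -> power spectrum ->
    mel filterbank -> log -> DCT-II. A sketch, not a reference implementation."""
    # Frame the signal with a Hann window and take each frame's power spectrum.
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (frames, n_fft//2+1)

    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10 ** (m / 2595.0) - 1.0)
    pts = inv(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for j in range(1, n_mels + 1):
        l, c, r = bins[j - 1], bins[j], bins[j + 1]
        fb[j - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[j - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    logmel = np.log(power @ fb.T + 1e-10)

    # DCT-II to decorrelate the log-mel energies; keep the first n_coef coefficients.
    k = np.arange(n_coef)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi / n_mels * (n + 0.5) * k)            # (n_coef, n_mels)
    return logmel @ dct.T                                   # (frames, n_coef)

# One second of a 440 Hz tone as a toy "audio file".
t = np.arange(8000) / 8000.0
feats = mfcc_like(np.sin(2 * np.pi * 440.0 * t))
```

Each row of `feats` is one frame's compact feature vector; rows like these are what the cluster calculation and classifier above operate on.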
In one embodiment, after the classification of the audio data is determined, a label representing the category may be set for the audio data, and the labeled audio data stored in an audio database. The processor 1001 is further configured to: after a chat message is received on a session interface, determine the classification category of the chat message; search the audio database for target audio data, wherein the category represented by the label of the target audio data is identical to the classification category of the chat message; and display identification information of the target audio data on the session interface.
In one embodiment, the processor 1001 is further configured to, upon receiving a selection event on the identification information, look up the target audio data and invoke an audio player to play the target audio data.
The embodiments of the present invention can perform classification recognition on both the feature data of audio data and associated text data such as lyrics, which effectively ensures the correctness of audio data classification, so that in some application scenarios, such as music recommendation, music can be accurately recommended to users. Moreover, the particular feature extraction and classifier training approach used yields a more accurate classifier. Experiments show that, when this scheme was used to predict emotion-related classification categories for more than 100,000 songs, the accuracy for categories such as "determined", "happy", and "sweet" exceeded 80%, and the accuracy for other emotion category labels was around 75%, greatly improving the classification accuracy of emotion-class music.
Those of ordinary skill in the art will appreciate that all or part of the flows in the above embodiment methods may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The above disclosure is merely some embodiments of the present invention and certainly cannot limit the scope of the claims of the present invention. Those of ordinary skill in the art will appreciate that all or part of the flows implementing the above embodiments, and equivalent variations made according to the claims of the present invention, still fall within the scope covered by the invention.
Claims (13)
1. A method for classifying file data, wherein the file data comprises audio data, the method comprising:
obtaining text data associated with the audio data, and obtaining audio feature data of the audio data;
performing classification recognition on the audio data according to a classifier and the audio feature data, determining a classification category of the audio data, and obtaining first category information;
performing classification analysis on words included in text content of the text data, determining a classification category of the text data, and obtaining second category information; and
if the first category information and the second category information indicate an identical classification category, determining the identical classification category as the category of the audio data.
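The decision rule of claim 1 — adopt a category only when the audio-feature classifier and the text classifier agree — can be sketched in a few lines; the function name and category strings are illustrative.

```python
def final_category(first_info, second_info):
    """Adopt a category only when the audio-feature classifier (first category
    information) and the lyrics/text classifier (second category information)
    indicate the same classification category."""
    return first_info if first_info == second_info else None

agreed = final_category("happy", "happy")   # both classifiers agree
disputed = final_category("happy", "sad")   # disagreement: no category assigned
```

When the two classifiers disagree, the audio data is left without a final category rather than trusting either classifier alone.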
2. The method according to claim 1, further comprising:
obtaining an audio training data set, and obtaining audio feature training data of audio training data included in the audio training data set;
performing cluster calculation on the obtained audio feature training data according to at least two specified classification categories, to obtain an audio feature training data set of a target category among the at least two classification categories; and
training an initial classifier according to the audio feature training data included in the audio feature training data set, to obtain the classifier for classifying audio data.
3. The method according to claim 2, wherein training the initial classifier according to the audio feature training data included in the audio feature training data set comprises:
obtaining cluster center data of the target category according to the audio feature training data included in the audio feature training data set;
determining similarity training data of target audio feature training data, the similarity training data representing a similarity between the target audio feature training data in the audio feature training data set and the cluster center data;
invoking the initial classifier to classify the similarity training data, and determining a training category of the audio training data corresponding to the target audio feature training data; and
updating the initial classifier according to the training category, to obtain the classifier for classifying audio data.
4. The method according to claim 3, wherein at least two pieces of audio feature training data in the audio feature training data set are each taken as target audio feature training data, to obtain training categories of the audio training data corresponding to the at least two pieces of audio feature training data; and
updating the initial classifier according to the training category comprises:
determining a recognition success rate of the initial classifier according to the obtained training categories; and
updating the initial classifier if the recognition success rate is below a preset threshold,
wherein the recognition success rate is determined according to the training categories and annotated categories marked for the corresponding audio training data: recognition succeeds if a training category is identical to the annotated category, and fails otherwise.
5. The method according to claim 3, wherein the similarity training data consists of Euclidean distance data between the target audio feature training data and the cluster center data.
6. The method according to claim 1, wherein performing classification recognition on the audio data according to the classifier and the audio feature data, determining the classification category of the audio data, and obtaining the first category information comprises:
calculating similarity data between the audio feature data and cluster center data corresponding to specified classification categories;
invoking the classifier to classify the calculated similarity data, and determining probabilities that the audio data belongs to the specified classification categories; and
taking the classification category whose probability value is the largest and greater than a preset probability threshold as the classification category of the audio data.
7. The method according to claim 1, wherein performing classification analysis on the words included in the text content of the text data, determining the classification category of the text data, and obtaining the second category information comprises:
performing word segmentation on the text content of the text data to obtain a word set;
looking up, in a classification dictionary, categories to which the words included in the word set belong; and
scoring each category according to the number of words it includes, determining the classification category of the text data according to the scores, and obtaining the second category information.
8. The method according to claim 1, wherein classifying the file data comprises classifying the audio data according to specified categories representing emotions, and the obtained audio feature data of the audio data comprises any one or more of: selected mel-frequency cepstral coefficient feature data, constant-Q transform harmonic discrete feature data, and audio rhythm feature data.
9. The method according to any one of claims 1-8, wherein after the category of the audio data is determined, a label representing the category is set for the audio data, and the labeled audio data is stored in an audio database, the method further comprising:
after a chat message is received on a session interface, determining a classification category of the chat message;
searching the audio database for target audio data, wherein the category represented by the label of the target audio data is identical to the classification category of the chat message; and
displaying identification information of the target audio data on the session interface.
10. The method according to claim 9, further comprising:
upon receiving a selection event on the identification information, looking up the target audio data and invoking an audio player to play the target audio data.
11. An apparatus for classifying file data, wherein the file data comprises audio data, the apparatus comprising:
an acquisition module, configured to obtain text data associated with the audio data and obtain audio feature data of the audio data;
a feature classification module, configured to perform classification recognition on the audio data according to a classifier and the audio feature data, determine a classification category of the audio data, and obtain first category information;
a text classification module, configured to perform classification analysis on words included in text content of the text data, determine a classification category of the text data, and obtain second category information; and
a determining module, configured to, if the first category information and the second category information indicate an identical classification category, determine the identical classification category as the category of the audio data.
12. A server, comprising a processor and a storage device, wherein the storage device stores program instructions, and the processor invokes the program instructions stored in the storage device to perform the method for classifying file data according to any one of claims 1-10.
13. A computer storage medium, wherein the computer storage medium stores program instructions, and the program instructions, when executed, implement the method for classifying file data according to any one of claims 1-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810023498.1A CN108197282B (en) | 2018-01-10 | 2018-01-10 | File data classification method and device, terminal, server and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810023498.1A CN108197282B (en) | 2018-01-10 | 2018-01-10 | File data classification method and device, terminal, server and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108197282A true CN108197282A (en) | 2018-06-22 |
CN108197282B CN108197282B (en) | 2020-07-14 |
Family
ID=62588599
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810023498.1A Active CN108197282B (en) | 2018-01-10 | 2018-01-10 | File data classification method and device, terminal, server and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108197282B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109065071A (en) * | 2018-08-31 | 2018-12-21 | 电子科技大学 | A kind of song clusters method based on Iterative k-means Algorithm |
CN110719525A (en) * | 2019-08-28 | 2020-01-21 | 咪咕文化科技有限公司 | Bullet screen expression package generation method, electronic equipment and readable storage medium |
CN110738561A (en) * | 2019-10-15 | 2020-01-31 | 上海云从企业发展有限公司 | service management method, system, equipment and medium based on characteristic classification |
WO2020024396A1 (en) * | 2018-08-02 | 2020-02-06 | 平安科技(深圳)有限公司 | Music style recognition method and apparatus, computer device, and storage medium |
CN111142794A (en) * | 2019-12-20 | 2020-05-12 | 北京浪潮数据技术有限公司 | Method, device and equipment for classified storage of data and storage medium |
CN111339348A (en) * | 2018-12-19 | 2020-06-26 | 北京京东尚科信息技术有限公司 | Information service method, device and system |
CN111428074A (en) * | 2020-03-20 | 2020-07-17 | 腾讯科技(深圳)有限公司 | Audio sample generation method and device, computer equipment and storage medium |
CN111435369A (en) * | 2019-01-14 | 2020-07-21 | 腾讯科技(深圳)有限公司 | Music recommendation method, device, terminal and storage medium |
WO2021103401A1 (en) * | 2019-11-25 | 2021-06-03 | 深圳壹账通智能科技有限公司 | Data object classification method and apparatus, computer device and storage medium |
CN113449123A (en) * | 2021-06-28 | 2021-09-28 | 深圳市英骏利智慧照明科技有限公司 | Multi-LED display control method, system, terminal and medium |
CN113813609A (en) * | 2021-06-02 | 2021-12-21 | 腾讯科技(深圳)有限公司 | Game music style classification method and device, readable medium and electronic equipment |
CN113821630A (en) * | 2020-06-19 | 2021-12-21 | 菜鸟智能物流控股有限公司 | Data clustering method and device |
CN115910042A (en) * | 2023-01-09 | 2023-04-04 | 百融至信(北京)科技有限公司 | Method and apparatus for identifying information type of formatted audio file |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101364222A (en) * | 2008-09-02 | 2009-02-11 | 浙江大学 | Two-stage audio search method |
CN102637251A (en) * | 2012-03-20 | 2012-08-15 | 华中科技大学 | Face recognition method based on reference features |
CN105843931A (en) * | 2016-03-30 | 2016-08-10 | 广州酷狗计算机科技有限公司 | Classification method and device |
CN106060043A (en) * | 2016-05-31 | 2016-10-26 | 北京邮电大学 | Abnormal flow detection method and device |
US20170372725A1 (en) * | 2016-06-28 | 2017-12-28 | Pindrop Security, Inc. | System and method for cluster-based audio event detection |
CN107562850A (en) * | 2017-08-28 | 2018-01-09 | 百度在线网络技术(北京)有限公司 | Music recommends method, apparatus, equipment and storage medium |
2018-01-10: CN application CN201810023498.1A filed; granted as CN108197282B (active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101364222A (en) * | 2008-09-02 | 2009-02-11 | 浙江大学 | Two-stage audio search method |
CN102637251A (en) * | 2012-03-20 | 2012-08-15 | 华中科技大学 | Face recognition method based on reference features |
CN105843931A (en) * | 2016-03-30 | 2016-08-10 | 广州酷狗计算机科技有限公司 | Classification method and device |
CN106060043A (en) * | 2016-05-31 | 2016-10-26 | 北京邮电大学 | Abnormal flow detection method and device |
US20170372725A1 (en) * | 2016-06-28 | 2017-12-28 | Pindrop Security, Inc. | System and method for cluster-based audio event detection |
CN107562850A (en) * | 2017-08-28 | 2018-01-09 | 百度在线网络技术(北京)有限公司 | Music recommends method, apparatus, equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
曾向阳: "Intelligent Recognition of Underwater Targets", National Defense Industry Press, 31 March 2016 *
杨可明: "Principles and Applications of Remote Sensing", China University of Mining and Technology Press, 30 September 2016 *
陶凯云: "Research on Music Emotion Classification Based on Audio and Lyrics", China Masters' Theses Full-text Database, Information Science and Technology *
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020024396A1 (en) * | 2018-08-02 | 2020-02-06 | 平安科技(深圳)有限公司 | Music style recognition method and apparatus, computer device, and storage medium |
CN109065071A (en) * | 2018-08-31 | 2018-12-21 | 电子科技大学 | A kind of song clusters method based on Iterative k-means Algorithm |
CN109065071B (en) * | 2018-08-31 | 2021-05-14 | 电子科技大学 | Song clustering method based on iterative k-means algorithm |
CN111339348A (en) * | 2018-12-19 | 2020-06-26 | 北京京东尚科信息技术有限公司 | Information service method, device and system |
CN111435369B (en) * | 2019-01-14 | 2024-04-09 | 腾讯科技(深圳)有限公司 | Music recommendation method, device, terminal and storage medium |
CN111435369A (en) * | 2019-01-14 | 2020-07-21 | 腾讯科技(深圳)有限公司 | Music recommendation method, device, terminal and storage medium |
CN110719525A (en) * | 2019-08-28 | 2020-01-21 | 咪咕文化科技有限公司 | Bullet screen expression package generation method, electronic equipment and readable storage medium |
CN110738561A (en) * | 2019-10-15 | 2020-01-31 | 上海云从企业发展有限公司 | service management method, system, equipment and medium based on characteristic classification |
WO2021103401A1 (en) * | 2019-11-25 | 2021-06-03 | 深圳壹账通智能科技有限公司 | Data object classification method and apparatus, computer device and storage medium |
CN111142794A (en) * | 2019-12-20 | 2020-05-12 | 北京浪潮数据技术有限公司 | Method, device and equipment for classified storage of data and storage medium |
CN111428074B (en) * | 2020-03-20 | 2023-08-08 | 腾讯科技(深圳)有限公司 | Audio sample generation method, device, computer equipment and storage medium |
CN111428074A (en) * | 2020-03-20 | 2020-07-17 | 腾讯科技(深圳)有限公司 | Audio sample generation method and device, computer equipment and storage medium |
CN113821630A (en) * | 2020-06-19 | 2021-12-21 | 菜鸟智能物流控股有限公司 | Data clustering method and device |
CN113821630B (en) * | 2020-06-19 | 2023-10-17 | 菜鸟智能物流控股有限公司 | Data clustering method and device |
CN113813609A (en) * | 2021-06-02 | 2021-12-21 | 腾讯科技(深圳)有限公司 | Game music style classification method and device, readable medium and electronic equipment |
CN113813609B (en) * | 2021-06-02 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Game music style classification method and device, readable medium and electronic equipment |
CN113449123A (en) * | 2021-06-28 | 2021-09-28 | 深圳市英骏利智慧照明科技有限公司 | Multi-LED display control method, system, terminal and medium |
CN115910042A (en) * | 2023-01-09 | 2023-04-04 | 百融至信(北京)科技有限公司 | Method and apparatus for identifying information type of formatted audio file |
CN115910042B (en) * | 2023-01-09 | 2023-05-05 | 百融至信(北京)科技有限公司 | Method and device for identifying information type of formatted audio file |
Also Published As
Publication number | Publication date |
---|---|
CN108197282B (en) | 2020-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108197282A (en) | File data classification method, apparatus, terminal, server, and storage medium | |
Costa et al. | An evaluation of convolutional neural networks for music classification using spectrograms | |
CN110209844B (en) | Multimedia data matching method, device and storage medium | |
CN110209897B (en) | Intelligent dialogue method, device, storage medium and equipment | |
CN108597541A (en) | A kind of speech-emotion recognition method and system for enhancing indignation and happily identifying | |
EP1840764A1 (en) | Hybrid audio-visual categorization system and method | |
CN107481720A (en) | A kind of explicit method for recognizing sound-groove and device | |
WO2015021937A1 (en) | Method and device for user recommendation | |
CN111666400B (en) | Message acquisition method, device, computer equipment and storage medium | |
CN114117213A (en) | Recommendation model training and recommendation method, device, medium and equipment | |
CN112364168A (en) | Public opinion classification method based on multi-attribute information fusion | |
WO2021017300A1 (en) | Question generation method and apparatus, computer device, and storage medium | |
CN111178081B (en) | Semantic recognition method, server, electronic device and computer storage medium | |
CN111429157A (en) | Method, device and equipment for evaluating and processing complaint work order and storage medium | |
CN114492423A (en) | False comment detection method, system and medium based on feature fusion and screening | |
CN115293817A (en) | Advertisement text generation method and device, equipment, medium and product thereof | |
CN114461804A (en) | Text classification method, classifier and system based on key information and dynamic routing | |
CN108810625A (en) | A kind of control method for playing back of multi-medium data, device and terminal | |
CN114328800A (en) | Text processing method and device, electronic equipment and computer readable storage medium | |
CN113486143A (en) | User portrait generation method based on multi-level text representation and model fusion | |
TWI734085B (en) | Dialogue system using intention detection ensemble learning and method thereof | |
CN112446219A (en) | Chinese request text intention analysis method | |
CN115617974B (en) | Dialogue processing method, device, equipment and storage medium | |
CN110517672A (en) | User's intension recognizing method, method for executing user command, system and equipment | |
Thuseethan et al. | Multimodal deep learning framework for sentiment analysis from text-image web Data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||