CN112766368A - Data classification method, equipment and readable storage medium - Google Patents


Info

Publication number
CN112766368A
CN112766368A (application CN202110065077.7A)
Authority
CN
China
Prior art keywords
data
classified
subdata
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110065077.7A
Other languages
Chinese (zh)
Inventor
张聪 (Zhang Cong)
陈聪 (Chen Cong)
张超 (Zhang Chao)
严自强 (Yan Ziqiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Music Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Music Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Music Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN202110065077.7A
Publication of CN112766368A
Legal status: Pending (current)

Classifications

    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (G Physics > G06 Computing; calculating or counting > G06F Electric digital data processing > G06F 18/00 Pattern recognition > G06F 18/20 Analysing > G06F 18/24 Classification techniques)
    • G06F 18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/047 Probabilistic or stochastic networks (G06N Computing arrangements based on specific computational models > G06N 3/00 based on biological models > G06N 3/02 Neural networks > G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/084 Backpropagation, e.g. using gradient descent (G06N 3/08 Learning methods)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data classification method, data classification equipment and a readable storage medium, and relates to the field of data processing. The method comprises the following steps: acquiring data to be classified; extracting linear features of the data to be classified; inputting the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified; and normalizing the feature matrix to obtain the prediction probability that the data to be classified belongs to each class label. The data to be classified comprises n pieces of subdata, where n is an integer greater than or equal to 2. The fully-directional self-attention neural network obtains the feature matrix of the data to be classified by using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified. The invention solves the problem of low data classification prediction accuracy caused by the poor universality of data classification models in the prior art.

Description

Data classification method, equipment and readable storage medium
Technical Field
The embodiment of the invention relates to the field of data processing, in particular to a data classification method, data classification equipment and a readable storage medium.
Background
With the successful application and continued development of deep learning models in other fields, more and more research uses the spectrogram of music as the input of a deep learning model to classify music by genre. The common deep learning networks each have problems in this use: an RNN (Recurrent Neural Network) can only memorize part of a sequence and risks vanishing or exploding gradients; the pooling operations of a CNN (Convolutional Neural Network) may lose a large amount of feature information; LSTM (Long Short-Term Memory) networks and GRUs (Gated Recurrent Units) cannot identify long-distance interdependence features; and the Self-Attention mechanism cannot capture the contextual sequence information of the global structure.
Network models in the prior art are therefore poor in universality.
Disclosure of Invention
The embodiment of the invention provides a data classification method, data classification equipment and a readable storage medium, and solves the problem of low data classification prediction accuracy caused by poor universality of a data classification model in the prior art.
In order to solve the technical problem, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a data classification method, including the following steps:
acquiring data to be classified;
extracting linear features of the data to be classified;
inputting the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified;
normalizing the characteristic matrix to obtain the prediction probability of the data to be classified belonging to each class label;
the data to be classified comprises n pieces of subdata, where n is an integer greater than or equal to 2;
and the fully-directional self-attention neural network obtains the feature matrix of the data to be classified by using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified.
Optionally, after the inputting the linear feature into the fully-directional self-attention neural network, the method further includes:
and acquiring a feature space of the data to be classified, wherein the feature space is used for calculating the similarity between the data to be classified and other data.
Optionally, the inputting the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified includes:
converting the linear features into nonlinear features by using a Convolutional Neural Network (CNN), wherein the CNN contains no pooling layers;
extracting the context features among the subdata from the nonlinear features by using a bidirectional gated recurrent unit (Bi-GRU);
obtaining correlation features among the subdata according to the context features among the subdata by using a Multi-Head Self-Attention mechanism;
using an Attention mechanism, and obtaining the importance degree of each subdata on predicting the class label of the data to be classified according to the correlation characteristics;
and obtaining a feature matrix of the data to be classified according to the context features among the subdata, the correlation features among the subdata and the importance degree of each subdata on predicting the class label of the data to be classified.
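To make the above steps concrete, the following is a minimal sketch of one possible realization in PyTorch. The class name FullyDirectionalSelfAttention, all layer sizes, and the choice of standard PyTorch modules are illustrative assumptions; the patent does not publish reference code.

```python
import torch
import torch.nn as nn

class FullyDirectionalSelfAttention(nn.Module):
    """Hypothetical sketch: CNN without pooling -> Bi-GRU -> multi-head
    self-attention -> attention weighting -> fully connected layer."""
    def __init__(self, in_dim=120, hid=128, heads=8, num_classes=10):
        super().__init__()
        # CNN without pooling converts the linear features into nonlinear ones
        self.cnn = nn.Sequential(
            nn.Conv1d(in_dim, hid, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(hid, hid, kernel_size=3, padding=1), nn.ReLU())
        # Bi-GRU extracts context features among the n pieces of subdata
        self.bigru = nn.GRU(hid, hid // 2, batch_first=True, bidirectional=True)
        # multi-head self-attention captures correlation features among the subdata
        self.mha = nn.MultiheadAttention(hid, heads, batch_first=True)
        # attention layer scores the importance of each piece of subdata
        self.score = nn.Linear(hid, 1)
        self.fc = nn.Linear(hid, num_classes)

    def forward(self, x):                      # x: (batch, n, in_dim) linear features
        o = self.cnn(x.transpose(1, 2)).transpose(1, 2)          # nonlinear features O
        s, _ = self.bigru(o)                                     # context features S
        z, _ = self.mha(s, s, s)                                 # correlation features Z
        alpha = torch.softmax(self.score(z).squeeze(-1), dim=1)  # importance weights
        m = (alpha.unsqueeze(-1) * z).sum(dim=1)                 # pooled feature vector
        return torch.softmax(self.fc(m), dim=-1)                 # per-label probabilities

probs = FullyDirectionalSelfAttention()(torch.randn(2, 100, 120))  # 2 clips, 100 frames
```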
Optionally, the extracting, by using a bidirectional gated recurrent unit (Bi-GRU), the context features among the subdata from the nonlinear features includes:
extracting context features between the sub-data by:
$$\overrightarrow{h_t} = \mathrm{GRU}\big(o_t,\ \overrightarrow{h_{t-1}}\big)$$
$$\overleftarrow{h_t} = \mathrm{GRU}\big(o_t,\ \overleftarrow{h_{t-1}}\big)$$
$$s_t = w_t\,\overrightarrow{h_t} + v_t\,\overleftarrow{h_t} + b_t$$
wherein:
O is the nonlinear feature representation of the data to be classified, $O = \{o_1, o_2, \ldots, o_n\}$;
$o_t$ is the input nonlinear feature vector of the t-th subdata, $t \in [1, n]$, t a positive integer;
$\overrightarrow{h_{t-1}}$ is the output of the forward hidden-layer state of the Bi-GRU for the (t-1)-th subdata;
$\overleftarrow{h_{t-1}}$ is the output of the reverse hidden-layer state of the Bi-GRU for the (t-1)-th subdata;
$\overrightarrow{h_t}$ is the output of the forward hidden-layer state of the Bi-GRU for the t-th subdata;
$\overleftarrow{h_t}$ is the output of the reverse hidden-layer state of the Bi-GRU for the t-th subdata;
when t = 1, $\overrightarrow{h_0}$ and $\overleftarrow{h_0}$ are predefined;
$w_t$ is the weight corresponding to the forward hidden-layer state $\overrightarrow{h_t}$ of the Bi-GRU for the t-th subdata;
$v_t$ is the weight corresponding to the reverse hidden-layer state $\overleftarrow{h_t}$ of the Bi-GRU for the t-th subdata;
$b_t$ is the bias corresponding to the hidden-layer state of the t-th subdata;
$s_t$ is the feature code of the t-th subdata, $t \in [1, n]$, t a positive integer;
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features among the subdata.
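As a sanity check on the formulas above, here is a small sketch (assumed, not from the patent) that realizes them with PyTorch's built-in bidirectional GRU; the per-step weights w_t, v_t and bias b_t are taken to act elementwise, which is one possible reading of the equations.

```python
import torch
import torch.nn as nn

n, d_in, d_h = 5, 8, 4                  # n pieces of subdata, input dim, hidden dim per direction
o = torch.randn(1, n, d_in)             # nonlinear features O = {o_1 .. o_n}
bigru = nn.GRU(d_in, d_h, batch_first=True, bidirectional=True)
h, _ = bigru(o)                         # h[..., :d_h] forward states, h[..., d_h:] reverse states
w = torch.randn(n, d_h)                 # weights w_t for the forward hidden states
v = torch.randn(n, d_h)                 # weights v_t for the reverse hidden states
b = torch.randn(n, d_h)                 # biases b_t
# s_t = w_t * forward_t + v_t * reverse_t + b_t  (elementwise weighting assumed)
s = w * h[0, :, :d_h] + v * h[0, :, d_h:] + b
print(s.shape)                          # feature code S: one s_t per piece of subdata
```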
Optionally, the obtaining, by using a Multi-Head Attention mechanism, a correlation characteristic between the sub-data according to a context characteristic between the sub-data includes:
S1: according to the context features among the subdata, perform three linear transformations to obtain a query vector Q, a key vector K and a value vector V:
$$Q = W^Q S,\qquad K = W^K S,\qquad V = W^V S$$
wherein:
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features among the subdata;
$W^Q$ is the first linear transformation; Q is the query vector obtained after the feature code S of the data to be classified undergoes the first linear transformation;
$W^K$ is the second linear transformation; K is the key vector obtained after the feature code S of the data to be classified undergoes the second linear transformation;
$W^V$ is the third linear transformation; V is the value vector obtained after the feature code S of the data to be classified undergoes the third linear transformation.
S2: apply the l-th linear transformation to the query vector Q, the key vector K and the value vector V to obtain the l-th transformed query vector $Q_l$, key vector $K_l$ and value vector $V_l$:
$$Q_l = Q W^{Q_l},\qquad K_l = K W^{K_l},\qquad V_l = V W^{V_l}$$
wherein:
$W^{Q_l}$, $W^{K_l}$ and $W^{V_l}$ are all parameters defined by initialization;
l is the index of the linear transformation, $l \in [1, m]$, l a positive integer; m is the total number of segments (heads) of the multi-head attention mechanism;
$Q_l$ is the query vector after the l-th linear transformation;
$K_l$ is the key vector after the l-th linear transformation;
$V_l$ is the value vector after the l-th linear transformation.
S3: from the l-th transformed query vector $Q_l$, key vector $K_l$ and value vector $V_l$, compute the l-th attention subspace matrix $Z_l$:
$$Z_l = \mathrm{softmax}\!\left(\frac{Q_l K_l^{\top}}{\sqrt{d_{K_l}}}\right) V_l$$
wherein $d_{K_l}$ represents the dimension of the key vector $K_l$.
S4: concatenate the m attention subspace matrices $Z_l$ and multiply the result by the weight matrix $W^O$ to obtain the matrix Z of the data to be classified, wherein the matrix Z of the data to be classified contains the correlation features among the subdata.
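The four steps S1-S4 can be traced numerically with the short sketch below; every weight matrix is a random stand-in for the initialized parameters W^Q, W^K, W^V, W^{Ql}, W^{Kl}, W^{Vl} and W^O, so the sketch shows shapes and data flow rather than trained behaviour.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(S, m=4, d_k=16):
    """Sketch of steps S1-S4 with randomly initialized parameters."""
    rng = np.random.default_rng(0)
    n, d = S.shape
    Q = S @ rng.standard_normal((d, d))           # S1: query vector
    K = S @ rng.standard_normal((d, d))           # S1: key vector
    V = S @ rng.standard_normal((d, d))           # S1: value vector
    heads = []
    for _ in range(m):                            # S2: the l-th linear transformation
        Ql = Q @ rng.standard_normal((d, d_k))
        Kl = K @ rng.standard_normal((d, d_k))
        Vl = V @ rng.standard_normal((d, d_k))
        # S3: attention subspace matrix Z_l = softmax(Q_l K_l^T / sqrt(d_k)) V_l
        heads.append(softmax(Ql @ Kl.T / np.sqrt(d_k)) @ Vl)
    Z = np.concatenate(heads, axis=-1)            # S4: concatenate the m heads ...
    return Z @ rng.standard_normal((m * d_k, d))  # ... and multiply by W^O

Z = multi_head_self_attention(np.random.randn(10, 32))
print(Z.shape)   # (10, 32): correlation features among the 10 pieces of subdata
```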
Optionally, the obtaining a feature matrix of the to-be-classified data according to the context features among the sub-data, the correlation features among the sub-data, and the importance degree of each sub-data to predicting the class label probability of the to-be-classified data includes:
and carrying out nonlinear combination on the context characteristics among the subdata, the correlation characteristics among the subdata and the importance degree of each subdata on predicting the class label probability of the data to be classified to obtain a characteristic matrix of the data to be classified.
Optionally, the normalizing the feature matrix to obtain the prediction probability that the data to be classified belongs to each class of label includes:
and carrying out normalization processing on the characteristic matrix by using a softmax classifier to obtain the prediction probability of the data to be classified belonging to each class of labels.
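For illustration, the softmax normalization of the feature-matrix scores into per-label probabilities amounts to the following (the three scores are made up):

```python
import numpy as np

logits = np.array([2.0, 0.5, -1.0])            # scores for three hypothetical class labels
probs = np.exp(logits) / np.exp(logits).sum()  # softmax normalization
print(probs, probs.sum())                      # prediction probabilities; they sum to 1
```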
Optionally, the method further includes:
training a data classification model comprising an input layer, the fully-directional self-attention neural network, and an output layer.
Optionally, the training data classification model includes:
acquiring training data and a class label of the training data;
extracting linear features of the training data;
obtaining a feature space and a feature matrix of the training data by using the fully-directional self-attention neural network; outputting the correlation among the training data according to the feature space to obtain similar data of each training data;
and carrying out normalization processing on the characteristic matrix to obtain the prediction probability of the training data belonging to each class of labels.
Optionally, each training data includes n pieces of sub-data, n is an integer, and n is greater than or equal to 2;
the obtaining the feature space and the feature matrix of the training data by using the fully-directional self-attention neural network comprises:
converting the linear features of the training data into nonlinear features by using a CNN, wherein the CNN contains no pooling layers;
extracting the context features among the subdata of the training data from the nonlinear features of the training data by using a bidirectional gated recurrent unit (Bi-GRU);
obtaining correlation features among the subdata of the training data according to the context features among the subdata of the training data by using a Multi-Head Self-Attention mechanism;
and obtaining the importance degree of each subdata of the training data to predict the class label of the training data according to the correlation characteristic by using an Attention mechanism.
In a second aspect, an embodiment of the present invention further provides a data classification device, including: a memory, a processor, and a program stored on the memory and executable on the processor; the processor is configured to read a program in the memory to implement the steps in the data classification method according to any one of the first aspect.
In a third aspect, an embodiment of the present invention further provides a readable storage medium, which is used for storing a program, and when the program is executed by a processor, the program implements the steps in the data classification method according to any one of the first aspect.
In the embodiment of the invention, the extracted linear features of the data to be classified are input into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified; the feature matrix is normalized to obtain the prediction probability that the data to be classified belongs to each class label; the data to be classified comprises n pieces of subdata, and the feature matrix is obtained by using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified. With the scheme of the embodiment of the invention, complete data feature information can be extracted with a good extraction effect, and the resulting model has strong universality, so the data classification prediction accuracy obtained is high.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart of a data classification method provided by an embodiment of the invention;
FIG. 2 is a second flowchart of a data classification method according to an embodiment of the present invention;
FIG. 3 is a third flowchart of a data classification method according to an embodiment of the present invention;
FIG. 4 is a fourth flowchart of a data classification method according to an embodiment of the present invention;
FIG. 5 is a fifth flowchart of a data classification method according to an embodiment of the present invention;
FIG. 6 is a first block diagram of a data classification device according to an embodiment of the present invention;
FIG. 7 is a second block diagram of a data classification device according to an embodiment of the present invention;
FIG. 8 is a structural diagram of a data classification device according to an embodiment of the present invention.
Detailed Description
The term "and/or" in the embodiments of the present invention describes an association relationship of associated objects, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
In the embodiments of the present application, the term "plurality" means two or more, and other terms are similar thereto.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a data classification method according to an embodiment of the present invention, including the following steps:
step 11: acquiring data to be classified;
step 12: extracting linear features of the data to be classified;
step 13: inputting the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified;
step 14: normalizing the characteristic matrix to obtain the prediction probability of the data to be classified belonging to each class label;
the data to be classified comprises n pieces of subdata, where n is an integer greater than or equal to 2;
and the fully-directional self-attention neural network obtains the feature matrix of the data to be classified by using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified.
In the embodiment of the invention, the extracted linear features of the data to be classified are input into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified; the feature matrix is normalized to obtain the prediction probability that the data to be classified belongs to each class label; the data to be classified comprises n pieces of subdata, and the feature matrix is obtained by using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified. With the scheme of the embodiment of the invention, complete data feature information can be extracted with a good extraction effect, and the resulting model has strong universality, so the data classification prediction accuracy obtained is high.
In some embodiments of the present invention, the data to be classified may optionally be an audio signal, text, or other data type to be classified.
When the data to be classified is an audio signal, the audio signal comprises n audio frames after framing processing; the extracted linear characteristic of the audio signal may be at least one of a fbank characteristic, a first order difference of the fbank characteristic, and a second order difference of the fbank characteristic of the audio signal.
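A hedged sketch of such fbank-plus-difference features, using librosa as one possible toolkit (the patent names no library; the file name, the 16 kHz rate, the 25 ms/10 ms framing and the 40 mel bands are assumptions carried over from the training description later in the text):

```python
import numpy as np
import librosa

# "song.wav", the 16 kHz rate and the 40 mel bands are illustrative assumptions
y, sr = librosa.load("song.wav", sr=16000)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=int(0.025 * sr),
                                     hop_length=int(0.010 * sr), n_mels=40)
fbank = librosa.power_to_db(mel)             # fbank features: (40 bands, n frames)
d1 = librosa.feature.delta(fbank, order=1)   # first-order difference of the fbank features
d2 = librosa.feature.delta(fbank, order=2)   # second-order difference of the fbank features
features = np.stack([fbank, d1, d2])         # the three linear feature arrays
```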
When the data to be classified is text data, the text comprises n sub-texts after word segmentation processing; the extracted linear feature of the sub-text may be a semantic feature of the sub-text.
In some embodiments of the present invention, optionally, after the inputting the linear feature into the fully-directional self-attention neural network, the method further includes:
and acquiring a feature space of the data to be classified, wherein the feature space is used for calculating the similarity between the data to be classified and other data.
In the embodiment of the invention, the feature space of the data to be classified can be obtained from the fully-directional self-attention neural network, and the feature space is used for calculating the similarity between the data to be classified and other data.
In some embodiments of the present invention, optionally, when the data to be classified is a song audio signal, the similarity of the song audio signals obtained according to the feature space can solve the problem of cold start of songs in a recommended scene.
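One plausible reading of this similarity computation is cosine similarity between feature-space vectors; the function below is an assumed recipe, not a formula given in the patent:

```python
import numpy as np

def top_n_similar(m_new, M_catalog, n=5):
    """Cosine similarity between one song's feature-space vector and a catalog;
    an assumed recipe for the cold-start recommendation described above."""
    a = m_new / np.linalg.norm(m_new)
    B = M_catalog / np.linalg.norm(M_catalog, axis=1, keepdims=True)
    sims = B @ a                              # one similarity score per catalog song
    order = np.argsort(-sims)[:n]             # indices of the TOP-N most similar songs
    return order, sims[order]

idx, scores = top_n_similar(np.random.randn(128), np.random.randn(1000, 128))
```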
Referring to fig. 2, fig. 2 is a second flowchart of a data classification method according to an embodiment of the invention. In some embodiments of the present invention, optionally, the inputting the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified includes the following steps:
step 131: converting the linear features into nonlinear features by using a Convolutional Neural Network (CNN), wherein the CNN contains no pooling layers;
step 132: extracting the context features among the subdata from the nonlinear features by using a bidirectional gated recurrent unit (Bi-GRU);
step 133: obtaining correlation features among the subdata according to the context features among the subdata by using a Multi-Head Self-Attention mechanism;
step 134: using an Attention mechanism, and obtaining the importance degree of each subdata on predicting the class label of the data to be classified according to the correlation characteristics;
step 135: and obtaining a feature matrix of the data to be classified according to the context features among the subdata, the correlation features among the subdata and the importance degree of each subdata on predicting the class label of the data to be classified.
In the embodiment of the invention, the fully-directional self-attention neural network comprises a CNN layer for converting the linear features into nonlinear features, a Bi-GRU layer for extracting the context features among the subdata from the nonlinear features, a Multi-Head Self-Attention mechanism for obtaining the correlation features among the subdata from the context features among the subdata, and an Attention mechanism for obtaining, from the correlation features, the importance degree of each subdata to predicting the class label of the data to be classified. The strong convolution-kernel feature extractor of the CNN performs preliminary deep feature extraction on the data to be classified; at the same time, the feature matrix, obtained from the correlation between the states at adjacent moments among the subdata and from features representing the context relationships of the subdata, allows the feature information of the data to be classified to be extracted completely, improving the classification accuracy of the data to be classified.
In some embodiments of the present invention, optionally, the extracting, by using a bidirectional gated recurrent unit (Bi-GRU), the context features among the subdata from the nonlinear features includes:
extracting context features between the sub-data by:
$$\overrightarrow{h_t} = \mathrm{GRU}\big(o_t,\ \overrightarrow{h_{t-1}}\big)$$
$$\overleftarrow{h_t} = \mathrm{GRU}\big(o_t,\ \overleftarrow{h_{t-1}}\big)$$
$$s_t = w_t\,\overrightarrow{h_t} + v_t\,\overleftarrow{h_t} + b_t$$
wherein:
O is the nonlinear feature representation of the data to be classified, $O = \{o_1, o_2, \ldots, o_n\}$;
$o_t$ is the input nonlinear feature vector of the t-th subdata, $t \in [1, n]$, t a positive integer;
$\overrightarrow{h_{t-1}}$ is the output of the forward hidden-layer state of the Bi-GRU for the (t-1)-th subdata;
$\overleftarrow{h_{t-1}}$ is the output of the reverse hidden-layer state of the Bi-GRU for the (t-1)-th subdata;
$\overrightarrow{h_t}$ is the output of the forward hidden-layer state of the Bi-GRU for the t-th subdata;
$\overleftarrow{h_t}$ is the output of the reverse hidden-layer state of the Bi-GRU for the t-th subdata;
when t = 1, $\overrightarrow{h_0}$ and $\overleftarrow{h_0}$ are predefined;
$w_t$ is the weight corresponding to the forward hidden-layer state $\overrightarrow{h_t}$ of the Bi-GRU for the t-th subdata;
$v_t$ is the weight corresponding to the reverse hidden-layer state $\overleftarrow{h_t}$ of the Bi-GRU for the t-th subdata;
$b_t$ is the bias corresponding to the hidden-layer state of the t-th subdata;
$s_t$ is the feature code of the t-th subdata, $t \in [1, n]$, t a positive integer;
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features among the subdata.
In the embodiment of the invention, the output of the data to be classified at the current moment is related to the states at both the previous and the next moment; the Bi-GRU is used to extract, from the nonlinear feature representation of the data to be classified output by the previous layer, coding features that represent the context relationships among the subdata, so that the feature description among the subdata of the data to be classified is more complete.
In some embodiments of the invention, a Bi-GRU is a neural network structure composed of two unidirectional GRUs running in opposite directions, whose states jointly determine the output. At each moment, the input is fed to the two GRUs in opposite directions, and the output is determined by both unidirectional GRUs. The current hidden state of the Bi-GRU is jointly determined by three parts: the current input vector $o_t$, the output $\overrightarrow{h_{t-1}}$ of the forward hidden-layer state at moment (t-1), and the output $\overleftarrow{h_{t-1}}$ of the reverse hidden-layer state; the hidden-layer state of the Bi-GRU at moment t is obtained by weighting $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$.
In some embodiments of the present invention, optionally, the obtaining, according to a context feature between the sub-data, a correlation feature between the sub-data by using a Multi-Head Attention mechanism includes:
S1: according to the context features among the subdata, perform three linear transformations to obtain a query vector Q, a key vector K and a value vector V:
$$Q = W^Q S,\qquad K = W^K S,\qquad V = W^V S$$
wherein:
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features among the subdata;
$W^Q$ is the first linear transformation; Q is the query vector obtained after the feature code S of the data to be classified undergoes the first linear transformation;
$W^K$ is the second linear transformation; K is the key vector obtained after the feature code S of the data to be classified undergoes the second linear transformation;
$W^V$ is the third linear transformation; V is the value vector obtained after the feature code S of the data to be classified undergoes the third linear transformation.
S2: apply the l-th linear transformation to the query vector Q, the key vector K and the value vector V to obtain the l-th transformed query vector $Q_l$, key vector $K_l$ and value vector $V_l$:
$$Q_l = Q W^{Q_l},\qquad K_l = K W^{K_l},\qquad V_l = V W^{V_l}$$
wherein:
$W^{Q_l}$, $W^{K_l}$ and $W^{V_l}$ are all parameters defined by initialization;
l is the index of the linear transformation, $l \in [1, m]$, l a positive integer; m is the total number of segments (heads) of the multi-head attention mechanism;
$Q_l$ is the query vector after the l-th linear transformation;
$K_l$ is the key vector after the l-th linear transformation;
$V_l$ is the value vector after the l-th linear transformation.
S3: from the l-th transformed query vector $Q_l$, key vector $K_l$ and value vector $V_l$, compute the l-th attention subspace matrix $Z_l$:
$$Z_l = \mathrm{softmax}\!\left(\frac{Q_l K_l^{\top}}{\sqrt{d_{K_l}}}\right) V_l$$
wherein $d_{K_l}$ represents the dimension of the key vector $K_l$.
S4: concatenate the m attention subspace matrices $Z_l$ and multiply the result by the weight matrix $W^O$ to obtain the matrix Z of the data to be classified, wherein the matrix Z of the data to be classified contains the correlation features among the subdata.
In the embodiment of the invention, the correlation features and the long-timing-distance interdependence features among the subdata of the data to be classified are captured by the Multi-Head Self-Attention mechanism, which effectively remedies the weakness of the GRU in capturing long-distance interdependence relationships.
In some embodiments of the present invention, optionally, the obtaining, by using an Attention mechanism, an importance degree of each subdata to predicting the class label probability of the data to be classified according to the correlation feature includes:
extracting a feature space M of the data to be classified by the following method:
$$u_i = \tanh(w_i z_i + b_i)$$
$$\alpha_i = \frac{\exp\!\big(u_i^{\top} u_w\big)}{\sum_{j=1}^{n} \exp\!\big(u_j^{\top} u_w\big)}$$
$$m_i = \alpha_i z_i$$
$$M = \{m_1, m_2, \ldots, m_n\}$$
wherein:
$z_i$ is the attention subspace matrix of the i-th subdata;
$w_i$ is the weight coefficient of the i-th subdata, and $b_i$ is the bias coefficient of the i-th subdata;
$u_i$ is the first weight coefficient of the i-th subdata;
$u_w$ is a randomly initialized attention matrix;
$\alpha_i$ is the second weight coefficient of the i-th subdata;
$m_i$ is the feature subspace of the i-th subdata; the feature subspace of the i-th subdata contains the importance degree of the i-th subdata to predicting the data to be classified;
M is the feature space of the data to be classified; the feature space of the data to be classified contains the importance degree of each subdata to predicting the class-label probability of the data to be classified.
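A compact numeric sketch of these Attention equations follows; the parameters are randomly initialized, and a single weight matrix w shared across the subdata is used here as a simplifying assumption in place of the per-subdata w_i:

```python
import numpy as np

def attention_importance(Z):
    """Sketch of the Attention step: u_i = tanh(w z_i + b),
    alpha_i = softmax(u_i^T u_w), m_i = alpha_i * z_i; parameters are random."""
    rng = np.random.default_rng(0)
    n, d = Z.shape
    w, b = rng.standard_normal((d, d)), rng.standard_normal(d)
    u_w = rng.standard_normal(d)          # randomly initialized attention matrix u_w
    u = np.tanh(Z @ w + b)                # first weight coefficients u_i
    e = np.exp(u @ u_w)
    alpha = e / e.sum()                   # second weight coefficients alpha_i
    M = alpha[:, None] * Z                # feature subspaces m_i, stacked into M
    return alpha, M

alpha, M = attention_importance(np.random.randn(10, 32))
print(alpha)   # importance of each piece of subdata for predicting the class label
```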
In the embodiment of the invention, the Attention mechanism helps assign different weights to each part of the input and extracts the more critical and important information, yielding the importance degree of each subdata to predicting the class-label probability of the data to be classified; this makes the data classification prediction more accurate without adding large computation and storage overheads.
In some embodiments of the present invention, optionally, the obtaining the feature matrix of the to-be-classified data according to the context features among the sub data, the correlation features among the sub data, and the importance degree of each sub data to the prediction of the class label probability of the to-be-classified data includes:
and carrying out nonlinear combination on the context characteristics among the subdata, the correlation characteristics among the subdata and the importance degree of each subdata on predicting the class label probability of the data to be classified to obtain a characteristic matrix of the data to be classified.
In the embodiment of the invention, the extracted context characteristics among the subdata, the correlation characteristics among the subdata and the importance degree of each subdata on predicting the class label probability of the data to be classified are subjected to nonlinear combination to obtain the characteristic matrix of the data to be classified, and the characteristic matrix is fused with the multidimensional characteristic information of the data to be classified, so that the classification prediction effect is more accurate when the data is classified.
In some embodiments of the present invention, optionally, the method further includes inputting an output result of the Attention mechanism to a full connection layer, where the full connection layer is equivalent to a hidden layer in a conventional feedforward neural network, so as to implement nonlinear combination on the extracted high-order features of the data to be classified, and output a feature matrix.
In some embodiments of the present invention, optionally, the normalizing the feature matrix to obtain the prediction probability of each class label to which the data to be classified belongs includes:
and carrying out normalization processing on the characteristic matrix by using a softmax classifier to obtain the prediction probability of the data to be classified belonging to each class of labels.
In the embodiment of the invention, the prediction probability of the data to be classified belonging to each class label is calculated by inputting the characteristic matrix of the data to be classified into a softmax classifier; the implementation mode is simple, the classification effect is good, and the obtained data classification model is high in universality.
In some embodiments of the present invention, optionally, the method further comprises:
training a data classification model comprising an input layer, the fully-directional self-attention neural network, and an output layer.
In the embodiment of the invention, the data classification model comprises an input layer, the fully-directional self-attention neural network and an output layer; the input layer is used for extracting the linear features of the training data; the fully-directional self-attention neural network is used for extracting the feature space and feature matrix of the training data; and the output layer is used for obtaining the prediction probability of each class label of the training data.
Referring to fig. 3, fig. 3 is a third flowchart of a data classification method according to an embodiment of the invention. In some embodiments of the present invention, optionally, the training data classification model includes the following steps:
step 21: acquiring training data and a class label of the training data;
step 22: extracting linear features of the training data;
step 23: obtaining a feature space and a feature matrix of the training data by using the fully-directional self-attention neural network;
step 24: outputting the correlation among the training data according to the feature space to obtain similar data of each training data;
step 25: and carrying out normalization processing on the characteristic matrix to obtain the prediction probability of the training data belonging to each class of labels.
In the embodiment of the invention, the training data are labeled with their corresponding class labels, and each piece of training data comprises a plurality of subdata. The extracted linear features of the training data are input into the fully-directional self-attention neural network to obtain the feature space and feature matrix of the training data; similar data of each piece of training data are obtained from the feature space; and the feature matrix is normalized to obtain the prediction probability that each piece of training data belongs to each class label. The feature matrix is obtained by the fully-directional self-attention neural network using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the training data. With the scheme of the embodiment of the invention, complete data feature information can be extracted with a good extraction effect, and the resulting model has strong universality, so the data classification prediction accuracy obtained is high.
Referring to fig. 4, fig. 4 is a fourth flowchart of a data classification method according to an embodiment of the present invention. In some embodiments of the present invention, optionally, each piece of training data includes n pieces of subdata, where n is an integer greater than or equal to 2;
the method for obtaining the feature space and the feature matrix of the training data by utilizing the fully-directional self-attention neural network comprises the following steps:
step 231: converting the linear features of the training data into nonlinear features by using a Convolutional Neural Network (CNN), wherein the CNN contains no pooling layers;
step 232: extracting the context features among the subdata of the training data from the nonlinear features of the training data by using a bidirectional gated recurrent unit (Bi-GRU);
step 233: obtaining correlation features among the subdata of the training data according to the context features among the subdata of the training data by using a Multi-Head Self-Attention mechanism;
step 234: and obtaining the importance degree of each subdata of the training data to predict the class label of the training data according to the correlation characteristic by using an Attention mechanism.
In the embodiment of the invention, the fully-directional self-attention neural network comprises a CNN layer for converting the linear features into nonlinear features, a Bi-GRU layer for extracting the context features among the subdata from the nonlinear features, a Multi-Head Self-Attention mechanism for obtaining the correlation features among the subdata from the context features among the subdata, and an Attention mechanism for obtaining, from the correlation features, the importance degree of each subdata to predicting the class label of the corresponding training data. The strong convolution-kernel feature extractor of the CNN performs preliminary deep feature extraction on the training data; at the same time, the feature matrix, obtained from the correlation between the states at adjacent moments among the subdata of the training data and from the context relationship features of the subdata, allows the feature information of the training data to be extracted completely, further improving the classification accuracy of the data classification model.
Referring to fig. 5, fig. 5 is a fifth flowchart of a data classification method according to an embodiment of the present invention.
When the data to be classified is an audio signal, the data classification model comprises an input layer, the fully-directional self-attention neural network and an output layer.
Specifically, the training data classification model includes:
1. preprocessing data of an input layer:
step 51: taking a wav file containing multiple groups of audio training data of [song audio S, type label y] as the original input, reading the binary data representation, the number of audio channels and the quantization bit depth of the wav file, and converting the read binary data into an array representation for calculation according to the number of channels and the quantization bit depth;
step 52: extracting fbank signal features by framing (frame length 25 ms, frame overlap 10 ms), pre-emphasis, computing the power spectrum of each frame, computing a mel filter bank, and computing the inner product of the power spectrum and the filter bank; extracting the first-order difference Δ and the second-order difference ΔΔ of the fbank signal features, and generating and outputting three audio-signal feature arrays consisting of the fbank signal features, Δ and ΔΔ, where the vertical axis of each feature array represents frequency and the horizontal axis represents time, and each piece of training data is divided into n frames;
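A sketch of this input-layer preprocessing, up to the power spectrum, is given below; the file name, the pre-emphasis coefficient 0.97 and the 512-point FFT are assumptions, and the patent's "10 ms frame overlap" is read here as the usual 10 ms frame shift:

```python
import numpy as np
from scipy.io import wavfile

# "train_song.wav" is a placeholder name; a mono file is assumed for simplicity
sr, samples = wavfile.read("train_song.wav")
x = samples.astype(np.float64)
x = np.append(x[0], x[1:] - 0.97 * x[:-1])        # pre-emphasis (0.97 is an assumed coefficient)
frame_len = int(0.025 * sr)                       # 25 ms frame length
hop = int(0.010 * sr)                             # 10 ms frame shift (one reading of "10 ms overlap")
n_frames = 1 + (len(x) - frame_len) // hop
frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
power = np.abs(np.fft.rfft(frames, n=512)) ** 2 / 512   # power spectrum of each frame
# the inner product of the power spectrum with a mel filter bank then gives the fbank features
```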
2. step 53: training the hidden layer (the fully-directional self-attention neural network) includes:
step 531: performing deep feature extraction on the audio signal with the strong convolution-kernel feature extractor of the CNN. By adopting a longitudinal convolution kernel that convolves along the time axis, the kernel covers the whole frequency axis and can perceive features that appear in frequency, such as the characteristic sounds and overtones of instruments. The CNN comprises 3 to 6 convolutional layers and uses the ReLU as the activation function to convert the linear features (the three audio-signal feature arrays consisting of the fbank signal features, Δ and ΔΔ) into a nonlinear feature expression O in the convolutional layers, where $O = \{o_1, o_2, \ldots, o_t, \ldots, o_n\}$ and $o_t$ is the nonlinear feature vector of the t-th frame of the audio signal;
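The longitudinal convolution kernel described above can be sketched as follows; the 40 mel bins, 3 input channels (fbank, Δ, ΔΔ) and 64 output channels are assumed sizes:

```python
import torch
import torch.nn as nn

# assumed shapes: 3 input channels (fbank, delta, delta-delta), 40 mel bins, 1000 frames
x = torch.randn(1, 3, 40, 1000)
# longitudinal kernel: spans the whole frequency axis (40) and convolves along time only
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=(40, 3), padding=(0, 1))
o = torch.relu(conv(x))   # ReLU activation; no pooling layer follows
print(o.shape)            # torch.Size([1, 64, 1, 1000]): frequency axis fully covered
```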
step 532: using the bidirectional gated recurrent unit Bi-GRU to extract, from the audio nonlinear feature representation O output by the previous layer, the coding features S that represent the context relationships of the audio frames, where $S = \{s_1, s_2, \ldots, s_t, \ldots, s_n\}$ and $s_t$ is determined by $o_t$ and the parameters of the bidirectional Bi-GRU;
step 533: capturing the feature matrix Z of the long-timing-distance interdependence relationships between the frames of the music with the Multi-Head Self-Attention mechanism, where the coding features S undergo three linear transformations to obtain three vectors, the l-th linearly transformed versions of those three vectors yield the attention subspace matrix $Z_l$, and Z is obtained by concatenating the several attention subspace matrices $Z_l$;
step 534: using the Attention mechanism to learn the importance degree of each frame of a song to predicting the class label of the song, highlighting the importance of different frames for classifying the whole song; the feature space M of each song is obtained as the accumulated sum of the products of the probability weights assigned by the Attention mechanism and each hidden-layer state, i.e. M is obtained from the weight coefficients of the Attention mechanism and the n attention subspace matrices $Z_i$;
step 535: calculating the correlations among songs from the output audio-signal feature space M to obtain the similar songs of each song;
step 536: inputting the feature space M of each song into a fully connected layer, nonlinearly combining the extracted high-order features of each audio clip, and outputting a vector H;
3. training an output layer:
step 54: and performing normalization processing on the vector H by using a softmax classifier to generate the probability that each song belongs to each label.
After the model training is finished, inputting the audio signal of a new song into the data classification model outputs the probability of each label to which it belongs and the TOP-N songs most similar to it. In some embodiments of the present invention, optionally, the strong convolution-kernel feature extractor of the CNN performs deep feature extraction on the audio signal without pooling, in order to reduce the loss of positional information of each audio feature and the attenuation of the convolutional information.
In some embodiments of the present invention, optionally, besides the timing relationships of the contexts between the audio frames, there are also correlations between the frames; a Multi-Head Self-Attention mechanism is used, in which the multi-head cuts the original input S into m segments, linearly transforms each separately, and concatenates them after the transformation into a whole of unified dimension. The multiple heads allow the model to jointly attend to information from different representation subspaces at different positions, so the interdependence features with long timing distance between the frames of the music can be captured. This step needs to be trained many times and finally outputs a feature matrix Z that can represent the relationships between the music context and each frame of the music.
In some embodiments of the present invention, optionally, different frames of the audio have different degrees of importance in the classification task; the Attention mechanism is used to learn the importance degree of each frame of a song to predicting the class label to which it belongs, highlighting the importance of different frames for classifying the whole song, and finally a fully-directional deep signal feature matrix M that represents the context timing relationships and long-distance correlations in the audio signal is output.
In some embodiments of the present invention, optionally, the prediction probability that the feature vector H of each group of training data belongs to each class label is:
$$p(i \mid H; \theta) = \frac{e^{\theta_i^{\top} H}}{\sum_{j=1}^{s} e^{\theta_j^{\top} H}}$$
wherein:
θ is the training parameter;
s represents the total number of category labels;
n is the total number of frames of one audio clip and is also the dimension of the vector H;
i represents the i-th dimension of the feature vector H, i ∈ [0, n], i a positive integer.
the loss function of the data classification model is:
$$J(\theta) = -\frac{1}{T}\left[\sum_{k=1}^{T} \sum_{i=1}^{s} \mathbb{1}\{r_k = i\}\, \log p(i \mid H; \theta)\right] + \frac{\lambda}{2} \sum_{i=1}^{s} \sum_{j=1}^{n} \theta_{ij}^{2}$$
wherein:
$\frac{\lambda}{2} \sum_{i=1}^{s} \sum_{j=1}^{n} \theta_{ij}^{2}$ is the regular term;
n is the total number of frames of one audio clip and is also the dimension of the vector H;
s represents the total number of category labels;
λ represents the proportion of the model-complexity loss in the total loss, i.e. the weight of the regular term;
T is the total number of audio samples;
$r_k$ is the prediction class of the k-th audio sample;
$p(i \mid H; \theta)$ is the predicted probability of the i-th class from the softmax classifier.
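A sketch of this loss with an L2 regular term, assuming integer class targets and precomputed softmax probabilities:

```python
import numpy as np

def j_theta(probs, labels, theta, lam):
    """Cross-entropy with an L2 regular term, following the reconstructed J(theta);
    probs[k, i] is p(i | H_k; theta) from the softmax classifier."""
    T = len(labels)
    data_loss = -np.log(probs[np.arange(T), labels]).sum() / T
    reg = (lam / 2.0) * np.sum(theta ** 2)    # regular term introduced against overfitting
    return data_loss + reg

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])   # two samples, three labels
print(j_theta(probs, np.array([0, 1]), np.random.randn(3, 4), lam=1e-3))
```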
In some embodiments of the present invention, optionally, with the loss function J(θ) as the objective, the above steps of the hidden layer and the output layer are trained multiple times and the related weight-coefficient sets in each layer are updated, finally outputting a feature matrix H' that can represent the relationships between the music context and each frame of the music, together with the probability of each label to which the music belongs; the regular term is introduced to prevent overfitting.
After the model training is finished, inputting a new song into the model outputs the probability of each label to which it belongs; at the same time, the similarity between songs can be identified from each song's deep features, solving the problem of cold-starting songs in recommendation scenarios.
Referring to fig. 6, fig. 6 is a diagram of a data classification device 30 according to an embodiment of the present invention. The data classification device 30 includes:
an obtaining module 31, configured to obtain data to be classified;
an extraction module 32, configured to extract linear features of the data to be classified;
the classification module 33 is configured to input the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified;
the classification module 33 is further configured to perform normalization processing on the feature matrix to obtain a prediction probability that the data to be classified belongs to each class label;
the data to be classified comprises n pieces of subdata, where n is an integer greater than or equal to 2;
and the fully-directional self-attention neural network obtains the feature matrix of the data to be classified by using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified.
In the embodiment of the invention, the data classification device inputs the extracted linear features of the data to be classified into the fully-directional self-attention neural network to obtain a feature matrix of the data to be classified; the feature matrix is normalized to obtain the prediction probability that the data to be classified belongs to each class label; the data to be classified comprises n pieces of subdata, and the feature matrix is obtained by the fully-directional self-attention neural network using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified. The data feature information extracted by the data classification device is complete, the extraction effect is good, the universality is strong, and the resulting data classification prediction accuracy is high.
In some embodiments of the present invention, optionally, the classification module 33 is further configured to input the linear features into the fully-directional self-attention neural network to obtain a feature space of the data to be classified;
the feature space is used for calculating the similarity between the data to be classified and other data.
In the embodiment of the invention, the feature space of the data to be classified can be obtained from the fully-directional self-attention neural network, and the feature space is used for calculating the similarity between the data to be classified and other data.
Referring to fig. 7, fig. 7 is a second structural diagram of a data classification apparatus according to an embodiment of the present invention.
In some embodiments of the present invention, optionally, the classification module 33 includes: a first module 331, a second module 332, a third module 333, and a fourth module 334;
the first module 331 is configured to convert the linear features into nonlinear features by using a CNN, wherein the CNN contains no pooling layers;
the second module 332 is configured to extract the context features among the subdata from the nonlinear features by using a bidirectional gated recurrent unit (Bi-GRU);
the third module 333 is configured to obtain correlation characteristics between the sub-data according to context characteristics between the sub-data by using a Multi-Head Attention mechanism;
the fourth module 334 is configured to obtain, according to the correlation characteristic, an importance degree of each subdata to predicting a category label of the to-be-classified data by using an Attention mechanism;
the classification module 33 is further configured to obtain a feature matrix of the to-be-classified data according to the context features among the sub-data, the correlation features among the sub-data, and the importance degree of each sub-data to predicting the class label of the to-be-classified data.
In the embodiment of the invention, the fully-directional self-attention neural network comprises a CNN layer for converting the linear features into nonlinear features, a Bi-GRU layer for extracting the context features among the subdata from the nonlinear features, a Multi-Head Self-Attention mechanism for obtaining the correlation features among the subdata from the context features among the subdata, and an Attention mechanism for obtaining, from the correlation features, the importance degree of each subdata to predicting the class label of the data to be classified. The data classification device uses the strong convolution-kernel feature extractor of the CNN to perform preliminary deep feature extraction on the data to be classified; at the same time, the feature matrix, obtained from the correlation between the states at adjacent moments among the subdata and from features representing the context relationships of the subdata, allows the feature information of the data to be classified to be extracted completely, improving the classification accuracy of the data to be classified.
In some embodiments of the present invention, optionally, the second module 332 is further configured to extract context characteristics between the sub-data by:
$$\overrightarrow{h_t} = \mathrm{GRU}\big(o_t,\ \overrightarrow{h_{t-1}}\big)$$
$$\overleftarrow{h_t} = \mathrm{GRU}\big(o_t,\ \overleftarrow{h_{t-1}}\big)$$
$$s_t = w_t\,\overrightarrow{h_t} + v_t\,\overleftarrow{h_t} + b_t$$
wherein:
O is the nonlinear feature representation of the data to be classified, $O = \{o_1, o_2, \ldots, o_n\}$;
$o_t$ is the input nonlinear feature vector of the t-th subdata, $t \in [1, n]$, t a positive integer;
$\overrightarrow{h_{t-1}}$ is the output of the forward hidden-layer state of the Bi-GRU for the (t-1)-th subdata;
$\overleftarrow{h_{t-1}}$ is the output of the reverse hidden-layer state of the Bi-GRU for the (t-1)-th subdata;
$\overrightarrow{h_t}$ is the output of the forward hidden-layer state of the Bi-GRU for the t-th subdata;
$\overleftarrow{h_t}$ is the output of the reverse hidden-layer state of the Bi-GRU for the t-th subdata;
when t = 1, $\overrightarrow{h_0}$ and $\overleftarrow{h_0}$ are predefined;
$w_t$ is the weight corresponding to the forward hidden-layer state $\overrightarrow{h_t}$ of the Bi-GRU for the t-th subdata;
$v_t$ is the weight corresponding to the reverse hidden-layer state $\overleftarrow{h_t}$ of the Bi-GRU for the t-th subdata;
$b_t$ is the bias corresponding to the hidden-layer state of the t-th subdata;
$s_t$ is the feature code of the t-th subdata, $t \in [1, n]$, t a positive integer;
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features among the subdata.
In the embodiment of the invention, the output of the data to be classified at the current moment is related both to the state at the previous moment and to the state at the next moment. Through the Bi-GRU, the second module extracts, from the nonlinear feature representation output by the layer above, coding features that capture the context relationship between the sub-data, so that the feature description of the sub-data of the data to be classified is more complete.
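As a hedged sketch of the formulas above (the combination weights are taken element-wise here, and all names and shapes are assumptions), the feature code S can be computed from the forward and reverse GRU states like this:

```python
import torch
import torch.nn as nn

n, d = 16, 64                       # n sub-data, feature width d (assumed)
O = torch.randn(1, n, d)            # nonlinear features O = {o_1, ..., o_n} from the CNN layer

bigru = nn.GRU(d, d, batch_first=True, bidirectional=True)
H, _ = bigru(O)                     # H[..., :d] forward states, H[..., d:] reverse states
h_fwd, h_bwd = H[..., :d], H[..., d:]

w = torch.randn(n, d)               # w_t: weight of the forward hidden state (assumed shape)
v = torch.randn(n, d)               # v_t: weight of the reverse hidden state
b = torch.randn(n, d)               # b_t: bias of the t-th hidden-layer state

S = w * h_fwd + v * h_bwd + b       # s_t = w_t * h_fwd_t + v_t * h_bwd_t + b_t
```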
In some embodiments of the present invention, optionally, the third module 333 is further configured to apply three linear transformations to the context features between the sub-data to obtain a query vector Q, a key vector K, and a value vector V:

$$Q = W^Q S$$

$$K = W^K S$$

$$V = W^V S$$

wherein:
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features between the sub-data;
$W^Q$ is the first linear transformation; Q is the query vector obtained after the feature code S of the data to be classified undergoes the first linear transformation;
$W^K$ is the second linear transformation; K is the key vector obtained after the feature code S undergoes the second linear transformation;
$W^V$ is the third linear transformation; V is the value vector obtained after the feature code S undergoes the third linear transformation;
the third module 333 is further configured to linearly transform the query vector Q, the key vector K, and the value vector V to obtain, for the l-th transformation, the transformed query vector $Q_l$, key vector $K_l$, and value vector $V_l$:

$$Q_l = Q W^{Q_l}$$

$$K_l = K W^{K_l}$$

$$V_l = V W^{V_l}$$

wherein:
$W^{Q_l}$, $W^{K_l}$, $W^{V_l}$ are parameters defined at initialization;
l is the index of the linear transformation, $l \in [1, m]$, l a positive integer; m is the total number of segments (heads) of the multi-head attention mechanism;
$Q_l$ is the query vector after the l-th linear transformation;
$K_l$ is the key vector after the l-th linear transformation;
$V_l$ is the value vector after the l-th linear transformation;
the third module 333 is further configured to compute, from the l-th transformed query vector $Q_l$, key vector $K_l$, and value vector $V_l$, the attention subspace matrix $Z_l$ of the l-th linear transformation:

$$Z_l = \mathrm{softmax}\!\left(\frac{Q_l K_l^{\mathsf{T}}}{\sqrt{d_{k_l}}}\right) V_l$$

wherein $d_{k_l}$ is the dimension of the key vector $K_l$;

the third module 333 is further configured to concatenate the m attention subspace matrices $Z_l$ and multiply the result by a weight matrix $W^O$ to obtain the matrix Z of the data to be classified, wherein the matrix Z contains the correlation features between the sub-data.
In the embodiment of the invention, the third module captures the correlation features and the long-range (long time-sequence distance) interdependence features between the sub-data of the data to be classified through the Multi-Head Self-Attention mechanism, which effectively remedies the GRU's weakness at capturing long-distance interdependence relationships.
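A minimal sketch of these steps follows; the weight shapes, the head count m, and the random initialization are assumptions made for the example:

```python
import math
import torch

def multi_head_self_attention(S, m=8):
    """Sketch: three linear transformations, m per-head transformations,
    scaled dot-product attention, then concatenation multiplied by W^O."""
    n, d = S.shape
    W_Q, W_K, W_V = (torch.randn(d, d) for _ in range(3))
    Q, K, V = S @ W_Q, S @ W_K, S @ W_V           # query, key, and value vectors
    d_k = d // m                                  # per-head key dimension
    Z_heads = []
    for l in range(m):
        W_Ql, W_Kl, W_Vl = (torch.randn(d, d_k) for _ in range(3))
        Q_l, K_l, V_l = Q @ W_Ql, K @ W_Kl, V @ W_Vl
        A = torch.softmax(Q_l @ K_l.T / math.sqrt(d_k), dim=-1)
        Z_heads.append(A @ V_l)                   # Z_l = softmax(Q_l K_l^T / sqrt(d_k)) V_l
    W_O = torch.randn(m * d_k, d)
    return torch.cat(Z_heads, dim=-1) @ W_O       # matrix Z with the correlation features

Z = multi_head_self_attention(torch.randn(16, 64))   # 16 sub-data, feature width 64
```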
In some embodiments of the present invention, optionally, the fourth module 334 is further configured to extract the feature space M of the data to be classified as follows:

$$u_i = \tanh(w_i z_i + b_i)$$

$$\alpha_i = \frac{\exp(u_i^{\mathsf{T}} u_w)}{\sum_{i} \exp(u_i^{\mathsf{T}} u_w)}$$

$$m_i = \alpha_i z_i$$

$$M = \{m_1, m_2, \ldots, m_n\}$$

wherein:
$z_i$ is the attention subspace matrix of the i-th sub-data;
$w_i$ is the weight coefficient of the i-th sub-data, and $b_i$ is the bias coefficient of the i-th sub-data;
$u_i$ is the first weight coefficient of the i-th sub-data;
$u_w$ is a randomly initialized attention matrix;
$\alpha_i$ is the second weight coefficient of the i-th sub-data;
$m_i$ is the feature subspace of the i-th sub-data, which contains the importance degree of the i-th sub-data for predicting the data to be classified; M is the feature space of the data to be classified, which contains the importance degree of each sub-data for predicting the class label probability of the data to be classified.
In the embodiment of the invention, the fourth module uses the Attention mechanism to assign different weights to each input part and to extract the more critical and important information, obtaining the importance degree of each sub-data for predicting the class label probability of the data to be classified. This makes the data classification prediction more accurate while adding little overhead to its computation and storage.
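A hedged sketch of this attention pooling (the final step $m_i = \alpha_i z_i$ and all names and shapes are assumptions):

```python
import torch

def attention_importance(Z):
    """Sketch: weigh each sub-data's attention subspace by a learned importance score."""
    n, d = Z.shape
    W, b = torch.randn(d, d), torch.randn(d)
    u = torch.tanh(Z @ W + b)              # u_i = tanh(w_i z_i + b_i)
    u_w = torch.randn(d)                   # randomly initialized attention matrix u_w
    alpha = torch.softmax(u @ u_w, dim=0)  # alpha_i: importance of the i-th sub-data
    M = alpha.unsqueeze(-1) * Z            # m_i = alpha_i z_i -> feature space M
    return M, alpha

M, alpha = attention_importance(torch.randn(16, 64))
```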
In some embodiments of the present invention, optionally, the classification module 33 further includes a fifth module 335, and the fifth module 335 is configured to nonlinearly combine the context features between the sub-data, the correlation features between the sub-data, and the importance degree of each sub-data for predicting the class label probability of the data to be classified, so as to obtain the feature matrix of the data to be classified.
In the embodiment of the invention, the fifth module nonlinearly combines the extracted context features, correlation features, and importance degrees into the feature matrix of the data to be classified. Because this feature matrix fuses multi-dimensional feature information of the data to be classified, the classification prediction is more accurate, as sketched below.
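The patent does not fix the form of this nonlinear combination; one plausible sketch, in which the concatenation-plus-projection form is an assumption, is:

```python
import torch
import torch.nn as nn

d = 64
S, Z, M = torch.randn(16, d), torch.randn(16, d), torch.randn(16, d)  # the three feature sets
fuse = nn.Linear(3 * d, d)                           # learned combination weights (assumed)
F = torch.tanh(fuse(torch.cat([S, Z, M], dim=-1)))   # nonlinear combination -> feature matrix
```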
In some embodiments of the present invention, optionally, the classification module 33 includes a sixth module 336, and the sixth module 336 is configured to perform normalization processing on the feature matrix by using a softmax classifier, so as to obtain a prediction probability that the data to be classified belongs to each class label.
In the embodiment of the invention, the sixth module calculates the prediction probability that the data to be classified belongs to each class label by feeding the feature matrix of the data to be classified into a softmax classifier. This implementation is simple, classifies well, and yields a data classification model with strong generality.
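A short sketch of this normalization step; the pooling over sub-data and the linear head are assumptions made for the example:

```python
import torch
import torch.nn as nn

num_classes = 5
F = torch.randn(16, 64)                  # feature matrix of the data to be classified
head = nn.Linear(64, num_classes)
logits = head(F.mean(dim=0))             # pool the sub-data features, then score each class
probs = torch.softmax(logits, dim=-1)    # prediction probability per class label; sums to 1
```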
In some embodiments of the present invention, optionally, the data classification apparatus further includes a training module 34 for training a data classification model; the data classification model comprises an input layer, the fully-directional self-attention neural network, and an output layer, and the training module 34 comprises an input layer sub-module 341, a network layer sub-module 342, and an output layer sub-module 343.
In the embodiment of the invention, the data classification model comprises an input layer, the fully-directional self-attention neural network, and an output layer: the input layer extracts the linear features of the training data; the fully-directional self-attention neural network extracts the feature space and the feature matrix of the training data; and the output layer obtains the prediction probability of the training data belonging to each class label.
In some embodiments of the present invention, optionally, the input layer sub-module 341 is configured to obtain training data and class labels of the training data;
the input layer sub-module 341 is further configured to extract linear features of the training data;
the network layer sub-module 342 is configured to obtain a feature space and a feature matrix of the training data by using the fully-directional self-attention neural network;
the network layer sub-module 342 is further configured to output correlations between the training data according to the feature space to obtain similar data of each training data;
the output layer sub-module 343 is configured to normalize the feature matrix to obtain the prediction probability that the training data belongs to each class label.
In the embodiment of the invention, the training data of the data classification device is labeled with corresponding class labels, and each training data comprises a plurality of sub-data. The extracted linear features of the training data are input into the fully-directional self-attention neural network to obtain the feature space and feature matrix of the training data; similar data for each training data are obtained from the feature space; and the feature matrix is normalized to obtain the prediction probability that each training data belongs to each class label. The feature matrix is obtained by the fully-directional self-attention neural network from the context features between the sub-data, the correlation features between the sub-data, and the importance degree of each sub-data for predicting the class label of the training data. With this scheme, complete data feature information can be extracted with good effect, and the resulting model has strong generality, so the data classification predictions are highly accurate. A sketch of such a training loop follows.
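For illustration only, a hypothetical training loop; the model class is the sketch given earlier, and the optimizer, learning rate, and data shapes are assumptions, not the patented procedure:

```python
import torch
import torch.nn as nn

model = FullyDirectionalSelfAttentionNet(num_classes=10)  # sketch class defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.NLLLoss()                     # model outputs probabilities, so train on log-probs

x = torch.randn(32, 16, 128)               # 32 training samples, 16 sub-data each (assumed)
y = torch.randint(0, 10, (32,))            # class labels of the training data

for epoch in range(10):
    optimizer.zero_grad()
    probs = model(x)                       # prediction probability per class label
    loss = loss_fn(torch.log(probs + 1e-9), y)
    loss.backward()                        # backpropagation through the whole network
    optimizer.step()
```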
In some embodiments of the present invention, optionally, each training data includes n pieces of sub data, n is an integer, and n is greater than or equal to 2;
the network layer sub-module 342 is further configured to convert the linear features of the training data into nonlinear features by using a CNN, wherein the CNN is a CNN without pooling;
the network layer sub-module 342 is further configured to extract context features between the sub-data of the training data from the nonlinear features of the training data by using a bidirectional gated recurrent unit (Bi-GRU);
the network layer sub-module 342 is further configured to obtain correlation features between the sub-data of the training data from the context features between the sub-data of the training data by using a Multi-Head Self-Attention mechanism;
the network layer sub-module 342 is further configured to obtain, from the correlation features, the importance degree of each sub-data of the training data for predicting the class label of the training data by using an Attention mechanism.
In some embodiments of the present invention, optionally, the network layer sub-module 342 is further configured to nonlinearly combine the context features between the sub-data, the correlation features between the sub-data, and the importance degree of each sub-data for predicting the class label probability of the training data, so as to obtain the feature matrix of the training data.
In the embodiment of the invention, the fully-directional self-attention neural network comprises a CNN layer for converting linear features into nonlinear features, a Bi-GRU layer for extracting context features between the sub-data from the nonlinear features, a Multi-Head Self-Attention mechanism for obtaining correlation features between the sub-data from the context features, and an Attention mechanism for obtaining, from the correlation features, the importance degree of each sub-data for predicting the class label of the corresponding training data. The CNN, a strong convolution-kernel feature extractor, performs the preliminary deep feature extraction on the training data; at the same time, the feature matrix, obtained by extracting the state correlation between adjacent moments across the sub-data and the context relationship features of the sub-data, allows the feature information of the training data to be extracted completely, further improving the classification accuracy of the data classification model.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a data classification device according to an embodiment of the present invention. An embodiment of the present invention further provides a data classification device 40, including: a memory 401, a processor 402, and a program stored on the memory 401 and executable on the processor 402; the processor 402 is configured to read the program in the memory 401 to implement each process of any one of the above data classification method embodiments, achieving the same technical effect; to avoid repetition, details are not repeated here.
The embodiment of the present invention further provides a readable storage medium for storing a program which, when executed by a processor, implements each process of any one of the data classification method embodiments described above, achieving the same technical effect; to avoid repetition, details are not repeated here. The readable storage medium may be any available medium or data storage device that can be accessed by a processor, including but not limited to magnetic memory (e.g., floppy disk, hard disk, magnetic tape, magneto-optical disk (MO), etc.), optical memory (e.g., CD, DVD, BD, HVD, etc.), and semiconductor memory (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid-state disk (SSD)), etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. With such an understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (13)

1. A method of data classification, comprising the steps of:
acquiring data to be classified;
extracting linear features of the data to be classified;
inputting the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified;
normalizing the feature matrix to obtain the prediction probability that the data to be classified belongs to each class label;
the data to be classified comprises n pieces of sub data, n is an integer and is greater than or equal to 2;
and the fully-directional self-attention neural network obtains the feature matrix of the data to be classified by using the context features between the sub-data, the correlation features between the sub-data, and the importance degree of each sub-data for predicting the class label of the data to be classified.
2. The data classification method according to claim 1, further comprising, after the inputting the linear features into the fully-directional self-attention neural network:
and acquiring a feature space of the data to be classified, wherein the feature space is used for calculating the similarity between the data to be classified and other data.
3. The data classification method according to claim 1, wherein the inputting the linear features into the fully-directional self-attention neural network to obtain the feature matrix of the data to be classified comprises:
converting the linear features into nonlinear features by using a convolutional neural network (CNN), wherein the CNN is a CNN without pooling;
extracting context features between the sub-data from the nonlinear features by using a bidirectional gated recurrent unit (Bi-GRU);
obtaining correlation features between the sub-data from the context features between the sub-data by using a Multi-Head Self-Attention mechanism;
obtaining, from the correlation features, the importance degree of each sub-data for predicting the class label of the data to be classified by using an Attention mechanism;
and obtaining the feature matrix of the data to be classified from the context features between the sub-data, the correlation features between the sub-data, and the importance degree of each sub-data for predicting the class label of the data to be classified.
4. The data classification method of claim 3, wherein the extracting context features between the sub-data from the nonlinear features by using a bidirectional gated recurrent unit (Bi-GRU) comprises:
extracting the context features between the sub-data by:

$$\overrightarrow{h_t} = \mathrm{GRU}\left(o_t, \overrightarrow{h_{t-1}}\right)$$

$$\overleftarrow{h_t} = \mathrm{GRU}\left(o_t, \overleftarrow{h_{t-1}}\right)$$

$$s_t = w_t\,\overrightarrow{h_t} + v_t\,\overleftarrow{h_t} + b_t$$

wherein:
O is the nonlinear feature representation of the data to be classified, $O = \{o_1, o_2, \ldots, o_n\}$;
$o_t$ is the nonlinear feature vector of the t-th sub-data input, $t \in [1, n]$, t a positive integer;
$\overrightarrow{h_{t-1}}$ is the output of the forward hidden-layer state of the Bi-GRU for the (t-1)-th sub-data;
$\overleftarrow{h_{t-1}}$ is the output of the reverse hidden-layer state of the Bi-GRU for the (t-1)-th sub-data;
$\overrightarrow{h_t}$ is the output of the forward hidden-layer state of the Bi-GRU for the t-th sub-data;
$\overleftarrow{h_t}$ is the output of the reverse hidden-layer state of the Bi-GRU for the t-th sub-data;
when t = 1, $\overrightarrow{h_0}$ and $\overleftarrow{h_0}$ are predefined;
$w_t$ is the weight corresponding to the forward hidden-layer state $\overrightarrow{h_t}$ of the Bi-GRU for the t-th sub-data;
$v_t$ is the weight corresponding to the reverse hidden-layer state $\overleftarrow{h_t}$ of the Bi-GRU for the t-th sub-data;
$b_t$ is the bias corresponding to the hidden-layer state of the t-th sub-data;
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features between the sub-data;
$s_t$ is the feature code of the t-th sub-data, $t \in [1, n]$, t a positive integer.
5. The method of claim 3, wherein the obtaining correlation features between the sub-data from the context features between the sub-data by using a Multi-Head Self-Attention mechanism comprises:
S1: applying three linear transformations to the context features between the sub-data to obtain a query vector Q, a key vector K, and a value vector V:

$$Q = W^Q S$$

$$K = W^K S$$

$$V = W^V S$$

wherein:
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features between the sub-data;
$W^Q$ is the first linear transformation; Q is the query vector obtained after the feature code S of the data to be classified undergoes the first linear transformation;
$W^K$ is the second linear transformation; K is the key vector obtained after the feature code S undergoes the second linear transformation;
$W^V$ is the third linear transformation; V is the value vector obtained after the feature code S undergoes the third linear transformation;
S2: linearly transforming the query vector Q, the key vector K, and the value vector V to obtain, for the l-th transformation, the transformed query vector $Q_l$, key vector $K_l$, and value vector $V_l$:

$$Q_l = Q W^{Q_l}$$

$$K_l = K W^{K_l}$$

$$V_l = V W^{V_l}$$

wherein:
$W^{Q_l}$, $W^{K_l}$, $W^{V_l}$ are parameters defined at initialization;
l is the index of the linear transformation, $l \in [1, m]$, l a positive integer; m is the total number of segments (heads) of the multi-head attention mechanism;
$Q_l$ is the query vector after the l-th linear transformation;
$K_l$ is the key vector after the l-th linear transformation;
$V_l$ is the value vector after the l-th linear transformation;
S3: computing, from the l-th transformed query vector $Q_l$, key vector $K_l$, and value vector $V_l$, the attention subspace matrix $Z_l$ of the l-th linear transformation:

$$Z_l = \mathrm{softmax}\!\left(\frac{Q_l K_l^{\mathsf{T}}}{\sqrt{d_{k_l}}}\right) V_l$$

wherein $d_{k_l}$ is the dimension of the key vector $K_l$;
S4: concatenating the m attention subspace matrices $Z_l$ and multiplying the result by a weight matrix $W^O$ to obtain the matrix Z of the data to be classified, wherein the matrix Z contains the correlation features between the sub-data.
6. The data classification method according to claim 3, wherein the obtaining, from the correlation features, the importance degree of each sub-data for predicting the class label probability of the data to be classified by using an Attention mechanism comprises:
extracting the feature space M of the data to be classified as follows:

$$u_i = \tanh(w_i z_i + b_i)$$

$$\alpha_i = \frac{\exp(u_i^{\mathsf{T}} u_w)}{\sum_{i} \exp(u_i^{\mathsf{T}} u_w)}$$

$$m_i = \alpha_i z_i$$

$$M = \{m_1, m_2, \ldots, m_n\}$$

wherein:
$z_i$ is the attention subspace matrix of the i-th sub-data;
$w_i$ is the weight coefficient of the i-th sub-data, and $b_i$ is the bias coefficient of the i-th sub-data;
$u_i$ is the first weight coefficient of the i-th sub-data;
$u_w$ is a randomly initialized attention matrix;
$\alpha_i$ is the second weight coefficient of the i-th sub-data;
$m_i$ is the feature subspace of the i-th sub-data, which contains the importance degree of the i-th sub-data for predicting the data to be classified;
M is the feature space of the data to be classified, which contains the importance degree of each sub-data for predicting the class label probability of the data to be classified.
7. The data classification method of claim 3, wherein the obtaining the feature matrix of the to-be-classified data according to the context features among the sub-data, the correlation features among the sub-data, and the importance degree of each sub-data to predicting the class label probability of the to-be-classified data comprises:
nonlinearly combining the context features between the sub-data, the correlation features between the sub-data, and the importance degree of each sub-data for predicting the class label probability of the data to be classified, to obtain the feature matrix of the data to be classified.
8. The data classification method according to claim 1, wherein the normalizing the feature matrix to obtain the prediction probability that the data to be classified belongs to each class label comprises:
normalizing the feature matrix by using a softmax classifier to obtain the prediction probability that the data to be classified belongs to each class label.
9. The data classification method of claim 1, further comprising:
training a data classification model comprising an input layer, the fully-directional self-attention neural network, and an output layer.
10. The data classification method of claim 9, wherein the training of the data classification model comprises:
acquiring training data and a class label of the training data;
extracting linear features of the training data;
obtaining a feature space and a feature matrix of the training data by using the fully-directional self-attention neural network;
outputting the correlation among the training data according to the feature space to obtain similar data of each training data;
and normalizing the feature matrix to obtain the prediction probability that the training data belongs to each class label.
11. The data classification method according to claim 10,
each training data comprises n sub-data, wherein n is an integer and is greater than or equal to 2;
the obtaining the feature space and the feature matrix of the training data by using the fully-directional self-attention neural network comprises:
converting the linear features of the training data into nonlinear features by using a convolutional neural network (CNN), wherein the CNN is a CNN without pooling;
extracting context features between the sub-data of the training data from the nonlinear features of the training data by using a bidirectional gated recurrent unit (Bi-GRU);
obtaining correlation features between the sub-data of the training data from the context features between the sub-data of the training data by using a Multi-Head Self-Attention mechanism;
and obtaining, from the correlation features, the importance degree of each sub-data of the training data for predicting the class label of the training data by using an Attention mechanism.
12. A data classification device, comprising: a memory, a processor, and a program stored on the memory and executable on the processor; characterized in that the processor is configured to read the program in the memory to implement the steps in the data classification method according to any one of claims 1 to 11.
13. A readable storage medium storing a program, wherein the program, when executed by a processor, implements the steps in the data classification method according to any one of claims 1 to 11.
CN202110065077.7A 2021-01-18 2021-01-18 Data classification method, equipment and readable storage medium Pending CN112766368A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110065077.7A CN112766368A (en) 2021-01-18 2021-01-18 Data classification method, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110065077.7A CN112766368A (en) 2021-01-18 2021-01-18 Data classification method, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112766368A true CN112766368A (en) 2021-05-07

Family

ID=75702831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110065077.7A Pending CN112766368A (en) 2021-01-18 2021-01-18 Data classification method, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112766368A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113450828A (en) * 2021-06-25 2021-09-28 平安科技(深圳)有限公司 Music genre identification method, device, equipment and storage medium
CN117540306A (en) * 2024-01-09 2024-02-09 腾讯科技(深圳)有限公司 Label classification method, device, equipment and medium for multimedia data
CN117540306B (en) * 2024-01-09 2024-04-09 腾讯科技(深圳)有限公司 Label classification method, device, equipment and medium for multimedia data

Similar Documents

Publication Publication Date Title
Gabeur et al. Multi-modal transformer for video retrieval
CN110491416B (en) Telephone voice emotion analysis and identification method based on LSTM and SAE
CN108694225B (en) Image searching method, feature vector generating method and device and electronic equipment
CN106570464B (en) Face recognition method and device for rapidly processing face shielding
CN110209824B (en) Text emotion analysis method, system and device based on combined model
CN112818861B (en) Emotion classification method and system based on multi-mode context semantic features
CN111783474A (en) Comment text viewpoint information processing method and device and storage medium
CN113177141B (en) Multi-label video hash retrieval method and device based on semantic embedded soft similarity
CN110727765B (en) Problem classification method and system based on multi-attention machine mechanism and storage medium
CN111414461A (en) Intelligent question-answering method and system fusing knowledge base and user modeling
CN111859010B (en) Semi-supervised audio event identification method based on depth mutual information maximization
CN107545276A (en) The various visual angles learning method of joint low-rank representation and sparse regression
CN112749549B (en) Chinese entity relation extraction method based on incremental learning and multi-model fusion
CN113806554B (en) Knowledge graph construction method for massive conference texts
CN113822125B (en) Processing method and device of lip language recognition model, computer equipment and storage medium
Huang et al. Large-scale weakly-supervised content embeddings for music recommendation and tagging
CN112766368A (en) Data classification method, equipment and readable storage medium
CN115831102A (en) Speech recognition method and device based on pre-training feature representation and electronic equipment
CN113255366A (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN114694255B (en) Sentence-level lip language recognition method based on channel attention and time convolution network
CN113836896A (en) Patent text abstract generation method and device based on deep learning
CN113593606B (en) Audio recognition method and device, computer equipment and computer-readable storage medium
CN113065356B (en) IT equipment operation and maintenance fault suggestion processing method based on semantic analysis algorithm
Kang et al. Pivot correlational neural network for multimodal video categorization
CN115481313A (en) News recommendation method based on text semantic mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination