CN112766368A - Data classification method, equipment and readable storage medium - Google Patents


Info

Publication number
CN112766368A
CN112766368A (application CN202110065077.7A)
Authority
CN
China
Prior art keywords
data
classified
subdata
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110065077.7A
Other languages
Chinese (zh)
Inventor
张聪 (Zhang Cong)
陈聪 (Chen Cong)
张超 (Zhang Chao)
严自强 (Yan Ziqiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Music Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Music Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Music Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN202110065077.7A
Publication of CN112766368A
Legal status: Pending (current)

Classifications

    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (G Physics > G06 Computing; calculating or counting > G06F Electric digital data processing > G06F 18/00 Pattern recognition > G06F 18/20 Analysing > G06F 18/24 Classification techniques)
    • G06F 18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/047 Probabilistic or stochastic networks (G06N Computing arrangements based on specific computational models > G06N 3/00 based on biological models > G06N 3/02 Neural networks > G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/084 Backpropagation, e.g. using gradient descent (G06N 3/08 Learning methods)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data classification method, data classification equipment and a readable storage medium, and relates to the field of data processing. The method comprises the following steps: acquiring data to be classified; extracting linear features of the data to be classified; inputting the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified; and normalizing the feature matrix to obtain the prediction probability that the data to be classified belongs to each class label. The data to be classified comprises n pieces of subdata, where n is an integer greater than or equal to 2. The fully-directional self-attention neural network obtains the feature matrix of the data to be classified by using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified. The invention solves the problem of low data classification prediction accuracy caused by the poor universality of data classification models in the prior art.

Description

Data classification method, equipment and readable storage medium
Technical Field
The embodiment of the invention relates to the field of data processing, in particular to a data classification method, data classification equipment and a readable storage medium.
Background
With the successful application and continued development of deep learning models in other fields, more and more research uses the spectrogram of music as the input of a deep learning model to classify music by genre. The common deep learning networks each have problems in this use: an RNN (Recurrent Neural Network) can only memorize part of a sequence and risks vanishing or exploding gradients; the pooling operations of a CNN (Convolutional Neural Network) may lose a large amount of feature information; LSTM (Long Short-Term Memory) networks and GRUs (Gated Recurrent Units) cannot identify long-distance interdependence features; and the Self-Attention mechanism cannot capture the contextual sequence information of the global structure.
Network models in the prior art are therefore poor in universality.
Disclosure of Invention
The embodiment of the invention provides a data classification method, data classification equipment and a readable storage medium, and solves the problem of low data classification prediction accuracy caused by poor universality of a data classification model in the prior art.
In order to solve the technical problem, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a data classification method, including the following steps:
acquiring data to be classified;
extracting linear features of the data to be classified;
inputting the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified;
normalizing the characteristic matrix to obtain the prediction probability of the data to be classified belonging to each class label;
the data to be classified comprises n pieces of subdata, where n is an integer greater than or equal to 2;
and the fully-directional self-attention neural network obtains the feature matrix of the data to be classified by using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified.
Optionally, after the inputting the linear feature into the fully-directional self-attention neural network, the method further includes:
and acquiring a feature space of the data to be classified, wherein the feature space is used for calculating the similarity between the data to be classified and other data.
Optionally, the inputting the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified includes:
converting the linear features into nonlinear features by using a Convolutional Neural Network (CNN), wherein the CNN contains no pooling layers;
extracting the context features among the subdata from the nonlinear features by using a bidirectional gated recurrent unit (Bi-GRU);
obtaining correlation features among the subdata according to the context features among the subdata by using a Multi-Head Self-Attention mechanism;
using an Attention mechanism, and obtaining the importance degree of each subdata on predicting the class label of the data to be classified according to the correlation characteristics;
and obtaining a feature matrix of the data to be classified according to the context features among the subdata, the correlation features among the subdata and the importance degree of each subdata on predicting the class label of the data to be classified.
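To make the above steps concrete, the following is a minimal sketch of one possible realization in PyTorch. The class name FullyDirectionalSelfAttention, all layer sizes, and the choice of standard PyTorch modules are illustrative assumptions; the patent does not publish reference code.

```python
import torch
import torch.nn as nn

class FullyDirectionalSelfAttention(nn.Module):
    """Hypothetical sketch: CNN without pooling -> Bi-GRU -> multi-head
    self-attention -> attention weighting -> fully connected layer."""
    def __init__(self, in_dim=120, hid=128, heads=8, num_classes=10):
        super().__init__()
        # CNN without pooling converts the linear features into nonlinear ones
        self.cnn = nn.Sequential(
            nn.Conv1d(in_dim, hid, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(hid, hid, kernel_size=3, padding=1), nn.ReLU())
        # Bi-GRU extracts context features among the n pieces of subdata
        self.bigru = nn.GRU(hid, hid // 2, batch_first=True, bidirectional=True)
        # multi-head self-attention captures correlation features among the subdata
        self.mha = nn.MultiheadAttention(hid, heads, batch_first=True)
        # attention layer scores the importance of each piece of subdata
        self.score = nn.Linear(hid, 1)
        self.fc = nn.Linear(hid, num_classes)

    def forward(self, x):                      # x: (batch, n, in_dim) linear features
        o = self.cnn(x.transpose(1, 2)).transpose(1, 2)          # nonlinear features O
        s, _ = self.bigru(o)                                     # context features S
        z, _ = self.mha(s, s, s)                                 # correlation features Z
        alpha = torch.softmax(self.score(z).squeeze(-1), dim=1)  # importance weights
        m = (alpha.unsqueeze(-1) * z).sum(dim=1)                 # pooled feature vector
        return torch.softmax(self.fc(m), dim=-1)                 # per-label probabilities

probs = FullyDirectionalSelfAttention()(torch.randn(2, 100, 120))  # 2 clips, 100 frames
```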
Optionally, the extracting, by using a bidirectional gated recurrent unit (Bi-GRU), the context features among the subdata from the nonlinear features includes:
extracting context features between the sub-data by:
$$\overrightarrow{h_t} = \mathrm{GRU}\big(o_t,\ \overrightarrow{h_{t-1}}\big)$$
$$\overleftarrow{h_t} = \mathrm{GRU}\big(o_t,\ \overleftarrow{h_{t-1}}\big)$$
$$s_t = w_t\,\overrightarrow{h_t} + v_t\,\overleftarrow{h_t} + b_t$$
wherein:
O is the nonlinear feature representation of the data to be classified, $O = \{o_1, o_2, \ldots, o_n\}$;
$o_t$ is the input nonlinear feature vector of the t-th subdata, $t \in [1, n]$, t a positive integer;
$\overrightarrow{h_{t-1}}$ is the output of the forward hidden-layer state of the Bi-GRU for the (t-1)-th subdata;
$\overleftarrow{h_{t-1}}$ is the output of the reverse hidden-layer state of the Bi-GRU for the (t-1)-th subdata;
$\overrightarrow{h_t}$ is the output of the forward hidden-layer state of the Bi-GRU for the t-th subdata;
$\overleftarrow{h_t}$ is the output of the reverse hidden-layer state of the Bi-GRU for the t-th subdata;
when t = 1, $\overrightarrow{h_0}$ and $\overleftarrow{h_0}$ are predefined;
$w_t$ is the weight corresponding to the forward hidden-layer state $\overrightarrow{h_t}$ of the Bi-GRU for the t-th subdata;
$v_t$ is the weight corresponding to the reverse hidden-layer state $\overleftarrow{h_t}$ of the Bi-GRU for the t-th subdata;
$b_t$ is the bias corresponding to the hidden-layer state of the t-th subdata;
$s_t$ is the feature code of the t-th subdata, $t \in [1, n]$, t a positive integer;
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features among the subdata.
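As a sanity check on the formulas above, here is a small sketch (assumed, not from the patent) that realizes them with PyTorch's built-in bidirectional GRU; the per-step weights w_t, v_t and bias b_t are taken to act elementwise, which is one possible reading of the equations.

```python
import torch
import torch.nn as nn

n, d_in, d_h = 5, 8, 4                  # n pieces of subdata, input dim, hidden dim per direction
o = torch.randn(1, n, d_in)             # nonlinear features O = {o_1 .. o_n}
bigru = nn.GRU(d_in, d_h, batch_first=True, bidirectional=True)
h, _ = bigru(o)                         # h[..., :d_h] forward states, h[..., d_h:] reverse states
w = torch.randn(n, d_h)                 # weights w_t for the forward hidden states
v = torch.randn(n, d_h)                 # weights v_t for the reverse hidden states
b = torch.randn(n, d_h)                 # biases b_t
# s_t = w_t * forward_t + v_t * reverse_t + b_t  (elementwise weighting assumed)
s = w * h[0, :, :d_h] + v * h[0, :, d_h:] + b
print(s.shape)                          # feature code S: one s_t per piece of subdata
```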
Optionally, the obtaining, by using a Multi-Head Attention mechanism, a correlation characteristic between the sub-data according to a context characteristic between the sub-data includes:
S1: according to the context features among the subdata, perform three linear transformations to obtain a query vector Q, a key vector K and a value vector V:
$$Q = W^Q S,\qquad K = W^K S,\qquad V = W^V S$$
wherein:
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features among the subdata;
$W^Q$ is the first linear transformation; Q is the query vector obtained after the feature code S of the data to be classified undergoes the first linear transformation;
$W^K$ is the second linear transformation; K is the key vector obtained after the feature code S of the data to be classified undergoes the second linear transformation;
$W^V$ is the third linear transformation; V is the value vector obtained after the feature code S of the data to be classified undergoes the third linear transformation.
S2: apply the l-th linear transformation to the query vector Q, the key vector K and the value vector V to obtain the l-th transformed query vector $Q_l$, key vector $K_l$ and value vector $V_l$:
$$Q_l = Q W^{Q_l},\qquad K_l = K W^{K_l},\qquad V_l = V W^{V_l}$$
wherein:
$W^{Q_l}$, $W^{K_l}$ and $W^{V_l}$ are all parameters defined by initialization;
l is the index of the linear transformation, $l \in [1, m]$, l a positive integer; m is the total number of segments (heads) of the multi-head attention mechanism;
$Q_l$ is the query vector after the l-th linear transformation;
$K_l$ is the key vector after the l-th linear transformation;
$V_l$ is the value vector after the l-th linear transformation.
S3: from the l-th transformed query vector $Q_l$, key vector $K_l$ and value vector $V_l$, compute the l-th attention subspace matrix $Z_l$:
$$Z_l = \mathrm{softmax}\!\left(\frac{Q_l K_l^{\top}}{\sqrt{d_{K_l}}}\right) V_l$$
wherein $d_{K_l}$ represents the dimension of the key vector $K_l$.
S4: concatenate the m attention subspace matrices $Z_l$ and multiply the result by the weight matrix $W^O$ to obtain the matrix Z of the data to be classified, wherein the matrix Z of the data to be classified contains the correlation features among the subdata.
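The four steps S1-S4 can be traced numerically with the short sketch below; every weight matrix is a random stand-in for the initialized parameters W^Q, W^K, W^V, W^{Ql}, W^{Kl}, W^{Vl} and W^O, so the sketch shows shapes and data flow rather than trained behaviour.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(S, m=4, d_k=16):
    """Sketch of steps S1-S4 with randomly initialized parameters."""
    rng = np.random.default_rng(0)
    n, d = S.shape
    Q = S @ rng.standard_normal((d, d))           # S1: query vector
    K = S @ rng.standard_normal((d, d))           # S1: key vector
    V = S @ rng.standard_normal((d, d))           # S1: value vector
    heads = []
    for _ in range(m):                            # S2: the l-th linear transformation
        Ql = Q @ rng.standard_normal((d, d_k))
        Kl = K @ rng.standard_normal((d, d_k))
        Vl = V @ rng.standard_normal((d, d_k))
        # S3: attention subspace matrix Z_l = softmax(Q_l K_l^T / sqrt(d_k)) V_l
        heads.append(softmax(Ql @ Kl.T / np.sqrt(d_k)) @ Vl)
    Z = np.concatenate(heads, axis=-1)            # S4: concatenate the m heads ...
    return Z @ rng.standard_normal((m * d_k, d))  # ... and multiply by W^O

Z = multi_head_self_attention(np.random.randn(10, 32))
print(Z.shape)   # (10, 32): correlation features among the 10 pieces of subdata
```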
Optionally, the obtaining a feature matrix of the to-be-classified data according to the context features among the sub-data, the correlation features among the sub-data, and the importance degree of each sub-data to predicting the class label probability of the to-be-classified data includes:
and carrying out nonlinear combination on the context characteristics among the subdata, the correlation characteristics among the subdata and the importance degree of each subdata on predicting the class label probability of the data to be classified to obtain a characteristic matrix of the data to be classified.
Optionally, the normalizing the feature matrix to obtain the prediction probability that the data to be classified belongs to each class of label includes:
and carrying out normalization processing on the characteristic matrix by using a softmax classifier to obtain the prediction probability of the data to be classified belonging to each class of labels.
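For illustration, the softmax normalization of the feature-matrix scores into per-label probabilities amounts to the following (the three scores are made up):

```python
import numpy as np

logits = np.array([2.0, 0.5, -1.0])            # scores for three hypothetical class labels
probs = np.exp(logits) / np.exp(logits).sum()  # softmax normalization
print(probs, probs.sum())                      # prediction probabilities; they sum to 1
```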
Optionally, the method further includes:
training a data classification model comprising an input layer, the fully-directional self-attention neural network, and an output layer.
Optionally, the training data classification model includes:
acquiring training data and a class label of the training data;
extracting linear features of the training data;
obtaining a feature space and a feature matrix of the training data by using the fully-directional self-attention neural network; outputting the correlation among the training data according to the feature space to obtain similar data of each training data;
and carrying out normalization processing on the characteristic matrix to obtain the prediction probability of the training data belonging to each class of labels.
Optionally, each training data includes n pieces of sub-data, n is an integer, and n is greater than or equal to 2;
the obtaining the feature space and the feature matrix of the training data by using the fully-directional self-attention neural network comprises:
converting the linear features of the training data into nonlinear features by using a CNN, wherein the CNN contains no pooling layers;
extracting the context features among the subdata of the training data from the nonlinear features of the training data by using a bidirectional gated recurrent unit (Bi-GRU);
obtaining correlation features among the subdata of the training data according to the context features among the subdata of the training data by using a Multi-Head Self-Attention mechanism;
and obtaining the importance degree of each subdata of the training data to predict the class label of the training data according to the correlation characteristic by using an Attention mechanism.
In a second aspect, an embodiment of the present invention further provides a data classification device, including: a memory, a processor, and a program stored on the memory and executable on the processor; the processor is configured to read a program in the memory to implement the steps in the data classification method according to any one of the first aspect.
In a third aspect, an embodiment of the present invention further provides a readable storage medium, which is used for storing a program, and when the program is executed by a processor, the program implements the steps in the data classification method according to any one of the first aspect.
In the embodiment of the invention, the extracted linear features of the data to be classified are input into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified; the feature matrix is normalized to obtain the prediction probability that the data to be classified belongs to each class label; the data to be classified comprises n pieces of subdata, and the feature matrix is obtained by using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified. With the scheme of the embodiment of the invention, complete data feature information can be extracted with a good extraction effect, and the resulting model has strong universality, so the data classification prediction accuracy obtained is high.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart of a data classification method provided by an embodiment of the invention;
FIG. 2 is a second flowchart of a data classification method according to an embodiment of the present invention;
FIG. 3 is a third flowchart of a data classification method according to an embodiment of the present invention;
FIG. 4 is a fourth flowchart of a data classification method according to an embodiment of the present invention;
FIG. 5 is a fifth flowchart of a data classification method according to an embodiment of the present invention;
FIG. 6 is a first block diagram of a data classification device according to an embodiment of the present invention;
FIG. 7 is a second block diagram of a data classification device according to an embodiment of the present invention;
FIG. 8 is a structural diagram of a data classification device according to an embodiment of the present invention.
Detailed Description
The term "and/or" in the embodiments of the present invention describes an association relationship of associated objects, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
In the embodiments of the present application, the term "plurality" means two or more, and other terms are similar thereto.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a data classification method according to an embodiment of the present invention, including the following steps:
step 11: acquiring data to be classified;
step 12: extracting linear features of the data to be classified;
step 13: inputting the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified;
step 14: normalizing the characteristic matrix to obtain the prediction probability of the data to be classified belonging to each class label;
the data to be classified comprises n pieces of subdata, where n is an integer greater than or equal to 2;
and the fully-directional self-attention neural network obtains the feature matrix of the data to be classified by using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified.
In the embodiment of the invention, the extracted linear features of the data to be classified are input into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified; the feature matrix is normalized to obtain the prediction probability that the data to be classified belongs to each class label; the data to be classified comprises n pieces of subdata, and the feature matrix is obtained by using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified. With the scheme of the embodiment of the invention, complete data feature information can be extracted with a good extraction effect, and the resulting model has strong universality, so the data classification prediction accuracy obtained is high.
In some embodiments of the present invention, the data to be classified may optionally be an audio signal, text, or other data type to be classified.
When the data to be classified is an audio signal, the audio signal comprises n audio frames after framing processing; the extracted linear characteristic of the audio signal may be at least one of a fbank characteristic, a first order difference of the fbank characteristic, and a second order difference of the fbank characteristic of the audio signal.
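A hedged sketch of such fbank-plus-difference features, using librosa as one possible toolkit (the patent names no library; the file name, the 16 kHz rate, the 25 ms/10 ms framing and the 40 mel bands are assumptions carried over from the training description later in the text):

```python
import numpy as np
import librosa

# "song.wav", the 16 kHz rate and the 40 mel bands are illustrative assumptions
y, sr = librosa.load("song.wav", sr=16000)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=int(0.025 * sr),
                                     hop_length=int(0.010 * sr), n_mels=40)
fbank = librosa.power_to_db(mel)             # fbank features: (40 bands, n frames)
d1 = librosa.feature.delta(fbank, order=1)   # first-order difference of the fbank features
d2 = librosa.feature.delta(fbank, order=2)   # second-order difference of the fbank features
features = np.stack([fbank, d1, d2])         # the three linear feature arrays
```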
When the data to be classified is text data, the text comprises n sub-texts after word segmentation processing; the extracted linear feature of the sub-text may be a semantic feature of the sub-text.
In some embodiments of the present invention, optionally, after the inputting the linear feature into the fully-directional self-attention neural network, the method further includes:
and acquiring a feature space of the data to be classified, wherein the feature space is used for calculating the similarity between the data to be classified and other data.
In the embodiment of the invention, the feature space of the data to be classified can be obtained from the fully-directional self-attention neural network, and the feature space is used for calculating the similarity between the data to be classified and other data.
In some embodiments of the present invention, optionally, when the data to be classified is a song audio signal, the similarity of the song audio signals obtained according to the feature space can solve the problem of cold start of songs in a recommended scene.
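One plausible reading of this similarity computation is cosine similarity between feature-space vectors; the function below is an assumed recipe, not a formula given in the patent:

```python
import numpy as np

def top_n_similar(m_new, M_catalog, n=5):
    """Cosine similarity between one song's feature-space vector and a catalog;
    an assumed recipe for the cold-start recommendation described above."""
    a = m_new / np.linalg.norm(m_new)
    B = M_catalog / np.linalg.norm(M_catalog, axis=1, keepdims=True)
    sims = B @ a                              # one similarity score per catalog song
    order = np.argsort(-sims)[:n]             # indices of the TOP-N most similar songs
    return order, sims[order]

idx, scores = top_n_similar(np.random.randn(128), np.random.randn(1000, 128))
```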
Referring to fig. 2, fig. 2 is a second flowchart of a data classification method according to an embodiment of the invention. In some embodiments of the present invention, optionally, the inputting the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified includes the following steps:
step 131: converting the linear features into nonlinear features by using a Convolutional Neural Network (CNN), wherein the CNN contains no pooling layers;
step 132: extracting the context features among the subdata from the nonlinear features by using a bidirectional gated recurrent unit (Bi-GRU);
step 133: obtaining correlation features among the subdata according to the context features among the subdata by using a Multi-Head Self-Attention mechanism;
step 134: using an Attention mechanism, and obtaining the importance degree of each subdata on predicting the class label of the data to be classified according to the correlation characteristics;
step 135: and obtaining a feature matrix of the data to be classified according to the context features among the subdata, the correlation features among the subdata and the importance degree of each subdata on predicting the class label of the data to be classified.
In the embodiment of the invention, the fully-directional self-attention neural network comprises a CNN layer for converting the linear features into nonlinear features, a Bi-GRU layer for extracting the context features among the subdata from the nonlinear features, a Multi-Head Self-Attention mechanism for obtaining the correlation features among the subdata from the context features among the subdata, and an Attention mechanism for obtaining, from the correlation features, the importance degree of each subdata to predicting the class label of the data to be classified. The strong convolution-kernel feature extractor of the CNN performs preliminary deep feature extraction on the data to be classified; at the same time, the feature matrix, obtained from the correlation between the states at adjacent moments among the subdata and from features representing the context relationships of the subdata, allows the feature information of the data to be classified to be extracted completely, improving the classification accuracy of the data to be classified.
In some embodiments of the present invention, optionally, the extracting, by using a bidirectional gated recurrent unit (Bi-GRU), the context features among the subdata from the nonlinear features includes:
extracting context features between the sub-data by:
$$\overrightarrow{h_t} = \mathrm{GRU}\big(o_t,\ \overrightarrow{h_{t-1}}\big)$$
$$\overleftarrow{h_t} = \mathrm{GRU}\big(o_t,\ \overleftarrow{h_{t-1}}\big)$$
$$s_t = w_t\,\overrightarrow{h_t} + v_t\,\overleftarrow{h_t} + b_t$$
wherein:
O is the nonlinear feature representation of the data to be classified, $O = \{o_1, o_2, \ldots, o_n\}$;
$o_t$ is the input nonlinear feature vector of the t-th subdata, $t \in [1, n]$, t a positive integer;
$\overrightarrow{h_{t-1}}$ is the output of the forward hidden-layer state of the Bi-GRU for the (t-1)-th subdata;
$\overleftarrow{h_{t-1}}$ is the output of the reverse hidden-layer state of the Bi-GRU for the (t-1)-th subdata;
$\overrightarrow{h_t}$ is the output of the forward hidden-layer state of the Bi-GRU for the t-th subdata;
$\overleftarrow{h_t}$ is the output of the reverse hidden-layer state of the Bi-GRU for the t-th subdata;
when t = 1, $\overrightarrow{h_0}$ and $\overleftarrow{h_0}$ are predefined;
$w_t$ is the weight corresponding to the forward hidden-layer state $\overrightarrow{h_t}$ of the Bi-GRU for the t-th subdata;
$v_t$ is the weight corresponding to the reverse hidden-layer state $\overleftarrow{h_t}$ of the Bi-GRU for the t-th subdata;
$b_t$ is the bias corresponding to the hidden-layer state of the t-th subdata;
$s_t$ is the feature code of the t-th subdata, $t \in [1, n]$, t a positive integer;
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features among the subdata.
In the embodiment of the invention, the output of the data to be classified at the current moment is related to the states at both the previous and the next moment; the Bi-GRU is used to extract, from the nonlinear feature representation of the data to be classified output by the previous layer, coding features that represent the context relationships among the subdata, so that the feature description among the subdata of the data to be classified is more complete.
In some embodiments of the invention, a Bi-GRU is a neural network structure composed of two unidirectional GRUs running in opposite directions, whose states jointly determine the output. At each moment, the input is fed to the two GRUs in opposite directions, and the output is determined by both unidirectional GRUs. The current hidden state of the Bi-GRU is jointly determined by three parts: the current input vector $o_t$, the output $\overrightarrow{h_{t-1}}$ of the forward hidden-layer state at moment (t-1), and the output $\overleftarrow{h_{t-1}}$ of the reverse hidden-layer state; the hidden-layer state of the Bi-GRU at moment t is obtained by weighting $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$.
In some embodiments of the present invention, optionally, the obtaining, according to a context feature between the sub-data, a correlation feature between the sub-data by using a Multi-Head Attention mechanism includes:
S1: according to the context features among the subdata, perform three linear transformations to obtain a query vector Q, a key vector K and a value vector V:
$$Q = W^Q S,\qquad K = W^K S,\qquad V = W^V S$$
wherein:
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features among the subdata;
$W^Q$ is the first linear transformation; Q is the query vector obtained after the feature code S of the data to be classified undergoes the first linear transformation;
$W^K$ is the second linear transformation; K is the key vector obtained after the feature code S of the data to be classified undergoes the second linear transformation;
$W^V$ is the third linear transformation; V is the value vector obtained after the feature code S of the data to be classified undergoes the third linear transformation.
S2: apply the l-th linear transformation to the query vector Q, the key vector K and the value vector V to obtain the l-th transformed query vector $Q_l$, key vector $K_l$ and value vector $V_l$:
$$Q_l = Q W^{Q_l},\qquad K_l = K W^{K_l},\qquad V_l = V W^{V_l}$$
wherein:
$W^{Q_l}$, $W^{K_l}$ and $W^{V_l}$ are all parameters defined by initialization;
l is the index of the linear transformation, $l \in [1, m]$, l a positive integer; m is the total number of segments (heads) of the multi-head attention mechanism;
$Q_l$ is the query vector after the l-th linear transformation;
$K_l$ is the key vector after the l-th linear transformation;
$V_l$ is the value vector after the l-th linear transformation.
S3: from the l-th transformed query vector $Q_l$, key vector $K_l$ and value vector $V_l$, compute the l-th attention subspace matrix $Z_l$:
$$Z_l = \mathrm{softmax}\!\left(\frac{Q_l K_l^{\top}}{\sqrt{d_{K_l}}}\right) V_l$$
wherein $d_{K_l}$ represents the dimension of the key vector $K_l$.
S4: concatenate the m attention subspace matrices $Z_l$ and multiply the result by the weight matrix $W^O$ to obtain the matrix Z of the data to be classified, wherein the matrix Z of the data to be classified contains the correlation features among the subdata.
In the embodiment of the invention, the correlation features and the long-timing-distance interdependence features among the subdata of the data to be classified are captured by the Multi-Head Self-Attention mechanism, which effectively remedies the weakness of the GRU in capturing long-distance interdependence relationships.
In some embodiments of the present invention, optionally, the obtaining, by using an Attention mechanism, an importance degree of each subdata to predicting the class label probability of the data to be classified according to the correlation feature includes:
extracting a feature space M of the data to be classified by the following method:
$$u_i = \tanh(w_i z_i + b_i)$$
$$\alpha_i = \frac{\exp\!\big(u_i^{\top} u_w\big)}{\sum_{j=1}^{n} \exp\!\big(u_j^{\top} u_w\big)}$$
$$m_i = \alpha_i z_i$$
$$M = \{m_1, m_2, \ldots, m_n\}$$
wherein:
$z_i$ is the attention subspace matrix of the i-th subdata;
$w_i$ is the weight coefficient of the i-th subdata, and $b_i$ is the bias coefficient of the i-th subdata;
$u_i$ is the first weight coefficient of the i-th subdata;
$u_w$ is a randomly initialized attention matrix;
$\alpha_i$ is the second weight coefficient of the i-th subdata;
$m_i$ is the feature subspace of the i-th subdata; the feature subspace of the i-th subdata contains the importance degree of the i-th subdata to predicting the data to be classified;
M is the feature space of the data to be classified; the feature space of the data to be classified contains the importance degree of each subdata to predicting the class-label probability of the data to be classified.
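A compact numeric sketch of these Attention equations follows; the parameters are randomly initialized, and a single weight matrix w shared across the subdata is used here as a simplifying assumption in place of the per-subdata w_i:

```python
import numpy as np

def attention_importance(Z):
    """Sketch of the Attention step: u_i = tanh(w z_i + b),
    alpha_i = softmax(u_i^T u_w), m_i = alpha_i * z_i; parameters are random."""
    rng = np.random.default_rng(0)
    n, d = Z.shape
    w, b = rng.standard_normal((d, d)), rng.standard_normal(d)
    u_w = rng.standard_normal(d)          # randomly initialized attention matrix u_w
    u = np.tanh(Z @ w + b)                # first weight coefficients u_i
    e = np.exp(u @ u_w)
    alpha = e / e.sum()                   # second weight coefficients alpha_i
    M = alpha[:, None] * Z                # feature subspaces m_i, stacked into M
    return alpha, M

alpha, M = attention_importance(np.random.randn(10, 32))
print(alpha)   # importance of each piece of subdata for predicting the class label
```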
In the embodiment of the invention, the Attention mechanism helps assign different weights to each part of the input and extracts the more critical and important information, yielding the importance degree of each subdata to predicting the class-label probability of the data to be classified; this makes the data classification prediction more accurate without adding large computation and storage overheads.
In some embodiments of the present invention, optionally, the obtaining the feature matrix of the to-be-classified data according to the context features among the sub data, the correlation features among the sub data, and the importance degree of each sub data to the prediction of the class label probability of the to-be-classified data includes:
and carrying out nonlinear combination on the context characteristics among the subdata, the correlation characteristics among the subdata and the importance degree of each subdata on predicting the class label probability of the data to be classified to obtain a characteristic matrix of the data to be classified.
In the embodiment of the invention, the extracted context characteristics among the subdata, the correlation characteristics among the subdata and the importance degree of each subdata on predicting the class label probability of the data to be classified are subjected to nonlinear combination to obtain the characteristic matrix of the data to be classified, and the characteristic matrix is fused with the multidimensional characteristic information of the data to be classified, so that the classification prediction effect is more accurate when the data is classified.
In some embodiments of the present invention, optionally, the method further includes inputting an output result of the Attention mechanism to a full connection layer, where the full connection layer is equivalent to a hidden layer in a conventional feedforward neural network, so as to implement nonlinear combination on the extracted high-order features of the data to be classified, and output a feature matrix.
In some embodiments of the present invention, optionally, the normalizing the feature matrix to obtain the prediction probability of each class label to which the data to be classified belongs includes:
and carrying out normalization processing on the characteristic matrix by using a softmax classifier to obtain the prediction probability of the data to be classified belonging to each class of labels.
In the embodiment of the invention, the prediction probability of the data to be classified belonging to each class label is calculated by inputting the characteristic matrix of the data to be classified into a softmax classifier; the implementation mode is simple, the classification effect is good, and the obtained data classification model is high in universality.
In some embodiments of the present invention, optionally, the method further comprises:
training a data classification model comprising an input layer, the fully-directional self-attention neural network, and an output layer.
In the embodiment of the invention, the data classification model comprises an input layer, the fully-directional self-attention neural network and an output layer; the input layer is used for extracting the linear features of the training data; the fully-directional self-attention neural network is used for extracting the feature space and feature matrix of the training data; and the output layer is used for obtaining the prediction probability of each class label of the training data.
Referring to fig. 3, fig. 3 is a third flowchart of a data classification method according to an embodiment of the invention. In some embodiments of the present invention, optionally, the training data classification model includes the following steps:
step 21: acquiring training data and a class label of the training data;
step 22: extracting linear features of the training data;
step 23: obtaining a feature space and a feature matrix of the training data by using the fully-directional self-attention neural network;
step 24: outputting the correlation among the training data according to the feature space to obtain similar data of each training data;
step 25: and carrying out normalization processing on the characteristic matrix to obtain the prediction probability of the training data belonging to each class of labels.
In the embodiment of the invention, the training data are labeled with their corresponding class labels, and each piece of training data comprises a plurality of subdata. The extracted linear features of the training data are input into the fully-directional self-attention neural network to obtain the feature space and feature matrix of the training data; similar data of each piece of training data are obtained from the feature space; and the feature matrix is normalized to obtain the prediction probability that each piece of training data belongs to each class label. The feature matrix is obtained by the fully-directional self-attention neural network using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the training data. With the scheme of the embodiment of the invention, complete data feature information can be extracted with a good extraction effect, and the resulting model has strong universality, so the data classification prediction accuracy obtained is high.
Referring to fig. 4, fig. 4 is a fourth flowchart of a data classification method according to an embodiment of the present invention. In some embodiments of the present invention, optionally, each piece of training data includes n pieces of subdata, where n is an integer greater than or equal to 2;
the method for obtaining the feature space and the feature matrix of the training data by utilizing the fully-directional self-attention neural network comprises the following steps:
step 231: converting the linear features of the training data into nonlinear features by using a Convolutional Neural Network (CNN), wherein the CNN contains no pooling layers;
step 232: extracting the context features among the subdata of the training data from the nonlinear features of the training data by using a bidirectional gated recurrent unit (Bi-GRU);
step 233: obtaining correlation features among the subdata of the training data according to the context features among the subdata of the training data by using a Multi-Head Self-Attention mechanism;
step 234: and obtaining the importance degree of each subdata of the training data to predict the class label of the training data according to the correlation characteristic by using an Attention mechanism.
In the embodiment of the invention, the fully-directional self-attention neural network comprises a CNN layer for converting the linear features into nonlinear features, a Bi-GRU layer for extracting the context features among the subdata from the nonlinear features, a Multi-Head Self-Attention mechanism for obtaining the correlation features among the subdata from the context features among the subdata, and an Attention mechanism for obtaining, from the correlation features, the importance degree of each subdata to predicting the class label of the corresponding training data. The strong convolution-kernel feature extractor of the CNN performs preliminary deep feature extraction on the training data; at the same time, the feature matrix, obtained from the correlation between the states at adjacent moments among the subdata of the training data and from the context relationship features of the subdata, allows the feature information of the training data to be extracted completely, further improving the classification accuracy of the data classification model.
Referring to fig. 5, fig. 5 is a fifth flowchart of a data classification method according to an embodiment of the present invention.
When the data to be classified is an audio signal, the data classification model comprises an input layer, the fully-directional self-attention neural network and an output layer.
Specifically, the training data classification model includes:
1. preprocessing data of an input layer:
step 51: taking a wav file containing multiple groups of audio training data of [song audio S, type label y] as the original input, reading the binary data representation, the number of audio channels and the quantization bit depth of the wav file, and converting the read binary data into an array representation for calculation according to the number of channels and the quantization bit depth;
step 52: extracting fbank signal features by framing (frame length 25 ms, frame overlap 10 ms), pre-emphasis, computing the power spectrum of each frame, computing a mel filter bank, and computing the inner product of the power spectrum and the filter bank; extracting the first-order difference Δ and the second-order difference ΔΔ of the fbank signal features, and generating and outputting three audio-signal feature arrays consisting of the fbank signal features, Δ and ΔΔ, where the vertical axis of each feature array represents frequency and the horizontal axis represents time, and each piece of training data is divided into n frames;
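A sketch of this input-layer preprocessing, up to the power spectrum, is given below; the file name, the pre-emphasis coefficient 0.97 and the 512-point FFT are assumptions, and the patent's "10 ms frame overlap" is read here as the usual 10 ms frame shift:

```python
import numpy as np
from scipy.io import wavfile

# "train_song.wav" is a placeholder name; a mono file is assumed for simplicity
sr, samples = wavfile.read("train_song.wav")
x = samples.astype(np.float64)
x = np.append(x[0], x[1:] - 0.97 * x[:-1])        # pre-emphasis (0.97 is an assumed coefficient)
frame_len = int(0.025 * sr)                       # 25 ms frame length
hop = int(0.010 * sr)                             # 10 ms frame shift (one reading of "10 ms overlap")
n_frames = 1 + (len(x) - frame_len) // hop
frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
power = np.abs(np.fft.rfft(frames, n=512)) ** 2 / 512   # power spectrum of each frame
# the inner product of the power spectrum with a mel filter bank then gives the fbank features
```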
2. step 53: training the hidden layer (the fully-directional self-attention neural network) includes:
step 531: performing deep feature extraction on the audio signal with the strong convolution-kernel feature extractor of the CNN. By adopting a longitudinal convolution kernel that convolves along the time axis, the kernel covers the whole frequency axis and can perceive features that appear in frequency, such as the characteristic sounds and overtones of instruments. The CNN comprises 3 to 6 convolutional layers and uses the ReLU as the activation function to convert the linear features (the three audio-signal feature arrays consisting of the fbank signal features, Δ and ΔΔ) into a nonlinear feature expression O in the convolutional layers, where $O = \{o_1, o_2, \ldots, o_t, \ldots, o_n\}$ and $o_t$ is the nonlinear feature vector of the t-th frame of the audio signal;
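The longitudinal convolution kernel described above can be sketched as follows; the 40 mel bins, 3 input channels (fbank, Δ, ΔΔ) and 64 output channels are assumed sizes:

```python
import torch
import torch.nn as nn

# assumed shapes: 3 input channels (fbank, delta, delta-delta), 40 mel bins, 1000 frames
x = torch.randn(1, 3, 40, 1000)
# longitudinal kernel: spans the whole frequency axis (40) and convolves along time only
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=(40, 3), padding=(0, 1))
o = torch.relu(conv(x))   # ReLU activation; no pooling layer follows
print(o.shape)            # torch.Size([1, 64, 1, 1000]): frequency axis fully covered
```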
step 532: using the bidirectional gated recurrent unit Bi-GRU to extract, from the audio nonlinear feature representation O output by the previous layer, the coding features S that represent the context relationships of the audio frames, where $S = \{s_1, s_2, \ldots, s_t, \ldots, s_n\}$ and $s_t$ is determined by $o_t$ and the parameters of the bidirectional Bi-GRU;
step 533: capturing the feature matrix Z of the long-timing-distance interdependence relationships between the frames of the music with the Multi-Head Self-Attention mechanism, where the coding features S undergo three linear transformations to obtain three vectors, the l-th linearly transformed versions of those three vectors yield the attention subspace matrix $Z_l$, and Z is obtained by concatenating the several attention subspace matrices $Z_l$;
step 534: using the Attention mechanism to learn the importance degree of each frame of a song to predicting the class label of the song, highlighting the importance of different frames for classifying the whole song; the feature space M of each song is obtained as the accumulated sum of the products of the probability weights assigned by the Attention mechanism and each hidden-layer state, i.e. M is obtained from the weight coefficients of the Attention mechanism and the n attention subspace matrices $Z_i$;
step 535: calculating the correlations among songs from the output audio-signal feature space M to obtain the similar songs of each song;
step 536: inputting the feature space M of each song into a fully connected layer, nonlinearly combining the extracted high-order features of each audio clip, and outputting a vector H;
3. training an output layer:
step 54: and performing normalization processing on the vector H by using a softmax classifier to generate the probability that each song belongs to each label.
After the model training is finished, inputting the audio signal of a new song into the data classification model outputs the probability of each label to which it belongs and the TOP-N songs most similar to it. In some embodiments of the present invention, optionally, the strong convolution-kernel feature extractor of the CNN performs deep feature extraction on the audio signal without pooling, in order to reduce the loss of positional information of each audio feature and the attenuation of the convolutional information.
In some embodiments of the present invention, optionally, besides the timing relationships of the contexts between the audio frames, there are also correlations between the frames; a Multi-Head Self-Attention mechanism is used, in which the multi-head cuts the original input S into m segments, linearly transforms each separately, and concatenates them after the transformation into a whole of unified dimension. The multiple heads allow the model to jointly attend to information from different representation subspaces at different positions, so the interdependence features with long timing distance between the frames of the music can be captured. This step needs to be trained many times and finally outputs a feature matrix Z that can represent the relationships between the music context and each frame of the music.
In some embodiments of the present invention, optionally, different frames of the audio have different degrees of importance in the classification task; the Attention mechanism is used to learn the importance degree of each frame of a song to predicting the class label to which it belongs, highlighting the importance of different frames for classifying the whole song, and finally a fully-directional deep signal feature matrix M that represents the context timing relationships and long-distance correlations in the audio signal is output.
In some embodiments of the present invention, optionally, the prediction probability that the feature vector H of each group of training data belongs to each class label is:
$$p(i \mid H; \theta) = \frac{e^{\theta_i^{\top} H}}{\sum_{j=1}^{s} e^{\theta_j^{\top} H}}$$
wherein:
θ is the training parameter;
s represents the total number of category labels;
n is the total number of frames of one audio clip and is also the dimension of the vector H;
i represents the i-th dimension of the feature vector H, i ∈ [0, n], i a positive integer.
the loss function of the data classification model is:
$$J(\theta) = -\frac{1}{T}\left[\sum_{k=1}^{T} \sum_{i=1}^{s} \mathbb{1}\{r_k = i\}\, \log p(i \mid H; \theta)\right] + \frac{\lambda}{2} \sum_{i=1}^{s} \sum_{j=1}^{n} \theta_{ij}^{2}$$
wherein:
$\frac{\lambda}{2} \sum_{i=1}^{s} \sum_{j=1}^{n} \theta_{ij}^{2}$ is the regular term;
n is the total number of frames of one audio clip and is also the dimension of the vector H;
s represents the total number of category labels;
λ represents the proportion of the model-complexity loss in the total loss, i.e. the weight of the regular term;
T is the total number of audio samples;
$r_k$ is the prediction class of the k-th audio sample;
$p(i \mid H; \theta)$ is the predicted probability of the i-th class from the softmax classifier.
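A sketch of this loss with an L2 regular term, assuming integer class targets and precomputed softmax probabilities:

```python
import numpy as np

def j_theta(probs, labels, theta, lam):
    """Cross-entropy with an L2 regular term, following the reconstructed J(theta);
    probs[k, i] is p(i | H_k; theta) from the softmax classifier."""
    T = len(labels)
    data_loss = -np.log(probs[np.arange(T), labels]).sum() / T
    reg = (lam / 2.0) * np.sum(theta ** 2)    # regular term introduced against overfitting
    return data_loss + reg

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])   # two samples, three labels
print(j_theta(probs, np.array([0, 1]), np.random.randn(3, 4), lam=1e-3))
```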
In some embodiments of the present invention, optionally, with the loss function J(θ) as the objective, the above steps of the hidden layer and the output layer are trained multiple times and the related weight-coefficient sets in each layer are updated, finally outputting a feature matrix H' that can represent the relationships between the music context and each frame of the music, together with the probability of each label to which the music belongs; the regular term is introduced to prevent overfitting.
After the model training is finished, inputting a new song into the model outputs the probability of each label to which it belongs; at the same time, the similarity between songs can be identified from each song's deep features, solving the problem of cold-starting songs in recommendation scenarios.
Referring to fig. 6, fig. 6 is a diagram of a data classification device 30 according to an embodiment of the present invention. The data classification device 30 includes:
an obtaining module 31, configured to obtain data to be classified;
an extraction module 32, configured to extract linear features of the data to be classified;
the classification module 33 is configured to input the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified;
the classification module 33 is further configured to perform normalization processing on the feature matrix to obtain a prediction probability that the data to be classified belongs to each class label;
the data to be classified comprises n pieces of subdata, where n is an integer greater than or equal to 2;
and the fully-directional self-attention neural network obtains the feature matrix of the data to be classified by using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified.
In the embodiment of the invention, the data classification device inputs the extracted linear features of the data to be classified into the fully-directional self-attention neural network to obtain a feature matrix of the data to be classified; the feature matrix is normalized to obtain the prediction probability that the data to be classified belongs to each class label; the data to be classified comprises n pieces of subdata, and the feature matrix is obtained by the fully-directional self-attention neural network using the context features among the subdata, the correlation features among the subdata, and the importance degree of each subdata to predicting the class label of the data to be classified. The data feature information extracted by the data classification device is complete, the extraction effect is good, the universality is strong, and the resulting data classification prediction accuracy is high.
In some embodiments of the present invention, optionally, the classification module 33 is further configured to input the linear features into the fully-directional self-attention neural network to obtain a feature space of the data to be classified;
the feature space is used for calculating the similarity between the data to be classified and other data.
In the embodiment of the invention, the feature space of the data to be classified can be obtained from the fully-directional self-attention neural network, and the feature space is used for calculating the similarity between the data to be classified and other data.
Referring to fig. 7, fig. 7 is a second structural diagram of a data classification apparatus according to an embodiment of the present invention.
In some embodiments of the present invention, optionally, the classification module 33 includes: a first module 331, a second module 332, a third module 333, and a fourth module 334;
the first module 331 is configured to convert the linear features into nonlinear features by using a CNN, wherein the CNN contains no pooling layers;
the second module 332 is configured to extract the context features among the subdata from the nonlinear features by using a bidirectional gated recurrent unit (Bi-GRU);
the third module 333 is configured to obtain correlation characteristics between the sub-data according to context characteristics between the sub-data by using a Multi-Head Attention mechanism;
the fourth module 334 is configured to obtain, according to the correlation characteristic, an importance degree of each subdata to predicting a category label of the to-be-classified data by using an Attention mechanism;
the classification module 33 is further configured to obtain a feature matrix of the to-be-classified data according to the context features among the sub-data, the correlation features among the sub-data, and the importance degree of each sub-data to predicting the class label of the to-be-classified data.
In the embodiment of the invention, the fully-directional self-attention neural network comprises a CNN layer for converting the linear features into nonlinear features, a Bi-GRU layer for extracting the context features among the subdata from the nonlinear features, a Multi-Head Self-Attention mechanism for obtaining the correlation features among the subdata from the context features among the subdata, and an Attention mechanism for obtaining, from the correlation features, the importance degree of each subdata to predicting the class label of the data to be classified. The data classification device uses the strong convolution-kernel feature extractor of the CNN to perform preliminary deep feature extraction on the data to be classified; at the same time, the feature matrix, obtained from the correlation between the states at adjacent moments among the subdata and from features representing the context relationships of the subdata, allows the feature information of the data to be classified to be extracted completely, improving the classification accuracy of the data to be classified.
In some embodiments of the present invention, optionally, the second module 332 is further configured to extract context characteristics between the sub-data by:
$$\overrightarrow{h_t} = \mathrm{GRU}\big(o_t,\ \overrightarrow{h_{t-1}}\big)$$
$$\overleftarrow{h_t} = \mathrm{GRU}\big(o_t,\ \overleftarrow{h_{t-1}}\big)$$
$$s_t = w_t\,\overrightarrow{h_t} + v_t\,\overleftarrow{h_t} + b_t$$
wherein:
O is the nonlinear feature representation of the data to be classified, $O = \{o_1, o_2, \ldots, o_n\}$;
$o_t$ is the input nonlinear feature vector of the t-th subdata, $t \in [1, n]$, t a positive integer;
$\overrightarrow{h_{t-1}}$ is the output of the forward hidden-layer state of the Bi-GRU for the (t-1)-th subdata;
$\overleftarrow{h_{t-1}}$ is the output of the reverse hidden-layer state of the Bi-GRU for the (t-1)-th subdata;
$\overrightarrow{h_t}$ is the output of the forward hidden-layer state of the Bi-GRU for the t-th subdata;
$\overleftarrow{h_t}$ is the output of the reverse hidden-layer state of the Bi-GRU for the t-th subdata;
when t = 1, $\overrightarrow{h_0}$ and $\overleftarrow{h_0}$ are predefined;
$w_t$ is the weight corresponding to the forward hidden-layer state $\overrightarrow{h_t}$ of the Bi-GRU for the t-th subdata;
$v_t$ is the weight corresponding to the reverse hidden-layer state $\overleftarrow{h_t}$ of the Bi-GRU for the t-th subdata;
$b_t$ is the bias corresponding to the hidden-layer state of the t-th subdata;
$s_t$ is the feature code of the t-th subdata, $t \in [1, n]$, t a positive integer;
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features among the subdata.
In the embodiment of the invention, the output of the data to be classified at the current moment is related both to the state at the previous moment and to the state at the next moment. Through the Bi-GRU, the second module extracts, from the nonlinear feature representation output by the layer above, coding features that capture the context relationship between the sub-data, so that the feature description of the sub-data of the data to be classified is more complete.
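As a hedged sketch of the formulas above (the combination weights are taken element-wise here, and all names and shapes are assumptions), the feature code S can be computed from the forward and reverse GRU states like this:

```python
import torch
import torch.nn as nn

n, d = 16, 64                       # n sub-data, feature width d (assumed)
O = torch.randn(1, n, d)            # nonlinear features O = {o_1, ..., o_n} from the CNN layer

bigru = nn.GRU(d, d, batch_first=True, bidirectional=True)
H, _ = bigru(O)                     # H[..., :d] forward states, H[..., d:] reverse states
h_fwd, h_bwd = H[..., :d], H[..., d:]

w = torch.randn(n, d)               # w_t: weight of the forward hidden state (assumed shape)
v = torch.randn(n, d)               # v_t: weight of the reverse hidden state
b = torch.randn(n, d)               # b_t: bias of the t-th hidden-layer state

S = w * h_fwd + v * h_bwd + b       # s_t = w_t * h_fwd_t + v_t * h_bwd_t + b_t
```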
In some embodiments of the present invention, optionally, the third module 333 is further configured to apply three linear transformations to the context features between the sub-data to obtain a query vector Q, a key vector K, and a value vector V:

$$Q = W^Q S$$

$$K = W^K S$$

$$V = W^V S$$

wherein:
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features between the sub-data;
$W^Q$ is the first linear transformation; Q is the query vector obtained after the feature code S of the data to be classified undergoes the first linear transformation;
$W^K$ is the second linear transformation; K is the key vector obtained after the feature code S undergoes the second linear transformation;
$W^V$ is the third linear transformation; V is the value vector obtained after the feature code S undergoes the third linear transformation;
the third module 333 is further configured to linearly transform the query vector Q, the key vector K, and the value vector V to obtain, for the l-th transformation, the transformed query vector $Q_l$, key vector $K_l$, and value vector $V_l$:

$$Q_l = Q W^{Q_l}$$

$$K_l = K W^{K_l}$$

$$V_l = V W^{V_l}$$

wherein:
$W^{Q_l}$, $W^{K_l}$, $W^{V_l}$ are parameters defined at initialization;
l is the index of the linear transformation, $l \in [1, m]$, l a positive integer; m is the total number of segments (heads) of the multi-head attention mechanism;
$Q_l$ is the query vector after the l-th linear transformation;
$K_l$ is the key vector after the l-th linear transformation;
$V_l$ is the value vector after the l-th linear transformation;
the third module 333 is further configured to compute, from the l-th transformed query vector $Q_l$, key vector $K_l$, and value vector $V_l$, the attention subspace matrix $Z_l$ of the l-th linear transformation:

$$Z_l = \mathrm{softmax}\!\left(\frac{Q_l K_l^{\mathsf{T}}}{\sqrt{d_{k_l}}}\right) V_l$$

wherein $d_{k_l}$ is the dimension of the key vector $K_l$;

the third module 333 is further configured to concatenate the m attention subspace matrices $Z_l$ and multiply the result by a weight matrix $W^O$ to obtain the matrix Z of the data to be classified, wherein the matrix Z contains the correlation features between the sub-data.
In the embodiment of the invention, the third module captures the correlation features and the long-range (long time-sequence distance) interdependence features between the sub-data of the data to be classified through the Multi-Head Self-Attention mechanism, which effectively remedies the GRU's weakness at capturing long-distance interdependence relationships.
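A minimal sketch of these steps follows; the weight shapes, the head count m, and the random initialization are assumptions made for the example:

```python
import math
import torch

def multi_head_self_attention(S, m=8):
    """Sketch: three linear transformations, m per-head transformations,
    scaled dot-product attention, then concatenation multiplied by W^O."""
    n, d = S.shape
    W_Q, W_K, W_V = (torch.randn(d, d) for _ in range(3))
    Q, K, V = S @ W_Q, S @ W_K, S @ W_V           # query, key, and value vectors
    d_k = d // m                                  # per-head key dimension
    Z_heads = []
    for l in range(m):
        W_Ql, W_Kl, W_Vl = (torch.randn(d, d_k) for _ in range(3))
        Q_l, K_l, V_l = Q @ W_Ql, K @ W_Kl, V @ W_Vl
        A = torch.softmax(Q_l @ K_l.T / math.sqrt(d_k), dim=-1)
        Z_heads.append(A @ V_l)                   # Z_l = softmax(Q_l K_l^T / sqrt(d_k)) V_l
    W_O = torch.randn(m * d_k, d)
    return torch.cat(Z_heads, dim=-1) @ W_O       # matrix Z with the correlation features

Z = multi_head_self_attention(torch.randn(16, 64))   # 16 sub-data, feature width 64
```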
In some embodiments of the present invention, optionally, the fourth module 334 is further configured to extract the feature space M of the data to be classified as follows:

$$u_i = \tanh(w_i z_i + b_i)$$

$$\alpha_i = \frac{\exp(u_i^{\mathsf{T}} u_w)}{\sum_{i} \exp(u_i^{\mathsf{T}} u_w)}$$

$$m_i = \alpha_i z_i$$

$$M = \{m_1, m_2, \ldots, m_n\}$$

wherein:
$z_i$ is the attention subspace matrix of the i-th sub-data;
$w_i$ is the weight coefficient of the i-th sub-data, and $b_i$ is the bias coefficient of the i-th sub-data;
$u_i$ is the first weight coefficient of the i-th sub-data;
$u_w$ is a randomly initialized attention matrix;
$\alpha_i$ is the second weight coefficient of the i-th sub-data;
$m_i$ is the feature subspace of the i-th sub-data, which contains the importance degree of the i-th sub-data for predicting the data to be classified; M is the feature space of the data to be classified, which contains the importance degree of each sub-data for predicting the class label probability of the data to be classified.
In the embodiment of the invention, the fourth module uses the Attention mechanism to assign different weights to each input part and to extract the more critical and important information, obtaining the importance degree of each sub-data for predicting the class label probability of the data to be classified. This makes the data classification prediction more accurate while adding little overhead to its computation and storage.
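A hedged sketch of this attention pooling (the final step $m_i = \alpha_i z_i$ and all names and shapes are assumptions):

```python
import torch

def attention_importance(Z):
    """Sketch: weigh each sub-data's attention subspace by a learned importance score."""
    n, d = Z.shape
    W, b = torch.randn(d, d), torch.randn(d)
    u = torch.tanh(Z @ W + b)              # u_i = tanh(w_i z_i + b_i)
    u_w = torch.randn(d)                   # randomly initialized attention matrix u_w
    alpha = torch.softmax(u @ u_w, dim=0)  # alpha_i: importance of the i-th sub-data
    M = alpha.unsqueeze(-1) * Z            # m_i = alpha_i z_i -> feature space M
    return M, alpha

M, alpha = attention_importance(torch.randn(16, 64))
```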
In some embodiments of the present invention, optionally, the classification module 33 further includes a fifth module 335, and the fifth module 335 is configured to nonlinearly combine the context features between the sub-data, the correlation features between the sub-data, and the importance degree of each sub-data for predicting the class label probability of the data to be classified, so as to obtain the feature matrix of the data to be classified.
In the embodiment of the invention, the fifth module nonlinearly combines the extracted context features, correlation features, and importance degrees into the feature matrix of the data to be classified. Because this feature matrix fuses multi-dimensional feature information of the data to be classified, the classification prediction is more accurate, as sketched below.
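The patent does not fix the form of this nonlinear combination; one plausible sketch, in which the concatenation-plus-projection form is an assumption, is:

```python
import torch
import torch.nn as nn

d = 64
S, Z, M = torch.randn(16, d), torch.randn(16, d), torch.randn(16, d)  # the three feature sets
fuse = nn.Linear(3 * d, d)                           # learned combination weights (assumed)
F = torch.tanh(fuse(torch.cat([S, Z, M], dim=-1)))   # nonlinear combination -> feature matrix
```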
In some embodiments of the present invention, optionally, the classification module 33 includes a sixth module 336, and the sixth module 336 is configured to perform normalization processing on the feature matrix by using a softmax classifier, so as to obtain a prediction probability that the data to be classified belongs to each class label.
In the embodiment of the invention, the sixth module calculates the prediction probability that the data to be classified belongs to each class label by feeding the feature matrix of the data to be classified into a softmax classifier. This implementation is simple, classifies well, and yields a data classification model with strong generality.
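A short sketch of this normalization step; the pooling over sub-data and the linear head are assumptions made for the example:

```python
import torch
import torch.nn as nn

num_classes = 5
F = torch.randn(16, 64)                  # feature matrix of the data to be classified
head = nn.Linear(64, num_classes)
logits = head(F.mean(dim=0))             # pool the sub-data features, then score each class
probs = torch.softmax(logits, dim=-1)    # prediction probability per class label; sums to 1
```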
In some embodiments of the present invention, optionally, the data classification apparatus further includes a training module 34 for training a data classification model; the data classification model comprises an input layer, the fully-directional self-attention neural network, and an output layer, and the training module 34 comprises an input layer sub-module 341, a network layer sub-module 342, and an output layer sub-module 343.
In the embodiment of the invention, the data classification model comprises an input layer, the fully-directional self-attention neural network, and an output layer: the input layer extracts the linear features of the training data; the fully-directional self-attention neural network extracts the feature space and the feature matrix of the training data; and the output layer obtains the prediction probability of the training data belonging to each class label.
In some embodiments of the present invention, optionally, the input layer sub-module 341 is configured to obtain training data and class labels of the training data;
the input layer sub-module 341 is further configured to extract linear features of the training data;
the network layer sub-module 342 is configured to obtain a feature space and a feature matrix of the training data by using the fully-directional self-attention neural network;
the network layer sub-module 342 is further configured to output correlations between the training data according to the feature space to obtain similar data of each training data;
the output layer sub-module 343 is configured to normalize the feature matrix to obtain the prediction probability that the training data belongs to each class label.
In the embodiment of the invention, the training data of the data classification device is labeled with corresponding class labels, and each training data comprises a plurality of sub-data. The extracted linear features of the training data are input into the fully-directional self-attention neural network to obtain the feature space and feature matrix of the training data; similar data for each training data are obtained from the feature space; and the feature matrix is normalized to obtain the prediction probability that each training data belongs to each class label. The feature matrix is obtained by the fully-directional self-attention neural network from the context features between the sub-data, the correlation features between the sub-data, and the importance degree of each sub-data for predicting the class label of the training data. With this scheme, complete data feature information can be extracted with good effect, and the resulting model has strong generality, so the data classification predictions are highly accurate. A sketch of such a training loop follows.
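For illustration only, a hypothetical training loop; the model class is the sketch given earlier, and the optimizer, learning rate, and data shapes are assumptions, not the patented procedure:

```python
import torch
import torch.nn as nn

model = FullyDirectionalSelfAttentionNet(num_classes=10)  # sketch class defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.NLLLoss()                     # model outputs probabilities, so train on log-probs

x = torch.randn(32, 16, 128)               # 32 training samples, 16 sub-data each (assumed)
y = torch.randint(0, 10, (32,))            # class labels of the training data

for epoch in range(10):
    optimizer.zero_grad()
    probs = model(x)                       # prediction probability per class label
    loss = loss_fn(torch.log(probs + 1e-9), y)
    loss.backward()                        # backpropagation through the whole network
    optimizer.step()
```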
In some embodiments of the present invention, optionally, each training data includes n pieces of sub data, n is an integer, and n is greater than or equal to 2;
the network layer sub-module 342 is further configured to convert the linear features of the training data into nonlinear features by using a CNN, wherein the CNN is a CNN without pooling;
the network layer sub-module 342 is further configured to extract context features between the sub-data of the training data from the nonlinear features of the training data by using a bidirectional gated recurrent unit (Bi-GRU);
the network layer sub-module 342 is further configured to obtain correlation features between the sub-data of the training data from the context features between the sub-data of the training data by using a Multi-Head Self-Attention mechanism;
the network layer sub-module 342 is further configured to obtain, from the correlation features, the importance degree of each sub-data of the training data for predicting the class label of the training data by using an Attention mechanism.
In some embodiments of the present invention, optionally, the network layer sub-module 342 is further configured to nonlinearly combine the context features between the sub-data, the correlation features between the sub-data, and the importance degree of each sub-data for predicting the class label probability of the training data, so as to obtain the feature matrix of the training data.
In the embodiment of the invention, the fully-directional self-attention neural network comprises a CNN layer for converting linear features into nonlinear features, a Bi-GRU layer for extracting context features between the sub-data from the nonlinear features, a Multi-Head Self-Attention mechanism for obtaining correlation features between the sub-data from the context features, and an Attention mechanism for obtaining, from the correlation features, the importance degree of each sub-data for predicting the class label of the corresponding training data. The CNN, a strong convolution-kernel feature extractor, performs the preliminary deep feature extraction on the training data; at the same time, the feature matrix, obtained by extracting the state correlation between adjacent moments across the sub-data and the context relationship features of the sub-data, allows the feature information of the training data to be extracted completely, further improving the classification accuracy of the data classification model.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a data classification device according to an embodiment of the present invention. An embodiment of the present invention further provides a data classification device 40, including: a memory 401, a processor 402, and a program stored on the memory 401 and executable on the processor 402; the processor 402 is configured to read the program in the memory 401 to implement each process of any one of the above data classification method embodiments, achieving the same technical effect; to avoid repetition, details are not repeated here.
The embodiment of the present invention further provides a readable storage medium for storing a program which, when executed by a processor, implements each process of any one of the data classification method embodiments described above, achieving the same technical effect; to avoid repetition, details are not repeated here. The readable storage medium may be any available medium or data storage device that can be accessed by a processor, including but not limited to magnetic memory (e.g., floppy disk, hard disk, magnetic tape, magneto-optical disk (MO), etc.), optical memory (e.g., CD, DVD, BD, HVD, etc.), and semiconductor memory (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid-state disk (SSD)), etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. With such an understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (13)

1. A method of data classification, comprising the steps of:
acquiring data to be classified;
extracting linear features of the data to be classified;
inputting the linear features into a fully-directional self-attention neural network to obtain a feature matrix of the data to be classified;
normalizing the feature matrix to obtain the prediction probability that the data to be classified belongs to each class label;
the data to be classified comprises n pieces of sub data, n is an integer and is greater than or equal to 2;
and the fully-directional self-attention neural network obtains the feature matrix of the data to be classified by using the context features between the sub-data, the correlation features between the sub-data, and the importance degree of each sub-data for predicting the class label of the data to be classified.
2. The data classification method according to claim 1, further comprising, after the inputting the linear features into the fully-directional self-attention neural network:
and acquiring a feature space of the data to be classified, wherein the feature space is used for calculating the similarity between the data to be classified and other data.
3. The data classification method according to claim 1, wherein the inputting the linear features into the fully-directional self-attention neural network to obtain the feature matrix of the data to be classified comprises:
converting the linear features into nonlinear features by using a convolutional neural network (CNN), wherein the CNN is a CNN without pooling;
extracting context features between the sub-data from the nonlinear features by using a bidirectional gated recurrent unit (Bi-GRU);
obtaining correlation features between the sub-data from the context features between the sub-data by using a Multi-Head Self-Attention mechanism;
obtaining, from the correlation features, the importance degree of each sub-data for predicting the class label of the data to be classified by using an Attention mechanism;
and obtaining the feature matrix of the data to be classified from the context features between the sub-data, the correlation features between the sub-data, and the importance degree of each sub-data for predicting the class label of the data to be classified.
4. The data classification method of claim 3, wherein the extracting context features between the sub-data from the nonlinear features by using a bidirectional gated recurrent unit (Bi-GRU) comprises:
extracting the context features between the sub-data by:

$$\overrightarrow{h_t} = \mathrm{GRU}\left(o_t, \overrightarrow{h_{t-1}}\right)$$

$$\overleftarrow{h_t} = \mathrm{GRU}\left(o_t, \overleftarrow{h_{t-1}}\right)$$

$$s_t = w_t\,\overrightarrow{h_t} + v_t\,\overleftarrow{h_t} + b_t$$

wherein:
O is the nonlinear feature representation of the data to be classified, $O = \{o_1, o_2, \ldots, o_n\}$;
$o_t$ is the nonlinear feature vector of the t-th sub-data input, $t \in [1, n]$, t a positive integer;
$\overrightarrow{h_{t-1}}$ is the output of the forward hidden-layer state of the Bi-GRU for the (t-1)-th sub-data;
$\overleftarrow{h_{t-1}}$ is the output of the reverse hidden-layer state of the Bi-GRU for the (t-1)-th sub-data;
$\overrightarrow{h_t}$ is the output of the forward hidden-layer state of the Bi-GRU for the t-th sub-data;
$\overleftarrow{h_t}$ is the output of the reverse hidden-layer state of the Bi-GRU for the t-th sub-data;
when t = 1, $\overrightarrow{h_0}$ and $\overleftarrow{h_0}$ are predefined;
$w_t$ is the weight corresponding to the forward hidden-layer state $\overrightarrow{h_t}$ of the Bi-GRU for the t-th sub-data;
$v_t$ is the weight corresponding to the reverse hidden-layer state $\overleftarrow{h_t}$ of the Bi-GRU for the t-th sub-data;
$b_t$ is the bias corresponding to the hidden-layer state of the t-th sub-data;
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features between the sub-data;
$s_t$ is the feature code of the t-th sub-data, $t \in [1, n]$, t a positive integer.
5. The method of claim 3, wherein the obtaining correlation features between the sub-data from the context features between the sub-data by using a Multi-Head Self-Attention mechanism comprises:
S1: applying three linear transformations to the context features between the sub-data to obtain a query vector Q, a key vector K, and a value vector V:

$$Q = W^Q S$$

$$K = W^K S$$

$$V = W^V S$$

wherein:
S is the feature code of the data to be classified, $S = \{s_1, s_2, \ldots, s_n\}$; the feature code of the data to be classified contains the context features between the sub-data;
$W^Q$ is the first linear transformation; Q is the query vector obtained after the feature code S of the data to be classified undergoes the first linear transformation;
$W^K$ is the second linear transformation; K is the key vector obtained after the feature code S undergoes the second linear transformation;
$W^V$ is the third linear transformation; V is the value vector obtained after the feature code S undergoes the third linear transformation;
S2: linearly transforming the query vector Q, the key vector K, and the value vector V to obtain, for the l-th transformation, the transformed query vector $Q_l$, key vector $K_l$, and value vector $V_l$:

$$Q_l = Q W^{Q_l}$$

$$K_l = K W^{K_l}$$

$$V_l = V W^{V_l}$$

wherein:
$W^{Q_l}$, $W^{K_l}$, $W^{V_l}$ are parameters defined at initialization;
l is the index of the linear transformation, $l \in [1, m]$, l a positive integer; m is the total number of segments (heads) of the multi-head attention mechanism;
$Q_l$ is the query vector after the l-th linear transformation;
$K_l$ is the key vector after the l-th linear transformation;
$V_l$ is the value vector after the l-th linear transformation;
S3: computing, from the l-th transformed query vector $Q_l$, key vector $K_l$, and value vector $V_l$, the attention subspace matrix $Z_l$ of the l-th linear transformation:

$$Z_l = \mathrm{softmax}\!\left(\frac{Q_l K_l^{\mathsf{T}}}{\sqrt{d_{k_l}}}\right) V_l$$

wherein $d_{k_l}$ is the dimension of the key vector $K_l$;
S4: concatenating the m attention subspace matrices $Z_l$ and multiplying the result by a weight matrix $W^O$ to obtain the matrix Z of the data to be classified, wherein the matrix Z contains the correlation features between the sub-data.
6. The data classification method according to claim 3, wherein the obtaining, from the correlation features, the importance degree of each sub-data for predicting the class label probability of the data to be classified by using an Attention mechanism comprises:
extracting the feature space M of the data to be classified as follows:

$$u_i = \tanh(w_i z_i + b_i)$$

$$\alpha_i = \frac{\exp(u_i^{\mathsf{T}} u_w)}{\sum_{i} \exp(u_i^{\mathsf{T}} u_w)}$$

$$m_i = \alpha_i z_i$$

$$M = \{m_1, m_2, \ldots, m_n\}$$

wherein:
$z_i$ is the attention subspace matrix of the i-th sub-data;
$w_i$ is the weight coefficient of the i-th sub-data, and $b_i$ is the bias coefficient of the i-th sub-data;
$u_i$ is the first weight coefficient of the i-th sub-data;
$u_w$ is a randomly initialized attention matrix;
$\alpha_i$ is the second weight coefficient of the i-th sub-data;
$m_i$ is the feature subspace of the i-th sub-data, which contains the importance degree of the i-th sub-data for predicting the data to be classified;
M is the feature space of the data to be classified, which contains the importance degree of each sub-data for predicting the class label probability of the data to be classified.
7. The data classification method of claim 3, wherein the obtaining the feature matrix of the to-be-classified data according to the context features among the sub-data, the correlation features among the sub-data, and the importance degree of each sub-data to predicting the class label probability of the to-be-classified data comprises:
nonlinearly combining the context features between the sub-data, the correlation features between the sub-data, and the importance degree of each sub-data for predicting the class label probability of the data to be classified, to obtain the feature matrix of the data to be classified.
8. The data classification method according to claim 1, wherein the normalizing the feature matrix to obtain the prediction probability that the data to be classified belongs to each class label comprises:
normalizing the feature matrix by using a softmax classifier to obtain the prediction probability that the data to be classified belongs to each class label.
9. The data classification method of claim 1, further comprising:
training a data classification model comprising an input layer, the fully-directional self-attention neural network, and an output layer.
10. The data classification method of claim 9, wherein the training of the data classification model comprises:
acquiring training data and a class label of the training data;
extracting linear features of the training data;
obtaining a feature space and a feature matrix of the training data by using the fully-directional self-attention neural network;
outputting the correlation among the training data according to the feature space to obtain similar data of each training data;
and normalizing the feature matrix to obtain the prediction probability that the training data belongs to each class label.
11. The data classification method according to claim 10,
each training data comprises n sub-data, wherein n is an integer and is greater than or equal to 2;
the obtaining the feature space and the feature matrix of the training data by using the fully-directional self-attention neural network comprises:
converting the linear features of the training data into nonlinear features by using a convolutional neural network (CNN), wherein the CNN is a CNN without pooling;
extracting context features between the sub-data of the training data from the nonlinear features of the training data by using a bidirectional gated recurrent unit (Bi-GRU);
obtaining correlation features between the sub-data of the training data from the context features between the sub-data of the training data by using a Multi-Head Self-Attention mechanism;
and obtaining, from the correlation features, the importance degree of each sub-data of the training data for predicting the class label of the training data by using an Attention mechanism.
12. A data classification device, comprising: a memory, a processor, and a program stored on the memory and executable on the processor; characterized in that the processor is configured to read the program in the memory to implement the steps in the data classification method according to any one of claims 1 to 11.
13. A readable storage medium storing a program, wherein the program, when executed by a processor, implements the steps in the data classification method according to any one of claims 1 to 11.
CN202110065077.7A 2021-01-18 2021-01-18 Data classification method, equipment and readable storage medium Pending CN112766368A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110065077.7A CN112766368A (en) 2021-01-18 2021-01-18 Data classification method, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110065077.7A CN112766368A (en) 2021-01-18 2021-01-18 Data classification method, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112766368A true CN112766368A (en) 2021-05-07

Family

ID=75702831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110065077.7A Pending CN112766368A (en) 2021-01-18 2021-01-18 Data classification method, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112766368A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113450828A (en) * 2021-06-25 2021-09-28 平安科技(深圳)有限公司 Music genre identification method, device, equipment and storage medium
CN117540306A (en) * 2024-01-09 2024-02-09 腾讯科技(深圳)有限公司 Label classification method, device, equipment and medium for multimedia data
CN117540306B (en) * 2024-01-09 2024-04-09 腾讯科技(深圳)有限公司 Label classification method, device, equipment and medium for multimedia data

Similar Documents

Publication Publication Date Title
Gabeur et al. Multi-modal transformer for video retrieval
CN110491416B (en) Telephone voice emotion analysis and identification method based on LSTM and SAE
CN108694225B (en) Image searching method, feature vector generating method and device and electronic equipment
CN106570464B (en) Face recognition method and device for rapidly processing face shielding
CN110209824B (en) Text emotion analysis method, system and device based on combined model
CN112818861B (en) Emotion classification method and system based on multi-mode context semantic features
CN111783474A (en) Comment text viewpoint information processing method and device and storage medium
CN113177141B (en) Multi-label video hash retrieval method and device based on semantic embedded soft similarity
CN110727765B (en) Problem classification method and system based on multi-attention machine mechanism and storage medium
CN111414461A (en) Intelligent question-answering method and system fusing knowledge base and user modeling
CN111859010B (en) Semi-supervised audio event identification method based on depth mutual information maximization
CN107545276A (en) The various visual angles learning method of joint low-rank representation and sparse regression
CN112749549B (en) Chinese entity relation extraction method based on incremental learning and multi-model fusion
CN113806554B (en) Knowledge graph construction method for massive conference texts
CN113822125B (en) Processing method and device of lip language recognition model, computer equipment and storage medium
Huang et al. Large-scale weakly-supervised content embeddings for music recommendation and tagging
CN112766368A (en) Data classification method, equipment and readable storage medium
CN115831102A (en) Speech recognition method and device based on pre-training feature representation and electronic equipment
CN113255366A (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN114694255B (en) Sentence-level lip language recognition method based on channel attention and time convolution network
CN113836896A (en) Patent text abstract generation method and device based on deep learning
CN113593606B (en) Audio recognition method and device, computer equipment and computer-readable storage medium
CN113065356B (en) IT equipment operation and maintenance fault suggestion processing method based on semantic analysis algorithm
Kang et al. Pivot correlational neural network for multimodal video categorization
CN115481313A (en) News recommendation method based on text semantic mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination