CN113157921A - Chinese text classification method integrating radical semantics - Google Patents

Chinese text classification method integrating radical semantics

Info

Publication number
CN113157921A
CN113157921A (application CN202110388441.3A)
Authority
CN
China
Prior art keywords: chinese text, vector, radical, chinese, information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110388441.3A
Other languages
Chinese (zh)
Other versions
CN113157921B (en)
Inventor
刘忠宝 (Liu Zhongbao)
荀恩东 (Xun Endong)
赵文娟 (Zhao Wenjuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING LANGUAGE AND CULTURE UNIVERSITY
Original Assignee
BEIJING LANGUAGE AND CULTURE UNIVERSITY
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING LANGUAGE AND CULTURE UNIVERSITY filed Critical BEIJING LANGUAGE AND CULTURE UNIVERSITY
Priority to CN202110388441.3A priority Critical patent/CN113157921B/en
Publication of CN113157921A publication Critical patent/CN113157921A/en
Application granted granted Critical
Publication of CN113157921B publication Critical patent/CN113157921B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/35 — Information retrieval of unstructured textual data: clustering; classification
    • G06F 40/126 — Handling natural language data: character encoding
    • G06F 40/289 — Natural language analysis: phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/30 — Natural language analysis: semantic analysis
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/044 — Neural networks: recurrent networks, e.g. Hopfield networks
    • G06N 3/045 — Neural networks: combinations of networks
    • G06N 3/08 — Neural networks: learning methods


Abstract

The invention provides a Chinese text classification method that integrates radical semantics. First, a BERT model produces vectorized representations of the Chinese text and of its radicals; then, different deep learning models extract features from the text and from the radicals respectively; finally, the two feature vectors are fused and a softmax classifier performs the classification. The proposal exploits the fact that Chinese, as a unique pictographic language, carries rich semantic information in the radicals of its characters, which plays an important role in the semantic understanding of Chinese text and can further improve classifier performance. Because the radicals carry this information compactly, training complexity is greatly reduced, achieving the goals of shorter learning time, higher classification efficiency, an organic combination of the advantages of different Chinese text classification methods, and efficient, accurate Chinese text classification.

Description

Chinese text classification method integrating radical semantics
Technical Field
The invention relates to a Chinese text classification method that integrates radical semantics, belonging to the technical field of computers.
Background
Chinese text classification refers to the process of automatically classifying Chinese texts according to a given classification system or set of rules; it is widely applied in information indexing, digital library management, information filtering, and other fields.
Methods for Chinese text classification are generally classified into three categories: a classification method based on Knowledge Engineering (KE), a classification method based on Machine Learning (ML), and a classification method based on Deep Learning (DL).
The classification method based on knowledge engineering classifies texts manually according to rules written by domain experts for the classification task. It is plainly inefficient and limited, and although it achieved some results, it was quickly abandoned.
The classification method based on machine learning lets a computer independently learn and extract text classification rules to classify texts automatically. It is efficient and highly portable and is widely applied in Chinese text classification, but shortcomings remain. For example: the classification effect of the naive Bayes algorithm depends on the prior probability, and the representation of the input data strongly affects the classification result; support vector machine algorithms are sensitive to missing data and have no general solution to nonlinear problems; decision tree algorithms tend to ignore correlations among data set attributes and are prone to overfitting; neural network algorithms have a large number of parameters to determine during training, their internal learning process cannot be observed, training takes a long time, and the output is difficult to interpret.
The deep learning-based classification method extracts features of the Chinese text while constructing a deep learning model, obtaining higher-level, more abstract semantic representations with which the text is classified. A typical approach first uses the BERT pre-trained language model to represent the sentences of a text as feature vectors, then feeds these vectors into a softmax regression model, and achieves good Chinese text classification results. However, such methods are transplanted directly from English text classification and ignore the characteristics of Chinese characters; moreover, the pre-trained model is huge and requires large amounts of data and equipment resources to complete the training process.
Although these three families of Chinese text classification methods can meet the basic requirement of classifying Chinese texts, problems remain, such as low algorithmic efficiency, poor domain specificity, and a tendency to overfit during learning. How to reduce learning time, improve classification efficiency, organically combine the advantages of the different methods, and achieve efficient and accurate Chinese text classification is a hot research topic in natural language processing.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a Chinese text classification method that integrates radical semantics: a BERT model first produces vectorized representations of the Chinese text and of its radicals; different deep learning models then extract features from each; finally, the two feature vectors are fused and a softmax classifier performs the classification, achieving efficient and accurate Chinese text classification.
The technical scheme adopted by the invention for solving the technical problem is as follows: the Chinese text classification method integrated with the radical semantics comprises the following steps:
S1, forming the radicals of every character in the Chinese text set into a radical set, and forming the Chinese text set and the radical set into a training set;
S2, vectorizing the Chinese text set and the radical set in the training set to obtain a Chinese text vector and a radical vector;
S3, extracting features from the Chinese text vector and the radical vector with a deep learning model to obtain a Chinese text feature vector and a radical feature vector;
S4, fusing the Chinese text feature vector and the radical feature vector, and classifying the Chinese text with a classifier (a minimal sketch of the four steps follows).
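For orientation, the following minimal Python sketch strings steps S1-S4 together. The helper names, the toy radical table, and the stand-in encoders are all hypothetical: the BERT encoder and the Bi-RNN/Bi-LSTM feature extractors described later are replaced here by simple placeholders.

    import numpy as np

    RADICALS = {"江": "氵", "海": "氵", "踢": "⻊", "跑": "⻊"}  # toy radical table

    def to_radicals(text):                    # S1: character -> radical sequence
        return "".join(RADICALS.get(ch, ch) for ch in text)

    def vectorize(seq, dim=8):                # S2: stand-in for the BERT encoder
        rng = np.random.default_rng(abs(hash(seq)) % (2 ** 32))
        return rng.standard_normal(dim)

    def extract_features(vec):                # S3: stand-in for Bi-RNN / Bi-LSTM
        return np.tanh(vec)

    def classify(text_feat, rad_feat, W, b):  # S4: fuse, then softmax
        v = np.concatenate([text_feat, rad_feat])
        logits = W @ v + b
        e = np.exp(logits - logits.max())
        return e / e.sum()

    text = "踢球跑到江海"
    probs = classify(extract_features(vectorize(text)),
                     extract_features(vectorize(to_radicals(text))),
                     W=np.zeros((3, 16)), b=np.zeros(3))
    print(probs)  # uniform over 3 classes, since W and b are zero here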
In step S1, the data set is preprocessed to obtain the Chinese text set, including removing noise data in the data set that is unrelated to text classification.
In step S1, the radical set is obtained by mapping Chinese characters to radicals using the Xinhua Dictionary data set.
Step S2 adopts the BERT model to vectorize the Chinese text set and the radical set in the training set, specifically as follows:
S2.1, representing the Chinese text set and the radical set each with word vectors, segment vectors and position vectors, denoting the Chinese text set as E_text and the radical set as E_rad;
S2.2, inputting E_text and E_rad into the encoder of the BERT model and training to obtain the Chinese text vector T_text and the radical vector T_rad.
In step S3, a deep learning model is selected according to the characteristics of the Chinese text set corresponding to the Chinese text vector to perform feature extraction on the Chinese text vector and the radical vector.
The deep learning model comprises one of, or a combination of two or more of, the bidirectional recurrent neural network Bi-RNN, the bidirectional long short-term memory network Bi-LSTM, the attention-based bidirectional recurrent neural network ATT-Bi-RNN, and the attention-based bidirectional long short-term memory network ATT-Bi-LSTM, and feature extraction is carried out according to the following four cases:
a. if the Chinese text set corresponding to the Chinese text vector consists of simple short texts, the bidirectional recurrent neural network Bi-RNN is adopted to extract features from the Chinese text vector and the radical vector;
b. if the Chinese text set corresponding to the Chinese text vector consists of complex short texts, the attention-based bidirectional recurrent neural network ATT-Bi-RNN is adopted to extract features from the Chinese text vector and the radical vector;
c. if the Chinese text set corresponding to the Chinese text vector consists of long texts with simple semantic expression, the bidirectional long short-term memory network Bi-LSTM is adopted to extract features from the Chinese text vector and the radical vector;
d. if the Chinese text set corresponding to the Chinese text vector consists of long texts with complex semantic expression, the attention-based bidirectional long short-term memory network ATT-Bi-LSTM is adopted to extract features from the Chinese text vector and the radical vector;
wherein the update of the RNN neuron information at time t is expressed by the following equations:

h_t = tanh(W_x · x_t + W_h · h_{t-1} + b_h)   (1)

o_t = softmax(W_o · h_t + b_o)   (2)

where h_t denotes the hidden-layer information at time t, h_{t-1} the hidden-layer information at time t-1, W_x the weight matrix of the input information, W_h the weight matrix for updating the time t-1 information, x_t the input-layer information at time t (when t = 1, x_t is the Chinese text vector or the radical vector), b_h the bias-value matrix for updating the time t-1 information, o_t the information output by the hidden layer at time t, W_o the weight matrix for updating the hidden-layer output information at time t, b_o the bias-value matrix for updating the hidden-layer output information at time t, tanh the hyperbolic tangent function, and softmax the normalized exponential function;
the update of the LSTM neuron information at time t is expressed by the following equations:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)   (3)

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)   (4)

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)   (5)

c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)   (6)

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t   (7)

h_t = o_t ⊙ tanh(c_t)   (8)

where σ denotes the sigmoid activation function, W_f the forget-gate weight matrix, W_o the output-gate weight matrix, W_i the input-gate weight matrix, W_c the weight matrix of the current information, b_f the forget-gate bias-value matrix, b_i the input-gate bias-value matrix, b_o the output-gate bias-value matrix, b_c the bias-value matrix of the current information, c̃_t the temporary variable of the time t information, c_t the cell information at time t, c_{t-1} the cell information at time t-1, x_t the input at time t (when t = 1, x_t is the Chinese text vector or the radical vector), h_{t-1} the hidden-layer information at time t-1, and h_t the hidden-layer information at time t;
the ATT attention handling mechanism is expressed by the following formula:
Figure 849052DEST_PATH_IMAGE040
(9)
Figure 100002_DEST_PATH_IMAGE041
(10)
Figure 357656DEST_PATH_IMAGE042
(11)
whereinHRepresents the vector sum of the output layers of the bidirectional recurrent neural network ATT-Bi-RNN or the bidirectional long and short memory network ATT-Bi-LSTM introducing attention mechanism,Mto representHThe vector matrix after the calculation of the tanh function,
Figure 100002_DEST_PATH_IMAGE043
a transposed matrix representing the weights of the keywords,
Figure 303615DEST_PATH_IMAGE044
represents passing through
Figure 100002_DEST_PATH_IMAGE045
The vector matrix after the function is calculated,
Figure 134912DEST_PATH_IMAGE046
to represent
Figure 838425DEST_PATH_IMAGE044
The transpose matrix of (a) is,Yrepresenting the output of the ATT attention handling mechanism.
In step S4, the Chinese text feature vector and the radical feature vector are fused using the following formula:

V = V_text ⊕ V_rad   (12)

where V denotes the fused feature vector, V_text the Chinese text feature vector, and V_rad the radical feature vector.
In step S4, the softmax classifier is used to classify the Chinese text, expressed by the following formula:

R = softmax(W · V + b)   (13)

where R denotes the Chinese text classification result, W the weight matrix, and b the bias-value matrix.
Based on the above technical scheme, the invention has the following beneficial effects:
The proposed Chinese text classification method integrating radical semantics takes the Chinese text and its radicals as research objects and obtains rich semantic information from both. A BERT model first produces vectorized representations of the Chinese text and the radicals; different deep learning models then extract features from each; finally, the two feature vectors are fused and a softmax classifier performs the classification. The experimental results not only show the superiority of the proposed Chinese text classification model but also verify the effectiveness of radicals in the Chinese text classification task. The method overcomes the low efficiency, poor domain specificity and tendency to overfit of traditional Chinese text classification algorithms, reduces learning time, improves classification efficiency, organically combines the advantages of different Chinese text classification methods, and achieves efficient and accurate Chinese text classification.
Drawings
FIG. 1 is a model diagram of the Chinese text classification method integrating radical semantics according to the present invention.
FIG. 2 is a schematic diagram of a BERT model training process.
FIG. 3 is a schematic diagram of the Bi-RNN model.
FIG. 4 is a schematic diagram of RNN neuron structure.
FIG. 5 is a diagram of the Bi-LSTM model.
FIG. 6 is a schematic diagram of the structure of an LSTM neuron.
Detailed Description
The invention is further illustrated by the following figures and examples.
The research idea of the invention is as follows:
Chinese is a language derived from pictographs: not only do the characters themselves express specific semantic information, but their radicals also contain rich semantic information, as shown in Table 1:
Radical   Name                        Examples
扌        hand radical (提手旁)        pick, pluck, carry
⻊        foot radical (足字旁)        kick, run, jump
氵        three-dot water (三点水)     river, sea
疒        sickness radical (病字旁)    pain, ache, scar
米        rice radical (米字旁)        powder, material, grain
土        earth radical (提土旁)       ground, city

TABLE 1  Introduction to the radicals
Use "hand" to "beat" or "pluck"; use the "foot" to "kick" or "run"; "river" and "sea" are related to the meaning of "water"; "ground" and "city" are related to the meaning of "soil", etc., and these examples fully reveal the importance of the radical for semantic understanding, but existing research rarely uses the radical for Chinese text classification. Therefore, the method takes the Chinese text and the components as research objects, obtains richer semantic information from the research objects, and improves the classification effect of the Chinese text.
Example:
With reference to FIG. 1, the Chinese text classification method integrating radical semantics provided by the invention comprises the following steps:
S1, training set preprocessing: form the radicals of every character in the Chinese text set into a radical set, and form the Chinese text set and the radical set into a training set. First, noise data of the Chinese data set that is unrelated to text classification is removed, for example stop words, web links, and English letters. Then, to obtain the radical of each Chinese character in the data set, the Xinhua Dictionary data set [1] is used to map each Chinese character to its radical; this data set covers all Chinese characters and radicals appearing in the corpus, comprising 20849 Chinese characters and 270 radicals;
for example, for the text "the player comes from a football family" in the Chinese text set, the corresponding radical sequence is formed from the radical of each of its characters.
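A hedged sketch of this S1 mapping, assuming the Xinhua Dictionary mapping has been exported to a two-column file; the file name xinhua_radicals.tsv and its char<TAB>radical layout are assumptions, not part of the patent:

    def load_radical_table(path="xinhua_radicals.tsv"):
        table = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                char, radical = line.rstrip("\n").split("\t")
                table[char] = radical
        return table

    def text_to_radicals(text, table):
        # characters absent from the table (digits, punctuation) pass through
        return "".join(table.get(ch, ch) for ch in text)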
S2, vectorize the Chinese text set and the radical set in the training set to obtain the Chinese text vector and the radical vector. Referring to FIG. 2, the BERT model can be used to vectorize the Chinese text set and the radical set respectively. The BERT model is built on the Transformer encoder [8] and is a bidirectional language model obtained by improving the main structure of the GPT (Generative Pre-Training) language model. As shown in FIG. 1, when the BERT model vectorizes a Chinese text ("the player comes from a football family") or its radicals, some words in the text or the radical sequence are randomly masked and the unmasked ones are used for prediction, which gives the trained word vector model better generalization ability. The vectorization process of the BERT model for a Chinese text and its radicals is shown in FIG. 2, in which E denotes the sum of the word vectors, segment vectors and position vectors of the Chinese text or radicals, Trm denotes the Transformer encoder, and T denotes the vector of the Chinese text or radicals obtained after training.
The method specifically comprises the following steps:
S2.1, represent the Chinese text set and the radical set each with word vectors, segment vectors and position vectors, denoting the Chinese text set as E_text and the radical set as E_rad;
S2.2, input E_text and E_rad into the encoder of the BERT model and train to obtain the Chinese text vector T_text and the radical vector T_rad.
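As a sketch of S2, under the assumption that a pretrained Chinese BERT is acceptable in place of training from scratch, the Hugging Face transformers package can produce the token vectors; the model name bert-base-chinese is an assumption, and the patent's own masking-based training is not reproduced here:

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    encoder = BertModel.from_pretrained("bert-base-chinese")

    def encode(sentence):
        inputs = tokenizer(sentence, return_tensors="pt",
                           truncation=True, max_length=50)
        with torch.no_grad():
            out = encoder(**inputs)
        return out.last_hidden_state.squeeze(0)  # one 768-d vector per token

    vecs = encode("该球员来自足球世家")  # Chinese text vectors; radicals analogous
    print(vecs.shape)

The radical sequence of a text would be encoded the same way to obtain T_rad.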
S3, extracting the characteristics of the Chinese text vector and the radical vector by using a deep learning model to obtain a Chinese text characteristic vector and a radical characteristic vector; and selecting a deep learning model according to the characteristics of the Chinese text set corresponding to the Chinese text vector to extract the characteristics of the Chinese text vector and the radical vector.
The deep learning model comprises one of, or a combination of two or more of, the Bidirectional Recurrent Neural Network (Bi-RNN), the Bidirectional Long Short-Term Memory network (Bi-LSTM), the Attention-Based Bidirectional Recurrent Neural Network (ATT-Bi-RNN), and the Attention-Based Bidirectional Long Short-Term Memory network (ATT-Bi-LSTM), and feature extraction is carried out according to the following four cases:
a. If the Chinese text set corresponding to the Chinese text vector consists of simple short texts, for example a casual remark such as "Zhao Liying really does look comfortable in those platform shoes", the bidirectional recurrent neural network Bi-RNN is adopted to extract features from the Chinese text vector and the radical vector; the Bi-RNN model is shown in FIG. 3, where RNN denotes a neuron of the recurrent neural network, as shown in FIG. 4;
b. If the Chinese text set corresponding to the Chinese text vector consists of long texts with simple semantic expression, the bidirectional long short-term memory network Bi-LSTM is adopted to extract features from the Chinese text vector and the radical vector, as shown in FIG. 5, where LSTM denotes a neuron of the long short-term memory network, as shown in FIG. 6;
in FIGS. 3 and 5, x_t denotes the information input to the depth model at each time instant, y_t the features learned by the depth model at each time instant, W_1 the weight matrix of updated information between the input layer and the forward layer, W_2 the weight matrix of updated information between the forward hidden layer at the previous instant and at the current instant, W_3 the weight matrix of updated information between the input layer and the backward layer, W_4 the weight matrix of updated information between the forward layer and the output layer, W_5 the weight matrix of updated information between the backward hidden layer at the later instant and at the current instant, and W_6 the weight matrix of updated information between the backward layer and the output layer.
c. If the Chinese text set corresponding to the Chinese text vector consists of complex short texts, the attention-based bidirectional recurrent neural network ATT-Bi-RNN is adopted to extract features from the Chinese text vector and the radical vector, as shown in FIG. 7, where ATT denotes the attention processing mechanism, as shown in FIG. 9;
the ATT-Bi-RNN can give different weights to different words in complex short-text expressions and highlight the role of keywords, thereby improving the classification of such texts. For example, given "Did the Revolution of 1911 have progressive significance?", the ATT-Bi-RNN model gives greater weight to "the Revolution of 1911" and therefore correctly classifies the text as a historical text.
d. If the Chinese text set corresponding to the Chinese text vector consists of long texts with complex semantic expression, the attention-based bidirectional long short-term memory network ATT-Bi-LSTM is adopted to extract features from the Chinese text vector and the radical vector, as shown in FIG. 9 (a sketch of this four-way model choice follows).
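A hedged sketch of the four-way choice among cases a-d; the length threshold and the complexity flag are assumptions, since the patent states the cases qualitatively rather than numerically:

    def choose_model(n_chars, is_complex, short_limit=50):
        short = n_chars <= short_limit      # assumed notion of "short text"
        if short and not is_complex:
            return "Bi-RNN"                 # case a: simple short text
        if not short and not is_complex:
            return "Bi-LSTM"                # case b: long text, simple semantics
        if short and is_complex:
            return "ATT-Bi-RNN"             # case c: complex short text
        return "ATT-Bi-LSTM"                # case d: long text, complex semantics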
In the deep learning models described above, the update of the RNN neuron information at time t is expressed by the following equations:

h_t = tanh(W_x · x_t + W_h · h_{t-1} + b_h)   (1)

o_t = softmax(W_o · h_t + b_o)   (2)

where h_t denotes the hidden-layer information at time t, h_{t-1} the hidden-layer information at time t-1, W_x the weight matrix of the input information, W_h the weight matrix for updating the time t-1 information, x_t the input-layer information at time t (when t = 1, x_t is the Chinese text vector or the radical vector), b_h the bias-value matrix for updating the time t-1 information, o_t the information output by the hidden layer at time t, W_o the weight matrix for updating the hidden-layer output information at time t, b_o the bias-value matrix for updating the hidden-layer output information at time t, tanh the hyperbolic tangent function, and softmax the normalized exponential function.
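Read literally, equations (1)-(2) amount to the following NumPy step (a sketch; the symbol names follow the reconstruction above):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def rnn_step(x_t, h_prev, Wx, Wh, bh, Wo, bo):
        h_t = np.tanh(Wx @ x_t + Wh @ h_prev + bh)  # eq. (1)
        o_t = softmax(Wo @ h_t + bo)                # eq. (2)
        return h_t, o_t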
The depth model automatically updates the parameters of these matrices during training to obtain the optimal solution; in the training of the ATT-Bi-LSTM model, the parameters can be updated with the Adagrad method.
The update of the LSTM neuron information at time t is expressed by the following equations:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)   (3)

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)   (4)

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)   (5)

c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)   (6)

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t   (7)

h_t = o_t ⊙ tanh(c_t)   (8)

where σ denotes the sigmoid activation function, W_f the forget-gate weight matrix, W_o the output-gate weight matrix, W_i the input-gate weight matrix, W_c the weight matrix of the current information, b_f the forget-gate bias-value matrix, b_i the input-gate bias-value matrix, b_o the output-gate bias-value matrix, b_c the bias-value matrix of the current information, c̃_t the temporary variable of the time t information, c_t the cell information at time t, c_{t-1} the cell information at time t-1, x_t the input at time t (when t = 1, x_t is the Chinese text vector or the radical vector), h_{t-1} the hidden-layer information at time t-1, and h_t the hidden-layer information at time t.
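A sketch of one LSTM step implementing equations (3)-(8), with the gate parameters kept in dicts keyed by gate name; the dict layout and the concatenated [h_{t-1}, x_t] input are assumptions of the reconstruction:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        z = np.concatenate([h_prev, x_t])
        f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate, eq. (3)
        i_t = sigmoid(W["i"] @ z + b["i"])      # input gate, eq. (4)
        o_t = sigmoid(W["o"] @ z + b["o"])      # output gate, eq. (5)
        c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate cell, eq. (6)
        c_t = f_t * c_prev + i_t * c_tilde      # cell update, eq. (7)
        h_t = o_t * np.tanh(c_t)                # hidden state, eq. (8)
        return h_t, c_t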
The ATT attention processing mechanism is expressed by the following formulas:

M = tanh(H)   (9)

α = softmax(w^T · M)   (10)

Y = H · α^T   (11)

where H denotes the sum of the output-layer vectors of the attention-based bidirectional recurrent neural network ATT-Bi-RNN or the attention-based bidirectional long short-term memory network ATT-Bi-LSTM, M the vector matrix obtained from H through the tanh function, w^T the transposed keyword-weight matrix, α the vector matrix obtained after the softmax function, α^T the transpose of α, and Y the output of the ATT attention processing mechanism.
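A sketch of equations (9)-(11), taking H as a d × T matrix of Bi-RNN/Bi-LSTM outputs and w as the learned keyword-weight vector; both shapes are assumptions consistent with the reconstruction:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def attention(H, w):
        M = np.tanh(H)          # eq. (9)
        alpha = softmax(w @ M)  # eq. (10): one weight per time step
        Y = H @ alpha           # eq. (11): attention-weighted summary
        return Y, alpha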
S4, fuse the Chinese text feature vector and the radical feature vector, and classify the Chinese text with a classifier. Specifically, the two feature vectors are fused using the following formula:

V = V_text ⊕ V_rad   (12)

where V denotes the fused feature vector, V_text the Chinese text feature vector, and V_rad the radical feature vector;

the Chinese text is then classified with the softmax classifier, expressed by the following formula:

R = softmax(W · V + b)   (13)

where R denotes the Chinese text classification result, W the weight matrix, and b the bias-value matrix.
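A sketch of S4 under the assumption that the fusion operator ⊕ in equation (12) is vector concatenation:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def fuse_and_classify(v_text, v_rad, W, b):
        v = np.concatenate([v_text, v_rad])  # eq. (12): V = V_text ⊕ V_rad
        return softmax(W @ v + b)            # eq. (13): R = softmax(W·V + b)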
Experimental data:
In this embodiment, the THUCNews data set is selected for the experiments. The data set contains 740,000 Sina News texts. On the basis of the original Sina News classification system, English letters, English symbols, numbers and stop words were removed from the texts, and 70241 news texts were manually labeled and divided into 14 categories: finance, lottery, real estate, stock, home, education, science and technology, society, fashion, current affairs, sports, constellation, games and entertainment. The training set comprises 57981 news texts and the test set comprises 12260 news texts.
Setting model parameters:
the default hyper-parameters of the model are used for training the hyper-parameters of each model, and the specific settings are shown in tables 2 and 3:
parameter(s) BERT
max_seq_length 50
dimsh 768
TABLE 2 BERT model parameter settings
Parameter(s) Bi-RNN Bi-LSTM ATT- Bi-RNN ATT- Bi-LSTM
batch_size 128 128 128 128
epoch 40 40 40 40
dropout 0.5 0.5 0.5 0.5
learning_rate 0.0001 0.0001 0.0001 0.0001
num_nodes 128 128 128 128
max _length 500 500 500 500
TABLE 3 depth model parameter settings
where max_seq_length denotes the maximum length of an input text, dimsh the vector dimension of each word, batch_size the number of texts input per training step, epoch the number of training passes over all texts, dropout the parameter used against the neural network overfitting problem, learning_rate the learning rate, num_nodes the number of neurons in the hidden layer, and max_length the training time step.
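For reference, the same settings as a plain configuration mapping (illustrative only; the keys mirror the table rows):

    BERT_CONFIG = {"max_seq_length": 50, "dimsh": 768}
    DEPTH_CONFIG = {  # identical for Bi-RNN, Bi-LSTM, ATT-Bi-RNN, ATT-Bi-LSTM
        "batch_size": 128,
        "epoch": 40,
        "dropout": 0.5,
        "learning_rate": 1e-4,
        "num_nodes": 128,
        "max_length": 500,
    }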
Evaluation indexes are as follows:
the text classification experiment result is evaluated by using evaluation indexes such as accuracy P (precision), recall rate R (recall), F value (F-value) and the like, and the evaluation indexes are calculated as shown in the following formulas:
Figure 482562DEST_PATH_IMAGE064
(14)
Figure DEST_PATH_IMAGE065
(15)
Figure 658328DEST_PATH_IMAGE066
(16)
wherein A, B, C represents the number of correctly recognized, incorrectly recognized, and unrecognized chinese texts, respectively.
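Equations (14)-(16) computed directly (a sketch; A, B, C as defined above):

    def precision_recall_f(A, B, C):
        P = A / (A + B)          # eq. (14): precision
        R = A / (A + C)          # eq. (15): recall
        F = 2 * P * R / (P + R)  # eq. (16): F-value
        return P, R, F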
The experimental results are as follows:
To verify the superiority of the proposed Chinese text classification method integrating radical semantics and the effectiveness of radicals in the Chinese text classification task, the task is completed with two different methods. Method 1: training without radicals. Method 2: the method proposed by the invention, i.e., training with radicals. The experimental results are shown in Table 4.
[TABLE 4  Results of the first and second methods]
In Table 4, the F value of the Bi-LSTM model is about 0.03 higher than that of the Bi-RNN model: as the RNN propagates update parameters backwards, the gradient can vanish, semantic information is lost, and long-range dependencies are difficult to learn, whereas the gate structure introduced by the LSTM learns those long-range dependencies well, which raises the F value of Bi-LSTM. The F value of the ATT-Bi-LSTM model is about 0.2 higher than that of the LSTM model, and the F value of ATT-Bi-RNN is likewise about 0.2 higher than that of the RNN model: the attention mechanism automatically finds the words that are decisive for classification and gives them different weights, capturing the more important semantic information in the text, so the F value rises once attention is added. For every depth model, the F value of method 2 is higher than that of method 1, showing that more accurate semantic information can be obtained through the radicals and that they improve the Chinese text classification effect. The average P, R and F of all deep learning models is about 0.82, showing that the proposed Chinese text classification model achieves a good classification effect.

Claims (8)

1. A Chinese text classification method integrating radical semantics, characterized by comprising the following steps:
S1, forming the radicals of every character in the Chinese text set into a radical set, and forming the Chinese text set and the radical set into a training set;
S2, vectorizing the Chinese text set and the radical set in the training set to obtain a Chinese text vector and a radical vector;
S3, extracting features from the Chinese text vector and the radical vector with a deep learning model to obtain a Chinese text feature vector and a radical feature vector;
S4, fusing the Chinese text feature vector and the radical feature vector, and classifying the Chinese text with a classifier.
2. The Chinese text classification method integrating radical semantics of claim 1, characterized in that: in step S1, the data set is preprocessed to obtain the Chinese text set, including removing noise data in the data set that is unrelated to text classification.
3. The Chinese text classification method integrating radical semantics of claim 1, characterized in that: in step S1, the radical set is obtained by mapping Chinese characters to radicals using the Xinhua Dictionary data set.
4. The Chinese text classification method integrating radical semantics of claim 1, characterized in that: step S2 adopts the BERT model to vectorize the Chinese text set and the radical set in the training set, specifically as follows:
S2.1, representing the Chinese text set and the radical set each with word vectors, segment vectors and position vectors, denoting the Chinese text set as E_text and the radical set as E_rad;
S2.2, inputting E_text and E_rad into the encoder of the BERT model and training to obtain the Chinese text vector T_text and the radical vector T_rad.
5. The Chinese text classification method integrating radical semantics of claim 1, characterized in that: in step S3, a deep learning model is selected according to the characteristics of the Chinese text set corresponding to the Chinese text vector to perform feature extraction on the Chinese text vector and the radical vector.
6. The Chinese text classification method integrating radical semantics of claim 5, characterized in that: the deep learning model comprises one of, or a combination of two or more of, the bidirectional recurrent neural network Bi-RNN, the bidirectional long short-term memory network Bi-LSTM, the attention-based bidirectional recurrent neural network ATT-Bi-RNN and the attention-based bidirectional long short-term memory network ATT-Bi-LSTM, and feature extraction is carried out according to the following four cases:
a. if the Chinese text set corresponding to the Chinese text vector consists of simple short texts, the bidirectional recurrent neural network Bi-RNN is adopted to extract features from the Chinese text vector and the radical vector;
b. if the Chinese text set corresponding to the Chinese text vector consists of long texts with simple semantic expression, the bidirectional long short-term memory network Bi-LSTM is adopted to extract features from the Chinese text vector and the radical vector;
c. if the Chinese text set corresponding to the Chinese text vector consists of complex short texts, the attention-based bidirectional recurrent neural network ATT-Bi-RNN is adopted to extract features from the Chinese text vector and the radical vector;
d. if the Chinese text set corresponding to the Chinese text vector consists of long texts with complex semantic expression, the attention-based bidirectional long short-term memory network ATT-Bi-LSTM is adopted to extract features from the Chinese text vector and the radical vector;
wherein the update of the RNN neuron information at time t is expressed by the following equations:

h_t = tanh(W_x · x_t + W_h · h_{t-1} + b_h)   (1)

o_t = softmax(W_o · h_t + b_o)   (2)

where h_t denotes the hidden-layer information at time t, h_{t-1} the hidden-layer information at time t-1, W_x the weight matrix of the input information, W_h the weight matrix for updating the time t-1 information, x_t the input-layer information at time t (when t = 1, x_t is the Chinese text vector or the radical vector), b_h the bias-value matrix for updating the time t-1 information, o_t the information output by the hidden layer at time t, W_o the weight matrix for updating the hidden-layer output information at time t, b_o the bias-value matrix for updating the hidden-layer output information at time t, tanh the hyperbolic tangent function, and softmax the normalized exponential function;
the update of the LSTM neuron information at time t is expressed by the following equations:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)   (3)

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)   (4)

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)   (5)

c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)   (6)

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t   (7)

h_t = o_t ⊙ tanh(c_t)   (8)

where σ denotes the sigmoid activation function, W_f the forget-gate weight matrix, W_o the output-gate weight matrix, W_i the input-gate weight matrix, W_c the weight matrix of the current information, b_f the forget-gate bias-value matrix, b_i the input-gate bias-value matrix, b_o the output-gate bias-value matrix, b_c the bias-value matrix of the current information, c̃_t the temporary variable of the time t information, c_t the cell information at time t, c_{t-1} the cell information at time t-1, x_t the input at time t (when t = 1, x_t is the Chinese text vector or the radical vector), h_{t-1} the hidden-layer information at time t-1, and h_t the hidden-layer information at time t;
the ATT attention processing mechanism is expressed by the following formulas:

M = tanh(H)   (9)

α = softmax(w^T · M)   (10)

Y = H · α^T   (11)

where H denotes the sum of the output-layer vectors of the attention-based bidirectional recurrent neural network ATT-Bi-RNN or the attention-based bidirectional long short-term memory network ATT-Bi-LSTM, M the vector matrix obtained from H through the tanh function, w^T the transposed keyword-weight matrix, α the vector matrix obtained after the softmax function, α^T the transpose of α, and Y the output of the ATT attention processing mechanism.
7. The Chinese text classification method integrating radical semantics of claim 1, characterized in that: in step S4, the Chinese text feature vector and the radical feature vector are fused using the following formula:

V = V_text ⊕ V_rad   (12)

where V denotes the fused feature vector, V_text the Chinese text feature vector, and V_rad the radical feature vector.
8. The Chinese text classification method integrating radical semantics of claim 7, characterized in that: in step S4, the softmax classifier is used to classify the Chinese text, expressed by the following formula:

R = softmax(W · V + b)   (13)

where R denotes the Chinese text classification result, W the weight matrix, and b the bias-value matrix.
CN202110388441.3A 2021-04-12 2021-04-12 Chinese text classification method integrating radical semantics Active CN113157921B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110388441.3A CN113157921B (en) 2021-04-12 2021-04-12 Chinese text classification method integrating radical semantics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110388441.3A CN113157921B (en) 2021-04-12 2021-04-12 Chinese text classification method integrating radical semantics

Publications (2)

Publication Number Publication Date
CN113157921A true CN113157921A (en) 2021-07-23
CN113157921B CN113157921B (en) 2021-11-23

Family

ID=76889935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110388441.3A Active CN113157921B (en) 2021-04-12 2021-04-12 Chinese text classification method integrating radical semantics

Country Status (1)

Country Link
CN (1) CN113157921B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180240012A1 (en) * 2017-02-17 2018-08-23 Wipro Limited Method and system for determining classification of text
CN107301225A (en) * 2017-06-20 2017-10-27 挖财网络技术有限公司 Short text classification method and device
CN108304376A (en) * 2017-12-15 2018-07-20 腾讯科技(深圳)有限公司 Determination method, apparatus, storage medium and the electronic device of text vector
CN109471946A (en) * 2018-11-16 2019-03-15 中国科学技术大学 A kind of classification method and system of Chinese text
US20210034707A1 (en) * 2019-07-30 2021-02-04 Intuit Inc. Neural network system for text classification
CN111476031A (en) * 2020-03-11 2020-07-31 重庆邮电大学 Improved Chinese named entity recognition method based on L attice-L STM
CN112464663A (en) * 2020-12-01 2021-03-09 小牛思拓(北京)科技有限公司 Multi-feature fusion Chinese word segmentation method
CN112559744A (en) * 2020-12-07 2021-03-26 中国科学技术大学 Chinese text classification method and device based on radical association mechanism

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HANQING TAO: "A Radical-Aware Attention-Based Model for Chinese Text Classification", The Thirty-Third AAAI Conference on Artificial Intelligence *
PENG ZHOU: "Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification", Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics *
WENTING LI: "The Automatic Text Classification Method Based on BERT and Feature Union", 2019 IEEE 25th International Conference on Parallel and Distributed Systems *
LIU ZHEYUAN (刘哲源): "Research on a Deep Learning Sentiment Classification Architecture Based on Character-Granularity Multi-Dimensional Features", Science Consulting (Science & Technology / Management) *

Also Published As

Publication number Publication date
CN113157921B (en) 2021-11-23


Legal Events

Code  Event
PB01  Publication
SE01  Entry into force of request for substantive examination
GR01  Patent grant