CN117874234A - Text classification method and device based on semantics, computer equipment and storage medium - Google Patents

Text classification method and device based on semantics, computer equipment and storage medium

Info

Publication number
CN117874234A
CN117874234A (application CN202410050857.8A)
Authority
CN
China
Prior art keywords
text
classification
layer
knowledge
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410050857.8A
Other languages
Chinese (zh)
Inventor
俞悦 (Yu Yue)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd filed Critical Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202410050857.8A
Publication of CN117874234A
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application belongs to the fields of artificial intelligence and finance and relates to a semantic-based text classification method, which comprises: inputting a training sample set and a knowledge graph into a knowledge enhancement language model to obtain knowledge-enhanced text semantic feature vectors; inputting the text semantic feature vectors into a capsule network model and outputting a classification prediction result; calculating a loss value between the classification prediction result and the classification label; adjusting model parameters based on the loss value and outputting a model to be verified; verifying the model to be verified with the test sample set to obtain a text semantic classification model; and inputting the text to be classified into the text semantic classification model for classification. The application also provides a semantic-based text classification device, a computer device and a storage medium. In addition, the application relates to blockchain technology, in which the classified text data set may be stored. The method and device can effectively identify multi-label text and improve the efficiency and accuracy of text classification.

Description

Text classification method and device based on semantics, computer equipment and storage medium
Technical Field
The application relates to the technical fields of artificial intelligence and financial technology, and in particular to a semantic-based text classification method and device, a computer device, and a storage medium.
Background
With the development of the big data era, a large amount of text data of many different types has accumulated, and extracting useful information from this massive, unstructured text data has become an urgent need. In the general domain, classifying such text data greatly advances big data processing work.
Text classification is an important component of text mining applications, including question classification, emotion analysis and topic classification. Current text classification methods use general-domain models such as BERT and RoBERTa for text representation, ignoring the influence of external knowledge on text semantics. Meanwhile, in the text feature extraction stage, classification is usually performed with only a single fully connected layer, or with a CNN (convolutional neural network), an RNN (recurrent neural network) and the like; these methods cannot accurately identify the features of highly overlapping objects, which reduces feature understanding capability, handles multi-label text poorly and makes high text classification accuracy difficult to achieve.
Disclosure of Invention
The embodiments of the present application aim to provide a semantic-based text classification method and device, a computer device and a storage medium, so as to solve the technical problems that text classification methods in the prior art cannot accurately identify the features of highly overlapping objects, are unsuited to multi-label text classification and struggle to achieve high text classification accuracy.
In order to solve the above technical problems, the embodiments of the present application provide a text classification method based on semantics, which adopts the following technical scheme:
acquiring a classified text data set and a corresponding knowledge graph, and dividing the classified text data set into a training sample set and a test sample set, wherein the classified text data set comprises a plurality of classified texts and classification labels corresponding to each classified text;
inputting the training sample set and the knowledge graph into a pre-constructed knowledge enhancement language model to obtain a text semantic feature vector with enhanced knowledge;
inputting the text semantic feature vector into a pre-constructed capsule network model for classification calculation, and outputting a classification prediction result;
calculating a loss value between the classification prediction result and the classification label according to a preset loss function;
adjusting model parameters of the knowledge enhancement language model and the capsule network model based on the loss value, continuing iterative training until convergence to obtain final target model parameters, and outputting a model to be verified according to the target model parameters;
inputting the test sample set into the model to be verified to obtain a verification result, and determining the model to be verified as a text semantic classification model when the verification result meets a preset condition;
and obtaining a text to be classified, and inputting the text to be classified into the text semantic classification model to obtain a text classification result.
Further, the knowledge enhancement language model comprises a knowledge layer, an embedding layer, a visible layer and an encoding layer; the step of inputting the training sample set and the knowledge graph into a pre-constructed knowledge enhancement language model to obtain a text semantic feature vector with enhanced knowledge comprises the following steps:
injecting knowledge in the knowledge graph into text sentences of the training sample set through the knowledge layer to form sentence trees, and respectively inputting the sentence trees into the embedding layer and the visible layer;
performing position embedding on the sentence tree through the embedding layer to obtain a text position coding vector;
constructing a text visible matrix of the sentence tree through the visible layer;
and inputting the text position coding vector and the text visible matrix into the coding layer for attention calculation, and outputting a text semantic feature vector.
Further, the step of injecting knowledge in the knowledge graph into text sentences of the training sample set through the knowledge layer to form sentence trees includes:
invoking a knowledge query function of the knowledge layer, identifying all entities corresponding to each text sentence in the training sample set, and querying the triple corresponding to each entity in the knowledge graph;
and invoking a knowledge injection function of the knowledge layer, and embedding the triples into the corresponding positions in the text sentences to obtain sentence trees.
Further, the step of obtaining the text position coding vector by performing position embedding on the sentence tree through the embedding layer includes:
inputting the sentence tree into the embedding layer to perform segment embedding operation, soft position embedding operation and word embedding operation respectively to obtain corresponding sentence coding vectors, position coding vectors and word coding vectors;
and summing the sentence coding vector, the position coding vector and the word coding vector to obtain a text position coding vector.
Further, the step of inputting the text position coding vector and the text visible matrix into the coding layer to perform attention calculation and outputting text semantic feature vectors includes:
determining a query vector parameter matrix, a key vector parameter matrix and a value vector parameter matrix of the coding layer;
calculating self-attention according to the query vector parameter matrix, the key vector parameter matrix, the value vector parameter matrix, the text position coding vector and the text visible matrix;
and calculating the multi-head attention based on the self-attention to obtain text semantic feature vectors.
Further, the capsule network model comprises a convolution layer, a capsule layer and a classification layer; the step of inputting the text semantic feature vector into a pre-constructed capsule network model for classification calculation and outputting a classification prediction result comprises the following steps:
inputting the text semantic feature vector into the convolution layer to perform convolution feature extraction to obtain a convolution feature vector;
performing text aggregation on the convolution feature vectors through the capsule layer to obtain global semantic vectors containing context semantics;
and inputting the global semantic vector into the classification layer to perform classification prediction, and outputting a classification prediction result.
Further, the capsule layers comprise a main capsule layer and a digital capsule layer; the step of performing text aggregation on the convolution feature vectors through the capsule layer to obtain global semantic vectors containing context semantics comprises the following steps:
inputting the convolution feature vectors into the main capsule layer to perform a one-dimensional convolution operation to obtain vector capsules;
and inputting the vector capsules into the digital capsule layer, and mapping the vector capsules through a dynamic routing algorithm to obtain global semantic vectors.
In order to solve the technical problems, the embodiment of the application also provides a text classification device based on semantics, which adopts the following technical scheme:
the acquisition module is used for acquiring a classified text data set and a corresponding knowledge graph, and dividing the classified text data set into a training sample set and a test sample set, wherein the classified text data set comprises a plurality of classified texts and classification labels corresponding to each classified text;
the text enhancement module is used for inputting the training sample set and the knowledge graph into a pre-constructed knowledge enhancement language model to obtain a text semantic feature vector with enhanced knowledge;
the classification prediction module is used for inputting the text semantic feature vector into a pre-constructed capsule network model for classification calculation and outputting a classification prediction result;
the loss calculation module is used for calculating a loss value between the classification prediction result and the classification label according to a preset loss function;
the adjustment module is used for adjusting the model parameters of the knowledge enhancement language model and the capsule network model based on the loss value, continuing to iterate training until convergence, obtaining final target model parameters, and outputting a model to be verified according to the target model parameters;
the verification module is used for inputting the test sample set into the model to be verified to obtain a verification result, and determining the model to be verified as a text semantic classification model when the verification result meets a preset condition;
the classification module is used for acquiring texts to be classified, inputting the texts to be classified into the text semantic classification model, and obtaining text classification results.
In order to solve the above technical problems, the embodiments of the present application further provide a computer device, which adopts the following technical schemes:
the computer device comprises a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, implement the steps of the semantic-based text classification method described above.
In order to solve the above technical problems, embodiments of the present application further provide a computer readable storage medium, which adopts the following technical solutions:
the computer readable storage medium has stored thereon computer readable instructions which when executed by a processor implement the steps of the semantic based text classification method as described above.
Compared with the prior art, the embodiment of the application has the following main beneficial effects:
According to the method, the knowledge graph is introduced into the knowledge enhancement language model, and the knowledge graph is combined with the training sample set to conduct feature extraction on the knowledge enhancement language model, so that text semantic feature vectors containing rich knowledge information can be obtained, and feature expression of the text is enhanced; the text semantic feature vector with enhanced knowledge is input into the capsule network model for classification calculation, so that semantic information relations among words can be further obtained, the extraction capacity of important text features is improved, multi-label texts can be effectively identified, and further the efficiency and accuracy of text classification are improved.
Drawings
For a clearer description of the solutions in the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a semantic-based text classification method according to the present application;
FIG. 3 is a flow chart of one embodiment of step S202 of FIG. 2;
FIG. 4 is a schematic structural diagram of one embodiment of a semantic-based text classification apparatus according to the present application;
FIG. 5 is a schematic structural diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the description of the figures above are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to better understand the technical solutions of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings.
The embodiments of the present application can acquire and process the relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
The present application provides a text classification method based on semantics, which can be applied to a system architecture 100 as shown in fig. 1, where the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the text classification method based on semantics provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the text classification device based on semantics is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow chart of one embodiment of a semantic-based text classification method according to the present application is shown, comprising the steps of:
step S201, a classified text data set and a corresponding knowledge graph are obtained, the classified text data set is divided into a training sample set and a test sample set, and the classified text data set comprises a plurality of classified texts and classification labels corresponding to each classified text.
The classified text data set can be acquired according to the service scenario. Service scenarios can include insurance scenarios such as insurance topic classification, customer appeal classification and insurance scene classification, and can also include emotion analysis of comment text from e-commerce platforms, social platforms and the like. For example, for insurance topic classification, an insurance topic text data set is acquired; for customer appeal classification, a customer appeal text data set is acquired; for insurance scene classification, an insurance scene text data set is acquired; and for emotion analysis, comment text data containing positive and negative emotions is acquired from the relevant e-commerce or social platform.
A knowledge graph (Knowledge Graph) is a knowledge base in which data is integrated through a graph-structured data model or topology. Typically, a semantic network of entities and relations is stored in the form of <head entity, relation, tail entity> triples, where the head and tail entities represent specific things that exist in the real world and the relation expresses some semantic association between entities. For example, in the triple <China, capital, Beijing>, China is the head entity, capital is the relation word and Beijing is the tail entity.
Different business scenes correspond to different knowledge patterns, and in this embodiment, the corresponding knowledge patterns can be selected according to the business scene where the classified text data set is located.
In some embodiments, after the classified text data set is obtained, it is preprocessed, including deduplication, handling missing values, handling outliers and correcting erroneous values. The preprocessed classified text data set is then randomly divided into a training sample set and a test sample set at a preset ratio, for example training sample set : test sample set = 8:2.
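As an illustration, a minimal sketch of this preprocessing and 8:2 split is given below; the record format, cleaning steps and seed are assumptions for illustration, not prescribed by this application.

```python
import random

def prepare_datasets(records, train_ratio=0.8, seed=42):
    # Deduplicate and drop records with a missing text or classification label.
    seen, cleaned = set(), []
    for text, label in records:
        if not text or label is None or text in seen:
            continue
        seen.add(text)
        cleaned.append((text.strip(), label))
    # Randomly divide into training and test sample sets at the preset ratio.
    random.Random(seed).shuffle(cleaned)
    split = int(len(cleaned) * train_ratio)
    return cleaned[:split], cleaned[split:]

train_set, test_set = prepare_datasets([
    ("premium refund request", "customer appeal"),
    ("vehicle damage claim", "insurance scene"),
])
```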
It should be appreciated that the classified text dataset includes a plurality of classified texts and corresponding classified labels, which are the true categories of the classified texts.
It is emphasized that to further ensure the privacy and security of the categorized text data sets, the categorized text data sets may also be stored in nodes of a blockchain.
The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association by cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer and the like.
Step S202, inputting the training sample set and the knowledge graph into a pre-constructed knowledge enhancement language model to obtain a knowledge enhancement text semantic feature vector.
The knowledge enhancement language model is a K-BERT model. K-BERT introduces a knowledge graph representation on top of the original BERT model: after the input text passes through K-BERT, it contains domain knowledge not present in the original text, and a feature vector containing rich domain knowledge is output.
In this embodiment, the pre-built knowledge-enhanced language model includes a knowledge layer, an embedded layer, a visible layer, and an encoding layer.
In some optional implementations of this embodiment, the step of inputting the training sample set and the knowledge graph into the pre-constructed knowledge-enhanced language model to obtain the knowledge-enhanced text semantic feature vector includes:
step S301, the knowledge in the knowledge graph is injected into the text sentences of the training sample set through the knowledge layer to form sentence trees, and the sentence trees are respectively input into the embedding layer and the visible layer.
The knowledge layer is responsible for injecting knowledge from the knowledge graph into text sentences to form sentence trees. For example, for each input sentence s = {w_1, w_2, w_3, …, w_n}, the named entities are first obtained from the sentence, then the relation and value (tail entity) corresponding to each named entity are queried in the knowledge graph to form <head entity, relation, value> triples, and the triples are returned to the corresponding positions of the sentence to form a tree structure called a sentence tree (Sentence Tree). The sentence tree supplements the background information of the sentence and solves the problem of word vectors deviating from the core semantics because a single sentence lacks a knowledge background.
In this embodiment, forming the sentence tree is divided into two steps: knowledge query (K-Query) and knowledge injection (K-Inject).
Further, the knowledge query function of the knowledge layer is invoked, all entities corresponding to each text sentence in the training sample set are identified, and the triple corresponding to each entity is queried in the knowledge graph; then the knowledge injection function of the knowledge layer is invoked, and the triples are embedded into the corresponding positions in the text sentences to obtain sentence trees.
K-Query is responsible for querying the knowledge graph for the relation and value corresponding to each entity in each text sentence, i.e. the triples. The query process is:

E = K_Query(s, K);

where the function K_Query denotes querying the knowledge graph K with the text sentence s to obtain the triple set E = {(w_i, r_i0, w_i0), …, (w_i, r_ik, w_ik)}.

K-Inject is responsible for embedding the triple set E into the corresponding positions in the text sentence s to form the sentence tree, each triple forming a branch. The sentence tree output by the knowledge layer is then:

t = {w_1, w_2, w_3, …, w_i{(r_i0, w_i0), …, (r_ik, w_ik)}, …, w_n}
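A minimal sketch of K-Query and K-Inject follows, under the assumption that the knowledge graph is a dictionary from an entity to its (relation, tail entity) pairs; all names here are hypothetical, not the application's actual implementation.

```python
def k_query(sentence_tokens, kg):
    # E = K_Query(s, K): collect the (relation, tail entity) branches of
    # every token that appears as an entity in the knowledge graph K.
    return {w: kg[w] for w in sentence_tokens if w in kg}

def k_inject(sentence_tokens, triples):
    # Attach each entity's branches directly behind the entity, yielding the
    # sentence tree t = {w_1, ..., w_i{(r_i0, w_i0), ...}, ..., w_n}.
    return [(w, triples.get(w, [])) for w in sentence_tokens]

kg = {"Beijing": [("capital_of", "China")]}
tokens = ["I", "visited", "Beijing"]
print(k_inject(tokens, k_query(tokens, kg)))
# [('I', []), ('visited', []), ('Beijing', [('capital_of', 'China')])]
```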
And step S302, carrying out position embedding on the sentence trees through an embedding layer to obtain text position coding vectors.
A sentence tree cannot be input directly as a sequence; it is converted into a text sequence by the embedding layer.
The embedding layer includes a token embedding layer, a soft-position embedding layer and a segment embedding layer. The token embedding layer maps each token in the sentence into a vector representation of dimension H, with a special [CLS] token placed at the head of each sentence, mainly used for sentence classification; the soft-position embedding layer encodes the embedded values and relations using soft positions, distinguished from the position encoding of the entities; and the segment embedding layer is used to distinguish two sentence fragments.
Specifically, the sentence tree is input into the embedding layer and subjected respectively to the segment embedding operation, the soft position embedding operation and the word embedding operation to obtain the corresponding sentence coding vector, position coding vector and word coding vector; the three vectors are summed to obtain the text position coding vector.
The word embedding operation is performed on each word in the sentence tree through the token embedding layer to obtain the word coding vector; the soft position embedding operation is performed on each word in the sentence tree through the soft-position embedding layer to obtain the position coding vector, which preserves the trunk position information of the sentence; and the segment embedding operation is performed through the segment embedding layer to obtain the sentence coding vector. Summing the sentence coding vector, the position coding vector and the word coding vector gives the text position coding vector, which preserves the structural information of the tree structure and enhances the feature representation, so that the semantic features of the text are better obtained.
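A sketch of this summation, assuming a PyTorch-style implementation; the hidden dimension, vocabulary sizes and class name are illustrative assumptions.

```python
import torch.nn as nn

class KBertEmbedding(nn.Module):
    def __init__(self, vocab_size, max_position, hidden=768):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)             # word coding vector
        self.soft_position = nn.Embedding(max_position, hidden)   # position coding vector
        self.segment = nn.Embedding(2, hidden)                    # sentence coding vector

    def forward(self, token_ids, soft_position_ids, segment_ids):
        # Text position coding vector = sum of the three embeddings.
        return (self.token(token_ids)
                + self.soft_position(soft_position_ids)
                + self.segment(segment_ids))
```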
In step S303, a text visibility matrix of the sentence subtree is constructed by the visibility layer.
Since the triples in the sentence tree may affect the meaning of the original text sentence, a text visible matrix (M) is constructed to specify that each word can only see the context and knowledge related to itself, preventing knowledge noise from affecting the sentence.
For two tokens w_i and w_j: M_ij = 0 indicates in the text visible matrix that w_i is visible to w_j, and M_ij = -∞ indicates that w_i is invisible to w_j.
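A simplified sketch of constructing M; the branch-id bookkeeping (trunk tokens marked 0, injected-triple tokens marked with the index of their head entity) is an assumed representation, and entity-to-branch visibility is omitted for brevity.

```python
import numpy as np

def visible_matrix(branch_ids):
    # M[i][j] = 0 when w_i is visible to w_j (both on the trunk, or both in
    # the same knowledge branch); M[i][j] = -inf otherwise.
    n = len(branch_ids)
    M = np.full((n, n), -np.inf)
    for i in range(n):
        for j in range(n):
            same_trunk = branch_ids[i] == 0 and branch_ids[j] == 0
            same_branch = branch_ids[i] == branch_ids[j]
            if same_trunk or same_branch:
                M[i, j] = 0.0
    return M
```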
And S304, inputting the text position coding vector and the text visible matrix into a coding layer for attention calculation, and outputting a text semantic feature vector.
In this embodiment, the coding layer limits the visible region of the self-attention mechanism, thereby capturing the deep bidirectional structure in the text sentence. The coding layer is formed by stacking multiple mask-self-attention layers, which add the text visible matrix M to the standard self-attention when performing attention calculation on the text position coding vector.
Further, the step of inputting the text position coding vector and the text visible matrix into the coding layer for attention calculation and outputting the text semantic feature vector comprises the following steps:
determining a query vector parameter matrix, a key vector parameter matrix and a value vector parameter matrix of the coding layer;
calculating self-attention according to the query vector parameter matrix, the key vector parameter matrix, the value vector parameter matrix, the text position coding vector and the text visible matrix;
and performing multi-head attention calculation based on the self-attention to obtain text semantic feature vectors.
Assuming that the number of mask-self-attention layers is L and the number of heads is A, the query vector parameter matrix W_q, key vector parameter matrix W_k and value vector parameter matrix W_v of each mask-self-attention layer are determined. The self-attention of each mask-self-attention layer is calculated as:

H_i = softmax((Q_i · K_i^T + M) / √d_k) · V_i;

where Q_i = H_{i-1} · W_q, K_i = H_{i-1} · W_k and V_i = H_{i-1} · W_v; Q_i denotes the query vector of the i-th layer, K_i the key vector of the i-th layer and V_i the value vector of the i-th layer; H_i denotes the mask-self-attention output of the i-th layer; and d_k denotes the dimension of the input text position coding vector.
The multi-head attention is calculated as:

MultiHead = Concat(head_1, head_2, …, head_A) · W_0;

where Concat denotes the matrix concatenation function and W_0 denotes the parameter matrix that compresses the concatenated self-attention heads. In this embodiment, the multi-head attention output is the text semantic feature vector.
The structure information of the sentence tree is obtained through the text visible matrix, and attention is calculated according to the text visible matrix, so that no noise is added while knowledge is embedded, ensuring the accuracy of the output text semantic feature vectors.
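The following is a minimal sketch of one mask-self-attention layer consistent with the formula above: the visible matrix M (entries 0 or -∞) is added to the attention scores before the softmax, so invisible tokens receive zero attention weight. Shapes and the PyTorch usage are assumptions.

```python
import torch
import torch.nn.functional as F

def mask_self_attention(H_prev, W_q, W_k, W_v, M):
    Q = H_prev @ W_q                       # Q_i = H_{i-1} W_q
    K = H_prev @ W_k                       # K_i = H_{i-1} W_k
    V = H_prev @ W_v                       # V_i = H_{i-1} W_v
    d_k = Q.size(-1)
    scores = (Q @ K.transpose(-2, -1) + M) / d_k ** 0.5
    return F.softmax(scores, dim=-1) @ V   # H_i
```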
In this embodiment, the knowledge enhancement language model adds domain knowledge to the classified text, enriching the text semantics and avoiding the problems of inconsistent encoding spaces of diversified word vectors and sentences deviating from the core semantics; meanwhile, because the knowledge graph is fused in, the method can be used for text classification in professional domains.
And step S203, inputting the text semantic feature vector into a pre-constructed capsule network model for classification calculation, and outputting a classification prediction result.
The pre-constructed capsule network model comprises a convolution layer, a capsule layer and a classification layer. The convolution layer adopts an N-gram convolution layer and is used for extracting features from the text semantic feature vectors and encapsulating the extracted features into vectors carrying spatial information; the capsule layer is used for extracting and encoding the high-level abstract semantic features in sentences; and the classification layer is used for classifying the semantic features extracted by the capsule layer.
In some optional implementations of this embodiment, the step of inputting the text semantic feature vector into a pre-constructed capsule network model to perform classification calculation and outputting a classification prediction result includes:
inputting the text semantic feature vector into a convolution layer for convolution feature extraction to obtain a convolution feature vector;
text aggregation is carried out on the convolution feature vectors through the capsule layer, and global semantic vectors containing context semantics are obtained;
and inputting the global semantic vector into a classification layer for classification prediction, and outputting a classification prediction result.
The basic idea of the N-gram is to slide a window of size N over the text to form subsequences of length N, each unit of which is called a gram. Word frequency statistics are computed over all grams, grams with low frequency are filtered out according to a set threshold, and the remaining key grams form the feature vector space of the text. In this embodiment, features in the text semantic feature vectors are extracted by the N-gram convolution layer to obtain convolution feature vectors containing spatial information.
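As a sketch, an N-gram convolution can be realized as a 1-D convolution whose kernel spans N tokens; the channel counts and window size below are illustrative assumptions.

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=768, out_channels=256, kernel_size=3)  # tri-gram window
x = torch.randn(8, 768, 128)        # (batch, hidden dimension, sequence length)
features = torch.relu(conv(x))      # (8, 256, 126): convolution feature vectors
```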
The capsule layer comprises a main capsule layer and a digital capsule layer, the main capsule layer realizes the conversion from scalar neurons to vector neurons (capsules), and a dynamic routing algorithm is adopted to further encode convolution feature vectors, so that the vector transfer between the main capsule layer and the digital capsule layer is realized, the model identification efficiency is improved, and the model can be quickly and stably converged; the digital capsule layer contains a plurality of capsules, and the probability of the capsule belonging to a certain category is predicted by the length of each capsule activity vector.
In some optional implementations, the step of performing text aggregation on the convolution feature vectors through the capsule layer to obtain a global semantic vector including context semantics includes:
inputting the convolution feature vectors into the main capsule layer to perform a one-dimensional convolution operation to obtain vector capsules;
and inputting the vector capsules into the digital capsule layer, and mapping the vector capsules through a dynamic routing algorithm to obtain global semantic vectors.
During the training of the capsule network model, each vector of the main capsule layer is fully connected to each vector of the digital capsule layer. Let the i-th capsule vector of the main capsule layer be u_i, the j-th output vector of the connected digital capsule layer be v_j, the transformation matrix be W_ij, the coupling coefficient be c_ij and the prediction vector be û_j|i. The prediction vector û_j|i is calculated as:

û_j|i = W_ij · u_i;

Route iteration is performed on the main capsule layer, and the coupling coefficients of the dynamic routing algorithm are calculated:

c_ij = exp(b_ij) / Σ_k exp(b_ik);

where b_ij is the initial coupling coefficient before weighting, and c_ij is the coupling coefficient determined by the dynamic routing algorithm, i.e. the initial coupling coefficient weighted by softmax.

According to the coupling coefficients c_ij, the weighted sum s_j is calculated as:

s_j = Σ_i c_ij · û_j|i;

where s_j is the weighted sum of the main capsule layer vectors coupled to the output of the digital capsule layer.

The squash compression function ensures that the length of the final output vector v_j lies between 0 and 1:

v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖);

The inner product of the prediction vector û_j|i and the digital capsule layer output vector v_j is used to update b_ij, and thereby the coupling coefficient c_ij:

b_ij ← b_ij + û_j|i · v_j;

According to the updated coupling coefficients c_ij, the transformation matrix W_ij is updated using the back-propagation algorithm. The final output vector v_j is the global semantic vector, and the output vector v_j represents the likelihood that its corresponding category exists.
The loss function of the capsule network model is:

L_k = T_k · max(0, m⁺ − ‖v_k‖)² + λ · (1 − T_k) · max(0, ‖v_k‖ − m⁻)² + ‖W‖;

where k indexes the classification categories; T_k indicates whether category k exists; m⁺ is the upper bound, taken as 0.9; m⁻ is the lower bound, taken as 0.1; λ is a scaling factor, which may be set to 0.5; ‖v_k‖ represents the probability that the capsule belongs to category k; and ‖W‖ denotes the regularization loss of the weight parameters.
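A sketch of this margin loss with the stated constants; delegating the ‖W‖ regularization term to the optimizer's weight decay is an assumption of this sketch.

```python
import torch

def margin_loss(v, T, m_pos=0.9, m_neg=0.1, lam=0.5):
    # v: (batch, num_classes, dim) digital capsule outputs; T: one-hot labels.
    lengths = v.norm(dim=-1)                                   # ||v_k||
    L = (T * torch.clamp(m_pos - lengths, min=0) ** 2
         + lam * (1 - T) * torch.clamp(lengths - m_neg, min=0) ** 2)
    return L.sum(dim=-1).mean()
```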
By using the dynamic routing algorithm, this embodiment effectively reduces redundant information when information is transmitted between the main capsule layer and the digital capsule layer, improving the training efficiency of the model.
The global semantic vectors are input into the classification layer to calculate the probability of the classified text over each category, obtaining the classification prediction result.
By combining multi-layer capsules with the dynamic routing mechanism, the method can capture high-level features of every aspect of a sentence and encode the sentence effectively, improving classification accuracy.
Step S204, calculating a loss value between the prediction classification result and the classification label according to a preset loss function.
The preset loss function is:

Loss = −[y · log y′ + (1 − y) · log(1 − y′)];

where y denotes the true classification label and y′ denotes the classification prediction result.
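This is the standard binary cross-entropy; applying it per label, a natural fit for multi-label classification, is sketched below as an assumption about its use here.

```python
import torch
import torch.nn.functional as F

y_true = torch.tensor([[1.0, 0.0, 1.0]])       # true classification labels y
y_pred = torch.tensor([[0.9, 0.2, 0.7]])       # predicted probabilities y'
loss = F.binary_cross_entropy(y_pred, y_true)  # -[y log y' + (1-y) log(1-y')]
```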
And step S205, adjusting model parameters of the knowledge enhancement language model and the capsule network model based on the loss value, continuing to iterate training until convergence, obtaining final target model parameters, and outputting a model to be verified according to the target model parameters.
The model parameters of the knowledge enhancement language model and the capsule network model are adjusted respectively according to the loss value. The convergence condition is met when the loss value no longer changes significantly, or when the number of iterations reaches a preset count.
And when the model parameters of the knowledge enhancement language model and the model parameters of the capsule network model are target model parameters during convergence, obtaining a model to be verified based on the target model parameters, namely the model to be verified consists of the knowledge enhancement language model and the capsule network model.
Since the capsule network model has already been optimized according to its loss function L_k during its own training, in some alternative embodiments only the model parameters of the knowledge enhancement language model may be adjusted based on the loss value.
Step S206, inputting the test sample set into the model to be verified to obtain a verification result, and determining the model to be verified as a text semantic classification model when the verification result meets a preset condition.
Inputting the test sample set into the model to be verified to obtain a classification verification result, calculating the prediction precision of the model according to the classification verification result, and taking the prediction precision as the verification result.
The prediction accuracy is calculated as:

Accuracy = (1/N) · Σ_{i=1}^{N} 1(ŷ_i = y_i);

where N is the number of samples in the test sample set, ŷ_i is the classification verification result for sample i and y_i is its actual classification label; the indicator 1(ŷ_i = y_i) counts a sample as 1 when the predicted result equals the actual value.
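A sketch of the accuracy check; the 0.9 acceptance threshold is an illustrative assumption.

```python
def prediction_accuracy(predicted, actual):
    # Accuracy = (1/N) * sum of 1(y_hat_i == y_i) over the test sample set.
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

if prediction_accuracy(["claim", "quote"], ["claim", "quote"]) >= 0.9:
    print("accepted as the text semantic classification model")
```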
When the prediction accuracy is greater than or equal to a preset threshold, the model to be verified is output as the final text semantic classification model; if the prediction accuracy is below the preset threshold, the model's prediction accuracy is insufficient, and the number of samples should be increased or the model parameters modified before retraining to improve prediction accuracy.
Step S207, obtaining a text to be classified, and inputting the text to be classified into a text semantic classification model to obtain a text classification result.
The trained text semantic classification model can be applied in the corresponding service scenario to classify text. Obtaining the text to be classified and classifying it with the text semantic classification model improves classification efficiency and accuracy, bringing new technical improvements to the service.
According to the method, the knowledge graph is introduced into the knowledge enhancement language model, and the knowledge graph is combined with the training sample set to conduct feature extraction on the knowledge enhancement language model, so that text semantic feature vectors containing rich knowledge information can be obtained, and feature expression of the text is enhanced; the text semantic feature vector with enhanced knowledge is input into the capsule network model for classification calculation, so that semantic information relations among words can be further obtained, the extraction capacity of important text features is improved, multi-label texts can be effectively identified, and further the efficiency and accuracy of text classification are improved.
The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by computer readable instructions stored in a computer readable storage medium that, when executed, may comprise the steps of the embodiments of the methods described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a random access Memory (Random Access Memory, RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With further reference to fig. 4, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a semantic-based text classification apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied in various electronic devices.
As shown in fig. 4, the text classification device 400 based on semantics according to the present embodiment includes: an acquisition module 401, a text enhancement module 402, a classification prediction module 403, a loss calculation module 404, an adjustment module 405, a verification module 406, and a classification module 407. Wherein:
The obtaining module 401 is configured to obtain a classified text data set and a corresponding knowledge graph, and divide the classified text data set into a training sample set and a test sample set, where the classified text data set includes a plurality of classified texts and classification labels corresponding to each of the classified texts;
the text enhancement module 402 is configured to input the training sample set and the knowledge graph into a pre-constructed knowledge enhancement language model to obtain a text semantic feature vector with enhanced knowledge;
the classification prediction module 403 is configured to input the text semantic feature vector into a pre-constructed capsule network model for classification calculation, and output a classification prediction result;
the loss calculation module 404 is configured to calculate a loss value between the prediction classification result and the classification label according to a preset loss function;
the adjustment module 405 is configured to adjust model parameters of the knowledge enhancement language model and the capsule network model based on the loss value, continue iterative training until convergence, obtain final target model parameters, and output a model to be verified according to the target model parameters;
the verification module 406 is configured to input the test sample set into the model to be verified, obtain a verification result, and determine that the model to be verified is a text semantic classification model when the verification result meets a preset condition;
The classification module 407 is configured to obtain a text to be classified, and input the text to be classified into the text semantic classification model to obtain a text classification result.
It is emphasized that to further ensure the privacy and security of the categorized text data sets, the categorized text data sets may also be stored in nodes of a blockchain.
The semantic-based text classification device 400 introduces the knowledge graph into the knowledge enhancement language model and combines the knowledge graph with the training sample set for feature extraction, so that text semantic feature vectors containing rich knowledge information can be obtained and the feature expression of the text is enhanced; the knowledge-enhanced text semantic feature vectors are input into the capsule network model for classification calculation, which further obtains the semantic relations between words, improves the extraction of important text features, effectively identifies multi-label text and thereby improves the efficiency and accuracy of text classification.
In some alternative implementations, the knowledge-enhanced language model includes a knowledge layer, an embedding layer, a visible layer, and an encoding layer, and the text enhancement module 402 includes:
the knowledge injection submodule is used for injecting knowledge in the knowledge graph into text sentences of the training sample set through the knowledge layer to form sentence trees, and inputting the sentence trees into the embedding layer and the visible layer respectively;
The embedding sub-module is used for carrying out position embedding on the sentence tree through the embedding layer to obtain a text position coding vector;
a matrix construction sub-module for constructing a text visible matrix of the sentence tree through the visible layer;
and the coding submodule is used for inputting the text position coding vector and the text visible matrix into the coding layer for attention calculation and outputting a text semantic feature vector.
The knowledge enhancement language model is used for increasing the domain knowledge of the classified text, so that the text semantics are enriched, and the problems of inconsistent coding space of diversified word vectors and statement deviation from core semantics are avoided; meanwhile, the knowledge graph is fused, so that the method can be used for text classification in the professional field.
In this embodiment, the knowledge injection submodule includes:
the knowledge query unit is used for calling a knowledge query function of the knowledge layer, identifying all entities corresponding to each text sentence in the training sample set, and querying triples corresponding to each entity in a knowledge graph;
and the knowledge injection unit is used for calling a knowledge injection function of the knowledge layer and embedding the triples into corresponding positions in the text sentences to obtain sentence trees.
The completion of sentence background information is realized through the sentence tree, and the problem that word vectors deviate from core semantics due to the fact that a single sentence does not have a knowledge background is solved.
In some optional implementations of the present embodiment, the embedding submodule includes:
the embedding unit is used for inputting the sentence tree into the embedding layer to respectively perform segment embedding operation, soft position embedding operation and word embedding operation to obtain corresponding sentence coding vectors, position coding vectors and word coding vectors;
and the summing unit is used for summing the sentence coding vector, the position coding vector and the word coding vector to obtain a text position coding vector.
The structural information of the tree structure is reserved in the text position coding vector obtained through the embedding operation, so that the characteristic representation can be enhanced, and the semantic characteristics of the text can be better obtained.
In this embodiment, the encoding submodule includes:
the determining unit is used for determining a query vector parameter matrix, a key vector parameter matrix and a value vector parameter matrix of the coding layer;
a self-attention calculating unit configured to calculate self-attention based on the query vector parameter matrix, the key vector parameter matrix, the value vector parameter matrix, the text position coding vector, and the text visibility matrix;
And the multi-head attention calculating unit is used for carrying out multi-head attention calculation based on the self-attention to obtain text semantic feature vectors.
The structure information of the sentence tree is obtained through the text visible matrix, and attention is calculated according to the text visible matrix, achieving the goal of not adding noise while embedding knowledge and ensuring the accuracy of the output text semantic feature vectors.
In some alternative implementations, the capsule network model includes a convolution layer, a capsule layer, and a classification layer, and the classification prediction module 403 includes:
the convolution submodule is used for inputting the text semantic feature vector into the convolution layer to carry out convolution feature extraction to obtain a convolution feature vector;
the capsule sub-module is used for carrying out text aggregation on the convolution feature vectors through the capsule layer to obtain global semantic vectors containing context semantics;
and the prediction sub-module is used for inputting the global semantic vector into the classification layer to perform classification prediction and outputting a classification prediction result.
By combining the multi-layer capsule with the dynamic routing mechanism, the method can capture the advanced features of all aspects in sentences and effectively encode the sentences, thereby improving the classification accuracy.
In this embodiment, the capsule layer includes a main capsule layer and a digital capsule layer, and the capsule submodule includes:
the one-dimensional convolution unit is used for inputting the convolution characteristic vector into the main capsule layer to perform one-dimensional convolution operation to obtain a vector capsule;
and the dynamic routing unit is used for inputting the vector capsules into the digital capsule layer and mapping the vector capsules through a dynamic routing algorithm to obtain global semantic vectors.
By using the dynamic routing algorithm, redundant information is effectively reduced when information is transmitted between the main capsule layer and the digital capsule layer, and training efficiency of the model is improved.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 5, fig. 5 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 5 comprises a memory 51, a processor 52 and a network interface 53, communicatively connected to each other via a system bus. It should be noted that only the computer device 5 with components 51-53 is shown in the figure, but it should be understood that not all of the illustrated components need be implemented, and more or fewer components may be implemented instead. As will be appreciated by those skilled in the art, the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 51 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, and the like. In some embodiments, the memory 51 may be an internal storage unit of the computer device 5, such as a hard disk or an internal memory of the computer device 5. In other embodiments, the memory 51 may also be an external storage device of the computer device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card equipped on the computer device 5. Of course, the memory 51 may also comprise both an internal storage unit of the computer device 5 and an external storage device. In this embodiment, the memory 51 is typically used to store the operating system and the various application software installed on the computer device 5, such as the computer readable instructions of the semantic-based text classification method. Furthermore, the memory 51 may be used to temporarily store various types of data that have been output or are to be output.
The processor 52 may, in some embodiments, be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip. The processor 52 is typically used to control the overall operation of the computer device 5. In this embodiment, the processor 52 is configured to execute the computer readable instructions stored in the memory 51 or to process data, for example to execute the computer readable instructions of the semantic-based text classification method.
The network interface 53 may comprise a wireless network interface or a wired network interface, which network interface 53 is typically used to establish communication connections between the computer device 5 and other electronic devices.
When the processor executes the computer readable instructions stored in the memory, the steps of the semantic-based text classification method of the above embodiment are implemented. By performing feature extraction in the knowledge-enhanced language model with the knowledge graph and the training sample set combined, text semantic feature vectors containing rich knowledge information can be obtained, enhancing the feature expression of the text; inputting the knowledge-enhanced text semantic feature vectors into the capsule network model for classification calculation further captures the semantic relations among words, improves the extraction of important text features, and allows multi-label text to be identified effectively, thereby improving the efficiency and accuracy of text classification.
The application also provides another embodiment, namely a computer readable storage medium storing computer readable instructions executable by at least one processor, so that the at least one processor performs the steps of the semantic-based text classification method described above. By performing feature extraction in the knowledge-enhanced language model with the knowledge graph and the training sample set combined, text semantic feature vectors containing rich knowledge information can be obtained, enhancing the feature expression of the text; inputting the knowledge-enhanced text semantic feature vectors into the capsule network model for classification calculation further captures the semantic relations among words, improves the extraction of important text features, and allows multi-label text to be identified effectively, thereby improving the efficiency and accuracy of text classification.
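By way of illustration only, the training procedure described above (loss calculation, parameter adjustment, iteration until convergence) might be sketched as follows in PyTorch; the model interface, batch layout and hyper-parameters are hypothetical placeholders, and the multi-label loss function is an assumption consistent with the abstract.

    import torch
    import torch.nn as nn

    def train(model, train_loader, epochs=10, lr=2e-5):
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.BCEWithLogitsLoss()   # assumed multi-label loss function
        for _ in range(epochs):            # in practice, iterate until convergence
            for token_ids, soft_pos, segments, visible_m, labels in train_loader:
                logits = model(token_ids, soft_pos, segments, visible_m)
                loss = loss_fn(logits, labels.float())  # loss vs. classification labels
                optimizer.zero_grad()
                loss.backward()            # adjust parameters of both sub-models
                optimizer.step()
        return model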
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is preferred. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk or optical disk) and comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to perform the methods described in the embodiments of the present application.
It is apparent that the embodiments described above are only some, not all, of the embodiments of the present application; the preferred embodiments are shown in the drawings, which do not limit the patent scope of the application. This application may be embodied in many different forms; these embodiments are provided so that the disclosure will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. Any equivalent structure made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the patent protection scope of the present application.

Claims (10)

1. A semantic-based text classification method, comprising the steps of:
acquiring a classified text data set and a corresponding knowledge graph, and dividing the classified text data set into a training sample set and a test sample set, wherein the classified text data set comprises a plurality of classified texts and a classification label corresponding to each classified text;
inputting the training sample set and the knowledge graph into a pre-constructed knowledge enhancement language model to obtain a text semantic feature vector with enhanced knowledge;
inputting the text semantic feature vector into a pre-constructed capsule network model for classification calculation, and outputting a classification prediction result;
calculating a loss value between the classification prediction result and the classification label according to a preset loss function;
adjusting model parameters of the knowledge enhancement language model and the capsule network model based on the loss value, continuing iterative training until convergence to obtain final target model parameters, and outputting a model to be verified according to the target model parameters;
inputting the test sample set into the model to be verified to obtain a verification result, and determining the model to be verified as a text semantic classification model when the verification result meets a preset condition;
and obtaining a text to be classified, and inputting the text to be classified into the text semantic classification model to obtain a text classification result.
2. The semantic-based text classification method according to claim 1, wherein the knowledge-enhanced language model comprises a knowledge layer, an embedding layer, a visible layer, and an encoding layer; the step of inputting the training sample set and the knowledge graph into a pre-constructed knowledge enhancement language model to obtain a text semantic feature vector with enhanced knowledge comprises the following steps:
injecting knowledge in the knowledge graph into text sentences of the training sample set through the knowledge layer to form sentence trees, and respectively inputting the sentence trees into the embedding layer and the visible layer;
performing position embedding on the sentence tree through the embedding layer to obtain a text position coding vector;
constructing a text visible matrix of the sentence tree through the visible layer;
and inputting the text position coding vector and the text visible matrix into the coding layer for attention calculation, and outputting a text semantic feature vector.
3. The semantic-based text classification method according to claim 2, wherein the step of injecting knowledge in the knowledge graph into text sentences of the training sample set through the knowledge layer to form sentence trees comprises:
invoking a knowledge query function of the knowledge layer, identifying all entities corresponding to each text sentence in the training sample set, and querying the triples corresponding to each entity in the knowledge graph;
and calling a knowledge injection function of the knowledge layer, and embedding the triples into the corresponding positions in the text sentences to obtain sentence trees.
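By way of non-limiting illustration of this injection step, a purely hypothetical sketch in Python; the function names and the dictionary-based triple store are assumptions, not the claimed implementation.

    # Hypothetical knowledge layer: entity lookup followed by triple injection.
    # knowledge_graph maps an entity string to a list of (relation, tail) pairs.
    def inject_knowledge(tokens, knowledge_graph):
        sentence_tree = []
        for token in tokens:
            sentence_tree.append(token)
            # knowledge query: fetch triples whose head entity matches this token
            for relation, tail in knowledge_graph.get(token, []):
                # knowledge injection: splice the triple in right after the
                # entity, forming a branch of the sentence tree
                sentence_tree.append((token, relation, tail))
        return sentence_tree

    # Example usage with a toy knowledge graph:
    kg = {"Cook": [("CEO", "Apple")], "Beijing": [("capital", "China")]}
    tree = inject_knowledge(["Tim", "Cook", "is", "visiting", "Beijing"], kg)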
4. The semantic-based text classification method according to claim 2, wherein the step of performing position embedding on the sentence tree by the embedding layer to obtain a text position-coding vector comprises:
inputting the sentence tree into the embedding layer to perform segment embedding operation, soft position embedding operation and word embedding operation respectively to obtain corresponding sentence coding vectors, position coding vectors and word coding vectors;
and summing the sentence coding vector, the position coding vector and the word coding vector to obtain a text position coding vector.
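A minimal sketch of the summation described in claim 4, assuming PyTorch; the vocabulary size, maximum position and segment count are hypothetical placeholders.

    import torch
    import torch.nn as nn

    class KnowledgeEmbedding(nn.Module):
        # sizes below are illustrative assumptions only
        def __init__(self, vocab_size=21128, max_pos=512, n_segments=2, dim=768):
            super().__init__()
            self.word = nn.Embedding(vocab_size, dim)       # word embedding
            self.soft_pos = nn.Embedding(max_pos, dim)      # soft position embedding
            self.segment = nn.Embedding(n_segments, dim)    # segment embedding

        def forward(self, token_ids, soft_positions, segment_ids):
            # soft positions let an injected triple reuse the position index of
            # its head entity, keeping the original sentence order intact
            return (self.word(token_ids)
                    + self.soft_pos(soft_positions)
                    + self.segment(segment_ids))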
5. The semantic-based text classification method according to claim 2, wherein the step of inputting said text position coding vector and said text visible matrix into said coding layer for attention calculation and outputting text semantic feature vectors comprises:
determining a query vector parameter matrix, a key vector parameter matrix and a value vector parameter matrix of the coding layer;
calculating self-attention according to the query vector parameter matrix, the key vector parameter matrix, the value vector parameter matrix, the text position coding vector and the text visible matrix;
and calculating multi-head attention based on the self-attention to obtain text semantic feature vectors.
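By way of non-limiting illustration of claim 5, a sketch that makes the query, key and value vector parameter matrices explicit and combines several masked attention heads; all dimensions and names are hypothetical assumptions, complementing the functional masking sketch given earlier.

    import torch
    import torch.nn as nn

    class VisibleMultiHeadAttention(nn.Module):
        # dim must be divisible by n_heads; both values are illustrative
        def __init__(self, dim=768, n_heads=12):
            super().__init__()
            self.n_heads, self.head_dim = n_heads, dim // n_heads
            self.wq = nn.Linear(dim, dim)   # query vector parameter matrix
            self.wk = nn.Linear(dim, dim)   # key vector parameter matrix
            self.wv = nn.Linear(dim, dim)   # value vector parameter matrix
            self.out = nn.Linear(dim, dim)

        def forward(self, x, visible_matrix):       # x: (batch, seq_len, dim)
            B, S, _ = x.shape
            def split(t):                            # -> (batch, heads, seq_len, head_dim)
                return t.view(B, S, self.n_heads, self.head_dim).transpose(1, 2)
            q, k, v = split(self.wq(x)), split(self.wk(x)), split(self.wv(x))
            scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
            # broadcast the (batch, seq_len, seq_len) visible matrix over all heads
            scores = scores.masked_fill(visible_matrix.unsqueeze(1) == 0, float('-inf'))
            ctx = torch.softmax(scores, dim=-1) @ v  # per-head self-attention
            ctx = ctx.transpose(1, 2).reshape(B, S, -1)
            return self.out(ctx)                     # multi-head attention output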
6. The semantic-based text classification method according to claim 1, wherein the capsule network model comprises a convolution layer, a capsule layer, and a classification layer; the step of inputting the text semantic feature vector into a pre-constructed capsule network model for classification calculation and outputting a classification prediction result comprises the following steps:
inputting the text semantic feature vector into the convolution layer to perform convolution feature extraction to obtain a convolution feature vector;
performing text aggregation on the convolution feature vectors through the capsule layer to obtain global semantic vectors containing context semantics;
and inputting the global semantic vector into the classification layer to perform classification prediction, and outputting a classification prediction result.
7. The semantic-based text classification method according to claim 6, wherein said capsule layer comprises a main capsule layer and a digital capsule layer; the step of performing text aggregation on the convolution feature vectors through the capsule layer to obtain global semantic vectors containing context semantics comprises the following steps:
inputting the convolution feature vector into the main capsule layer to perform a one-dimensional convolution operation to obtain vector capsules;
and inputting the vector capsules into the digital capsule layer, and carrying out a mapping operation on the vector capsules through a dynamic routing algorithm to obtain global semantic vectors.
8. A semantic-based text classification apparatus, comprising:
the acquisition module is used for acquiring a classified text data set and a corresponding knowledge graph, and dividing the classified text data set into a training sample set and a test sample set, wherein the classified text data set comprises a plurality of classified texts and a classification label corresponding to each classified text;
the text enhancement module is used for inputting the training sample set and the knowledge graph into a pre-constructed knowledge enhancement language model to obtain a text semantic feature vector with enhanced knowledge;
the classification prediction module is used for inputting the text semantic feature vector into a pre-constructed capsule network model for classification calculation and outputting a classification prediction result;
the loss calculation module is used for calculating a loss value between the classification prediction result and the classification label according to a preset loss function;
the adjustment module is used for adjusting the model parameters of the knowledge enhancement language model and the capsule network model based on the loss value, continuing to iterate training until convergence, obtaining final target model parameters, and outputting a model to be verified according to the target model parameters;
the verification module is used for inputting the test sample set into the model to be verified to obtain a verification result, and determining that the model to be verified is a text semantic classification model when the verification result meets a preset condition;
the classification module is used for acquiring texts to be classified, inputting the texts to be classified into the text semantic classification model, and obtaining text classification results.
9. A computer device, comprising a memory and a processor, wherein the memory stores computer readable instructions which, when executed by the processor, implement the steps of the semantic-based text classification method according to any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the semantic based text classification method according to any of claims 1 to 7.
CN202410050857.8A 2024-01-12 2024-01-12 Text classification method and device based on semantics, computer equipment and storage medium Pending CN117874234A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410050857.8A CN117874234A (en) 2024-01-12 2024-01-12 Text classification method and device based on semantics, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117874234A true CN117874234A (en) 2024-04-12

Family

ID=90580723

Country Status (1)

Country Link
CN (1) CN117874234A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination