CN115292483A - Short text classification method based on uncertainty perception heterogeneous graph attention network - Google Patents


Info

Publication number
CN115292483A
Authority
CN
China
Prior art keywords
information
data set
heterogeneous
short text
heterogeneous graph
Prior art date
Legal status
Pending
Application number
CN202210288840.7A
Other languages
Chinese (zh)
Inventor
冀振燕
孔德焱
杨燕燕
吴睿智
韩梦豪
Current Assignee
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date
Filing date
Publication date
Application filed by Beijing Jiaotong University
Priority to CN202210288840.7A
Publication of CN115292483A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a short text classification method based on an uncertainty-aware heterogeneous graph attention network. The method comprises the following steps: acquiring a labeled data set and an unlabeled data set, wherein the amount of labeled data is smaller than the amount of unlabeled data; training a heterogeneous graph attention network model on the labeled data set; predicting the unlabeled data set with the trained heterogeneous graph attention network model, and calculating an uncertainty value for the prediction results; using the predicted value and the uncertainty value jointly as confidence measures, assigning pseudo labels to the unlabeled data set and selecting positive and negative pseudo-labeled samples; mixing the selected pseudo-labeled data with the labeled data set to obtain a mixed data set, and further training the heterogeneous graph attention network model on the mixed data set; and repeating these steps until a set termination condition is met. The method can accurately classify common short texts in low-resource settings.

Description

Short text classification method based on uncertainty perception heterogeneous graph attention network
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a short text classification method based on an uncertainty-aware heterogeneous graph attention network.
Background
In recent years, natural language processing techniques based on deep learning have developed rapidly and are widely used in text classification, word vector representation, semantic similarity, opinion extraction from comments, sentiment analysis, and other fields. With the rapid growth of social media and electronic commerce, short texts such as online news, comments, and tweets are increasingly common on the internet. Deep-learning-based methods have achieved great success in mining useful information from massive data.
The success of deep learning is mainly attributed to advances in learning algorithms and the availability of large-scale labeled datasets. In some practical scenarios, however, labeled data is scarce and manual annotation is time-consuming. Semi-supervised learning, whose goal is to exploit a large unlabeled dataset together with a small labeled one, is one of the most important approaches to this problem. Semi-supervised classification has two core families of methods: consistency regularization and pseudo-labeling. Consistency regularization pushes decision boundaries into low-density regions by making the network output invariant to small input perturbations. However, it usually relies on a rich set of data augmentations, such as affine transformations, cropping, and color jittering for images, which limits its applicability in domains where augmentation is less effective. Pseudo-labeling instead selects unlabeled samples with high-confidence predictions to move the decision boundary toward low-density regions; but because neural networks are often poorly calibrated, many of the selected predictions are wrong, and a poorly calibrated network can assign high confidence to incorrect predictions.
In summary, how to fully exploit limited labeled data together with a large amount of unlabeled data is a key problem in short text classification. Moreover, effectively capturing the importance of different information sources, integrating information at multiple granularity levels to alleviate sparsity, and down-weighting noisy information to obtain more accurate classification results remain difficult.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a short text classification method based on an uncertainty-aware heterogeneous graph attention network. The method comprises the following steps:
Step S1, acquiring a labeled data set and an unlabeled data set, wherein the amount of labeled data is smaller than the amount of unlabeled data, and the labeled data set reflects the correspondence between short text data and label categories;
Step S2, training a heterogeneous graph attention network model based on the labeled data set;
Step S3, predicting the unlabeled data set by using the trained heterogeneous graph attention network model, and calculating an uncertainty value of the prediction result;
Step S4, using the predicted value and the uncertainty value jointly as confidence measures, assigning pseudo labels to the unlabeled data set and selecting positive and negative pseudo-labeled samples;
Step S5, mixing the selected pseudo-labeled data with the labeled data set to obtain a mixed data set, and further training the heterogeneous graph attention network model on the mixed data set;
Step S6, repeating steps S3 to S5 until a set termination condition is met.
Compared with the prior art, the method addresses the sparsity of short texts by integrating topic information and the entity information of a knowledge graph into a heterogeneous information network structure, deeply mines the internal structural information among texts, dynamically fuses contextual features learned from massive text, and makes full use of unlabeled data to calibrate the model, significantly improving classification accuracy and generalization.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic diagram of a short text heterogeneous information network architecture according to one embodiment of the present invention;
FIG. 2 is a block diagram of a short text classification model based on an uncertainty aware heterogeneous graph attention network according to one embodiment of the invention;
FIG. 3 is a schematic diagram of a heterogeneous graph attention network architecture in accordance with one embodiment of the present invention;
FIG. 4 is a flow diagram of an uncertainty-aware pseudo tag selector according to one embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as exemplary only and not as limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
To address the sparse, ambiguous, context-poor nature of short texts, the invention designs a heterogeneous information network structure and, on top of it, constructs a short text classification model based on an uncertainty-aware heterogeneous graph attention network. In the following description, the uncertainty-aware heterogeneous graph attention network is also referred to simply as the heterogeneous graph attention network or the heterogeneous graph attention network model.
Fig. 1 shows the constructed short text heterogeneous information network structure: topics are illustrated on the left, short text content in the middle, and the knowledge graph on the right. The structure integrates topic information, short text information, and entity information from the knowledge graph; it captures the rich relations between texts and this additional information, alleviating the sparsity of short texts. Specifically, the structure comprises three node types and three relation types: the node types are topic information nodes, short text information nodes, and entity information nodes; the relation types are entity-entity relations, text-entity relations, and text-topic relations.
In one embodiment, topic information nodes are built by mining latent topics with LDA (Latent Dirichlet Allocation), each topic being represented by a probability distribution over words; each short text is then assigned to, for example, the two topics with the highest probability, and edges are constructed between topics and texts, as sketched below.
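An illustrative sketch of this topic-node construction follows (the choice of scikit-learn, the placeholder corpus, and the number of topics are assumptions; the patent only specifies LDA and a top-2 assignment):

# Sketch: mine latent topics with LDA and link each short text to its top-2 topics.
# Library (scikit-learn) and parameter values are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

texts = ["stocks rally on strong earnings", "the team wins the championship"]  # placeholder corpus

vectorizer = CountVectorizer(stop_words="english")
bow = vectorizer.fit_transform(texts)                 # bag-of-words counts

lda = LatentDirichletAllocation(n_components=10, random_state=0)  # 10 topics assumed
doc_topic = lda.fit_transform(bow)                    # document-topic probabilities

topic_text_edges = []
for doc_id, dist in enumerate(doc_topic):
    for topic_id in np.argsort(dist)[-2:]:            # top-2 topics per short text
        topic_text_edges.append((f"topic_{topic_id}", f"text_{doc_id}"))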
For short text information nodes, sentence vector representations may be produced by pre-trained language models including, but not limited to, Word2Vec, GloVe, FastText, and BERT. One embodiment adopts a BERT-based sentence embedding, specifically using the packaged library bert-as-service with the maximum sentence length max_seq_len set to 256. Because BERT dynamically fuses contextual features learned from massive text, it provides a high-quality feature vector input representation for short texts.
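A minimal sketch of this embedding step with the bert-as-service client named above (it assumes a server was started separately with max_seq_len 256; the text values are placeholders):

# Sketch: obtain fixed-length sentence vectors from a bert-as-service server.
# Assumes the server was started beforehand, e.g.:
#   bert-serving-start -model_dir /path/to/bert_model -max_seq_len 256
from bert_serving.client import BertClient

bc = BertClient()                            # connects to localhost by default
texts = ["short text one", "short text two"] # placeholder short texts
vecs = bc.encode(texts)                      # shape (num_texts, 768) for BERT-base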
Entity information nodes represent entities in the knowledge graph that are linked from the text. Neighbor nodes adjacent to a linked entity are also added to the heterogeneous information network structure, for example nodes one hop away from the entity node; edges between entities and texts and edges between entities are constructed at the same time.
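Continuing the sketches above, the heterogeneous graph itself can be assembled, for example, with networkx; link_entities and one_hop_neighbors below are hypothetical placeholders for an entity linker and a knowledge-graph lookup, not functions from the patent or any specific library:

# Sketch: assemble the heterogeneous information network (texts, topics, entities).
# link_entities() and one_hop_neighbors() are hypothetical helper functions.
import networkx as nx

g = nx.Graph()
for doc_id, text in enumerate(texts):
    g.add_node(f"text_{doc_id}", ntype="text")

for topic, text_node in topic_text_edges:    # edges from the LDA sketch above
    g.add_node(topic, ntype="topic")
    g.add_edge(topic, text_node)             # text-topic edge

for doc_id, text in enumerate(texts):
    for ent in link_entities(text):          # hypothetical entity linker
        g.add_node(ent, ntype="entity")
        g.add_edge(f"text_{doc_id}", ent)    # text-entity edge
        for nbr in one_hop_neighbors(ent):   # hypothetical KG lookup, one hop
            g.add_node(nbr, ntype="entity")
            g.add_edge(ent, nbr)             # entity-entity edge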
Further, a short text classification model based on the uncertainty-aware heterogeneous graph attention network is built on the designed heterogeneous information network structure. Fig. 2 is a schematic diagram of this model, which comprises a heterogeneous graph attention network, a dual-level attention mechanism, and a pseudo-label selector.
The heterogeneous graph attention network exploits the knowledge graph and the large amount of unlabeled data, propagates information along the graph, and effectively captures the relationships between adjacent nodes to alleviate the semantic sparsity of short texts; it is used to mine the internal structural information among texts.
The dual-level attention mechanism extracts the important features of neighbor nodes and assigns different weights to nodes of different types so as to down-weight noisy information. As shown in Fig. 3, α_1, α_2 and α_3 denote the importance of the different node types to node i (type-level attention), and β_1, β_2 denote the importance of the same-type nodes N_i^1, N_i^2 to node i (node-level attention), where N_i^1 and N_i^2 are nodes of the same type, N_i^4 and N_i^5 are nodes of the same type, and N_i^3 and N_i^6 are nodes of the same type.
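As a rough illustration of how such a dual-level attention could be computed, a simplified single-head sketch follows (the hidden dimension, the mean-pooled type embeddings, and the concatenation-based scoring are assumptions, not the patent's exact formulation):

# Sketch: simplified dual-level (type-level + node-level) attention for one node i.
# A toy single-head version; not the patent's exact formulation.
import torch
import torch.nn.functional as F

d = 64                                       # hidden dimension (assumed)
h_i = torch.randn(d)                         # embedding of the target node i
neigh = {                                    # neighbor embeddings grouped by type
    "text":   torch.randn(3, d),
    "topic":  torch.randn(2, d),
    "entity": torch.randn(4, d),
}

# Type-level attention: score each neighbor type via its mean-pooled embedding.
type_vecs = torch.stack([v.mean(dim=0) for v in neigh.values()])
a_type = torch.randn(2 * d)                  # learnable type-attention vector (assumed)
type_scores = torch.stack([(a_type * torch.cat([h_i, t])).sum() for t in type_vecs])
alpha = F.softmax(type_scores, dim=0)        # α: importance of each type to node i

# Node-level attention within each type, scaled by that type's α.
a_node = torch.randn(2 * d)                  # learnable node-attention vector (assumed)
out = torch.zeros(d)
for k, h_nbrs in enumerate(neigh.values()):
    pair = torch.cat([h_i.expand(len(h_nbrs), d), h_nbrs], dim=1)
    beta = F.softmax((a_node * pair).sum(dim=1), dim=0)  # β: same-type importance
    out = out + alpha[k] * (beta.unsqueeze(1) * h_nbrs).sum(dim=0)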
The pseudo-label selector selects positive and negative pseudo-labeled samples based on the computed uncertainty; Fig. 4 is a flow diagram of the uncertainty-aware pseudo-label selection framework. The framework creates positive and negative pseudo labels for a large amount of unlabeled data, uses uncertainty awareness to reduce noise and improve pseudo-label accuracy, generates more training data, and improves model performance through continuous iteration.
The following description focuses on the short text classification model based on the uncertainty-aware heterogeneous graph attention network; the construction of the heterogeneous information network structure is not repeated. Referring to Fig. 4, the short text classification method comprises the following steps:
Step S110, constructing a heterogeneous information network from the acquired short text data.
For example, 6000 labeled samples evenly covering four categories are randomly selected from the public news data set AGNews; 100 labeled short texts are randomly drawn for each category, half of which serve as the training set and half as the validation set, while the remaining samples serve as the test set and as unlabeled data. The labeled data is denoted D_L and the unlabeled data D_U.
Step S120, performing first-stage training on the labeled data with the heterogeneous graph attention network.
During training, the dropout parameter (drop rate) may be set to 0.5.
Step S130, using the model trained in the first stage to assign pseudo labels to the unlabeled data.
This step predicts categories for the unlabeled data with the trained model. For example, dropout is kept active at prediction time so that the same input (the same unlabeled sample) is predicted multiple times; the mean and the variance of these outputs are then computed to obtain the uncertainty value of each unlabeled sample. In Fig. 1, μ_1, μ_2 and μ_3 denote the means, and σ_1, σ_2 and σ_3 denote the variances.
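A sketch of this Monte-Carlo-dropout style uncertainty estimate (the number of passes T = 10 follows the ten-prediction example below; keeping the model in train mode so dropout stays active is the key point):

# Sketch: T stochastic forward passes with dropout active on the same input,
# then per-class mean (prediction) and variance (uncertainty).
import torch

def mc_dropout_predict(model, x, T=10):
    model.train()                            # keep dropout active at prediction time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(T)])
    return probs.mean(dim=0), probs.var(dim=0)   # μ and σ² per class

# mu, var = mc_dropout_predict(hgat_model, unlabeled_batch)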
Step S140, using the output predicted value and the uncertainty value jointly as confidence measures, and assigning positive and negative pseudo labels based on both confidences simultaneously.
Specifically, a sample is labeled only when its prediction probability is high and the model's output is stable: for example, a sample may be selected as a positive sample when its prediction mean is greater than 0.7 and its prediction variance is less than 0.05, and as a negative sample when its prediction mean is less than 0.1 and its prediction variance is less than 0.05.
For example, let P denote the set of probabilities that a sample belongs to class A over ten consecutive predictions by the model. Then the sample is selected as a positive sample when:

μ(P) > 0.7 && σ(P) < 0.05

and as a negative sample when:

μ(P) < 0.1 && σ(P) < 0.05
the selected pseudo label data of the positive and negative samples is marked as Dselected, and the mean threshold of the positive samples in FIG. 2 is marked as T μ Variance threshold is marked as T σ The mean threshold of negative samples is marked as F μ Variance threshold is marked as F σ . It is to be understood that T μ And F μ Can be set to the same value or different values, T σ And F σ May be set to the same value or different values.
Step S150, mixing the selected pseudo-labeled data with the labeled data set to obtain a mixed data set, and using the mixed data set as the training set for second-stage training of the model.
In Fig. 4, the training set of the model is expressed as D = D_L ∪ D_selected.
Step S160, repeating steps S130 to S150 until a set termination condition is met.
For example, training terminates when the number of iterations reaches a given maximum (e.g., MaxIteration = 50) or when the loss value of the model satisfies a set condition.
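Tying steps S110 to S160 together, a hedged outline of the iteration (train, mix and build_pseudo_labeled_set are placeholders for the components sketched above, not functions defined by the patent):

# Sketch: the overall self-training loop, with MaxIteration = 50 as in the example.
def self_train(model, D_L, D_U, max_iteration=50):
    train(model, D_L)                                   # first-stage training (S120)
    for _ in range(max_iteration):
        mu, var = mc_dropout_predict(model, D_U)        # predict + uncertainty (S130)
        pos, neg = select_pseudo_labels(mu, var)        # threshold selection (S140)
        D_selected = build_pseudo_labeled_set(D_U, pos, neg)  # hypothetical helper
        train(model, mix(D_L, D_selected))              # second-stage training (S150)
    return model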
It should be noted that those skilled in the art may modify the above embodiments without departing from the spirit and scope of the invention. For example, the thresholds for assigning positive and negative pseudo labels in the pseudo-label selector, the number of topics selected in the heterogeneous information network structure, and the hop distance of the entity neighbors selected from the knowledge graph may be set according to actual needs or equipment constraints; the invention does not limit the specific configuration.
In summary, the invention models short texts, topic information, and entity knowledge from the knowledge graph with a heterogeneous information network structure, embeds short texts as sentence vectors with BERT, feeds the fused network structure into the uncertainty-aware heterogeneous graph attention network model, and captures the relationships between short texts and topic information to classify short texts. The uncertainty-aware heterogeneous graph attention network comprises three parts: a heterogeneous graph attention network, a dual-level attention mechanism, and a pseudo-label selector. The heterogeneous graph attention network exploits the knowledge graph and the large amount of unlabeled data, propagates information along the graph, and effectively captures the relationships between adjacent nodes to alleviate the semantic sparsity of short texts; the graph neural network mines the internal structural information among texts. The dual-level attention mechanism extracts the important features of neighbor nodes and assigns different weights to nodes of different types to down-weight noisy information. The knowledge graph and topics provide additional information for short texts. The BERT-based sentence embedding dynamically fuses contextual features learned from massive text and provides a high-quality feature vector input representation for short texts. The pseudo-label selector creates positive and negative pseudo labels for the large amount of unlabeled data, uses uncertainty awareness to reduce noise and improve pseudo-label accuracy, generates more training data, and improves model performance through continuous iteration. The invention makes full use of unlabeled data to calibrate the model in low-resource settings and significantly improves classification accuracy and generalization; experiments show that adding the uncertainty-aware pseudo-label selector to the heterogeneous graph attention network significantly improves short text classification accuracy, and the method is applicable to common types of short text.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++, or Python, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (10)

1. A short text classification method based on an uncertainty-aware heterogeneous graph attention network, comprising the following steps:
Step S1, acquiring a labeled data set and an unlabeled data set, wherein the amount of labeled data is smaller than the amount of unlabeled data, and the labeled data set reflects the correspondence between short text data and label categories;
Step S2, training a heterogeneous graph attention network model based on the labeled data set;
Step S3, predicting the unlabeled data set by using the trained heterogeneous graph attention network model, and calculating an uncertainty value of the prediction result;
Step S4, using the predicted value and the uncertainty value jointly as confidence measures, assigning pseudo labels to the unlabeled data set and selecting positive and negative pseudo-labeled samples;
Step S5, mixing the selected pseudo-labeled data with the labeled data set to obtain a mixed data set, and further training the heterogeneous graph attention network model on the mixed data set;
Step S6, repeating steps S3 to S5 until a set termination condition is met.
2. The method of claim 1, wherein the heterogeneous graph attention network model comprises a heterogeneous graph attention network, a dual-level attention mechanism module and a pseudo-label selector; the heterogeneous graph attention network propagates information along the graph to capture relationships between adjacent nodes using the knowledge graph and the unlabeled data; the dual-level attention mechanism module extracts the important features of neighbor nodes and assigns different weights to nodes of different types; and the pseudo-label selector uses the predicted values and the uncertainty values to create positive and negative pseudo labels for the unlabeled data so as to extend the training set during training iterations.
3. The method of claim 1, wherein short texts, topic information, and entity knowledge from the knowledge graph are modeled with a heterogeneous information network structure, and the node vector representation of the heterogeneous graph is obtained as the input of the heterogeneous graph attention network model.
4. The method of claim 1, wherein the heterogeneous information network structure is configured to integrate multiple kinds of additional information and to capture the relationships between the short texts and the additional information.
5. The method of claim 4, wherein the additional information includes topic information, entity information and short text information; the topic information is latent topic information mined from the short texts by Latent Dirichlet Allocation; the entity information is represented by vectors obtained from a domain knowledge graph or Wikipedia; and the short text information is represented by vectors extracted from the short text data with a pre-trained language model.
6. The method of claim 3, wherein the heterogeneous graph attention network model takes the heterogeneous information network structure as input, projects different types of information into a common latent space to realize a heterogeneous graph convolutional neural network, and introduces a dual-level attention mechanism for extracting the important features of neighbor nodes and assigning different weights to nodes of different types.
7. The method of claim 1, wherein the dual-level attention mechanism module is configured with a type-level attention mechanism and a node-level attention mechanism, the type-level attention mechanism characterizing the importance of nodes of different types to a given node, and the node-level attention mechanism characterizing the importance of nodes of the same type to the given node.
8. The method of claim 6, wherein the uncertainty value is obtained by keeping dropout active and predicting the same input multiple times during training of the heterogeneous graph attention network model, and computing the mean and variance of the multiple predicted values.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
10. A computer device comprising a memory and a processor, on which memory a computer program is stored which is executable on the processor, characterized in that the processor realizes the steps of the method according to any one of claims 1 to 8 when executing the computer program.
CN202210288840.7A 2022-03-23 2022-03-23 Short text classification method based on uncertainty perception heterogeneous graph attention network Pending CN115292483A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210288840.7A CN115292483A (en) 2022-03-23 2022-03-23 Short text classification method based on uncertainty perception heterogeneous graph attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210288840.7A CN115292483A (en) 2022-03-23 2022-03-23 Short text classification method based on uncertainty perception heterogeneous graph attention network

Publications (1)

Publication Number Publication Date
CN115292483A true CN115292483A (en) 2022-11-04

Family

ID=83821118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210288840.7A Pending CN115292483A (en) 2022-03-23 2022-03-23 Short text classification method based on uncertainty perception heterogeneous graph attention network

Country Status (1)

Country Link
CN (1) CN115292483A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination