CN114722833A - Semantic classification method and device - Google Patents

Semantic classification method and device

Info

Publication number
CN114722833A
CN114722833A (application CN202210412719.0A)
Authority
CN
China
Prior art keywords
classified
text data
semantic
semantic classification
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210412719.0A
Other languages
Chinese (zh)
Inventor
冯铃
王鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202210412719.0A priority Critical patent/CN114722833A/en
Publication of CN114722833A publication Critical patent/CN114722833A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Abstract

The invention provides a semantic classification method and a semantic classification device. The method includes: acquiring text data to be classified; and inputting the text data to be classified into a semantic classification model, which performs semantic understanding on the text data based on word embedding and a dependency tree of the text data and classifies it based on the semantic understanding result, to obtain the classification result output by the semantic classification model. The semantic classification model is trained on text samples and the category labels corresponding to the text samples, where each category label is predetermined according to a text sample and corresponds to it one to one. The semantic classification method provided by the embodiment of the invention analyzes the text content through word embedding and, combined with the dependency tree, obtains deep semantic information of the text to be classified, thereby improving the accuracy of semantic understanding and hence the accuracy of semantic classification.

Description

Semantic classification method and device
Technical Field
The invention relates to the technical field of computers, in particular to a semantic classification method and a semantic classification device.
Background
With the explosive growth of information, manual data annotation has become time-consuming, low in quality, and affected by the subjective awareness of annotators. Automatically labeling data with machines therefore has practical significance: handing repetitive, tedious text-labeling tasks to a computer effectively overcomes these problems and yields labeled data with consistency, high quality, and similar characteristics. Currently, for rich language content, one existing approach builds a dictionary of category-related words through LIWC and classifies text data accordingly, i.e., dictionary-based text classification.
However, owing to factors such as the internet, the meanings, usages, and sentence patterns of more and more words are changing ever faster, and dictionary-based text classification cannot adapt to such a rapidly changing language environment. Unless the dictionary is continuously improved, the dictionary-based classification method can hardly provide satisfactory performance, resulting in low accuracy of language content classification.
Disclosure of Invention
The invention provides a semantic classification method to overcome the defect of low language content classification accuracy in the prior art and to improve the accuracy of language content classification.
In a first aspect, the present invention provides a semantic classification method, including:
acquiring text data to be classified;
inputting the text data to be classified into a semantic classification model, performing semantic understanding on the text data to be classified based on word embedding and a dependency tree of the text data to be classified, classifying the text data based on semantic understanding results, and obtaining classification results output by the semantic classification model;
the semantic classification model is obtained by training based on a text sample and class labels corresponding to the text sample, and each class label is predetermined according to the text sample and corresponds to the text sample one by one.
Optionally, the semantic classification model comprises: an encoding module and a relationship module;
the step of inputting the text data to be classified into a semantic classification model, performing semantic understanding on the text data to be classified based on word embedding and a dependency tree of the text data to be classified, and classifying based on semantic understanding results to obtain classification results output by the semantic classification model includes:
inputting the text data to be classified into the coding module to obtain a vector to be classified output by the coding module;
and inputting the vectors to be classified into the relation module to obtain the classification result output by the relation module.
Optionally, the encoding module comprises a BERT unit, a dependency tree unit, a construction unit, a dependency graph unit, and an attention unit;
the inputting the text data to be classified into the encoding module to obtain the vector to be classified output by the encoding module includes:
inputting the text data to be classified into the BERT unit to obtain a word embedding vector set output by the BERT unit;
inputting the text data to be classified into the dependency tree unit to obtain a dependency tree output by the dependency tree unit;
inputting the dependency tree and the word embedding vector set into the construction unit to obtain a dependency graph output by the construction unit;
inputting the dependency graph into the dependency graph unit, and acquiring a first matrix to be classified output by the dependency graph unit;
and inputting the first matrix to be classified to the attention unit to obtain a first vector to be classified output by the attention unit.
Optionally, the inputting the dependency tree and the word embedding vector set to the constructing unit to obtain the dependency graph output by the constructing unit includes:
determining nodes of the dependency graph based on the set of word embedding vectors;
determining adjacency relationships between each of the nodes based on the dependency tree and an adjacency matrix;
the adjacency matrix is:

A_{i,j} = 1, if T(W_i, W_j) holds or i = j; A_{i,j} = 0, otherwise;

where A_{i,j} denotes the connection relationship between the i-th node and the j-th node, i is a positive integer greater than or equal to 1, j is a positive integer greater than or equal to 1, and T(W_i, W_j) denotes that a dependency relationship exists between the i-th node W_i and the j-th node W_j.
Optionally, the dependency graph unit includes at least one hidden layer; the output formula of each hidden layer is:

H^l = ReLU(Â H^{l-1} W^l + b^l), with Â = D^{-1/2} A D^{-1/2};

where H^l denotes the hidden representation of the l-th hidden layer, ReLU denotes the ReLU activation function, Â denotes the normalized adjacency matrix, A denotes the adjacency matrix, D denotes the degree matrix of A, and W^l and b^l denote training parameters.
Optionally, the encoding module further includes a picture unit and a connection unit;
the method further comprises the following steps:
acquiring to-be-classified picture data corresponding to the to-be-classified text data;
inputting the picture data to be classified into the picture unit to obtain a second matrix to be classified output by the picture unit;
splicing the first matrix to be classified and the second matrix to be classified to obtain a third matrix to be classified;
and inputting the third matrix to be classified to the attention unit to obtain a second vector to be classified output by the attention unit.
In a second aspect, the present invention further provides a semantic classification apparatus, including:
the acquisition module is used for acquiring text data to be classified;
the classification module is used for inputting the text data to be classified into a semantic classification model, performing semantic understanding on the text data to be classified based on word embedding and a dependency tree of the text data to be classified, classifying the text data based on semantic understanding results, and obtaining classification results output by the semantic classification model;
the semantic classification model is obtained by training based on a text sample and class labels corresponding to the text sample, and each class label is predetermined according to the text sample and corresponds to the text sample one by one.
In a third aspect, the present invention also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the semantic classification method according to the first aspect is implemented.
In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the semantic classification method according to the first aspect.
In a fifth aspect, the present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the semantic classification method according to the first aspect.
According to the semantic classification method and device provided by the invention, the semantic understanding is carried out on the text data to be classified based on the word embedding and dependency tree of the text data to be classified, and the classification is carried out based on the semantic understanding result, so that the accuracy of the semantic understanding is improved, and the accuracy of the semantic classification is improved.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of a semantic classification method according to an embodiment of the present invention;
FIG. 2 is a second flowchart illustrating a semantic classification method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a semantic classification apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
The technical terms related to the invention are described as follows:
meta-learning (Meta-learning): meta-learning refers to the ability of a model to acquire a kind of "learning" so that it can quickly learn new tasks based on acquiring existing "knowledge", and its intention is to design a model capable of quickly learning new skills or adapting to new environments through a small number of training examples. Meta-learning systems are trained on a large number of tasks (tasks) and predict their ability to learn new tasks.
Artificial Intelligence (AI): AI is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. AI base technologies generally include, for example, sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating/interactive systems, mechatronics, computer vision, speech processing, natural language processing, and machine learning/deep learning.
Attention (attention): attention is a very common but often overlooked phenomenon. For example, when a bird flies across the sky, human attention tends to follow the bird, and the sky naturally becomes background information in the human visual system. The basic idea of the attention mechanism in computer vision is to let the system learn to focus on places of interest, ignoring background information and concentrating on important information.
Loss Function (Loss Function): also called cost function or objective function, it measures how inconsistent the predicted value f(x) of the model is with the true value Y, and is usually a non-negative real-valued function denoted L(Y, f(x)). In general, the smaller the value of the loss function (i.e., the loss value), the better the model fits and the stronger its predictive power on new data. In deep learning, the loss function acts as the "baton" of model training: it guides parameter learning by back-propagating the error between the predicted labels and the real labels of samples. When the loss value gradually decreases (converges), the model training can be considered complete.
Semantic classification may be applied to social-media-based pressure category detection. In today's society, people experience various pressures. Excessive pressure that is not relieved in time can cause various physiological and psychological problems that harm human health. Therefore, knowing the pressure source behind the pressure is important for helping people take effective countermeasures. Thanks to the rich linguistic content on social media, there have been methods that detect the pressure under each category by building, through LIWC, a pressure-source dictionary of words related to each pressure source, i.e., pressure category detection.
However, as network language spreads across social platforms, more and more word usages and sentence patterns change; the dictionary-based pressure category detection method can no longer fit the current social media environment, and, limited by dictionary update speed and content, it can hardly deliver satisfactory performance, resulting in low accuracy for social-media-based pressure category classification.
The semantic classification method provided by the embodiment of the invention can be applied to pressure category classification based on social media, and the classification accuracy is improved.
The following describes a semantic classification method provided by an embodiment of the present invention with reference to fig. 1 to fig. 2.
Fig. 1 is a schematic flow diagram of a semantic classification method provided in an embodiment of the present invention, and as shown in fig. 1, the embodiment of the present invention provides a semantic classification method, including:
step 110, acquiring text data to be classified;
for example, in a social media-based pressure category classification scenario, the text data to be classified may be social media content, such as blog content posted by a user on social media or social content dynamically shared. If the user '123' publishes the dynamic content 'needs to study in the next day' on the social platform, the job is not written up yet. It should be understood that the text data to be classified may be captured in the network, and the source of the text data to be classified is not limited in the embodiment of the present invention.
Step 120, inputting the text data to be classified into a semantic classification model, performing semantic understanding on the text data to be classified based on the word embedding and dependency tree of the text data to be classified, and classifying the text data based on a semantic understanding result to obtain a classification result output by the semantic classification model;
the semantic classification model is obtained by training based on a text sample and class labels corresponding to the text sample, and each class label is predetermined according to the text sample and corresponds to the text sample one by one.
Specifically, a dependency tree (dependency tree) represents the syntactic information of a sentence using semantic edges, which describe the meaning of a word through the semantic frame in which it appears.
According to the semantic classification method provided by the embodiment of the invention, word embedding converts text data, which cannot be computed directly, into vector form so that further computation can be performed on the text information, realizing analysis of the text content; combined with the ability of a dependency tree to cross the surface syntactic-structure constraints of a sentence, deep semantic information of the text to be classified is obtained, thereby improving the accuracy of semantic understanding and hence the accuracy of semantic classification.
In the following, a possible implementation manner of the above steps in a specific embodiment is further described.
Optionally, the semantic classification model comprises: an encoding module and a relationship module;
the step of inputting the text data to be classified into a semantic classification model, performing semantic understanding on the text data to be classified based on word embedding and a dependency tree of the text data to be classified, and classifying based on semantic understanding results to obtain classification results output by the semantic classification model includes:
step 121, inputting the text data to be classified into the encoding module, and performing semantic understanding on the text data to be classified based on word embedding and a dependency tree of the text data to be classified to obtain a vector to be classified output by the encoding module;
and 123, inputting the vector to be classified into the relation module to obtain a classification result output by the relation module.
Specifically, the encoding module is used for acquiring a category-related representation of the text data to be classified, and the relation module is used for determining which category the text data to be classified belongs to. The relation module may obtain the category vector representations of a plurality of categories through training, or obtain them from a knowledge base or the like. The distance between the vector to be classified and each category vector in the relation module is calculated and used as a matching score. The distance may be a cosine distance or a Euclidean distance; the specific calculation method is the same as in the prior art and is not repeated here.
Optionally, the encoding module comprises a BERT unit, a dependency tree unit, a construction unit, a dependency graph unit, and an attention unit;
the inputting the text data to be classified into the encoding module to obtain the vector to be classified output by the encoding module includes:
step 1211, inputting the text data to be classified into the BERT unit, and obtaining a word embedding vector set output by the BERT unit;
it should be understood that each word in the text data corresponds to a word embedding vector.
In particular, the BERT unit is a pre-trained BERT model. The BERT model provides powerful context-dependent representations and can be used for various target tasks. Therefore, in the embodiment of the present invention, a pre-trained BERT is used to obtain the word embedding vector set corresponding to the text data to be classified. The word embedding vector set embodies the semantic representation of the text data to be classified, and the semantic representations of the words help the model understand the semantics of the whole sentence effectively.
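For illustration only, the BERT unit can be sketched in Python as follows; the HuggingFace transformers library and the bert-base-chinese checkpoint are assumptions of this sketch, as the embodiment only requires a pre-trained BERT model.

```python
# Minimal sketch of the BERT unit; the library and checkpoint are
# illustrative assumptions, not part of the embodiment.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def word_embedding_set(text: str) -> torch.Tensor:
    """Return one contextual embedding per token (a Y x d matrix)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = bert(**inputs)
    return out.last_hidden_state[0, 1:-1]  # drop [CLS] and [SEP]
```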
Step 1212, inputting the text data to be classified into the dependency tree unit, and obtaining a dependency tree output by the dependency tree unit;
specifically, the dependency tree unit may be formed by a dependency tree generation tool such as spaCy, and the tool for generating the dependency tree is not limited in the embodiment of the present invention.
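As an illustrative sketch of the dependency tree unit using spaCy (one tool named above), where the pipeline name is an assumption:

```python
import spacy

# Any spaCy pipeline with a dependency parser can be used here;
# the model name below is an illustrative assumption.
nlp = spacy.load("zh_core_web_sm")

def dependency_tree_edges(text: str):
    """Return (head_index, child_index) pairs of the dependency tree."""
    doc = nlp(text)
    return [(tok.head.i, tok.i) for tok in doc if tok.head.i != tok.i]
```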
Step 1213, inputting the dependency tree and the word embedding vector set to the construction unit, and obtaining a dependency graph output by the construction unit;
specifically, the dependency graph is an undirected graph which can represent semantic meaning of a text object to be classified, and comprises nodes and edges, wherein the nodes are word embedding vectors, and the edges are relations in a dependency tree. The weight of an edge may be valued based on the dependency between nodes. Exemplarily, when an edge between two nodes takes a value of zero, it indicates that there is no dependency between the two nodes; when the edge value between the two nodes is a non-zero positive value, the dependency relationship between the two nodes is represented, and the larger the edge value between the two nodes is, the stronger the dependency relationship between the two nodes is represented.
Optionally, the inputting the dependency tree and the word embedding vector set to the construction unit to obtain the dependency graph output by the construction unit includes:
step 12131, determining nodes of the dependency graph based on the set of word embedding vectors;
specifically, each node in the dependency graph corresponds to a word embedding vector, i.e., one node corresponds to one word. If the node 1 corresponds to tomorrow, the node 2 corresponds to "study", the node 3 corresponds to "job", the node 4 corresponds to "none", and the node 5 corresponds to "write".
Step 12132, determining the adjacency relation between each node based on the dependency tree and the adjacency matrix;
the adjacency matrix is calculated as follows:

A_{i,j} = 1, if T(W_i, W_j) holds or i = j; A_{i,j} = 0, otherwise;

where A_{i,j} denotes the connection relationship between the i-th node and the j-th node, A ∈ R^{Y×Y}, i.e., A is a Y × Y node matrix; i is a positive integer greater than or equal to 1; j is a positive integer greater than or equal to 1; and T(W_i, W_j) denotes that a dependency relationship exists between the i-th node W_i and the j-th node W_j. A_{i,j} = 1 indicates that there is an edge between the i-th node and the j-th node, and A_{i,j} = 0 indicates that there is no edge between them. It is understood that the connection relationship between the i-th node and the j-th node can be determined from the dependency tree.
Table 1 is a schematic table of an adjacency matrix provided in the embodiment of the present invention. As shown in Table 1, the adjacency matrix corresponding to the example in step 12131 is:

Table 1. Adjacency matrix schematic table

Node  1  2  3  4  5
1     1  1  0  0  0
2     1  1  0  0  0
3     0  0  1  1  1
4     0  0  1  1  1
5     0  0  1  1  1
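A minimal sketch of the construction unit's adjacency computation, following the definition above (numpy is an illustrative choice):

```python
import numpy as np

def build_adjacency(num_nodes: int, edges) -> np.ndarray:
    """A[i, j] = 1 if a dependency edge links nodes i and j
    (T(W_i, W_j)) or i == j (self-loop); 0 otherwise. The graph is
    undirected, so edges are mirrored."""
    A = np.eye(num_nodes)
    for head, child in edges:
        A[head, child] = A[child, head] = 1.0
    return A
```

For the five-node example above, build_adjacency(5, [(0, 1), (2, 3), (2, 4), (3, 4)]) reproduces Table 1.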
Step 1214, inputting the dependency graph into the dependency graph unit, and acquiring a first matrix to be classified output by the dependency graph unit;
in particular, the dependency graph unit may be a dependency graph convolutional network, which learns syntactic dependency information using a graph convolutional network. The dependency graph convolutional network takes as input the dependency graph and the word embedding vector corresponding to each of its nodes, and each node in layer l is updated according to the hidden representations of its neighborhood:
H^l = ReLU(Â H^{l-1} W^l + b^l), with Â = D^{-1/2} A D^{-1/2};

where H^l denotes the hidden representation of the l-th hidden layer, H^l ∈ R^{Y×d}, i.e., H^l is a Y × d matrix with one node representation per row; ReLU denotes the ReLU activation function; Â denotes the normalized adjacency matrix; A denotes the adjacency matrix; D denotes the degree matrix of A; W^l is a training parameter of layer l, W^l ∈ R^{d×d}, i.e., W^l is a d × d matrix; and b^l is a training parameter of layer l.
It is understood that H^l is a representation strengthened with dependency information by the dependency graph convolutional network, and that the normalized adjacency matrix can express the syntactic structure while reducing its complexity, thereby reducing time complexity. The activation function may be the ReLU function or another activation function.
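A minimal sketch of one hidden layer of the dependency graph unit, assuming the symmetric normalization Â = D^{-1/2} A D^{-1/2} given above:

```python
import torch
import torch.nn as nn

class DependencyGCNLayer(nn.Module):
    """One hidden layer: H_l = ReLU(A_norm @ H_{l-1} @ W_l + b_l)."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # holds W_l and b_l

    def forward(self, H: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        deg = A.sum(dim=-1).clamp(min=1e-12)   # degrees of A
        d_inv_sqrt = deg.pow(-0.5)
        # Symmetric normalization: A_norm = D^{-1/2} A D^{-1/2}
        A_norm = d_inv_sqrt.unsqueeze(-1) * A * d_inv_sqrt.unsqueeze(-2)
        return torch.relu(self.linear(A_norm @ H))
```

Stacking several such layers lets each node representation absorb information from increasingly distant neighbors in the dependency graph.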
According to the semantic classification method provided by the embodiment of the invention, the dependency tree and the word embedding are combined, so that the syntax structure and the word meaning are combined, the semantic classification model can understand the semantics of the whole text data to be classified from the syntax structure and the word meaning, and the accuracy of semantic understanding of the text data to be classified is improved.
Step 1215, inputting the first matrix to be classified into the attention unit, and obtaining a first vector to be classified output by the attention unit.
Specifically, the attention unit is constructed based on an attention mechanism. Taking the social-media-based pressure category classification scenario as an example, the attention unit enables the semantic classification model to learn to attend to pressure-related information.
Optionally, the attention unit may obtain a vector to be classified based on an attention vector formula and the matrix to be classified.
The attention vector formula is:
v = Softmax(P W_p + b_p);
F = v^T P;

where v denotes the attention vector, v ∈ R^{t×1}, i.e., v is a t × 1 matrix; Softmax denotes the Softmax function; P denotes the matrix to be classified; W_p denotes a training parameter, W_p ∈ R^{d×1}, i.e., W_p is a d × 1 matrix; b_p denotes a training parameter; F is the vector to be classified, used to represent the text data to be classified, F ∈ R^{1×d}, i.e., F is a 1 × d matrix; and v^T denotes the transpose of v. It should be understood that, in the case where the input data is only the text data to be classified, P is the first matrix to be classified and F is the first vector to be classified.
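A minimal sketch of the attention unit implementing the two formulas above:

```python
import torch
import torch.nn as nn

class AttentionUnit(nn.Module):
    """v = Softmax(P W_p + b_p); F = v^T P. Reduces a t x d matrix
    to a single d-dimensional vector to be classified."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # W_p (d x 1) and b_p

    def forward(self, P: torch.Tensor) -> torch.Tensor:
        v = torch.softmax(self.score(P), dim=0)  # (t, 1) attention weights
        return (v * P).sum(dim=0)                # equals v^T P, shape (d,)
```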
Optionally, the encoding module further includes a picture unit and a connection unit;
the method further comprises the following steps:
step 111, acquiring to-be-classified picture data corresponding to the to-be-classified text data;
for example, in a social-media-based pressure category detection scenario, a user may publish text on social media together with pictures, such as the post "Feeling unwell today" accompanied by a medical picture, which may indicate that the user is under pressure due to physical health; since picture data can provide rich content information, it can be used for language content classification. It should be understood that the semantic classification model may be obtained by training on a text sample and its corresponding category label, a picture sample and its corresponding category label, or a combined sample and its corresponding category label. The combined sample comprises a text sample and the picture sample corresponding to the text sample.
Step 1216, inputting the picture data to be classified into the picture unit, and obtaining a second matrix to be classified output by the picture unit;
specifically, the picture unit is configured to encode picture data to obtain the second matrix to be classified, which is used to represent the picture meaning. Optionally, the picture unit may include ResNet. ResNet can efficiently extract image features from social media data, so picture features can be extracted based on a pre-trained 34-layer ResNet. The final second matrix to be classified is X = {x_1, x_2, …, x_δ}, where δ denotes the number of pictures (for example, if three pictures accompany the text data to be classified, δ equals 3), and x_i denotes the vector representation of the i-th picture, 1 ≤ i ≤ δ; X ∈ R^{δ×d}, i.e., the second matrix to be classified is a δ × d matrix.
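For illustration, the picture unit can be sketched with torchvision's pre-trained ResNet-34; removing the classification head and projecting its 512-dim features to d are assumptions of this sketch (d = 768 below is illustrative):

```python
import torch
from torchvision.models import resnet34, ResNet34_Weights

# Pre-trained 34-layer ResNet with the classification head removed,
# so each picture maps to one 512-dim feature vector; a learned
# linear projection (assumed here) aligns it with the text dimension d.
backbone = resnet34(weights=ResNet34_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()
project = torch.nn.Linear(512, 768)  # d = 768 is an assumption

def picture_matrix(pictures: torch.Tensor) -> torch.Tensor:
    """pictures: (delta, 3, 224, 224) -> X: (delta, d)."""
    with torch.no_grad():
        return project(backbone(pictures))
```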
Step 1217, splicing the first matrix to be classified and the second matrix to be classified to obtain a third matrix to be classified;
taking a simplified first matrix to be classified and second matrix to be classified as a simple example:

P = con(H, X);

where H is the text representation of the text data to be classified, i.e., the first matrix to be classified, H ∈ R^{Y×d}; X is the picture representation corresponding to the text data to be classified, i.e., the second matrix to be classified, X ∈ R^{δ×d}; con denotes the concatenation function; and P denotes the third matrix to be classified, P ∈ R^{t×d}, i.e., P is a t × d matrix, where t = Y + δ.
Step 1218, inputting the third matrix to be classified to the attention unit, and obtaining a second vector to be classified output by the attention unit.
It should be understood that the attention unit is described in step 1215 and will not be described in detail here. The attention unit fuses the modalities of the text data to be classified and the image data to be classified based on an attention mechanism, so that the semantic classification model learns to pay attention to the pressure related information.
The process of generating the second to-be-classified vector by the attention unit is described in step 1215, and is not described herein again. It should be understood that in the case of the input being the third classification matrix, P is the third matrix to be classified and F is the second vector to be classified in the attention vector formula.
It should be understood that the vector to be classified in the embodiment of the present invention includes any one of the first vector to be classified and the second vector to be classified.
Referring to fig. 2, fig. 2 is a second schematic flowchart of the semantic classification method according to the embodiment of the present invention. The semantic classification model provided by the embodiment of the invention may be obtained by meta-learning based on a training set and a test set, where the training set comprises training samples and corresponding classification labels, and the test set comprises test samples and corresponding classification labels; a support set comprises at least one category, and each category comprises at least one support sample and the classification label corresponding to each support sample.
The training set and the test set may each include any one or a combination of the following sample types:
a text sample and a category label corresponding to the text sample;
a picture sample and a category label corresponding to the picture sample;
a combined sample and a category label corresponding to the combined sample;
the combined sample comprises a text sample and a picture sample corresponding to the text sample.
Under the condition that the semantic classification model is obtained after meta-learning based on a training set and a test set, the semantic classification method may further include:
acquiring a support set;
and inputting the support set and the text data to be classified into a semantic classification model to obtain a classification result output by the semantic classification model.
The support set comprises at least one category, and each category comprises at least one support sample and a classification label corresponding to each support sample. The support set may be obtained from a database or manually labeled, and the source of the support set is not limited in the embodiment of the present invention.
Through meta-learning, the semantic classification model undergoes a large amount of training in which a different task is encountered in each round, and each task contains samples not seen in previous tasks. Because the semantic classification model must learn a new task each time, after extensive training it acquires the ability to handle new tasks. A new task may be a small-sample task: the support set contains a small number of labeled samples that the semantic classification model has not seen in previous training, and after meta-learning the model can classify and identify unlabeled text data to be classified with the help of these few samples. This learning mode enables the semantic classification model, after being trained on many different tasks, to better handle the differences between tasks while ignoring task-specific features, thereby achieving accurate classification from a small number of labeled samples when a new small-sample classification task is encountered.
According to the semantic classification method provided by the embodiment of the invention, meta-learning gives the semantic classification model the ability to learn how to learn, so that after the model has been trained on frequently occurring samples, it needs no adjustment and only a small amount of data on infrequently occurring samples to identify the infrequent categories in the data to be classified (including the text data to be classified and/or the picture data to be classified).
Under the condition that the semantic classification model is obtained after meta-learning based on a training set and a test set, the semantic classification model may further include: an induction module;
the semantic classification method may further include:
and step 122, inputting the support set into the induction module, and obtaining a category vector set output by the induction module, wherein the category vector set comprises at least one category vector, and each category vector corresponds to the classification label one by one.
Optionally, the induction module comprises a vector unit and a weighting unit;
the inputting the support set into the induction module to obtain a category vector set output by the induction module includes:
step 1221, inputting the support set to the vector unit, and obtaining an expert vector set output by the vector unit, where the expert vector set includes at least one expert vector, and each expert vector corresponds to a support sample in the support set one to one;
specifically, the support set may include N categories of support samples, each category may include K support samples, and the process of converting the support samples into the expert vectors may refer to steps 1211 to 1218.
Step 1222, inputting the expert vector set into the weighting unit, performing a weighting operation on the expert vector set based on a preset weighting formula, and obtaining the category vector set output by the weighting unit.
In particular, the induction module is configured to generate a class representation corresponding to each class from the support set. The weighting unit is designed based on a mixture-of-experts mechanism and generates the class representation corresponding to each class from the expert vector set. The weighting unit weights each expert vector within a class, so that several different expert vectors with different weights can jointly represent the corresponding class; this enables the semantic classification model to exploit the key information of each expert vector (i.e., support sample) while ignoring noise. That is, each support sample in the support set is regarded as an expert, and each expert aims at learning information related to the category. All experts share the same parameters.
Optionally, the preset weighting formula may be determined by the optimal weighting method (a data modeling algorithm), by a combined weighting method, or by an index weighting method; the specific form of the preset weighting formula is not limited in the embodiment of the present invention.
Optionally, the preset weighting formula comprises an initial value formula, a gate value formula and a category expression function;
the initial value formula is as follows:
η_i = ReLU(e_i W_g + b_g);

The gate value formula is:

g_i = exp(η_i) / Σ_{k=1}^{K} exp(η_k);

The class representation function is:

c_j = Σ_{i=1}^{K} g_i e_i;

where η_i denotes the initial gate value corresponding to the i-th expert sample, ReLU denotes the ReLU activation function, e_i denotes the expert vector corresponding to the i-th expert sample, W_g and b_g denote training parameters, g_i denotes the final gate value corresponding to the i-th expert sample, exp denotes the exponential function, K denotes the number of samples contained in a category, c_j denotes the category vector of the j-th category, i and j are positive integers greater than or equal to 1, and K is a positive integer greater than or equal to 1.
Specifically, the gate value is a non-linear transformation of the expert vector, and the gate value calculation enables the semantic classification model to learn to measure the importance of expert information. It should be understood that the gate value unit in fig. 2 is used to calculate the gate values through the gate value formula, and the reconciliation unit is used to calculate the class representations.
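A minimal sketch of the weighting unit implementing the three formulas above:

```python
import torch
import torch.nn as nn

class WeightingUnit(nn.Module):
    """eta_i = ReLU(e_i W_g + b_g); g = softmax(eta);
    c_j = sum_i g_i * e_i. All experts share W_g and b_g."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, 1)  # W_g and b_g

    def forward(self, E: torch.Tensor) -> torch.Tensor:
        # E: (K, d) expert vectors of one category
        eta = torch.relu(self.gate(E))  # initial gate values, (K, 1)
        g = torch.softmax(eta, dim=0)   # final gate values, sum to 1
        return (g * E).sum(dim=0)       # category vector c_j, shape (d,)
```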
According to the semantic classification method provided by the embodiment of the invention, the representations (expert vectors) of the support samples are weighted, so that the semantic classification model can use the key information of each expert vector while ignoring noise, improving the accuracy of the class representations.
Under the condition that the semantic classification model is obtained after meta-learning based on a training set and a test set, the semantic classification method may further include:
step 1231, inputting the vector to be classified and the category vector set to the relationship module, and obtaining a classification result output by the relationship module, including:
and inputting the vector to be classified and the class vector set into the relation module, determining a class vector most similar to the vector to be classified based on cosine similarity, and obtaining a classification result output by the relation module.
Specifically, the relationship module is used for measuring the correlation between the query vector (i.e. the vector to be classified) and the category (i.e. each category vector in the category vector set). Cosine similarity shows great superiority in measuring the relationship between two vectors. Therefore, the embodiment of the invention uses a cosine similarity formula to calculate the similarity between the vector to be classified and the category vector:
the cosine similarity formula is:
s(q, c_i) = (q · c_i) / (‖q‖ ‖c_i‖);

where q denotes the vector to be classified, c_i denotes the category vector of the i-th category, i is a positive integer, and s(q, c_i) denotes a relation score in the range -1 to 1; a score closer to 1 indicates that the two vectors are more similar.
And classifying the text data to be classified into the category corresponding to the maximum relation score.
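A minimal sketch of the relation module's scoring and category assignment:

```python
import torch
import torch.nn.functional as F

def relation_scores(q: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between the vector to be classified q (d,)
    and each category vector in C (N, d); scores lie in [-1, 1]."""
    return F.cosine_similarity(q.unsqueeze(0), C, dim=-1)

def classify(q: torch.Tensor, C: torch.Tensor) -> int:
    """Assign the category whose relation score is maximal."""
    return int(relation_scores(q, C).argmax())
```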
According to the semantic classification method provided by the embodiment of the invention, the vector to be classified is compared with the category vectors one by one; since the vector to be classified comprehensively represents the semantics of the text data to be classified, and weighting improves the accuracy with which the category vectors represent their categories, the accuracy of social media pressure category classification is improved.
Optionally, the semantic classification model is obtained by training through the following steps:
step 210, performing meta-training on the semantic classification model based on the training set, the loss function, and a preset number of meta-training tasks;
step 220, performing meta-testing on the semantic classification model based on the test set and a preset number of meta-testing tasks;
the above steps are executed alternately until the preset number of iterations is used up or the meta-test result output by the semantic classification model reaches the preset accuracy.
Specifically, Table 2 shows the meta-learning algorithm provided by an embodiment of the present invention.

Table 2. Meta-learning algorithm
[The algorithm listing appeared as an image in the original publication; it constructs each meta-task by sampling N classes with K labeled support samples per class plus query samples, and updates the model on the query-set loss.]
Meta-training: in order to efficiently utilize the training set, we employ a meta-training strategy. In each training episode, a meta-task is constructed to compute gradients and update our model. As shown in Table 2, a meta-task is formed by randomly sampling N classes from the training-set classes, then selecting K labeled samples of each class as the support set S and a portion of the remaining samples of those classes as the query set Q. The meta-training process aims at learning how to learn meaningful features so as to minimize the loss on the query set Q.
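A minimal sketch of this meta-task (episode) construction; the data layout is an assumption:

```python
import random
from collections import defaultdict

def sample_meta_task(samples, n_way: int, k_shot: int, n_query: int):
    """Sample N classes, take K labeled samples per class as the
    support set S and n_query more per class as the query set Q.
    'samples' is assumed to be a list of (example, label) pairs."""
    by_label = defaultdict(list)
    for example, label in samples:
        by_label[label].append(example)
    classes = random.sample(list(by_label), n_way)
    support, query = [], []
    for label in classes:
        pool = random.sample(by_label[label], k_shot + n_query)
        support += [(x, label) for x in pool[:k_shot]]
        query += [(x, label) for x in pool[k_shot:]]
    return support, query
```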
This meta-training mechanism allows the semantic classification model to learn the general parts of different meta-tasks, e.g. how to extract important features and compare sample similarities, while forgetting the task specific parts in the meta-task. Thus, the model can still work efficiently in the face of new meta-tasks.
After meta-training is completed, the same episode-construction mechanism is applied to test whether the semantic classification model can be directly applied to new categories. To create a test episode, N new classes are first randomly drawn from the test set; the test support set and test query set are then sampled from these N classes. The output of the semantic classification model represents the average performance over the query sets of the test episodes.
And under the condition that the accuracy of the output result of the semantic classification model after the meta-test does not reach the preset accuracy, performing meta-training on the semantic classification model again, detecting the semantic classification model which is subjected to the meta-training again, and stopping training the semantic classification model until the iteration times are used up or the accuracy of the output result of the semantic classification model reaches the preset accuracy.
Optionally, in the meta-training phase, we use the mean square error to optimize our model.
The semantic classification method provided by the embodiment of the invention enables the semantic classification model, through meta-learning, to acquire the ability of learning how to learn, so that the model can still guarantee classification accuracy on small-sample data. For example, in a social-media-based pressure classification scenario, after the semantic classification model has been trained on frequently occurring pressure categories, it needs no adjustment and can directly identify infrequently occurring pressure categories from only a small amount of their data.
The following describes the semantic classification device provided by the present invention, and the semantic classification device described below and the semantic classification method described above may be referred to in correspondence with each other.
Fig. 3 is a schematic structural diagram of a semantic classification device according to an embodiment of the present invention, and as shown in fig. 3, the semantic classification device according to the embodiment of the present invention includes:
an obtaining module 310, configured to obtain text data to be classified;
the classification module 320 is configured to input the text data to be classified into a semantic classification model, perform semantic understanding on the text data to be classified based on the word embedding and dependency tree of the text data to be classified, perform classification based on a semantic understanding result, and obtain a classification result output by the semantic classification model;
the semantic classification model is obtained by training based on a text sample and class labels corresponding to the text sample, and each class label is predetermined according to the text sample and corresponds to the text sample one by one.
It should be noted that, the apparatus provided in the embodiment of the present invention can implement all the method steps implemented by the method embodiment and achieve the same technical effect, and detailed descriptions of the same parts and beneficial effects as the method embodiment in this embodiment are omitted here.
Fig. 4 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 4, the electronic device may include: a processor (processor) 410, a communication interface (Communication Interface) 420, a memory (memory) 430, and a communication bus 440, wherein the processor 410, the communication interface 420, and the memory 430 communicate with each other via the communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform the semantic classification method, including: acquiring text data to be classified; inputting the text data to be classified into a semantic classification model, performing semantic understanding on the text data to be classified based on word embedding and a dependency tree of the text data to be classified, classifying based on the semantic understanding result, and obtaining the classification result output by the semantic classification model; the semantic classification model is obtained by training based on text samples and the category labels corresponding to the text samples, and each category label is predetermined according to the text sample and corresponds to the text sample one by one.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product includes a computer program, the computer program can be stored on a non-transitory computer readable storage medium, when the computer program is executed by a processor, a computer can execute the semantic classification method provided by the above methods, the method includes: acquiring text data to be classified; inputting the text data to be classified into a semantic classification model, performing semantic understanding on the text data to be classified based on word embedding and a dependency tree of the text data to be classified, classifying the text data based on semantic understanding results, and obtaining classification results output by the semantic classification model; the semantic classification model is obtained by training based on text samples and class labels corresponding to the text samples, and each class label is predetermined according to the text samples and corresponds to the text samples one by one.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program being implemented by a processor to perform the semantic classification method provided by the above methods, the method comprising: acquiring text data to be classified; inputting the text data to be classified into a semantic classification model, performing semantic understanding on the text data to be classified based on word embedding and a dependency tree of the text data to be classified, classifying the text data based on semantic understanding results, and obtaining classification results output by the semantic classification model; the semantic classification model is obtained by training based on a text sample and class labels corresponding to the text sample, and each class label is predetermined according to the text sample and corresponds to the text sample one by one.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of semantic classification, comprising:
acquiring text data to be classified;
inputting the text data to be classified into a semantic classification model, performing semantic understanding on the text data to be classified based on the word embedding and dependency tree of the text data to be classified, classifying the text data to be classified based on a semantic understanding result, and obtaining a classification result output by the semantic classification model;
the semantic classification model is obtained by training based on a text sample and class labels corresponding to the text sample, and each class label is predetermined according to the text sample and corresponds to the text sample one by one.
2. The semantic classification method according to claim 1, characterized in that the semantic classification model comprises: an encoding module and a relationship module;
the step of inputting the text data to be classified into a semantic classification model, performing semantic understanding on the text data to be classified based on word embedding and a dependency tree of the text data to be classified, and classifying based on semantic understanding results to obtain classification results output by the semantic classification model includes:
inputting the text data to be classified into the coding module, and performing semantic understanding on the text data to be classified based on word embedding and a dependency tree of the text data to be classified to obtain a vector to be classified output by the coding module;
and inputting the vectors to be classified into the relation module to obtain the classification result output by the relation module.
3. The semantic classification method according to claim 2, characterized in that the coding module comprises a BERT unit, a dependency tree unit, a construction unit, a dependency graph unit and an attention unit;
the inputting the text data to be classified into the encoding module, performing semantic understanding on the text data to be classified based on word embedding and a dependency tree of the text data to be classified, and obtaining a vector to be classified output by the encoding module, includes:
inputting the text data to be classified into the BERT unit to obtain a word embedding vector set output by the BERT unit;
inputting the text data to be classified into the dependency tree unit to obtain a dependency tree output by the dependency tree unit;
inputting the dependency tree and the word embedding vector set into the construction unit to obtain a dependency graph output by the construction unit;
inputting the dependency graph into the dependency graph unit, and acquiring a first matrix to be classified output by the dependency graph unit;
and inputting the first matrix to be classified into the attention unit to obtain a first vector to be classified output by the attention unit.
4. The semantic classification method according to claim 3, wherein the inputting the dependency tree and the word embedding vector set into the construction unit to obtain the dependency graph output by the construction unit comprises:
determining nodes of the dependency graph based on the set of word embedding vectors;
determining an adjacency relationship between each of the nodes based on the dependency tree and an adjacency matrix;
the adjacency matrix is:
A_{i,j} = 1, if T(W_i, W_j) holds or i = j; A_{i,j} = 0, otherwise;

wherein A_{i,j} denotes the connection relationship between the i-th node and the j-th node, i is a positive integer greater than or equal to 1, j is a positive integer greater than or equal to 1, and T(W_i, W_j) denotes that a dependency relationship exists between the i-th node W_i and the j-th node W_j.
5. The semantic classification method according to claim 4, characterized in that the dependency graph unit comprises at least one hidden layer; the output formula of each hidden layer is:

H^l = ReLU(Â H^{l-1} W^l + b^l), with Â = D^{-1/2} A D^{-1/2};

wherein H^l denotes the hidden representation of the l-th hidden layer, ReLU denotes the ReLU activation function, Â denotes the normalized adjacency matrix, A denotes the adjacency matrix, D denotes the degree matrix of A, and W^l and b^l denote training parameters.
6. The semantic classification method according to claim 3, characterized in that the coding module further comprises a picture unit and a connection unit;
the method further comprises the following steps:
acquiring to-be-classified picture data corresponding to the to-be-classified text data;
inputting the picture data to be classified into the picture unit to obtain a second matrix to be classified output by the picture unit;
splicing the first matrix to be classified and the second matrix to be classified to obtain a third matrix to be classified;
and inputting the third matrix to be classified to the attention unit to obtain a second vector to be classified output by the attention unit.
7. A semantic classification apparatus, comprising:
the acquisition module is used for acquiring text data to be classified;
the classification module is used for inputting the text data to be classified into a semantic classification model, performing semantic understanding on the text data to be classified based on word embedding and a dependency tree of the text data to be classified, classifying the text data based on semantic understanding results, and obtaining classification results output by the semantic classification model;
the semantic classification model is obtained by training based on text samples and class labels corresponding to the text samples, and each class label is predetermined according to the text samples and corresponds to the text samples one by one.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the semantic classification method according to any one of claims 1 to 6 when executing the program.
9. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the semantic classification method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the semantic classification method according to any one of claims 1 to 6.
CN202210412719.0A 2022-04-19 2022-04-19 Semantic classification method and device Pending CN114722833A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210412719.0A CN114722833A (en) 2022-04-19 2022-04-19 Semantic classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210412719.0A CN114722833A (en) 2022-04-19 2022-04-19 Semantic classification method and device

Publications (1)

Publication Number Publication Date
CN114722833A true CN114722833A (en) 2022-07-08

Family

ID=82243009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210412719.0A Pending CN114722833A (en) 2022-04-19 2022-04-19 Semantic classification method and device

Country Status (1)

Country Link
CN (1) CN114722833A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116127079A (en) * 2023-04-20 2023-05-16 中电科大数据研究院有限公司 Text classification method
CN116127079B (en) * 2023-04-20 2023-06-20 中电科大数据研究院有限公司 Text classification method
CN117332789A (en) * 2023-12-01 2024-01-02 诺比侃人工智能科技(成都)股份有限公司 Semantic analysis method and system for dialogue scene


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination