CN114861629B - Automatic judgment method for text style - Google Patents


Info

Publication number
CN114861629B
CN114861629B (application CN202210475512.8A)
Authority
CN
China
Prior art keywords
model
text
style
data
label
Prior art date
Legal status
Active
Application number
CN202210475512.8A
Other languages
Chinese (zh)
Other versions
CN114861629A (en)
Inventor
陈峥
陈建树
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210475512.8A
Publication of CN114861629A
Application granted
Publication of CN114861629B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 - Indexing; Data structures therefor; Storage structures
    • G06F16/316 - Indexing structures
    • G06F16/322 - Trees
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/355 - Class or cluster creation or modification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/216 - Parsing using statistical methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to an automatic text-style judgment method in the technical field of artificial intelligence. An automatic text-style judgment model is obtained through a pipeline of style-label extraction from specific texts and deep-learning model training and tuning, and the judgment model is deployed in a text evaluation system. The method increases the efficiency of text screening, requires no retraining of the model when a new label is added, accords better with human understanding, and has good extensibility.

Description

Automatic judgment method for text style
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an automatic text-style judgment method.
Background
Some automated text-generation systems currently need to obtain text that satisfies high-level stylistic constraints, for example lyrics that must convey a particular emotion. The computing power of today's devices can automatically generate a large amount of candidate text in a short time, but automated methods for screening and evaluating that text are scarce. The common practice is manual screening, which suffers on two fronts. On the one hand, the sheer volume of text overwhelms a human screener, the amount that can be screened in a given time is extremely limited, and working efficiency is low. On the other hand, physical fatigue, mental strain and even mood swings strongly affect a screener's subjective judgment and therefore the results. Manual screening thus has two major disadvantages: 1. high labor and time costs; 2. evaluation results strongly influenced by subjective factors.
The prior art also contains a small number of methods that use a machine to automatically screen and classify text. These mainly set K labels in advance, abstract the text-classification task into a K-class classification model, and input each sample into the model to obtain a K-dimensional vector in which each dimension represents the probability that the corresponding label holds. This approach has two disadvantages: 1. when the model must judge several labels at once, the mutual-exclusion property lets the highest-probability label suppress the probabilities of the other labels, and even when all labels are false the model still outputs a highest-probability label; 2. when new labels must be added, the data must be reconstructed and the model retrained, so extensibility is limited and efficiency is low.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an automatic text-style judgment method that aims to reduce the heavy workload and the labor and time costs of manual evaluation, improve screening efficiency, reduce the misjudgment rate, and improve the extensibility of the classification model.
In order to achieve the above object, the present invention provides an automatic text-style judgment method comprising the following steps:
step 1), performing syntactic analysis on the existing text comment data with the syntactic analysis tool HanLP to obtain the parsed data result;
step 2), single-datum style label extraction:
a) defining a custom Node class and recursively constructing a multi-way tree bottom-up, restoring the parsed data result to the tree structure of a phrase structure tree, and recording the phrase type and word content of every node in a hash table A;
b) extracting labels with the screening rule that the node type is VP and the number of words contained in the VP lies in a set range, filtering the nodes recorded in hash table A by this rule, and storing the results that meet the condition in another hash table B;
step 3), constructing the full style-label set: performing the operation of step 2) on every piece of data in the database to obtain the final hash table B, sorting in descending order of phrase frequency, and taking the top K phrases as the style labels of the model training data, where candidate labels are compared with the longest-common-substring algorithm and two labels with high textual similarity are not both used; the model is an ALBERT pre-trained model;
step 4), model training data construction: building a binary-classification data set by negative sampling while keeping positive and negative samples balanced; a positive sample is labeled 1, and a negative sample is constructed by randomly selecting a style label that does not appear in the comment, splicing it in the same way, and labeling it 0;
and step 5), training and tuning the deep-learning model: fine-tuning the ALBERT pre-trained model with a deep-learning training framework and verifying performance on a validation set.
Preferably, step 5) specifically comprises the following steps:
a. shuffling the constructed data set and feeding it into the ALBERT pre-trained model sequentially in mini-batches;
b. the ALBERT pre-trained model preprocesses the input: it converts the input into one-hot vectors, performs the embedding operation, and then adds position and segment embeddings, where the segment id of the label text is 0 and the segment id of the essay text is 1;
c. the preprocessed result is fed into the neural network and multiplied with three weight matrices to obtain the matrices Q, K and V, which are passed through the self-attention module to obtain an attention-score matrix between each character and every other character, computed as follows:
Z_i = softmax((Q·K^T)/√d_k + M)·V,
where Z_i is the encoded vector of attention head i, T denotes the matrix transpose, M is the mask matrix, d_k is the hidden-layer vector dimension of a single attention head, and i is a positive integer from 1 to n;
d. multi-head attention concatenates Z_1 to Z_n and passes the result through a linear layer to obtain a final output Z with the same dimension as the multi-head-attention input matrix X;
e. after the final output Z with the same dimension as the input matrix X is obtained, the output Z of the multi-head-attention module is residually connected with X, and then a layer-normalization operation is applied that rescales the inputs of each layer of neurons to zero mean and unit variance, i.e. toward a standard normal distribution;
f. the feed-forward module in the ALBERT pre-trained model processes the result with two fully connected layers so that the output dimension matches the input dimension, then residual connection and layer normalization are applied once more, and the output serves as the input of the next block; this is repeated for N blocks;
g. the CLS vector of the ALBERT pre-trained model is fed into a linear layer and activated, the loss is computed with the binary cross-entropy loss function, and the model parameters are optimized by back-propagation;
h. steps a-g are repeated until model training is complete.
The invention has the beneficial effects that:
1) The style labels are extracted by syntactic analysis, so the obtained label text accords better with human understanding;
2) The invention splices human-readable label text with the input text to construct a binary-classification task, so the model acquires an understanding of the label text; adding a new label therefore requires no retraining of the model, which gives good extensibility and greatly improves text-classification efficiency;
3) The invention greatly reduces labor and time costs, improves screening efficiency, and reduces the misjudgment rate.
Drawings
FIG. 1 is an example diagram of a phrase structure tree used for style keyword extraction in the present invention;
FIG. 2 is a schematic diagram of the classification method of the ALBERT model of the present invention.
Detailed Description
Fig. 1 shows an example of the phrase structure tree obtained when the present invention performs style keyword extraction. For reasons of space, the illustration uses only one short sentence, "Xiao Ming goes to the mall to look at electronic products". A phrase structure tree always contains the words of the sentence as its leaf nodes, while the other, non-leaf nodes represent the constituents of the sentence, typically verb phrases (Verb Phrase, VP) and noun phrases (Noun Phrase, NP).
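FIG. 1 itself is not reproduced here. Purely as an illustration, the parse of that short sentence can be rendered in the same nested-list form that HanLP returns in step 1) below; the word segmentation and bracketing in this sketch are assumptions for readability, not the actual content of FIG. 1:

```python
# Hypothetical parse of the FIG. 1 example sentence in the nested-list form
# used in step 1). The label '_' marks a terminal leaf node holding a word.
example_parse = \
    ['IP', [
        ['NP', [['_', ['小明']]]],                                 # subject NP
        ['VP', [
            ['VP', [['_', ['去']], ['NP', [['_', ['商场']]]]]],     # "go to the mall"
            ['VP', [['_', ['看']], ['NP', [['_', ['电子产品']]]]]],  # "look at electronics"
        ]],
    ]]
```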
The present invention will be described in detail below with reference to a comment on a student essay (as an example of existing text comment data), whose content is: "This prose looks plain yet rewards careful tasting and is rich in meaning; the writer sings the praises of youth from different angles. The language of the article is deep and cadenced and rich in romantic color, and the piece can be called a successful exercise."
The invention provides an automatic judgment method of a text style, which comprises the following steps:
step 1), syntactic analysis is performed on the existing text comment data with the syntactic analysis tool HanLP, and a data result with a python list structure is obtained. The result is a nested list of the form ['IP', [['NP', [...]], ['VP', [...]], ...]]: every element pairs a phrase label (such as IP, NP, VP, PP, ADJP, ADVP, CP, QP, CLP, DP, DNP, VRD, VCD) with the list of its children, and the label '_' marks a terminal leaf node whose single child is a word of the sentence. The full parse of the example comment consists of several such IP subtrees, one per clause; the Chinese leaf words of the original output were lost in reproduction and are therefore omitted here;
step 2), single-datum style label extraction:
a) defining a custom Node class and recursively constructing a multi-way tree bottom-up, restoring the parsed data result to the tree structure of a phrase structure tree, and recording the phrase type and word content of every node in a hash table A;
b) extracting labels with the screening rule that the node type is VP and the number of words contained in the VP lies in a set range, filtering the nodes recorded in hash table A by this rule, and storing the results that meet the condition in another hash table B;
through step 2), phrases meeting the conditions are obtained, such as "looks plain yet rewards careful tasting", "rich in meaning", "deep and cadenced" and "rich in romantic color"; a minimal sketch of this extraction is given below;
step 3), constructing the full style-label set: the operation of step 2) is performed on every piece of data in the database to obtain the final hash table B, which is sorted in descending order of phrase frequency, and the top K phrases are taken as the style labels of the model training data (the value of K is determined by subsequent experiments); candidate style labels are compared with the longest-common-substring algorithm, and two labels with high textual similarity are not both used (the threshold length of the longest common substring can be chosen freely); the model is an ALBERT pre-trained model; a sketch of this selection follows;
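A minimal sketch of the step 3) selection, assuming the frequency table table_b from the previous sketch; the similarity threshold min_overlap stands in for the freely chosen longest-common-substring length:

```python
def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest common substring of a and b (dynamic programming)."""
    best = 0
    dp = [0] * (len(b) + 1)           # dp[j]: common suffix length ending at b[j-1]
    for ch_a in a:
        prev = 0                      # dp[j-1] from the previous row
        for j, ch_b in enumerate(b, 1):
            cur = dp[j]
            dp[j] = prev + 1 if ch_a == ch_b else 0
            best = max(best, dp[j])
            prev = cur
    return best

def select_style_labels(table_b, k, min_overlap=3):
    """Take the K most frequent phrases, skipping any phrase whose longest
    common substring with an already chosen label is too long."""
    labels = []
    for phrase, _freq in sorted(table_b.items(), key=lambda kv: -kv[1]):
        if all(longest_common_substring(phrase, kept) < min_overlap for kept in labels):
            labels.append(phrase)
        if len(labels) == k:
            break
    return labels
```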
step 4), model training data construction: a binary-classification data set is built by negative sampling, keeping positive and negative samples balanced. A positive sample, for example, splices a label from the comment with the essay as "[CLS] looks plain yet rewards careful tasting [SEP] Eighteen, a flower-like age; when the wind of youth blows, the breath of the young always brims over our lives …", with label 1; a style label that does not appear in the comment, such as "good at citing typical examples", is then selected at random and spliced in the same way to form a negative sample with label 0. Because the number of labels is relatively large and positive samples are sparse, the data set is constructed by negative sampling; a sketch follows;
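A sketch of this construction under stated assumptions: samples pairs each essay with the style labels actually extracted from its comment, the 1:1 positive-to-negative ratio realizes the balance requirement, and the "[CLS] label [SEP] essay" splicing itself is deferred to the tokenizer in step 5 b):

```python
import random

def build_dataset(samples, all_labels):
    """samples: list of (comment_labels, essay_text) pairs.
    Returns (label_text, essay_text, 0-or-1) triples, one negative sampled
    per positive so that the two classes stay balanced (step 4))."""
    dataset = []
    for comment_labels, essay in samples:
        for pos in comment_labels:
            dataset.append((pos, essay, 1))      # positive: label fits the essay
            neg = random.choice([l for l in all_labels if l not in comment_labels])
            dataset.append((neg, essay, 0))      # negative: label does not fit
    random.shuffle(dataset)                       # step 5 a) shuffling
    return dataset
```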
step 5), training and tuning the deep-learning model: the ALBERT pre-trained model is fine-tuned with a deep-learning training framework and its performance is verified on a validation set, specifically comprising the following steps:
a. shuffling the constructed data set and feeding it into the ALBERT pre-trained model sequentially in mini-batches;
b. the ALBERT pre-trained model preprocesses the input: it converts the input into one-hot vectors and performs the embedding operation, then adds position and segment embeddings, where the segment id of the label text is 0 and the segment id of the essay text is 1 (see the tokenizer sketch after step h);
c. the preprocessed result is fed into the neural network and multiplied with three weight matrices to obtain the matrices Q, K and V, which are passed through the self-attention module to obtain an attention-score matrix between each character and every other character, computed as follows (a PyTorch sketch of this computation also follows step h):
Z_i = softmax((Q·K^T)/√d_k + M)·V,
where Z_i is the encoded vector of attention head i, T denotes the matrix transpose, M is the mask matrix, d_k is the hidden-layer vector dimension of a single attention head, and i is a positive integer from 1 to n;
d. multi-head attention (Multi-Head Attention) concatenates (concat) Z_1 to Z_n and passes the result through a linear layer to obtain a final output Z with the same dimension as the multi-head-attention input matrix X;
e. the Add & Norm layer in the ALBERT pre-trained model consists of an Add part and a Norm part: after the final output Z with the same dimension as the input matrix X is obtained, the output Z of the multi-head-attention module is residually connected (Add) with X, and then layer normalization (Layer Normalization) is applied, rescaling the inputs of each layer of neurons to zero mean and unit variance, i.e. LayerNorm(X + Z);
f. the feed-forward (Feed Forward) module in the ALBERT pre-trained model processes the result with two fully connected layers so that the output dimension matches the input dimension, then residual connection and layer normalization are applied once more, and the output serves as the input of the next block; this is repeated for N blocks;
g. the CLS vector of the ALBERT pre-trained model is fed into a linear layer and activated, the loss is computed with the binary cross-entropy loss function, and the model parameters are optimized by back-propagation; the loss is computed as
loss = -[y_n·log(x_n) + (1 - y_n)·log(1 - x_n)],
where y_n is the true label with value range {0,1} and x_n is the model's output probability that the sample is positive, with value range (0,1);
h. steps a-g are repeated until model training is complete.
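To make sub-steps b) to g) concrete, two minimal sketches follow. The first implements the masked single-head attention formula of sub-step c) in PyTorch; the tensor shapes and the additive form of the mask are assumptions, since the patent prescribes only the mathematics:

```python
import math
import torch

def masked_self_attention(x, w_q, w_k, w_v, mask):
    """Z = softmax(Q·K^T / sqrt(d_k) + M)·V for one attention head.
    x: (batch, seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) weight matrices;
    mask: additive mask M, 0 where attention is allowed and a large negative
    value (e.g. -1e9) at positions to be ignored."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # the three weight multiplications
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k) + mask
    return torch.softmax(scores, dim=-1) @ v     # attention-weighted values Z_i
```

The second sketch shows the segment ids of sub-step b) and one optimization step of sub-step g) with the Hugging Face transformers library; the checkpoint path is a placeholder for whichever Chinese ALBERT weights are used, and the two-logit head trained with cross-entropy is one standard way to realize the binary loss above:

```python
import torch
from transformers import AutoTokenizer, AlbertForSequenceClassification

CHECKPOINT = "path/to/chinese-albert"   # placeholder, not a real model id
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AlbertForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(label_texts, essay_texts, targets):
    """One mini-batch update. Encoding a text pair yields
    '[CLS] label [SEP] essay [SEP]' with token_type_ids (segment ids)
    0 for the label segment and 1 for the essay segment."""
    enc = tokenizer(label_texts, essay_texts, truncation=True,
                    padding=True, return_tensors="pt")
    out = model(**enc, labels=torch.tensor(targets))   # targets: list of 0/1
    out.loss.backward()                  # cross-entropy over the two classes
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```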
The present invention and its embodiments have been described above; the description is not restrictive, and the drawings show only one embodiment, to which the actual structure is not limited. In summary, those skilled in the art can devise, without inventive effort, structural modes and embodiments similar to this technical solution without departing from the spirit of the invention, and all of these fall within the scope of protection of the invention.

Claims (3)

1. An automatic text-style judgment method, characterized by comprising the following steps:
step 1), performing syntactic analysis on the existing text comment data with the syntactic analysis tool HanLP to obtain a data result with a python list structure;
step 2), single-datum style label extraction:
a) defining a custom Node class and recursively constructing a multi-way tree bottom-up, restoring the parsed data result to the tree structure of a phrase structure tree, and recording the phrase type and word content of every node in a hash table A;
b) extracting labels with the screening rule that the node type is VP and the number of words is 3-5, filtering the data nodes of hash table A by this rule, and storing the results that meet the condition in another hash table B, where VP denotes a verb phrase;
step 3), constructing the full style-label set: performing the operation of step 2) on every piece of data in the database to obtain the final hash table B, sorting in descending order of phrase frequency, and taking the top K phrases as the style labels of the model training data, where the model is an ALBERT pre-trained model;
step 4), model training data construction: building a binary-classification data set by negative sampling while keeping positive and negative samples balanced, marking a positive sample with label 1, then randomly selecting a style label that does not appear in the comment, constructing a negative sample in the same spliced form, and marking it with label 0;
step 5), deep-learning model training and tuning: fine-tuning the ALBERT pre-trained model with a deep-learning training framework and verifying performance on a validation set.
2. The automatic text-style judgment method according to claim 1, wherein in step 3) candidate style labels are checked for duplication with the longest-common-substring algorithm, and two labels with high textual similarity cannot both be used.
3. The automatic text-style judgment method according to claim 2, wherein step 5) specifically comprises the following steps:
a. shuffling the constructed data set and feeding it into the ALBERT pre-trained model sequentially in mini-batches;
b. the ALBERT pre-trained model preprocesses the input: it converts the input into one-hot vectors, performs the embedding operation, and then adds position and segment embeddings, where the segment id of the label text is 0 and the segment id of the essay text is 1;
c. the preprocessed result is fed into the neural network and multiplied with three weight matrices to obtain the matrices Q, K and V, which are passed through the self-attention module to obtain an attention-score matrix between each character and every other character, computed as follows:
Z_i = softmax((Q·K^T)/√d_k + M)·V,
where Z_i is the encoded vector of attention head i, T denotes the matrix transpose, M is the mask matrix, d_k is the hidden-layer vector dimension of a single attention head, and i is a positive integer from 1 to n;
d. multi-head attention concatenates Z_1 to Z_n and passes the result through a linear layer to obtain a final output Z with the same dimension as the multi-head-attention input matrix X;
e. after the final output Z with the same dimension as the input matrix X is obtained, the output Z of the multi-head-attention module is residually connected with X, and then layer normalization is applied, rescaling the inputs of each layer of neurons to zero mean and unit variance, i.e. LayerNorm(X + Z);
f. the feed-forward module in the ALBERT pre-trained model processes the result with two fully connected layers so that the output dimension matches the input dimension, then residual connection and layer normalization are applied once more, and the output serves as the input of the next block; this is repeated for N blocks;
g. the CLS vector of the ALBERT pre-trained model is fed into a linear layer and activated, the loss is computed with the binary cross-entropy loss function, and the model parameters are optimized by back-propagation, the loss being computed as
loss = -[y_n·log(x_n) + (1 - y_n)·log(1 - x_n)],
where y_n is the true label with value range {0,1} and x_n is the model's output probability that the sample is positive, with value range (0,1);
h. steps a-g are repeated until model training is complete.
CN202210475512.8A 2022-04-29 2022-04-29 Automatic judgment method for text style Active CN114861629B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210475512.8A CN114861629B (en) 2022-04-29 2022-04-29 Automatic judgment method for text style

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210475512.8A CN114861629B (en) 2022-04-29 2022-04-29 Automatic judgment method for text style

Publications (2)

Publication Number Publication Date
CN114861629A CN114861629A (en) 2022-08-05
CN114861629B 2023-04-04

Family

ID=82635015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210475512.8A Active CN114861629B (en) 2022-04-29 2022-04-29 Automatic judgment method for text style

Country Status (1)

Country Link
CN (1) CN114861629B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122416A (en) * 2017-03-31 2017-09-01 北京大学 A kind of Chinese event abstracting method
KR20210037934A (en) * 2019-09-30 2021-04-07 한국과학기술원 Method and system for trust level evaluationon personal data collector with privacy policy analysis
CN112101004A (en) * 2020-09-23 2020-12-18 电子科技大学 General webpage character information extraction method based on conditional random field and syntactic analysis
CN112214599A (en) * 2020-10-20 2021-01-12 电子科技大学 Multi-label text classification method based on statistics and pre-training language model
CN113158674A (en) * 2021-04-01 2021-07-23 华南理工大学 Method for extracting key information of document in field of artificial intelligence

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Alberto Holts et al. Automated Text Binary Classification Using Machine Learning Approach. 2010 XXIX International Conference of the Chilean Computer Science Society. 2011, pp. 212-217. *
景栋盛 et al. A spam text classification method based on a deep Q-network (基于深度Q网络的垃圾邮件文本分类方法). 计算机与现代化 (Computer and Modernization), 2020, No. 06, pp. 89-94. *

Also Published As

Publication number Publication date
CN114861629A (en) 2022-08-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant