CN117313709A - Method for detecting generated text based on statistical information and pre-training language model - Google Patents

Method for detecting generated text based on statistical information and pre-training language model

Info

Publication number
CN117313709A
CN117313709A
Authority
CN
China
Prior art keywords
text
model
statistical
learning model
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311614320.1A
Other languages
Chinese (zh)
Other versions
CN117313709B (en)
Inventor
张勇东
毛震东
徐本峰
张立成
胡博
郭子康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202311614320.1A priority Critical patent/CN117313709B/en
Publication of CN117313709A publication Critical patent/CN117313709A/en
Application granted granted Critical
Publication of CN117313709B publication Critical patent/CN117313709B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/27Regression, e.g. linear or logistic regression
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of generated-text detection and discloses a method for detecting generated text based on statistical information and a pre-trained language model, in which the class label of a text is detected through a detection model consisting of a statistical learning model, a deep learning model and a dynamic fusion framework. The detection model is constructed by: constructing the statistical learning model; constructing the deep learning model; constructing the dynamic fusion framework; and, based on a training dataset, training the detection model by computing a cross entropy loss function between the dynamically fused class-label probability distribution and the true class labels. The statistical learning model effectively alleviates poor model transferability when multi-domain labeled data are limited; the deep learning model dispenses with manually designed features and can extract more implicit features; and the dynamic fusion framework improves the model's transferability at the cost of only a small loss in detection performance.

Description

Method for detecting generated text based on statistical information and pre-training language model
Technical Field
The invention relates to the technical field of generated-text detection, and in particular to a method for detecting generated text based on statistical information and a pre-trained language model.
Background
With the development of large-scale language models, generated text has become more and more similar to human writing. At the same time, this poses a serious security problem: machine-generated text can be used to maliciously mislead people. Generated-text detection systems, which aim to distinguish whether a text was produced by a machine or by a human, have therefore become a research hotspot in natural language processing in recent years. Statistical learning models do not require large amounts of labeled data for training and migrate easily to new domains, but their detection accuracy tends to be low. Deep learning models can extract features automatically, avoiding the inconvenience and effect-dependence of manually designed rules and features; they can capture more implicit features and achieve better detection results. However, training them requires a large amount of in-domain labeled data, and their detection performance drops sharply when they are migrated to a new domain. Since obtaining high-quality labeled data across multiple domains is usually time-consuming and labor-intensive in real-world scenarios, building a well-performing generated-text detection system under limited resources and data is a significant challenge.
Given that the statistical learning model transfers well but performs poorly, while the deep learning model performs well but transfers poorly, the invention combines the two to address the problem of poor generated-text detection across multiple domains.
Disclosure of Invention
In order to solve the above technical problem, the invention provides a method for detecting generated text based on statistical information and a pre-trained language model: statistical features such as perplexity and word frequency are obtained through a language model, deep features of the text are extracted through a deep learning model, probability calibration is performed on the prediction results of the statistical features and the deep features respectively, and dynamically fused prediction for generated text is finally realized.
In order to solve the technical problems, the invention adopts the following technical scheme:
a method for detecting generated text based on statistical information and a pre-training language model detects class labels of the generated text through a detection model consisting of a statistical learning model, a deep learning model and a dynamic fusion frame; the training data set adopted in the training of the detection model is recorded as,/>Corresponding tag set->And->For tag collection->For the length of the training dataset, +.>Is->A corresponding category label; text->Is a word sequence,/>Represents the first->Text->The%>Individual words->For text->Is a length of (2);
the construction method of the detection model comprises the following steps:
step one, constructing a statistical learning model:
the statistical learning model adopts an autoregressive language model; the generation probability $p(w_{i,j}\mid w_{i,<j})$ of each word in the text to be detected is obtained through the autoregressive language model, and the numbers of words in the text whose predicted rank in the vocabulary falls within the top ten, top hundred and top thousand are counted, denoted $c_{10}$, $c_{100}$ and $c_{1000}$ respectively; based on the generation probability of each word, the probability $p(x_i)$ of the text $x_i$ is calculated, and from $p(x_i)$ the perplexity $\mathrm{PPL}(x_i)$ of the text is calculated; taking $c_{10}$, $c_{100}$, $c_{1000}$ and the perplexity of the text as statistical features, the class-label probability distribution $P_{stat}$ of the text to be detected, predicted from the statistical features, is obtained through a logistic regression classifier;
Step two, constructing a deep learning model:
the deep learning model adopts a self-encoding language model; after the text to be detected is encoded by the self-encoding language model, the vector representation $h_{[CLS]}$ of the text's start token [CLS] is taken as the semantic representation of the whole text, and the class-label probability distribution $P_{deep}$ of the text to be detected, predicted from the deep encoding features, is then obtained through a fully connected network and a classifier network;
Step three, constructing a dynamic fusion framework:
label smoothing is used to extend the value range of the original one-hot labels from $\{0,1\}$ to $[\frac{\epsilon}{K},\,1-\epsilon+\frac{\epsilon}{K}]$, where $\epsilon$ is a constant representing the degree of smoothing; the true probability distribution $q(k)$ of the predicted class after label smoothing is:

$$q(k)=\begin{cases}1-\epsilon+\dfrac{\epsilon}{K}, & k=y\\[4pt] \dfrac{\epsilon}{K}, & k\neq y\end{cases}$$

where $\hat{y}$ denotes the class labels predicted by the statistical learning model and the deep learning model, both of which use the cross entropy loss function, so $\hat{y}$ refers collectively to the predicted labels of the two models and may also be written $\hat{y}_{stat}$ and $\hat{y}_{deep}$; $y$ is the true class label, and $K$ represents the total number of class labels. The cross entropy loss function of the detection model is:

$$\mathcal{L}=-\sum_{k=1}^{K} q(k)\,\log p(\hat{y}=k)$$

which replaces the original cross entropy losses of the logistic regression classifier and the classifier network; finally, the dynamically fused class-label probability distribution $P$, predicted from the two kinds of features, is obtained through dynamic fusion: $P=\alpha\,P_{stat}+\beta\,P_{deep}$, where $\alpha$ and $\beta$ are both weight parameters;
step four, training the detection model based on the training dataset by computing the cross entropy loss functions $\mathcal{L}_{stat}$ and $\mathcal{L}_{deep}$ for $P_{stat}$ and $P_{deep}$.
Further, in step one, the probability $p(x_i)$ of text $x_i$ is calculated based on the generation probability of each word as:

$$p(x_i)=\prod_{j=1}^{n_i} p(w_{i,j}\mid w_{i,1},\ldots,w_{i,j-1})$$

where $p(w_{i,j}\mid w_{i,1},\ldots,w_{i,j-1})$ denotes the conditional probability of the $j$-th word given the preceding words.
Further, in step one, the perplexity $\mathrm{PPL}(x_i)$ of text $x_i$ is calculated from $p(x_i)$ as:

$$\mathrm{PPL}(x_i)=p(x_i)^{-\frac{1}{n_i}}$$
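As a quick numerical illustration (with made-up numbers): if $p(x_i)=\frac{1}{64}$ and $n_i=3$, then $\mathrm{PPL}(x_i)=64^{\frac{1}{3}}=4$. Lower perplexity means the language model finds the text more predictable; since language models tend to sample high-probability words, machine-generated text generally exhibits lower perplexity than human-written text.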
further, in the first step、/>、/>The confusion degree with the text is used as a statistical feature, and the category label probability distribution of the text based on statistical feature prediction is obtained through a logistic regression classifierWhen (1):
wherein,for logistic regression classifier,>representing a stitching operation.
Further, in step two, the vector representation $h_{[CLS]}$ of the text's start token [CLS] is taken as the semantic representation of the whole text, and the class-label probability distribution $P_{deep}$ of the text to be detected, predicted from the deep encoding features, is obtained through the fully connected network and the classifier network as:

$$P_{deep}=\sigma\big(W\,h_{[CLS]}+b\big)$$

where $\sigma$ is the activation function of the classifier network, $W$ is the fully connected network, and $b$ is a bias parameter.
Compared with the prior art, the invention has the following beneficial technical effects:
The detection model comprises a statistical learning model, a deep learning model and a dynamic fusion framework. The statistical learning model provides statistical features, effectively alleviating poor model transferability when multi-domain labeled data are limited. The deep learning model dispenses with manually designed features and can extract more implicit ones, and the pre-trained language model, by virtue of its strong encoding capacity, provides latent inter-word correlation features that improve the model's detection performance. The dynamic fusion framework, on the one hand, uses label smoothing to calibrate the model's probabilities, bringing the predicted probabilities closer to the true probabilities; on the other hand, it combines the advantages of the statistical learning model and the deep learning model, greatly improving the model's transferability at the cost of only a small loss in detection performance and achieving good results in new domains. The method therefore has very broad application prospects.
Drawings
FIG. 1 is a schematic diagram of a detection model in an embodiment of the invention.
Detailed Description
A preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.
In the present invention, the training dataset is $D=\{x_i\}_{i=1}^{N}$ with corresponding label set $Y=\{y_i\}_{i=1}^{N}$ and $y_i\in\mathcal{Y}$; the label set is $\mathcal{Y}=\{0,1\}$, where 0 represents human and 1 represents machine, and $N$ is the length of the training dataset. Each text $x_i$ is a word sequence $x_i=(w_{i,1},\ldots,w_{i,n_i})$, where $w_{i,j}$ is the $j$-th word of the $i$-th text $x_i$ and $n_i$ is the length of $x_i$. The goal of the task is to learn a function $f:x\to y$ that predicts the correct class label $y$ for a given text $x$.
The detection model provided by the invention is shown in FIG. 1 and comprises three parts: (1) a statistical learning model; (2) a deep learning model; (3) a dynamic fusion framework.
(1) Statistical learning model
The body of the statistical learning model adopts an autoregressive language model such as GPT-2, because the generation process of an autoregressive language model better simulates the process by which text is produced. These models predict the next word or token autoregressively from the previously generated ones, progressively producing semantically coherent text. Since language models tend to sample words with high generation probability, whereas words chosen by humans are more random, a language model such as GPT-2 is selected to obtain the generation probability $p(w_{i,j}\mid w_{i,<j})$ of each word, i.e., the predicted probability distribution of the $j$-th word given the preceding $j-1$ words; the numbers of words in the text that fall within the top ten, top hundred and top thousand of the vocabulary ranking are counted, denoted $c_{10}$, $c_{100}$ and $c_{1000}$ respectively.
The probability of each text is first calculated based on the generation probability of each word:

$$p(x_i)=\prod_{j=1}^{n_i} p(w_{i,j}\mid w_{i,1},\ldots,w_{i,j-1})$$

from which the perplexity of each text is calculated:

$$\mathrm{PPL}(x_i)=p(x_i)^{-\frac{1}{n_i}}$$
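By way of illustration, these statistical quantities could be extracted as follows. This is a minimal sketch using the HuggingFace `transformers` implementation of GPT-2; the helper name `statistical_features`, the `gpt2` checkpoint, and the use of subword tokens in place of whole words are assumptions of the sketch, not details disclosed by the patent.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# "gpt2" is an illustrative checkpoint choice.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def statistical_features(text: str):
    """Return (c10, c100, c1000, ppl) for one text (token-level approximation)."""
    ids = tokenizer(text, return_tensors="pt",
                    truncation=True, max_length=1024).input_ids
    with torch.no_grad():
        logits = model(ids).logits[0]           # (n, |V|): step j predicts token j+1
    probs = torch.softmax(logits[:-1], dim=-1)  # predicted distributions for tokens 2..n
    targets = ids[0, 1:]                        # the tokens actually in the text
    tok_probs = probs[torch.arange(targets.numel()), targets]  # p(w_j | w_<j)
    # Rank of each observed token inside the model's predicted distribution.
    ranks = (probs > tok_probs.unsqueeze(1)).sum(dim=1) + 1
    c10, c100, c1000 = [(ranks <= k).sum().item() for k in (10, 100, 1000)]
    # PPL(x) = p(x)^(-1/n) = exp(-(1/n) * sum_j log p(w_j | w_<j)).
    ppl = torch.exp(-tok_probs.log().mean()).item()
    return c10, c100, c1000, ppl
```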
The word-ranking statistics and the perplexity of the text obtained above are used as statistical features, and the class-label probability distribution of the input text predicted from the statistical features is obtained through a logistic regression classifier:

$$P_{stat}=\mathrm{LR}\big([c_{10};\,c_{100};\,c_{1000};\,\mathrm{PPL}(x_i)]\big)$$

where $\mathrm{LR}(\cdot)$ is the logistic regression classifier and $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation operation.
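The logistic regression stage could then be realized, for example, with scikit-learn. In this sketch, `train_texts`, `train_labels` and `new_texts` are assumed variable names following the dataset format defined above, and `statistical_features` is the helper sketched earlier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Concatenate [c10; c100; c1000; PPL] for every training text.
X_train = np.array([statistical_features(t) for t in train_texts])
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, train_labels)     # train_labels: 0 = human, 1 = machine (assumed coding)

# P_stat for new texts: the class-label probability distribution.
P_stat = lr.predict_proba(np.array([statistical_features(t) for t in new_texts]))
```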
(2) Deep learning model
The body of the deep learning model employs an auto-encoding language model such as BERT rather than an autoregressive language model, because auto-encoding language models generally perform better on language-understanding tasks. After the text is encoded by a language model such as BERT, the vector representation $h_{[CLS]}$ of its start token [CLS] is taken as the semantic representation of the whole text, and the class-label probability distribution of the input text predicted from the deep encoding features is then obtained through a fully connected network and a classifier network:

$$P_{deep}=\sigma\big(W\,h_{[CLS]}+b\big)$$

where $\sigma$ is the activation function of the classifier network, $W$ is the fully connected network, and $b$ is a bias parameter.
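The deep branch could be sketched as follows with the HuggingFace BERT implementation. The `bert-base-uncased` checkpoint, the class name `DeepDetector` and the choice of softmax as the classifier activation $\sigma$ are illustrative assumptions; the patent specifies only a self-encoding language model such as BERT followed by a fully connected network and a classifier network.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

class DeepDetector(nn.Module):
    """BERT encoder -> [CLS] vector -> fully connected layer (W, b) -> softmax."""
    def __init__(self, num_labels: int = 2, name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(name)
        self.fc = nn.Linear(self.bert.config.hidden_size, num_labels)  # W h + b

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        h_cls = out.last_hidden_state[:, 0]           # h_[CLS], whole-text semantics
        return torch.softmax(self.fc(h_cls), dim=-1)  # P_deep

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
batch = tokenizer(["text to be detected"], return_tensors="pt",
                  padding=True, truncation=True)
P_deep = DeepDetector()(batch.input_ids, batch.attention_mask)
```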
(3) Dynamic fusion framework
When performing classification tasks, one usually cares only whether the model's output exceeds a certain threshold, not how confident the model is. In the field of generated-text detection, however, the confidence measure is also important. Model calibration aims to keep the model's predicted probabilities consistent with the true empirical probabilities, i.e., to make the predicted probabilities as close as possible to the true ones. The present invention uses label smoothing to extend the value range of the original one-hot labels from $\{0,1\}$ to the larger range $[\frac{\epsilon}{K},\,1-\epsilon+\frac{\epsilon}{K}]$, where $\epsilon$ is a small number indicating the degree of smoothing. The true probability distribution $q(k)$ of the predicted class after label smoothing is obtained as:

$$q(k)=\begin{cases}1-\epsilon+\dfrac{\epsilon}{K}, & k=y\\[4pt] \dfrac{\epsilon}{K}, & k\neq y\end{cases}$$

where $\hat{y}$ denotes the class labels predicted by the statistical and deep learning models, $y$ is the true class label, and $K$ represents the total number of classes; in this embodiment $K=2$. The cross entropy loss function $\mathcal{L}$ becomes:

$$\mathcal{L}=-\sum_{k=1}^{K} q(k)\,\log p(\hat{y}=k)$$

replacing the original cross entropy losses of the logistic regression classifier and the classifier network. Finally, the final dynamically fused class-label probability distribution $P$, predicted from the two kinds of features, is obtained through dynamic fusion:

$$P=\alpha\,P_{stat}+\beta\,P_{deep}$$

where $\alpha,\beta\in[0,1]$ and $\alpha+\beta=1$; $\alpha$ and $\beta$ control the weight of each input probability distribution, and the best result is obtained by adjusting $\alpha$ and $\beta$.
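The fusion step itself is a convex combination of the two calibrated distributions. A minimal sketch follows, with `alpha = 0.5` as a placeholder value, since the patent leaves the weights to be tuned.

```python
import torch

def dynamic_fusion(p_stat: torch.Tensor, p_deep: torch.Tensor,
                   alpha: float = 0.5) -> torch.Tensor:
    """P = alpha * P_stat + beta * P_deep with beta = 1 - alpha (alpha is tuned)."""
    return alpha * p_stat + (1.0 - alpha) * p_deep

p_stat = torch.tensor([[0.70, 0.30]])  # from the logistic regression branch
p_deep = torch.tensor([[0.40, 0.60]])  # from the BERT branch
P = dynamic_fusion(p_stat, p_deep)     # -> [[0.55, 0.45]]
label = P.argmax(dim=-1)               # predicted class label (here: 0)
```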
Based on the training dataset, the detection model is trained by computing the cross entropy loss functions $\mathcal{L}_{stat}$ and $\mathcal{L}_{deep}$ for $P_{stat}$ and $P_{deep}$.
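A self-contained sketch of the label-smoothed cross entropy that both losses take is given below; `eps = 0.1` is an assumed smoothing constant, as the patent does not disclose a value for $\epsilon$, and the toy batch is made-up data.

```python
import torch

def smoothed_targets(y: torch.Tensor, K: int, eps: float) -> torch.Tensor:
    """q(k) = 1 - eps + eps/K where k == y, and eps/K elsewhere (formula above)."""
    q = torch.full((y.size(0), K), eps / K)
    q[torch.arange(y.size(0)), y] = 1.0 - eps + eps / K
    return q

def smoothed_ce(p: torch.Tensor, y: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """L = -sum_k q(k) log p(k), averaged over the batch; eps = 0.1 is assumed."""
    q = smoothed_targets(y, p.size(-1), eps)
    return -(q * p.clamp_min(1e-12).log()).sum(dim=-1).mean()

# Toy batch: three predicted distributions with their true labels.
p = torch.tensor([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]], requires_grad=True)
y = torch.tensor([0, 1, 1])   # 0 = human, 1 = machine (assumed coding)
loss = smoothed_ce(p, y)      # L_stat and L_deep both take this form
loss.backward()               # gradients flow back to the branch that produced p
```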
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted merely for clarity. The specification should be taken as a whole, and the technical solutions in the respective embodiments may be suitably combined to form other embodiments that will be understood by those skilled in the art.

Claims (5)

1. A method for detecting generated text based on statistical information and a pre-trained language model, wherein the class label of a text is detected through a detection model consisting of a statistical learning model, a deep learning model and a dynamic fusion framework; the training dataset used to train the detection model is denoted $D=\{x_i\}_{i=1}^{N}$, with corresponding label set $Y=\{y_i\}_{i=1}^{N}$ and $y_i\in\mathcal{Y}$, where $\mathcal{Y}$ is the label set, $N$ is the length of the training dataset, and $y_i$ is the class label corresponding to text $x_i$; each text $x_i$ is a word sequence $x_i=(w_{i,1},w_{i,2},\ldots,w_{i,n_i})$, where $w_{i,j}$ is the $j$-th word of the $i$-th text $x_i$ and $n_i$ is the length of $x_i$;
the construction method of the detection model comprises the following steps:
step one, constructing a statistical learning model:
the statistical learning model adopts an autoregressive language model; the generation probability $p(w_{i,j}\mid w_{i,<j})$ of each word in the text to be detected is obtained through the autoregressive language model, and the numbers of words in the text whose predicted rank in the vocabulary falls within the top ten, top hundred and top thousand are counted, denoted $c_{10}$, $c_{100}$ and $c_{1000}$ respectively; based on the generation probability of each word, the probability $p(x_i)$ of the text is calculated, and from $p(x_i)$ the perplexity $\mathrm{PPL}(x_i)$ of the text is calculated; taking $c_{10}$, $c_{100}$, $c_{1000}$ and the perplexity of the text as statistical features, the class-label probability distribution $P_{stat}$ of the text to be detected, predicted from the statistical features, is obtained through a logistic regression classifier;
Step two, constructing a deep learning model:
the deep learning model adopts a self-encoding language model; after the text to be detected is encoded by the self-encoding language model, the vector representation $h_{[CLS]}$ of the text's start token [CLS] is taken as the semantic representation of the whole text, and the class-label probability distribution $P_{deep}$ of the text to be detected, predicted from the deep encoding features, is then obtained through a fully connected network and a classifier network;
Step three, constructing a dynamic fusion framework:
label smoothing is used to extend the value range of the original one-hot labels from $\{0,1\}$ to $[\frac{\epsilon}{K},\,1-\epsilon+\frac{\epsilon}{K}]$, where $\epsilon$ is a constant representing the degree of smoothing; the true probability distribution $q(k)$ of the predicted class after label smoothing is:

$$q(k)=\begin{cases}1-\epsilon+\dfrac{\epsilon}{K}, & k=y\\[4pt] \dfrac{\epsilon}{K}, & k\neq y\end{cases}$$

where $\hat{y}$ denotes the class labels predicted by the statistical and deep learning models, $y$ is the true class label, and $K$ represents the total number of class labels; the cross entropy loss function of the detection model is:

$$\mathcal{L}=-\sum_{k=1}^{K} q(k)\,\log p(\hat{y}=k)$$

which replaces the original cross entropy losses of the logistic regression classifier and the classifier network; finally, the dynamically fused class-label probability distribution $P$, predicted from the two kinds of features, is obtained through dynamic fusion: $P=\alpha\,P_{stat}+\beta\,P_{deep}$, where $\alpha$ and $\beta$ are both weight parameters;
step four, training the detection model based on the training dataset by computing the cross entropy loss functions $\mathcal{L}_{stat}$ and $\mathcal{L}_{deep}$ for $P_{stat}$ and $P_{deep}$.
2. The method for detecting generated text based on statistical information and a pre-trained language model according to claim 1, wherein in step one, the probability $p(x_i)$ of text $x_i$ is calculated based on the generation probability of each word as:

$$p(x_i)=\prod_{j=1}^{n_i} p(w_{i,j}\mid w_{i,1},\ldots,w_{i,j-1})$$

where $p(w_{i,j}\mid w_{i,1},\ldots,w_{i,j-1})$ denotes the conditional probability.
3. The method for detecting generated text based on statistical information and a pre-trained language model according to claim 1, wherein in step one, the perplexity $\mathrm{PPL}(x_i)$ of text $x_i$ is calculated from $p(x_i)$ as:

$$\mathrm{PPL}(x_i)=p(x_i)^{-\frac{1}{n_i}}$$
4. The method for detecting generated text based on statistical information and a pre-trained language model according to claim 1, wherein in step one, $c_{10}$, $c_{100}$, $c_{1000}$ and the perplexity of the text are taken as statistical features, and the class-label probability distribution $P_{stat}$ of the text predicted from the statistical features is obtained through a logistic regression classifier as:

$$P_{stat}=\mathrm{LR}\big([c_{10};\,c_{100};\,c_{1000};\,\mathrm{PPL}(x_i)]\big)$$

where $\mathrm{LR}(\cdot)$ is the logistic regression classifier and $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation operation.
5. The method for detecting generated text based on statistical information and a pre-trained language model according to claim 1, wherein in step two, the vector representation $h_{[CLS]}$ of the text's start token [CLS] is taken as the semantic representation of the whole text, and the class-label probability distribution $P_{deep}$ of the text to be detected, predicted from the deep encoding features, is obtained through a fully connected network and a classifier network as:

$$P_{deep}=\sigma\big(W\,h_{[CLS]}+b\big)$$

where $\sigma$ is the activation function of the classifier network, $W$ is the fully connected network, and $b$ is a bias parameter.
CN202311614320.1A 2023-11-29 2023-11-29 Method for detecting generated text based on statistical information and pre-training language model Active CN117313709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311614320.1A CN117313709B (en) 2023-11-29 2023-11-29 Method for detecting generated text based on statistical information and pre-training language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311614320.1A CN117313709B (en) 2023-11-29 2023-11-29 Method for detecting generated text based on statistical information and pre-training language model

Publications (2)

Publication Number Publication Date
CN117313709A true CN117313709A (en) 2023-12-29
CN117313709B CN117313709B (en) 2024-03-29

Family

ID=89250323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311614320.1A Active CN117313709B (en) 2023-11-29 2023-11-29 Method for detecting generated text based on statistical information and pre-training language model

Country Status (1)

Country Link
CN (1) CN117313709B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556817A (en) * 2024-01-10 2024-02-13 国开启科量子技术(安徽)有限公司 Text detection method, device, equipment and medium based on quantum circuit

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180113856A1 (en) * 2016-10-26 2018-04-26 Abbyy Infopoisk Llc Producing training sets for machine learning methods by performing deep semantic analysis of natural language texts
US20200257943A1 (en) * 2019-02-11 2020-08-13 Hrl Laboratories, Llc System and method for human-machine hybrid prediction of events
WO2020248471A1 (en) * 2019-06-14 2020-12-17 华南理工大学 Aggregation cross-entropy loss function-based sequence recognition method
CN112214599A (en) * 2020-10-20 2021-01-12 电子科技大学 Multi-label text classification method based on statistics and pre-training language model
US20210365773A1 (en) * 2020-05-22 2021-11-25 Element Ai Inc. Method of and system for training machine learning algorithm to generate text summary
WO2022036616A1 (en) * 2020-08-20 2022-02-24 中山大学 Method and apparatus for generating inferential question on basis of low labeled resource
US20220067558A1 (en) * 2020-09-03 2022-03-03 International Business Machines Corporation Artificial intelligence explaining for natural language processing
US20220083739A1 (en) * 2020-09-14 2022-03-17 Smart Information Flow Technologies, Llc, D/B/A Sift L.L.C. Machine learning for joint recognition and assertion regression of elements in text
CN115081437A (en) * 2022-07-20 2022-09-20 中国电子科技集团公司第三十研究所 Machine-generated text detection method and system based on linguistic feature contrast learning
US20230153546A1 (en) * 2020-07-13 2023-05-18 Ai21 Labs Controllable reading guides and natural language generation
CN116757164A (en) * 2023-06-21 2023-09-15 张丽莉 GPT generation language recognition and detection system
CN116775880A (en) * 2023-06-29 2023-09-19 重庆邮电大学 Multi-label text classification method and system based on label semantics and transfer learning
CN117131877A (en) * 2023-09-12 2023-11-28 广州木木信息科技有限公司 Text detection method and system based on contrast learning

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180113856A1 (en) * 2016-10-26 2018-04-26 Abbyy Infopoisk Llc Producing training sets for machine learning methods by performing deep semantic analysis of natural language texts
US20200257943A1 (en) * 2019-02-11 2020-08-13 Hrl Laboratories, Llc System and method for human-machine hybrid prediction of events
WO2020248471A1 (en) * 2019-06-14 2020-12-17 华南理工大学 Aggregation cross-entropy loss function-based sequence recognition method
US20210365773A1 (en) * 2020-05-22 2021-11-25 Element Ai Inc. Method of and system for training machine learning algorithm to generate text summary
US20230153546A1 (en) * 2020-07-13 2023-05-18 Ai21 Labs Controllable reading guides and natural language generation
WO2022036616A1 (en) * 2020-08-20 2022-02-24 中山大学 Method and apparatus for generating inferential question on basis of low labeled resource
US20220067558A1 (en) * 2020-09-03 2022-03-03 International Business Machines Corporation Artificial intelligence explaining for natural language processing
US20220083739A1 (en) * 2020-09-14 2022-03-17 Smart Information Flow Technologies, Llc, D/B/A Sift L.L.C. Machine learning for joint recognition and assertion regression of elements in text
CN112214599A (en) * 2020-10-20 2021-01-12 电子科技大学 Multi-label text classification method based on statistics and pre-training language model
CN115081437A (en) * 2022-07-20 2022-09-20 中国电子科技集团公司第三十研究所 Machine-generated text detection method and system based on linguistic feature contrast learning
CN116757164A (en) * 2023-06-21 2023-09-15 张丽莉 GPT generation language recognition and detection system
CN116775880A (en) * 2023-06-29 2023-09-19 重庆邮电大学 Multi-label text classification method and system based on label semantics and transfer learning
CN117131877A (en) * 2023-09-12 2023-11-28 广州木木信息科技有限公司 Text detection method and system based on contrast learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李舟军; 范宇; 吴贤杰: "A Survey of Pre-training Techniques for Natural Language Processing", Computer Science, no. 03, 24 March 2020 (2020-03-24) *
黄露; 周恩国; 李岱峰: "A Text Representation Learning Model Integrating a Task-Specific Information Attention Mechanism", Data Analysis and Knowledge Discovery, no. 09, 9 June 2020 (2020-06-09) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556817A (en) * 2024-01-10 2024-02-13 国开启科量子技术(安徽)有限公司 Text detection method, device, equipment and medium based on quantum circuit
CN117556817B (en) * 2024-01-10 2024-05-24 国开启科量子技术(安徽)有限公司 Quantum circuit-based large model generation text detection method, device and equipment

Also Published As

Publication number Publication date
CN117313709B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
CN110502749B (en) Text relation extraction method based on double-layer attention mechanism and bidirectional GRU
CN110609891B (en) Visual dialog generation method based on context awareness graph neural network
CN113254599B (en) Multi-label microblog text classification method based on semi-supervised learning
CN110245229B (en) Deep learning theme emotion classification method based on data enhancement
CN109086658B (en) Sensor data generation method and system based on generation countermeasure network
CN108416065B (en) Hierarchical neural network-based image-sentence description generation system and method
Liu et al. Chinese image caption generation via visual attention and topic modeling
Li et al. Improving convolutional neural network for text classification by recursive data pruning
CN109948158A (en) Emotional orientation analytical method based on environment member insertion and deep learning
CN108984530A (en) A kind of detection method and detection system of network sensitive content
CN108595601A (en) A kind of long text sentiment analysis method incorporating Attention mechanism
CN110909736B (en) Image description method based on long-term and short-term memory model and target detection algorithm
Li et al. Image sentiment prediction based on textual descriptions with adjective noun pairs
CN113536922A (en) Video behavior identification method for weighting fusion of multiple image tasks
CN111522908A (en) Multi-label text classification method based on BiGRU and attention mechanism
CN117313709B (en) Method for detecting generated text based on statistical information and pre-training language model
CN110727844B (en) Online commented commodity feature viewpoint extraction method based on generation countermeasure network
JP2015511733A (en) How to classify text
Yin et al. Sentiment lexical-augmented convolutional neural networks for sentiment analysis
CN109670169B (en) Deep learning emotion classification method based on feature extraction
Yue et al. Multi-task adversarial autoencoder network for face alignment in the wild
Chakraborty et al. Sign Language Recognition Using Landmark Detection, GRU and LSTM
CN113722536A (en) Video description method based on bilinear adaptive feature interaction and target perception
Yan et al. Image captioning based on a hierarchical attention mechanism and policy gradient optimization
CN116662924A (en) Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant