CN117313709A - Method for detecting generated text based on statistical information and pre-training language model - Google Patents
- Publication number
- CN117313709A CN117313709A CN202311614320.1A CN202311614320A CN117313709A CN 117313709 A CN117313709 A CN 117313709A CN 202311614320 A CN202311614320 A CN 202311614320A CN 117313709 A CN117313709 A CN 117313709A
- Authority
- CN
- China
- Prior art keywords
- text
- model
- statistical
- learning model
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/27—Regression, e.g. linear or logistic regression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/096—Transfer learning
Abstract
The invention relates to the technical field of generated-text detection and discloses a method for detecting generated text based on statistical information and a pre-training language model, in which the class label of a text is detected by a detection model composed of a statistical learning model, a deep learning model and a dynamic fusion framework. The detection model is constructed as follows: construct the statistical learning model; construct the deep learning model; construct the dynamic fusion framework; then, on a training dataset, train the detection model by computing the cross-entropy loss between the dynamically fused class-label probability distribution and the true class labels. The statistical learning model effectively alleviates poor model transferability when labeled data from multiple domains is limited; the deep learning model avoids hand-crafted feature design and can extract more implicit features; and the dynamic fusion framework improves the model's transferability at the cost of only a small loss in detection accuracy.
Description
Technical Field
The invention relates to the technical field of generated text detection, in particular to a generated text detection method based on statistical information and a pre-training language model.
Background
With the development of large-scale language models, generated text increasingly resembles human writing. This also raises a serious security problem: machine-generated text can be used to maliciously mislead people. Generated-text detection systems, which aim to distinguish whether a text was written by a machine or a human, have therefore become a research hotspot in natural language processing in recent years. Statistical learning models do not require large amounts of labeled training data and transfer easily to new domains, but their detection accuracy tends to be low. Deep learning models extract features automatically, avoiding the inconvenience and brittleness of hand-crafted rules and features; they can capture more implicit features and achieve better detection accuracy. However, training them requires large amounts of in-domain labeled data, and their detection accuracy drops sharply when they are transferred to a new domain. Since obtaining high-quality labeled data across multiple domains is usually time-consuming and labor-intensive in real-world scenarios, building a well-performing generated-text detection system under limited resources and data is a significant challenge.
Given that the statistical learning model transfers well but performs poorly, while the deep learning model performs well but transfers poorly, the invention combines the statistical learning model and the deep learning model to address poor generated-text detection across multiple domains.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a method for detecting generated text based on statistical information and a pre-training language model, in which statistical features such as perplexity and word frequency are obtained through a language model, deep features of the text are extracted through a deep learning model, probability calibration is performed on the prediction results of the two kinds of features, and finally a dynamically fused prediction for the text is produced.
In order to solve the technical problems, the invention adopts the following technical scheme:
a method for detecting generated text based on statistical information and a pre-training language model detects the class label of a text through a detection model composed of a statistical learning model, a deep learning model and a dynamic fusion framework; the training dataset used to train the detection model is denoted D = {x_i}_{i=1}^N, with the corresponding label set Y = {y_i}_{i=1}^N and y_i ∈ L, where L = {human, machine} is the label set, N is the length of the training dataset, and y_i is the class label corresponding to text x_i; a text x_i is a word sequence x_i = (w_1, w_2, ..., w_n), where w_j is the j-th word of the i-th text x_i and n is the length of x_i;
the construction method of the detection model comprises the following steps:
step one, constructing a statistical learning model:
the statistical learning model uses an autoregressive language model; the autoregressive language model yields the generation probability p(w_j | w_1, ..., w_{j-1}) of each word in the text to be detected, and the numbers of words in the text whose vocabulary ranks fall within the top ten, top hundred and top thousand are counted and denoted c_10, c_100 and c_1000 respectively; the text probability p(x_i) is computed from the per-word generation probabilities, and the perplexity PPL(x_i) of the text is computed from p(x_i); c_10, c_100, c_1000 and the perplexity of the text are used as statistical features, and a logistic regression classifier yields the class-label probability distribution P_stat of the text to be detected, predicted from the statistical features;
Step two, constructing a deep learning model:
the deep learning model uses an auto-encoding language model; after the text to be detected is encoded by the auto-encoding language model, the vector representation h_[CLS] of the start token [CLS] of the text is taken as the semantic representation of the whole text, and a fully connected network followed by a classifier network yields the class-label probability distribution P_deep of the text to be detected, predicted from the deep encoding features;
Step three, constructing a dynamic fusion framework:
label smoothing extends the value range of the original one-hot labels from {0, 1} to [ε/K, 1 − ε + ε/K], where ε is a constant representing the degree of smoothing; the true probability distribution p̃(k | x_i) of the predicted class after label smoothing becomes:

p̃(k | x_i) = (1 − ε) · 𝟙(k = y_i) + ε/K;

where k denotes a class label predicted by the statistical learning model or the deep learning model, for both of which a cross-entropy loss function is used, so k refers collectively to the class labels predicted by the two models (also written k_stat and k_deep); y_i is the true class label and K is the total number of class labels; the cross-entropy loss function of the detection model is:

L = − Σ_{k=1}^{K} p̃(k | x_i) log P(k | x_i);

which replaces the original cross-entropy losses of the logistic regression classifier and the classifier network; finally, the dynamically fused class-label probability distribution P predicted from the two kinds of features is obtained through dynamic fusion: P = α · P_stat + β · P_deep; where α and β are both weight parameters;
step four, based on the training dataset, the detection model is trained by computing the cross-entropy loss function L over P_stat and P_deep.
Further, in step one, the text probability p(x_i) is computed from the per-word generation probabilities p(w_j | w_1, ..., w_{j-1}) as:

p(x_i) = Π_{j=1}^{n} p(w_j | w_1, ..., w_{j-1});

where p(w_j | w_1, ..., w_{j-1}) denotes the conditional probability of the j-th word given the preceding words.
Further, in step one, the perplexity of the text is computed from p(x_i) as:

PPL(x_i) = p(x_i)^(−1/n).
Further, in step one, c_10, c_100, c_1000 and the perplexity of the text are used as statistical features, and the class-label probability distribution P_stat of the text predicted from the statistical features is obtained through a logistic regression classifier as:

P_stat = LR([c_10 ; c_100 ; c_1000 ; PPL(x_i)]);

where LR is the logistic regression classifier and [· ; ·] denotes the concatenation operation.
Further, in step two, the vector representation h_[CLS] of the start token [CLS] of the text is taken as the semantic representation of the whole text, and the class-label probability distribution P_deep of the text to be detected predicted from the deep encoding features is obtained through the fully connected network and the classifier network as:

P_deep = σ(W · h_[CLS] + b);

where σ is the activation function of the classifier network, W is the fully connected network, and b is a bias parameter.
Compared with the prior art, the invention has the beneficial technical effects that:
The detection model comprises a statistical learning model, a deep learning model and a dynamic fusion framework. The statistical learning model provides statistical features, effectively alleviating poor model transferability when labeled data from multiple domains is limited. The deep learning model avoids hand-crafted feature design and can extract more implicit features; by virtue of its strong encoding capability, the pre-training language model provides latent inter-word correlation features, improving the model's detection accuracy. The dynamic fusion framework, on the one hand, calibrates the model's probabilities with label smoothing, bringing the predicted probabilities closer to the true probabilities; on the other hand, it combines the advantages of the statistical learning model and the deep learning model, greatly improving the model's transferability at the cost of only a small loss in detection accuracy, achieving good detection results in new domains, and thus has very broad application prospects.
Drawings
FIG. 1 is a schematic diagram of a detection model in an embodiment of the invention.
Detailed Description
A preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.
In the present invention, the training dataset D = {x_i}_{i=1}^N has the corresponding label set Y = {y_i}_{i=1}^N with y_i ∈ L; the label set is L = {human, machine}, where human indicates a human-written text, machine indicates a machine-generated text, and N is the length of the training dataset. A text x_i is a word sequence x_i = (w_1, w_2, ..., w_n), where w_j is the j-th word of the i-th text x_i and n is the length of x_i. The goal of the task is to learn a function f: x_i → y_i that predicts the correct class label y_i for a given text x_i.
The detection model provided by the invention is shown in FIG. 1 and comprises the following three parts: (1) a statistical learning model; (2) a deep learning model; (3) a dynamic fusion framework.
(1) Statistical learning model
The body of the statistical learning model uses an autoregressive language model such as GPT-2, because the generation process of an autoregressive language model closely mirrors the process by which humans generate text. These models predict the next word or token autoregressively from the previously generated ones, thereby progressively generating semantically coherent text. Language models tend to sample words with a high generation probability, whereas the words chosen by humans are more random. A language model such as GPT-2 is therefore used to obtain the generation probability p(w_j | w_1, ..., w_{j-1}) of each word, i.e. the predictive probability distribution of the j-th word conditioned on the preceding j − 1 words, and the numbers of words in the text whose vocabulary ranks fall within the top ten, top hundred and top thousand are counted and denoted c_10, c_100 and c_1000 respectively.
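The rank-count statistics can be sketched as follows; this is an illustrative reading of the patent, not its implementation. The function name and threshold tuple are assumptions, and the per-word ranks (rank 1 = the model's most probable word) are taken as given from an autoregressive language model:

```python
def rank_count_features(ranks, thresholds=(10, 100, 1000)):
    """Count how many words of a text fall within each top-k bucket of the
    language model's vocabulary ranking (i.e. c_10, c_100, c_1000)."""
    return [sum(1 for r in ranks if r <= t) for t in thresholds]

# Hypothetical per-word ranks for a five-word text: two words rank in the
# top ten, one more in the top hundred, one more in the top thousand.
c10, c100, c1000 = rank_count_features([3, 7, 42, 600, 2500])
```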
The probability p(x_i) of each text is first computed from the per-word generation probabilities:

p(x_i) = Π_{j=1}^{n} p(w_j | w_1, ..., w_{j-1});
from which the perplexity of each text is computed:

PPL(x_i) = p(x_i)^(−1/n).
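A minimal sketch of the two quantities above, given per-word generation probabilities from an autoregressive model. The function names are assumptions, and computing in log space is an implementation choice to avoid numerical underflow on long texts:

```python
import math

def text_log_prob(word_probs):
    """log p(x) = sum_j log p(w_j | w_1, ..., w_{j-1})."""
    return sum(math.log(p) for p in word_probs)

def perplexity(word_probs):
    """PPL(x) = p(x)^(-1/n), evaluated in log space."""
    n = len(word_probs)
    return math.exp(-text_log_prob(word_probs) / n)

# Sanity check: uniform probability 1/4 over four words gives PPL = 4.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```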
The word-rank statistics and the perplexity of the text are used as statistical features, and a logistic regression classifier yields the class-label probability distribution P_stat of the input text predicted from the statistical features:

P_stat = LR([c_10 ; c_100 ; c_1000 ; PPL(x_i)]);

where LR is the logistic regression classifier and [· ; ·] denotes the concatenation operation.
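A sketch of the statistical branch using scikit-learn's LogisticRegression; the feature values, the label encoding (1 = machine, 0 = human) and the tiny training set are all fabricated for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical rows of statistical features [c_10, c_100, c_1000, PPL]:
X = np.array([
    [40.0, 80.0, 95.0, 12.0],   # machine-like: common words, low perplexity
    [38.0, 78.0, 96.0, 10.5],
    [15.0, 40.0, 70.0, 55.0],   # human-like: rarer words, higher perplexity
    [12.0, 35.0, 65.0, 60.0],
])
y = np.array([1, 1, 0, 0])      # assumed encoding: 1 = machine, 0 = human

lr = LogisticRegression(max_iter=1000).fit(X, y)
p_stat = lr.predict_proba(X)    # class-label probability distribution P_stat
```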
(2) Deep learning model
The body of the deep learning model uses an auto-encoding language model such as BERT rather than an autoregressive language model, because auto-encoding language models generally perform better on language-understanding tasks. After the text is encoded by a language model such as BERT, the vector representation h_[CLS] of the start token [CLS] of the text is taken as the semantic representation of the whole text; the class-label probability distribution P_deep of the input text predicted from the deep encoding features is then obtained through a fully connected network and a classifier network:

P_deep = σ(W · h_[CLS] + b);

where σ is the activation function of the classifier network, W is the fully connected network, and b is a bias parameter.
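The classifier head over the [CLS] vector reduces to a linear map plus a normalizing activation; a NumPy sketch in which shapes and values are illustrative, σ is taken to be softmax, and an actual implementation would take h_[CLS] from a BERT encoder:

```python
import numpy as np

def softmax(z):
    z = z - z.max()             # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def deep_head(h_cls, W, b):
    """P_deep = sigma(W . h_cls + b): fully connected layer over the [CLS]
    vector followed by the classifier activation."""
    return softmax(W @ h_cls + b)

# Toy 4-dim [CLS] vector mapped to K = 2 classes; zero weights yield the
# uniform distribution, as expected for an untrained head.
h_cls = np.array([0.2, -0.1, 0.5, 0.3])
p_deep = deep_head(h_cls, np.zeros((2, 4)), np.zeros(2))
```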
(3) Dynamic fusion framework
When performing classification tasks, one usually only cares whether the model's output exceeds a certain threshold, not how confident the model is. In the field of generated-text detection, however, the confidence measure also matters. Model calibration aims to keep the model's predicted probabilities consistent with the true empirical probabilities, i.e. to make the predicted probability as close as possible to the true probability. The present invention uses label smoothing to extend the value range of the original one-hot labels from {0, 1} to the wider range [ε/K, 1 − ε + ε/K], where ε is a small number indicating the degree of smoothing. The true probability distribution p̃(k | x_i) of the predicted class after label smoothing becomes:

p̃(k | x_i) = (1 − ε) · 𝟙(k = y_i) + ε/K.
where k denotes a class label predicted by the statistical learning model and the deep learning model, y_i is the true class label, and K is the total number of classes; in this embodiment K = 2. The cross-entropy loss function L changes to:

L = − Σ_{k=1}^{K} p̃(k | x_i) log P(k | x_i).
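Label smoothing and the smoothed cross-entropy follow directly from the two formulas above; in this sketch the function names are assumptions and ε = 0.1 is just an example value:

```python
import math

def smooth_labels(y_true, num_classes=2, eps=0.1):
    """q(k) = (1 - eps) * 1[k == y_true] + eps/K; the one-hot range {0, 1}
    shrinks to [eps/K, 1 - eps + eps/K]."""
    q = [eps / num_classes] * num_classes
    q[y_true] += 1.0 - eps
    return q

def cross_entropy(q, p):
    """L = -sum_k q(k) * log p(k)."""
    return -sum(qk * math.log(pk) for qk, pk in zip(q, p))

q = smooth_labels(1, num_classes=2, eps=0.1)   # [0.05, 0.95]
```

With K = 2 and ε = 0.1, the smoothed target puts 0.95 on the true class and 0.05 on the other, so a well-calibrated prediction near [0.05, 0.95] incurs a lower loss than an uncertain one.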
This smoothed loss replaces the original cross-entropy losses of the logistic regression classifier and the classifier network. Finally, the dynamically fused class-label probability distribution P predicted from the two kinds of features is obtained through dynamic fusion:

P = α · P_stat + β · P_deep;

where α, β ∈ [0, 1] and α + β = 1. α and β control the weight of each input probability distribution, and the best result is obtained by tuning α and β.
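The fusion step itself is a convex combination of the two predicted distributions; a one-function sketch under the constraint α + β = 1 stated above (the function name is an assumption):

```python
def dynamic_fusion(p_stat, p_deep, alpha):
    """P = alpha * P_stat + beta * P_deep with beta = 1 - alpha; a convex
    combination of two probability distributions is again a distribution."""
    beta = 1.0 - alpha
    return [alpha * s + beta * d for s, d in zip(p_stat, p_deep)]

fused = dynamic_fusion([0.8, 0.2], [0.4, 0.6], alpha=0.5)   # [0.6, 0.4]
```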
Based on the training dataset, the detection model is trained by computing the cross-entropy loss function L over P_stat and P_deep.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although the present specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted merely for clarity, and the specification should be taken as a whole, with the technical solutions of the embodiments combined as appropriate to form other implementations that will be understood by those skilled in the art.
Claims (5)
1. A method for detecting generated text based on statistical information and a pre-training language model, wherein the class label of a text is detected through a detection model composed of a statistical learning model, a deep learning model and a dynamic fusion framework; the training dataset used to train the detection model is denoted D = {x_i}_{i=1}^N, with the corresponding label set Y = {y_i}_{i=1}^N and y_i ∈ L, where L = {human, machine} is the label set, N is the length of the training dataset, and y_i is the class label corresponding to text x_i; a text x_i is a word sequence x_i = (w_1, w_2, ..., w_n), where w_j is the j-th word of the i-th text x_i and n is the length of x_i;
the construction method of the detection model comprises the following steps:
step one, constructing a statistical learning model:
the statistical learning model uses an autoregressive language model; the autoregressive language model yields the generation probability p(w_j | w_1, ..., w_{j-1}) of each word in the text to be detected, and the numbers of words in the text whose vocabulary ranks fall within the top ten, top hundred and top thousand are counted and denoted c_10, c_100 and c_1000 respectively; the text probability p(x_i) is computed from the per-word generation probabilities, and the perplexity PPL(x_i) of the text is computed from p(x_i); c_10, c_100, c_1000 and the perplexity of the text are used as statistical features, and a logistic regression classifier yields the class-label probability distribution P_stat of the text to be detected, predicted from the statistical features;
Step two, constructing a deep learning model:
the deep learning model uses an auto-encoding language model; after the text to be detected is encoded by the auto-encoding language model, the vector representation h_[CLS] of the start token [CLS] of the text is taken as the semantic representation of the whole text, and a fully connected network followed by a classifier network yields the class-label probability distribution P_deep of the text to be detected, predicted from the deep encoding features;
Step three, constructing a dynamic fusion framework:
label smoothing extends the value range of the original one-hot labels from {0, 1} to [ε/K, 1 − ε + ε/K], where ε is a constant representing the degree of smoothing; the true probability distribution p̃(k | x_i) of the predicted class after label smoothing becomes:

p̃(k | x_i) = (1 − ε) · 𝟙(k = y_i) + ε/K;

where k denotes a class label predicted by the statistical learning model and the deep learning model, y_i is the true class label, and K is the total number of class labels; the cross-entropy loss function of the detection model is:

L = − Σ_{k=1}^{K} p̃(k | x_i) log P(k | x_i);

which replaces the original cross-entropy losses of the logistic regression classifier and the classifier network; finally, the dynamically fused class-label probability distribution P predicted from the two kinds of features is obtained through dynamic fusion: P = α · P_stat + β · P_deep; where α and β are both weight parameters;

step four, based on the training dataset, the detection model is trained by computing the cross-entropy loss function L over P_stat and P_deep.
2. The method for detecting generated text based on statistical information and a pre-training language model according to claim 1, wherein in step one the text probability p(x_i) is computed from the per-word generation probabilities as:

p(x_i) = Π_{j=1}^{n} p(w_j | w_1, ..., w_{j-1});

where p(w_j | w_1, ..., w_{j-1}) denotes the conditional probability of the j-th word given the preceding words.
3. The method for detecting generated text based on statistical information and a pre-training language model according to claim 1, wherein in step one the perplexity of the text is computed from p(x_i) as:

PPL(x_i) = p(x_i)^(−1/n).
4. The method for detecting generated text based on statistical information and a pre-training language model according to claim 1, wherein in step one, c_10, c_100, c_1000 and the perplexity of the text are used as statistical features, and the class-label probability distribution P_stat of the text predicted from the statistical features is obtained through a logistic regression classifier as:

P_stat = LR([c_10 ; c_100 ; c_1000 ; PPL(x_i)]);

where LR is the logistic regression classifier and [· ; ·] denotes the concatenation operation.
5. The method for detecting generated text based on statistical information and a pre-training language model according to claim 1, wherein in step two, the vector representation h_[CLS] of the start token [CLS] of the text is taken as the semantic representation of the whole text, and the class-label probability distribution P_deep of the text to be detected predicted from the deep encoding features is obtained through the fully connected network and the classifier network as:

P_deep = σ(W · h_[CLS] + b);

where σ is the activation function of the classifier network, W is the fully connected network, and b is a bias parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311614320.1A CN117313709B (en) | 2023-11-29 | 2023-11-29 | Method for detecting generated text based on statistical information and pre-training language model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117313709A true CN117313709A (en) | 2023-12-29 |
CN117313709B CN117313709B (en) | 2024-03-29 |
Family
ID=89250323
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311614320.1A Active CN117313709B (en) | 2023-11-29 | 2023-11-29 | Method for detecting generated text based on statistical information and pre-training language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117313709B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117556817A (en) * | 2024-01-10 | 2024-02-13 | 国开启科量子技术(安徽)有限公司 | Text detection method, device, equipment and medium based on quantum circuit |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180113856A1 (en) * | 2016-10-26 | 2018-04-26 | Abbyy Infopoisk Llc | Producing training sets for machine learning methods by performing deep semantic analysis of natural language texts |
US20200257943A1 (en) * | 2019-02-11 | 2020-08-13 | Hrl Laboratories, Llc | System and method for human-machine hybrid prediction of events |
WO2020248471A1 (en) * | 2019-06-14 | 2020-12-17 | 华南理工大学 | Aggregation cross-entropy loss function-based sequence recognition method |
CN112214599A (en) * | 2020-10-20 | 2021-01-12 | 电子科技大学 | Multi-label text classification method based on statistics and pre-training language model |
US20210365773A1 (en) * | 2020-05-22 | 2021-11-25 | Element Ai Inc. | Method of and system for training machine learning algorithm to generate text summary |
WO2022036616A1 (en) * | 2020-08-20 | 2022-02-24 | 中山大学 | Method and apparatus for generating inferential question on basis of low labeled resource |
US20220067558A1 (en) * | 2020-09-03 | 2022-03-03 | International Business Machines Corporation | Artificial intelligence explaining for natural language processing |
US20220083739A1 (en) * | 2020-09-14 | 2022-03-17 | Smart Information Flow Technologies, Llc, D/B/A Sift L.L.C. | Machine learning for joint recognition and assertion regression of elements in text |
CN115081437A (en) * | 2022-07-20 | 2022-09-20 | 中国电子科技集团公司第三十研究所 | Machine-generated text detection method and system based on linguistic feature contrast learning |
US20230153546A1 (en) * | 2020-07-13 | 2023-05-18 | Ai21 Labs | Controllable reading guides and natural language generation |
CN116757164A (en) * | 2023-06-21 | 2023-09-15 | 张丽莉 | GPT generation language recognition and detection system |
CN116775880A (en) * | 2023-06-29 | 2023-09-19 | 重庆邮电大学 | Multi-label text classification method and system based on label semantics and transfer learning |
CN117131877A (en) * | 2023-09-12 | 2023-11-28 | 广州木木信息科技有限公司 | Text detection method and system based on contrast learning |
Non-Patent Citations (2)
Title |
---|
Li Zhoujun; Fan Yu; Wu Xianjie: "A Survey of Pre-training Techniques for Natural Language Processing", Computer Science, no. 03, 24 March 2020 (2020-03-24) *
Huang Lu; Zhou Enguo; Li Daifeng: "A Text Representation Learning Model Incorporating a Task-Specific Information Attention Mechanism", Data Analysis and Knowledge Discovery, no. 09, 9 June 2020 (2020-06-09) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117556817A (en) * | 2024-01-10 | 2024-02-13 | 国开启科量子技术(安徽)有限公司 | Text detection method, device, equipment and medium based on quantum circuit |
CN117556817B (en) * | 2024-01-10 | 2024-05-24 | 国开启科量子技术(安徽)有限公司 | Quantum circuit-based large model generation text detection method, device and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN117313709B (en) | 2024-03-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110502749B (en) | Text relation extraction method based on double-layer attention mechanism and bidirectional GRU | |
CN110609891B (en) | Visual dialog generation method based on context awareness graph neural network | |
CN113254599B (en) | Multi-label microblog text classification method based on semi-supervised learning | |
CN110245229B (en) | Deep learning theme emotion classification method based on data enhancement | |
CN109086658B (en) | Sensor data generation method and system based on generation countermeasure network | |
CN108416065B (en) | Hierarchical neural network-based image-sentence description generation system and method | |
Liu et al. | Chinese image caption generation via visual attention and topic modeling | |
Li et al. | Improving convolutional neural network for text classification by recursive data pruning | |
CN109948158A (en) | Emotional orientation analytical method based on environment member insertion and deep learning | |
CN108984530A (en) | A kind of detection method and detection system of network sensitive content | |
CN108595601A (en) | A kind of long text sentiment analysis method incorporating Attention mechanism | |
CN110909736B (en) | Image description method based on long-term and short-term memory model and target detection algorithm | |
Li et al. | Image sentiment prediction based on textual descriptions with adjective noun pairs | |
CN113536922A (en) | Video behavior identification method for weighting fusion of multiple image tasks | |
CN111522908A (en) | Multi-label text classification method based on BiGRU and attention mechanism | |
CN117313709B (en) | Method for detecting generated text based on statistical information and pre-training language model | |
CN110727844B (en) | Online commented commodity feature viewpoint extraction method based on generation countermeasure network | |
JP2015511733A (en) | How to classify text | |
Yin et al. | Sentiment lexical-augmented convolutional neural networks for sentiment analysis | |
CN109670169B (en) | Deep learning emotion classification method based on feature extraction | |
Yue et al. | Multi-task adversarial autoencoder network for face alignment in the wild | |
Chakraborty et al. | Sign Language Recognition Using Landmark Detection, GRU and LSTM | |
CN113722536A (en) | Video description method based on bilinear adaptive feature interaction and target perception | |
Yan et al. | Image captioning based on a hierarchical attention mechanism and policy gradient optimization | |
CN116662924A (en) | Aspect-level multi-mode emotion analysis method based on dual-channel and attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||