CN114239539A - English composition off-topic detection method and device - Google Patents


Info

Publication number
CN114239539A
CN114239539A
Authority
CN
China
Prior art keywords
composition
detected
model
encoder model
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111571897.XA
Other languages
Chinese (zh)
Inventor
杨航
邓嘉
张新访
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Tianyu Information Industry Co Ltd
Original Assignee
Wuhan Tianyu Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Tianyu Information Industry Co Ltd filed Critical Wuhan Tianyu Information Industry Co Ltd
Priority to CN202111571897.XA priority Critical patent/CN114239539A/en
Publication of CN114239539A publication Critical patent/CN114239539A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/194Calculation of difference between files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures

Abstract

The invention discloses a method and a device for detecting off-topic English compositions, relating to the field of computer technology. The method comprises: constructing an encoder model based on self-supervised contrastive learning and performing fine-tuning training on it to obtain a target encoder model; inputting on-topic compositions and the composition to be detected into the target encoder model to obtain their embeddings; and judging whether the composition to be detected is off-topic based on the similarity between the embeddings of the on-topic compositions and the embedding of the composition to be detected. The invention effectively reduces the cost of off-topic composition detection.

Description

English composition off-topic detection method and device
Technical Field
The invention relates to the field of computer technology, and in particular to a method and a device for detecting off-topic English compositions.
Background
Currently, when an automatic scoring system grades English compositions, it must detect whether each composition is off-topic. A common off-topic detection method combines a topic model with word-vector representations, for example an LDA (Latent Dirichlet Allocation) model combined with word2vec (a model for generating word vectors), or variants of the two. The relevance of the composition to be detected to the given composition prompt is then calculated from the combined representation, and whether the composition is off-topic is determined according to that relevance.
The existing off-topic detection methods mainly suffer from the following problems: 1. word2vec vocabulary representations tend to ignore the associations between words in the current text, as well as the influence of sentence order and position on sentence semantics; 2. when detecting off-topic compositions for different prompts, no prompt-specific adaptation of the detection capability is performed; 3. the LDA model requires relatively many samples to obtain accurate topic information during detection, which makes detection costly.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method and a device for detecting off-topic English compositions that can effectively reduce the cost of off-topic composition detection.
To achieve the above purpose, the invention provides a method for detecting off-topic English compositions, comprising the following steps:
constructing an encoder model based on self-supervised contrastive learning, and performing fine-tuning training on the encoder model to obtain a target encoder model;
inputting on-topic compositions and the composition to be detected into the target encoder model to obtain embeddings of the on-topic compositions and of the composition to be detected;
and judging whether the composition to be detected is off-topic based on the similarity between the embeddings of the on-topic compositions and the embedding of the composition to be detected.
On the basis of the above technical solution, constructing the encoder model based on self-supervised contrastive learning comprises the following specific steps:
obtaining a BERT model or a RoBERTa model as the base model;
constructing positive and negative example pairs by a dropout-mask method based on an unlabeled text data set;
and inputting the constructed positive and negative example pairs into the base model to train it, thereby obtaining the encoder model.
On the basis of the above technical solution, the constructed positive and negative example pairs are input into the base model to train it and obtain the encoder model, where the loss function used for training the base model is:
$$\mathrm{loss}_i = -\log \frac{e^{\,\mathrm{sim}(z_i,\,z_i^{+})/\tau}}{\sum_{j=1}^{N} e^{\,\mathrm{sim}(z_i,\,z_j^{+})/\tau}}$$
where loss_i denotes the loss for the i-th sample, N the number of samples in the batch, log the logarithm function, e the natural constant, τ the temperature parameter, and sim the cosine similarity; z_i denotes the hidden-layer representation of the i-th sample (without the dropout mask), z_i^+ the hidden-layer representation of the i-th sample after the dropout mask, and z_j^+ the hidden-layer representation of the j-th sample after the dropout mask, where j ∈ [1, N] is the sample index.
On the basis of the above technical solution, performing fine-tuning training on the encoder model to obtain the target encoder model comprises the following specific steps:
obtaining labeled on-topic compositions for different composition prompts as the training set for fine-tuning;
enhancing the training set with data augmentation methods;
and inputting the enhanced training set into the encoder model and adjusting the parameters of its fully connected layer to obtain the target encoder model.
On the basis of the above technical solution, in the training set, on-topic compositions of the same prompt form positive example pairs, and on-topic compositions of different prompts form negative example pairs.
On the basis of the above technical solution, the on-topic compositions and the composition to be detected are input into the target encoder model to obtain their embeddings, where obtaining the embeddings of the on-topic compositions comprises the following specific steps:
acquiring several on-topic compositions written for the same prompt as the composition to be detected;
and inputting the acquired on-topic compositions into the target encoder model in turn to obtain their embeddings, which together form the embedding set of the on-topic compositions.
On the basis of the above technical solution, judging whether the composition to be detected is off-topic based on the similarity between the embeddings comprises the following specific steps:
computing the cosine similarity between the embedding of the composition to be detected and each embedding in the embedding set in turn, obtaining several similarity values;
selecting the maximum or the average of the obtained similarity values as the on-topic score of the composition to be detected;
and comparing the on-topic score with preset standard thresholds to judge whether the composition to be detected is off-topic.
On the basis of the above technical solution, the preset standard thresholds comprise a plagiarism-suspicion value and an off-topic-suspicion value.
On the basis of the above technical solution,
when the on-topic score is greater than the plagiarism-suspicion value, the composition to be detected is judged to be plagiarized;
when the on-topic score is not greater than the plagiarism-suspicion value and not less than the off-topic-suspicion value, the composition to be detected is judged to be on-topic;
and when the on-topic score is less than the off-topic-suspicion value, the composition to be detected is judged to be off-topic.
The invention further provides a device for detecting off-topic English compositions, comprising:
a training module for constructing an encoder model based on self-supervised contrastive learning and performing fine-tuning training on it to obtain a target encoder model;
an input module for inputting on-topic compositions and the composition to be detected into the target encoder model to obtain embeddings of the on-topic compositions and of the composition to be detected;
and a judging module for judging whether the composition to be detected is off-topic based on the similarity between the embeddings of the on-topic compositions and the embedding of the composition to be detected.
Compared with the prior art, the invention has the following advantages: an encoder model is constructed based on self-supervised contrastive learning and fine-tuned to obtain a target encoder model; the on-topic compositions and the composition to be detected are then input into the target encoder model to obtain their embeddings; and whether the composition to be detected is off-topic is judged from the similarity between these embeddings. This improves the performance of off-topic detection, reduces the dependence on samples for a specific prompt, allows the off-topic detection model to be built from only a small number of on-topic composition samples, and thus effectively reduces the cost of off-topic composition detection.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of a method for detecting off-topic English compositions in an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a device for detecting off-topic English compositions according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments.
Referring to FIG. 1, the method for detecting off-topic English compositions provided in an embodiment of the present invention is used to automatically detect whether an English composition is off-topic, and specifically comprises the following steps:
S1: constructing an encoder model (a common model framework) based on self-supervised contrastive learning, and performing fine-tuning training on the encoder model to obtain a target encoder model;
that is, self-supervised contrastive learning is applied to the base model to obtain the encoder model. A fully connected layer suited to off-topic detection training is then added, and fine-tuning training yields a target encoder model adapted to the off-topic detection task.
S2: inputting the on-topic compositions and the composition to be detected into the target encoder model to obtain their embeddings; for the same composition prompt, the composition to be detected is input into the target encoder model to obtain its embedding.
S3: judging whether the composition to be detected is off-topic based on the similarity between the embeddings of the on-topic compositions and the embedding of the composition to be detected, i.e., by comparing the similarity of the two embeddings.
In the embodiment of the invention, the encoder model is constructed based on self-supervised contrastive learning, specifically comprising the following steps:
S101: obtaining a BERT model (a self-encoding language model) or a RoBERTa model as the base model, on which further training is performed.
S102: constructing positive and negative example pairs by the dropout-mask method based on an unlabeled text data set; the training samples for the base model come from common public unlabeled text data sets, such as NLI (Natural Language Inference) data sets.
A positive example pair is constructed by the dropout-mask method: the same sentence is input into the model twice; owing to the randomness of dropout (a method for preventing overfitting), the two outputs differ, and when the dropout rate is small, the two output embeddings can be considered semantically similar. Likewise, different texts are chosen as negative example pairs.
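As an editorial illustration (not part of the patent text), the dropout-mask positive-pair construction described above can be sketched as follows. The encoder forward pass is replaced by a toy rescaled random mask; `encode_with_dropout` and the dropout rate are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_with_dropout(x, rate=0.1):
    """Toy stand-in for one encoder forward pass with a dropout mask:
    zeroes a random fraction `rate` of the features, then rescales
    by 1/(1 - rate) so the expected magnitude is preserved."""
    if rate == 0.0:
        return x.copy()
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

# The same sentence representation passed through the "model" twice
# yields two (almost surely) different vectors, because the two dropout
# masks differ -- these two views form a positive example pair.
sentence_repr = rng.normal(size=8)
view_a = encode_with_dropout(sentence_repr)
view_b = encode_with_dropout(sentence_repr)
```

With a small rate, the two views remain close to the original representation, which is why they can be treated as semantically similar.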
S103: inputting the constructed positive and negative example pairs into the base model to train it, thereby obtaining the encoder model. The goal of training is to pull positive pairs closer together and push negative pairs farther apart.
In the embodiment of the invention, the constructed positive and negative example pairs are input into the base model to train it and obtain the encoder model, where the loss function used for training the base model is:
$$\mathrm{loss}_i = -\log \frac{e^{\,\mathrm{sim}(z_i,\,z_i^{+})/\tau}}{\sum_{j=1}^{N} e^{\,\mathrm{sim}(z_i,\,z_j^{+})/\tau}}$$
where loss_i denotes the loss for the i-th sample, N the number of samples in the batch, log the logarithm function, e the natural constant, τ the temperature parameter, and sim the cosine similarity; z_i denotes the hidden-layer representation of the i-th sample (without the dropout mask), z_i^+ the hidden-layer representation of the i-th sample after the dropout mask, and z_j^+ the hidden-layer representation of the j-th sample after the dropout mask, where j ∈ [1, N] is the sample index.
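As an editorial illustration (not part of the patent text), the batch contrastive loss described above can be implemented directly from its definition; the temperature value 0.05 and the toy batch are assumptions for demonstration.

```python
import numpy as np

def cosine_sim(a, b):
    """sim(·,·) from the loss: cosine similarity of two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def contrastive_loss(z, z_plus, tau=0.05):
    """Mean over i of
    loss_i = -log( e^{sim(z_i, z_i+)/tau} / sum_j e^{sim(z_i, z_j+)/tau} ),
    where z[i] is the representation without the dropout mask and
    z_plus[i] the representation of the same sentence after the mask."""
    n = len(z)
    sims = np.array([[cosine_sim(z[i], z_plus[j]) for j in range(n)]
                     for i in range(n)]) / tau
    # log-softmax evaluated at the positive-pair (diagonal) entries
    losses = -(np.diag(sims) - np.log(np.exp(sims).sum(axis=1)))
    return losses.mean()

# Matched pairs with orthogonal negatives give a near-zero loss;
# deliberately mismatched pairs give a large loss.
z = np.array([[1.0, 0.0], [0.0, 1.0]])
low = contrastive_loss(z, z.copy())
high = contrastive_loss(z, z[::-1].copy())
```

Minimizing this loss pulls each positive pair together relative to all other samples in the batch, which is the training goal stated above.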
In the embodiment of the invention, fine-tuning training is performed on the encoder model to obtain the target encoder model, specifically comprising the following steps:
S111: obtaining labeled on-topic compositions for different composition prompts as the training set for fine-tuning; in the training set, on-topic compositions of the same prompt form positive example pairs, and on-topic compositions of different prompts form negative example pairs. On-topic composition data from different English composition examinations are obtained from public data and used as the fine-tuning data.
S112: enhancing the training set with data augmentation; in the present invention, the data augmentation methods include, but are not limited to, adversarial attack, word-order shuffling, cutting, dropout, and the like.
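As an editorial illustration (not part of the patent text), two of the augmentation methods named above, word-order shuffling and cutting, can be sketched on token lists; the function names and ratios are assumptions for demonstration.

```python
import random

def shuffle_word_order(text, rng=random.Random(42)):
    """Word-order shuffling: randomly permute the tokens of the text."""
    tokens = text.split()
    rng.shuffle(tokens)
    return " ".join(tokens)

def cutoff(text, drop_ratio=0.2, rng=random.Random(42)):
    """Cutting: drop a random fraction of the tokens, keeping order."""
    tokens = text.split()
    keep = max(1, int(len(tokens) * (1 - drop_ratio)))
    idx = sorted(rng.sample(range(len(tokens)), keep))
    return " ".join(tokens[i] for i in idx)

# Each augmented variant of a training composition is added to the set.
augmented = [shuffle_word_order("the quick brown fox jumps"),
             cutoff("the quick brown fox jumps")]
```

Both transforms preserve most of the text's meaning while changing its surface form, giving the fine-tuning step extra positive-pair variety.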
S113: and inputting the enhanced training set into an encoder model, and adjusting parameters of a full connection layer of the encoder model to obtain a target encoder model.
The training goal is to make the positive example pair closer and the negative example pair farther, and the corresponding objective loss function can be designed. And during fine tuning training, fixing partial parameters of the encoder model, fine tuning parameters of the full connection layer, and finally training to obtain the target encoder model.
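As an editorial illustration (not part of the patent text), the "fix part of the parameters, fine-tune the fully connected layer" scheme amounts to skipping frozen parameters in the update step; the parameter shapes and learning rate here are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy parameter set: the encoder weights are frozen during fine-tuning
# and only the appended fully connected (head) layer is updated.
params = {
    "encoder": rng.normal(size=(16, 8)),   # pretrained, frozen
    "head":    np.zeros((8, 8)),           # fully connected layer, trainable
}
frozen = {"encoder"}

def sgd_step(params, grads, lr=0.01):
    """One SGD update that skips every parameter marked as frozen."""
    for name in params:
        if name not in frozen:
            params[name] = params[name] - lr * grads[name]
    return params

encoder_before = params["encoder"].copy()
grads = {"encoder": np.ones((16, 8)), "head": np.ones((8, 8))}
params = sgd_step(params, grads)
# params["encoder"] is unchanged; params["head"] moved by -lr * grad
```

Freezing the encoder keeps the semantic-similarity capability learned in pretraining while the small head adapts to off-topic detection.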
In the embodiment of the invention, the on-topic compositions and the composition to be detected are input into the target encoder model to obtain their embeddings, where obtaining the embeddings of the on-topic compositions comprises the following specific steps:
S201: acquiring several on-topic compositions written for the same prompt as the composition to be detected;
S202: inputting the acquired on-topic compositions into the target encoder model in turn to obtain their embeddings, which together form the embedding set of the on-topic compositions.
That is, for the on-topic compositions sharing the prompt of the composition to be detected, each is input into the target encoder model to obtain its embedding, and the embeddings of all on-topic compositions are combined into an embedding set.
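As an editorial illustration (not part of the patent text), building the embedding set reduces to encoding each on-topic composition in turn. The `encode` stand-in (a hash-seeded random projection, deterministic within one run) and the example sentences are assumptions replacing the real target encoder model.

```python
import numpy as np

def encode(text, dim=8):
    """Hypothetical stand-in for the target encoder model: maps a
    composition to a fixed-size embedding (hash-seeded random vector,
    deterministic within a single run)."""
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).normal(size=dim)

# Several on-topic compositions for the same prompt (invented examples).
on_topic_compositions = [
    "My favourite season is spring because the weather turns warm.",
    "Spring brings new life, so it is the season I love most.",
    "Of all four seasons I prefer spring for its flowers and rain.",
]

# Encode each composition in turn; the vectors form the embedding set.
embedding_set = [encode(c) for c in on_topic_compositions]
```

The same `encode` call is later applied to the composition to be detected, so both sides of the similarity comparison live in one embedding space.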
In the embodiment of the invention, whether the composition to be detected is off-topic is judged based on the similarity between the embeddings, specifically comprising the following steps:
S301: computing the cosine similarity between the embedding of the composition to be detected and each embedding in the embedding set in turn; each computation yields one similarity value, giving several similarity values in total.
S302: selecting the maximum or the average of the obtained similarity values as the on-topic score of the composition to be detected, the on-topic score ranging over [0, 1];
S303: comparing the on-topic score with the preset standard thresholds to judge whether the composition to be detected is off-topic.
Specifically, the preset standard thresholds comprise a plagiarism-suspicion value and an off-topic-suspicion value; the plagiarism-suspicion value may be 0.95 and the off-topic-suspicion value 0.6, and both can be set flexibly according to the specific situation. When the on-topic score is greater than the plagiarism-suspicion value, the composition to be detected is judged to be plagiarized; when the on-topic score is not greater than the plagiarism-suspicion value and not less than the off-topic-suspicion value, the composition is judged to be on-topic; and when the on-topic score is less than the off-topic-suspicion value, the composition is judged to be off-topic.
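As an editorial illustration (not part of the patent text), the three-way threshold decision described above can be sketched as follows, using the example threshold values 0.95 and 0.6; the function names and toy embeddings are assumptions for demonstration.

```python
import numpy as np

# Example threshold values from the description above; both adjustable.
PLAGIARISM_THRESHOLD = 0.95   # plagiarism-suspicion value
OFF_TOPIC_THRESHOLD = 0.60    # off-topic-suspicion value

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def judge(candidate_embedding, embedding_set, use_max=True):
    """Score the candidate against every embedding in the on-topic set,
    take the maximum (or mean) similarity as the on-topic score, and
    apply the two thresholds."""
    sims = [cosine(candidate_embedding, e) for e in embedding_set]
    score = max(sims) if use_max else sum(sims) / len(sims)
    if score > PLAGIARISM_THRESHOLD:
        return "plagiarism"
    if score >= OFF_TOPIC_THRESHOLD:
        return "on-topic"
    return "off-topic"

on_topic_set = [np.array([1.0, 0.0]), np.array([0.8, 0.6])]
```

A candidate nearly identical to a reference composition lands above the plagiarism threshold, while one dissimilar to every reference falls below the off-topic threshold.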
The method improves the quality of text-encoding representations through self-supervised contrastive learning, fine-tuning training, data augmentation and related techniques, so that off-topic detection performs better and the dependence on samples of a particular prompt is reduced. First, the base encoder model is constructed; second, the reinforced encoder model, i.e., the target encoder model, is constructed and adapted to the off-topic detection task; finally, for off-topic detection on a specific prompt, the existing on-topic compositions for that prompt serve as a support set, which is encoded by the reinforced encoder model to obtain the embedding set of on-topic compositions. The composition to be detected is input into the reinforced encoder model to obtain its embedding, and the similarity between this embedding and the embedding set of on-topic compositions indicates how closely the composition matches the prompt, from which it is judged whether the composition is off-topic.
The retrained encoder model is well suited to the off-topic detection task: the base encoder model is trained with self-supervised contrastive learning, so its encodings measure text similarity well, and the retrained model inherits this semantic-similarity capability; the target encoder model is then fine-tuned on existing English composition data sets for the off-topic detection task and adapts to it even better. Moreover, off-topic detection for a composition requires only a small number of on-topic compositions. The method therefore also suits practical examination scenarios: since detection needs only a few on-topic compositions, only a small, suitable set needs to be selected to build the embedding set, so off-topic detection can be completed with little manual effort.
In summary, the English composition off-topic detection method constructs an encoder model based on self-supervised contrastive learning, fine-tunes it to obtain a target encoder model, inputs the on-topic compositions and the composition to be detected into the target encoder model to obtain their embeddings, and finally judges whether the composition to be detected is off-topic from the similarity between the embeddings. This improves the performance of off-topic detection, reduces the dependence on samples for a specific prompt, allows the detection model to be built from only a small number of on-topic composition samples, and effectively reduces the cost of off-topic composition detection.
Referring to FIG. 2, the device for detecting off-topic English compositions according to an embodiment of the present invention comprises a training module, an input module, and a judging module. The training module constructs an encoder model based on self-supervised contrastive learning and fine-tunes it to obtain a target encoder model; the input module inputs on-topic compositions and the composition to be detected into the target encoder model to obtain their embeddings; and the judging module judges whether the composition to be detected is off-topic based on the similarity between the embeddings of the on-topic compositions and the embedding of the composition to be detected.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (10)

1. A method for detecting off-topic English compositions, characterized by comprising the following steps:
constructing an encoder model based on self-supervised contrastive learning, and performing fine-tuning training on the encoder model to obtain a target encoder model;
inputting on-topic compositions and the composition to be detected into the target encoder model to obtain embeddings of the on-topic compositions and of the composition to be detected;
and judging whether the composition to be detected is off-topic based on the similarity between the embeddings of the on-topic compositions and the embedding of the composition to be detected.
2. The method for detecting off-topic English compositions of claim 1, characterized in that constructing the encoder model based on self-supervised contrastive learning comprises the following specific steps:
obtaining a BERT model or a RoBERTa model as the base model;
constructing positive and negative example pairs by a dropout-mask method based on an unlabeled text data set;
and inputting the constructed positive and negative example pairs into the base model to train it, thereby obtaining the encoder model.
3. The method of claim 2, characterized in that the constructed positive and negative example pairs are input into the base model to train it and obtain the encoder model, where the loss function used for training the base model is:
$$\mathrm{loss}_i = -\log \frac{e^{\,\mathrm{sim}(z_i,\,z_i^{+})/\tau}}{\sum_{j=1}^{N} e^{\,\mathrm{sim}(z_i,\,z_j^{+})/\tau}}$$
where loss_i denotes the loss for the i-th sample, N the number of samples in the batch, log the logarithm function, e the natural constant, τ the temperature parameter, and sim the cosine similarity; z_i denotes the hidden-layer representation of the i-th sample (without the dropout mask), z_i^+ the hidden-layer representation of the i-th sample after the dropout mask, and z_j^+ the hidden-layer representation of the j-th sample after the dropout mask, where j ∈ [1, N] is the sample index.
4. The method for detecting off-topic English compositions of claim 2, characterized in that performing fine-tuning training on the encoder model to obtain the target encoder model comprises the following specific steps:
obtaining labeled on-topic compositions for different composition prompts as the training set for fine-tuning;
enhancing the training set with data augmentation methods;
and inputting the enhanced training set into the encoder model and adjusting the parameters of its fully connected layer to obtain the target encoder model.
5. The method for detecting off-topic English compositions of claim 4, characterized in that: in the training set, on-topic compositions of the same prompt form positive example pairs, and on-topic compositions of different prompts form negative example pairs.
6. The method for detecting off-topic English compositions of claim 1, characterized in that the on-topic compositions and the composition to be detected are input into the target encoder model to obtain their embeddings, where obtaining the embeddings of the on-topic compositions comprises the following specific steps:
acquiring several on-topic compositions written for the same prompt as the composition to be detected;
and inputting the acquired on-topic compositions into the target encoder model in turn to obtain their embeddings, which together form the embedding set of the on-topic compositions.
7. The English composition off-topic detection method according to claim 6, wherein determining whether the composition to be detected is off-topic based on the similarity between the embeddings of the on-topic compositions and the embedding of the composition to be detected comprises the following steps:
computing the cosine similarity between the embedding of the composition to be detected and each embedding in the embedding set in turn, obtaining a plurality of similarity values;
taking the maximum or the average of the obtained similarity values as the topic-adherence score of the composition to be detected;
and comparing the topic-adherence score with preset standard thresholds to determine whether the composition to be detected is off-topic.
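The similarity and scoring steps of claim 7 can be sketched as follows (an illustrative implementation; the function names are assumptions):

```python
import math

def cosine(u, v):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def adherence_score(essay_emb, on_topic_embs, reduce="max"):
    # Compare the essay's embedding against each on-topic embedding,
    # then take the maximum or the average as the topic-adherence score.
    sims = [cosine(essay_emb, e) for e in on_topic_embs]
    return max(sims) if reduce == "max" else sum(sims) / len(sims)
```

The choice between maximum and average trades off sensitivity: the maximum rewards closeness to any single on-topic composition, while the average measures closeness to the prompt's compositions as a whole.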
8. The English composition off-topic detection method according to claim 7, wherein the preset standard thresholds comprise a plagiarism-suspicion threshold and an off-topic-suspicion threshold.
9. The English composition off-topic detection method according to claim 8, wherein:
when the topic-adherence score is greater than the plagiarism-suspicion threshold, the composition to be detected is judged to be plagiarized;
when the topic-adherence score is not greater than the plagiarism-suspicion threshold and not less than the off-topic-suspicion threshold, the composition to be detected is judged to be on-topic;
and when the topic-adherence score is less than the off-topic-suspicion threshold, the composition to be detected is judged to be off-topic.
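The three-way decision of claim 9 reduces to a pair of threshold comparisons. A sketch, with illustrative threshold values (the patent does not specify them):

```python
def classify(score, plagiarism_thr=0.95, off_topic_thr=0.4):
    """Map a topic-adherence score to a verdict per claim 9.

    The threshold values here are assumptions for illustration only.
    """
    if score > plagiarism_thr:
        return "plagiarism"   # suspiciously close to an on-topic composition
    if score < off_topic_thr:
        return "off-topic"    # too dissimilar from every on-topic composition
    return "on-topic"         # between the two thresholds
```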
10. An English composition off-topic detection device, characterized by comprising:
a training module for constructing an encoder model based on self-supervised contrastive learning and fine-tuning the encoder model to obtain a target encoder model;
an input module for inputting the on-topic compositions and the composition to be detected separately into the target encoder model to obtain their embeddings;
and a judging module for determining whether the composition to be detected is off-topic based on the similarity between the embeddings of the on-topic compositions and the embedding of the composition to be detected.
CN202111571897.XA 2021-12-21 2021-12-21 English composition off-topic detection method and device Pending CN114239539A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111571897.XA CN114239539A (en) 2021-12-21 2021-12-21 English composition off-topic detection method and device

Publications (1)

Publication Number Publication Date
CN114239539A true CN114239539A (en) 2022-03-25

Family

ID=80760480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111571897.XA Pending CN114239539A (en) 2021-12-21 2021-12-21 English composition off-topic detection method and device

Country Status (1)

Country Link
CN (1) CN114239539A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881043A (en) * 2022-07-11 2022-08-09 Sichuan University Deep learning model-based legal document semantic similarity evaluation method and system
CN114881043B (en) * 2022-07-11 2022-11-18 Sichuan University Deep learning model-based legal document semantic similarity evaluation method and system

Similar Documents

Publication Publication Date Title
CN110489555B (en) Language model pre-training method combined with similar word information
CN107092596B (en) Text emotion analysis method based on attention CNNs and CCR
US11531874B2 (en) Regularizing machine learning models
CN110263854B (en) Live broadcast label determining method, device and storage medium
CN115357719B (en) Power audit text classification method and device based on improved BERT model
CN116992005B (en) Intelligent dialogue method, system and equipment based on large model and local knowledge base
CN110321434A (en) A kind of file classification method based on word sense disambiguation convolutional neural networks
CN115146629A (en) News text and comment correlation analysis method based on comparative learning
CN112818110A (en) Text filtering method, text filtering equipment and computer storage medium
CN107451116B (en) Statistical analysis method for mobile application endogenous big data
CN112417132A (en) New intention recognition method for screening negative samples by utilizing predicate guest information
CN113836894B (en) Multi-dimensional English composition scoring method and device and readable storage medium
CN114239539A (en) English composition off-topic detection method and device
CN114398900A (en) Long text semantic similarity calculation method based on RoBERTA model
CN111737475B (en) Unsupervised network public opinion spam long text recognition method
CN112528628A (en) Text processing method and device and electronic equipment
CN116681056B (en) Text value calculation method and device based on value scale
CN112116181B (en) Classroom quality model training method, classroom quality evaluation method and classroom quality evaluation device
CN117216214A (en) Question and answer extraction generation method, device, equipment and medium
CN112989816B (en) Text content quality evaluation method and system
CN115796141A (en) Text data enhancement method and device, electronic equipment and storage medium
CN115587163A (en) Text classification method and device, electronic equipment and storage medium
CN114595684A (en) Abstract generation method and device, electronic equipment and storage medium
CN114722832A (en) Abstract extraction method, device, equipment and storage medium
CN110347824B (en) Method for determining optimal number of topics of LDA topic model based on vocabulary similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination