CN116610770B - Judicial field case pushing method based on big data - Google Patents

Judicial field case pushing method based on big data

Info

Publication number
CN116610770B
CN116610770B (application number CN202310464853.XA)
Authority
CN
China
Prior art keywords
sample
original
judicial
judicial field
original sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310464853.XA
Other languages
Chinese (zh)
Other versions
CN116610770A (en)
Inventor
王进
王一雄
周羽
李俊莲
曾思盈
周青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huoyan Jinjing Data Services Xiongan Co ltd
Yami Technology Guangzhou Co ltd
Original Assignee
Huoyan Jinjing Data Services Xiongan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huoyan Jinjing Data Services Xiongan Co ltd
Priority to CN202310464853.XA
Publication of CN116610770A
Application granted
Publication of CN116610770B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/3331 - Query processing
    • G06F 16/334 - Query execution
    • G06F 16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of natural language processing, and in particular to a judicial field case pushing method based on big data. A judicial field document is uploaded to a database for matching; the document and its matching data are input into a trained similar-case similarity calculation model, which outputs the similarity between the document and each piece of matching data; all similarities are sorted in descending order, and the matching data corresponding to the top k similarities are selected for pushing. The method counters the tendency of document-text features to collapse toward similar representations in the pre-trained model, and performs data augmentation through data perturbation. It thereby avoids the high time and labor cost of constructing supervised samples for similar-case pushing in the judicial field, completes the pushing efficiently, cheaply, and automatically, and helps judicial practitioners quickly obtain information and prior judgment results related to the cases they are handling.

Description

Judicial field case pushing method based on big data
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a judicial field case pushing method based on big data.
Background
In the judicial field, the demand for similar-case pushing arises because judicial personnel need to find, quickly and accurately, cases similar to the one at hand among a large number of cases, so as to better understand the relevant legal provisions, judgment standards, and other circumstances. Most traditional case pushing methods are based on text similarity algorithms, which find cases similar to the current one by matching the cases' text information. Owing to the specificity of judicial case data, such as the complexity and variety of the cases involved and differing judgment standards, case pushing based on a plain text similarity algorithm can hardly reflect the similarity between cases accurately. The demand for automated, intelligent case pushing technology in the judicial field has therefore become increasingly urgent. With the rapid development of technology, big data has driven the digital transformation of the judicial industry, enabling automated analysis and mining of large volumes of case data, thereby better serving judicial practice and improving the scientific soundness and accuracy of judicial decisions. Similar-case pushing technology based on big data thus has broad application prospects in the market.
In recent years, with the rapid development of pre-trained language models, text similarity algorithms have performed increasingly well. The BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model excels at tasks such as text similarity: it can be pre-trained on large-scale unsupervised data to acquire rich semantic information, and then fine-tuned on small-scale supervised data for a specific task. In practical applications, however, pre-trained language models such as BERT prove prone to semantic collapse when processing long texts, i.e., mapping long texts that differ in meaning or expression into nearly identical vector representations. As a result, similarity evaluation performs poorly and can hardly reflect the true similarity between texts.
Disclosure of Invention
The invention aims to provide a judicial field case pushing method based on big data that accounts for the specificity of judicial case data. Conventional text similarity algorithms neglect word order and semantic information in texts, evaluate similarity poorly when texts share surface expressions but differ in meaning, and therefore fail to truly meet the needs of judicial personnel; their case pushing results also lack reliability and verifiability. The invention addresses these problems.
The specific scheme provided by the invention comprises the following steps: uploading the judicial field document to a database for matching; inputting the judicial field document and its matching data into a trained similar-case similarity calculation model, and outputting the similarity between the judicial field document and each piece of matching data; and sorting all the similarities in descending order and selecting the matching data corresponding to the top k similarities for pushing;
the training process of the similar-case similarity calculation model comprises the following steps:
S1, sampling from the acquired judicial field document dataset D to obtain an original sample set with batch_size N;
S2, inputting the original sample set into a text embedding layer and a data perturbation layer to obtain an enhanced sample set, in which the enhanced samples correspond one-to-one to the original samples in the original sample set;
S3, embedding the original sample set that has passed through the text embedding layer and inputting it into the BERT pre-training model to obtain text vector representations of the N original samples, and inputting the enhanced sample set into the BERT pre-training model to obtain text vector representations of the N enhanced samples;
S4, based on the data obtained in step S3, calculating the contrastive learning loss and the reward loss through the Simloss function and the Rewards loss function respectively, and back-propagating to train the parameters;
S5, repeating steps S1-S4 and performing iterative training until the model converges.
Further, in step S1, the batch_size N is calculated as:
where floor() denotes rounding down, K denotes the GPU memory size, M denotes the average GPU memory occupied by each piece of data in the judicial field document dataset D, and S denotes the total number of pieces of data in D.
Further, in step S2, each original sample in the original sample set is input into the text embedding layer and the data perturbation layer to obtain its corresponding enhanced sample, as follows:
S21, converting the original sample into a token sequence according to the BERT model vocabulary;
S22, applying scrambling operations to the token sequence to obtain a new token sequence, the scrambling operations comprising shuffling, dropout, and random substitution;
S23, embedding the new token sequence and then applying an inverse gradient attack to obtain the enhanced sample.
Further, in step S23, an inverse gradient attack is applied to the embedded new token sequence, expressed as:
where x denotes the embedded new token sequence, x_r denotes the enhanced sample, g denotes the gradient, and ε denotes the attack strength.
Further, in step S3, any enhanced sample, or any original sample that has passed through the text embedding layer and been embedded, is input into the BERT pre-training model; obtaining the corresponding text vector representation comprises:
S31, inputting the sample into the BERT pre-training model and obtaining the embedding representation output by each of the last 7 encoder layers;
S32, extracting the CLS vector from each embedding representation, converting each CLS vector into a one-dimensional value using a linear layer, and normalizing to obtain 7 weights;
S33, multiplying each weight by its corresponding CLS vector to obtain a weighted CLS vector, and summing all the weighted CLS vectors to obtain the text vector representation of the sample.
Further, the Simloss function is expressed as:
where h_i denotes the text vector representation of the i-th (i = 1, 2, …, N) original sample, h_i^r denotes the text vector representation of the enhanced sample corresponding to the i-th original sample, sim() denotes a similarity calculation function, and dct() denotes a distance calculation function.
The beneficial effects of the invention are as follows:
By adopting contrastive learning and reward learning, the invention counters the tendency of document-text features to collapse toward one another in the pre-trained model representation, and augments the data through data perturbation. It thereby avoids the long time and high labor cost of constructing supervised samples for similar-case pushing in the judicial field, completes similar-case pushing efficiently, cheaply, and automatically, and helps judicial practitioners quickly obtain information and prior judgment results related to the cases they are handling.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a training flow chart of the similarity calculation model of the present invention;
FIG. 3 is a schematic structural diagram of the similarity calculation model of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a judicial field case pushing method based on big data, as shown in FIG. 1, comprising the following steps: uploading the judicial field document to a database to be matched against other documents; inputting the judicial field document and its matching data into the trained similar-case similarity calculation model, and outputting the similarity between the judicial field document and each piece of matching data; and sorting all the similarities in descending order, selecting the matching data corresponding to the top k similarities, and pushing them in order of decreasing similarity.
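For concreteness, the following is a minimal Python sketch of this pushing step, assuming the trained model is wrapped in a scoring function; the names push_top_k and similarity are illustrative and do not come from the patent.

```python
from typing import Callable, List, Tuple

def push_top_k(query_doc: str,
               candidates: List[str],
               similarity: Callable[[str, str], float],
               k: int = 10) -> List[Tuple[str, float]]:
    # Score every candidate case against the query document, sort the
    # scores in descending order, and return the top-k matches for pushing.
    scored = [(doc, similarity(query_doc, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```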
The training process of the similar-case similarity calculation model, as shown in FIG. 2, includes:
S1, sampling from the acquired judicial field document dataset D to obtain an original sample set with batch_size N.
Specifically, in step S1, the batch_size N is calculated as:
where floor() denotes rounding down, K denotes the GPU memory size, M denotes the average GPU memory occupied by each piece of data in the judicial field document dataset D, and S denotes the total number of pieces of data in D.
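The formula image itself is not reproduced in the text above; the sketch below assumes the natural reading of these definitions, namely N = min(floor(K/M), S), i.e., as many samples as fit in GPU memory, capped by the dataset size. This is an assumption, not the patent's verbatim formula.

```python
import math

def compute_batch_size(K: float, M: float, S: int) -> int:
    # Assumed form of the batch-size rule: N = min(floor(K/M), S), where
    # K is the GPU memory budget, M the average memory per sample, and S
    # the dataset size. The exact formula image is absent from the text.
    return min(math.floor(K / M), S)
```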
S2, inputting the original sample set into a text embedding layer and a data perturbation layer to obtain an enhanced sample set, in which the enhanced samples correspond one-to-one to the original samples in the original sample set.
Specifically, in step S2, each original sample in the original sample set is input into the text embedding layer and the data perturbation layer to obtain its corresponding enhanced sample, as follows:
S21, converting the original sample according to the BERT model vocabulary to obtain a token sequence;
S22, on the premise of preserving the original token sequence, applying scrambling to the token sequence to obtain a new token sequence; the scrambling comprises three operations, namely shuffling, dropout, and random substitution, of which any one to three may be applied;
S23, embedding the new token sequence and then applying an inverse gradient attack to obtain the enhanced sample.
Specifically, in step S23, an inverse gradient attack is applied to the embedded new token sequence, expressed as:
where x denotes the embedded new token sequence, x_r denotes the enhanced sample, g denotes the gradient, and ε denotes the attack strength. The inverse gradient attack perturbs the original sample along the reverse gradient direction while keeping the similarity close, thereby yielding the enhanced sample. Unlike a conventional adversarial attack, the inverse gradient attack proposed by the invention perturbs in the direction that preserves similarity: it generates enhanced data that remains similar to the original while carrying reverse-gradient interference, and this data later serves as a positive example when the contrastive learning loss is calculated.
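A sketch of the perturbation pipeline (S21-S23) under stated assumptions: the scrambling rates are illustrative, and because the attack formula image is absent from the text, the sketch uses a reversed FGM-style step, x_r = x - ε·g/||g||, consistent with the reverse-gradient description above.

```python
import random
import torch

def scramble(tokens: list, vocab: list, p_drop: float = 0.1,
             p_swap: float = 0.1, seed: int = None) -> list:
    # S22 sketch: token dropout, random substitution, and one adjacent
    # swap for mild disorder. Rates and operation mix are assumptions.
    rng = random.Random(seed)
    out = [t for t in tokens if rng.random() >= p_drop]            # dropout
    out = [rng.choice(vocab) if rng.random() < p_swap else t
           for t in out]                                           # substitution
    if len(out) > 1:                                               # local disorder
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out

def inverse_gradient_attack(emb: torch.Tensor, grad: torch.Tensor,
                            epsilon: float = 1e-2) -> torch.Tensor:
    # S23 sketch: perturb the embedded sequence along the *negative*
    # gradient so the enhanced sample stays close to the original. The
    # update x_r = x - eps * g / ||g|| is an assumed instantiation.
    norm = grad.norm(p=2)
    return emb if norm == 0 else emb - epsilon * grad / norm
```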
S3, embedding the original sample set that has passed through the text embedding layer and inputting it into the BERT pre-training model to obtain text vector representations of the N original samples, and inputting the enhanced sample set into the BERT pre-training model to obtain text vector representations of the N enhanced samples.
Specifically, in step S3, any enhanced sample, or any original sample that has passed through the text embedding layer and been embedded, is input into the BERT pre-training model to obtain the corresponding text vector representation, as shown in FIG. 3; the process includes:
S31, inputting the sample into the BERT pre-training model and obtaining the embedding representation output by each of the last 7 encoder layers;
S32, extracting the CLS vector from each embedding representation, converting each CLS vector into a one-dimensional value using a linear layer, and normalizing separately to obtain 7 weights;
S33, multiplying each weight by its corresponding CLS vector to obtain a weighted CLS vector, and summing all the weighted CLS vectors to obtain the text vector representation of the sample.
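A PyTorch sketch of steps S31-S33: the [CLS] vectors of the last 7 encoder layers are each scored by a linear layer, the scores are normalised into 7 weights, and the weighted CLS vectors are summed. The layer count and pooling scheme follow the text; softmax as the normalisation and the hidden size of 768 are assumptions.

```python
import torch
import torch.nn as nn

class WeightedCLSPooling(nn.Module):
    # Weighted pooling over the [CLS] vectors of the last n_layers encoder
    # layers. hidden_states is the tuple a BERT model returns when called
    # with output_hidden_states=True; each entry is (batch, seq, hidden).
    def __init__(self, hidden_size: int = 768, n_layers: int = 7):
        super().__init__()
        self.n_layers = n_layers
        self.score = nn.Linear(hidden_size, 1)   # CLS vector -> scalar

    def forward(self, hidden_states) -> torch.Tensor:
        cls = torch.stack([h[:, 0] for h in hidden_states[-self.n_layers:]],
                          dim=1)                                       # (B, 7, H)
        weights = torch.softmax(self.score(cls).squeeze(-1), dim=1)    # (B, 7)
        return (weights.unsqueeze(-1) * cls).sum(dim=1)                # (B, H)
```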
S4, based on the data obtained in step S3, calculating the contrastive learning loss and the reward loss through the Simloss function and the Rewards loss function respectively, and back-propagating to train the parameters.
Specifically, the process of calculating the reward loss in step S4 includes:
S41, concatenating the text vector representation of the original sample x_i with the text vector representation of its corresponding enhanced sample x_ri; the concatenated result is linearly mapped by a reward learning layer to the input dimension of the BERT encoder, yielding the first reward vector of the original sample x_i;
S42, excluding the original sample x_i itself and its corresponding enhanced sample x_ri, concatenating the text vector representation of x_i with the text vector representation of any remaining original sample x_j, or with the text vector representation of any other enhanced sample x_rj; the concatenated result is linearly mapped by the reward learning layer to the input dimension of the BERT encoder, yielding the second reward vector of the original sample x_i;
S43, mapping the first reward vector and the second reward vector of the original sample x_i into one dimension through a linear layer, respectively, to obtain the first reward score and the second reward score of x_i;
S44, calculating the current reward loss from the first and second reward scores of the original sample x_i, and back-propagating to train the parameters;
S45, repeating steps S41-S44 until all original samples in the original sample set acquired in the current round have been processed or the model parameters converge.
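A sketch of the reward head in S41-S43, assuming a 768-dimensional encoder. The two linear mappings (pair concatenation to encoder input dimension, then to a scalar) follow the text; the module name and everything else are assumptions. Per S41-S42, a first score pairs x_i's vector with that of its own enhanced sample, and a second score pairs it with an unrelated sample's vector.

```python
import torch
import torch.nn as nn

class RewardHead(nn.Module):
    # A reward learning layer maps a concatenated text-vector pair to the
    # encoder input dimension (the "reward vector"); a second linear layer
    # maps that vector to a scalar reward score.
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.reward_layer = nn.Linear(2 * hidden, hidden)
        self.to_score = nn.Linear(hidden, 1)

    def score(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        reward_vec = self.reward_layer(torch.cat([a, b], dim=-1))
        return self.to_score(reward_vec).squeeze(-1)
```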
S5, repeating steps S1-S4 and performing iterative training until the model converges.
Specifically, contrastive learning mainly compares each original sample against its enhanced sample and against the other original samples, so as to minimize the value of the Simloss function: from both the cosine and distance directions, the similarity between an original sample and its enhanced sample is maximized while the similarity among the original samples is minimized, which alleviates the semantic collapse of BERT representation vectors. The Simloss function is expressed as:
where h_i denotes the text vector representation of the i-th (i = 1, 2, …, N) original sample, h_i^r denotes the text vector representation of the enhanced sample corresponding to the i-th original sample, sim() denotes a similarity calculation function, and dct() denotes a distance calculation function.
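The Simloss formula image is not reproduced in the text above. The sketch below is therefore an assumed SimCSE-style instantiation consistent with the stated goals: a cosine-based InfoNCE term pulls each h_i toward its h_i^r and pushes the batch's other samples apart, and a Euclidean term covers the dct() distance direction.

```python
import torch
import torch.nn.functional as F

def sim_loss(h: torch.Tensor, h_r: torch.Tensor,
             tau: float = 0.05) -> torch.Tensor:
    # h: (N, d) original text vectors; h_r: (N, d) enhanced counterparts.
    # Assumed instantiation: each (h_i, h_r_i) pair is a positive; the
    # other rows of the batch act as negatives.
    h_n, hr_n = F.normalize(h, dim=-1), F.normalize(h_r, dim=-1)
    logits = h_n @ hr_n.t() / tau                   # cosine / temperature
    labels = torch.arange(h.size(0), device=h.device)
    info_nce = F.cross_entropy(logits, labels)      # sim() direction
    dist = (h - h_r).pow(2).sum(dim=-1).mean()      # dct() direction
    return info_nce + dist
```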
Specifically, the reward loss is calculated as:
where S_1 denotes the first reward score and S_2 denotes the second reward score.
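The reward-loss formula image is likewise absent from the text. A common pairwise form consistent with S44 (pushing the first reward score above the second) is -log sigmoid(S_1 - S_2), sketched below as an assumption rather than the patent's verbatim formula.

```python
import torch
import torch.nn.functional as F

def reward_loss(s1: torch.Tensor, s2: torch.Tensor) -> torch.Tensor:
    # Assumed pairwise ranking form: the first reward score (original vs.
    # its own enhanced sample) should exceed the second (original vs. an
    # unrelated sample).
    return -F.logsigmoid(s1 - s2).mean()
```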
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A judicial field case pushing method based on big data, characterized by comprising the following steps: uploading the judicial field document to a database for matching; inputting the judicial field document and its matching data into a trained similar-case similarity calculation model, and outputting the similarity between the judicial field document and each piece of matching data; and sorting all the similarities in descending order and selecting the matching data corresponding to the top k similarities for pushing;
the training process of the similar-case similarity calculation model comprising the following steps:
S1, sampling from the acquired judicial field document dataset D to obtain an original sample set with batch_size N;
S2, inputting the original sample set into a text embedding layer and a data perturbation layer to obtain an enhanced sample set, in which the enhanced samples correspond one-to-one to the original samples in the original sample set;
S3, embedding the original sample set that has passed through the text embedding layer and inputting it into the BERT pre-training model to obtain text vector representations of the N original samples, and inputting the enhanced sample set into the BERT pre-training model to obtain text vector representations of the N enhanced samples;
S4, based on the data obtained in step S3, calculating the contrastive learning loss and the reward loss through the Simloss function and the Rewards loss function respectively, and back-propagating to train the parameters;
S5, repeating steps S1-S4 and performing iterative training until the model converges.
2. The judicial field case pushing method based on big data according to claim 1, wherein in step S1 the batch_size N is calculated as:
where floor() denotes rounding down, K denotes the GPU memory size, M denotes the average GPU memory occupied by each piece of data in the judicial field document dataset D, and S denotes the total number of pieces of data in D.
3. The judicial field case pushing method based on big data according to claim 1, wherein in step S2 each original sample in the original sample set is input into the text embedding layer and the data perturbation layer to obtain its corresponding enhanced sample, comprising the steps of:
S21, converting the original sample into a token sequence according to the BERT model vocabulary;
S22, applying scrambling operations to the token sequence to obtain a new token sequence, the scrambling operations comprising shuffling, dropout, and random substitution;
S23, embedding the new token sequence and then applying an inverse gradient attack to obtain the enhanced sample.
4. The judicial field case pushing method based on big data according to claim 3, wherein in step S23 an inverse gradient attack is applied to the embedded new token sequence, expressed as:
where x denotes the embedded new token sequence, x_r denotes the enhanced sample, g denotes the gradient, and ε denotes the attack strength.
5. The judicial field case pushing method based on big data according to claim 1, wherein in step S3 any enhanced sample, or any original sample that has passed through the text embedding layer and been embedded, is input into the BERT pre-training model, and obtaining the corresponding text vector representation comprises:
S31, inputting the sample into the BERT pre-training model and obtaining the embedding representation output by each of the last 7 encoder layers;
S32, extracting the CLS vector from each embedding representation, converting each CLS vector into a one-dimensional value using a linear layer, and normalizing to obtain 7 weights;
S33, multiplying each weight by its corresponding CLS vector to obtain a weighted CLS vector, and summing all the weighted CLS vectors to obtain the text vector representation of the sample.
6. The judicial field case pushing method based on big data according to claim 1, wherein the Simloss function is expressed as:
where h_i denotes the text vector representation of the i-th (i = 1, 2, …, N) original sample, h_i^r denotes the text vector representation of the enhanced sample corresponding to the i-th original sample, sim() denotes a similarity calculation function, and dct() denotes a distance calculation function.
7. The judicial field case pushing method based on big data according to claim 1, wherein the step S4 process of calculating the reward loss comprises:
S41, concatenating the text vector representation of the original sample x_i with the text vector representation of its corresponding enhanced sample x_ri; the concatenated result is linearly mapped by a reward learning layer to the input dimension of the BERT encoder, yielding the first reward vector of the original sample x_i;
S42, excluding the original sample x_i itself and its corresponding enhanced sample x_ri, concatenating the text vector representation of x_i with the text vector representation of any remaining original sample x_j, or with the text vector representation of any other enhanced sample x_rj; the concatenated result is linearly mapped by the reward learning layer to the input dimension of the BERT encoder, yielding the second reward vector of the original sample x_i;
S43, mapping the first reward vector and the second reward vector of the original sample x_i into one dimension through a linear layer, respectively, to obtain the first reward score and the second reward score of x_i;
S44, calculating the current reward loss from the first and second reward scores of the original sample x_i, and back-propagating to train the parameters;
S45, repeating steps S41-S44 until all original samples in the original sample set acquired in the current round have been processed or the model parameters converge.
8. The judicial field case pushing method based on big data according to claim 7, wherein the reward loss is calculated as:
where S_1 denotes the first reward score and S_2 denotes the second reward score.
CN202310464853.XA 2023-04-26 2023-04-26 Judicial field case pushing method based on big data Active CN116610770B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310464853.XA CN116610770B (en) 2023-04-26 2023-04-26 Judicial field case pushing method based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310464853.XA CN116610770B (en) 2023-04-26 2023-04-26 Judicial field case pushing method based on big data

Publications (2)

Publication Number Publication Date
CN116610770A CN116610770A (en) 2023-08-18
CN116610770B (en) 2024-02-27

Family

ID=87677239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310464853.XA Active CN116610770B (en) 2023-04-26 2023-04-26 Judicial field case pushing method based on big data

Country Status (1)

Country Link
CN (1) CN116610770B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291570A (en) * 2018-12-07 2020-06-16 北京国双科技有限公司 Method and device for realizing element identification in judicial documents
CN113191387A (en) * 2021-03-27 2021-07-30 西北大学 Cultural relic fragment point cloud classification method combining unsupervised learning and data self-enhancement
CN113807171A (en) * 2021-08-10 2021-12-17 三峡大学 Text classification method based on semi-supervised transfer learning
CN113705678A (en) * 2021-08-28 2021-11-26 重庆理工大学 Specific target emotion analysis method for enhancing and resisting learning by utilizing word mask data
CN113901207A (en) * 2021-09-15 2022-01-07 昆明理工大学 Adverse drug reaction detection method based on data enhancement and semi-supervised learning
CN114564587A (en) * 2022-03-08 2022-05-31 天津大学 Data enhancement method based on countermeasure training under text classification scene
CN115796141A (en) * 2022-11-24 2023-03-14 华润数字科技有限公司 Text data enhancement method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Chinese Text Classification Methods Based on Long Short-Term Memory Networks; Chen Haiou; China Master's Theses Full-text Database; full text *

Also Published As

Publication number Publication date
CN116610770A (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN111444340A (en) Text classification and recommendation method, device, equipment and storage medium
CN111985239A (en) Entity identification method and device, electronic equipment and storage medium
CN110929515A (en) Reading understanding method and system based on cooperative attention and adaptive adjustment
CN111738169B (en) Handwriting formula recognition method based on end-to-end network model
CN111259940A (en) Target detection method based on space attention map
CN110516240B (en) Semantic similarity calculation model DSSM (direct sequence spread spectrum) technology based on Transformer
CN113392191A (en) Text matching method and device based on multi-dimensional semantic joint learning
CN116049367A (en) Visual-language pre-training method and device based on non-supervision knowledge enhancement
CN115170874A (en) Self-distillation implementation method based on decoupling distillation loss
CN114330514A (en) Data reconstruction method and system based on depth features and gradient information
CN112035629B (en) Method for implementing question-answer model based on symbolized knowledge and neural network
CN118196472A (en) Recognition method for improving complex and diverse data distribution based on condition domain prompt learning
CN118247393A (en) AIGC-based 3D digital man driving method
CN113486174A (en) Model training, reading understanding method and device, electronic equipment and storage medium
CN117828024A (en) Plug-in retrieval method, device, storage medium and equipment
CN118114105A (en) Multimode emotion recognition method and system based on contrast learning and transducer structure
CN116610770B (en) Judicial field case pushing method based on big data
CN117033961A (en) Multi-mode image-text classification method for context awareness
CN113268657B (en) Deep learning recommendation method and system based on comments and item descriptions
CN113792120B (en) Graph network construction method and device, reading and understanding method and device
CN116257618A (en) Multi-source intelligent travel recommendation method based on fine granularity emotion analysis
CN114357166A (en) Text classification method based on deep learning
CN117932487B (en) Risk classification model training and risk classification method and device
CN114996424B (en) Weak supervision cross-domain question-answer pair generation method based on deep learning
CN117909441A (en) Multi-jump answer question framework based on label smoothing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240130

Address after: 071000 Room 217, Zone B, No.1 Cinema Road, Rongcheng Town, Rongcheng County, Baoding City, Hebei Province (self declared)

Applicant after: Huoyan Jinjing Data Services (Xiongan) Co.,Ltd.

Country or region after: China

Address before: Room 801, No. 85, Kefeng Road, Huangpu District, Guangzhou, Guangdong 510000 (office only)

Applicant before: Yami Technology (Guangzhou) Co.,Ltd.

Country or region before: China

Effective date of registration: 20240130

Address after: Room 801, No. 85, Kefeng Road, Huangpu District, Guangzhou, Guangdong 510000 (office only)

Applicant after: Yami Technology (Guangzhou) Co.,Ltd.

Country or region after: China

Address before: 400065 Chongwen Road, Nanshan Street, Nanan District, Chongqing

Applicant before: CHONGQING University OF POSTS AND TELECOMMUNICATIONS

Country or region before: China

GR01 Patent grant