CN114491024A - Small sample-based specific field multi-label text classification method - Google Patents
- Publication number
- CN114491024A (application CN202111680038.4A)
- Authority
- CN
- China
- Prior art keywords
- label
- sentence
- model
- vector
- mask
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a small-sample-based multi-label text classification method for a specific field. Corpus sentences carrying original labels are grouped by label; the original labels are then substituted to expand the corpus; and, on the expanded corpus, a pre-trained language model undergoes multi-task training with a masked language model to update its parameters, so that the model fully learns the semantic knowledge of the field. In the prediction stage, a knowledge-base retrieval scheme with kNN voting reduces randomness and improves the accuracy of the classification results. After a prediction result is obtained, it is repeatedly used as an artificial label to rerun the steps, so that the model keeps learning domain knowledge, the retrieval knowledge base keeps growing, and the classification results improve accordingly.
Description
Technical Field
The invention relates to a specific field multi-label text classification method based on small samples.
Background
In the initial stage after a system that needs a text classification task goes online, little data has accumulated and only a small amount of it is labeled.
Existing papers propose template-based few-shot learning methods, but these perform well only when a large data set in the specific field exists from the start and merely the labeled portion is small. Building on such methods, the present scheme addresses the case where both the labeled data and the texts in the specific field are few.
Disclosure of Invention
In order to achieve the technical purpose, the technical scheme of the invention is that,
a specific field multi-label text classification method based on small samples comprises the following steps:
step one, acquiring an original corpus in a specific field, extracting a small part of it, labeling each sentence in that part with a tag, and, treating identical tags as one category, recording the total number of tag categories;
step two, adding the labeled tag to the front of its sentence, masking the tag, adding fixed words before and after the tag to mark it and form a new sentence, and adding specific symbols at the head and tail of the new sentence; then adding an identification tag indicating whether the current tag is correct, copying the sentence, replacing the original tag content in turn with tags, labeled on other sentences, that differ from the original tag, and simultaneously changing the identification tag from correct to wrong, thereby expanding the small part of the corpus extracted in step one;
step three, inputting the expanded corpus into the pre-trained language model and then executing a masked-language-model task, thereby updating the parameters of the pre-trained model;
step four, using the updated model as a semantic feature extractor, so that the whole expanded corpus is converted into semantic vectors that serve as a query retrieval base;
step five, extracting part of the corpus from the original corpus, adding masks before and after each sentence, adding the fixed words of step two before and after the masks, copying each sentence as many times as there are tag categories recorded in step one, and inputting the copies into the model to obtain a semantic vector for each sentence;
step six, computing the similarity between the obtained semantic vectors and the query retrieval base, and taking the tag that occurs most frequently among the N most similar results as the tag of the unlabeled corpus sentence;
step seven, returning to step three with the corpus tagged in step six as model input, and continuing to update the model parameters until the loss function converges and training is complete;
and step eight, labeling corpora from the same field as in step one with the model trained in step seven, thereby realizing classification.
In the small-sample-based specific-field multi-label text classification method, in step one, the small part of the corpus is fewer than 200 text sentences.
In the third step, executing the masked-language-model task includes:
inputting each sentence into the pre-trained language model to obtain a mapped low-dimensional vector representation; computing, for each mask position, the loss between the low-dimensional vector and the mask-position label mlm_label; computing, for the sentence-start position [CLS], the loss between the low-dimensional vector and the identification label eq_label; and adding the two losses as the loss of the whole pre-trained language model; the corresponding loss function L is formulated as follows:
L = mlm_loss + eq_loss
mlm_loss = -Σ_{i=1}^{V} y_i log(p_i)
eq_loss = -[y_j log(p_j) + (1 - y_j) log(1 - p_j)]
For mlm_loss, V is the number of masked words, y_i is the one-hot encoding of the tag word replaced by the mask, and p_i is the probability the model predicts for that word; for eq_loss, y_j is the value of eq_label and p_j is the predicted probability of a positive example; the mlm_label probabilities are computed with softmax and the eq_label probability with sigmoid;
based on these steps, iteration is repeated until the model loss value decreases steadily to convergence.
In the fourth step, converting the whole expanded corpus into semantic vectors means that the unlabeled original sentences and the tags are each mapped to low-dimensional vectors through the model's multi-layer Transformer output, and the mean of all word vectors is taken as the sentence's semantic vector.
In the fifth step, the semantic vector of each sentence comprises a low-dimensional vector mean and a predicted mask-vector mean, where a sentence's low-dimensional vector mean is the mean of the vectors of each word in the sentence, and the mask-vector mean is the mean of the word vectors at the positions in the sentence replaced by masks.
In the sixth step, the similarity is computed with cosine similarity, combining two similarities with weights:
sim = w1 · cos(v_m1, v_m2) + w2 · cos(v_s1, v_s2), with w1 + w2 = 1
where w1 and w2 are the weights of the two similarities, v_m1 and v_m2 are respectively the model's predicted mask vector and the actual tag vector, and v_s1 and v_s2 are respectively the sentence vector to be predicted and a sentence vector in the retrieval base.
In the sixth step, taking the top N most similar tags as the tag of the unlabeled corpus means sorting all similarity results from large to small, taking the top N results, and using the voting method of kNN, in which the tag occurring most frequently among the top N results is taken as the tag of the closest sentence.
The technical effect of the method is that, based on a pre-trained language model and with few manual labels (only about 200) and little in-domain data, the training set is expanded from the few labeled examples through data preprocessing, and multi-task training with a masked language model lets the model fully learn the semantic knowledge of the field. In the prediction stage, knowledge-base retrieval with kNN voting reduces randomness and improves the accuracy of the classification results. After a prediction result is obtained, it is repeatedly used as an artificial label to rerun the steps, so that the model keeps learning domain knowledge, the retrieval knowledge base keeps growing, and the classification results improve accordingly.
The invention will be further explained with reference to the drawings.
Drawings
Fig. 1 is a schematic flow chart of the present embodiment.
Detailed Description
The embodiment is realized by the following steps:
Step one, acquire an original corpus in a specific field, extract a small part of it, label each sentence in that part with a tag and, treating identical tags as one category, record the total number of tag categories. In this embodiment the small part of the corpus is fewer than 200 text sentences; the number can be adjusted to the situation.
And step two, add the labeled tag to the front of its sentence, mask the tag, add fixed words before and after the tag to mark it and form a new sentence, and add specific symbols at the head and tail of the new sentence. Then add an identification tag indicating whether the current tag is correct, copy the sentence, replace the original tag content in turn with tags, labeled on other sentences, that differ from the original tag, and simultaneously change the identification tag from correct to wrong, thereby expanding the small part of the corpus extracted in step one. For example, suppose the specific field is the financial industry, one sentence in the corpus is "The elderly often transact fixed-deposit renewals.", and its tag is "personal fixed deposit". Step two replaces the words of the tag with [MASK] and adds the specific symbols [CLS] and [SEP] to the head and tail of the sentence, yielding the input sentence: "[CLS] This is [MASK][MASK][MASK][MASK][MASK][MASK] business. The elderly often transact fixed-deposit renewals. [SEP]". Two tags are set for this input at the same time: the [MASK] tag mlm_label = 'personal fixed deposit', and the positive-example flag eq_label = 1. The expansion process then sets this data item's tag to the other categories in turn; for example, if another category is "teller management", the input sentence becomes "[CLS] This is [MASK][MASK][MASK][MASK] business. The elderly often transact fixed-deposit renewals. [SEP]", with the two tags mlm_label = 'teller management' and, since this is not the correct category, eq_label = 0.
Then, assuming the tags fall into 11 categories in total, the sentence is copied 10 more times, each copy substituted with a different tag, so the total amount of data is expanded 11-fold.
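As an illustrative sketch only (not part of the patent text), the expansion of step two can be written as follows; the function name, the English filler text, and the one-[MASK]-per-character convention are assumptions of this sketch:

```python
# Sketch of step two's data expansion: one labeled sentence becomes one
# positive example (its real tag) and N-1 negative examples (other tags).

def expand_sentence(sentence, true_label, all_labels):
    """Return (input_text, mlm_label, eq_label) triples for one sentence."""
    examples = []
    for label in all_labels:
        # One [MASK] token per character of the candidate tag, and the
        # specific symbols [CLS]/[SEP] at the head and tail of the sentence.
        masks = "[MASK]" * len(label)
        text = f"[CLS] This is {masks} business. {sentence} [SEP]"
        eq_label = 1 if label == true_label else 0  # correct vs. wrong tag
        examples.append((text, label, eq_label))
    return examples

labels = ["personal fixed deposit", "teller management", "loan approval"]
rows = expand_sentence("The elderly often transact fixed-deposit renewals.",
                       "personal fixed deposit", labels)
# Each sentence yields as many rows as there are tag categories,
# exactly one of which is a positive example.
```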
And step three, input the expanded corpus into the pre-trained language model and then execute the masked-language-model task, thereby updating the parameters of the pre-trained model.
And step four, use the updated model as a semantic feature extractor, so that the whole expanded corpus is converted into semantic vectors that serve as a query retrieval base.
And step five, extract part of the corpus from the original corpus, add masks before and after each sentence, add the fixed words of step two, copy each sentence as many times as there are tag categories recorded in step one, and input the copies into the model to obtain a semantic vector for each sentence.
And step six, compute the similarity between the obtained semantic vectors and the query retrieval base, and take the tag that occurs most frequently among the N most similar results as the tag of the unlabeled corpus sentence.
And step seven, return to step three with the corpus tagged in step six as model input, and continue updating the model parameters until the loss function converges and training is complete.
And step eight, label corpora from the same field as in step one with the model trained in step seven, thereby realizing classification.
Specifically, in step three, executing the masked-language-model task includes:
after each sentence is input into the pre-trained language model, a mapped low-dimensional vector representation is obtained; for each mask position, the loss between the low-dimensional vector and the mask-position label mlm_label is computed; for the sentence-start position [CLS], the loss between the low-dimensional vector and the identification label eq_label is computed; and the two losses are added as the loss of the whole pre-trained language model. The corresponding loss function L is formulated as follows:
L = mlm_loss + eq_loss
mlm_loss = -Σ_{i=1}^{V} y_i log(p_i)
eq_loss = -[y_j log(p_j) + (1 - y_j) log(1 - p_j)]
For mlm_loss, V is the number of masked words, y_i is the one-hot encoding of the tag word replaced by the mask, and p_i is the probability the model predicts for that word. For eq_loss, y_j is the value of eq_label and p_j is the predicted probability of a positive example. The mlm_label probabilities are computed with softmax and the eq_label probability with sigmoid.
Based on these steps, iteration is repeated until the model loss value decreases steadily to convergence.
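Under the definitions above, the combined loss can be sketched in plain Python; the function names are illustrative assumptions, and in real training the probabilities would come from the model's softmax and sigmoid outputs:

```python
import math

def mlm_loss(one_hot_rows, prob_rows):
    """Cross entropy over the V masked positions:
    -sum_i y_i * log(p_i), with each y_i one-hot over the vocabulary."""
    loss = 0.0
    for y, p in zip(one_hot_rows, prob_rows):
        for y_i, p_i in zip(y, p):
            if y_i:  # only the true word contributes to the sum
                loss -= math.log(p_i)
    return loss

def eq_loss(y_j, p_j):
    """Binary cross entropy at [CLS]: -[y log p + (1 - y) log(1 - p)]."""
    return -(y_j * math.log(p_j) + (1 - y_j) * math.log(1 - p_j))

def total_loss(one_hot_rows, prob_rows, y_j, p_j):
    # L = mlm_loss + eq_loss, as in the formulation above
    return mlm_loss(one_hot_rows, prob_rows) + eq_loss(y_j, p_j)

# Two masked positions over a toy 3-word vocabulary, positive example:
L = total_loss([[1, 0, 0], [0, 1, 0]],
               [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
               y_j=1, p_j=0.9)
```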
Further, in step four, converting the whole expanded corpus into semantic vectors means that the unlabeled original sentences and the tags are each mapped to low-dimensional vectors through the model's multi-layer Transformer output, and the mean of all word vectors is taken as the sentence's semantic vector.
In step five, the semantic vector of each sentence comprises a low-dimensional vector mean and a predicted mask-vector mean, where a sentence's low-dimensional vector mean is the mean of the vectors of each word in the sentence, and the mask-vector mean is the mean of the word vectors at the positions in the sentence replaced by masks.
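A minimal sketch of the two means described here, assuming token vectors are plain lists of floats and the mask positions are known indices (all names are illustrative, not from the patent):

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length word vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def sentence_vectors(token_vecs, mask_positions):
    """Return (mean over all tokens, mean over masked positions only)."""
    sent_mean = mean_vector(token_vecs)
    mask_mean = mean_vector([token_vecs[i] for i in mask_positions])
    return sent_mean, mask_mean

# Three 2-dimensional token vectors; the first two positions are masks.
toks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sent, mask = sentence_vectors(toks, mask_positions=[0, 1])
```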
In step six, the similarity is computed with cosine similarity, combining two similarities with weights:
sim = w1 · cos(v_m1, v_m2) + w2 · cos(v_s1, v_s2), with w1 + w2 = 1
where w1 and w2 are the weights of the two similarities, v_m1 and v_m2 are respectively the model's predicted mask vector and the actual tag vector, and v_s1 and v_s2 are respectively the sentence vector to be predicted and a sentence vector in the retrieval base.
And in step six, taking the top N most similar tags as the tag of the unlabeled corpus means sorting all similarity results from large to small, taking the top N results, and using the voting method of kNN: the tag occurring most frequently among the top N results is taken as the tag of the closest sentence.
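The retrieval of step six can be sketched as a weighted cosine similarity followed by kNN majority voting; the weighting w1·cos(mask vectors) + w2·cos(sentence vectors) follows the formula above, while the function names, equal default weights, and data shapes are assumptions of this sketch:

```python
import math
from collections import Counter

def cos(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similarity(vm1, vm2, vs1, vs2, w1=0.5, w2=0.5):
    # sim = w1*cos(mask vectors) + w2*cos(sentence vectors), w1 + w2 = 1
    return w1 * cos(vm1, vm2) + w2 * cos(vs1, vs2)

def knn_label(query, base, n=3):
    """base: list of (mask_vec, sent_vec, tag); majority tag of the top n."""
    qm, qs = query
    ranked = sorted(base, key=lambda e: similarity(qm, e[0], qs, e[1]),
                    reverse=True)
    votes = Counter(tag for _, _, tag in ranked[:n])
    return votes.most_common(1)[0][0]

base = [([1.0, 0.0], [1.0, 0.0], "deposit"),
        ([0.9, 0.1], [1.0, 0.1], "deposit"),
        ([0.0, 1.0], [0.0, 1.0], "teller")]
label = knn_label(([1.0, 0.05], [1.0, 0.0]), base, n=3)
```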
Claims (7)
1. A specific field multi-label text classification method based on small samples is characterized by comprising the following steps:
step one, acquiring an original corpus in a specific field, extracting a small part of it, labeling each sentence in that part with a tag, and, treating identical tags as one category, recording the total number of tag categories;
step two, adding the labeled tag to the front of its sentence, masking the tag, adding fixed words before and after the tag to mark it and form a new sentence, and adding specific symbols at the head and tail of the new sentence; then adding an identification tag indicating whether the current tag is correct, copying the sentence, replacing the original tag content in turn with tags, labeled on other sentences, that differ from the original tag, and simultaneously changing the identification tag from correct to wrong, thereby expanding the small part of the corpus extracted in step one;
step three, inputting the expanded corpus into the pre-trained language model and then executing a masked-language-model task, thereby updating the parameters of the pre-trained model;
step four, using the updated model as a semantic feature extractor, so that the whole expanded corpus is converted into semantic vectors that serve as a query retrieval base;
step five, extracting part of the corpus from the original corpus, adding masks before and after each sentence, adding the fixed words of step two, copying each sentence as many times as there are tag categories recorded in step one, and inputting the copies into the model to obtain a semantic vector for each sentence;
step six, computing the similarity between the obtained semantic vectors and the query retrieval base, and taking the tag that occurs most frequently among the N most similar results as the tag of the unlabeled corpus sentence;
step seven, returning to step three with the corpus tagged in step six as model input, and continuing to update the model parameters until the loss function converges and training is complete;
and step eight, labeling corpora from the same field as in step one with the model trained in step seven, thereby realizing classification.
2. The method as claimed in claim 1, wherein in step one the small part of the corpus is fewer than 200 text sentences.
3. The method for classifying domain-specific multi-label texts based on small samples according to claim 1, wherein in step three, performing the masked-language-model task comprises:
inputting each sentence into the pre-trained language model to obtain a mapped low-dimensional vector representation; computing, for each mask position, the loss between the low-dimensional vector and the mask-position label mlm_label; computing, for the sentence-start position [CLS], the loss between the low-dimensional vector and the identification label eq_label; and adding the two losses as the loss of the whole pre-trained language model; the corresponding loss function L being formulated as follows:
L = mlm_loss + eq_loss
mlm_loss = -Σ_{i=1}^{V} y_i log(p_i)
eq_loss = -[y_j log(p_j) + (1 - y_j) log(1 - p_j)]
wherein, for mlm_loss, V is the number of masked words, y_i is the one-hot encoding of the tag word replaced by the mask, and p_i is the probability the model predicts for that word; for eq_loss, y_j is the value of eq_label and p_j is the predicted probability of a positive example; the mlm_label probabilities are computed with softmax and the eq_label probability with sigmoid;
based on these steps, iteration is repeated until the model loss value decreases steadily to convergence.
4. The method as claimed in claim 1, wherein in step four the whole expanded corpus is converted into semantic vectors: the unlabeled original sentences and the tags are each mapped to low-dimensional vectors through the model's multi-layer Transformer output, and the mean of all word vectors is taken as the sentence's semantic vector.
5. The method of claim 1, wherein in step five the semantic vector of each sentence comprises a low-dimensional vector mean and a predicted mask-vector mean, wherein a sentence's low-dimensional vector mean is the mean of the vectors of each word in the sentence, and the mask-vector mean is the mean of the word vectors at the positions in the sentence replaced by masks.
6. The method for classifying multi-label text in a specific field based on small samples as claimed in claim 1, wherein in step six the similarity is computed with cosine similarity, combining two similarities with weights:
sim = w1 · cos(v_m1, v_m2) + w2 · cos(v_s1, v_s2), with w1 + w2 = 1
wherein w1 and w2 are the weights of the two similarities, v_m1 and v_m2 are respectively the model's predicted mask vector and the actual tag vector, and v_s1 and v_s2 are respectively the sentence vector to be predicted and a sentence vector in the retrieval base.
7. The method according to claim 1, wherein in step six the top N most similar tags are taken as the tag of the unlabeled corpus: the results of all similarity calculations are sorted from large to small, the top N results are taken, and the voting method of kNN is used, the tag occurring most frequently among the top N results being taken as the tag of the closest sentence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111680038.4A CN114491024B (en) | 2021-12-31 | 2021-12-31 | Specific field multi-label text classification method based on small sample |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111680038.4A CN114491024B (en) | 2021-12-31 | 2021-12-31 | Specific field multi-label text classification method based on small sample |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114491024A true CN114491024A (en) | 2022-05-13 |
CN114491024B CN114491024B (en) | 2024-04-26 |
Family
ID=81510645
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111680038.4A Active CN114491024B (en) | 2021-12-31 | 2021-12-31 | Specific field multi-label text classification method based on small sample |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114491024B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116662582A (en) * | 2023-08-01 | 2023-08-29 | 成都信通信息技术有限公司 | Specific domain business knowledge retrieval method and retrieval device based on natural language |
CN117150305A (en) * | 2023-11-01 | 2023-12-01 | 杭州光云科技股份有限公司 | Text data enhancement method and device integrating retrieval and filling and electronic equipment |
CN117171653A (en) * | 2023-11-02 | 2023-12-05 | 成方金融科技有限公司 | Method, device, equipment and storage medium for identifying information relationship |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112214599A (en) * | 2020-10-20 | 2021-01-12 | 电子科技大学 | Multi-label text classification method based on statistics and pre-training language model |
US20210319215A1 (en) * | 2020-04-08 | 2021-10-14 | Peking University | Method and system for person re-identification |
CN113807098A (en) * | 2021-08-26 | 2021-12-17 | 北京百度网讯科技有限公司 | Model training method and device, electronic equipment and storage medium |
CN113821622A (en) * | 2021-09-29 | 2021-12-21 | 平安银行股份有限公司 | Answer retrieval method and device based on artificial intelligence, electronic equipment and medium |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210319215A1 (en) * | 2020-04-08 | 2021-10-14 | Peking University | Method and system for person re-identification |
CN112214599A (en) * | 2020-10-20 | 2021-01-12 | 电子科技大学 | Multi-label text classification method based on statistics and pre-training language model |
CN113807098A (en) * | 2021-08-26 | 2021-12-17 | 北京百度网讯科技有限公司 | Model training method and device, electronic equipment and storage medium |
CN113821622A (en) * | 2021-09-29 | 2021-12-21 | 平安银行股份有限公司 | Answer retrieval method and device based on artificial intelligence, electronic equipment and medium |
Non-Patent Citations (2)
Title |
---|
SUN Songtao; HE Yanxiang: "Multi-label sentiment classification of microblogs based on CNN feature space", 工程科学与技术 (Advanced Engineering Sciences), no. 03, 20 May 2017 *
SUN Qindong; GUAN Xiaohong; ZHOU Yadong: "Status and trends of research on network information content auditing", 计算机研究与发展 (Journal of Computer Research and Development), no. 08, 15 August 2009 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116662582A (en) * | 2023-08-01 | 2023-08-29 | 成都信通信息技术有限公司 | Specific domain business knowledge retrieval method and retrieval device based on natural language |
CN116662582B (en) * | 2023-08-01 | 2023-10-10 | 成都信通信息技术有限公司 | Specific domain business knowledge retrieval method and retrieval device based on natural language |
CN117150305A (en) * | 2023-11-01 | 2023-12-01 | 杭州光云科技股份有限公司 | Text data enhancement method and device integrating retrieval and filling and electronic equipment |
CN117150305B (en) * | 2023-11-01 | 2024-02-27 | 杭州光云科技股份有限公司 | Text data enhancement method and device integrating retrieval and filling and electronic equipment |
CN117171653A (en) * | 2023-11-02 | 2023-12-05 | 成方金融科技有限公司 | Method, device, equipment and storage medium for identifying information relationship |
CN117171653B (en) * | 2023-11-02 | 2024-01-23 | 成方金融科技有限公司 | Method, device, equipment and storage medium for identifying information relationship |
Also Published As
Publication number | Publication date |
---|---|
CN114491024B (en) | 2024-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112115238B (en) | Question-answering method and system based on BERT and knowledge base | |
CN109783818B (en) | Enterprise industry classification method | |
CN107526834B (en) | Word2vec improvement method for training correlation factors of united parts of speech and word order | |
CN109325231B (en) | Method for generating word vector by multitasking model | |
CN108763353B (en) | Baidu encyclopedia relation triple extraction method based on rules and remote supervision | |
CN114491024B (en) | Specific field multi-label text classification method based on small sample | |
CN108733647B (en) | Word vector generation method based on Gaussian distribution | |
CN112395417A (en) | Network public opinion evolution simulation method and system based on deep learning | |
CN114239574A (en) | Miner violation knowledge extraction method based on entity and relationship joint learning | |
CN114970523B (en) | Topic prompting type keyword extraction method based on text semantic enhancement | |
CN112905736A (en) | Unsupervised text emotion analysis method based on quantum theory | |
CN115510863A (en) | Question matching task oriented data enhancement method | |
CN114564563A (en) | End-to-end entity relationship joint extraction method and system based on relationship decomposition | |
CN113934909A (en) | Financial event extraction method based on pre-training language and deep learning model | |
CN115203507A (en) | Event extraction method based on pre-training model and oriented to document field | |
CN114970536A (en) | Combined lexical analysis method for word segmentation, part of speech tagging and named entity recognition | |
CN115392254A (en) | Interpretable cognitive prediction and discrimination method and system based on target task | |
CN114048314A (en) | Natural language steganalysis method | |
CN116522165B (en) | Public opinion text matching system and method based on twin structure | |
CN114388108A (en) | User feedback analysis method based on multi-task learning | |
CN115827871A (en) | Internet enterprise classification method, device and system | |
CN115934936A (en) | Intelligent traffic text analysis method based on natural language processing | |
CN113626553B (en) | Cascade binary Chinese entity relation extraction method based on pre-training model | |
CN115510230A (en) | Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism | |
CN115130475A (en) | Extensible universal end-to-end named entity identification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||