CN114491024A - Domain-specific multi-label text classification method based on small samples

Domain-specific multi-label text classification method based on small samples

Info

Publication number
CN114491024A
Authority
CN
China
Prior art keywords
label
sentence
model
vector
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111680038.4A
Other languages
Chinese (zh)
Other versions
CN114491024B (en)
Inventor
罗东 (Luo Dong)
张沅 (Zhang Yuan)
吴笛 (Wu Di)
王晓东 (Wang Xiaodong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Great Wall Information Co Ltd
Original Assignee
Great Wall Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Great Wall Information Co Ltd
Priority to CN202111680038.4A
Publication of CN114491024A
Application granted
Publication of CN114491024B
Legal status: Active
Anticipated expiration


Classifications

    • G06F16/353: Information retrieval of unstructured textual data; clustering; classification into predefined classes
    • G06F16/2462: Query processing; approximate or statistical queries
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
    • G06F18/2431: Classification techniques relating to the number of classes; multiple classes
    • G06F40/30: Handling natural language data; semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a domain-specific multi-label text classification method based on small samples. Sentences in the original corpus that carry manual labels are grouped by label; the corpus is then expanded by substituting the other labels for the original one, and on the expanded corpus a pre-trained language model is multi-task trained through a masked language model to update its parameters, so that the model fully learns the semantic knowledge of the domain. In the prediction stage, a knowledge-base retrieval scheme is used, with knn voting to reduce randomness and improve the accuracy of the classification results. After predictions are obtained, they are fed back as manual labels and the steps are repeated, so the model keeps learning domain knowledge, the retrieval knowledge base keeps growing, and the classification results improve accordingly.

Description

Domain-specific multi-label text classification method based on small samples
Technical Field
The invention relates to a domain-specific multi-label text classification method based on small samples.
Background
When a system that requires a text classification task first goes online, little data has accumulated and only a small amount of it is labeled.
Existing work proposes template-based few-shot learning methods, but these perform well only when a large domain-specific dataset is available from the start and merely the labeled portion is small. Building on such methods, the present scheme addresses the case where both the labels and the texts in the specific domain are scarce.
Disclosure of Invention
In order to achieve the above technical purpose, the technical scheme of the invention is as follows:
a domain-specific multi-label text classification method based on small samples, comprising the following steps:
step one, acquiring an original corpus in a specific domain, extracting a small part of it, labeling each sentence in that part with a label, and recording the total number of label categories, identical labels being treated as one category;
step two, adding the annotated label in front of its sentence, replacing the label with mask tokens, meanwhile adding fixed words before and after the label to mark it off and form a new sentence, and adding special symbols at the head and tail of the new sentence; then adding an identification label indicating whether the current label is correct, copying the sentence, replacing the original label in turn with each different label annotated on other sentences while changing the identification label from correct to wrong, thereby expanding the small part of the corpus extracted in step one;
step three, inputting the expanded corpus into the pre-trained language model and then performing a masked-language-model task, thereby updating the parameters of the pre-trained model;
step four, using the updated model as a semantic feature extractor, so that the entire expanded corpus is converted into semantic vectors serving as a query retrieval base;
step five, extracting a further portion of the original corpus, adding mask tokens to each sentence with the fixed words of step two placed around them, duplicating each sentence once per label category recorded in step one, and inputting the resulting sentences into the model to obtain each sentence's semantic vector;
step six, computing the similarity between the obtained semantic vectors and the query retrieval base, and taking the most frequent label among the top-N most similar entries as the label of the originally unlabeled corpus;
step seven, returning to step three, taking the corpus labeled in step six as model input, and continuously updating the model parameters until the loss function converges, completing model training;
step eight, labeling corpora from the same domain as in step one with the model trained in step seven, thereby realizing classification.
In the above domain-specific multi-label text classification method based on small samples, in step one, the small part of the corpus is fewer than 200 text sentences.
In step three, performing the masked-language-model task comprises:
inputting each sentence into the pre-trained language model to obtain a mapped low-dimensional vector representation; at each mask position, computing a loss between the low-dimensional vector and the mask-position label mlm_label; at the sentence-start position [CLS], computing a loss between the low-dimensional vector and the identification label eq_label; and adding the two losses as the loss function of the whole pre-trained language model; the corresponding loss function L is formulated as follows:
L = mlm_loss + eq_loss
mlm_loss = -∑_{i=1}^{V} y_i·log(p_i)
eq_loss = -[y_j·log(p_j) + (1 - y_j)·log(1 - p_j)]
for mlm_loss, V is the number of masked words, y_i is the one-hot encoding of the label token replaced by the mask, and p_i is the probability of the word predicted by the model; for eq_loss, y_j is the value of eq_label and p_j is the probability that the model predicts a positive example; mlm_label is computed through softmax and eq_label through sigmoid;
based on the above steps, iteration is repeated until the continuously decreasing model loss converges.
In step four, converting the entire expanded corpus into semantic vectors means that the unlabeled original sentences and the labels are each mapped into low-dimensional vectors through the model's multi-layer Transformer outputs, and the mean of all character vectors is taken as the sentence's semantic vector.
In step five, the semantic vector of each sentence comprises a sentence-level low-dimensional vector mean and a predicted mask vector mean, wherein the former is the mean of the vectors of all words in the sentence and the latter is the mean of the word vectors at the positions replaced by masks.
In step six, the similarity calculation is realized through cosine similarity, with the following formula:
similarity = w_1·(v_m1·v_m2)/(‖v_m1‖‖v_m2‖) + w_2·(v_s1·v_s2)/(‖v_s1‖‖v_s2‖)
w_1 + w_2 = 1
where w_1 and w_2 are the weights of the two similarities, v_m1 and v_m2 respectively denote the mask vector predicted by the model and the actual label vector, and v_s1 and v_s2 respectively denote the sentence vector to be predicted and a sentence vector in the retrieval base.
In step six, taking the most frequent label among the top-N most similar entries as the label of the unlabeled corpus means: sorting the results of all similarity calculations from large to small, taking the top N results, and using the knn voting method, with the label occurring most frequently among the top N results taken as the label of the closest sentence.
The technical effect of the method is that, starting from a pre-trained language model, with very few manual labels (only 200) and little in-domain data, the training set is expanded from the few labeled examples through data pre-processing, and the model is multi-task trained through a masked language model so that it fully learns the semantic knowledge of the domain; in the prediction stage, a knowledge-base retrieval scheme is used, with knn voting to reduce randomness and improve the accuracy of the classification results. After predictions are obtained, they are fed back as manual labels and the steps are repeated, so the model keeps learning domain knowledge, the retrieval knowledge base keeps growing, and the classification results improve accordingly.
The invention will be further explained with reference to the drawings.
Drawings
Fig. 1 is a schematic flow chart of the present embodiment.
Detailed Description
The embodiment is realized by the following steps:
Step one: acquire an original corpus in a specific domain, extract a small part of it, label each sentence in that part, and record the total number of label categories, treating identical labels as one category. In this embodiment, the small part of the corpus is fewer than 200 text sentences; the number can be adjusted as circumstances require.
Step two: add the annotated label in front of its sentence and replace the label with mask tokens; at the same time add fixed words before and after the label to mark it off, forming a new sentence, and add special symbols at the head and tail of the new sentence. Then add an identification label indicating whether the current label is correct, copy the sentence, replace the original label in turn with each different label annotated on other sentences, and change the identification label from correct to wrong, thereby expanding the small part of the corpus extracted in step one. For example, suppose the specific domain is the finance industry and one sentence in the corpus is "The elderly often transact regular renewal business.", with the label "personal fixed deposit". The processing in step two replaces the words of the label with [MASK] and adds the special symbols [CLS] and [SEP] to the head and tail of the sentence, finally obtaining the input sentence: "[CLS] This is [MASK][MASK][MASK][MASK][MASK][MASK] business. The elderly often transact regular renewal business. [SEP]". Two labels are set as simultaneous inputs: one is the [MASK] label, mlm_label = "personal fixed deposit"; the other determines whether the example is a positive one, eq_label = 1. The expansion process then sets the label of this piece of data to each of the other categories; for example, if another category is "teller management", the input sentence becomes "[CLS] This is [MASK][MASK][MASK][MASK] business. The elderly often transact regular renewal business. [SEP]", with the two labels set to mlm_label = "teller management" and, since this is not the correct category, eq_label = 0. If the total number of label categories is 11, the sentence is copied 10 times, each copy carrying a different label, and the total amount of data is expanded 11-fold.
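To make the expansion concrete, the following Python sketch builds, for one labeled sentence, one masked input per label category together with its mlm_label and eq_label. It is a minimal illustration, not the patent's implementation: the helper name build_examples and the English fixed words "This is ... business" are assumptions, and it masks one [MASK] per label word, whereas the Chinese original masks one character of the label at a time.

```python
# Hypothetical sketch of the step-two expansion; build_examples is an
# illustrative helper, not part of the disclosed method.

def build_examples(sentence, gold_label, all_labels):
    """For one labeled sentence, emit one masked input per label category."""
    examples = []
    for label in all_labels:
        # One [MASK] per label word (assumption; the Chinese original uses
        # one mask per label character).
        masks = "".join("[MASK]" for _ in label.split())
        text = f"[CLS] This is {masks} business. {sentence} [SEP]"
        examples.append({
            "input": text,
            "mlm_label": label,                    # tokens the masks must recover
            "eq_label": int(label == gold_label),  # 1 = correct label, 0 = wrong
        })
    return examples

labels = ["personal fixed deposit", "teller management", "corporate loan"]
for ex in build_examples("The elderly often transact regular renewal business.",
                         "personal fixed deposit", labels):
    print(ex["eq_label"], ex["mlm_label"], "->", ex["input"])
```

With 11 label categories this yields 11 inputs per sentence, one positive and 10 negatives, matching the 11-fold expansion in the example above.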
Step three: input the expanded corpus into the pre-trained language model and then perform the masked-language-model task, thereby updating the pre-trained model's parameters.
Step four: use the updated model as a semantic feature extractor, converting the entire expanded corpus into semantic vectors that serve as a query retrieval base.
Step five: extract a further portion of the original corpus; for each sentence, add mask tokens with the fixed words of step two placed around them, duplicate the sentence once per label category recorded in step one, and input the resulting sentences into the model to obtain each sentence's semantic vector.
Step six: compute the similarity between the obtained semantic vectors and the query retrieval base, and take the most frequent label among the top-N most similar entries as the label of the originally unlabeled corpus.
Step seven: return to step three, taking the corpus labeled in step six as model input, and keep updating the model parameters until the loss function converges, completing model training.
Step eight: label corpora from the same domain as in step one with the model trained in step seven, thereby realizing classification.
Specifically, in step three, performing the masked-language-model task comprises:
after each sentence is input into the pre-trained language model, a mapped low-dimensional vector representation is obtained; at each mask position, a loss is computed between the low-dimensional vector and the mask-position label mlm_label; at the sentence-start position [CLS], a loss is computed between the low-dimensional vector and the identification label eq_label; and the two losses are added as the loss function of the whole pre-trained language model. The corresponding loss function L is formulated as follows:
L = mlm_loss + eq_loss
mlm_loss = -∑_{i=1}^{V} y_i·log(p_i)
eq_loss = -[y_j·log(p_j) + (1 - y_j)·log(1 - p_j)]
For mlm_loss, V is the number of masked words, y_i is the one-hot encoding of the label token replaced by the mask, and p_i is the probability of the word predicted by the model. For eq_loss, y_j is the value of eq_label and p_j is the probability that the model predicts a positive example. mlm_label is computed through softmax and eq_label through sigmoid.
Based on the above steps, iteration is repeated until the continuously decreasing model loss converges.
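As a worked illustration of the joint objective, the PyTorch sketch below computes L = mlm_loss + eq_loss from an encoder's last hidden states. It is a hedged sketch under stated assumptions: random tensors stand in for the real encoder output, the vocabulary size and the two linear heads are placeholders, and the library's fused losses are used (cross_entropy applies the softmax, binary_cross_entropy_with_logits the sigmoid).

```python
# Minimal sketch of L = mlm_loss + eq_loss; shapes and heads are assumptions.
import torch
import torch.nn.functional as F

vocab_size, hidden = 21128, 768            # e.g. the bert-base-chinese vocabulary
batch, seq_len, n_masks = 4, 64, 6

hidden_states = torch.randn(batch, seq_len, hidden)  # stand-in encoder output
mlm_head = torch.nn.Linear(hidden, vocab_size)       # softmax head over vocabulary
eq_head = torch.nn.Linear(hidden, 1)                 # sigmoid head on [CLS]

mask_positions = torch.tensor([[1, 2, 3, 4, 5, 6]] * batch)   # the V masked slots
mlm_labels = torch.randint(0, vocab_size, (batch, n_masks))   # ids of label tokens
eq_labels = torch.randint(0, 2, (batch,)).float()             # 1 = correct label

# Gather the hidden vectors at the V mask positions and score them.
masked = hidden_states.gather(
    1, mask_positions.unsqueeze(-1).expand(-1, -1, hidden))
mlm_logits = mlm_head(masked)                         # (batch, V, vocab)
mlm_loss = F.cross_entropy(mlm_logits.transpose(1, 2), mlm_labels)

cls_logit = eq_head(hidden_states[:, 0]).squeeze(-1)  # [CLS] sits at position 0
eq_loss = F.binary_cross_entropy_with_logits(cls_logit, eq_labels)

loss = mlm_loss + eq_loss                             # L = mlm_loss + eq_loss
print(float(loss))
```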
Further, in step four, converting the entire expanded corpus into semantic vectors means that the unlabeled original sentences and the labels are each mapped into low-dimensional vectors through the model's multi-layer Transformer outputs, and the mean of all character vectors is taken as the sentence's semantic vector.
In step five, the semantic vector of each sentence comprises a sentence-level low-dimensional vector mean and a predicted mask vector mean, wherein the former is the mean of the vectors of all words in the sentence and the latter is the mean of the word vectors at the positions replaced by masks.
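Both vector means can be read directly off the encoder's last hidden states. The sketch below shows the sentence-mean and mask-mean vectors described above; the tensor is a random stand-in for a real model output and the shapes are illustrative.

```python
# Sentence vector = mean over all token vectors; mask vector = mean over the
# vectors at the positions replaced by [MASK]. Shapes are assumptions.
import torch

hidden_states = torch.randn(1, 64, 768)      # (batch, seq_len, hidden), stand-in
mask_positions = torch.tensor([1, 2, 3, 4])  # indices of the masked label slots

sentence_vec = hidden_states[0].mean(dim=0)              # mean of every token
mask_vec = hidden_states[0, mask_positions].mean(dim=0)  # mean of mask tokens

print(sentence_vec.shape, mask_vec.shape)  # both torch.Size([768])
```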
In step six, the similarity calculation is realized through cosine similarity, with the following formula:
similarity = w_1·(v_m1·v_m2)/(‖v_m1‖‖v_m2‖) + w_2·(v_s1·v_s2)/(‖v_s1‖‖v_s2‖)
w_1 + w_2 = 1
where w_1 and w_2 are the weights of the two similarities, v_m1 and v_m2 respectively denote the mask vector predicted by the model and the actual label vector, and v_s1 and v_s2 respectively denote the sentence vector to be predicted and a sentence vector in the retrieval base.
In step six, taking the most frequent label among the top-N most similar entries as the label of the unlabeled corpus means: sort the results of all similarity calculations from large to small, take the top N results, and use the knn voting method, taking the label occurring most frequently among the top N results as the label of the closest sentence.
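A small numpy sketch of this retrieval step follows: each retrieval-base entry holds a label together with its mask vector and sentence vector, a query is scored with the weighted cosine formula above, and the top-N entries are put to a knn majority vote. The helper knn_label, the equal weights w1 = w2 = 0.5, and n = 5 are illustrative assumptions, and the random vectors stand in for real model outputs.

```python
# Hypothetical sketch of step six: weighted cosine similarity plus knn voting.
import numpy as np
from collections import Counter

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def knn_label(query_mask, query_sent, base, w1=0.5, w2=0.5, n=5):
    """base: list of (label, mask_vector, sentence_vector) entries."""
    scored = sorted(
        ((w1 * cos(query_mask, vm) + w2 * cos(query_sent, vs), label)
         for label, vm, vs in base),
        reverse=True)[:n]                       # top-N most similar entries
    return Counter(label for _, label in scored).most_common(1)[0][0]

rng = np.random.default_rng(0)
base = [(lab, rng.normal(size=8), rng.normal(size=8))
        for lab in ["deposit", "loan", "deposit", "teller", "deposit", "loan"]]
print(knn_label(rng.normal(size=8), rng.normal(size=8), base))
```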

Claims (7)

1. A domain-specific multi-label text classification method based on small samples, characterized by comprising the following steps:
step one, acquiring an original corpus in a specific domain, extracting a small part of it, labeling each sentence in that part with a label, and recording the total number of label categories, identical labels being treated as one category;
step two, adding the annotated label in front of its sentence, replacing the label with mask tokens, meanwhile adding fixed words before and after the label to mark it off and form a new sentence, and adding special symbols at the head and tail of the new sentence; then adding an identification label indicating whether the current label is correct, copying the sentence, replacing the original label in turn with each different label annotated on other sentences while changing the identification label from correct to wrong, thereby expanding the small part of the corpus extracted in step one;
step three, inputting the expanded corpus into the pre-trained language model and then performing a masked-language-model task, thereby updating the parameters of the pre-trained model;
step four, using the updated model as a semantic feature extractor, so that the entire expanded corpus is converted into semantic vectors serving as a query retrieval base;
step five, extracting a further portion of the original corpus, adding mask tokens to each sentence with the fixed words of step two placed around them, duplicating each sentence once per label category recorded in step one, and inputting the resulting sentences into the model to obtain each sentence's semantic vector;
step six, computing the similarity between the obtained semantic vectors and the query retrieval base, and taking the most frequent label among the top-N most similar entries as the label of the originally unlabeled corpus;
step seven, returning to step three, taking the corpus labeled in step six as model input, and continuously updating the model parameters until the loss function converges, completing model training;
step eight, labeling corpora from the same domain as in step one with the model trained in step seven, thereby realizing classification.
2. The domain-specific multi-label text classification method based on small samples according to claim 1, wherein in step one the small part of the corpus is fewer than 200 text sentences.
3. The domain-specific multi-label text classification method based on small samples according to claim 1, wherein in step three, performing the masked-language-model task comprises:
inputting each sentence into the pre-trained language model to obtain a mapped low-dimensional vector representation; at each mask position, computing a loss between the low-dimensional vector and the mask-position label mlm_label; at the sentence-start position [CLS], computing a loss between the low-dimensional vector and the identification label eq_label; and adding the two losses as the loss function of the whole pre-trained language model; the corresponding loss function L is formulated as follows:
L = mlm_loss + eq_loss
mlm_loss = -∑_{i=1}^{V} y_i·log(p_i)
eq_loss = -[y_j·log(p_j) + (1 - y_j)·log(1 - p_j)]
for mlm_loss, V is the number of masked words, y_i is the one-hot encoding of the label token replaced by the mask, and p_i is the probability of the word predicted by the model; for eq_loss, y_j is the value of eq_label and p_j is the probability that the model predicts a positive example; mlm_label is computed through softmax and eq_label through sigmoid;
based on the above steps, iteration is repeated until the continuously decreasing model loss converges.
4. The domain-specific multi-label text classification method based on small samples according to claim 1, wherein in step four, converting the entire expanded corpus into semantic vectors means that the unlabeled original sentences and the labels are each mapped into low-dimensional vectors through the model's multi-layer Transformer outputs, and the mean of all word vectors is taken as the sentence's semantic vector.
5. The domain-specific multi-label text classification method based on small samples according to claim 1, wherein in step five the semantic vector of each sentence comprises a sentence-level low-dimensional vector mean and a predicted mask vector mean, the former being the mean of the vectors of all words in the sentence and the latter the mean of the word vectors at the positions replaced by masks.
6. The domain-specific multi-label text classification method based on small samples according to claim 1, wherein in step six the similarity calculation is realized through cosine similarity, with the following formula:
similarity = w_1·(v_m1·v_m2)/(‖v_m1‖‖v_m2‖) + w_2·(v_s1·v_s2)/(‖v_s1‖‖v_s2‖)
w_1 + w_2 = 1
where w_1 and w_2 are the weights of the two similarities, v_m1 and v_m2 respectively denote the mask vector predicted by the model and the actual label vector, and v_s1 and v_s2 respectively denote the sentence vector to be predicted and a sentence vector in the retrieval base.
7. The domain-specific multi-label text classification method based on small samples according to claim 1, wherein in step six, taking the most frequent label among the top-N most similar entries as the label of the unlabeled corpus means: sorting the results of all similarity calculations from large to small, taking the top N results, and using the knn voting method, the label occurring most frequently among the top N results being taken as the label of the closest sentence.
CN202111680038.4A 2021-12-31 2021-12-31 Domain-specific multi-label text classification method based on small samples Active CN114491024B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111680038.4A CN114491024B (en) 2021-12-31 2021-12-31 Domain-specific multi-label text classification method based on small samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111680038.4A CN114491024B (en) 2021-12-31 2021-12-31 Domain-specific multi-label text classification method based on small samples

Publications (2)

Publication Number Publication Date
CN114491024A 2022-05-13
CN114491024B 2024-04-26

Family

ID=81510645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111680038.4A Active CN114491024B (en) 2021-12-31 2021-12-31 Domain-specific multi-label text classification method based on small samples

Country Status (1)

Country Link
CN (1) CN114491024B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116662582A (en) * 2023-08-01 2023-08-29 成都信通信息技术有限公司 Specific domain business knowledge retrieval method and retrieval device based on natural language
CN117150305A (en) * 2023-11-01 2023-12-01 杭州光云科技股份有限公司 Text data enhancement method and device integrating retrieval and filling and electronic equipment
CN117171653A (en) * 2023-11-02 2023-12-05 成方金融科技有限公司 Method, device, equipment and storage medium for identifying information relationship

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214599A (en) * 2020-10-20 2021-01-12 电子科技大学 Multi-label text classification method based on statistics and pre-training language model
US20210319215A1 (en) * 2020-04-08 2021-10-14 Peking University Method and system for person re-identification
CN113807098A (en) * 2021-08-26 2021-12-17 北京百度网讯科技有限公司 Model training method and device, electronic equipment and storage medium
CN113821622A (en) * 2021-09-29 2021-12-21 平安银行股份有限公司 Answer retrieval method and device based on artificial intelligence, electronic equipment and medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210319215A1 (en) * 2020-04-08 2021-10-14 Peking University Method and system for person re-identification
CN112214599A (en) * 2020-10-20 2021-01-12 电子科技大学 Multi-label text classification method based on statistics and pre-training language model
CN113807098A (en) * 2021-08-26 2021-12-17 北京百度网讯科技有限公司 Model training method and device, electronic equipment and storage medium
CN113821622A (en) * 2021-09-29 2021-12-21 平安银行股份有限公司 Answer retrieval method and device based on artificial intelligence, electronic equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
孙松涛; 何炎祥: "Multi-label sentiment classification of microblogs based on CNN feature space" (基于CNN特征空间的微博多标签情感分类), 工程科学与技术, no. 03, 20 May 2017 (2017-05-20)
孙钦东; 管晓宏; 周亚东: "Current status and trends of research on network information content auditing" (网络信息内容审计研究的现状及趋势), 计算机研究与发展, no. 08, 15 August 2009 (2009-08-15)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116662582A (en) * 2023-08-01 2023-08-29 成都信通信息技术有限公司 Specific domain business knowledge retrieval method and retrieval device based on natural language
CN116662582B (en) * 2023-08-01 2023-10-10 成都信通信息技术有限公司 Specific domain business knowledge retrieval method and retrieval device based on natural language
CN117150305A (en) * 2023-11-01 2023-12-01 杭州光云科技股份有限公司 Text data enhancement method and device integrating retrieval and filling and electronic equipment
CN117150305B (en) * 2023-11-01 2024-02-27 杭州光云科技股份有限公司 Text data enhancement method and device integrating retrieval and filling and electronic equipment
CN117171653A (en) * 2023-11-02 2023-12-05 成方金融科技有限公司 Method, device, equipment and storage medium for identifying information relationship
CN117171653B (en) * 2023-11-02 2024-01-23 成方金融科技有限公司 Method, device, equipment and storage medium for identifying information relationship

Also Published As

Publication number Publication date
CN114491024B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN112115238B (en) Question-answering method and system based on BERT and knowledge base
CN109783818B (en) Enterprise industry classification method
CN107526834B (en) Word2vec improvement method for training correlation factors of united parts of speech and word order
CN109325231B (en) Method for generating word vector by multitasking model
CN108763353B (en) Baidu encyclopedia relation triple extraction method based on rules and remote supervision
CN114491024B (en) Domain-specific multi-label text classification method based on small samples
CN108733647B (en) Word vector generation method based on Gaussian distribution
CN112395417A (en) Network public opinion evolution simulation method and system based on deep learning
CN114239574A (en) Miner violation knowledge extraction method based on entity and relationship joint learning
CN114970523B (en) Topic prompting type keyword extraction method based on text semantic enhancement
CN112905736A (en) Unsupervised text emotion analysis method based on quantum theory
CN115510863A (en) Question matching task oriented data enhancement method
CN114564563A (en) End-to-end entity relationship joint extraction method and system based on relationship decomposition
CN113934909A (en) Financial event extraction method based on pre-training language and deep learning model
CN115203507A (en) Event extraction method based on pre-training model and oriented to document field
CN114970536A (en) Combined lexical analysis method for word segmentation, part of speech tagging and named entity recognition
CN115392254A (en) Interpretable cognitive prediction and discrimination method and system based on target task
CN114048314A (en) Natural language steganalysis method
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN114388108A (en) User feedback analysis method based on multi-task learning
CN115827871A (en) Internet enterprise classification method, device and system
CN115934936A (en) Intelligent traffic text analysis method based on natural language processing
CN113626553B (en) Cascade binary Chinese entity relation extraction method based on pre-training model
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
CN115130475A (en) Extensible universal end-to-end named entity identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant