CN117494760A - Semantic tag-rich data augmentation method based on ultra-large-scale language model


Info

Publication number
CN117494760A
CN117494760A
Authority
CN
China
Prior art keywords
node
subject
samples
classification
data
Prior art date
Legal status
Pending
Application number
CN202311320484.3A
Other languages
Chinese (zh)
Inventor
肖濛 (Xiao Meng)
周园春 (Zhou Yuanchun)
蔡勋鑫 (Cai Xunxin)
宁致远 (Ning Zhiyuan)
Current Assignee
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date
Filing date
Publication date
Application filed by Computer Network Information Center of CAS
Priority to CN202311320484.3A
Publication of CN117494760A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a semantic-tag-rich data augmentation method based on a very large scale language model, comprising the following steps: 1) obtaining subject text data from a plurality of data sets to form a subject database; traversing each data sample in the subject database, dividing data samples with the same classification number into the same node of a hierarchical subject structure sampling tree, determining the position of each node according to its classification number, constructing the tree, and calculating the statistical information of each node; 2) determining the number of enhanced samples for each subject classification; 3) updating the hierarchical subject structure sampling tree according to the number of enhanced samples of each subject classification, and recalculating the statistical information of each node; 4) judging, from the statistical information of each node before and after the update, whether the number of samples corresponding to each subject classification is balanced, and repeating steps 2)-3) if not; 5) generating, for each subject classification, a corresponding number of data samples of that classification using the very large scale language model.

Description

Semantic tag-rich data augmentation method based on ultra-large-scale language model
Technical Field
The invention relates to the fields of big data, very large scale language models, data augmentation, multi-category text classification and hierarchical multi-label classification, and in particular to a data augmentation method, based on a very large scale language model (Large Language Model, LLM), for data sets rich in semantic labels, intended to alleviate problems such as imbalance between sample categories.
Background
Inferring the subject of a given research proposal is a preliminary step in automated peer review systems, in which accurate subject codes help funding administrators assign domain-relevant experts for fair evaluation. Because disciplines are inherently hierarchical, such topic inference tasks can be defined as hierarchical multi-label classification tasks. However, the numbers of applications associated with these hierarchical subject labels are unbalanced, owing to the development, planning and division of the primary disciplines (e.g., information science and mathematical science). This imbalance at the data level can further degrade the accuracy of an automated topic inference model on some secondary disciplines. Moreover, it may lead to some new disciplines being reviewed by experts unrelated to the field, further restricting the development of emerging disciplines. Hierarchical discipline labels, that is, labels rich in semantic information once the discipline hierarchy is fixed, not only carry the information of the current discipline field, but also reveal rich information such as the parent disciplines to which they belong according to the hierarchical discipline system.
With the rapid development of machine learning, augmenting unbalanced data to relieve model overfitting has become an important research field. However, the specific nature of text data and its semantic complexity make it challenging to apply existing methods directly to natural language processing tasks. The emergence of very large scale language models offers a way to address this challenge, and the hierarchical and semantic characteristics of each label in a discipline system make data augmentation with a large language model feasible. A very large scale language model is a deep learning model trained on a large amount of text data to understand and generate natural language; by learning the statistical regularities and semantic structure of the language, it can encode, generate and understand text, and it is usually built from a deep neural network comprising many layers. By training on large-scale text data, these models learn the underlying patterns, semantic relationships and grammatical rules of the language; they can automatically extract features from input text and convert them into high-dimensional vector representations for subsequent analysis and processing. The present method uses a pre-trained language model, combined with deep learning and natural language processing techniques, to analyse and understand large-scale text data in depth, and can accordingly generate richer and more accurate enhancement data from the rich semantic information in the labels.
Traditional augmentation methods for semantically rich label data often rely on manual labeling and hand-written rules; because of the richness of the semantic information in the labels, their cost is high and their effect limited. A method based on a very large scale language model can instead automatically learn rich semantic information from the provided labels and apply it to the enhanced text data.
Disclosure of Invention
The aim of the invention is to provide a semantic-tag-rich data augmentation method based on a very large scale language model. Its core idea is to encode the text and learn its representation with a pre-trained language model. Such language models are usually pre-trained on extensive unsupervised training data, enabling them to learn rich semantic information and language structure. When generating semantically rich label data, a pre-trained language model can encode the input text to obtain its semantic representation; these semantic representations can then be used for tasks such as label prediction, entity recognition and relationship extraction to generate semantically rich label data.
The invention performs quantity and frequency sampling analysis on an existing semantically rich label text database, selects the few-sample categories whose frequency or quantity distribution is unbalanced, constructs prompts from those categories and randomly chosen expert-annotated keywords, and passes the prompts to a large language model. The generated text data is labeled as enhancement data and supplied, together with the original data, to the classification model, reducing the imbalance between category samples and thereby improving classification accuracy.
The invention specifically comprises the following steps:
a semantic tag-rich data augmentation method based on a very large scale language model comprises the following steps:
1) Obtaining subject text data from a plurality of data sets to form a subject database; traversing each data sample in the subject database, dividing data samples with the same classification number into the same node of a hierarchical subject structure sampling tree, determining the position of each node in the tree according to its classification number, and thereby constructing the hierarchical subject structure sampling tree, wherein each node corresponds to one sample set and one classification number; counting the global frequency globalFreq and the hierarchical frequency levelFreq of each node in the tree, the father frequency fatherFreq of each father node, and the leaf node frequency leafFreq of each leaf node; wherein the global frequency globalFreq of a node is the ratio of the number of samples in the sample set corresponding to that node to the total number of samples; the hierarchical frequency levelFreq of a node is the ratio of the number of samples in the sample set corresponding to that node to the number of samples corresponding to all nodes on the same layer as that node; the father frequency fatherFreq of a father node is the ratio of the sum of the numbers of samples in the sample sets corresponding to all nodes under that father node to the total number of samples; and the leaf node frequency leafFreq of a leaf node is the ratio of the number of samples in the sample set corresponding to that leaf node to the total number of samples;
2) According to s_c = N · exp(-n_c^(1/λ)) / Σ_{i∈C} exp(-n_i^(1/λ)), determining the number of enhanced samples for each subject classification; wherein N is the total number of enhanced samples, λ is a hyperparameter, C is the subject classification set corresponding to the subject database, and c is one subject classification in the set C, i.e., c ∈ C; the number of samples of the node corresponding to subject classification c is n_c, the number of samples of the node corresponding to subject classification i is n_i, and the number of enhanced samples corresponding to subject classification c is s_c;
3) Updating the hierarchical subject structure sampling tree according to the number of the enhanced samples of each subject classification, and calculating the statistical information of each node in the hierarchical subject structure sampling tree after updating;
4) Judging, according to the statistical information of each node before and after the update, whether the number of samples corresponding to each subject classification is balanced; if not, adjusting the value of the hyperparameter λ and repeating steps 2)-3); if balanced, proceeding to step 5);
5) Generating an enhanced-sample-number list times according to the number of enhanced samples of each subject classification, and then generating, for each subject classification, a corresponding number of data samples of that classification using the pre-trained very large scale language model.
Further, the method for generating, for each subject classification, a corresponding number of data samples of that classification using the pre-trained very large scale language model comprises: for subject classification c, invoking the pre-trained very large scale language model s_c times for data enhancement, each time inputting the prompt corresponding to subject classification c as the input of the pre-trained very large scale language model and taking the output of the pre-trained very large scale language model as a data enhancement result.
Further, for subject classification c, one keyword is randomly selected from the keywords belonging to subject classification c in an expert-annotated keyword database and used as a part of the prompt, so as to construct the prompt corresponding to subject classification c.
Further, the prompts are constructed from a prompt template comprising background knowledge, generation principles, generation format, language style, discipline and keyword, wherein only the discipline and keyword vary with the subject classification.
Further, the very large scale language model is a BERT model, a GPT model, or LLaMA and its instruction-fine-tuned variant Vicuna.
A server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the above method.
A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor realizes the steps of the above method.
Compared with previous related methods, the semantic-tag-rich data augmentation method based on a very large scale language model has the following advantages and contributions:
(1) The method can automatically learn rich semantic information from large-scale text data, and avoids the complexity and high cost of manual labeling and rule design in the traditional method.
(2) The method based on the ultra-large scale language model can utilize the advantages of deep learning and natural language processing technology to deeply analyze and understand the text and generate more accurate and rich semantic tag data.
(3) Because the language model used by the invention is pre-trained on large-scale data, the generated semantically rich label data also has better coverage and representativeness.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention will be further elucidated with reference to the drawings and examples.
The invention aims to use rich semantic labels to enhance the data of unbalanced disciplines through a pre-trained large language model, reducing the imbalance between hierarchical disciplines so as to improve subject inference; it mainly uses the paper subject classifications, paper abstracts, paper keywords and expert-annotated keywords in the original hierarchical subject text database. The flow of the method is shown in FIG. 1.
Step one: regarding an original hierarchical subject database formed by organizing one or more data sets, focusing on important information such as subject classification, abstract text, keywords and the like, sampling the number and the frequency of each subject according to a predetermined hierarchical subject classification structure, and constructing a hierarchical subject structure sampling tree. The method comprises the steps of dividing samples with the same classification number into the same node, wherein each node corresponds to one sample set and one classification number; and determining the position of each node in the classification tree according to the classification number.
Each subject classification is taken as a node of the hierarchical subject structure sampling tree. The data samples are first traversed and counted: each data sample is read in turn from the original hierarchical subject database. For example, for a sample whose hierarchical classification is A => A01 => A0102 => A010203, A is the root node of the discipline and every remaining node is a child of the previous node. Sampling proceeds downward from the root node along the hierarchical classification, and the num attribute of every node on the path is incremented by 1; that is, the num values of nodes A, A01, A0102 and A010203 are each increased by 1 in turn. This loop continues until all samples in the original database have been counted.
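A minimal sketch of this construction in Python, where the Node class and the attribute names num and children are illustrative assumptions (the filing publishes no code):

```python
# Build the hierarchical discipline structure sampling tree and count samples.
class Node:
    def __init__(self, code):
        self.code = code       # classification number, e.g. "A0102"
        self.num = 0           # number of samples whose path passes this node
        self.children = {}     # child classification number -> Node

def insert_sample(root, path):
    """Walk one hierarchical label path such as ["A", "A01", "A0102",
    "A010203"], creating nodes as needed and incrementing the num
    attribute of every node on the path by 1."""
    node = root
    for code in path:
        node = node.children.setdefault(code, Node(code))
        node.num += 1

root = Node("ROOT")
samples = [["A", "A01", "A0102", "A010203"],   # toy paths, not patent data
           ["A", "A01", "A0103"]]
for path in samples:
    insert_sample(root, path)
total_samples = len(samples)
```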
The frequency calculation is evaluated from multiple angles:
1. The global frequency globalFreq is the ratio of the number of samples in the sample set corresponding to each node to the total number of samples; it reflects a node's share of the whole hierarchical subject structure sampling tree.
2. The hierarchical frequency levelFreq is the ratio of the number of samples in the sample set corresponding to each node to the number of samples corresponding to all nodes on the same layer; it reflects a node's share of its layer.
3. The father frequency fatherFreq is the ratio of the sum of the numbers of samples in the sample sets corresponding to all nodes under the same father node to the total number of samples; it reflects the proportion among the nodes under one father node, i.e., whether the sub-discipline classifications under a certain discipline are balanced.
4. The leaf node frequency leafFreq is the ratio of the number of samples in the sample set corresponding to each leaf node to the total number of samples; it is used to analyse the enhancement effect on subject classifications with particularly few samples. Only leaf nodes have a valid leafFreq; for all non-leaf nodes it is set to -1 and not used in the analysis.
Once counting is complete, frequency calculation is performed on the hierarchical subject structure sampling tree, traversing layer by layer in level order and computing each frequency on the nodes of each layer. The calculation is differentiated by layer, and the degree of imbalance of each discipline tree node is evaluated on the basis of the count and frequency sampling.
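The four statistics can be computed in one level-order pass. This sketch reuses the Node tree from the sketch above; the attribute names are again assumptions chosen to mirror the text:

```python
from collections import defaultdict

def compute_frequencies(root, total):
    """Annotate every node with globalFreq, levelFreq, fatherFreq and
    leafFreq as defined above; leafFreq (and fatherFreq, for leaves)
    is -1 where it does not apply."""
    levels = defaultdict(list)            # depth -> nodes on that layer

    def collect(node, depth):
        levels[depth].append(node)
        for child in node.children.values():
            collect(child, depth + 1)

    for child in root.children.values():
        collect(child, 1)

    for nodes in levels.values():
        layer_total = sum(n.num for n in nodes)
        for n in nodes:
            n.globalFreq = n.num / total
            n.levelFreq = n.num / layer_total
            n.leafFreq = n.num / total if not n.children else -1
            # share of all samples under this father node in the corpus
            n.fatherFreq = (sum(c.num for c in n.children.values()) / total
                            if n.children else -1)

compute_frequencies(root, total_samples)
```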
Step two: and (3) determining and calculating the quantity of data enhancement required by each node through LLM according to the hierarchical subject structure sampling tree generated in the step one and the related sampling results thereof. The total number of samples to be enhanced is first determined, and according to the size of the original database, for example, the original database contains 10000 pieces of data, the total number of samples to be enhanced can be selectively generated according to the proportion of 0.5. In selecting a decision to generate a sample number decision, consider utilizationThe function better distributes the number of samples between discipline categories in terms of number and frequency. But compared to the frequency freq as z in the above formula i The resulting list of times, 1/λ with the sample number results will make the discipline classifications smoother, while the purpose of adding-1 is to make the discipline classifications with smaller sample numbers get correspondingly larger enhancement numbers, therefore->
For the total number of enhanced samples N and a discipline classification c ∈ C among all discipline classifications C with a sampled count of n_c, its number of data-augmented samples is s_c = N · exp(-n_c^(1/λ)) / Σ_{i∈C} exp(-n_i^(1/λ)). The values s_c generated for each node in turn form an enhanced-sample-number list times, ordered by the hierarchical subject nodes.
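A sketch of this allocation; note that the exponent form z_c = -n_c^(1/λ) is reconstructed from the surrounding description rather than quoted from the original formula image:

```python
import math

def allocate_enhanced_samples(counts, N, lam):
    """Distribute the enhancement budget N over discipline classes with a
    softmax over z_c = -n_c**(1/lam): a larger lam smooths the allocation,
    and the minus sign gives rarer classes more enhanced samples."""
    z = {c: math.exp(-(n ** (1.0 / lam))) for c, n in counts.items()}
    norm = sum(z.values())
    return {c: round(N * zc / norm) for c, zc in z.items()}

# Toy example: the rarer class receives most of the budget.
times = allocate_enhanced_samples({"A010203": 12, "A0103": 340},
                                  N=500, lam=3.0)
```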
The hyperparameter λ in step two can be adjusted according to the actual effect. After each adjustment, whether the number of samples corresponding to each subject classification is balanced is judged from the updated frequency statistics of the nodes; if not, the value of λ is adjusted and the number of enhanced samples for each subject classification is reassigned; if balanced, proceed to step three.
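The adjustment loop might look as follows, reusing allocate_enhanced_samples from the sketch above; the balance criterion (ratio of largest to smallest post-update class size) and the direction of the λ update are assumptions, since the text leaves both to the practitioner:

```python
def rebalance(counts, N, lam=1.0, tol=0.1, max_iter=20):
    """Allocate, simulate the updated class sizes, and adjust lambda
    until they are roughly balanced (the steps 2)-4) loop)."""
    for _ in range(max_iter):
        s = allocate_enhanced_samples(counts, N, lam)
        updated = {c: counts[c] + s[c] for c in counts}
        if max(updated.values()) <= (1 + tol) * min(updated.values()):
            break                 # balanced: proceed to step three
        lam *= 1.5                # heuristic: smooth the next allocation
    return s, lam
```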
Step three: according to the list time of the enhanced sample number generated in the second step, invoking a pre-trained ultra-large scale language model to perform time on a subject with a subject classification order of c in all subject classifications (c) The data enhancement is carried out, the prompting words are input each time and used as the input of the pre-trained ultra-large-scale language model, and the output of the prompting words is obtained and used as the data enhancement result. In order to avoid similar or even identical results obtained when the ultra-large scale language model (Large Language Model, LLM) data are continuously used for classifying the same subject, when the input prompt word template of the ultra-large scale language model is constructed in an experiment, an expert annotation keyword database is called to randomly select one keyword from keywords belonging to the subject as a part of the template to be constructed. Therefore, when the ultra-large scale language model is called for data enhancement, the template consists of background knowledge, a generation principle, a generation format, a language style, discipline and keywords, wherein only discipline and keywords are changed according to different discipline classifications, and other components of the prompting word template are basically determined for the enhancement task.
The results obtained by calling the very large scale language model for data enhancement are then annotated according to their subject classification, completing the labeling of the enhanced data for subsequent training of the downstream classifier based on pre-trained BERT.
The invoked pre-trained models include the BERT (Bidirectional Encoder Representations from Transformers) model, the GPT (Generative Pre-trained Transformer) model, LLaMA and its instruction-fine-tuned variant Vicuna, etc.
The selection of the pre-training model in the third step can be adjusted according to the actual effect.
Experimental verification: a pre-trained BERT model followed by several fully connected layers forms a neural network for the downstream hierarchical subject inference task. The hierarchical subject enhancement samples generated in step two are combined with the original subject database of step one; the abstract text, after passing through the pre-trained tokenizer, is used as input, and the pooler_output, which is better suited to sentence-level tasks, is fed into the downstream neural network for training, so that reducing the imbalance between disciplines improves the subject inference accuracy on subject text data. In practical experiments, data augmentation was performed on a data set of size 14028; with N set to 5% of the total, the classification model achieved a 4% improvement in F1 score, and with N set to 10%, a 10% improvement in F1 score was obtained. This fully demonstrates the effectiveness of the method in the unbalanced data augmentation scenario of a semantic-tag-rich model for a discipline system.
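A sketch of the downstream classifier, assuming the Hugging Face transformers API; the checkpoint name, hidden size and label count are illustrative choices not specified by the text:

```python
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class DisciplineClassifier(nn.Module):
    """BERT followed by fully connected layers; pooler_output serves as
    the sentence-level representation of the abstract text."""
    def __init__(self, num_labels, hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        self.head = nn.Sequential(
            nn.Linear(self.bert.config.hidden_size, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_labels))

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return self.head(out.pooler_output)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
batch = tokenizer(["an example abstract text"], return_tensors="pt",
                  padding=True, truncation=True)
model = DisciplineClassifier(num_labels=91)   # label count is a placeholder
logits = model(batch["input_ids"], batch["attention_mask"])
```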
The construction of the neural network in the experiment can be adjusted according to the actual effect, so as to better demonstrate how data enhancement weakens the degree of imbalance and thus improves subject inference capability.
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and those skilled in the art may modify or substitute the technical solution of the present invention without departing from the spirit and scope of the present invention, and the protection scope of the present invention shall be subject to the claims.

Claims (7)

1. A semantic-tag-rich data augmentation method based on a very large scale language model, comprising the following steps:
1) Obtaining subject text data from a plurality of data sets to form a subject database; traversing each data sample in the subject database, dividing data samples with the same classification number into the same node of a hierarchical subject structure sampling tree, determining the position of each node in the tree according to its classification number, and thereby constructing the hierarchical subject structure sampling tree, wherein each node corresponds to one sample set and one classification number; counting the global frequency globalFreq and the hierarchical frequency levelFreq of each node in the tree, the father frequency fatherFreq of each father node, and the leaf node frequency leafFreq of each leaf node; wherein the global frequency globalFreq of a node is the ratio of the number of samples in the sample set corresponding to that node to the total number of samples; the hierarchical frequency levelFreq of a node is the ratio of the number of samples in the sample set corresponding to that node to the number of samples corresponding to all nodes on the same layer as that node; the father frequency fatherFreq of a father node is the ratio of the sum of the numbers of samples in the sample sets corresponding to all nodes under that father node to the total number of samples; and the leaf node frequency leafFreq of a leaf node is the ratio of the number of samples in the sample set corresponding to that leaf node to the total number of samples;
2) According to s_c = N · exp(-n_c^(1/λ)) / Σ_{i∈C} exp(-n_i^(1/λ)), determining the number of enhanced samples for each subject classification; wherein N is the total number of enhanced samples, λ is a hyperparameter, C is the subject classification set corresponding to the subject database, and c is one subject classification in the set C, i.e., c ∈ C; the number of samples of the node corresponding to subject classification c is n_c, the number of samples of the node corresponding to subject classification i is n_i, and the number of enhanced samples corresponding to subject classification c is s_c;
3) Updating the hierarchical subject structure sampling tree according to the number of the enhanced samples of each subject classification, and calculating the statistical information of each node in the hierarchical subject structure sampling tree after updating;
4) Judging, according to the statistical information of each node before and after the update, whether the number of samples corresponding to each subject classification is balanced; if not, adjusting the value of the hyperparameter λ and repeating steps 2)-3); if balanced, proceeding to step 5);
5) Generating an enhanced-sample-number list times according to the number of enhanced samples of each subject classification, and then generating, for each subject classification, a corresponding number of data samples of that classification using the pre-trained very large scale language model.
2. The method of claim 1, wherein generating, for each subject classification, a corresponding number of data samples of that classification using the pre-trained very large scale language model comprises: for subject classification c, invoking the pre-trained very large scale language model s_c times for data enhancement, each time inputting the prompt corresponding to subject classification c as the input of the pre-trained very large scale language model and taking the output of the pre-trained very large scale language model as a data enhancement result.
3. The method according to claim 2, wherein, for subject classification c, one keyword is randomly selected from the keywords belonging to subject classification c in an expert-annotated keyword database and used as a part of the prompt, thereby constructing the prompt corresponding to subject classification c.
4. The method according to claim 3, wherein the prompts are constructed from a prompt template comprising background knowledge, generation principles, generation format, language style, discipline and keyword, wherein only the discipline and keyword vary with the subject classification.
5. The method according to claim 1, 2 or 3, wherein the very large scale language model is a BERT model, a GPT model, or LLaMA and its instruction-fine-tuned variant Vicuna.
6. A server comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of any of claims 1 to 5.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.
CN202311320484.3A 2023-10-12 2023-10-12 Semantic tag-rich data augmentation method based on ultra-large-scale language model Pending CN117494760A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311320484.3A CN117494760A (en) 2023-10-12 2023-10-12 Semantic tag-rich data augmentation method based on ultra-large-scale language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311320484.3A CN117494760A (en) 2023-10-12 2023-10-12 Semantic tag-rich data augmentation method based on ultra-large-scale language model

Publications (1)

Publication Number Publication Date
CN117494760A 2024-02-02

Family

ID=89683804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311320484.3A Pending CN117494760A (en) 2023-10-12 2023-10-12 Semantic tag-rich data augmentation method based on ultra-large-scale language model

Country Status (1)

Country Link
CN (1) CN117494760A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117786414A (en) * 2024-02-23 2024-03-29 云南联合视觉科技有限公司 Method for constructing medical instruction data set
CN117786414B (en) * 2024-02-23 2024-05-10 云南联合视觉科技有限公司 Method for constructing medical instruction data set

Similar Documents

Publication Publication Date Title
CN110597735B (en) Software defect prediction method for open-source software defect feature deep learning
US7685082B1 (en) System and method for identifying, prioritizing and encapsulating errors in accounting data
CN111026842A (en) Natural language processing method, natural language processing device and intelligent question-answering system
CN109857846B (en) Method and device for matching user question and knowledge point
CN113535963B (en) Long text event extraction method and device, computer equipment and storage medium
CN113268561B (en) Problem generation method based on multi-task joint training
KR102109369B1 (en) Artificial Intelligence System to Predict Changes and Explain Reasons in Time Series
CN109766911A (en) A kind of behavior prediction method
CN113312480A (en) Scientific and technological thesis level multi-label classification method and device based on graph convolution network
CN113761893A (en) Relation extraction method based on mode pre-training
CN116796045B (en) Multi-dimensional book grading method, system and readable medium
CN115204143B (en) Method and system for calculating text similarity based on prompt
CN117494760A (en) Semantic tag-rich data augmentation method based on ultra-large-scale language model
CN113779264A (en) Trade recommendation method based on patent supply and demand knowledge graph
CN115270797A (en) Text entity extraction method and system based on self-training semi-supervised learning
Song et al. Rgvisnet: A hybrid retrieval-generation neural framework towards automatic data visualization generation
CN113886562A (en) AI resume screening method, system, equipment and storage medium
CN110019796A (en) A kind of user version information analysis method and device
CN117807232A (en) Commodity classification method, commodity classification model construction method and device
CN116258504B (en) Bank customer relationship management system and method thereof
CN117010373A (en) Recommendation method for category and group to which asset management data of power equipment belong
Sekiyama et al. Automated proof synthesis for the minimal propositional logic with deep neural networks
CN114282875A (en) Flow approval certainty rule and semantic self-learning combined judgment method and device
Hou et al. FewJoint: few-shot learning for joint dialogue understanding
CN113570455A (en) Stock recommendation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination