CN112836772A - Random contrast test identification method integrating multiple BERT models based on LightGBM - Google Patents
- Publication number: CN112836772A (application CN202110363597.6A)
- Authority
- CN
- China
- Prior art keywords: training, text, lightgbm, data, rct
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention discloses a randomized controlled trial (RCT) identification method that integrates multiple BERT models based on LightGBM, comprising the following steps. Step s1: segment initial RCT data prepared in advance into a training set, a development set, and a test set, where the initial RCT data comprise texts and initial classification labels. Step s2: convert the texts in the training, development, and test sets into position vectors, text vectors, and word vectors. Step s3: train the models. Step s4: adjust the hyper-parameters of the models. Step s5: classify the texts of the training and development sets with the trained models. Step s6: train a LightGBM model. Step s7: obtain the final classification result. By using the ensemble learning algorithm LightGBM to integrate 4 different models, trained on RCT data provided by Cochrane, the invention automatically screens titles and abstracts for the RCT class.
Description
Technical Field
The invention relates to the technical field of computer data processing, and in particular to a randomized controlled trial (RCT) identification method that integrates multiple BERT models based on LightGBM.
Background
The randomized controlled trial (RCT) is generally considered the gold standard for evaluating the safety and efficacy of drugs. In recent years, how to evaluate the effectiveness and safety of drugs using real-world evidence has become an increasingly prominent issue in drug development and regulatory decision-making, both in China and abroad.
Because the sample of any single RCT is limited, meta-analysis is often used to comprehensively collect the results of multiple small-sample clinical trials of treatments for a given disease, carry out systematic review and statistical analysis of those results, and provide society and clinicians with conclusions that are as close to the truth as possible in a timely manner, thereby promoting genuinely effective treatments and discarding ineffective or even harmful methods.
The literature, as an important vehicle for sharing scientific results, contains a large amount of research information. RCT-related documents are typically collected by researchers through literature searches.
However, in document retrieval for systematic reviews, the explosive annual growth of the literature and the limited specificity of retrieval strategies mean that the number of retrieved citations is very large, and manually screening the retrieval results for RCT-related documents is time-consuming and labor-intensive.
Currently, some systematic review software tools include RCT classification functions, among them GAPScreener, Abstrackr, and Rayyan, which are semi-automatic citation filtering and selection tools that classify documents using a support vector machine (SVM). The SVM was a successful machine learning model widely used in such text mining tools in the first decade of the 21st century. However, SVMs rely heavily on manually engineered sample features, which can be unstable and labor-intensive to produce.
With the development of machine learning techniques and computer hardware, neural-network-based machine learning approaches have gained popularity due to their good performance on many problems, particularly image recognition and natural language processing (NLP). Bidirectional Encoder Representations from Transformers (BERT), a pre-trained model proposed by Google, achieved the best results on 11 NLP tasks in October 2018. Thanks to its deep network and pre-training process, the BERT model performs well across different NLP tasks: during pre-training, the model learns background features of the language from a large pre-training corpus, and with this foundational learning in place, performance on a specific downstream task improves. We therefore wish to use different medically relevant pre-trained BERT models as the base classifiers for the RCT classification task.
In the last two years, LightGBM has been widely used in machine learning tasks as an ensemble method for combining the outputs of different models. Besides saving training and prediction time, its performance surpasses that of existing boosting algorithms.
Currently, the models that perform well in text classification are supervised learning models, which require a training process. During training, the model learns the relationship between citations and classification labels; the labels of known screened citations are then used to predict labels for citations without known classifications. The accuracy of the screened citations therefore directly influences the classification performance of the model. Cochrane is a recognized project in the field of systematic review, and health science researchers from 158 countries have participated in its classification of texts. Panelists trained in the study methodology work in pairs and screen titles/abstracts independently; reviewers resolve disagreements through discussion or, if necessary, by consulting a third reviewer.
Disclosure of Invention
The invention aims to provide a randomized controlled trial identification method that integrates multiple BERT models based on LightGBM and automatically screens titles and abstracts for the RCT class.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
a random control test identification method integrating a plurality of BERT models based on LightGBM comprises the following steps:
step s 1: segmenting initial RCT data prepared in advance into a training set, a development set and a test set, wherein the initial RCT data comprises texts and initial classification labels;
step s 2: respectively converting texts in the training set, the development set and the test set into a position vector, a text vector and a word vector;
step s 3: respectively training 4 BERT models by using the position vector, the text vector, the word vector and the initial classification label after the text conversion in the training set;
step s 4: adjusting hyper-parameters of the 4 BERT models using the converted text position vectors, text vectors, word vectors and initial classification tags in the development set;
step s 5: classifying the texts of the training set and the development set into RCT classes and non-RCT classes by using the trained 4 BERT models;
step s 6: training a LightGBM model;
step s 7: and classifying the data of the test set by using 4 BERT models to obtain a classification result, and synthesizing the classification results of the 4 BERT models by using the LightGBM model to obtain a final classification result of the test set.
Preferably, the text includes a title and an abstract, and the initial classification labels include an RCT class and a non-RCT class.
Preferably, in step s1, the segmentation includes the steps of:
step s 101: dividing the initial RCT data into 5 disjoint data sets;
step s 102: sequentially selecting 1 part of 5 parts in s101 as a test set, and taking the other 4 parts as training data, thereby obtaining 5 groups of data, wherein each group of data comprises 1 training data and 1 test set, and the sample number ratio of the test set to the training data is 1: 4;
step s 103: for 5 groups of data, the training data in each group are randomly divided into a training set and a development set in a ratio of 3:1, so that each group of data is composed of a training set, a development set and a test set, wherein the training set, the development set and the test set comprise samples in a ratio of 3:1: 1.
Preferably, the 4 BERT models are BIO-BBUPC, BIO-BBUP, SCI-BBU and BBU respectively, and the 4 BERT models are used as base classifiers.
Preferably, in step s5, each text in the training set and the development set is classified by a BERT model to obtain a 2-dimensional vector as a classification result, and each text in the training set and the development set is classified by 4 BERT models to obtain an 8-dimensional vector.
Further, in step s6, the LightGBM model is trained using the 8-dimensional vector data obtained from the converted training-set and development-set texts together with the initial classification labels of the training set, and the hyper-parameters of the LightGBM model are adjusted stepwise using five-fold cross-validation.
The invention has the following beneficial effects:
according to the method, the lightGBM models of 4 different BERT models are integrated, the questions and the abstracts of the RCT are automatically screened, the accuracy, the sensitivity and the specificity of the screening result are higher, the method is faster and more accurate, and the manual workload is reduced.
Drawings
Fig. 1 is a general framework work flow diagram of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
A randomized controlled trial identification method integrating multiple BERT models based on LightGBM comprises the following steps:
step s 1: segmenting pre-prepared initial RCT data into a training set, a development set and a test set, wherein the initial RCT data comprises text and initial classification labels.
The initial RCT data are derived from Cochrane. Cochrane is a recognized project in the field of systematic review, and health science researchers from 158 countries have participated in its classification of texts. Panelists trained in the study methodology work in pairs and screen titles/abstracts independently; reviewers resolve disagreements through discussion or, if necessary, by consulting a third reviewer.
The text includes a title and an abstract, and the initial classification labels include an RCT class and a non-RCT class.
In step s1, the segmentation comprises the steps of:
step s 101: dividing the initial RCT data into 5 disjoint data sets;
step s 102: sequentially selecting 1 part of 5 parts in s101 as a test set, and taking the other 4 parts as training data, thereby obtaining 5 groups of data, wherein each of the 5 groups of data comprises 1 part of training data and 1 part of test set, and the ratio of the training data to the test set is 4: 1;
step s 103: the training data in each of the 5 groups are randomly divided into a training set and a development set at a 3:1 ratio, giving 5 new groups of data, each comprising a training set and a development set, with the training set, development set, and test set in a 3:1:1 proportion.
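The segmentation in steps s101 to s103 can be sketched in plain Python. This is a minimal illustration only; the function name, the shuffling, and the random seed are assumptions for the sketch and are not specified by the patent.

```python
import random

def make_splits(data, n_folds=5, seed=42):
    """Split data into n_folds disjoint parts; for each fold, the held-out
    part is the test set and the remaining parts are training data, which
    is further split 3:1 into a training set and a development set."""
    rng = random.Random(seed)
    items = list(data)
    rng.shuffle(items)
    # 5 disjoint subsets of (near-)equal size
    folds = [items[i::n_folds] for i in range(n_folds)]
    groups = []
    for i in range(n_folds):
        test = folds[i]
        pool = [x for j, f in enumerate(folds) if j != i for x in f]
        rng.shuffle(pool)
        cut = len(pool) * 3 // 4          # 3:1 train/dev split of the pool
        groups.append({"train": pool[:cut], "dev": pool[cut:], "test": test})
    return groups
```

Each of the 5 resulting groups then holds samples in the 3:1:1 train/dev/test proportion described above.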
Step s 2: and respectively converting the texts in the training set, the development set and the test set into a position vector, a text vector and a word vector.
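The conversion in step s2 produces BERT's three input sequences: word (token) ids, text (segment) ids, and position ids. A toy whitespace tokenizer below shows the shape of these sequences; the real method would use the WordPiece tokenizer and vocabulary of each pre-trained BERT model, so the function and vocabulary here are illustrative assumptions only.

```python
def to_bert_inputs(text, vocab, max_len=16):
    """Convert a title/abstract into BERT-style input sequences:
    word (token) ids, text (segment) ids, and position ids."""
    tokens = ["[CLS]"] + text.lower().split()[: max_len - 2] + ["[SEP]"]
    word_ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    pad = max_len - len(word_ids)
    word_ids += [vocab["[PAD]"]] * pad
    segment_ids = [0] * max_len            # single-sequence input: all segment 0
    position_ids = list(range(max_len))    # absolute positions 0..max_len-1
    attention_mask = [1] * (max_len - pad) + [0] * pad
    return word_ids, segment_ids, position_ids, attention_mask
```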
Step s 3: the 4 BERT models are trained using the text-converted position vectors, text vectors, word vectors, and initial classification labels in the training set, respectively.
The 4 BERT models are SCI-BBU, BIO-BBUP, BBU and BIO-BBUPC respectively, and the 4 BERT models are used as base classifiers.
The 4 BERT models, namely BIO-BBUPC, BIO-BBUP, SCI-BBU, and BBU, share the same basic BERT model structure but have different initial parameters. BIO-BBUPC was pre-trained in 2018 on abstracts in the PubMed database together with clinical notes; BIO-BBUP was pre-trained in 2018 on abstracts in the PubMed database; SCI-BBU was pre-trained on the Semantic Scholar corpus, which contains 1.14 million papers and 3.1 billion tokens, and used the full text of the papers in training, not just the abstracts; BBU was pre-trained on Wikipedia data in 2018. Different pre-training sets imply different initial model parameters.
Step s 4: the hyper-parameters of the 4 BERT models are adjusted using the text-converted position vectors, text vectors, word vectors, and the initial classification labels in the development set. Hyper-parameter adjustment mainly concerns the maximum input text length and the learning rate.
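The hyper-parameter adjustment over maximum length and learning rate can be sketched as a simple grid search scored on the development set. The function name, the grid values, and the pluggable `evaluate` callable (which stands in for fine-tuning a BERT model and scoring it on the development set) are assumptions for this sketch.

```python
def tune(grid, evaluate):
    """Pick the (max_len, learning_rate) pair with the best development-set
    score. `evaluate(max_len, lr)` stands in for fine-tune-and-score."""
    best, best_score = None, float("-inf")
    for max_len in grid["max_len"]:
        for lr in grid["learning_rate"]:
            score = evaluate(max_len, lr)
            if score > best_score:
                best, best_score = (max_len, lr), score
    return best, best_score
```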
Step s 5: and classifying the texts of the training set and the development set into RCT classes and non-RCT classes by using the trained 4 BERT models.
In step s5, each text in the training set and the development set is classified by a BERT model to obtain a 2-dimensional vector as a classification result, and each text in the training set and the development set is classified by 4 BERT models to obtain an 8-dimensional vector.
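The concatenation of the four 2-dimensional base-classifier outputs into one 8-dimensional vector can be sketched as follows; the function name and the (non-RCT score, RCT score) ordering are illustrative assumptions.

```python
def ensemble_features(probs_per_model):
    """Each of the 4 BERT base classifiers yields a 2-dimensional vector per
    text; concatenating the four vectors gives the 8-dimensional feature
    vector consumed by the LightGBM meta-model."""
    assert len(probs_per_model) == 4 and all(len(p) == 2 for p in probs_per_model)
    return [x for pair in probs_per_model for x in pair]
```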
Step s 6: the LightGBM model is trained.
In step s6, the LightGBM model is trained using the 8-dimensional vector data obtained from the converted training-set and development-set texts together with the initial classification labels of the training set, and the hyper-parameters of the LightGBM model are adjusted stepwise using five-fold cross-validation.
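The stacking mechanics of step s6 can be illustrated with a stand-in meta-learner. The patent uses LightGBM; to keep this sketch dependency-free, a tiny logistic regression trained by gradient descent plays the role of the meta-model over the 8-dimensional base-classifier outputs. The function names, learning rate, and epoch count are assumptions, not the patent's configuration.

```python
import math

def train_meta(features, labels, lr=0.5, epochs=200):
    """Stand-in for the LightGBM meta-model: logistic regression on the
    8-dim concatenated base-classifier outputs (label 1 = RCT)."""
    w, b = [0.0] * 8, 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                               # gradient of log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict_meta(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0
```

In practice, `lightgbm.LGBMClassifier` would be fitted on the same 8-dimensional features and tuned with five-fold cross-validation as described above.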
As shown in fig. 1, the working process by which the trained model identifies whether a text belongs to the RCT class is as follows: a text is fed to the 4 base classifiers BIO-BBUPC, BIO-BBUP, SCI-BBU, and BBU to obtain 4 classification results, which are spliced by the Concat layer shown in fig. 1; the merged results are used as the input of the LightGBM model, which yields the final classification result, i.e., RCT class or non-RCT class. The classification result each text obtains from a base classifier or from the LightGBM model is a 2-dimensional vector ([0,1] or [1,0]), where [0,1] represents the non-RCT class and [1,0] represents the RCT class.
Step s 7: the data of the test set are classified using the 4 BERT models to obtain classification results, and the LightGBM model synthesizes the classification results of the 4 BERT models to obtain the final classification result for the test set, i.e., the screening result.
The technical effect of the invention is illustrated by the five-fold cross validation as follows:
indicators of evaluation method performance have accuracy, sensitivity, specificity, missed studies, and workload savings. The citation of the RCT class is a qualified citation, and the citation of the non-RCT class is a disqualified citation. Accuracy is the ratio of the number of correctly predicted quotations to the total number of quotations. Sensitivity is the ratio of the number of qualified citations correctly predicted as qualified citations to the total number of qualified citations. Specificity is the ratio of the number of quotations correctly predicted as ineligible to the total number of ineligible quotations.
The main purpose of the five-fold cross-validation is to demonstrate the robustness and stability of the model. In the five-fold cross-validation, the invention showed consistently high sensitivity and specificity on every test set. The test set contained 1,472 citations of the RCT class and 15,323 citations of the non-RCT class, totaling 16,794 documents.
In the case study, the accuracy on the evaluation set was 95%, the sensitivity 93%, and the specificity 95%. The 93% sensitivity was better than that of each individual BERT model. With no further measures and full acceptance of the invention's predictions, the invention would avoid manual screening of 14,650 of the 16,794 citations, an 87% reduction in workload. The final model parameters are obtained by using all data as the training set, and the model's evaluation metrics are taken as the averages over the five-fold cross-validation.
The mean values of the five-fold cross validation results of RCT types recognized by different NLP methods are shown in Table 1:
table 1: five-fold cross validation result mean value for identifying RCT classes by different NLP methods
The five-fold cross validation results for identifying RCT classes are shown in Table 2:
table 2: identifying five-fold cross-validation results of RCT classes
The present invention is capable of other embodiments, and various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention.
Claims (6)
1. A randomized controlled trial identification method integrating multiple BERT models based on LightGBM, characterized by comprising the following steps:
step s 1: segmenting initial RCT data prepared in advance into a training set, a development set and a test set, wherein the initial RCT data comprises texts and initial classification labels;
step s 2: respectively converting texts in the training set, the development set and the test set into a position vector, a text vector and a word vector;
step s 3: respectively training 4 BERT models by using the position vector, the text vector, the word vector and the initial classification label after the text conversion in the training set;
step s 4: adjusting hyper-parameters of the 4 BERT models using the converted text position vectors, text vectors, word vectors and initial classification tags in the development set;
step s 5: classifying the training set text and the development set text into RCT classes and non-RCT classes by using the trained 4 BERT models;
step s 6: training a LightGBM model;
step s 7: and classifying the data of the test set by using 4 BERT models to obtain a classification result, and synthesizing the classification results of the 4 BERT models by using the LightGBM model to obtain a final classification result of the test set.
2. The LightGBM-based random control trial identification method of integrating multiple BERT models according to claim 1, wherein: the text includes a title and a summary, and the initial classification tags include an RCT class and a non-RCT class.
3. The LightGBM-based random control trial identification method of integrating multiple BERT models according to claim 1, wherein:
in step s1, the segmentation comprises the steps of:
step s 101: dividing the initial RCT data into 5 disjoint data sets;
step s 102: sequentially selecting 1 part of 5 parts in s101 as a test set, and taking the other 4 parts as training data, thereby obtaining 5 groups of data, wherein each group of data comprises 1 training data and 1 test set, and the sample number ratio of the test set to the training data is 1: 4;
step s 103: for 5 groups of data, the training data in each group are randomly divided into a training set and a development set in a ratio of 3:1, so that each group of data is composed of a training set, a development set and a test set, wherein the training set, the development set and the test set comprise samples in a ratio of 3:1: 1.
4. The LightGBM-based random control trial identification method of integrating multiple BERT models according to claim 1, wherein: the 4 BERT models are BIO-BBUPC, BIO-BBUP, SCI-BBU and BBU respectively, and the 4 BERT models are used as base classifiers.
5. The LightGBM-based random control trial identification method of integrating multiple BERT models according to claim 1, wherein: in step s5, each text in the training set and each text in the development set are classified by a BERT model to obtain a 2-dimensional vector as a classification result, and each text in the training set and each text in the development set are classified by 4 BERT models to obtain an 8-dimensional vector.
6. The LightGBM integrated multiple BERT model based random control trial identification method of claim 5, wherein: in step s6, the LightGBM model is trained using the 8-dimensional vector data obtained after converting the training set texts and development set texts together with the initial classification labels of the training set, and the hyper-parameters of the LightGBM model are adjusted stepwise using five-fold cross-validation.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110363597.6A CN112836772A (en) | 2021-04-02 | 2021-04-02 | Random contrast test identification method integrating multiple BERT models based on LightGBM |
PCT/CN2021/116267 WO2022205768A1 (en) | 2021-04-02 | 2021-09-02 | Random contrast test identification method for integrating multiple bert models on the basis of lightgbm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110363597.6A CN112836772A (en) | 2021-04-02 | 2021-04-02 | Random contrast test identification method integrating multiple BERT models based on LightGBM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112836772A true CN112836772A (en) | 2021-05-25 |
Family
ID=75930701
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110363597.6A Pending CN112836772A (en) | 2021-04-02 | 2021-04-02 | Random contrast test identification method integrating multiple BERT models based on LightGBM |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112836772A (en) |
WO (1) | WO2022205768A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022205768A1 (en) * | 2021-04-02 | 2022-10-06 | 四川大学华西医院 | Random contrast test identification method for integrating multiple bert models on the basis of lightgbm |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108829810A (en) * | 2018-06-08 | 2018-11-16 | 东莞迪赛软件技术有限公司 | File classification method towards healthy public sentiment |
CN109753564A (en) * | 2018-12-13 | 2019-05-14 | 四川大学 | The construction method of Chinese RCT Intelligence Classifier based on machine learning |
CN110210037A (en) * | 2019-06-12 | 2019-09-06 | 四川大学 | Category detection method towards evidence-based medicine EBM field |
CN110347825A (en) * | 2019-06-14 | 2019-10-18 | 北京物资学院 | The short English film review classification method of one kind and device |
CN112131389A (en) * | 2020-10-26 | 2020-12-25 | 四川大学华西医院 | Method for integrating multiple BERT models by LightGBM to accelerate system evaluation updating |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9087303B2 (en) * | 2012-02-19 | 2015-07-21 | International Business Machines Corporation | Classification reliability prediction |
CN112836772A (en) * | 2021-04-02 | 2021-05-25 | 四川大学华西医院 | Random contrast test identification method integrating multiple BERT models based on LightGBM |
-
2021
- 2021-04-02 CN CN202110363597.6A patent/CN112836772A/en active Pending
- 2021-09-02 WO PCT/CN2021/116267 patent/WO2022205768A1/en active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108829810A (en) * | 2018-06-08 | 2018-11-16 | 东莞迪赛软件技术有限公司 | File classification method towards healthy public sentiment |
CN109753564A (en) * | 2018-12-13 | 2019-05-14 | 四川大学 | The construction method of Chinese RCT Intelligence Classifier based on machine learning |
CN110210037A (en) * | 2019-06-12 | 2019-09-06 | 四川大学 | Category detection method towards evidence-based medicine EBM field |
CN110347825A (en) * | 2019-06-14 | 2019-10-18 | 北京物资学院 | The short English film review classification method of one kind and device |
CN112131389A (en) * | 2020-10-26 | 2020-12-25 | 四川大学华西医院 | Method for integrating multiple BERT models by LightGBM to accelerate system evaluation updating |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022205768A1 (en) * | 2021-04-02 | 2022-10-06 | 四川大学华西医院 | Random contrast test identification method for integrating multiple bert models on the basis of lightgbm |
Also Published As
Publication number | Publication date |
---|---|
WO2022205768A1 (en) | 2022-10-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021047186A1 (en) | Method, apparatus, device, and storage medium for processing consultation dialogue | |
WO2018218708A1 (en) | Deep-learning-based public opinion hotspot category classification method | |
WO2018000269A1 (en) | Data annotation method and system based on data mining and crowdsourcing | |
CN106528528A (en) | A text emotion analysis method and device | |
CN109933664A (en) | A kind of fine granularity mood analysis improved method based on emotion word insertion | |
Harfoushi et al. | Sentiment analysis algorithms through azure machine learning: Analysis and comparison | |
CN104834940A (en) | Medical image inspection disease classification method based on support vector machine (SVM) | |
Sathiyanarayanan et al. | Identification of breast cancer using the decision tree algorithm | |
CN109036577A (en) | Diabetic complication analysis method and device | |
CN102156885A (en) | Image classification method based on cascaded codebook generation | |
Jatav | An algorithm for predictive data mining approach in medical diagnosis | |
Borovsky et al. | Moving towards accurate and early prediction of language delay with network science and machine learning approaches | |
CN107194617A (en) | A kind of app software engineers soft skill categorizing system and method | |
CN109492105A (en) | A kind of text sentiment classification method based on multiple features integrated study | |
Liu et al. | Patent analysis and classification prediction of biomedicine industry: SOM-KPCA-SVM model | |
Livieris et al. | Identification of blood cell subtypes from images using an improved SSL algorithm | |
Tran et al. | Automated curation of CNMF-E-extracted ROI spatial footprints and calcium traces using open-source AutoML tools | |
Orosoo et al. | Performance analysis of a novel hybrid deep learning approach in classification of quality-related English text | |
CN112836772A (en) | Random contrast test identification method integrating multiple BERT models based on LightGBM | |
CN112131389B (en) | Method for integrating multiple BERT models through LightGBM to accelerate system evaluation updating | |
CN113886562A (en) | AI resume screening method, system, equipment and storage medium | |
Hardaya et al. | Application of text mining for classification of community complaints and proposals | |
CN104331507B (en) | Machine data classification is found automatically and the method and device of classification | |
CN116451114A (en) | Internet of things enterprise classification system and method based on enterprise multisource entity characteristic information | |
CN116775897A (en) | Knowledge graph construction and query method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210525 |