WO2022114447A1 - Method for providing similar clinical trial data and server executing same - Google Patents
Method for providing similar clinical trial data and server executing same
- Publication number
- WO2022114447A1 (PCT/KR2021/009978)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- clinical trial
- trial data
- data
- vector
- word
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/20—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/40—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
Definitions
- The present invention relates to providing similar clinical trial data, and more particularly to a method for providing similar clinical trial data that extracts and provides clinical trial data similar to the clinical trial data input by a user, and a server executing the same.
- Clinical trials for new drug development are also increasing.
- A clinical trial can be defined as a test or study conducted on humans to evaluate the efficacy of a newly developed drug or to prepare safety standards for it, by checking the range of applicable diseases, appropriate dosing, side effects, pharmacokinetics, pharmacodynamics, pharmacology, and clinical effects, and by investigating adverse drug reactions.
- A clinical trial management system includes a clinical data database that stores clinical trial data.
- The clinical trial management system provides the clinical data stored in the clinical data database to clinical researchers. Researchers conducting clinical research therefore search for the items they need in consideration of the research topic.
- An object of the present invention is to provide a method for providing similar clinical trial data that extracts and provides clinical trial data similar to clinical trial data input by a user, and a server for executing the same.
- To achieve this object, a method for providing similar clinical trial data, executed in a similar clinical trial data providing server, includes: when clinical trial data is received from a user terminal, determining the type of the clinical trial data; according to the type of the clinical trial data, generating a vector by using each item of metadata of the clinical trial data or by tokenizing words extracted from the clinical trial data; inputting the vector into a pre-trained learning model and calculating the distance between the vector and a vector stored in advance in the learning model; and measuring a similarity grade according to the distance between the vectors, and extracting and providing clinical trial data whose similarity grade is less than or equal to a specific grade.
- To achieve this object, a similar clinical trial data providing server includes: a preprocessor that, upon receiving clinical trial data from a user terminal, determines the type of the clinical trial data and executes preprocessing according to that type; a data feature extraction unit that generates a vector by using each item of metadata of the clinical trial data or by tokenizing words extracted from the clinical trial data; and a similar clinical trial data extraction unit that inputs the vector into a pre-trained learning model, calculates the distance between the vector and a vector stored in advance in the learning model, measures a similarity grade according to the distance between the vectors, and extracts and provides clinical trial data whose similarity grade is less than or equal to a specific grade.
- FIG. 1 is a network configuration diagram illustrating a system for providing similar clinical trial data according to an embodiment of the present invention.
- FIG. 2 is a block diagram illustrating the internal structure of a server for providing similar clinical trial data according to an embodiment of the present invention.
- FIG. 3 is a flowchart illustrating an embodiment of a method for providing similar clinical trial data according to the present invention.
- FIG. 4 is a flowchart for explaining another embodiment of a method for providing similar clinical trial data according to the present invention.
- Clinical trial data refers to data collected through the web or a database, and includes unstructured data and structured data.
- Structured data is data that includes metadata such as the CRIS registration number, summary title in Korean, summary title in English, approval status, and approval date. Unstructured data is data written in natural language, such as clinical trial results.
- FIG. 1 is a network configuration diagram illustrating a system for providing similar clinical trial data according to an embodiment of the present invention.
- Referring to FIG. 1, the system for providing similar clinical trial data according to an embodiment of the present invention includes user terminals 100_1 to 100_N and a similar clinical trial data providing server 200.
- The user terminals 100_1 to 100_N are terminals that provide clinical trial data to the similar clinical trial data providing server 200 and receive, from that server, clinical trial data similar to the data they provided.
- These user terminals 100_1 to 100_N may be implemented as a smart phone, a tablet PC, a notebook computer, a desktop, or the like.
- The similar clinical trial data providing server 200 is a server that, upon receiving clinical trial data from the user terminals 100_1 to 100_N, extracts and provides clinical trial data similar to the received data.
- The similar clinical trial data providing server 200 collects clinical trial data through the web or a clinical trial database and executes preprocessing. The preprocessing differs according to whether the clinical trial data is structured data or unstructured data.
- When the clinical trial data is structured data, the similar clinical trial data providing server 200 generates a sub-vector for each item of metadata of the clinical trial data and generates a vector using the sub-vectors.
- The similar clinical trial data providing server 200 then pre-processes the weights calculated through the above process into another form, for example by normalization or tf-idf, and generates a learning model by learning the vector.
- This learning model is used to extract similar clinical trial data when structured clinical trial data is later received from the user terminals 100_1 to 100_N.
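The structured-data path can be sketched as follows. This is a minimal illustration, not the implementation described in the patent: the field names (`title_en`, `approval_status`, `approval_date`), the status categories, the hashing trick used for the title, and the per-field L2 normalization are all assumptions chosen to show one sub-vector per metadata item being concatenated into a single vector.

```python
import hashlib
from datetime import date

# Hypothetical approval-status categories; the source does not enumerate them.
STATUS_VALUES = ["approved", "pending", "rejected"]

def hash_text(text, dim=8):
    """Sub-vector for a free-text field: hashed bag of words (an assumed encoding)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec

def one_hot(value, categories):
    """Sub-vector for a categorical field."""
    return [1.0 if value == c else 0.0 for c in categories]

def date_feature(d):
    """Sub-vector for a date field: scaled year offset and month."""
    return [(d.year - 2000) / 100.0, d.month / 12.0]

def l2_normalize(vec):
    """Scale a sub-vector to unit length so no single field dominates."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm else vec

def vectorize_structured(record):
    """Build one sub-vector per metadata item, then concatenate them."""
    sub_vectors = [
        hash_text(record["title_en"]),
        one_hot(record["approval_status"], STATUS_VALUES),
        date_feature(record["approval_date"]),
    ]
    return [x for sub in sub_vectors for x in l2_normalize(sub)]

record = {
    "title_en": "Telbivudine versus lamivudine in chronic hepatitis B",
    "approval_status": "approved",
    "approval_date": date(2020, 11, 30),
}
vector = vectorize_structured(record)
```

Normalizing each sub-vector before concatenation is one way to read the "normalization" step mentioned above; tf-idf weighting of the text field would be an alternative.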
- When the clinical trial data is unstructured data, the similar clinical trial data providing server 200 deletes predetermined clinical stopwords from the clinical trial data, or deletes words belonging to predetermined clinical stopword parts of speech.
- The predetermined clinical stopword parts of speech may include articles, prepositions, conjunctions, interjections, and the like.
- For example, when the similar clinical trial data providing server 200 receives the clinical trial data “A Randomized, Double Blind Trial of LdT(Telbivudine) Versus Lamivudine in Adults With Compensated Chronic Hepatitis B”, it deletes “A”, “of”, “in”, “with” and “B”.
- The similar clinical trial data providing server 200 extracts words, using whitespace as the delimiter, from the clinical trial data from which the predetermined clinical stopwords have been deleted, and measures the frequency of each word in the clinical trial data.
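The stopword deletion, whitespace split, and frequency count described above can be sketched as below. The stopword set is taken from the example title; the function names and the case-insensitive matching are assumptions made for illustration.

```python
from collections import Counter

# Stopwords from the example above, stored in lowercase for case-insensitive matching.
CLINICAL_STOPWORDS = {"a", "of", "in", "with", "b"}

def extract_words(title, stopwords=CLINICAL_STOPWORDS):
    """Split on whitespace and drop any word found in the stopword set."""
    return [w for w in title.split() if w.lower() not in stopwords]

def word_frequencies(titles):
    """Count how often each remaining word occurs across a collection of titles."""
    counts = Counter()
    for title in titles:
        counts.update(w.lower() for w in extract_words(title))
    return counts

title = ("A Randomized, Double Blind Trial of LdT(Telbivudine) Versus "
         "Lamivudine in Adults With Compensated Chronic Hepatitis B")
words = extract_words(title)
frequencies = word_frequencies([title, title])
```

Because the split uses whitespace only, punctuation stays attached to words (for example `Randomized,`); a production tokenizer would probably strip it, but the source does not say so.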
- The similar clinical trial data providing server 200 performs morpheme analysis on each word to generate tokens, each of which pairs the word with its morpheme value and is assigned a label indicating the frequency.
- For example, from clinical trial data from which the predetermined clinical stopwords have been deleted, the similar clinical trial data providing server 200 can create tokens such as (frequency: 1000 times, (word, morpheme value)), (frequency: 234 times, (word, morpheme value)), (frequency: 2541 times, (word, morpheme value)), (frequency: 2516 times, (word, morpheme value)), and so on.
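A token of the form (frequency, (word, morpheme value)) could be represented as shown here. The `Token` class, the `TOY_MORPHEMES` lookup, and the tag names are hypothetical; the lookup merely stands in for a real morphological analyzer, which the source does not name.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    word: str
    morpheme: str    # morpheme value, here a coarse part-of-speech tag
    frequency: int   # label: how often the word occurs in the collected data

# Hypothetical lookup table standing in for a real morphological analyzer.
TOY_MORPHEMES = {"trial": "NOUN", "randomized": "ADJ", "versus": "ADP"}

def analyze(word):
    """Return the morpheme value for a word, or "UNK" when it is not in the table."""
    return TOY_MORPHEMES.get(word.lower(), "UNK")

def make_tokens(words, corpus_frequency):
    """Pair each word with its morpheme value and attach its frequency label."""
    return [Token(w, analyze(w), corpus_frequency.get(w.lower(), 0)) for w in words]

tokens = make_tokens(
    ["Randomized", "Trial", "Lamivudine"],
    {"randomized": 1000, "trial": 2541},
)
```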
- The similar clinical trial data providing server 200 gives each token a different weight according to the word of the token and the label of the token.
- More specifically, for each token, the similar clinical trial data providing server 200 creates a document-word matrix by giving different weights according to the type of language of the token's word (i.e., English, Chinese, Korean, etc.), the position of the word in the clinical trial data, and the frequency indicated by the label assigned to the token.
- The similar clinical trial data providing server 200 uses a non-negative matrix factorization (NMF) machine learning algorithm to decompose the document-word matrix into a first matrix of size (number of clinical trial data × k) and a second matrix of size (k × number of words).
- The integer k is a hyperparameter (i.e., the number of topics) and may be set to the number of topics to be clustered. For example, k may be set to the number of diseases.
- The first matrix and the second matrix may be updated by clustering each item of clinical trial data and each word into one of the k topics.
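The decomposition can be illustrated with a small pure-Python NMF. The source names only the technique, so the multiplicative update rule used here (the classic Lee and Seung updates for squared error), the function names, the toy 4 × 4 matrix, and the "dominant topic" reading of the clustering step are all assumptions.

```python
import random

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative v (docs x words) into w (docs x k) and h (k x words)."""
    rng = random.Random(seed)
    n_docs, n_words = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_words)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_words)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_docs)]
    return w, h

def dominant_topic(row):
    """Assign a document (or word) to the topic with the largest loading."""
    return max(range(len(row)), key=row.__getitem__)

# Toy document-word matrix: two documents share words 0-1, two share words 2-3.
doc_word = [
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 1.0],
    [0.0, 0.0, 1.0, 4.0],
]
w, h = nmf(doc_word, k=2)
doc_topics = [dominant_topic(row) for row in w]
```

Here `w` plays the role of the first matrix and `h` the second; with k set to the number of diseases, each row of `w` would indicate which disease topic a trial loads on most heavily.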
- The similar clinical trial data providing server 200 generates a learning model using the first matrix and the second matrix.
- This learning model is used to extract similar clinical trial data when unstructured clinical trial data is later received from the user terminals 100_1 to 100_N.
- When the similar clinical trial data providing server 200 receives clinical trial data from the user terminals 100_1 to 100_N, it vectorizes the data through the same process as described above, according to the type of the clinical trial data.
- The similar clinical trial data providing server 200 can then calculate the degree of similarity between clinical trial data by calculating the distance between the matrix generated from the clinical trial data received from the user terminals 100_1 to 100_N and the matrix of the learning model.
- That is, the similar clinical trial data providing server 200 can extract and provide similar clinical trial data according to the distance between the vector of the learning model and the vector generated from the clinical trial data received from the user terminals 100_1 to 100_N.
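The distance step can be sketched as follows. The source does not say which distance measure is used, so Euclidean and cosine distance are shown as two common candidates; the trial identifiers and vectors are made up for the example.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def rank_by_distance(query, stored, distance=euclidean_distance):
    """Return (distance, trial_id) pairs for all stored vectors, closest first."""
    return sorted((distance(query, vec), trial_id) for trial_id, vec in stored.items())

# Hypothetical vectors already held by the learning model.
stored_vectors = {
    "trial-A": [1.0, 0.0, 0.0],
    "trial-B": [0.0, 1.0, 0.0],
    "trial-C": [0.9, 0.1, 0.0],
}
ranking = rank_by_distance([1.0, 0.0, 0.0], stored_vectors)
```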
- FIG. 2 is a block diagram illustrating the internal structure of a server for providing similar clinical trial data according to an embodiment of the present invention.
- Referring to FIG. 2, the similar clinical trial data providing server 200 includes a preprocessor 210, a clinical stopword database 220, a data feature extraction unit 230, a user input receiver 240, and a similar clinical trial data extraction unit 250.
- The preprocessor 210 collects clinical trial data through the web or a clinical trial database and executes preprocessing. The preprocessing differs according to whether the clinical trial data is structured data or unstructured data.
- The preprocessor 210 extracts the metadata of the clinical trial data.
- This learning model is used to extract similar clinical trial data when structured clinical trial data is later received from the user terminals 100_1 to 100_N.
- The preprocessor 210 deletes predetermined clinical stopwords from the clinical trial data, or deletes words belonging to predetermined clinical stopword parts of speech, and tokenizes the result.
- The predetermined clinical stopword parts of speech may include articles, prepositions, conjunctions, interjections, and the like.
- For example, when the preprocessor 210 receives the clinical trial data “A Randomized, Double Blind Trial of LdT (Telbivudine) Versus Lamivudine in Adults With Compensated Chronic Hepatitis B”, it deletes “of”, “in”, “with” and “B”.
- The preprocessor 210 extracts words, using whitespace as the delimiter, from the clinical trial data from which the predetermined clinical stopwords have been deleted, and measures the frequency of each word in the clinical trial data.
- The preprocessor 210 performs morpheme analysis on each word to generate tokens, each of which pairs the word with its morpheme value and is assigned a label indicating the frequency.
- For example, from clinical trial data from which the predetermined clinical stopwords have been deleted, the preprocessor 210 can create tokens such as (frequency: 1000 times, (word, morpheme value)), (frequency: 234 times, (word, morpheme value)), (frequency: 2541 times, (word, morpheme value)), (frequency: 2516 times, (word, morpheme value)), and so on.
- The data feature extraction unit 230 generates a learning model by using the information generated by the preprocessor 210.
- The data feature extraction unit 230 generates a sub-vector for each item of metadata extracted by the preprocessor 210 and generates a vector using the sub-vectors.
- The data feature extraction unit 230 gives each of the tokens generated by the preprocessor 210 a different weight according to the word of the token and the label of the token.
- More specifically, for each token, the data feature extraction unit 230 creates a document-word matrix by giving different weights according to the type of language of the token's word (i.e., English, Chinese, Korean, etc.), the position of the word in the clinical trial data, and the frequency indicated by the label assigned to the token.
- The data feature extraction unit 230 calculates a first weight based on [Equation 1], using the total number of tokens generated from the clinical trial title and the order of each token.
- token(): a function that returns the total number of tokens after tokenizing the clinical trial title
- token_i: the order of the i-th token among all tokens
- That is, the data feature extraction unit 230 calculates the first weight by applying a predetermined importance value to the result of [Equation 1], which is based on the total number of tokens and the order of each token.
- For example, the data feature extraction unit 230 may calculate “0.25” and apply a predetermined importance value according to the type of language to obtain the first weight.
- The predetermined importance value according to the type of language may vary depending on where important words tend to appear in each language. That is, the importance value predetermined for the type of language may change according to the position of the current token.
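[Equation 1] itself is not legible in this text, so the following is only a guess at its shape, offered to make the definitions concrete. It assumes the equation is the ratio of a token's order to the total token count (which yields the quoted 0.25 for the first of four tokens) and that the language-dependent importance value is applied as a multiplier. The importance values and the "first half / second half" rule are invented for the example.

```python
def position_ratio(position, total_tokens):
    """Assumed form of [Equation 1]: token order divided by total token count."""
    if not 1 <= position <= total_tokens:
        raise ValueError("position must be between 1 and total_tokens")
    return position / total_tokens

def importance(language, position, total_tokens):
    """Hypothetical importance values: which half of the title matters more."""
    in_first_half = position <= total_tokens / 2
    if language == "en":
        return 1.5 if in_first_half else 1.0
    if language == "ko":
        return 1.0 if in_first_half else 1.5
    return 1.0

def first_weight(language, position, total_tokens):
    """First weight: position ratio scaled by the language-dependent importance."""
    return position_ratio(position, total_tokens) * importance(language, position, total_tokens)

ratio = position_ratio(1, 4)           # first of four tokens
weight_en = first_weight("en", 1, 4)   # same token, English importance applied
```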
- For each token, the data feature extraction unit 230 calculates a second weight based on [Equation 2] and [Equation 3], using the frequency indicated by the label assigned to the token and the frequencies indicated by the labels assigned to the previous token and the next token.
- Difference_value: the average value of the frequencies
- token_i: the i-th token among all tokens
- token_i-1: the token preceding the i-th token
- token_i+1: the token following the i-th token
- f(): a function that extracts the frequency indicated by the label assigned to a token
- Threshold: the threshold value
- After calculating the first weight and the second weight based on [Equation 1] to [Equation 3], the data feature extraction unit 230 calculates a final weight from the first weight and the second weight and assigns it to the token, thereby creating the document-word matrix.
- The data feature extraction unit 230 uses a non-negative matrix factorization (NMF) machine learning algorithm to decompose the document-word matrix into a first matrix of size (number of clinical trial data × k) and a second matrix of size (k × number of words).
- The integer k is a hyperparameter (i.e., the number of topics) and may be set to the number of topics to be clustered. For example, k may be set to the number of diseases.
- The first matrix and the second matrix may be updated by clustering each item of clinical trial data and each word into one of the k topics.
- This learning model is used to extract similar clinical trial data when unstructured clinical trial data is later received from the user terminals 100_1 to 100_N.
- The preprocessor 210 and the data feature extraction unit 230 perform preprocessing and data feature extraction according to the type of the clinical trial data.
- The similar clinical trial data extraction unit 250 inputs the vector into the pre-trained learning model.
- The similar clinical trial data extraction unit 250 calculates the distance between the vector and a vector stored in advance in the learning model, measures a similarity grade according to the distance between the vectors, and extracts and provides clinical trial data whose similarity grade is less than or equal to a specific grade.
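The grading step can be sketched as a mapping from distance to an integer grade, where grade 1 is the most similar. The grade boundaries below are hypothetical; the source does not state how distances map to grades, only that trials at or below a specific grade are returned.

```python
import bisect

# Hypothetical upper bounds: distance < 0.2 is grade 1, < 0.5 grade 2, < 1.0 grade 3, else grade 4.
GRADE_BOUNDS = [0.2, 0.5, 1.0]

def similarity_grade(distance, bounds=GRADE_BOUNDS):
    """Map a distance to a grade; smaller distances get smaller (better) grades."""
    return bisect.bisect_right(bounds, distance) + 1

def extract_similar(distances, max_grade):
    """Return (trial_id, grade) pairs whose grade is at most max_grade, best first."""
    graded = sorted((similarity_grade(d), d, tid) for tid, d in distances.items())
    return [(tid, grade) for grade, _, tid in graded if grade <= max_grade]

# Hypothetical distances between the query vector and stored trial vectors.
distances = {"trial-A": 0.05, "trial-B": 0.3, "trial-C": 0.7, "trial-D": 1.5}
similar = extract_similar(distances, max_grade=2)
```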
- FIG. 3 is a flowchart illustrating an embodiment of a method for providing similar clinical trial data according to the present invention.
- Referring to FIG. 3, the similar clinical trial data providing server 200 collects clinical trial data through the web or a clinical trial database (step S310), determines the type of the clinical trial data (step S320), and performs preprocessing according to the type of the clinical trial data (step S330).
- According to the type of the clinical trial data, the similar clinical trial data providing server 200 generates a vector by using each item of metadata of the clinical trial data or by tokenizing words extracted from the clinical trial data (step S340).
- The similar clinical trial data providing server 200 generates a learning model by learning the vector (step S350).
- FIG. 4 is a flowchart for explaining another embodiment of a method for providing similar clinical trial data according to the present invention.
- Referring to FIG. 4, when the similar clinical trial data providing server 200 receives clinical trial data from the user terminal (step S410), it determines the type of the clinical trial data (step S420) and performs preprocessing according to the type of the clinical trial data (step S430).
- According to the type of the clinical trial data, the similar clinical trial data providing server 200 generates a vector by using each item of metadata of the clinical trial data or by tokenizing words extracted from the clinical trial data (step S440).
- The similar clinical trial data providing server 200 inputs the vector into the pre-trained learning model and calculates the distance between the vector and a vector stored in advance in the learning model (step S450).
- The similar clinical trial data providing server 200 measures a similarity grade according to the distance between the vectors, and extracts and provides clinical trial data whose similarity grade is less than or equal to a specific grade (step S460).
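Steps S410 to S460 can be tied together in a toy pipeline. Everything here is an illustrative assumption: the rule that a mapping of fields is structured data and a string is unstructured data, the vocabulary-count vectorizer, and the use of a plain distance cutoff in place of the similarity grade.

```python
def determine_type(data):
    """S420 (assumed rule): a mapping of metadata fields is structured, free text is not."""
    return "structured" if isinstance(data, dict) else "unstructured"

def vectorize(data, vocabulary):
    """S430-S440 (toy): count vocabulary terms in the metadata values or in the text."""
    if determine_type(data) == "structured":
        text = " ".join(str(value) for value in data.values())
    else:
        text = data
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]

def provide_similar(data, stored, vocabulary, max_distance):
    """S450-S460 (toy): return stored trial ids within max_distance, closest first."""
    query = vectorize(data, vocabulary)
    scored = []
    for trial_id, vec in stored.items():
        dist = sum((q - s) ** 2 for q, s in zip(query, vec)) ** 0.5
        if dist <= max_distance:
            scored.append((dist, trial_id))
    return [trial_id for _, trial_id in sorted(scored)]

vocabulary = ["hepatitis", "lamivudine", "diabetes"]
stored = {"trial-A": [1.0, 1.0, 0.0], "trial-B": [0.0, 0.0, 1.0]}

from_text = provide_similar("Lamivudine in chronic hepatitis", stored, vocabulary, 1.0)
from_metadata = provide_similar(
    {"title_en": "Diabetes trial", "approval_status": "approved"}, stored, vocabulary, 1.0
)
```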
Claims (8)
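- A method for providing similar clinical trial data, executed in a similar clinical trial data providing server, the method comprising: upon receiving clinical trial data from a user terminal, determining the type of the clinical trial data; according to the type of the clinical trial data, generating a vector by using each item of metadata of the clinical trial data or generating a vector by tokenizing words extracted from the clinical trial data; inputting the vector into a pre-trained learning model and calculating the distance between a vector stored in advance in the learning model and the vector; and measuring a similarity grade according to the distance between the vectors, and extracting and providing clinical trial data whose similarity grade is less than or equal to a specific grade.
- The method of claim 1, wherein the generating of a vector by using each item of metadata of the clinical trial data or by tokenizing words extracted from the clinical trial data, according to the type of the clinical trial data, comprises: when the type of the clinical trial data is structured data, generating a sub-vector for each item of metadata of the clinical trial data and generating a vector using the sub-vectors for the respective items of metadata.
- The method of claim 1, wherein the generating of a vector by using each item of metadata of the clinical trial data or by tokenizing words extracted from the clinical trial data, according to the type of the clinical trial data, comprises: when the type of the clinical trial data is unstructured data, deleting predetermined clinical stopwords from clinical trial title data and extracting words, using whitespace as the delimiter, from the clinical trial title data from which the predetermined clinical stopwords have been deleted; performing morpheme analysis on each of the words to generate tokens in each of which a word and a morpheme value are paired and to each of which a label indicating a frequency is assigned; and generating a document-word matrix by giving each of the tokens a different weight according to the word of the token and the label of the token.
- The method of claim 3, wherein the generating of the document-word matrix by giving each of the tokens a different weight according to the word of the token and the label of the token comprises: decomposing the document-word matrix, through a non-negative matrix factorization machine learning algorithm, into a first matrix of size (number of clinical trial data × number of topics k) and a second matrix of size (number of topics k × number of words); and updating the first matrix and the second matrix by clustering each of the clinical trial data and the words into any one of the k topics.
- A similar clinical trial data providing server comprising: a preprocessor that, upon receiving clinical trial data from a user terminal, determines the type of the clinical trial data and executes preprocessing according to the type of the clinical trial data; a data feature extraction unit that generates a vector by using each item of metadata of the clinical trial data or generates a vector by tokenizing words extracted from the clinical trial data; and a similar clinical trial data extraction unit that inputs the vector into a pre-trained learning model, calculates the distance between a vector stored in advance in the learning model and the vector, measures a similarity grade according to the distance between the vectors, and extracts and provides clinical trial data whose similarity grade is less than or equal to a specific grade.
- The server of claim 5, wherein, when the type of the clinical trial data is structured data, the data feature extraction unit generates a sub-vector for each item of metadata of the clinical trial data and generates a vector using the sub-vectors for the respective items of metadata.
- The server of claim 5, wherein, when the type of the clinical trial data is unstructured data, the data feature extraction unit deletes predetermined clinical stopwords from clinical trial title data, extracts words, using whitespace as the delimiter, from the clinical trial title data from which the predetermined clinical stopwords have been deleted, performs morpheme analysis on each of the words to generate tokens in each of which a word and a morpheme value are paired and to each of which a label indicating a frequency is assigned, and generates a document-word matrix by giving each of the tokens a different weight according to the word of the token and the label of the token.
- The server of claim 5, wherein the data feature extraction unit decomposes the document-word matrix, through a non-negative matrix factorization machine learning algorithm, into a first matrix of size (number of clinical trial data × number of topics k) and a second matrix of size (number of topics k × number of words), and updates the first matrix and the second matrix by clustering each of the clinical trial data and the words into any one of the k topics.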
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/039,404 US20240005097A1 (en) | 2020-11-30 | 2021-07-30 | Method for providing similar clinical trial data and server executing same |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020200164313A KR20220075815A (ko) | 2020-11-30 | 2020-11-30 | Method for providing similar clinical trial data and server executing same |
KR10-2020-0164313 | 2020-11-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022114447A1 (ko) | 2022-06-02 |
Family
ID=81755173
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2021/009978 WO2022114447A1 (ko) | Method for providing similar clinical trial data and server executing same | 2020-11-30 | 2021-07-30 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240005097A1 (ko) |
KR (1) | KR20220075815A (ko) |
WO (1) | WO2022114447A1 (ko) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102672284B1 (ko) | 2022-06-21 | 2024-06-03 | 주식회사 엘지에너지솔루션 | Battery management apparatus and method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013229035A (ja) * | 2007-01-31 | 2013-11-07 | Quintiles Transnational Corp | Method and system for site startup |
JP2014178800A (ja) * | 2013-03-14 | 2014-09-25 | Gifu Univ | Medical information processing device and program |
KR20170085813A (ko) * | 2016-01-15 | 2017-07-25 | 사회복지법인 삼성생명공익재단 | Method and system for providing clinical research data |
KR20180062321A (ko) * | 2016-11-29 | 2018-06-08 | (주)아크릴 | Method and computer program for deriving words related to a deep-learning-based keyword |
KR20200080732A (ko) * | 2018-12-27 | 2020-07-07 | (주)인실리코젠 | Apparatus for searching unstructured data in the medical field |
- 2020-11-30: KR application KR1020200164313A, published as KR20220075815A (ko); status: not active, Application Discontinuation
- 2021-07-30: US application US18/039,404, published as US20240005097A1 (en); status: active, Pending
- 2021-07-30: WO application PCT/KR2021/009978, published as WO2022114447A1 (ko); status: active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
KR20220075815A (ko) | 2022-06-08 |
US20240005097A1 (en) | 2024-01-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21898276; Country of ref document: EP; Kind code of ref document: A1 |
 | WWE | Wipo information: entry into national phase | Ref document number: 18039404; Country of ref document: US |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 21898276; Country of ref document: EP; Kind code of ref document: A1 |