CN114530205A - Organ chip database vectorization scheme for artificial intelligence algorithm - Google Patents
Organ chip database vectorization scheme for artificial intelligence algorithm
- Publication number
- CN114530205A
- Authority
- CN
- China
- Prior art keywords
- information
- data
- organ chip
- coded
- vectorization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/40—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
Abstract
This patent describes a method for vectorizing an organ chip database. The database stores biological scaffold materials, reagents, cell lines, drugs, organ chip models, organ chip configuration parameters (reagent and drug concentrations, cell types, and the like), time information, and experimental results (cell metabolite concentrations, cell number and survival rate, pH of the in-chip microenvironment, temperature, oxygen concentration, carbon dioxide concentration, TEER, air pressure, whether drugs were added, and the drug release and degradation rates). The experimental results serve as label data: training a deep learning model on these data yields its weight matrix, after which the model can predict the label data automatically from the input information. Before the data can be fed into the model, its format must be converted, because the database holds text, numeric, and even image information in non-uniform types and formats, and all of it must be converted into vectors that a machine learning algorithm can process. This patent addresses that problem: how to vectorize organ chip data.
Description
Technical Field
The invention lies at the intersection of biomedical engineering and computer science and technology, and provides a vectorization method for an organ chip database. The database stores biological scaffold materials, reagents, cell lines, drugs, organ chip models, organ chip configuration parameters (reagent and drug concentrations, cell types, and the like), time information, and experimental results (cell metabolite concentrations, cell number and survival rate, pH of the in-chip microenvironment, temperature, oxygen concentration, carbon dioxide concentration, TEER, air pressure, whether drugs were added, and the drug release and degradation rates). The experimental results serve as label data: training a deep learning model on these data yields its weight matrix, after which the model can predict the label data automatically from the input information. Before the data can be fed into the model, its format must be converted, because the database holds text, numeric, and even image information in non-uniform types and formats, and all of it must be converted into vectors that a machine learning algorithm can process. This patent addresses that problem: how to vectorize organ chip data.
Background
An organ chip is a physiological organ microsystem built on a chip. With a microfluidic chip at its core, it combines methods from cell biology, biomaterials, and engineering to construct an in vitro tissue and organ microenvironment that simulates and reflects the main structural and functional characteristics of human tissues and organs. Such a model can reproduce the physiological and pathological activity of human organs in vitro with reasonable fidelity, lets researchers observe and study biological behavior in ways not previously possible, and can be used to predict the human response to drugs or other external stimuli. It therefore has broad application value in life science research, disease simulation, new drug research and development, and related fields.
Culturing organ chips and running experiments on them generates a large amount of experimental data. In earlier research, however, the associations within these data were not analyzed carefully. Data were not shared across different organ chip experiments, so relationships between experiments went unexamined. Attention was paid only to final results, so data from the experimental process, dynamic data in particular, were lost. Researchers also looked only at their own data, lacking the time, energy, and tools to find similar experiments done by others and compare design parameters. A suitable data analysis method is therefore needed to analyze and model the data. Before any such analysis, the data in the organ chip database must be converted into vectors so that they can be input into an artificial intelligence model. This patent aims to provide a solution to this problem.
Disclosure of Invention
The purpose of the invention is as follows:
Before an artificial intelligence model is built, the data in the organ chip database must be converted into vectors. The actual data (text and non-text information, such as the names, molecular formulas, and components of biological scaffold materials and pharmaceutical agents) must be converted into numeric information that a deep learning model can interpret and compute on, that is, a coded representation in numeric format suitable for artificial intelligence model computation.
The organ chip database should contain the following data tables:
- a drug information table, storing the drug name, molecular formula, two-dimensional and three-dimensional structural formulas, target protein, SMILES-format expression, MOL2VEC code, and the like; representing the target protein information requires a protein data table and a target data table that together express drug-target interaction (DTI) information;
- a cell information table, storing the cell line name, source, gene sequence, GENE2VEC code, and the like;
- a biological scaffold material information table, storing the molecular formula, structural formula, coded expression, and the like;
- a biological reagent information table, storing the components, ratios, concentrations, chemical formula, structural formula, code, and the like;
- an organ chip model table, storing the chip model ID, the organ chip type (an enumerated variable), the developer, the organization, the article name and link, the official website introduction link, a description of the chip structure and components, a description of the working principle, the WORD2VEC code, and the like;
- an organ chip parameter configuration table, storing the parameter configuration ID (used to associate with the experiment data table, where one row of parameter configuration corresponds to multiple rows of time-stamped experiment data), the organ chip model ID, the preparation information for one or more drugs, the biological reagent preparation information, the scaffold material preparation information, the cell lines used, and the like;
- a time-stamped experimental result table, storing cell metabolite concentrations, cell number and survival rate, pH, temperature, oxygen concentration, carbon dioxide concentration, TEER, air pressure, whether a drug was added, the drug release and degradation rates, and the like.
The information in these tables may be directly or indirectly related to the experimental result data. Artificial intelligence methods can learn these relationships from large amounts of data through pattern recognition and then use them to predict experimental results.
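The one-to-many link between a parameter configuration row and its time-stamped result rows can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table names, column names, and demo values below are hypothetical, since the source lists the kinds of fields each table stores but gives no concrete schema, and only three of the seven tables are shown.

```python
import sqlite3

# Hypothetical table and column names; the source does not define a schema.
SCHEMA = """
CREATE TABLE chip_model (
    model_id    INTEGER PRIMARY KEY,
    organ_type  TEXT
);
CREATE TABLE parameter_config (
    config_id   INTEGER PRIMARY KEY,
    model_id    INTEGER REFERENCES chip_model(model_id),
    drug_name   TEXT,
    drug_conc   REAL,
    cell_line   TEXT
);
CREATE TABLE experiment_result (
    result_id   INTEGER PRIMARY KEY,
    config_id   INTEGER REFERENCES parameter_config(config_id),
    time_h      REAL,
    ph          REAL,
    oxygen_pct  REAL,
    teer        REAL
);
"""


def build_demo_db():
    """Create an in-memory database holding made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO chip_model VALUES (1, 'liver')")
    conn.execute(
        "INSERT INTO parameter_config VALUES (10, 1, 'drug_A', 0.5, 'HepG2')"
    )
    # One configuration row maps to several time-stamped result rows.
    conn.executemany(
        "INSERT INTO experiment_result VALUES (?, ?, ?, ?, ?, ?)",
        [
            (100, 10, 0.0, 7.4, 20.9, 310.0),
            (101, 10, 24.0, 7.2, 20.1, 295.0),
            (102, 10, 48.0, 7.1, 19.2, 280.0),
        ],
    )
    return conn


def rows_for_config(conn, config_id):
    """Join one configuration with all of its result rows, ordered by time."""
    query = """
        SELECT m.organ_type, p.drug_name, p.drug_conc, p.cell_line,
               r.time_h, r.ph, r.oxygen_pct, r.teer
        FROM parameter_config AS p
        JOIN chip_model AS m ON m.model_id = p.model_id
        JOIN experiment_result AS r ON r.config_id = p.config_id
        WHERE p.config_id = ?
        ORDER BY r.time_h
    """
    return conn.execute(query, (config_id,)).fetchall()


conn = build_demo_db()
rows = rows_for_config(conn, 10)
```

Each joined row pairs the configuration fields (candidate model inputs) with one time point of results (candidate labels), which is the shape the later encoding steps would operate on.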
(1) For the drug information table (storing drug names, molecular formulas, two-dimensional and three-dimensional structural formulas, target proteins, SMILES-format expressions, MOL2VEC codes, and the like, where the target protein information requires a protein data table and a target data table to express drug-target interaction, DTI, information), the drug's molecular formula can be converted into fingerprint information with the Morgan algorithm. Because the fingerprint has too many digits, it can then be converted a second time by a trained model, for example by using a BERT algorithm to output vectors. Alternatively, the molecular formula can be converted directly into vectors with the Mol2Vec algorithm. The resulting numeric string can be stored directly in the drug information table. The amino acid sequence of the target protein can be vectorized with the PSSM (position-specific scoring matrix) method. Of the remaining information, numeric fields can be normalized to the range 0 to 1 and text fields can be one-hot encoded.
The PSSM is constructed as follows:
First, find the FASTA sequence of the protein and the FASTA sequences of its homologous proteins, and align them row by row so that corresponding positions form columns. Second, count the occurrences of each amino acid at each position to obtain the position frequency matrix (PFM), an L × 20 matrix in which 20 is the number of standard amino acids and L is the length of the protein sequence. Third, normalize this matrix to obtain the position probability matrix (PPM). Fourth, compute the PSSM from the PPM according to the scoring formula; the PSSM is also an L × 20 matrix. The matrix represents both the protein itself and the likelihood that the amino acid at each position mutates to another amino acid. The amino acid with the largest value in each row gives the residue of the represented protein at that position. Each element reflects how likely the amino acid at that position is to be replaced by the corresponding amino acid; the larger the value, the more likely the substitution.
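The four steps can be sketched in plain Python. The source does not state its scoring formula, so this sketch assumes the commonly used log-odds score, log2(p / b), with a uniform background b = 1/20 and a pseudocount of 1 to avoid taking the logarithm of zero. The function names and the three-sequence toy alignment are hypothetical and serve only as an illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned, pseudocount=1.0, background=None):
    """Return (pfm, ppm, pssm), each an L x 20 list of lists.

    Assumptions not taken from the source: log2 odds scoring, a uniform
    background distribution by default, and additive pseudocounts.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

    # Step 2: count each amino acid at each position (frequency matrix).
    index = {aa: j for j, aa in enumerate(AMINO_ACIDS)}
    pfm = [[0] * len(AMINO_ACIDS) for _ in range(length)]
    for seq in aligned:
        for i, aa in enumerate(seq):
            if aa in index:  # gaps and non-standard symbols are skipped
                pfm[i][index[aa]] += 1

    ppm, pssm = [], []
    for counts in pfm:
        # Step 3: normalize counts to probabilities (probability matrix).
        total = sum(counts) + pseudocount * len(AMINO_ACIDS)
        probs = [(c + pseudocount) / total for c in counts]
        ppm.append(probs)
        # Step 4: log-odds score against the background frequency.
        pssm.append(
            [math.log2(p / background[aa]) for p, aa in zip(probs, AMINO_ACIDS)]
        )
    return pfm, ppm, pssm


def consensus(pssm):
    """Pick the highest-scoring amino acid in each row."""
    return "".join(AMINO_ACIDS[row.index(max(row))] for row in pssm)


# Step 1 (finding and aligning homologous sequences) is assumed done.
toy_alignment = ["MKV", "MKL", "MRV"]
pfm, ppm, pssm = build_pssm(toy_alignment)
```

In practice the alignment and scoring are usually produced by a dedicated sequence search tool rather than computed by hand; the sketch only makes the matrix shapes and the row-wise maximum concrete.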
(2) For the cell information table (storing the cell line name, source, gene sequence, GENE2VEC code, and the like), the cell's gene sequence can be vectorized with the GENE2VEC method and stored in the cell information table. Of the remaining information, numeric fields can be normalized to the range 0 to 1 and text fields can be one-hot encoded.
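The rule for the remaining fields, numbers normalized to the range 0 to 1 and text one-hot encoded, can be sketched as follows. The source does not say which normalization it uses; min-max scaling is assumed here, and the function names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(labels):
    """Return the sorted category list and one 0/1 vector per label."""
    categories = sorted(set(labels))
    position = {c: i for i, c in enumerate(categories)}
    vectors = []
    for label in labels:
        vec = [0] * len(categories)
        vec[position[label]] = 1
        vectors.append(vec)
    return categories, vectors


# Made-up example fields for illustration only.
passage_numbers = [5, 10, 20]          # a numeric field
sources = ["liver", "lung", "liver"]   # a text field

normalized = min_max_normalize(passage_numbers)
categories, encoded = one_hot_encode(sources)
```

One-hot vectors grow with the number of distinct values, which is presumably why the text offers Word2Vec as an alternative for free-text fields in some of the tables.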
(3) For the biological scaffold material information table (storing the molecular formula, structural formula, coded expression, and the like), the molecular formula and structural formula can be vectorized with the Mol2Vec method and stored in the scaffold material information table. Of the remaining information, numeric fields can be normalized to the range 0 to 1 and text fields can be encoded with Word2Vec or one-hot encoding.
(4) For the biological reagent information table (storing the components, ratios, concentrations, chemical formula, structural formula, code, and the like), the chemical formula can be vectorized with the Mol2Vec method and stored in the reagent information table. Of the remaining information, numeric fields can be normalized to the range 0 to 1 and text fields can be encoded with Word2Vec or one-hot encoding.
(5) For the organ chip model table (storing the chip model ID, the organ chip type as an enumerated variable, the developer, the organization, the article name and link, the official website introduction link, the description of the chip structure and components, the description of the working principle, the WORD2VEC code, and the like), some fields, such as the official website link, the developer, and the article name, are irrelevant to predicting experimental results. They are not input into the artificial intelligence model and therefore need not be vectorized. Of the remaining information, numeric fields can be normalized to the range 0 to 1 and text fields can be encoded with Word2Vec.
(6) For the organ chip parameter configuration table (storing the parameter configuration ID, which links to the experiment data table so that one row of parameter configuration corresponds to multiple rows of time-stamped experiment data, as well as the organ chip model ID, the preparation information for one or more drugs, the biological reagent preparation information, the scaffold material preparation information, the cell lines used, and the like), numeric fields can be normalized to the range 0 to 1 and text fields can be encoded with Word2Vec or one-hot encoding.
(7) For the time-stamped experimental result table (storing cell metabolite concentrations, cell number and viability, pH of the in-chip microenvironment, temperature, oxygen concentration, carbon dioxide concentration, TEER, air pressure, whether drugs were added, the drug release and degradation rates, and the like), numeric fields can be normalized to the range 0 to 1 and text fields can be encoded with Word2Vec or one-hot encoding. If the dataset available for training the model is small, it is recommended to convert the numeric experimental results into graded categories, for example: oxygen content < 19.5% is grade 1; 19.5% < oxygen content < 24% is grade 2; oxygen content > 24% is grade 3.
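The oxygen grading example can be written as a small function. The text uses strict inequalities and so leaves the boundary values 19.5% and 24% unassigned; this sketch assumes both fall into grade 2. The function name and the sample readings are hypothetical.

```python
def oxygen_grade(oxygen_pct):
    """Map an oxygen content in percent to grade 1, 2, or 3."""
    if oxygen_pct < 19.5:
        return 1
    if oxygen_pct <= 24.0:
        return 2  # assumption: exactly 19.5 and exactly 24.0 count as grade 2
    return 3


# Made-up readings for illustration only.
readings = [18.0, 19.5, 20.9, 24.0, 25.3]
grades = [oxygen_grade(x) for x in readings]
```

Turning a continuous label into a few grades converts the prediction task from regression to classification, which generally needs fewer training samples at the cost of resolution.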
The advantages of the invention are:
(1) it solves the problem of vectorizing the organ chip database;
(2) it uses several vectorization methods to provide usable data for artificial intelligence model computation.
Detailed Description
The embodiment uses the same database tables and the same per-table vectorization steps (1) to (7) set out in the Disclosure of Invention above.
The above are merely representative examples of the many specific applications of the present invention and do not limit its scope of protection in any way. All technical solutions formed by transformation or equivalent substitution fall within the scope of protection of the present invention.
Claims (9)
1. An organ chip database vectorization scheme for artificial intelligence algorithms, characterized by: the vector conversion of the data information in the organ chip database requires the conversion of the actual data (textual and non-textual information, such as names, molecular formulas, compositions, etc. of the biological stent material and the pharmaceutical agent) into digital information that can be understood and calculated by the deep learning model, and the digital information is a coded representation in digital format for the calculation of the artificial intelligence model.
2. The organ chip database vectorization scheme according to claim 1, wherein the data information in claim 1 mainly comprises: a data sheet related to drug information, a data sheet related to cell information, a data sheet related to biological stent material information, a data sheet related to biological reagent information, a data sheet related to organ chip model, an organ chip parameter configuration data sheet and an experimental result data sheet with time information.
3. The organ-chip database vectorization scheme for artificial intelligence algorithms according to claim 2, wherein the drug formula is converted into fingerprint information by morgan algorithm, and the fingerprint information is transformed into vectors by a model with too many digits, such as by BERT algorithm, or directly into vectors by Mol2Vec algorithm, and the result of the transformed digital string is directly stored in the drug information table. For vectorization of the amino acid sequence of the target protein, the amino acid sequence can be represented by the PSSM method, and the rest of the information, if a number, can be encoded by a normalization method between 0 and 1, and if a text, can be encoded by One-hot.
4. The organ chip database vectorization scheme for artificial intelligence algorithms according to claim 2, wherein the cell information related data tables are vectorized by using GENE2VEC method, and the cell GENE sequences are stored in the cell information data tables. The rest information can be coded by a normalization method between 0 and 1 if the information is a number, and can be coded by One-hot if the information is a text.
5. The organ chip database vectorization scheme for artificial intelligence algorithms according to claim 2, wherein the relevant tables of information about the biological stent material are vectorized by Mol2Vec method for molecular formula and structural formula and stored in the data table of the stent material information. The rest information can be coded by a normalization method between 0 and 1 if the information is a number, and can be coded by Word2Vec or One-hot if the information is a text.
6. The organ chip database vectorization scheme for artificial intelligence algorithms according to claim 2, wherein, for the biological reagent information data table, the formula is vectorized using the Mol2Vec method and stored in the reagent information data table. The remaining information can be encoded by normalization to the range 0 to 1 if it is numeric, and by Word2Vec or one-hot encoding if it is text.
7. The organ chip database vectorization scheme according to claim 2, wherein, when the organ chip model data table is vectorized, some data, such as official links, developers, and article names, are irrelevant to predicting experimental results; these fields are not input into the artificial intelligence model and therefore need not be vectorized. The remaining information can be encoded by normalization to the range 0 to 1 if it is numeric, and by Word2Vec if it is text.
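The exclusion step in claim 7 amounts to dropping non-predictive columns before encoding. A minimal sketch is shown below; the field names (`official_link`, `developer`, `article_name`) and the sample record are hypothetical stand-ins for the columns the claim describes, not names taken from the patent.

```python
# Hypothetical column names for the fields claim 7 treats as non-predictive.
NON_PREDICTIVE_FIELDS = frozenset({"official_link", "developer", "article_name"})


def select_model_inputs(record, excluded=NON_PREDICTIVE_FIELDS):
    """Return a copy of the record without fields that are not model inputs."""
    return {key: value for key, value in record.items() if key not in excluded}


# Illustrative organ chip model record; all values are made up.
chip_record = {
    "chip_type": "liver",
    "channel_count": 4,
    "developer": "Example Lab",
    "official_link": "https://example.org/chip",
}
model_inputs = select_model_inputs(chip_record)
```

The excluded fields can stay in the database for traceability; they are simply never passed to the encoder.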
8. The organ chip database vectorization scheme according to claim 2, wherein, when the organ chip parameter configuration data table is vectorized, the information can be encoded by normalization to the range 0 to 1 if it is numeric, and by Word2Vec or one-hot encoding if it is text.
9. The organ chip database vectorization scheme according to claim 2, wherein, when the experimental result data table with time information is vectorized, the information can be encoded by normalization to the range 0 to 1 if it is numeric, and by Word2Vec or one-hot encoding if it is text. If the amount of data available to train the model is small, it is recommended to convert the numerical data in the experimental results into graded categorical data, e.g., oxygen content < 19.5% is grade 1; 19.5% < oxygen content < 24% is grade 2; oxygen content > 24% is grade 3.
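The grading example in claim 9 can be written as a small binning function. This is a sketch only: the claim uses strict inequalities and does not say which grade the boundary values 19.5% and 24% belong to, so assigning both to grade 2 here is an assumption, as is the function name.

```python
def oxygen_grade(oxygen_percent):
    """Map an oxygen content percentage to grade 1, 2, or 3.

    Assumption: the boundary values 19.5 and 24.0 fall in grade 2,
    because the claim leaves them unassigned.
    """
    if oxygen_percent < 19.5:
        return 1
    if oxygen_percent <= 24.0:
        return 2
    return 3


# Illustrative readings, not experimental data.
grades = [oxygen_grade(x) for x in (18.0, 21.0, 25.0)]
```

Grading trades numeric precision for a smaller output space, which can make training more stable when only a few experiments are available.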
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110986435.8A CN114530205A (en) | 2021-08-31 | 2021-08-31 | Organ chip database vectorization scheme for artificial intelligence algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114530205A (en) | 2022-05-24 |
Family ID: 81618910
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110986435.8A Pending CN114530205A (en) | 2021-08-31 | 2021-08-31 | Organ chip database vectorization scheme for artificial intelligence algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114530205A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109671469A (en) * | 2018-12-11 | 2019-04-23 | 浙江大学 | Method for predicting binding relationship and binding affinity between polypeptides and HLA class I molecules based on a recurrent neural network |
CN111407396A (en) * | 2019-01-08 | 2020-07-14 | 柯惠有限合伙公司 | Positioning system and method of use |
CN112071373A (en) * | 2020-09-02 | 2020-12-11 | 深圳晶泰科技有限公司 | Drug molecule screening method and system |
CN112435720A (en) * | 2020-12-04 | 2021-03-02 | 上海蠡图信息科技有限公司 | Prediction method based on self-attention mechanism and multi-drug characteristic combination |
US20210062148A1 (en) * | 2019-08-26 | 2021-03-04 | Massachusetts Institute Of Technology | Breast milk derived progenitor cells and cell systems |
CN113175952A (en) * | 2021-04-27 | 2021-07-27 | 东南大学 | Multi-channel signal acquisition control device for organ chip in-situ measurement |
CN113192559A (en) * | 2021-05-08 | 2021-07-30 | 中山大学 | Protein-protein interaction site prediction method based on deep graph convolutional network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||