CN113128685B - Natural selection classification and group scale change analysis system based on neural network - Google Patents
- Publication number
- CN113128685B (application CN202110446165.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- classification
- population
- model
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a neural network-based natural selection classification and population scale change analysis system, which comprises: an input module for obtaining genome sequence data; a data processing module for processing the genome sequence data acquired by the input module and outputting summary statistics of the genome sequence data; and a classification fitting module for constructing a population genetic classification and parameter fitting model by combining a recurrent neural network with a convolutional neural network, and for performing natural selection classification and population change fitting on the population using the data output by the data processing module. The input module is connected with the data processing module, and the data processing module is connected with the classification fitting module. The invention analyzes population scale change and natural selection classification simultaneously, thereby eliminating the influence of population scale change on the natural selection classification, and it automatically extracts and analyzes multiple population summary statistics through the neural network, thereby obtaining results with high accuracy and reliability.
Description
Technical Field
The invention relates to the field of population genomics, and in particular to a neural network-based natural selection classification and population scale change analysis system.
Background
Population genetics is the branch of life science that studies the genetic characteristics and genetic laws of biological populations. In agricultural production, it has great economic value for pest management, seed selection, and breeding; in medicine, it contributes greatly to understanding how diseases spread; and it has great scientific significance for biodiversity conservation and research.
At present, a number of natural selection classification systems have appeared at home and abroad, but they do not consider the influence of population scale change on the natural selection judgment. Population scale change may leave signals on the population genome that resemble those of natural selection, and thus distort the natural selection classification.
Disclosure of Invention
The invention aims to provide a natural selection classification and population scale change analysis system based on a neural network, so as to overcome the defects in the prior art.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a neural network-based natural selection classification and population scale change analysis system, comprising:
an input module for obtaining genome sequence data;
a data processing module for processing the genome sequence data acquired by the input module and outputting summary statistics of the genome sequence data;
a classification fitting module for constructing a population genetic classification and parameter fitting model by combining a recurrent neural network with a convolutional neural network, and for performing natural selection classification and population change fitting on the population using the data output by the data processing module;
the input module is connected with the data processing module, and the data processing module is connected with the classification fitting module.
Further, the data processing module comprises:
a data preprocessing unit for segmenting and cleaning the genome sequence data;
a summary statistic generation unit for dividing each piece of data output by the data preprocessing unit into a set number of windows and calculating the population genetic summary statistics of each window;
the data preprocessing unit is connected with the summary statistic generation unit.
Further, the number of the windows is 3.
Further, the population genetic summary statistics of each window include the number of sites, the folded site frequency spectrum, the distribution of inter-site distances, the length distribution of identity-by-state tracts, the linkage disequilibrium distribution, and Tajima's D statistic.
Further, the data preprocessing unit comprises:
a data slicing divider for dividing the genome sequence into a plurality of segments of equal size;
a data position calculator for calculating the relative position of each site within its segment;
a data converter for converting the divided genome segment data into binary data;
a data cleaner for deleting data shorter than a first set length and data longer than a second set length, and for merging the data of repeated sites by applying an OR operation to them, wherein the second set length is greater than the first set length;
the data slicing divider, the data position calculator, the data converter and the data cleaner are connected in sequence.
Further, in the binary data, 0 represents the ancestral allele and 1 represents the variant allele.
Further, the classification fitting module comprises:
a model construction unit for constructing a natural selection classification and population change parameter fitting model by using the gated recurrent unit of a recurrent neural network (RNN) in combination with a convolutional neural network;
a model prediction unit for inputting the population genetic summary statistic sequence obtained by the data processing module into the model constructed by the model construction unit, training the model with training set data, and feeding the test set into the trained model to perform natural selection classification and population change parameter fitting;
the model construction unit is connected with the model prediction unit.
Further, the model construction unit constructs the natural selection classification and population change parameter fitting model in the following order: first, an Input layer, a BiLSTM layer, a CNN layer and a Dropout layer are called to build the model, where the BiLSTM layer and the CNN layer are used for vector representation learning and the Dropout layer is used to prevent overfitting; then, the weights are adjusted according to the correlation between each feature and the population genetic variables; and finally, the weights are multiplied by the feature value vector and the products are summed and output.
Further, the calculation process of the gated recurrent unit is:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t)$$
where $f_t$ is the forget gate, $i_t$ is the memory (input) gate, $\tilde{C}_t$ is the temporary state, $C_t$ is the state at the current time step, $o_t$ is the output gate, $h_t$ is the hidden state, $\sigma$ is the activation function, $W_f$, $W_i$, $W_C$, $W_o$ are different weight matrices, $b_f$, $b_i$, $b_C$, $b_o$ are different biases, $h_{t-1}$ is the hidden state of the previous time step, $x_t$ is the current input, '·' denotes the dot product, '*' denotes element-wise multiplication, and tanh is the hyperbolic tangent function.
The convolutional layer computes:
$$V = \mathrm{conv}(W, X) + b$$
Further, the natural selection prediction and classification process in the model prediction unit proceeds in the following order: first, the population genetic summary statistic sequence output by the data processing module is read in, and the data are divided into a training set and a test set according to a set proportion; then, the discrete data are encoded by one-hot encoding to obtain a vector representation of the population genetic summary statistic sequence; next, the training set data converted into vector representation are input into the model for training; and finally, the trained model reads in the test set data and performs natural selection classification and population scale change parameter fitting.
Compared with the prior art, the invention has the following advantages: it analyzes population scale change and natural selection classification simultaneously, thereby eliminating the influence of population scale change on the natural selection classification, and it automatically extracts and analyzes multiple population summary statistics through the neural network, thereby obtaining results with high accuracy and reliability.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of the neural network-based natural selection classification and population scale change analysis system of the present invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that those skilled in the art can more easily understand the advantages and features of the present invention and so that its scope of protection is defined more clearly.
Referring to FIG. 1, the present invention discloses a neural network-based natural selection classification and population scale change analysis system, comprising: an input module for obtaining genome sequence data; a data processing module for processing the genome sequence data acquired by the input module and outputting summary statistics of the genome sequence data; and a classification fitting module for constructing a population genetic classification and parameter fitting model by combining a recurrent neural network with a convolutional neural network, and for performing natural selection classification and population change fitting on the population using the data output by the data processing module. The input module is connected with the data processing module, and the data processing module is connected with the classification fitting module.
In this embodiment, the data processing module includes: a data preprocessing unit for segmenting and cleaning the genome sequence data; and a summary statistic generation unit for dividing each piece of data output by the data preprocessing unit into a set number of windows (3 in this embodiment) and calculating the population genetic summary statistics of each window. The data preprocessing unit is connected with the summary statistic generation unit.
Preferably, the population genetic summary statistics of each window include the number of sites, the folded site frequency spectrum, the distribution of inter-site distances, the length distribution of identity-by-state tracts, the linkage disequilibrium distribution, and Tajima's D statistic.
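As an illustration of the per-window statistics listed above, the following is a minimal sketch that computes three of them (number of sites, folded site frequency spectrum, Tajima's D) for each of 3 windows. The function names, the 0/1 haplotype matrix layout (rows are samples, columns are sites), and the use of relative positions in [0, 1] are assumptions made for this example and are not taken from the patent; Tajima's D follows the standard textbook formula.

```python
import numpy as np

def folded_sfs(hap):
    """Folded site frequency spectrum: counts of sites by minor-allele count."""
    hap = np.asarray(hap, dtype=int)
    n = hap.shape[0]
    derived = hap.sum(axis=0)
    minor = np.minimum(derived, n - derived)
    # index 0 (monomorphic sites) is dropped
    return np.bincount(minor, minlength=n // 2 + 1)[1:]

def tajimas_d(hap):
    """Tajima's D for a 0/1 matrix (rows = samples, columns = sites)."""
    hap = np.asarray(hap, dtype=int)
    n = hap.shape[0]
    derived = hap.sum(axis=0)
    seg = (derived > 0) & (derived < n)          # segregating sites only
    S = int(seg.sum())
    if n < 2 or S == 0:
        return float("nan")
    k = derived[seg].astype(float)
    pi = float((2.0 * k * (n - k)).sum() / (n * (n - 1)))  # mean pairwise differences
    i = np.arange(1, n)
    a1 = (1.0 / i).sum()
    a2 = (1.0 / i**2).sum()
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    var = (c1 / a1) * S + (c2 / (a1**2 + a2)) * S * (S - 1)
    return float((pi - S / a1) / np.sqrt(var)) if var > 0 else float("nan")

def window_statistics(hap, positions, n_windows=3):
    """Split sites into equal-width windows by relative position and summarize each."""
    hap = np.asarray(hap, dtype=int)
    positions = np.asarray(positions, dtype=float)
    edges = np.linspace(0.0, 1.0, n_windows + 1)
    stats = []
    for w in range(n_windows):
        lo, hi = edges[w], edges[w + 1]
        if w == n_windows - 1:                   # last window includes its right edge
            mask = (positions >= lo) & (positions <= hi)
        else:
            mask = (positions >= lo) & (positions < hi)
        sub = hap[:, mask]
        stats.append({
            "n_sites": int(mask.sum()),
            "folded_sfs": folded_sfs(sub),
            "tajimas_d": tajimas_d(sub),
        })
    return stats
```

Calling `window_statistics(hap, positions)` returns one dictionary per window; concatenating the values across windows would give one possible form of the summary statistic sequence that is passed on to the classification fitting module.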
In this embodiment, the data preprocessing unit includes: a data slicing divider for dividing the genome sequence into a plurality of segments of equal size; a data position calculator for calculating the relative position of each site within its segment; a data converter for converting the divided genome segment data into 0/1 binary data (0 represents the ancestral allele and 1 represents the variant allele); and a data cleaner for deleting data shorter than a first set length or longer than a second set length, and for merging the data of repeated sites by applying an OR operation to them, wherein the second set length is greater than the first set length. The data slicing divider, the data position calculator, the data converter and the data cleaner are connected in sequence.
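The preprocessing chain described above can be sketched roughly as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes that "length" means the number of sites in a segment and that the length filter is applied before repeated sites are merged, neither of which the text states explicitly.

```python
import numpy as np

def to_binary(alleles, ancestral):
    """Data converter: 0 = ancestral allele, 1 = variant allele."""
    alleles = np.asarray(alleles)
    ancestral = np.asarray(ancestral)
    return (alleles != ancestral[None, :]).astype(np.uint8)

def split_segments(positions, binary, segment_size):
    """Slicing divider + position calculator.

    Cuts the sequence into equal-size segments and returns, for each
    non-empty segment, the relative site positions in [0, 1) and the
    matching columns of the binary matrix.
    """
    positions = np.asarray(positions)
    seg_index = positions // segment_size
    segments = []
    for s in np.unique(seg_index):
        mask = seg_index == s
        rel = (positions[mask] - s * segment_size) / segment_size
        segments.append((rel, binary[:, mask]))
    return segments

def clean(segments, min_sites, max_sites):
    """Data cleaner: drop too-short/too-long segments, OR-merge repeated sites."""
    cleaned = []
    for rel, mat in segments:
        if not (min_sites <= rel.size <= max_sites):
            continue                              # assumed meaning of "set length"
        uniq, inverse = np.unique(rel, return_inverse=True)
        merged = np.zeros((mat.shape[0], uniq.size), dtype=np.uint8)
        for col, target in enumerate(inverse):
            merged[:, target] |= mat[:, col]      # OR over columns sharing a position
        cleaned.append((uniq, merged))
    return cleaned
```

The three functions are meant to be called in the order in which the units are connected: conversion to binary, segmentation with relative positions, then cleaning.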
In this embodiment, the classification fitting module includes: a model construction unit for constructing a natural selection classification and population change parameter fitting model by using the gated recurrent unit of a recurrent neural network (RNN) in combination with a convolutional neural network; and a model prediction unit for inputting the population genetic summary statistic sequence obtained by the data processing module into the model constructed by the model construction unit, training the model with training set data, and feeding the test set into the trained model to perform natural selection classification and population change parameter fitting. The model construction unit is connected with the model prediction unit.
The model construction unit constructs the natural selection classification and population change parameter fitting model in the following order: first, an Input layer, a BiLSTM layer, a CNN layer and a Dropout layer are called to build the model, where the BiLSTM layer and the CNN layer are used for vector representation learning and the Dropout layer is used to prevent overfitting; then, the weights are adjusted according to the correlation between each feature and the population genetic variables; and finally, the weights are multiplied by the feature value vector and the products are summed and output.
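The final step, multiplying weights by the feature vector and summing, can be illustrated with a small two-output sketch: one head produces class probabilities for natural selection classification and the other produces population change parameters. The softmax on the classification head, the separate weight matrices per head, and all names below are assumptions made for illustration; the patent does not specify them.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def output_heads(features, W_cls, b_cls, W_reg, b_reg):
    """Weighted sums of the shared feature vector, one per output.

    features : (d,) vector produced by the BiLSTM/CNN/Dropout stack
    W_cls    : (n_classes, d) weights of the classification head (hypothetical)
    W_reg    : (n_params, d) weights of the parameter fitting head (hypothetical)
    """
    class_probs = softmax(W_cls @ features + b_cls)   # natural selection class
    size_params = W_reg @ features + b_reg            # population change parameters
    return class_probs, size_params
```

Sharing one feature vector between both heads is what lets the model output the classification result and the population change parameters at the same time.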
The calculation process of the gated recurrent unit is:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t)$$
where $f_t$ is the forget gate, $i_t$ is the memory (input) gate, $\tilde{C}_t$ is the temporary state, $C_t$ is the state at the current time step, $o_t$ is the output gate, $h_t$ is the hidden state, $\sigma$ is the activation function, $W_f$, $W_i$, $W_C$, $W_o$ are different weight matrices, $b_f$, $b_i$, $b_C$, $b_o$ are different biases, $h_{t-1}$ is the hidden state of the previous time step, $x_t$ is the current input, '·' denotes the dot product, '*' denotes element-wise multiplication, and tanh is the hyperbolic tangent function.
The convolutional layer computes:
$$V = \mathrm{conv}(W, X) + b$$
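For readers who prefer code, the gate equations and the convolution can be sketched in NumPy as below. The sketch assumes the standard candidate-state and cell-state updates that the symbols W_C, b_C and the "temporary state" imply, a sigmoid for σ, and a "valid" 1-D convolution in the cross-correlation form used by most deep-learning libraries; the function names and the dictionary layout of the parameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def recurrent_step(x_t, h_prev, c_prev, W, b):
    """One time step of the gated cell.

    W and b are dicts keyed by 'f', 'i', 'C', 'o'; each W[k] has shape
    (hidden, hidden + input) and each b[k] has shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])           # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])           # memory (input) gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])       # temporary state (assumed form)
    c_t = f_t * c_prev + i_t * c_tilde           # current state (assumed form)
    o_t = sigmoid(W["o"] @ z + b["o"])           # output gate
    h_t = o_t * np.tanh(c_t)                     # hidden state
    return h_t, c_t

def conv1d(W, X, b):
    """V = conv(W, X) + b for X of shape (T, d) and a kernel W of shape (k, d)."""
    k = W.shape[0]
    T = X.shape[0]
    return np.array([np.sum(W * X[t:t + k]) + b for t in range(T - k + 1)])
```

A bidirectional layer would run `recurrent_step` over the sequence once forwards and once backwards and concatenate the two hidden states at each position.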
In this embodiment, the natural selection prediction and classification process in the model prediction unit is performed in the following order: first, the population genetic summary statistic sequence output by the data processing module is read in, and the data are divided into a training set and a test set according to a set proportion; then, the discrete data are encoded by one-hot encoding to obtain a vector representation of the population genetic summary statistic sequence; next, the training set data converted into vector representation are input into the model for training; and finally, the trained model reads in the test set data and performs natural selection classification and population scale change parameter fitting.
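The first two steps of this process, splitting by a set proportion and one-hot encoding of discrete values, might look like the following sketch. The 0.2 test proportion, the fixed random seed, and the function names are assumptions chosen for the example, not values given in the patent.

```python
import numpy as np

def train_test_split(X, y, test_ratio=0.2, seed=0):
    """Shuffle the samples and split them by a set proportion."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_ratio))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]

def one_hot(values, categories):
    """Encode discrete values as one-hot row vectors in the order of `categories`."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), len(categories)), dtype=np.float32)
    for row, v in enumerate(values):
        out[row, index[v]] = 1.0
    return out
```

For example, `one_hot(["a", "b", "a"], ["a", "b"])` yields the rows `[1, 0]`, `[0, 1]`, `[1, 0]`; the resulting vectors for the training portion are what would be fed to the model for training.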
The invention provides a neural network-based natural selection classification and population scale change analysis system. It exploits the fact that a neural network can have multiple outputs to output the population scale change parameters and the natural selection classification result simultaneously. Analyzing the two together eliminates the influence of population scale change on the natural selection classification, and the neural network automatically extracts and analyzes multiple population summary statistics, so that results with high accuracy and reliability are obtained.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the patentee may make various changes or modifications within the scope of the appended claims; such changes fall within the protection scope of the invention as long as they do not exceed the scope described in the claims.
Claims (6)
1. A neural network-based natural selection classification and population scale change analysis system, characterized in that it comprises:
an input module for obtaining genome sequence data;
a data processing module for processing the genome sequence data acquired by the input module and outputting summary statistics of the genome sequence data;
a classification fitting module for constructing a population genetic classification and parameter fitting model by combining a recurrent neural network with a convolutional neural network, and for performing natural selection classification and population change fitting on the population using the data output by the data processing module;
the input module is connected with the data processing module, and the data processing module is connected with the classification fitting module;
the data processing module comprises:
a data preprocessing unit for segmenting and cleaning the genome sequence data;
a summary statistic generation unit for dividing each piece of data output by the data preprocessing unit into a set number of windows and calculating the population genetic summary statistics of each window;
the data preprocessing unit is connected with the summary statistic generation unit;
the data preprocessing unit comprises:
a data slicing divider for dividing the genome sequence into a plurality of segments of equal size;
a data position calculator for calculating the relative position of each site within its segment;
a data converter for converting the divided genome segment data into binary data;
a data cleaner for deleting data shorter than a first set length and data longer than a second set length, and for merging the data of repeated sites by applying an OR operation to them, wherein the second set length is greater than the first set length;
the data slicing divider, the data position calculator, the data converter and the data cleaner are connected in sequence;
the classification fitting module comprises:
a model construction unit for constructing a natural selection classification and population change parameter fitting model by using the gated recurrent unit of a recurrent neural network (RNN) in combination with a convolutional neural network;
a model prediction unit for inputting the population genetic summary statistic sequence obtained by the data processing module into the model constructed by the model construction unit, training the model with training set data, and feeding the test set into the trained model to perform natural selection classification and population change parameter fitting;
the model construction unit is connected with the model prediction unit;
the model construction unit constructs the natural selection classification and population change parameter fitting model in the following order: first, an Input layer, a BiLSTM layer, a CNN layer and a Dropout layer are called to build the model, where the BiLSTM layer and the CNN layer are used for vector representation learning and the Dropout layer is used to prevent overfitting; then, the weights are adjusted according to the correlation between each feature and the population genetic variables; and finally, the weights are multiplied by the feature value vector and the products are summed and output.
2. The neural network-based natural selection classification and population scale change analysis system of claim 1, wherein: the number of windows is 3.
3. The neural network-based natural selection classification and population scale change analysis system of claim 1, wherein: the population genetic summary statistics of each window comprise the number of sites, the folded site frequency spectrum, the distribution of inter-site distances, the length distribution of identity-by-state tracts, the linkage disequilibrium distribution, and Tajima's D statistic.
4. The neural network-based natural selection classification and population scale change analysis system of claim 1, wherein: in the binary data, 0 represents the ancestral allele and 1 represents the variant allele.
5. The neural network-based natural selection classification and population scale change analysis system of claim 1, wherein: the calculation process of the gated recurrent unit is:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t)$$
where $f_t$ is the forget gate, $i_t$ is the memory (input) gate, $\tilde{C}_t$ is the temporary state, $C_t$ is the state at the current time step, $o_t$ is the output gate, $h_t$ is the hidden state, $\sigma$ is the activation function, $W_f$, $W_i$, $W_C$, $W_o$ are different weight matrices, $b_f$, $b_i$, $b_C$, $b_o$ are different biases, $h_{t-1}$ is the hidden state of the previous time step, $x_t$ is the current input, '·' denotes the dot product, '*' denotes element-wise multiplication, and tanh is the hyperbolic tangent function;
the convolutional layer computes:
$$V = \mathrm{conv}(W, X) + b$$
6. The neural network-based natural selection classification and population scale change analysis system of claim 1, wherein: the natural selection prediction and classification process in the model prediction unit is carried out in the following order: first, the population genetic summary statistic sequence output by the data processing module is read in, and the data are divided into a training set and a test set according to a set proportion; then, the discrete data are encoded by one-hot encoding to obtain a vector representation of the population genetic summary statistic sequence; next, the training set data converted into vector representation are input into the model for training; and finally, the trained model reads in the test set data and performs natural selection classification and population scale change parameter fitting.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110446165.1A CN113128685B (en) | 2021-04-25 | 2021-04-25 | Natural selection classification and group scale change analysis system based on neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110446165.1A CN113128685B (en) | 2021-04-25 | 2021-04-25 | Natural selection classification and group scale change analysis system based on neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113128685A (en) | 2021-07-16 |
CN113128685B (en) | 2023-04-07 |
Family
ID=76779703
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110446165.1A (Active, CN113128685B, en) | Natural selection classification and group scale change analysis system based on neural network | 2021-04-25 | 2021-04-25 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113128685B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114512185B (en) * | 2022-01-13 | 2024-04-05 | 湖南大学 | Donkey population natural selection classification system for variable data dimension reduction input |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110832510A (en) * | 2018-01-15 | 2020-02-21 | 因美纳有限公司 | Variant classifier based on deep learning |
US10657447B1 (en) * | 2018-11-29 | 2020-05-19 | SparkCognition, Inc. | Automated model building search space reduction |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102622535A (en) * | 2012-02-27 | 2012-08-01 | 上海电机学院 | Processing method and processing device based on multiple sequence alignment genetic algorithm |
CN107025386B (en) * | 2017-03-22 | 2020-07-17 | 杭州电子科技大学 | Method for performing gene association analysis based on deep learning algorithm |
NZ759818A (en) * | 2017-10-16 | 2022-04-29 | Illumina Inc | Semi-supervised learning for training an ensemble of deep convolutional neural networks |
CA3085897C (en) * | 2017-12-13 | 2023-03-14 | Cognizant Technology Solutions U.S. Corporation | Evolutionary architectures for evolution of deep neural networks |
CN110111848B (en) * | 2019-05-08 | 2023-04-07 | 南京鼓楼医院 | Human body cycle expression gene identification method based on RNN-CNN neural network fusion algorithm |
EP4018450A1 (en) * | 2019-08-22 | 2022-06-29 | Inari Agriculture Technology, Inc. | Methods and systems for assessing genetic variants |
US20230030326A1 (en) * | 2019-10-10 | 2023-02-02 | Pioneer Hi-Bred International, Inc. | Synchronized breeding and agronomic methods to improve crop plants |
2021-04-25: CN application CN202110446165.1A, patent CN113128685B (en), status Active
Non-Patent Citations (2)
Title |
---|
Distinguishing positive selection from neutral evolution:boosting the performance of summary statistics;Lin K et al.;《Genetics》;229-244 * |
Population genomics methods: from classical statistics to supervised learning;施怪 et al.;《中国科学:生命科学》;445-455 * |
Also Published As
Publication number | Publication date |
---|---|
CN113128685A (en) | 2021-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112699960B (en) | Semi-supervised classification method, equipment and storage medium based on deep learning | |
CN109002917A (en) | Total output of grain multidimensional time-series prediction technique based on LSTM neural network | |
CN110175168B (en) | Time sequence data filling method and system based on generation of countermeasure network | |
CN108846261B (en) | Gene expression time sequence data classification method based on visual graph algorithm | |
CN110579186B (en) | Crop growth monitoring method based on inversion of leaf area index by inverse Gaussian process | |
CN106055922A (en) | Hybrid network gene screening method based on gene expression data | |
CN113128685B (en) | Natural selection classification and group scale change analysis system based on neural network | |
CN115689008A (en) | CNN-BilSTM short-term photovoltaic power prediction method and system based on ensemble empirical mode decomposition | |
Koval | Data preparation for neural network data analysis | |
Oriol Sabat et al. | SALAI-Net: species-agnostic local ancestry inference network | |
CN116579447A (en) | Time sequence prediction method based on decomposition mechanism and attention mechanism | |
Zhu et al. | Genomic prediction of growth traits in scallops using convolutional neural networks | |
Zhang et al. | SLRRSC: single-cell type recognition method based on similarity and graph regularization constraints | |
Mantes et al. | Neural admixture: rapid population clustering with autoencoders | |
CN109886721A (en) | A kind of pork price forecasting system algorithm | |
CN113505651A (en) | Mosquito identification method based on convolutional neural network | |
CN110070070B (en) | Action recognition method | |
CN109671468B (en) | Characteristic gene selection and cancer classification method | |
CN114764575B (en) | Multi-modal data classification method based on deep learning and time sequence attention mechanism | |
CN114512185B (en) | Donkey population natural selection classification system for variable data dimension reduction input | |
CN114611804A (en) | Maize yield prediction method based on TSO-GRNN combined model | |
CN115130509A (en) | Electrocardiosignal generation method based on conditional variational self-encoder | |
CN114062305A (en) | Single grain variety identification method and system based on near infrared spectrum and 1D-In-Resnet network | |
CN114255865A (en) | Diagnosis and treatment project prediction method based on recurrent neural network | |
Chen et al. | Functional response regression analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |