CN116612816B - Whole genome nucleosome density prediction method, whole genome nucleosome density prediction system and electronic equipment

Info

Publication number: CN116612816B
Application number: CN202310415049.2A
Authority: CN
Inventors: 吴庭芳; 周昳婷
Original assignee: Suzhou University
Current assignee: Suzhou University
Priority date: 2023-04-18
Filing date: 2023-04-18
Publication date: 2024-06-21
Anticipated expiration: 2043-04-18
Also published as: CN116612816A

Abstract

The invention relates to a whole genome nucleosome density prediction method, a whole genome nucleosome density prediction system and electronic equipment. The whole genome nucleosome density prediction method comprises the following steps: acquiring DNA sequences of whole genome chromosomes, and performing first coding and second coding on them respectively to obtain a first coding sequence and a second coding sequence; constructing and training a DeepNDP model to obtain a trained DeepNDP model; and inputting the first coding sequence and the second coding sequence into the trained DeepNDP model to obtain a whole genome nucleosome density result, wherein the DeepNDP model comprises a feature extraction network, a Concatenate layer, a Transformer layer, a Flatten layer and two fully connected layers which are sequentially connected. According to the invention, the DNA sequence is encoded into two forms, so that the generalization capability of the model is improved, and the distribution of nucleosomes can be identified more efficiently and accurately without carrying out costly biological experiments.

Description

Whole genome nucleosome density prediction method, whole genome nucleosome density prediction system and electronic equipment

Technical Field

The invention relates to the technical field of bioinformatics, in particular to a whole genome nucleosome density prediction method, a whole genome nucleosome density prediction system and electronic equipment.

Background

Nucleosome density prediction refers to the use of computational methods to predict the nucleosome signal intensity at each base site, resulting in a continuous nucleosome density across the genome. As the basic units of chromatin, nucleosomes are key participants in genetic processes: their precise positions regulate the accessibility of the genome to DNA-binding proteins, thereby regulating gene expression, DNA replication and repair. Identifying the positions of nucleosomes on the genome can therefore help researchers study various biological processes in depth.

In past studies, many DNA sequence-based calculation methods have been proposed to determine nucleosome position in DNA sequences, for example:

(1) iNuc-PseKNC: a method for locating nucleosomes. A DNA sequence with the length of 147bp is input, a characteristic vector consisting of pseudo k-tuple nucleotides with 6 local DNA structural characteristics is extracted, and then the characteristics are input into an SVM classifier to predict whether the sequence is a nucleosome sequence.

(2) DLNN: a method for locating nucleosomes. A DNA sequence with the length of 147bp is input and encoded into one-hot form, the sequence is modeled and analyzed using a convolutional network and a recurrent network, and the method predicts whether the sequence is a nucleosome sequence.

(3) Routhier et al: a method for predicting the density of nucleosomes. DNA sequences on the whole chromosome were obtained in the form of sliding windows, and the nucleosome density at the central site of the input sequence was predicted using three sequentially stacked convolution layers.

In the prior art, nucleosome positioning methods can only capture context information within 147bp, cannot learn long-range interactions between bases, and cannot rapidly predict and analyze a whole chromosome sequence.

In addition, the deep-learning-based nucleosome density prediction method proposed by Routhier et al. has low recognition accuracy, and its prediction performance still has room for improvement.

Disclosure of Invention

Therefore, the invention aims to solve the technical problem that nucleosome density prediction methods in the prior art have low recognition accuracy.

In order to solve the technical problems, the invention provides a whole genome nucleosome density prediction method, which comprises the following steps:

Step S1: acquiring DNA sequences of whole genome chromosomes, and respectively performing first coding and second coding to obtain a first coding sequence and a second coding sequence;

and, at the same time, constructing and training a DeepNDP model to obtain a trained DeepNDP model;

Step S2: inputting the first coding sequence and the second coding sequence into a trained DeepNDP model for prediction to obtain a whole genome nucleosome density result;

the DeepNDP model comprises a feature extraction network, a Concatenate layer, a Transformer layer, a Flatten layer and two fully connected layers which are sequentially connected;

the feature extraction network is used for extracting a first local feature of the first coding sequence and a second local feature of the second coding sequence; the Concatenate layer is used for splicing the first local feature and the second local feature to obtain a spliced feature; the Transformer layer is used for extracting global features of the spliced feature; the Flatten layer is used for changing the dimension of the output of the Transformer layer; and the fully connected layers are used to predict the whole genome nucleosome density.

In one embodiment of the present invention, the feature extraction network in the step S2 includes a feature extraction module ResNet and a feature extraction module CNNNet, where the feature extraction module ResNet is configured to extract a first local feature of a first coding sequence and the feature extraction module CNNNet is configured to extract a second local feature of a second coding sequence.

In one embodiment of the present invention, the feature extraction module ResNet includes a first CNN layer, three ResBlock layers, a second CNN layer, a third CNN layer, and a first Reshape layer, which are sequentially connected, where the first Reshape layer is used to change a dimension of an output of the third CNN layer.

In one embodiment of the present invention, the ResBlock layers include a first column of CNN cells and a second column of CNN cells;

The first-column CNN unit comprises a fourth CNN layer, a fifth CNN layer and a sixth CNN layer which are sequentially connected, wherein the convolution kernel sizes adopted by the fourth CNN layer, the fifth CNN layer and the sixth CNN layer are 5, 16 and 16 in sequence;

The second-column CNN unit comprises a seventh CNN layer, an eighth CNN layer and a ninth CNN layer which are sequentially connected, wherein the convolution kernel sizes adopted by the seventh CNN layer, the eighth CNN layer and the ninth CNN layer are 3, 8 and 8 in sequence;

The output of the sixth CNN layer, the output of the ninth CNN layer and the input of the current ResBlock layer are added together.

In one embodiment of the invention, all CNN layers in the feature extraction module ResNet are followed by a ReLU activation function.

In one embodiment of the present invention, the feature extraction module CNNNet includes a tenth CNN layer, an eleventh CNN layer, a twelfth CNN layer, and a second Reshape layer connected in sequence, where the second Reshape layer is used to change a dimension of an output of the twelfth CNN layer.

In one embodiment of the present invention, the method for obtaining the DNA sequence of the whole genome chromosome in step S1 and performing the first encoding and the second encoding respectively to obtain the first encoding sequence and the second encoding sequence includes:

obtaining a DNA sequence of a whole genome chromosome;

One-hot coding is performed on the DNA sequence of the whole genome chromosome to obtain a One-hot coding sequence, and nucleotide coding is performed on the same DNA sequence to obtain a nucleotide coding sequence, wherein the One-hot coding sequence is the first coding sequence and the nucleotide coding sequence is the second coding sequence.

In order to solve the technical problems, the invention provides a whole genome nucleosome density prediction system, which comprises:

an encoding and construction module, configured to obtain DNA sequences of whole genome chromosomes and perform first coding and second coding on them respectively to obtain a first coding sequence and a second coding sequence,

and also configured to construct and train a DeepNDP model to obtain a trained DeepNDP model;

and a prediction module, configured to input the first coding sequence and the second coding sequence into the trained DeepNDP model for prediction to obtain a whole genome nucleosome density result;

In order to solve the technical problems, the invention provides electronic equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the whole genome nucleosome density prediction method when executing the computer program.

To solve the above technical problem, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the whole genome nucleosome density prediction method as described above.

Compared with the prior art, the technical scheme of the invention has the following advantages:

According to the invention, the DNA sequence is encoded into two forms, so that the constructed deep learning model can learn more information from the DNA sequence, and the method can more efficiently and accurately identify the distribution of nucleosomes of the whole genome without time-consuming and labor-consuming biological experiments with high cost;

The DeepNDP model provided by the invention can be used across different species and has strong generalization capability, which avoids the complexity of maintaining a separate model for each species;

the DeepNDP model of the invention can be used for detecting the distribution of nucleosomes in biological research, thereby helping researchers to deeply study various biological processes such as gene expression, DNA replication, repair and the like.

Drawings

In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings.

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of the structure of the DeepNDP model of the present invention;

FIG. 3 is a diagram comparing the DeepNDP model with the chemical method, taking Saccharomyces cerevisiae as an example, in an embodiment of the present invention;

FIG. 4 is a diagram comparing the DeepNDP model with existing models, taking Saccharomyces cerevisiae as an example, in an embodiment of the present invention;

FIG. 5 is a diagram showing the effect of the DeepNDP model when the NCP encoding is removed from the input in an embodiment of the present invention;

FIG. 6 is a diagram comparing the performance of the DeepNDP model with the chemical method, taking mouse as an example, in an embodiment of the present invention.

Detailed Description

The present invention will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the invention and practice it.

Example 1

Referring to FIG. 1, the whole genome nucleosome density prediction method of the present invention comprises:

Step S1: obtaining DNA sequences of whole genome chromosomes and respectively performing first coding (One-hot coding) and second coding (NCP coding) to obtain a first coding sequence and a second coding sequence;

The DeepNDP model comprises a feature extraction network, a Concatenate layer, a Transformer layer, a Flatten layer and two fully connected layers (namely Dense layers) which are connected in sequence;

The feature extraction network is used for extracting a first local feature of the first coding sequence and a second local feature of the second coding sequence; the Concatenate layer is used for splicing the first local feature and the second local feature to obtain a spliced feature; the Transformer layer is used for extracting global features of the spliced feature; the Flatten layer is used for changing the dimension of the output of the Transformer layer (namely, flattening the output of the Transformer layer); and the fully connected layers (i.e., the Dense layers) are used to predict the whole genome nucleosome density.

The feature extraction network includes a feature extraction module ResNet and a feature extraction module CNNNet, the feature extraction module ResNet being configured to extract a first local feature of a first coding sequence and the feature extraction module CNNNet being configured to extract a second local feature of a second coding sequence.

According to the invention, the DNA sequence is encoded into two forms, so that the constructed deep learning model can learn more information from the DNA sequence, and the method can more efficiently and accurately identify the distribution of nucleosomes of the whole genome without time-consuming and labor-consuming biological experiments with high cost; the DeepNDP model of the invention can be used for detecting the distribution of nucleosomes in biological research, thereby helping researchers to deeply study various biological processes such as gene expression, DNA replication, repair and the like.

The present invention is described in detail below:

In step S1, the DNA sequences in the data set are divided into a training set, a validation set and a test set according to chromosome number. Specifically, taking Saccharomyces cerevisiae as an example, its genome comprises 16 chromosomes: chromosomes 1 to 13 are used as the training set, chromosomes 14 and 15 as the validation set, and chromosome 16 as the test set;
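The chromosome-level split described above can be sketched as follows. This is a minimal illustration only; the function name `split_by_chromosome` and the dictionary keys are hypothetical and do not come from the patent.

```python
def split_by_chromosome(chrom_ids):
    """Assign each S. cerevisiae chromosome number to a data subset.

    Chromosomes 1-13 -> training, 14-15 -> validation, 16 -> test,
    following the split stated in the description.
    """
    split = {"train": [], "validation": [], "test": []}
    for chrom in chrom_ids:
        if 1 <= chrom <= 13:
            split["train"].append(chrom)
        elif chrom in (14, 15):
            split["validation"].append(chrom)
        elif chrom == 16:
            split["test"].append(chrom)
        else:
            raise ValueError(f"unexpected chromosome number: {chrom}")
    return split


yeast_split = split_by_chromosome(range(1, 17))
print(yeast_split)
```

Splitting by whole chromosome, rather than by randomly sampled windows, keeps overlapping sliding windows from the same region out of both the training and the test data.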

One-hot encoding represents the four bases A, T, C and G in the DNA sequence and the unknown site N as the binary vectors (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1) and (0, 0, 0, 0), respectively;
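A minimal sketch of this One-hot encoding is given below. The table assumes the conventional four-bit layout in the order A, T, C, G with an all-zero vector for N; the name `one_hot_encode` and the choice to treat any unrecognized character as N are assumptions made for illustration.

```python
# Assumed lookup table: one bit per base in the order A, T, C, G; N is all zeros.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "T": (0, 1, 0, 0),
    "C": (0, 0, 1, 0),
    "G": (0, 0, 0, 1),
    "N": (0, 0, 0, 0),
}


def one_hot_encode(seq):
    """Return a list of 4-element tuples, one per base of `seq`."""
    # Assumption: characters outside A/T/C/G are handled like the unknown site N.
    return [ONE_HOT.get(base, ONE_HOT["N"]) for base in seq.upper()]


encoded = one_hot_encode("ATCGN")
print(encoded)
```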

Nucleotide chemical property encoding (NCP encoding) represents A, C, G, T and the unknown site N in the DNA sequence as (1, 1, 1), (0, 1, 0), (1, 0, 0), (0, 0, 1) and (0, 0, 0), respectively, according to three chemical properties of the base: ring structure, chemical functional group and hydrogen bonding.
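The NCP encoding can be sketched in the same way. The vector values below follow the commonly used NCP table (ring structure, functional group, hydrogen bond) and should be read as an assumption wherever the text of the description is ambiguous; the name `ncp_encode` is hypothetical.

```python
# Assumed NCP table: (ring structure, functional group, hydrogen bond).
# ring structure:   purine (A, G) = 1, pyrimidine (C, T) = 0
# functional group: amino (A, C) = 1, keto (G, T) = 0
# hydrogen bond:    weak (A, T) = 1, strong (C, G) = 0
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
    "N": (0, 0, 0),
}


def ncp_encode(seq):
    """Return a list of 3-element tuples, one per base of `seq`."""
    # Assumption: characters outside A/C/G/T are handled like the unknown site N.
    return [NCP.get(base, NCP["N"]) for base in seq.upper()]


ncp = ncp_encode("ACGTN")
print(ncp)
```

Each base therefore contributes three input channels to the second branch of the model, compared with four channels for the One-hot branch.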

In step S1, as shown in part A of FIG. 2 (the left part of FIG. 2), the DeepNDP model contains two input ports (a One-hot encoding input port and an NCP encoding input port), two different feature extraction modules (ResNet and CNNNet), a Transformer layer, a Flatten layer and two fully connected layers (Dense layers);

Further, the structure of the feature extraction module ResNet in this embodiment is shown in part B of FIG. 2 (the middle part of FIG. 2). It is used for extracting local features from the data and includes a first CNN layer, three ResBlock (i.e., residual module) layers, a second CNN layer, a third CNN layer and a first Reshape layer, which are sequentially connected; the first Reshape layer is used for changing the dimension of the output of the third CNN layer. Each ResBlock layer comprises a first-column CNN unit, used for extracting abstract features, and a second-column CNN unit, used for extracting detail features. The first-column CNN unit comprises a fourth CNN layer, a fifth CNN layer and a sixth CNN layer which are sequentially connected, with convolution kernel sizes of 5, 16 and 16 in sequence; the second-column CNN unit comprises a seventh CNN layer, an eighth CNN layer and a ninth CNN layer which are sequentially connected, with convolution kernel sizes of 3, 8 and 8 in sequence. The output (X) of the sixth CNN layer, the output (X) of the ninth CNN layer and the input (x_shortcut) of the current ResBlock layer are combined by an addition (Add) operation, and the sum is the output of the current ResBlock layer.

The feature extraction module CNNNet of this embodiment is shown in part C of FIG. 2 (the right part of FIG. 2). It is also used to extract local features from the data, and is composed of three sequentially stacked convolution layers (i.e., the tenth CNN layer, the eleventh CNN layer and the twelfth CNN layer) and a second Reshape layer.

The Transformer layer is an architecture based on the self-attention mechanism that integrates residual design with multi-head attention. It is used for extracting global features of the data and comprises two parts: a self-attention sub-layer and a feed-forward neural network sub-layer. The self-attention sub-layer calculates the correlation between the representation vector at each position of the input sequence and those at the other positions, so as to capture long-distance dependencies in the sequence. The feed-forward neural network sub-layer applies a nonlinear transformation to the output of the self-attention sub-layer, which increases the expressive capacity of the model. Each sub-layer is followed by a residual connection and a layer normalization operation to improve the stability and convergence speed of the model.

The fully connected layers (i.e., the Dense layers) are used to predict the output result.
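To illustrate how a self-attention sub-layer relates every position to every other position, the sketch below implements single-head scaled dot-product attention followed by a residual addition. It is a simplification made under stated assumptions: the learned query/key/value projections, the multiple heads, the layer normalization and the feed-forward sub-layer are all omitted, and the names `self_attention` and `add_residual` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention with Q = K = V = x.

    x is a list of positions, each a list of d feature values.
    Returns (output, attention_weights).
    """
    d = len(x[0])
    weights = []
    for q in x:
        # Similarity of this position to every position in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights.append(softmax(scores))
    out = [
        [sum(w * v[j] for w, v in zip(w_row, x)) for j in range(d)]
        for w_row in weights
    ]
    return out, weights


def add_residual(x, sub_out):
    """Residual connection: element-wise sum of sub-layer input and output."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x, sub_out)]


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn = self_attention(features)
residual = add_residual(features, attended)
print(attn)
```

Because every row of the attention matrix spans all positions, a position at one end of the window can draw directly on features from the other end, which is the long-distance dependency the description refers to.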

It should be noted that the ResNet module of this embodiment fuses multi-scale convolution with a residual network: convolution layers with different convolution kernel sizes extract features at different scales. As shown for the ResBlock layer in part B of FIG. 2, the number of convolution kernels of every CNN layer is set to 16, which ensures that the three features can subsequently be added. The first-column CNN unit, whose CNN layers have convolution kernel sizes of 5, 16 and 16, can extract more abstract features from the sequence matrix, while the second-column CNN unit, whose CNN layers have convolution kernel sizes of 3, 8 and 8, can extract more detailed features from the sequence matrix. The Add function is then used to add the outputs X of the two columns of CNN units to the input x_shortcut of the current ResBlock layer. This design extracts features of the sequence matrix at different scales and makes it easier for the neural network to learn the identity mapping, which helps avoid information loss and gradient attenuation in deep networks.
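The two-column ResBlock can be sketched in plain Python as below. This is an illustration under assumptions that the description does not state: zero "same" padding (so each layer preserves the sequence length), a 16-channel input to the block, and deterministic placeholder weights produced by the hypothetical helper `make_weights` in place of trained parameters.

```python
def conv1d_same(x, weights):
    """1D convolution with zero 'same' padding, followed by ReLU.

    x:       list of L positions, each a list of C_in values
    weights: list of C_out filters, each a list of k taps of C_in values
    """
    length = len(x)
    k = len(weights[0])
    left = (k - 1) // 2  # assumed padding split for even and odd kernel sizes
    out = []
    for pos in range(length):
        row = []
        for filt in weights:
            acc = 0.0
            for t in range(k):
                src = pos + t - left
                if 0 <= src < length:  # positions outside the input count as zeros
                    acc += sum(w * v for w, v in zip(filt[t], x[src]))
            row.append(max(acc, 0.0))  # ReLU after every CNN layer
        out.append(row)
    return out


def make_weights(k, c_in, c_out, scale=0.01):
    """Deterministic placeholder weights (hypothetical, not trained values)."""
    return [
        [[scale * ((f + t + c) % 3 - 1) for c in range(c_in)] for t in range(k)]
        for f in range(c_out)
    ]


def res_block(x, channels=16):
    """Two columns of three CNN layers each, summed with the shortcut input."""
    first = x
    for k in (5, 16, 16):  # first-column kernel sizes
        first = conv1d_same(first, make_weights(k, channels, channels))
    second = x
    for k in (3, 8, 8):  # second-column kernel sizes
        second = conv1d_same(second, make_weights(k, channels, channels))
    # Add: both columns output 16 channels, so the three tensors align.
    return [
        [a + b + s for a, b, s in zip(ra, rb, rs)]
        for ra, rb, rs in zip(first, second, x)
    ]


x = [[float((i + c) % 2) for c in range(16)] for i in range(32)]
y = res_block(x)
print(len(y), len(y[0]))
```

The element-wise sum is only defined because every layer uses 16 kernels and preserves the sequence length, which is why the description fixes the number of kernels at 16.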

It should be noted that, in the ResNet module of this embodiment, each convolution layer (CNN layer) is followed by a ReLU activation function, which sets negative values in the convolution result to zero and keeps positive values unchanged; this mitigates the vanishing gradient problem, accelerates the convergence of gradient descent and improves computational efficiency.

Further, the parameters of the convolution layers of the feature extraction module CNNNet of this embodiment are set as follows: the tenth CNN layer has 64 convolution kernels of size 3, the eleventh CNN layer has 16 convolution kernels of size 8, and the twelfth CNN layer has 8 convolution kernels of size 80.
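As a worked example of what these settings imply, the sketch below counts the trainable parameters of the three stacked convolution layers, taking the (number of kernels, kernel size) pairs in the order listed above. It assumes that the CNNNet input has three channels (one per NCP property) and that every layer has a bias term; neither assumption is stated explicitly in the description, and `conv1d_params` is a hypothetical helper.

```python
def conv1d_params(kernel_size, in_channels, filters, use_bias=True):
    """Trainable parameters of a 1D convolution layer."""
    return kernel_size * in_channels * filters + (filters if use_bias else 0)


# (number of kernels, kernel size) for the tenth, eleventh and twelfth CNN layers
layers = [(64, 3), (16, 8), (8, 80)]

in_channels = 3  # assumption: three NCP properties per base
counts = []
for filters, kernel_size in layers:
    counts.append(conv1d_params(kernel_size, in_channels, filters))
    in_channels = filters  # the next layer sees this layer's output channels

print(counts)       # [640, 8208, 10248]
print(sum(counts))  # 19096
```

The kernel size grows from 3 to 80 while the number of kernels shrinks from 64 to 8, so the later layers look at wider stretches of sequence with fewer channels.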

The parameters of the two fully connected layers of this embodiment are set as follows: their output sizes are 256 and 1, respectively.

The two inputs of the DeepNDP model, input1 and input2, are the DNA sequence represented in the two coding modes respectively. In step S2, the two inputs are fed into ResNet and CNNNet respectively to extract local features, which are then horizontally spliced together by the Concatenate layer; the Transformer layer integrates the spliced local features and learns global features; the fully connected layers then output a number between 0 and 1 representing the nucleosome density at the central site of the input DNA sequence.

Further, in this embodiment the training set and the validation set are used for training the DeepNDP model, and the test set is used for evaluating the performance of the DeepNDP model. A sliding window with a window size of 2001bp and a step of 1bp is moved along the DNA sequence, and each 2001bp DNA sequence is used as a model input sequence, so that the DNA sequence of the whole chromosome is read. The loss function measures the difference between the predictions and the actual data, and gradient updates are performed to update the parameters of the DeepNDP model;
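The sliding-window reading of a chromosome can be sketched as follows. The generator name `sliding_windows` is hypothetical; the window size of 2001 and the step of 1 come from the description. How the first and last 1000 bases of a chromosome are handled (for example by padding with N) is not stated in the description, so this sketch simply yields only full windows.

```python
WINDOW = 2001  # window size in bp
STEP = 1       # step in bp


def sliding_windows(sequence, window=WINDOW, step=STEP):
    """Yield (center_index, subsequence) for every full window.

    The model predicts the nucleosome density at `center_index`,
    the middle base of each window.
    """
    half = window // 2
    for start in range(0, len(sequence) - window + 1, step):
        yield start + half, sequence[start:start + window]


toy_chromosome = "ACGT" * 501  # 2004 bases, just enough for 4 windows
windows = list(sliding_windows(toy_chromosome))
centers = [center for center, _ in windows]
print(centers)
```

With an odd window size of 2001bp, each window has exactly 1000 bases of context on either side of the predicted site.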

During training, the dropout rate is set to 0.2; the loss function is set as:

where ŷ is the model's predicted value, y is the true value, MAE is the mean absolute error between the two, and corr is the Pearson correlation coefficient between the two.
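The two quantities named here can be computed as in the sketch below. The way they are combined in `combined_loss`, namely MAE + (1 - corr), is purely an assumption made for illustration, because the formula itself is not reproduced in this text; all function names are hypothetical.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """MAE between true values and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_corr(y_true, y_pred):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def combined_loss(y_true, y_pred):
    # ASSUMPTION: MAE + (1 - corr). The actual combination used by the
    # invention is not given in the text and may differ.
    return mean_absolute_error(y_true, y_pred) + (1.0 - pearson_corr(y_true, y_pred))


y_true = [0.1, 0.4, 0.8, 0.3]
y_pred = [0.2, 0.5, 0.7, 0.3]
mae = mean_absolute_error(y_true, y_pred)
corr = pearson_corr(y_true, y_pred)
print(mae, corr, combined_loss(y_true, y_pred))
```

Under this assumed form, the MAE term penalizes errors in absolute density, while the correlation term rewards predictions whose peaks and troughs follow those of the measured signal.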

Further, this embodiment encodes the test set to obtain feature sequences in two forms, One-hot encoding and nucleotide chemical property encoding. The two feature sequences are input into the two feature extraction modules respectively, and the extracted local features are then horizontally spliced; the Transformer layer integrates the spliced local features and learns the global features of the sequence; the two fully connected layers then extract the nonlinear relations among the global features and output the prediction result;

wherein softmax is used as the activation function in the fully connected layer to map the output to the final predicted density, which represents the nucleosome density at the central site of the input sequence.

The nucleosome density predicted by the method of this example was compared with the results obtained by the biological experimental method; the results are shown in parts A, B and C of FIG. 3. Part A of FIG. 3 is a scatter plot of the DeepNDP model predictions against the biological experiment results, giving a quantitative comparison of the two. The X axis is the computational prediction and the Y axis is the biological experiment result. The closer the black area lies to the line y=x, the stronger the positive correlation between the two signals (otherwise the correlation is negative), and the darker the black area, the more concentrated the data. The plot shows that the distribution of the DeepNDP predictions is consistent with the true values from the biological experiment. Parts B and C of FIG. 3 show, respectively, how the predicted nucleosome density and the nucleosome density of the biological sample vary along the DNA sequence. The predicted regions of high and low nucleosome density agree with the biological experiment results, which indicates that the DeepNDP model can accurately identify nucleosome-dense regions and nucleosome-depleted regions on the DNA sequence.

The predictions of the method of this example were compared with those of existing methods on the same dataset. FIG. 4 shows, from left to right, the Pearson correlation coefficients obtained on chromosome 16 of the Saccharomyces cerevisiae genome by DeepNDP, Routhier et al., DLNN and LeNup. The correlation coefficient between the nucleosome density predicted by DeepNDP and the MNase-seq signal reached 0.723, while the method proposed by Routhier et al. obtained a Pearson correlation coefficient of 0.68. Models designed for nucleosome positioning, namely DLNN and LeNup, obtained 0.43 and 0.40 respectively when used to predict whole genome nucleosome density. Therefore, the DeepNDP model of this embodiment outperforms the previous models both in training results and in the correlation of its predictions.

It should be noted that, in this example, the DNA sequence of the whole genome chromosome is encoded with both One-hot encoding and NCP encoding rather than with One-hot encoding alone, because combining the two encodings gives better results. Specifically, again taking Saccharomyces cerevisiae as an example, the DNA sequence was encoded with One-hot encoding only, in order to verify the contribution of nucleotide chemical property (NCP) encoding to nucleosome density prediction. The result shows that when the DeepNDP model uses One-hot encoding alone, the Pearson correlation of the prediction is 0.703, lower than the 0.723 reported above for the model using both encodings. FIG. 5 shows the nucleosome density distribution curves on a fragment of Saccharomyces cerevisiae chr16: the solid line represents the prediction using nucleotide chemical property encoding, the dotted line represents the prediction without nucleotide chemical property encoding, and the dashed line represents the biological experiment result. On this chr16 fragment, adding nucleotide chemical property encoding makes the prediction fit the biological experiment result more closely, so the performance of the DeepNDP model improves to some extent.

In addition to Saccharomyces cerevisiae, this example also uses the DNA sequences within 2000bp upstream and downstream of mouse transcription start sites as a test set, which is input into the DeepNDP model to verify the effectiveness of the model for cross-species recognition. The results are shown in FIG. 6: the ordinates of the upper and lower graphs are, respectively, the nucleosome density predicted by the DeepNDP model and the chemically derived NCP_score (nucleosome centering score), which represents the signal intensity at the nucleosome center site. As FIG. 6 shows, the mouse nucleosome density predicted by the DeepNDP model has a periodicity similar to that of the NCP_score obtained by the chemical method, which demonstrates the good cross-species generalization ability of the DeepNDP model.

Example 2

The invention provides a whole genome nucleosome density prediction system, comprising:

A coding module, configured to obtain DNA sequences of whole genome chromosomes and perform first coding and second coding on them respectively to obtain a first coding sequence and a second coding sequence;

Example 3

The present embodiment provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the whole genome nucleosome density prediction method according to the first embodiment when executing the computer program.

Example 4

The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the whole genome nucleosome density prediction method of the first embodiment.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The solution in the embodiments of the application can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.

It is apparent that the above examples are given by way of illustration only and do not limit the embodiments. Other variations and modifications will be apparent to those of ordinary skill in the art in light of the foregoing description. It is neither necessary nor possible to enumerate all embodiments here, and obvious variations or modifications derived therefrom still fall within the scope of the present invention.

Claims

1. A whole genome nucleosome density prediction method, which is characterized in that: comprising the following steps:

2. The whole genome nucleosome density prediction method according to claim 1, wherein: the feature extraction network in step S2 includes a feature extraction module ResNet and a feature extraction module CNNNet, where the feature extraction module ResNet is configured to extract a first local feature of a first coding sequence and the feature extraction module CNNNet is configured to extract a second local feature of a second coding sequence.

3. The whole genome nucleosome density prediction method according to claim 2, characterized in that: the feature extraction module ResNet includes a first CNN layer, three ResBlock layers, a second CNN layer, a third CNN layer, and a first Reshape layer, which are sequentially connected, where the first Reshape layer is used to change the dimension of the output of the third CNN layer.

4. The whole genome nucleosome density prediction method according to claim 3, wherein: the ResBlock layers comprise a first column of CNN units and a second column of CNN units;

5. The whole genome nucleosome density prediction method according to claim 4, wherein: all CNN layers in the feature extraction module ResNet are each followed by a ReLU activation function.

6. The whole genome nucleosome density prediction method according to claim 2, characterized in that: the feature extraction module CNNNet includes a tenth CNN layer, an eleventh CNN layer, a twelfth CNN layer, and a second Reshape layer that are sequentially connected, where the second Reshape layer is configured to change a dimension output by the twelfth CNN layer.

7. The whole genome nucleosome density prediction method according to claim 1, wherein: the method for obtaining the DNA sequence of the whole genome chromosome in the step S1 and respectively carrying out the first coding and the second coding to obtain the first coding sequence and the second coding sequence comprises the following steps:

obtaining a DNA sequence of a whole genome chromosome;

8. A whole genome nucleosome density prediction system, characterized in that: comprising the following steps:

9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized by: the processor, when executing the computer program, implements the steps of the whole genome nucleosome density prediction method according to any one of claims 1 to 7.

10. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program, when executed by a processor, performs the steps of the whole genome nucleosome density prediction method according to any one of claims 1 to 7.