CN117789893A - Breeding data prediction method based on correlation analysis - Google Patents
- Publication number: CN117789893A
- Application number: CN202410212922.2A
- Authority: CN (China)
- Legal status: Granted (as listed by Google Patents; an assumption, not a legal conclusion)
Abstract
The invention relates to the technical field of plant breeding, and in particular to a breeding data prediction method based on correlation analysis. The method comprises the following steps: acquiring a bill-of-materials set and a remaining bill-of-materials set; breeding and screening each bill of materials in the bill-of-materials set to obtain a target bill-of-materials set; if none of the breeding trait values of the bills of materials in the target bill-of-materials set meets the preset target breeding trait value requirement, selecting a first bill-of-materials set from the remaining bill-of-materials set according to the correlations among the breeding trait values of the bills of materials in the target bill-of-materials set; and continuing to breed and screen each first bill of materials in the first bill-of-materials set until the breeding trait value of any first bill of materials in the first target bill-of-materials set meets the preset target breeding trait value requirement. The invention can improve the efficiency of breeding experiments.
Description
Technical Field
The invention relates to the technical field of plant breeding, and in particular to a breeding data prediction method based on correlation analysis.
Background
In digital breeding experiments, multiple groups of to-be-tested bills of materials are usually configured from the available breeding materials. Because there are many such bills of materials and only a limited number of test fields, breeding experiments are generally performed in batches. If the to-be-tested bills of materials are always selected at random, without considering how far their breeding traits are from the traits the experiment aims to achieve, the experiment may progress slowly or breeding time may be wasted: for example, the breeding traits of the selected bills of materials may differ greatly from the target traits, or may be unrelated to the purpose of the experiment. Predicting the breeding traits of the to-be-tested bills of materials is therefore a critical aspect of breeding experiments. A bill of materials here includes the male parent and the female parent used in the breeding experiment, the environmental data of the breeding process, the fertilizer index data required during breeding, and the like.
Disclosure of Invention
To solve the above problems, the invention provides a breeding data prediction method based on correlation analysis, which adopts the following technical solution:
One embodiment of the invention provides a breeding data prediction method based on correlation analysis, which comprises the following steps:
acquiring a bill-of-materials set and a remaining bill-of-materials set;
breeding each bill of materials in the bill-of-materials set to obtain its breeding trait value, and screening the bills of materials in the set according to these breeding trait values to obtain a target bill-of-materials set;
if none of the breeding trait values of the bills of materials in the target bill-of-materials set meets the preset target breeding trait value requirement, obtaining a predicted breeding trait value for each remaining bill of materials in the remaining bill-of-materials set according to the correlations among the breeding trait values of the bills of materials in the target bill-of-materials set, and dividing the remaining bill-of-materials set into a first bill-of-materials set and a first remaining bill-of-materials set according to these predicted values;
breeding and screening each first bill of materials in the first bill-of-materials set to obtain a first target bill-of-materials set; if none of the breeding trait values of the first bills of materials in the first target bill-of-materials set meets the preset target breeding trait value requirement, obtaining a predicted breeding trait value for each first remaining bill of materials in the first remaining bill-of-materials set according to both the correlations among the breeding trait values of the first bills of materials in the first target bill-of-materials set and the correlations among the breeding trait values of the bills of materials in the target bill-of-materials set;
acquiring, according to these predicted values, a new bill-of-materials set and a new remaining bill-of-materials set from the first remaining bill-of-materials set; taking them as the first bill-of-materials set and the first remaining bill-of-materials set respectively, and returning to the step of breeding and screening each first bill of materials in the first bill-of-materials set, until it is detected that the breeding trait value of any first bill of materials in the first target bill-of-materials set meets the preset target breeding trait value requirement; the first bill of materials that meets this requirement is taken as the bill of materials required for breeding.
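The iterative procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `breed` stands in for a field trial that returns a breeding trait value, `predict` is a placeholder for the correlation-based predictor described below, and `iterative_breeding_search` and all of its parameters are hypothetical names.

```python
import random
from typing import Callable, Dict, Hashable, List, Optional


def iterative_breeding_search(
    candidates: List[Hashable],
    n0: int,
    breed: Callable[[Hashable], float],
    predict: Callable[[Dict[Hashable, float], Hashable], float],
    min_value: float,
    target_value: float,
    first_threshold: float,
    seed: int = 0,
) -> Optional[Hashable]:
    """Hypothetical sketch of the batch-wise search loop.

    breed(bom) -> observed trait value (stand-in for a field trial).
    predict(history, bom) -> predicted trait value given all screened results so far.
    """
    rng = random.Random(seed)
    batch = rng.sample(candidates, n0)  # initial random batch of size N0
    chosen = set(batch)
    remaining = [c for c in candidates if c not in chosen]
    history: Dict[Hashable, float] = {}  # every target BOM screened so far
    while batch:
        observed = {bom: breed(bom) for bom in batch}
        kept = {b: v for b, v in observed.items() if v >= min_value}  # screening
        history.update(kept)
        hits = [b for b, v in kept.items() if abs(v - target_value) <= first_threshold]
        if hits:
            return hits[0]  # requirement met: stop
        # Rank the untested BOMs by predicted value, highest first, and take the next N0.
        remaining.sort(key=lambda b: predict(history, b), reverse=True)
        batch, remaining = remaining[:n0], remaining[n0:]
    return None  # candidates exhausted without meeting the requirement
```

Keeping every screened result in `history` mirrors the text's use of both the first target set and the earlier target set when predicting.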
Preferably, the method for acquiring the bill-of-materials set and the remaining bill-of-materials set comprises:
acquiring a to-be-tested bill-of-materials set and a breeding limit threshold;
randomly selecting a breeding-limit-threshold number of to-be-tested bills of materials from the to-be-tested bill-of-materials set, recording each selected one as a bill of materials, and recording the set formed by all selected ones as the bill-of-materials set;
recording the set formed by all to-be-tested bills of materials that are not in the bill-of-materials set as the remaining bill-of-materials set, and recording each of them as a remaining bill of materials.
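A minimal sketch of this initial split, assuming each to-be-tested bill of materials is represented by a hashable identifier; the function name `split_initial_batch` is hypothetical.

```python
import random
from typing import Hashable, List, Optional, Tuple


def split_initial_batch(
    candidates: List[Hashable], n0: int, seed: Optional[int] = None
) -> Tuple[List[Hashable], List[Hashable]]:
    """Randomly draw N0 BOMs as the bill-of-materials set; the rest form the remaining set."""
    rng = random.Random(seed)
    batch = rng.sample(candidates, n0)  # sampling without replacement
    chosen = set(batch)
    remaining = [c for c in candidates if c not in chosen]
    return batch, remaining
```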
Preferably, the method for obtaining the target bill-of-materials set comprises:
judging whether the breeding trait value of each bill of materials in the bill-of-materials set is greater than or equal to a preset minimum breeding trait value; if so, recording that bill of materials as a target bill of materials; the set formed by all target bills of materials is recorded as the target bill-of-materials set.
Preferably, the statement that none of the breeding trait values of the bills of materials in the target bill-of-materials set meets the preset target breeding trait value requirement means that, for every bill of materials in the target bill-of-materials set, the absolute value of the difference between its breeding trait value and the preset target breeding trait value is greater than a first threshold.
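The screening rule and the stopping test can be written as two small predicates. This is a sketch assuming trait values are held in a dictionary keyed by bill-of-materials identifier; `screen` and `requirement_met` are illustrative names.

```python
from typing import Dict, Hashable


def screen(values: Dict[Hashable, float], min_value: float) -> Dict[Hashable, float]:
    """Keep BOMs whose trait value reaches the preset minimum (the target set)."""
    return {bom: v for bom, v in values.items() if v >= min_value}


def requirement_met(target_set: Dict[Hashable, float], target_value: float,
                    first_threshold: float) -> bool:
    """True if some BOM lies within first_threshold of the target trait value.

    The requirement is *not* met exactly when every |v - target| > first_threshold.
    """
    return any(abs(v - target_value) <= first_threshold for v in target_set.values())
```

With a 30 g minimum, a 50 g target and a first threshold of 3 g, a 48 g bill of materials would end the search, while a 31 g one would only be retained.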
Preferably, the method for obtaining the predicted breeding trait value of each remaining bill of materials in the remaining bill-of-materials set comprises:
constructing a multidimensional material data space;
mapping each bill of materials in the target bill-of-materials set into the multidimensional material data space to obtain its data point, and recording these data points as first data points;
assigning a marker value to each bill of materials in the target bill-of-materials set;
acquiring an initial neighborhood radius and an initial neighborhood density threshold;
for each first data point, recording the set of all first data points lying within the initial neighborhood radius of it as the first data point set of that first data point;
recording the normalized variance of the breeding trait values of all first data points in the first data point set of each first data point as the trait stability of that first data point;
adding the constant 1 to the trait stability of each first data point and multiplying the sum by the initial neighborhood density threshold, the result being recorded as the target neighborhood density threshold of that first data point;
calculating the normalized Euclidean distance between every two first data points, performing density clustering on all first data points according to these normalized distances, the initial neighborhood radius and the target neighborhood density threshold of each first data point, and recording each resulting cluster as a first cluster;
obtaining the predicted breeding trait value of each remaining bill of materials in the remaining bill-of-materials set according to the first clusters.
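The adaptive density clustering can be sketched as a DBSCAN variant in which each point has its own minimum neighbor count, equal to (1 + trait stability) times the initial neighborhood density threshold. The text does not say how the variances and Euclidean distances are normalized; this sketch assumes min-max scaling of the variances and division of distances by the largest pairwise distance, and it also uses the normalized distance for the stability neighborhood. All function names are hypothetical.

```python
import numpy as np


def normalized_distances(points: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances divided by the largest one (assumed normalization)."""
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    m = d.max()
    return d / m if m > 0 else d


def minmax(x: np.ndarray) -> np.ndarray:
    """Min-max scaling to [0, 1]; all zeros when x is constant (assumed normalization)."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)


def adaptive_density_cluster(points, values, eps: float, base_min_pts: float) -> np.ndarray:
    """DBSCAN-style clustering with a per-point density threshold; -1 marks noise."""
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(points)
    d = normalized_distances(points)
    # Neighborhood of each point within the initial radius (includes the point itself).
    neighbors = [np.flatnonzero(d[i] <= eps) for i in range(n)]
    # Trait stability: normalized variance of trait values in each neighborhood.
    stability = minmax(np.array([values[nb].var() for nb in neighbors]))
    # Target neighborhood density threshold: (stability + 1) * initial threshold.
    min_pts = (stability + 1.0) * base_min_pts
    core = np.array([len(nb) >= min_pts[i] for i, nb in enumerate(neighbors)])
    labels = np.full(n, -1)
    cluster_id = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster_id
        stack = [i]
        while stack:  # expand the cluster outward from core points
            j = stack.pop()
            if not core[j]:
                continue  # border point: joins the cluster but does not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster_id
                    stack.append(k)
        cluster_id += 1
    return labels
```

Points whose neighborhoods have unstable trait values (high variance) need more neighbors to become core points, which is the effect the (1 + stability) factor is meant to produce.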
Preferably, the method for obtaining the predicted breeding trait value of each remaining bill of materials in the remaining bill-of-materials set according to the first clusters comprises:
for any remaining bill of materials a:
mapping a into the multidimensional material data space to obtain its data point, recorded as a second data point;
calculating the Euclidean distance between the second data point of a and each first data point, and recording the first cluster containing the nearest first data point as the target first cluster;
for each dimension of the multidimensional material data space: computing the Pearson correlation coefficient between the projections of the first data points in the target first cluster onto that dimension's coordinate axis and their breeding trait values, recorded as the Pearson correlation coefficient between the dimension and the breeding trait; recording the normalized value of this coefficient as the weight of the dimension; and fitting a straight line to the breeding trait values of the first data points in the target first cluster against their projections onto that axis, the fitted line being recorded as the fitted curve of the dimension;
obtaining the initial predicted value of a in that dimension by evaluating the dimension's fitted curve at the projection of a's second data point onto that dimension's coordinate axis;
recording the product of a's initial predicted value in that dimension and the weight of the dimension as a's target predicted value in that dimension;
recording the mean of a's target predicted values over all dimensions of the multidimensional material data space as the predicted breeding trait value of a.
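The per-dimension prediction can be sketched with NumPy as follows. The normalization of the Pearson coefficient is not specified in the text, so this sketch assumes its absolute value is used as the weight; it also gives a dimension zero weight when the coefficient is undefined (constant projections or constant trait values) and only considers clustered points (label 0 or higher) when looking for the nearest first data point. Following the text, the weighted per-dimension predictions are averaged without renormalizing the weights. `predict_trait` is a hypothetical name.

```python
import numpy as np


def predict_trait(query, points, values, labels) -> float:
    """Predict a trait value for one remaining BOM from its nearest first cluster."""
    query = np.asarray(query, dtype=float)
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    clustered = np.flatnonzero(labels >= 0)  # assumption: ignore noise points
    dists = np.linalg.norm(points[clustered] - query, axis=1)
    nearest = clustered[np.argmin(dists)]
    members = labels == labels[nearest]  # the target first cluster
    X, y = points[members], values[members]
    targets = []
    for dim in range(X.shape[1]):
        x = X[:, dim]
        if np.ptp(x) == 0 or np.ptp(y) == 0:
            targets.append(0.0)  # Pearson r undefined: zero weight (assumption)
            continue
        r = np.corrcoef(x, y)[0, 1]
        weight = abs(r)  # assumed normalization of r to [0, 1]
        slope, intercept = np.polyfit(x, y, 1)  # straight-line fit of trait vs projection
        initial = slope * query[dim] + intercept  # initial predicted value in this dimension
        targets.append(weight * initial)  # target predicted value in this dimension
    return float(np.mean(targets))  # mean over all dimensions
```

For a cluster whose trait value rises exactly as 2x + 1 along both axes, a query at (4, 4) is predicted as 9, since each dimension has weight 1 and the same fitted line.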
Preferably, the method for acquiring the first bill-of-materials set and the first remaining bill-of-materials set comprises:
sorting all remaining bills of materials in the remaining bill-of-materials set in descending order of predicted breeding trait value to obtain a remaining bill-of-materials sequence;
starting from the first element of the sequence, recording the set of the first N0 consecutive remaining bills of materials as the first bill-of-materials set and each of them as a first bill of materials, where N0 is the breeding limit threshold;
recording all other remaining bills of materials in the sequence as first remaining bills of materials, and the set they form as the first remaining bill-of-materials set.
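The ranking and split into the next batch amount to a descending sort followed by slicing at N0. A sketch with a hypothetical `next_batch` helper, assuming predicted values are stored in a dictionary:

```python
from typing import Dict, Hashable, List, Tuple


def next_batch(remaining: List[Hashable], predicted: Dict[Hashable, float],
               n0: int) -> Tuple[List[Hashable], List[Hashable]]:
    """Return (first bill-of-materials set, first remaining set) by predicted value."""
    ordered = sorted(remaining, key=lambda b: predicted[b], reverse=True)  # descending
    return ordered[:n0], ordered[n0:]
```

Python's sort is stable, so bills of materials with equal predicted values keep their original relative order.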
Preferably, the method for obtaining the predicted breeding trait value of each first remaining bill of materials in the first remaining bill-of-materials set comprises:
assigning a marker value to each first bill of materials in the first target bill-of-materials set;
sorting all first bills of materials in the first target bill-of-materials set in ascending order of marker value, recording the sorted set as the newly added bill-of-materials set and each of its elements as a newly added bill of materials; the number of newly added bills of materials in this set is W;
obtaining new clusters from the 1st newly added bill of materials and the first clusters, recorded as the 1st updated clusters; obtaining new clusters from the 2nd newly added bill of materials and the 1st updated clusters, recorded as the 2nd updated clusters; obtaining new clusters from the 3rd newly added bill of materials and the 2nd updated clusters, recorded as the 3rd updated clusters; and so on, until the W-th updated clusters are obtained;
recording the W-th updated clusters as second clusters;
obtaining the predicted breeding trait value of each first remaining bill of materials in the first remaining bill-of-materials set according to the second clusters.
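The chain of updates is a left fold over the newly added bills of materials in marker-value (time) order. The sketch below shows only that ordering; the `recluster` callback, which would perform the density-threshold update and re-clustering for one new point, and the name `sequential_updates` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
# recluster(existing_points, existing_values, new_point, new_value) -> labels for all points
Recluster = Callable[[List[Point], List[float], Point, float], List[int]]


def sequential_updates(
    first_points: Sequence[Point],
    first_values: Sequence[float],
    newly_added: Sequence[Tuple[float, Point, float]],  # (marker, point, trait value)
    recluster: Recluster,
) -> Tuple[List[Point], List[float], Optional[List[int]]]:
    """Apply the 1st, 2nd, ..., W-th update in ascending marker order."""
    ordered = sorted(newly_added, key=lambda item: item[0])  # earliest transfer first
    points = list(first_points)
    values = list(first_values)
    labels: Optional[List[int]] = None  # stays None if nothing was added
    for _marker, point, value in ordered:
        labels = recluster(points, values, point, value)  # k-th updated clusters
        points.append(point)
        values.append(value)
    return points, values, labels  # final labels = second clusters
```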
Preferably, the method for obtaining the 1st updated clusters comprises:
mapping each newly added bill of materials in the newly added bill-of-materials set into the multidimensional material data space to obtain its data point;
for the data point of the 1st newly added bill of materials:
calculating the Euclidean distance between this data point and each first data point, and recording the set of all first data points in the first cluster containing the nearest first data point as the data point set of the 1st newly added bill of materials;
taking the marker value of the bill of materials corresponding to each data point in this data point set as the marker value of that data point, sorting the data points in ascending order of marker value, and recording the sorted set as the preceding data point set of the 1st newly added bill of materials;
recording the set of vectors formed by every two adjacent data points in the preceding data point set as the vector set of the data point of the 1st newly added bill of materials;
obtaining the first direction characterization value of the data point of the 1st newly added bill of materials from the cosine similarities between adjacent vectors in the vector set;
recording the set of all first data points within the initial neighborhood radius of the data point of the 1st newly added bill of materials as its neighborhood data point set;
calculating the mean of the breeding trait values of all data points in the neighborhood data point set, recorded as the neighborhood mean of the data point of the 1st newly added bill of materials;
recording the normalized absolute difference between the breeding trait value of the 1st newly added bill of materials and the neighborhood mean as the second direction characterization value of its data point;
recording the product of the first and second direction characterization values of the data point of the 1st newly added bill of materials as its density optimization factor;
recording the product of this density optimization factor and the initial neighborhood density threshold as the target neighborhood density threshold of the data point of the 1st newly added bill of materials;
recording the data point of the 1st newly added bill of materials together with all first data points as first updated data points;
calculating the normalized Euclidean distance between every two first updated data points, and performing density clustering on all first updated data points according to these normalized distances, the initial neighborhood radius and the target neighborhood density threshold of each first updated data point; the resulting new clusters are recorded as the 1st updated clusters.
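The target neighborhood density threshold of a newly added point can be sketched as below. Because the formula for the first direction characterization value is not given in this text, the sketch takes that value as an input. The normalization of the absolute difference is also unspecified; the sketch assumes the hypothetical mapping d / (1 + d), normalizes distances by the largest pairwise distance among the first updated data points, and treats an empty neighborhood as a maximal difference. `new_point_threshold` is a hypothetical name.

```python
import numpy as np


def new_point_threshold(new_point, new_value, points, values, eps, base_min_pts,
                        first_direction):
    """Target neighborhood density threshold for one newly added data point."""
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    all_pts = np.vstack([points, np.asarray(new_point, dtype=float)])
    diff = all_pts[:, None, :] - all_pts[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    if d.max() > 0:
        d = d / d.max()  # assumed normalization: divide by largest distance
    nb = np.flatnonzero(d[-1, :-1] <= eps)  # first data points within the radius
    if nb.size == 0:
        second_direction = 1.0  # assumption: empty neighborhood = maximal difference
    else:
        gap = abs(new_value - values[nb].mean())  # |trait value - neighborhood mean|
        second_direction = gap / (1.0 + gap)  # hypothetical normalization to [0, 1)
    density_factor = first_direction * second_direction  # density optimization factor
    return density_factor * base_min_pts
```

The resulting threshold would be used for the new point, while the first data points keep their own thresholds, when the first updated data points are re-clustered.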
Preferably, the first direction characterization value of the data point of the 1st newly added bill of materials is calculated by a formula that is not reproduced in this text. Its variables are: the first direction characterization value itself; t, the number of vectors in the vector set of the data point of the 1st newly added bill of materials; the j-th and (j+1)-th vectors in that vector set; and the cosine similarity between the j-th and (j+1)-th vectors.
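Since the formula itself is missing, the sketch below computes only its stated inputs: the t vectors between consecutive points of the preceding data point set (assumed to point from the earlier point to the later one) and the t - 1 cosine similarities between adjacent vectors. How these similarities are combined into the first direction characterization value is left to the caller; `adjacent_cosines` is a hypothetical name.

```python
import numpy as np


def adjacent_cosines(ordered_points) -> np.ndarray:
    """Cosine similarity between each pair of adjacent vectors in the vector set."""
    P = np.asarray(ordered_points, dtype=float)
    vectors = np.diff(P, axis=0)  # t vectors, each from one point to the next
    a, b = vectors[:-1], vectors[1:]  # (j-th, (j+1)-th) vector pairs
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    # Zero-length vectors have no direction; report 0 similarity for them (assumption).
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)
```

Points that keep moving in one direction yield similarities near 1; a turn of 90 degrees yields 0.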
Beneficial effects: first, a bill-of-materials set and a remaining bill-of-materials set are acquired; each bill of materials in the bill-of-materials set is bred and screened to obtain a target bill-of-materials set; if none of the breeding trait values of the bills of materials in the target bill-of-materials set meets the preset target breeding trait value requirement, a predicted breeding trait value is obtained for each remaining bill of materials according to the correlations among the breeding trait values of the bills of materials in the target bill-of-materials set, and a first bill-of-materials set is selected from the remaining bill-of-materials set according to these predicted values; each first bill of materials in the first bill-of-materials set is then bred and screened in turn, until the breeding trait value of any first bill of materials in the first target bill-of-materials set meets the preset target breeding trait value requirement. By predicting the breeding trait values of bills of materials, the invention improves the efficiency of breeding experiments; that is, the bill of materials required for breeding can be found relatively quickly.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for predicting breeding data based on correlation analysis.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by those skilled in the art based on the embodiments of the present invention are within the scope of protection of the embodiments of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The embodiment provides a breeding data prediction method based on correlation analysis, which is described in detail as follows:
As shown in fig. 1, the breeding data prediction method based on correlation analysis includes the following steps:
Step S001, acquiring a bill-of-materials set and a remaining bill-of-materials set.
An experiment usually has a preset purpose so that it can be terminated once that purpose is reached. The purpose of the digital breeding experiment in this embodiment is to obtain a bill of materials that meets the preset target breeding trait, that is, a bill of materials whose breeding trait reaches the value the experiment aims for. Breeding traits generally include cold resistance, drought resistance, average yield per plant, and the like; this embodiment uses average yield per plant as the example, so the breeding trait value referred to below is the average yield per plant.
Accordingly, in this embodiment, breeding materials for the plant under test (such as rice, wheat or corn) are first obtained; the breeding materials are then grouped in the material management module of a digital breeding system to obtain multiple groups of to-be-tested bills of materials, from which a to-be-tested bill-of-materials set is constructed. The breeding materials are set or configured by relevant domain personnel or breeding experimenters.
A bill of materials in this embodiment includes the male parent and the female parent used in breeding, the environmental data required for breeding (such as soil temperature and humidity and ambient temperature and humidity), the fertilizer amount data required for breeding, and the like; in this embodiment, the male and female parents are of the same plant variety throughout the breeding experiment.
Next, the number of bills of materials that can be bred in one batch of the breeding experiment is obtained and recorded as the breeding limit threshold. This threshold depends on the number of test fields and the number of experimenters and is set by the relevant personnel; under normal conditions it is far smaller than the number of groups of to-be-tested bills of materials.
A breeding-limit-threshold number of to-be-tested bills of materials are randomly selected from the to-be-tested bill-of-materials set; each selected one is recorded as a bill of materials, and the set they form is recorded as the bill-of-materials set. The set formed by all to-be-tested bills of materials not in the bill-of-materials set is recorded as the remaining bill-of-materials set, and each of its elements is recorded as a remaining bill of materials.
Thus, the bill-of-materials set and the remaining bill-of-materials set are obtained.
Step S002, breed each bill of materials in the bill of materials set to obtain its breeding trait characteristic value, and screen the bills in the set according to these values to obtain a target bill of materials set.
Each bill of materials in the bill of materials set is then bred, and the breeding trait characteristic value of each bill is obtained from the data collected by the field data acquisition module during breeding; in this embodiment, the breeding trait characteristic value is the average yield per plant of each bill of materials. Obtaining a breeding trait characteristic value by breeding is a well-known technique and is not described in detail here. In other embodiments, other breeding traits, such as insect resistance or drought resistance, may be used.
It is then determined whether the breeding trait characteristic value of each bill of materials in the bill of materials set is greater than or equal to a preset minimum breeding trait characteristic value. If so, the bill is retained and recorded as a target bill of materials; otherwise, it is eliminated. The set of all retained target bills of materials is recorded as the target bill of materials set.
In this embodiment, the bills retained in the target bill of materials set are transferred to the bill of materials management module for recording, and each bill is marked according to the time at which it was transferred; that is, the marking value of each bill of materials in the target bill of materials set is the time at which it was transferred to the bill of materials management module.
The preset minimum breeding trait characteristic value is set according to actual conditions. For example, in a breeding experiment on corn it may be set to 30 g, i.e., a minimum average yield of 30 g per corn plant. Screening with this minimum value removes the bills of materials whose breeding trait is irrelevant to the trait under study. The target bills of materials stored in the bill of materials management module are retrieved later to predict breeding trait characteristic values; a bill whose breeding trait is irrelevant to that of the current experiment has little reference value for this prediction, so it can be eliminated, which also reduces the storage required by the bill of materials management module.
Thus, a target bill of materials set is obtained.
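A minimal sketch of the screening and time-stamp marking of step S002, assuming the measured breeding trait values are already available as numbers and using the current time as the marking value. The function name `screen_and_mark` and the injectable `clock` argument are illustrative assumptions; a deterministic counter stands in for the clock in the example.

```python
import itertools
import time


def screen_and_mark(bills, trait_values, min_trait, clock=time.time):
    """Keep bills whose trait value is >= min_trait and stamp each with clock().

    The stamp plays the role of the marking value (time of transfer to the
    bill of materials management module).
    """
    target = {}
    for bill, value in zip(bills, trait_values):
        if value >= min_trait:
            target[bill] = {"trait": value, "mark": clock()}
    return target


# Example with a 30 g minimum average yield per plant and a fake clock.
fake_clock = itertools.count().__next__
target_set = screen_and_mark(["A", "B", "C"], [35.0, 28.0, 41.5], 30.0, fake_clock)
print(target_set)
# {'A': {'trait': 35.0, 'mark': 0}, 'C': {'trait': 41.5, 'mark': 1}}
```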
Step S003, if the breeding trait characteristic values of the bills of materials in the target bill of materials set do not meet the preset target breeding trait characteristic value requirement, obtain a predicted breeding trait characteristic value for each remaining bill of materials in the remaining bill of materials set according to the correlations of the breeding trait characteristic values of the bills in the target bill of materials set, and then divide the remaining bill of materials set into a first bill of materials set and a first remaining bill of materials set according to these predicted values.
The target bill of materials set is then analyzed further, as follows:
First, it is determined whether the breeding trait characteristic values of the bills of materials in the target bill of materials set fail to meet the preset target breeding trait characteristic value requirement. The requirement is not met when, for every bill of materials in the target bill of materials set, the absolute difference between its breeding trait characteristic value and the preset target breeding trait characteristic value is greater than a first threshold.
In a specific application, the preset target breeding trait characteristic value and the first threshold are set by the relevant experimenters or field personnel according to the purpose of the experiment or actual conditions; the target breeding trait characteristic value is the breeding trait the breeding experiment is expected to achieve. For example, in a breeding experiment on corn, the preset target breeding trait characteristic value may be set to 300 g and the first threshold to 5.
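The requirement check can be expressed as a one-line predicate. This sketch assumes the trait values are plain numbers in the same unit as the target; `meets_requirement` is a hypothetical name. The requirement counts as met as soon as one bill lies within the first threshold of the target, which is the negation of the condition stated above.

```python
def meets_requirement(trait_values, target_value, first_threshold):
    """True if at least one trait value lies within first_threshold of the target.

    The method continues with prediction when this returns False, i.e. when
    every |value - target| exceeds the first threshold.
    """
    return any(abs(v - target_value) <= first_threshold for v in trait_values)


# Corn example: target 300 g, first threshold 5.
print(meets_requirement([280.0, 296.0, 310.0], 300.0, 5.0))  # True (296 is within 5)
print(meets_requirement([280.0, 310.0], 300.0, 5.0))         # False
```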
If the requirement is not met, the predicted breeding trait characteristic value of each remaining bill of materials in the remaining bill of materials set is obtained from the correlations of the breeding trait characteristic values of the bills in the target bill of materials set, as follows:
Obtain the material data types of the bills of materials in the target bill of materials set and construct a multidimensional material data space from them. Then map each bill of materials in the target bill of materials set into this space to obtain its data point, recorded as a first data point.
The material data types used to construct the multidimensional material data space in this embodiment are all numerical, i.e., material data with specific numerical values in a bill of materials, such as soil temperature, soil humidity, ambient temperature and ambient humidity. The number of dimensions of the space equals the number of material data types of the target bill of materials set. For example, if every bill of materials in the target bill of materials set contains the same four material data types, namely soil temperature, soil humidity, ambient temperature and ambient humidity, the multidimensional material data space has 4 dimensions.
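Mapping bills of materials into the multidimensional material data space amounts to reading the numerical material data types in a fixed order, one coordinate axis per type. A sketch with NumPy, assuming each bill is a dictionary keyed by data type; the key names and the values below are made up for illustration.

```python
import numpy as np

# Hypothetical keys: one coordinate axis per numerical material data type.
DATA_TYPES = ["soil_temperature", "soil_humidity", "air_temperature", "air_humidity"]


def to_points(bills, data_types=DATA_TYPES):
    """Return an (n_bills, n_types) array; column k is the axis of data_types[k]."""
    return np.array([[float(bill[key]) for key in data_types] for bill in bills])


bills = [  # made-up example values
    {"soil_temperature": 18.5, "soil_humidity": 0.31, "air_temperature": 24.0, "air_humidity": 0.62},
    {"soil_temperature": 20.1, "soil_humidity": 0.27, "air_temperature": 26.5, "air_humidity": 0.55},
]
first_points = to_points(bills)
print(first_points.shape)  # (2, 4): two first data points in a 4-dimensional space
```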
Set an initial neighborhood radius and an initial neighborhood density threshold. For each first data point, the set of all first data points within the initial neighborhood radius centered on it is recorded as its first data point set. The normalized variance of the breeding trait characteristic values of all first data points in this set is recorded as the trait stability degree of that first data point. The trait stability degree plus 1, multiplied by the initial neighborhood density threshold, is recorded as the target neighborhood density threshold of that first data point.
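The trait stability degree and the adjusted threshold can be sketched as below. The text does not say how distances or variances are normalized, so this sketch assumes (a) Euclidean distances are divided by the largest pairwise distance, as in the clustering step, and (b) variances are min-max normalized across all first data points. Function names are hypothetical.

```python
import numpy as np


def normalized_distances(points):
    """Pairwise Euclidean distances divided by the largest one (assumed normalization)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    max_d = dist.max()
    return dist / max_d if max_d > 0 else dist


def target_density_thresholds(points, traits, eps, min_pts0):
    """Per-point threshold = (1 + trait stability degree) * initial threshold."""
    points = np.asarray(points, dtype=float)
    traits = np.asarray(traits, dtype=float)
    dist = normalized_distances(points)
    # Variance of trait values inside each point's eps-neighbourhood (point included).
    variances = np.array([traits[dist[i] <= eps].var() for i in range(len(points))])
    span = variances.max() - variances.min()
    stability = (variances - variances.min()) / span if span > 0 else np.zeros_like(variances)
    return (1.0 + stability) * min_pts0


pts = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(target_density_thresholds(pts, [30, 30, 30, 60], eps=0.5, min_pts0=1.0))
# [1. 1. 2. 2.]
```

Points whose neighborhood mixes very different trait values receive a higher threshold, which makes them harder to qualify as core points.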
Next, compute the normalized Euclidean distance between every pair of first data points, and perform density clustering on all first data points using these distances, the initial neighborhood radius and the target neighborhood density threshold of each first data point; each resulting cluster is recorded as a first cluster. The clustering serves to analyze the correlation between the breeding trait and each dimension of the multidimensional space, which in turn allows the predicted breeding trait characteristic value to be obtained accurately. Adjusting the initial neighborhood density threshold by the trait stability degree during clustering prevents first data points with unstable breeding trait characteristic values from being grouped into the same cluster.
All clustering in this embodiment uses the DBSCAN algorithm, a well-known technique that is not described in detail here. In a specific application, the initial neighborhood radius and the initial neighborhood density threshold are set according to actual conditions; for example, the initial neighborhood radius may be set to 0.5 and the initial neighborhood density threshold to one half of the number of dimensions of the multidimensional material data space.
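Standard DBSCAN implementations such as scikit-learn's `DBSCAN` take a single `min_samples` value, whereas the method gives every point its own target neighborhood density threshold. The following is a small DBSCAN variant written from scratch under that assumption: a point is a core point when its neighborhood (itself included) holds at least its own threshold of points, distances are divided by the largest pairwise distance, and noise is labeled -1. It is an illustrative sketch, not the patent's implementation.

```python
import numpy as np


def dbscan_per_point(points, eps, min_pts):
    """DBSCAN with a per-point density threshold min_pts[i]; returns labels, -1 = noise."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if dist.max() > 0:
        dist = dist / dist.max()  # normalized Euclidean distance (assumed: divide by max)
    neighbours = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = [len(neighbours[i]) >= min_pts[i] for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue  # already assigned, or not a core point
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:  # expand the cluster through density-reachable points
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if is_core[j]:
                    queue.extend(neighbours[j])
        cluster += 1
    return labels


pts = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0]]
print(dbscan_per_point(pts, eps=0.5, min_pts=[2, 2, 2, 2]).tolist())  # [0, 0, 0, -1]
```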
The predicted breeding trait characteristic value of each remaining bill of materials in the remaining bill of materials set is then obtained from the first clusters, as follows:
For any remaining bill of materials a:
Map remaining bill of materials a into the multidimensional material data space to obtain its data point, recorded as a second data point. Compute the Euclidean distance between the second data point and each first data point, and record the first cluster containing the first data point with the smallest distance as the target first cluster.
For any dimension of the multidimensional material data space: compute the Pearson correlation coefficient between the projections of the first data points in the target first cluster onto the coordinate axis of that dimension and their breeding trait characteristic values, and record it as the Pearson correlation coefficient between the dimension and the breeding trait. The normalized value of this coefficient is recorded as the weight of the dimension, so that the weights of all dimensions sum to 1. Then use the least squares method to fit a straight line to the same pairs, with the projection onto the coordinate axis of the dimension as the abscissa and the breeding trait characteristic value as the ordinate; the fitted line is recorded as the fitting curve of the dimension. On the fitting curve, take the point whose abscissa equals the projection of the second data point of remaining bill of materials a onto the same axis, record it as the target data point, and record its ordinate as the initial predicted value of remaining bill of materials a in that dimension. Finally, record the product of this initial predicted value and the weight of the dimension as the target predicted value of remaining bill of materials a in that dimension.
Least squares linear fitting is a well-known technique and is not described in detail here. The Pearson correlation coefficient between each dimension and the breeding trait reflects how strongly the trait is correlated with that dimension: the larger its absolute value, the stronger the correlation.
In this way, the target predicted values of remaining bill of materials a in all dimensions of the multidimensional material data space are obtained, and their mean is recorded as the predicted breeding trait characteristic value of remaining bill of materials a.
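The per-dimension prediction for one remaining bill of materials can be sketched with NumPy's `corrcoef` and `polyfit`. Several details are assumptions not fixed by the text: the weights are taken as |r| divided by the sum of |r| over all dimensions (the text only says "normalized" and that they sum to 1); a dimension with constant values gets r = 0 and predicts the cluster mean; and if the nearest first data point is noise, all first data points are used. The function name is hypothetical.

```python
import numpy as np


def predict_trait(first_points, traits, labels, query):
    """Predicted trait value of one remaining bill (data point `query`)."""
    X_all = np.asarray(first_points, dtype=float)
    y_all = np.asarray(traits, dtype=float)
    labels = np.asarray(labels)
    query = np.asarray(query, dtype=float)
    nearest = np.argmin(np.linalg.norm(X_all - query, axis=1))
    if labels[nearest] == -1:
        members = np.ones(len(y_all), dtype=bool)  # assumption: noise -> use all points
    else:
        members = labels == labels[nearest]  # target first cluster
    X, y = X_all[members], y_all[members]
    n_dims = X.shape[1]
    r = np.zeros(n_dims)
    initial = np.full(n_dims, y.mean())  # fallback for constant dimensions
    for d in range(n_dims):
        x = X[:, d]
        if x.std() > 0 and y.std() > 0:
            r[d] = np.corrcoef(x, y)[0, 1]  # Pearson correlation with the trait
            slope, intercept = np.polyfit(x, y, 1)  # least-squares straight line
            initial[d] = slope * query[d] + intercept  # initial predicted value
    total = np.abs(r).sum()
    weights = np.abs(r) / total if total > 0 else np.full(n_dims, 1.0 / n_dims)
    target = weights * initial  # target predicted value per dimension
    return float(target.mean())  # mean over dimensions, as the text states


X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(predict_trait(X, [10.0, 12.0, 14.0], [0, 0, 0], [3.0, 3.0]))  # ~8.0
```

Because the weights already sum to 1, averaging the per-dimension target values divides the weighted combination by the number of dimensions (the example yields about 8 although each dimension alone predicts 16); the sketch follows the text literally.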
The predicted breeding trait characteristic values of all other remaining bills of materials in the remaining bill of materials set are obtained in the same way as for remaining bill of materials a.
Next, a first bill of materials set and a first remaining bill of materials set are obtained from the remaining bill of materials set according to the predicted breeding trait characteristic values, specifically:
Sort all remaining bills of materials in descending order of predicted breeding trait characteristic value to obtain a remaining bill of materials sequence. Starting from the first element of the sequence, the N0 consecutive remaining bills of materials are taken as the first bill of materials set, where N0 is the breeding limit threshold, and each of them is recorded as a first bill of materials; the first bills of materials in this set remain in descending order of predicted value. All other remaining bills of materials in the sequence are recorded as first remaining bills of materials, and their set is recorded as the first remaining bill of materials set. In this embodiment, breeding continues with the first bill of materials set; choosing the bills to be bred next by their predicted breeding trait characteristic values improves the efficiency of the breeding experiment.
Thus, a first bill of materials set and a first remaining bill of materials set are obtained.
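The ranking and split can be sketched as follows; bills and their predicted values are passed as parallel lists, and `split_by_prediction` is a hypothetical name. Ties keep their original relative order because Python's sort is stable.

```python
def split_by_prediction(remaining, predictions, n0):
    """Sort bills by predicted trait value (descending); the first n0 form the next batch."""
    order = sorted(range(len(remaining)), key=lambda i: predictions[i], reverse=True)
    ranked = [remaining[i] for i in order]
    return ranked[:n0], ranked[n0:]


first_set, first_remaining = split_by_prediction(["a", "b", "c", "d"], [10.0, 40.0, 25.0, 5.0], n0=2)
print(first_set, first_remaining)  # ['b', 'c'] ['a', 'd']
```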
Step S004, breed and screen each bill of materials in the first bill of materials set to obtain a first target bill of materials set. If the breeding trait characteristic values of the first bills of materials in the first target bill of materials set do not meet the preset target breeding trait characteristic value requirement, obtain a predicted breeding trait characteristic value for each first remaining bill of materials in the first remaining bill of materials set according to the correlations of the breeding trait characteristic values of the first bills of materials in the first target bill of materials set and of the bills of materials in the target bill of materials set.
In this embodiment, each bill of materials in the first bill of materials set is first bred and screened to obtain the first target bill of materials set:
Breed each first bill of materials in the first bill of materials set and obtain its breeding trait characteristic value from the data collected by the field data acquisition module during breeding. If the breeding trait characteristic value of a first bill of materials is greater than or equal to the preset minimum breeding trait characteristic value, the bill is retained; otherwise, it is eliminated. The set of all retained first bills of materials is recorded as the first target bill of materials set.
Each first bill of materials in the first target bill of materials set is then marked to obtain its marking value. In this embodiment, each first bill of materials is marked according to the time at which it is transferred to the bill of materials management module, i.e., its marking value is that transfer time.
That is, the breeding, screening and marking used here to obtain the first target bill of materials set and the marking values of its first bills of materials are the same as those used in step S002 to obtain the target bill of materials set and its marking values.
It is then determined whether the breeding trait characteristic values of the first bills of materials in the first target bill of materials set fail to meet the preset target breeding trait characteristic value requirement, i.e., whether, for every first bill of materials in the first target bill of materials set, the absolute difference between its breeding trait characteristic value and the preset target breeding trait characteristic value is greater than the first threshold.
If the requirement is not met, the predicted breeding trait characteristic value of each first remaining bill of materials in the first remaining bill of materials set is obtained from the correlations of the breeding trait characteristic values of the first bills of materials in the first target bill of materials set and of the bills of materials in the target bill of materials set. In this embodiment, the specific process is as follows:
Sort all first bills of materials in the first target bill of materials set in ascending order of marking value; record the sorted set as the newly added bill of materials set and each first bill of materials in it as a newly added bill of materials. Map each newly added bill of materials into the multidimensional material data space to obtain its data point.
The bills of materials corresponding to the first data points are already stored in the bill of materials management module, and the first data points have already been clustered into first clusters. Each time the bill of materials management module stores a new bill of materials, i.e., each time a new data point appears, the density around that point must be judged using its neighborhood radius and neighborhood density threshold, so that it is either added to an existing cluster or marked as a noise point without density connection. As new data points accumulate, however, continuous data trends gradually form in the data space; that is, the breeding trait characteristic values of data points within a local range change approximately linearly. This embodiment therefore adjusts the density threshold of each new data point based on this trend, so that the clusters are updated more quickly and the breeding trait characteristic values of the first remaining bills of materials are predicted more reliably. To this end, the newly added bills of materials in the newly added bill of materials set are processed one at a time, in order; the target neighborhood density threshold of each is determined from the local trend, and the clusters are updated accordingly. The specific process is as follows:
First, the 1st updated clusters are obtained from the 1st newly added bill of materials in the newly added bill of materials set and the first clusters, specifically:
For the data point of the 1st newly added bill of materials in the newly added bill of materials set:
Compute the Euclidean distance between the data point of the 1st newly added bill of materials and each first data point, and record the set of all first data points in the first cluster containing the first data point with the smallest distance as the data point set of the 1st newly added bill of materials. Take the marking value of the bill of materials corresponding to each data point in this set as the marking value of that data point, sort the set in ascending order of marking value, and record the sorted set as the preceding data point set of the 1st newly added bill of materials.
Next, the first direction characterization value of the data point of the 1st newly added bill of materials is obtained from its preceding data point set, specifically: the vectors formed by each pair of adjacent data points in the preceding data point set make up the vector set of that data point; that is, the i-th vector in the vector set runs from the (i-1)-th data point to the i-th data point of the preceding data point set, and the (i+1)-th vector runs from the i-th data point to the (i+1)-th data point. The first direction characterization value is then obtained from the cosine similarities between adjacent vectors in the vector set, according to the following formula:
In this formula, the quantities are: the first direction characterization value of the data point of the 1st newly added bill of materials; t, the number of vectors in the vector set of that data point; the j-th and (j+1)-th vectors in that vector set; and the cosine similarity between the j-th and (j+1)-th vectors.
The closer the cosine similarity between two adjacent vectors is to 1, the smaller the angle between them and the more similar their directions of extension; the closer it is to -1, the larger the angle and the less similar their directions. Accordingly, the smaller the first direction characterization value, the more strongly all data points in the corresponding preceding data point set extend along a line, i.e., the stronger their collinearity; the larger the value, the weaker this linear extension.
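The formula for the first direction characterization value is not legible in this copy. One form consistent with the description (smaller when adjacent vectors point the same way) is the mean of (1 - cosine similarity) over the t - 1 adjacent vector pairs; the sketch below assumes that form, and also assumes a neutral value of 1.0 when there are fewer than two vectors. Both are assumptions, not the patent's formula, and the function name is hypothetical.

```python
import numpy as np


def first_direction_value(preceding_points):
    """ASSUMED form: mean over adjacent vector pairs of (1 - cosine similarity).

    preceding_points: the data points of the nearest cluster, already sorted
    by marking value (ascending). 0 means perfectly collinear, same direction.
    """
    P = np.asarray(preceding_points, dtype=float)
    vectors = np.diff(P, axis=0)  # vector k runs from point k to point k+1
    if len(vectors) < 2:
        return 1.0  # assumption: no adjacent pair to compare, use a neutral value
    a, b = vectors[:-1], vectors[1:]
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    cos = (a * b).sum(axis=1) / np.maximum(norms, 1e-12)  # zero vectors give cos = 0
    return float(np.mean(1.0 - cos))


print(first_direction_value([[0, 0], [1, 1], [2, 2], [3, 3]]))  # ~0.0 (collinear)
print(first_direction_value([[0, 0], [1, 0], [1, 1], [0, 1]]))  # 1.0 (right-angle turns)
```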
Then the second direction characterization value of the data point of the 1st newly added bill of materials is obtained from its neighborhood data point set, specifically: the set of all first data points within the initial neighborhood radius centered on the data point of the 1st newly added bill of materials is recorded as its neighborhood data point set, and the mean breeding trait characteristic value of all data points in this set is recorded as its neighborhood mean. The normalized absolute difference between the breeding trait characteristic value of the 1st newly added bill of materials and this neighborhood mean is recorded as the second direction characterization value. The smaller this absolute difference, the more closely the new data point follows the linear extension of the data points in its data point set and neighborhood data point set, i.e., the stronger the collinearity.
The product of the first and second direction characterization values of the data point of the 1st newly added bill of materials is recorded as its density optimization factor, and the product of the density optimization factor and the initial neighborhood density threshold is recorded as its target neighborhood density threshold. Because a smaller density threshold should be used for clustering when collinearity is stronger, the smaller the first and second direction characterization values of the data point, the more the initial neighborhood density threshold is reduced, i.e., the smaller the density threshold of the newly added data point during clustering.
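The second direction characterization value and the resulting target neighborhood density threshold can be sketched as below. The text does not specify the normalizations, so this sketch assumes distances are divided by the largest pairwise distance among the first data points and the new point, the absolute difference is divided by the range of the first data points' trait values (capped at 1), and an empty neighborhood gives the maximum value 1. The first direction characterization value `f1` is passed in; names are hypothetical.

```python
import numpy as np


def new_point_threshold(new_point, new_trait, first_points, first_traits, eps, min_pts0, f1):
    """Return (target neighborhood density threshold, second direction value f2)."""
    P = np.asarray(first_points, dtype=float)
    y = np.asarray(first_traits, dtype=float)
    q = np.asarray(new_point, dtype=float)
    allp = np.vstack([P, q])
    diff = allp[:, None, :] - allp[None, :, :]
    scale = np.sqrt((diff ** 2).sum(axis=-1)).max()  # assumed distance normalization
    d = np.linalg.norm(P - q, axis=1) / (scale if scale > 0 else 1.0)
    neigh = y[d <= eps]  # trait values of the neighborhood data point set
    if len(neigh) == 0:
        f2 = 1.0  # assumption: no neighbors, no local trend to follow
    else:
        gap = abs(new_trait - neigh.mean())
        span = y.max() - y.min()
        f2 = min(gap / span, 1.0) if span > 0 else float(gap > 0)
    factor = f1 * f2  # density optimization factor
    return float(factor * min_pts0), float(f2)


first = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
traits = [30.0, 32.0, 50.0]
print(new_point_threshold([0.05, 0.05], 31.0, first, traits, 0.5, 2.0, f1=0.5))  # (0.0, 0.0)
print(new_point_threshold([0.05, 0.05], 41.0, first, traits, 0.5, 2.0, f1=0.5))  # (0.5, 0.5)
```

A new point that matches its neighborhood mean exactly gets a threshold of 0 here, so it always qualifies as a core point, consistent with the idea that strong collinearity should lower the density requirement.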
Then the data point of the 1st newly added bill of materials and all first data points are recorded as first updated data points. Compute the normalized Euclidean distance between every pair of first updated data points, and perform density clustering on all first updated data points using these distances, the initial neighborhood radius and the target neighborhood density threshold of each first updated data point; each resulting cluster is recorded as a 1st updated cluster.
Next, the 2nd updated clusters are obtained from the 2nd newly added bill of materials in the newly added bill of materials set and the 1st updated clusters, specifically:
For the data point of the 2nd newly added bill of materials in the newly added bill of materials set:
Compute the Euclidean distance between the data point of the 2nd newly added bill of materials and each first updated data point, and record the set of all first updated data points in the 1st updated cluster containing the first updated data point with the smallest distance as the data point set of the 2nd newly added bill of materials. Take the marking value of the bill of materials corresponding to each data point in this set as the marking value of that data point, sort the set in ascending order of marking value, and record the sorted set as the preceding data point set of the 2nd newly added bill of materials.
Then the first direction characterization value of the data point of the 2nd newly added bill of materials is obtained from its preceding data point set, and its second direction characterization value from its neighborhood data point set, in the same way as for the 1st newly added bill of materials; its target neighborhood density threshold is likewise the product of the two values and the initial neighborhood density threshold. These steps are therefore not described in detail.
Then the data point of the 2nd newly added bill of materials and all first updated data points are recorded as second updated data points. Compute the normalized Euclidean distance between every pair of second updated data points, and perform density clustering on all second updated data points using these distances, the initial neighborhood radius and the target neighborhood density threshold of each second updated data point; each resulting cluster is recorded as a 2nd updated cluster.
Next, according to the 3 rd new bill of materials in the new bill of materials collection and each 2 nd updated cluster, each 3 rd updated cluster is obtained, and so on, each W-th updated cluster is obtained, wherein W is the number of the new bill of materials in the new bill of materials collection; and marking each W-th updated cluster as a second cluster. The method of obtaining each 3 rd update cluster and each W-th update cluster in this embodiment is the same as the method of obtaining each 2 nd update cluster and each 1 st update cluster, and thus will not be described in detail.
Then, the breeding trait characteristic predicted value of each first remaining bill of materials in the first remaining bill of materials set is obtained from the second clusters. This is done in the same way that, in step S003, the breeding trait characteristic predicted value of each remaining bill of materials in the remaining bill of materials set was obtained from the first clusters, so the details are not repeated here.
At this point, the breeding trait characteristic predicted value of each first remaining bill of materials in the first remaining bill of materials set has been obtained.
Step S005: according to the breeding trait characteristic predicted value of each first remaining bill of materials in the first remaining bill of materials set, acquire a new bill of materials set and a new remaining bill of materials set from the first remaining bill of materials set; take them as the first bill of materials set and the first remaining bill of materials set, respectively; and return to the step of breeding and screening each first bill of materials in the first bill of materials set, until it is detected that the breeding trait characteristic value of any first bill of materials in the first target bill of materials set meets the preset target breeding trait characteristic value requirement. The first bill of materials that meets the requirement is taken as the bill of materials required for breeding.
The new bill of materials set and the new remaining bill of materials set are acquired from the first remaining bill of materials set according to the breeding trait characteristic predicted value of each first remaining bill of materials. This is done in the same way that, in step S003, the first bill of materials set and the first remaining bill of materials set were acquired from the remaining bill of materials set according to the breeding trait characteristic predicted value of each remaining bill of materials, so the details are not repeated in this embodiment.
Then, the new bill of materials set and the new remaining bill of materials set are again taken as the first bill of materials set and the first remaining bill of materials set, and the process returns to the step in step S004 of breeding and screening each first bill of materials in the first bill of materials set. This repeats until the breeding trait characteristic value of any first bill of materials in the first target bill of materials set is detected to meet the preset target breeding trait characteristic value requirement; the breeding experiment is then stopped, and the first bill of materials that meets the requirement is taken as the bill of materials required for breeding. Meeting the preset target breeding trait characteristic value requirement means that the absolute value of the difference between the breeding trait characteristic value of the bill of materials and the preset target breeding trait characteristic value is less than or equal to a first threshold.
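The iterate-until-target loop of steps S003 to S005 can be sketched as follows. This is a minimal sketch under stated assumptions: `breed` stands in for a real breeding experiment and `predict` for the correlation-based predictor, both hypothetical callbacks; the first batch is simply the first `batch_size` candidates (shuffle beforehand to mimic a random selection); and the minimum-trait screening step is omitted, so every bred bill of materials is checked directly against the first threshold. Unlike the text, which stops after a whole batch has been bred, the sketch returns at the first hit.

```python
from typing import Callable, Dict, List, Optional, Sequence


def meets_target(value: float, target: float, first_threshold: float) -> bool:
    """Requirement from the text: |trait value - target| <= first threshold."""
    return abs(value - target) <= first_threshold


def iterative_breeding(
    pool: Sequence[str],
    batch_size: int,
    breed: Callable[[str], float],  # hypothetical: runs one breeding experiment
    predict: Callable[[Dict[str, float], List[str]], Dict[str, float]],  # hypothetical predictor
    target: float,
    first_threshold: float,
) -> Optional[str]:
    """Breed in batches, re-ranking untested candidates by predicted trait value."""
    batch, remaining = list(pool[:batch_size]), list(pool[batch_size:])
    observed: Dict[str, float] = {}
    while batch:
        for bom in batch:
            observed[bom] = breed(bom)
            if meets_target(observed[bom], target, first_threshold):
                return bom  # stop the breeding experiment
        if not remaining:
            return None  # candidates exhausted without meeting the requirement
        # Rank untested candidates by predicted value, highest first.
        predicted = predict(observed, remaining)
        remaining.sort(key=lambda b: predicted[b], reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
    return None
```

With an oracle predictor, the loop breeds the two initial candidates, then jumps straight to the highest-ranked untested one.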
In this way, the embodiment obtains a bill of materials that meets the preset target breeding trait characteristic value requirement.
In this embodiment, a bill of materials set and a remaining bill of materials set are first obtained, and each bill of materials in the bill of materials set is bred and screened to obtain a target bill of materials set. If none of the breeding trait characteristic values of the bills of materials in the target bill of materials set meets the preset target breeding trait characteristic value requirement, the breeding trait characteristic predicted value of each remaining bill of materials in the remaining bill of materials set is obtained from the correlation among the breeding trait characteristic values of the bills of materials in the target bill of materials set, and a first bill of materials set and a first remaining bill of materials set are then acquired from the remaining bill of materials set according to these predicted values. Each first bill of materials in the first bill of materials set is then bred and screened, and the process continues until the breeding trait characteristic value of any first bill of materials in the first target bill of materials set meets the preset target breeding trait characteristic value requirement. By predicting the breeding trait characteristic values of bills of materials that have not yet been bred, the embodiment can improve the efficiency of the breeding experiment; that is, it can obtain the bill of materials required for breeding relatively quickly.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be included within the scope of the present application.
Claims (10)
1. A method for predicting breeding data based on correlation analysis, the method comprising the steps of:
acquiring a bill of materials set and a remaining bill of materials set;
breeding each bill of materials in the bill of materials set to obtain the breeding trait characteristic value corresponding to each bill of materials, and screening the bills of materials in the bill of materials set according to these values to obtain a target bill of materials set;
if none of the breeding trait characteristic values corresponding to the bills of materials in the target bill of materials set meets the preset target breeding trait characteristic value requirement, acquiring the breeding trait characteristic predicted value corresponding to each remaining bill of materials in the remaining bill of materials set according to the correlation among the breeding trait characteristic values corresponding to the bills of materials in the target bill of materials set, and acquiring a first bill of materials set and a first remaining bill of materials set from the remaining bill of materials set according to these predicted values;
breeding and screening each first bill of materials in the first bill of materials set to obtain a first target bill of materials set, and, if none of the breeding trait characteristic values corresponding to the first bills of materials in the first target bill of materials set meets the preset target breeding trait characteristic value requirement, acquiring the breeding trait characteristic predicted value corresponding to each first remaining bill of materials in the first remaining bill of materials set according to both the correlation among the breeding trait characteristic values corresponding to the first bills of materials in the first target bill of materials set and the correlation among the breeding trait characteristic values corresponding to the bills of materials in the target bill of materials set;
acquiring, according to the breeding trait characteristic predicted value corresponding to each first remaining bill of materials in the first remaining bill of materials set, a new bill of materials set and a new remaining bill of materials set from the first remaining bill of materials set; taking the new bill of materials set and the new remaining bill of materials set as the first bill of materials set and the first remaining bill of materials set, respectively; and returning to the step of breeding and screening each first bill of materials in the first bill of materials set until it is detected that the breeding trait characteristic value corresponding to any first bill of materials in the first target bill of materials set meets the preset target breeding trait characteristic value requirement, and taking the first bill of materials that meets the requirement as the bill of materials required for breeding.
2. The correlation analysis-based breeding data prediction method according to claim 1, wherein the method of acquiring the bill of materials set and the remaining bill of materials set comprises:
acquiring a set of bills of materials to be tested and a breeding limit threshold;
randomly selecting, from the set of bills of materials to be tested, a number of bills of materials to be tested equal to the breeding limit threshold, recording each selected bill of materials to be tested as a bill of materials, and recording the set of all selected bills of materials as the bill of materials set;
and recording the set of all bills of materials to be tested other than those in the bill of materials set as the remaining bill of materials set, and recording each bill of materials to be tested in the remaining bill of materials set as a remaining bill of materials.
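The random split of claim 2 can be sketched as follows, assuming each bill of materials to be tested is represented by an opaque identifier. The function name `split_bom_sets`, the optional `seed`, and clamping the sample size when the breeding limit threshold exceeds the pool size are illustrative assumptions, not details from the patent.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_bom_sets(
    candidates: Sequence[T], breeding_limit: int, seed: Optional[int] = None
) -> Tuple[List[T], List[T]]:
    """Randomly pick `breeding_limit` candidates; the rest form the remaining set."""
    rng = random.Random(seed)
    n = min(breeding_limit, len(candidates))  # assumption: clamp to the pool size
    # Sample indices rather than items so duplicate entries are handled correctly.
    chosen = set(rng.sample(range(len(candidates)), n))
    bom_set = [c for i, c in enumerate(candidates) if i in chosen]
    remaining_set = [c for i, c in enumerate(candidates) if i not in chosen]
    return bom_set, remaining_set
```

Both returned lists preserve the original candidate order; only membership is random.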
3. The correlation analysis-based breeding data prediction method according to claim 1, wherein the method for obtaining the target bill of materials set comprises:
judging whether the breeding trait characteristic value corresponding to each bill of materials in the bill of materials set is greater than or equal to a preset minimum breeding trait characteristic value; if so, recording the corresponding bill of materials as a target bill of materials, and recording the set of all target bills of materials as the target bill of materials set.
4. The correlation analysis-based breeding data prediction method according to claim 1, wherein the breeding trait characteristic values corresponding to the bills of materials in the target bill of materials set not meeting the preset target breeding trait characteristic value requirement means that, for every bill of materials in the target bill of materials set, the absolute value of the difference between its breeding trait characteristic value and the preset target breeding trait characteristic value is greater than a first threshold.
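Claims 3 and 4 reduce to two simple predicates, sketched below under the assumption that measured trait values are held in a dictionary keyed by a bill of materials identifier; the function names are hypothetical.

```python
from typing import Dict


def screen_targets(trait_values: Dict[str, float], min_trait: float) -> Dict[str, float]:
    """Claim 3: keep bills of materials whose trait value >= the preset minimum."""
    return {bom: v for bom, v in trait_values.items() if v >= min_trait}


def none_meet_target(target_set: Dict[str, float], target: float, first_threshold: float) -> bool:
    """Claim 4: True when every |value - target| exceeds the first threshold."""
    return all(abs(v - target) > first_threshold for v in target_set.values())
```

An empty target set makes `none_meet_target` return True (Python's `all` of an empty iterable), so the process moves on to prediction rather than stopping.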
5. The correlation analysis-based breeding data prediction method according to claim 1, wherein the method for obtaining the breeding trait characteristic predicted value corresponding to each remaining bill of materials in the remaining bill of materials set comprises:
constructing a multidimensional material data space;
mapping each bill of materials in the target bill of materials set into the multidimensional material data space to obtain its corresponding data point, and recording the data point corresponding to each bill of materials in the target bill of materials set as a first data point;
marking each bill of materials in the target bill of materials set to obtain its marking value;
acquiring an initial neighborhood radius and an initial neighborhood density threshold;
for each first data point, recording the set of all first data points within the initial neighborhood radius centered on it as the first data point set corresponding to that first data point;
recording the normalized value of the variance of the breeding trait characteristic values of all first data points in the first data point set corresponding to each first data point as the trait stability degree corresponding to that first data point;
adding 1 to the trait stability degree corresponding to each first data point, multiplying the sum by the initial neighborhood density threshold, and recording the result as the target neighborhood density threshold corresponding to that first data point;
calculating the normalized Euclidean distance between every pair of first data points, performing density clustering on all first data points according to these normalized distances, the initial neighborhood radius, and the target neighborhood density threshold corresponding to each first data point, and recording each cluster obtained as a first cluster;
and obtaining the breeding trait characteristic predicted value corresponding to each remaining bill of materials in the remaining bill of materials set according to each first cluster.
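The steps of claim 5 up to the clustering can be sketched as a DBSCAN-style procedure in which each first data point carries its own density threshold. The patent does not name the density clustering algorithm or say how distances and variances are normalized, so this sketch assumes DBSCAN semantics (a point is a core point when its neighborhood, itself included, holds at least its target threshold), divides distances by the largest pairwise distance, and divides variances by the largest neighborhood variance. `adaptive_density_cluster` and its parameters are hypothetical names.

```python
import numpy as np


def adaptive_density_cluster(X: np.ndarray, y: np.ndarray, eps: float, min_pts0: float):
    """DBSCAN-style clustering where each point has its own density threshold.

    X: (n, d) data points; y: (n,) breeding trait values; eps: initial
    neighborhood radius applied to normalized distances; min_pts0: initial
    neighborhood density threshold. Returns (labels, thresholds); label -1 = noise.
    """
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Assumption: "normalized" distance = distance / maximum pairwise distance.
    max_d = dist.max()
    dist_n = dist / max_d if max_d > 0 else dist
    neighbors = [np.flatnonzero(dist_n[i] <= eps) for i in range(n)]  # includes i itself

    # Trait stability degree: normalized variance of trait values in each neighborhood.
    var = np.array([np.var(y[nb]) for nb in neighbors])
    stability = var / var.max() if var.max() > 0 else np.zeros(n)  # assumed max-normalization
    thresholds = (stability + 1.0) * min_pts0  # per-point target density threshold
    core = np.array([len(nb) >= t for nb, t in zip(neighbors, thresholds)])

    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster_id = 0
    for i in range(n):
        if visited[i] or not core[i]:
            continue
        visited[i] = True
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                if not visited[q]:
                    visited[q] = True
                    stack.append(q)
        cluster_id += 1
    return labels, thresholds
```

Because the threshold grows with local trait variance, neighborhoods with unstable trait values need more points before they can seed a cluster.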
6. The correlation analysis-based breeding data prediction method according to claim 5, wherein the method for obtaining, according to each first cluster, the breeding trait characteristic predicted value corresponding to each remaining bill of materials in the remaining bill of materials set comprises:
for any remaining bill of materials a:
mapping the remaining bill of materials a into the multidimensional material data space to obtain its corresponding data point, and recording it as a second data point;
calculating the Euclidean distance between the second data point corresponding to the remaining bill of materials a and each first data point, and recording the first cluster containing the first data point with the minimum Euclidean distance as the target first cluster;
for each dimension of the multidimensional material data space: acquiring the Pearson correlation coefficient between the projections of the first data points in the target first cluster onto the coordinate axis of that dimension and the breeding trait characteristic values of those first data points, and recording it as the Pearson correlation coefficient between the dimension and the breeding trait characteristic; recording the normalized value of this Pearson correlation coefficient as the weight value corresponding to the dimension; performing linear fitting between the projections of the first data points in the target first cluster onto the coordinate axis of that dimension and their breeding trait characteristic values, and recording the fitted result as the fitted curve corresponding to the dimension;
obtaining the initial predicted value of the remaining bill of materials a in the dimension from the projection of its second data point onto the coordinate axis of that dimension and the fitted curve corresponding to the dimension;
recording the product of the initial predicted value of the remaining bill of materials a in the dimension and the weight value corresponding to the dimension as the target predicted value of the remaining bill of materials a in the dimension;
and recording the average of the target predicted values of the remaining bill of materials a over all dimensions of the multidimensional material data space as the breeding trait characteristic predicted value corresponding to the remaining bill of materials a.
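Claim 6 can be sketched with NumPy, assuming ordinary least squares via `np.polyfit` for the linear fitting and |r| as the "normalized" Pearson coefficient; the patent does not specify the normalization. The helper names and the fallback for a constant axis (mean prediction, zero weight) are hypothetical choices.

```python
import numpy as np


def nearest_cluster(X: np.ndarray, labels: np.ndarray, x_new: np.ndarray) -> int:
    """Label of the cluster holding the first data point nearest to x_new."""
    return int(labels[np.argmin(np.linalg.norm(X - x_new, axis=1))])


def predict_trait(cluster_X: np.ndarray, cluster_y: np.ndarray, x_new: np.ndarray) -> float:
    """Per-dimension weighted linear prediction averaged over all dimensions."""
    targets = []
    for k in range(cluster_X.shape[1]):
        col = cluster_X[:, k]
        if np.ptp(col) == 0:
            # Degenerate axis: no fit possible, so fall back to the mean with zero weight.
            slope, intercept, r = 0.0, float(np.mean(cluster_y)), 0.0
        else:
            slope, intercept = np.polyfit(col, cluster_y, 1)
            r = 0.0 if np.ptp(cluster_y) == 0 else float(np.corrcoef(col, cluster_y)[0, 1])
        weight = abs(r)  # assumption: "normalized" Pearson coefficient taken as |r|
        initial = slope * x_new[k] + intercept  # initial predicted value on this axis
        targets.append(initial * weight)  # target predicted value on this axis
    return float(np.mean(targets))
```

Averaging the weighted per-axis values, as the claim describes, is not a weighted mean: with one perfectly correlated axis and one constant axis, the result is half of the single-axis prediction.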
7. The correlation analysis-based breeding data prediction method according to claim 2, wherein the method of acquiring the first bill of materials set and the first remaining bill of materials set comprises:
sorting all remaining bills of materials in the remaining bill of materials set in descending order of breeding trait characteristic predicted value to obtain a remaining bill of materials sequence;
starting from the first remaining bill of materials in the remaining bill of materials sequence, recording the set of N0 consecutive remaining bills of materials as the first bill of materials set, and recording each remaining bill of materials in the first bill of materials set as a first bill of materials, where N0 is the breeding limit threshold;
and recording each remaining bill of materials in the remaining bill of materials sequence other than the first bills of materials as a first remaining bill of materials, and recording the set of all first remaining bills of materials as the first remaining bill of materials set.
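The ranking split of claim 7 is a sort followed by a cut at N0. A minimal sketch, assuming predicted values are held in a dictionary keyed by a hypothetical bill of materials identifier:

```python
from typing import Dict, List, Tuple


def split_by_prediction(predicted: Dict[str, float], n0: int) -> Tuple[List[str], List[str]]:
    """Sort by predicted trait value (descending); the top n0 form the next batch."""
    ranked = sorted(predicted, key=predicted.__getitem__, reverse=True)
    return ranked[:n0], ranked[n0:]
```

Python's sort is stable even with `reverse=True`, so ties keep their original insertion order.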
8. The correlation analysis-based breeding data prediction method according to claim 6, wherein the method for obtaining the breeding trait characteristic predicted value corresponding to each first remaining bill of materials in the first remaining bill of materials set comprises:
marking each first bill of materials in the first target bill of materials set to obtain its marking value;
sorting all first bills of materials in the first target bill of materials set in ascending order of marking value, recording the sorted set as the newly added bill of materials set, and recording each first bill of materials in it as a newly added bill of materials, where the number of newly added bills of materials in the newly added bill of materials set is W;
obtaining new clusters from the 1st newly added bill of materials in the newly added bill of materials set and the first clusters, and recording them as the 1st updated clusters; obtaining new clusters from the 2nd newly added bill of materials and the 1st updated clusters, and recording them as the 2nd updated clusters; obtaining new clusters from the 3rd newly added bill of materials and the 2nd updated clusters, and recording them as the 3rd updated clusters; and so on, until the W-th updated clusters are obtained;
recording each W-th updated cluster as a second cluster;
and obtaining the breeding trait characteristic predicted value corresponding to each first remaining bill of materials in the first remaining bill of materials set according to each second cluster.
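The order in claim 8 matters because each update builds on the clusters produced by the previous one. A minimal sketch of that control flow, assuming the clustering state can be any value and `update_step` is a hypothetical callback that performs one per-point update (for example, re-running a density clustering with the new point's target threshold):

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")  # clustering state (hypothetical representation)
B = TypeVar("B")  # newly added bill of materials


def sequential_update(
    first_clusters: S,
    new_boms: Iterable[B],
    marking_value: Callable[[B], float],
    update_step: Callable[[S, B], S],
) -> S:
    """Fold newly added BOMs into the clustering one at a time, in ascending
    order of marking value; the final state corresponds to the second clusters."""
    clusters = first_clusters
    for bom in sorted(new_boms, key=marking_value):
        clusters = update_step(clusters, bom)  # k-th updated clusters
    return clusters
```

Using a list as a stand-in state makes the processing order visible.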
9. The correlation analysis-based breeding data prediction method according to claim 8, wherein the method for obtaining the 1st updated clusters comprises:
mapping each newly added bill of materials in the newly added bill of materials set into the multidimensional material data space to obtain its corresponding data point;
for the data point corresponding to the 1st newly added bill of materials in the newly added bill of materials set:
calculating the Euclidean distance between the data point corresponding to the 1st newly added bill of materials and each first data point, and recording the set of all first data points in the first cluster containing the first data point with the minimum Euclidean distance as the data point set corresponding to the 1st newly added bill of materials;
assigning to each data point in the data point set corresponding to the 1st newly added bill of materials the marking value of its corresponding bill of materials, sorting all data points in that set in ascending order of marking value, and recording the sorted set as the previous data point set corresponding to the 1st newly added bill of materials;
recording the set of vectors formed by every two adjacent data points in the previous data point set as the vector set corresponding to the data point of the 1st newly added bill of materials;
obtaining the first direction representation value of the data point corresponding to the 1st newly added bill of materials according to the cosine similarity between adjacent vectors in the vector set;
recording the set of all first data points within the initial neighborhood radius centered on the data point corresponding to the 1st newly added bill of materials as the neighborhood data point set of that data point;
recording the calculated average of the breeding trait characteristic values of all data points in the neighborhood data point set as the neighborhood mean of the data point corresponding to the 1st newly added bill of materials;
recording the normalized value of the absolute difference between the breeding trait characteristic value of the 1st newly added bill of materials and the neighborhood mean as the second direction representation value of the data point corresponding to the 1st newly added bill of materials;
recording the product of the first direction representation value and the second direction representation value of the data point corresponding to the 1st newly added bill of materials as the density optimization factor of that data point;
recording the product of this density optimization factor and the initial neighborhood density threshold as the target neighborhood density threshold of the data point corresponding to the 1st newly added bill of materials;
recording the data point corresponding to the 1st newly added bill of materials and all first data points as first updated data points;
calculating the normalized Euclidean distance between every pair of first updated data points, performing density clustering on all first updated data points according to these normalized distances, the initial neighborhood radius, and the target neighborhood density threshold corresponding to each first updated data point, and recording the resulting clusters as the 1st updated clusters.
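The threshold for a newly added data point in claim 9 combines the two direction representation values. A minimal sketch, assuming the first direction representation value is supplied by the caller and that the unspecified "normalized value" of the absolute difference is modelled as diff / (1 + diff); that normalization, the empty-neighborhood error, and the function name are hypothetical.

```python
from typing import Sequence


def new_point_density_threshold(
    first_direction: float,
    new_trait: float,
    neighbor_traits: Sequence[float],
    min_pts0: float,
) -> float:
    """Target neighborhood density threshold for a newly added data point."""
    if len(neighbor_traits) == 0:
        raise ValueError("neighborhood data point set is empty; neighborhood mean undefined")
    neighborhood_mean = sum(neighbor_traits) / len(neighbor_traits)
    diff = abs(new_trait - neighborhood_mean)
    # Assumption: the patent's "normalized value" is modelled as diff / (1 + diff), in [0, 1).
    second_direction = diff / (1.0 + diff)
    density_factor = first_direction * second_direction  # density optimization factor
    return density_factor * min_pts0
```

For example, a first direction value of 0.5, a trait value of 3.0 against a neighborhood mean of 1.0, and an initial threshold of 6 give a target threshold of about 2.0.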
10. The correlation analysis-based breeding data prediction method according to claim 1, wherein the first direction characterization value of the data point corresponding to the 1st newly added bill of materials is calculated according to the following formula:
[formula image not reproduced]
where t is the number of vectors in the vector set corresponding to the data point of the 1st newly added bill of materials; the j-th and (j+1)-th vectors are adjacent vectors in that vector set; and the cosine similarity between the j-th vector and the (j+1)-th vector is the term from which the first direction characterization value of the data point corresponding to the 1st newly added bill of materials is computed.
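Because the formula itself is missing from the text, the following is only one plausible reading of claim 10: the first direction characterization value is taken to be the arithmetic mean of the cosine similarities of the t-1 adjacent vector pairs, so a value near 1 indicates that the previously bred points drift in a consistent direction. The aggregation, the zero-vector handling, and the function name are assumptions.

```python
import numpy as np


def first_direction_value(ordered_points: np.ndarray) -> float:
    """Mean cosine similarity between consecutive vectors of an ordered point set.

    ordered_points: (m, d) data points sorted by ascending marking value.
    """
    vectors = np.diff(ordered_points, axis=0)  # t = m - 1 vectors between adjacent points
    if len(vectors) < 2:
        raise ValueError("need at least two vectors (three points) to compare directions")
    sims = []
    for a, b in zip(vectors[:-1], vectors[1:]):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        # Assumption: a zero-length vector contributes a similarity of 0.
        sims.append(float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0)
    return float(np.mean(sims))  # assumed aggregation: arithmetic mean over t - 1 pairs
```

Collinear points moving one way give 1.0, a right-angle turn gives 0.0, and a reversal gives -1.0.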
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410212922.2A CN117789893B (en) | 2024-02-27 | 2024-02-27 | Breeding data prediction method based on correlation analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117789893A (en) | 2024-03-29 |
CN117789893B (en) | 2024-04-30 |
Family ID: 90396726
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410212922.2A Active CN117789893B (en) | 2024-02-27 | 2024-02-27 | Breeding data prediction method based on correlation analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117789893B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118656657A (en) * | 2024-08-21 | 2024-09-17 | 河北省农林科学院农业信息与经济研究所 | Multi-model fusion method based on wheat yield estimation |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572900A (en) * | 2014-12-25 | 2015-04-29 | 北京农业信息技术研究中心 | Trait characteristic selection method for crop breeding evaluation |
CN104951987A (en) * | 2015-06-19 | 2015-09-30 | 北京农业信息技术研究中心 | Decision tree based crop breeding evaluation method |
CN105248272A (en) * | 2015-10-20 | 2016-01-20 | 陕西省杂交油菜研究中心 | Breeding method of high-photosynthetic-efficiency core collections of rape |
CN105838810A (en) * | 2016-05-20 | 2016-08-10 | 广西壮族自治区农业科学院甘蔗研究所 | Screening method of saccharum arundinaceum breeding material |
CN106688592A (en) * | 2017-01-19 | 2017-05-24 | 天津市农业技术推广站 | Screening method for premature senility resistance characters in cotton breeding process |
CN111771716A (en) * | 2020-07-24 | 2020-10-16 | 江西省农业科学院作物研究所 | Crop genetic breeding method for efficiently utilizing heterosis |
Also Published As
Publication number | Publication date |
---|---|
CN117789893B (en) | 2024-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117789893B (en) | Breeding data prediction method based on correlation analysis | |
CN111199016A (en) | DTW-based improved K-means daily load curve clustering method | |
Firpi et al. | Swarmed feature selection | |
CN111028100A (en) | Refined short-term load prediction method, device and medium considering meteorological factors | |
Oliva et al. | Multilevel thresholding by fuzzy type II sets using evolutionary algorithms | |
Ressom et al. | Adaptive double self-organizing maps for clustering gene expression profiles | |
CN110085322A (en) | A kind of improved method of k-means cluster diabetes Early-warning Model | |
Mousavi et al. | Improving customer clustering by optimal selection of cluster centroids in k-means and k-medoids algorithms | |
Resti et al. | Identification of corn plant diseases and pests based on digital images using multinomial naïve bayes and k-nearest neighbor | |
WO2018142816A1 (en) | Assistance device and assistance method | |
CN109034238A (en) | A kind of clustering method based on comentropy | |
CN114677592A (en) | Small sample SAR image target detection method based on pruning element learning | |
CN116206208B (en) | Forestry plant diseases and insect pests rapid analysis system based on artificial intelligence | |
CN117409260A (en) | Small sample image classification method and device based on depth subspace embedding | |
CN114219051B (en) | Image classification method, classification model training method and device and electronic equipment | |
CN115830413A (en) | Image feature library updating method, image feature library checking method and related equipment | |
CN112102880A (en) | Method for identifying variety, and method and device for constructing prediction model thereof | |
CN111383717B (en) | Method and system for constructing biological information analysis reference data set | |
CN113344140A (en) | Uncertain data sequence scanning method and system based on pruning conditions | |
Eldesoky et al. | DETECTING MILDEW DISEASES IN CUCUMBER USING IMAGE PROCESSING TECHNIQUE | |
Naenna et al. | DNA classifications with self-organizing maps (SOMs) | |
CN117078621B (en) | Cell strain stability determination method, device, computer equipment and storage medium | |
CN116757338B (en) | Crop yield prediction method, device, electronic equipment and storage medium | |
CN114928477B (en) | Network intrusion detection method and device, readable storage medium and terminal equipment | |
Islam et al. | RESTRAC: reference sequence based space transformation for clustering |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |