CN115204323B

CN115204323B - Seed multi-feature based clustering and synthesis method, system, device and medium

Info

Publication number: CN115204323B
Application number: CN202211125597.3A
Authority: CN
Inventors: 邵德意; 王俊华; 尹合兴; 刘祥杰; 王永卡; 祝明新; 田冰川
Original assignee: Huazhi Biotechnology Co ltd
Current assignee: Huazhi Biotechnology Co ltd
Priority date: 2022-09-16
Filing date: 2022-09-16
Publication date: 2022-12-02
Anticipated expiration: 2042-09-16
Also published as: CN115204323A

Abstract

The invention discloses a seed multi-feature based clustering and synthesis method, system, device and medium. First, a multi-feature vector set is formed from the characteristic parameters of the seeds, so that multiple features serve as the classification standard. Next, the maximum aggregation classification number is obtained by a threshold method and classification is performed by a fuzzy clustering algorithm, which classifies the multi-feature seeds through cluster analysis without manual division. After the central vectors and sub-vector sets of all sub-clusters are obtained, the Bayesian information value of each aggregation classification number is calculated and the aggregation classification number with the maximum Bayesian information value is selected, which avoids unreasonable aggregation classification numbers. Finally, the central vector set of the optimal aggregation classification number is calculated, the spatial average Euclidean distance and the multi-dimensional spatial included angle are calculated from that central vector set, and the dispersion is calculated as the comprehensive output of seed classification, so that accuracy is improved.

Description

Seed multi-feature based clustering and synthesis method, system, device and medium

Technical Field

The invention relates to the technical field of feature classification, in particular to a seed multi-feature based clustering and synthesis method, system, equipment and medium.

Background

In the production, processing and circulation of crop seeds, crop seed inspection regulations require that representative test samples of different seed batches be sampled and inspected to determine seed quality, seed health and whether the seeds are mixed.

Traditional classification of sampling results performs difference analysis based on statistical significance testing. This approach can distinguish seed batches with large differences, but it generally analyzes a single index trait. The characteristic values of seeds of the same variety and batch follow a continuous distribution, the characteristic values of seeds of different varieties may partially overlap, and the characteristic values fluctuate with the year and the cultivation management practice. As a result, traditional classification of sampling results, which mostly aggregates on a single characteristic value, has difficulty distinguishing differences in characteristic data between varieties and within a variety. In addition, the commonly used K-means clustering method is a hard classification method: points at the boundaries between classes are easily misjudged, which reduces identifiability.

Disclosure of Invention

The present invention is directed to solving at least one of the problems of the prior art. To this end, the invention provides a seed multi-feature based clustering and synthesis method, system, device and medium, which can effectively improve the accuracy of seed classification.

In a first aspect, an embodiment of the present invention provides a method for clustering and synthesizing based on seed multiple features, including:

obtaining N seeds, extracting M characteristic parameters of each seed, and respectively forming a data set by the Kth characteristic parameter of each seed; obtaining a vector set with a dimension of M and a length of N according to the data set; wherein N and M are integers, and K is an integer less than or equal to M;

determining the maximum aggregation classification number from a preset aggregation classification number according to a threshold method;

sequentially selecting integers from 1 to the maximum aggregation classification number as aggregation classification numbers, and performing an aggregation classification operation on the vector set according to the aggregation classification number and a fuzzy clustering algorithm to obtain the central vectors and the sub-vector sets of all sub-clusters for each aggregation classification number;

calculating the Bayesian information value of each aggregation classification number through the central vector and the sub-vector set of the sub-clusters;

selecting the aggregation classification number corresponding to the maximum Bayesian information value as the optimal aggregation classification number;

calculating to obtain a central vector set of the optimal aggregation classification number through the optimal aggregation classification number;

calculating the space average Euclidean distance and the multi-dimensional space included angle through the central vector set of the optimal aggregation classification number; calculating to obtain dispersion according to the space average Euclidean distance and the multi-dimensional space included angle;

and outputting the central vector set of the optimal aggregation classification number and the dispersion degree as the result of the seed clustering and synthesis.

The method provided by the embodiment of the invention has at least the following beneficial effects:

First, a multi-feature vector set is formed from the characteristic parameters of the seeds, and multiple features are used as the classification standard, which improves classification accuracy. Next, the maximum aggregation classification number is obtained by a threshold method and classification is performed by a fuzzy clustering algorithm, which classifies the multi-feature seeds through cluster analysis without manual division. After the central vectors and the sub-vector sets of all sub-clusters are obtained, the Bayesian information value of each aggregation classification number is calculated and the aggregation classification number with the maximum Bayesian information value is selected, which avoids misclassification caused by an unreasonable aggregation classification number. Finally, the central vector set of the optimal aggregation classification number is calculated, the spatial average Euclidean distance and the multi-dimensional spatial included angle are calculated from that central vector set, and the dispersion is calculated from them. The central vector set and the dispersion of the optimal aggregation classification number are taken as the output. They make it easier to distinguish differences in characteristic data between varieties and within a variety, so that the accuracy of seed classification is improved and the difficulty of accounting for differences in seed characteristics between batches, between varieties and across storage times is addressed.

According to some embodiments of the present invention, after the K-th characteristic parameters of the N seeds are configured into a data set, the method further includes the steps of:

and carrying out median filtering on the data set to remove the outlier data.

According to some embodiments of the invention, the maximum aggregation classification number is determined from a preset aggregation classification number according to a threshold method, by a formula whose terms are the preset aggregation classification number and a rounding function.

According to some embodiments of the invention, the fuzzy clustering algorithm stopping condition comprises: the iteration times exceed a threshold value or the variance of all data in an FIFO buffer area is less than 0.001, and the FIFO buffer area is used for storing the objective function value obtained by each iteration calculation of the fuzzy clustering algorithm.

According to some embodiments of the invention, the Bayesian information value of an aggregation classification number is calculated by a formula whose terms are the number of vector points of each sub-cluster, the covariance of each sub-cluster, the dimension, and a penalty factor.

According to some embodiments of the invention, the calculating a spatial average euclidean distance and a multidimensional spatial angle by the set of center vectors of the optimal aggregate classification number comprises:

calculating the spatially averaged euclidean distance by:

calculating Euclidean distance between the central vector of each sub-cluster and the central vectors of other sub-clusters;

averaging all Euclidean distances except the maximum Euclidean distance to obtain the average Euclidean distance of each sub-cluster;

averaging the average Euclidean distances of all the sub-clusters to obtain the spatial average Euclidean distance;

calculating the multi-dimensional spatial angle by:

calculating the average center point of the central vectors of all the sub-clusters, and calculating the included angle between the center point of the central vector of each sub-cluster and the average center point;

and averaging the included angles between the central points of the central vectors of all the sub-clusters and the average central point to obtain the multi-dimensional spatial included angle.

According to some embodiments of the invention, the dispersion is calculated by a formula that combines the spatial average Euclidean distance and the multi-dimensional spatial included angle.

In a second aspect, an embodiment of the present invention provides a system for clustering and synthesizing based on seed multi-features, including:

the data acquisition module is used for acquiring N seeds, extracting M characteristic parameters of each seed and respectively forming a data set by the Kth characteristic parameter of each seed; obtaining a vector set with a dimension of M and a length of N according to the data set; wherein N and M are integers, and K is an integer less than or equal to M;

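a maximum aggregation classification number selection module, used for determining the maximum aggregation classification number from a preset aggregation classification number according to a threshold method;

an aggregation classification module, used for sequentially selecting integers from 1 to the maximum aggregation classification number as aggregation classification numbers, and performing an aggregation classification operation on the vector set according to the aggregation classification number and a fuzzy clustering algorithm to obtain the central vectors and the sub-vector sets of all sub-clusters for each aggregation classification number;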

the Bayesian information acquisition module is used for calculating a Bayesian information value of each aggregation classification number through the central vector and the sub-vector set of the sub-clusters;

the optimal aggregation classification number selection module is used for selecting the aggregation classification number corresponding to the maximum Bayesian information value as the optimal aggregation classification number;

the clustering center vector assembly module is used for calculating the optimal aggregation classification number to obtain a center vector assembly of the optimal aggregation classification number;

the dispersion calculation module is used for calculating the space average Euclidean distance and the multi-dimensional space included angle through the central vector set of the optimal aggregation classification number; calculating to obtain dispersion according to the space average Euclidean distance and the multi-dimensional space included angle;

and the output module is used for outputting the central vector set of the optimal aggregation classification number and the dispersion as the result of the seed clustering and synthesis.

In a third aspect, an embodiment of the present invention provides an electronic device, including at least one control processor and a memory communicatively coupled to the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform the method of seed multi-feature based clustering and synthesis according to the first aspect.

In a fourth aspect, embodiments of the present invention provide a computer storage medium having stored thereon computer-executable instructions for causing a computer to perform the method for seed multi-feature based clustering and synthesis as described in the first aspect.

It should be noted that the beneficial effects of the second to fourth aspects of the present invention over the prior art are the same as those of the seed multi-feature based clustering and synthesis method of the first aspect, and are not described in detail here.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flow chart of a method for seed multi-feature based clustering and synthesis according to an embodiment of the present invention;

FIG. 2 is a block diagram of a system for seed multi-feature based clustering and synthesis according to an embodiment of the present invention;

FIG. 3 is a block diagram of an electronic device according to an embodiment of the invention;

FIG. 4 is a flow chart of the calculation of the spatial average Euclidean distance according to an embodiment of the present invention;

FIG. 5 is a flow chart of the multi-dimensional spatial included angle calculation according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.

In the description of the present invention, terms such as first and second are used only to distinguish technical features; they are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or the order of the indicated technical features.

In the description of the present invention, it should be understood that the orientation or positional relationship referred to, for example, the upper, lower, etc., is indicated based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, but does not indicate or imply that the device or element referred to must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present invention.

In the description of the present invention, it should be noted that unless otherwise explicitly defined, terms such as arrangement, installation, connection and the like should be broadly understood, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.

Referring to FIG. 1, in some embodiments of the present invention, a seed multi-feature based clustering and synthesis method is provided, including:

step S100, acquiring N seeds, extracting M characteristic parameters of each seed, and forming a data set from the K-th characteristic parameter of each seed; obtaining a vector set with a dimension of M and a length of N according to the data sets; wherein N and M are integers, and K is an integer less than or equal to M;

step S200, determining the maximum aggregation classification number from a preset aggregation classification number according to a threshold method;

step S300, sequentially selecting integers from 1 to the maximum aggregation classification number as the aggregation classification number, and performing an aggregation classification operation on the vector set according to the aggregation classification number and a fuzzy clustering algorithm to obtain the central vectors and the sub-vector sets of all sub-clusters for that aggregation classification number;

step S400, calculating the Bayesian information value of each aggregation classification number through the central vectors and the sub-vector sets of the sub-clusters;

step S500, selecting the aggregation classification number corresponding to the maximum Bayesian information value as the optimal aggregation classification number;

step S600, calculating the optimal aggregation classification number to obtain a central vector set of the optimal aggregation classification number;

step S700, calculating the space average Euclidean distance and the multi-dimensional space included angle through the central vector set of the optimal aggregation classification number; calculating according to the space average Euclidean distance and the multi-dimensional space included angle to obtain dispersion;

and step S800, outputting the central vector set and the dispersion of the optimal aggregation classification number as the result of seed clustering and synthesis.

In this method embodiment, step S100 first forms a multi-feature vector set from the characteristic parameters of the seeds and uses multiple features as the classification standard, which improves classification accuracy. Step S200 then obtains the maximum aggregation classification number by a threshold method, so the maximum aggregation classification number does not have to be determined through manual sampling inspection. Step S300 runs the fuzzy clustering algorithm for each aggregation classification number from 1 to the maximum aggregation classification number in sequence; the fuzzy clustering algorithm classifies the multi-feature seeds through cluster analysis without manual division. After the central vectors and the sub-vector sets of all sub-clusters are obtained, step S400 calculates the Bayesian information value of each aggregation classification number and step S500 selects the aggregation classification number with the maximum Bayesian information value, which avoids misclassification caused by an unreasonable aggregation classification number. Finally, step S600 calculates the central vector set of the optimal aggregation classification number, and step S700 calculates the spatial average Euclidean distance and the multi-dimensional spatial included angle from that central vector set and then calculates the dispersion. The central vector set and the dispersion of the optimal aggregation classification number are taken as the output. They make it easier to distinguish differences in characteristic data between varieties and within a variety, so that the accuracy of seed classification is improved and the difficulty of accounting for differences in seed characteristics between batches, between varieties and across storage times is addressed.

In some embodiments of the present invention, after forming a data set by the kth characteristic parameter of the N seeds, the method further includes the steps of:

and carrying out median filtering on the data set to remove the outlier data.

Specifically, N seeds are obtained and M characteristic parameters are extracted from each seed, so that each K-th characteristic parameter has its own data set. To prevent individual data points in a data set from causing large deviations due to incidental factors, a median filter algorithm is applied to the data set of each characteristic parameter to remove outlier data.

Median filtering of each data set uses a median operator and a filter order, where the filter order is an odd number.

The median filter removes outliers caused by incidental factors from the data set without damaging the main trend of the data set as a whole, so that the data set reflects the actual data level of the seeds.
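The filtering step can be illustrated with a minimal sketch. It assumes a one-dimensional sliding-window median whose window length equals the odd filter order, and it pads the ends by repeating the edge values; the function name `median_filter`, the edge handling and the sample data are assumptions for illustration, not details taken from the source.

```python
import numpy as np

def median_filter(data, order=3):
    """Sliding-window median with an odd window length `order`.

    Edge handling (repeating the first/last value) is an assumption.
    """
    if order < 1 or order % 2 == 0:
        raise ValueError("filter order must be a positive odd number")
    x = np.asarray(data, dtype=float)
    half = order // 2
    padded = np.pad(x, half, mode="edge")
    # Each row of `windows` is one window of length `order`.
    windows = np.lib.stride_tricks.sliding_window_view(padded, order)
    return np.median(windows, axis=1)

# Hypothetical data set for one characteristic parameter; 9.0 is an outlier.
filtered = median_filter([1.0, 1.1, 9.0, 1.2, 1.0], order=3)
```

In this sketch the isolated value 9.0 is replaced by the median of its neighborhood, while the remaining values keep their overall trend.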

In some embodiments of the invention, the maximum aggregation classification number is determined from a preset aggregation classification number according to a threshold method, by a formula whose terms are the preset aggregation classification number and a rounding function.

The maximum aggregation classification number is determined by the threshold method rather than manually: only an approximate classification number needs to be input, and the algorithm calculates the maximum aggregation classification number from it, which saves labor and time.

In some embodiments of the invention, the fuzzy clustering algorithm stopping condition comprises: the iteration times exceed a threshold value or the variance of all data in the FIFO buffer area is less than 0.001, and the FIFO buffer area is used for storing the objective function value obtained by each iteration calculation of the fuzzy clustering algorithm.

For each clustering operation, the stopping condition consists of a first condition and a second condition, which are combined by a logical OR:

the first condition: the operation stops when the number of iterations exceeds 100;

the second condition: a FIFO buffer with a length of 30 points is defined to store the objective function value calculated in each iteration of the FCM algorithm (the fuzzy clustering algorithm used in this embodiment); the operation stops when the variance of all data in the FIFO buffer is less than 0.001.

These stopping conditions ensure that each aggregation classification operation performs sufficient clustering iterations and yields a relatively accurate clustering result.
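The two stopping conditions can be sketched as a small helper. This is an illustrative sketch only: the class name `StopCondition` is hypothetical, the variance is taken as the population variance, and the variance test is assumed to apply only once the 30-point FIFO buffer is full; the source does not specify these details.

```python
from collections import deque
from statistics import pvariance

class StopCondition:
    """Stop when iterations exceed max_iter OR the FIFO variance is below var_tol."""

    def __init__(self, max_iter=100, fifo_len=30, var_tol=1e-3):
        self.max_iter = max_iter
        self.var_tol = var_tol
        self.fifo = deque(maxlen=fifo_len)  # oldest value is dropped automatically
        self.iterations = 0

    def update(self, objective_value):
        """Record one iteration's objective value; return True if iteration should stop."""
        self.iterations += 1
        self.fifo.append(float(objective_value))
        if self.iterations > self.max_iter:       # first condition
            return True
        full = len(self.fifo) == self.fifo.maxlen  # assumption: wait for a full buffer
        return full and pvariance(self.fifo) < self.var_tol  # second condition
```

A clustering loop would call `update` once per iteration with the current objective function value and break when it returns `True`.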

In some embodiments of the invention, the Bayesian information value is calculated by a formula whose terms are the number of vector points of each sub-cluster, the covariance of each sub-cluster, the dimension, and a penalty factor.

The optimal aggregation classification number is obtained by evaluating the Bayesian information criterion for every aggregation classification number from 1 to the maximum aggregation classification number. Selecting the classification result with the maximum Bayesian information value improves the accuracy and robustness of seed classification.
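The selection step can be sketched as follows. The score below is NOT the formula of the source, which is not reproduced in this text; it is a generic BIC-style stand-in (Gaussian log-likelihood under hard assignment minus a penalty) built from the same ingredients the source names: the number of points per sub-cluster, the sub-cluster covariance, the dimension and a penalty factor. The function names, the `ridge` regularizer and the parameter count are assumptions.

```python
import numpy as np

def bic_like_score(X, labels, penalty=1.0, ridge=1e-9):
    """Generic BIC-style score (larger is better); a stand-in, not the patent's formula."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n_total, d = X.shape
    clusters = np.unique(labels)
    score = 0.0
    for j in clusters:
        pts = X[labels == j]
        n_i = len(pts)
        if n_i < 2:
            continue  # covariance is undefined for a single point
        cov = np.atleast_2d(np.cov(pts, rowvar=False)) + ridge * np.eye(d)
        _, logdet = np.linalg.slogdet(cov)
        score += (n_i * np.log(n_i / n_total)          # mixing proportion
                  - 0.5 * n_i * d * np.log(2 * np.pi)
                  - 0.5 * n_i * logdet                 # sub-cluster covariance
                  - 0.5 * (n_i - 1) * d)
    n_params = len(clusters) * (d + 1)                 # assumed parameter count
    return score - penalty * 0.5 * n_params * np.log(n_total)

def best_cluster_count(scores_by_k):
    """Return the aggregation classification number with the maximum score."""
    return max(scores_by_k, key=scores_by_k.get)
```

Under these assumptions, `best_cluster_count` mirrors the source's rule of choosing the aggregation classification number whose Bayesian information value is largest.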

Referring to FIG. 4 and FIG. 5, in some embodiments of the present invention, calculating the spatial average Euclidean distance and the multi-dimensional spatial included angle from the central vector set of the optimal aggregation classification number includes:

the spatial average euclidean distance is calculated by:

step S701, calculating the Euclidean distance between the central vector of each sub-cluster and the central vectors of the other sub-clusters;

step S702, averaging all Euclidean distances except the maximum Euclidean distance to obtain the average Euclidean distance of each sub-cluster;

step S703, averaging the average Euclidean distances of all the sub-clusters to obtain the spatial average Euclidean distance.
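Steps S701 to S703 can be sketched as follows. The function name is hypothetical, and the handling of fewer than three sub-clusters (where removing the maximum distance would leave nothing to average) is an assumption that the source does not address.

```python
import numpy as np

def spatial_average_distance(centers):
    """Mean over sub-clusters of the average distance to other centers, max excluded."""
    C = np.asarray(centers, dtype=float)
    k = len(C)
    if k < 2:
        return 0.0  # assumption: a single center has no inter-center distance
    # S701: pairwise Euclidean distances between all center vectors.
    dist = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
    per_center = []
    for i in range(k):
        others = np.delete(dist[i], i)            # distances to the other centers
        if len(others) > 1:                       # assumption: keep the only distance when k == 2
            others = np.delete(others, others.argmax())  # S702: drop the maximum
        per_center.append(others.mean())
    return float(np.mean(per_center))             # S703: average over sub-clusters

# Hypothetical centers forming a 3-4-5 right triangle.
d_avg = spatial_average_distance([[0, 0], [3, 0], [0, 4]])
```

For the triangle above, the per-center averages after dropping each maximum are 3, 3 and 4, so the result is 10/3.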

the multi-dimensional spatial included angle is calculated by:

step S707, calculating the average center point of the central vectors of all the sub-clusters, and calculating the included angle between the center point of the central vector of each sub-cluster and the average center point;

step S708, averaging the included angles between the center points of the central vectors of all the sub-clusters and the average center point to obtain the multi-dimensional spatial included angle.
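The angle calculation can be sketched under one explicit assumption: the included angle is taken to be the angle between the two position vectors (from the origin) of a sub-cluster center and of the average center point, computed as the arccosine of their cosine similarity. The source's own angle formula is not reproduced in this text, so this choice and the function name are illustrative only.

```python
import numpy as np

def mean_included_angle(centers):
    """Return (mean angle, per-center angles) in radians.

    Assumes the angle is measured at the origin between each center vector
    and the average center vector; zero-length vectors are not handled.
    """
    C = np.asarray(centers, dtype=float)
    mean_center = C.mean(axis=0)                      # average center point
    cos = (C @ mean_center) / (np.linalg.norm(C, axis=1) * np.linalg.norm(mean_center))
    angles = np.arccos(np.clip(cos, -1.0, 1.0))       # clip guards against rounding error
    return float(angles.mean()), angles

# Two hypothetical centers on the coordinate axes; each is 45 degrees from the mean.
theta, per_center = mean_included_angle([[1.0, 0.0], [0.0, 1.0]])
```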

By calculating the dispersion of the spatial distances between the center points of the sub-clusters, the method provides a comprehensive basis for judging the multi-feature classification of seeds, with low computational complexity and high realizability. It integrates the dispersion of the central data points of the various features, distinguishes differences in seed data between varieties and within a variety, and provides a way to classify seeds across varieties for sampling inspection and identification.

In some embodiments of the invention, the dispersion is calculated by a formula that combines the spatial average Euclidean distance and the multi-dimensional spatial included angle.

When seeds need to be compared and classified across varieties or within a variety, the dispersion is used as a parameter of the classification result to support that comparison, enabling more accurate and more targeted classification.

To facilitate understanding by those skilled in the art, one embodiment of the present invention provides a method for clustering and synthesizing based on seed multi-features, comprising the steps of:

Step one, data filtering:

N seeds are obtained and M characteristic parameters are extracted from each seed, so that each K-th characteristic parameter has its own data set. To prevent individual data points in a data set from causing large deviations due to incidental factors, a median filter algorithm is applied to the data set of each characteristic parameter to remove outlier data.

Median filtering of each data set uses a median operator and a filter order, where the filter order is an odd number.

Step two, aggregation classification:

First, the data sets are combined into a vector set with dimension M and length N, in which each vector point holds the M characteristic parameter values of one seed.
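Combining the per-feature data sets into the vector set can be sketched as a column stack. The function name and the sample values are hypothetical; the sketch assumes each of the M data sets is an equal-length sequence of N values, one per seed.

```python
import numpy as np

def build_vector_set(feature_data_sets):
    """Stack M per-feature data sets (each of length N) into an (N, M) array.

    Row n is the M-dimensional vector point of seed n.
    """
    columns = [np.asarray(col, dtype=float) for col in feature_data_sets]
    if len({len(col) for col in columns}) != 1:
        raise ValueError("every feature data set must contain one value per seed")
    return np.column_stack(columns)

# Hypothetical example: M = 2 features measured on N = 3 seeds.
vectors = build_vector_set([[5.1, 5.3, 4.9], [2.0, 2.2, 1.9]])
```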

Then, a preset aggregation classification number is set manually, and the final maximum aggregation classification number is determined from it by the threshold method.

The aggregation classification number takes each integer value from 1 to the maximum aggregation classification number in turn. For each value, a fuzzy clustering algorithm (the FCM algorithm in this embodiment) performs the aggregation classification operation on the vector set, yielding the central vector and the subset of each sub-cluster.

For each clustering operation, the stopping condition consists of a first condition and a second condition, which are combined by a logical OR:

the first condition: the operation stops when the number of iterations exceeds 100;

the second condition: a FIFO buffer with a length of 30 points is defined to store the objective function value calculated in each iteration of the FCM algorithm; the operation stops when the variance of all data in the FIFO buffer is less than 0.001.
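A single aggregation classification operation can be sketched with a textbook fuzzy c-means (FCM) loop. This is an illustrative sketch, not the implementation of the source: the fuzzifier `m = 2`, the random initialization, the hard assignment of each vector to its highest-membership sub-cluster, and the loop bound of at most `max_iter` iterations are all assumptions.

```python
import numpy as np
from collections import deque

def fuzzy_c_means(X, c, m=2.0, max_iter=100, fifo_len=30, var_tol=1e-3, seed=0):
    """Textbook FCM; returns (centers, memberships, per-cluster subsets)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)           # memberships of each point sum to 1
    fifo = deque(maxlen=fifo_len)               # recent objective function values
    for _ in range(max_iter):                   # iteration-count condition
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)             # avoid division by zero
        fifo.append(float(np.sum(Um * dist ** 2)))  # objective value of this iteration
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)    # standard membership update
        if len(fifo) == fifo_len and np.var(list(fifo)) < var_tol:
            break                               # FIFO-variance condition
    labels = U.argmax(axis=1)                   # assumption: hard assignment for subsets
    subsets = [X[labels == j] for j in range(c)]
    return centers, U, subsets
```

Calling `fuzzy_c_means(vectors, k)` for each `k` from 1 to the maximum aggregation classification number would give the central vectors and subsets that the later steps consume.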

Finally, the Bayesian information value of each aggregation classification number is calculated. The calculation uses the number of vector points of each sub-cluster, the covariance of each sub-cluster, the dimension, and a penalty factor.

From the Bayesian information values, the maximum is found, and the aggregation classification number that attains it is taken as the optimal aggregation classification number.

The set of sub-cluster central vectors computed for the optimal aggregation classification number is then selected.

Step three, dispersion calculation:

First, the spatial average Euclidean distance is calculated. For each center point, the Euclidean distances to the other center points are calculated; the maximum Euclidean distance is removed and the remaining distances are averaged to obtain the average Euclidean distance of that center point; the average Euclidean distances of all the center points are then averaged to obtain the spatial average Euclidean distance.

Then, the mean center point of the cluster center vector set is found, and the included angle between each center point and the mean center point is calculated; these included angles are averaged to obtain the multi-dimensional spatial included angle.

Finally, the dispersion is obtained by combining the spatial average Euclidean distance and the average multi-dimensional spatial included angle.

Referring to FIG. 2, an embodiment of the present invention further provides a seed multi-feature based clustering and synthesis system, which includes a data acquisition module 1001, a maximum aggregation classification number selection module 1002, an aggregation classification module 1003, a Bayesian information acquisition module 1004, an optimal aggregation classification number selection module 1005, a clustering center vector assembly module 1006, a dispersion calculation module 1007, and an output module 1008, wherein:

the data acquisition module 1001 is configured to acquire N seeds, extract M characteristic parameters of each seed, and respectively form a data set with the kth characteristic parameter of each seed; obtaining a vector set with a dimension of M and a length of N according to the data set; wherein N and M are integers, and K is an integer less than or equal to M.

The maximum aggregation classification number selecting module 1002 is configured to determine the maximum aggregation classification number from the preset aggregation classification numbers according to a threshold method.

The aggregation classification module 1003 is configured to sequentially select integers as the aggregation classification number, and to perform an aggregation classification operation on the vector set according to the aggregation classification number and a fuzzy clustering algorithm, so as to obtain the central vectors and the sub-vector sets of all sub-clusters of that aggregation classification number.

The Bayesian information obtaining module 1004 is configured to calculate a Bayesian information value for each aggregation classification number from the central vectors and the sub-vector sets of the sub-clusters.

The optimal aggregation classification number selecting module 1005 is configured to select the aggregation classification number corresponding to the maximum Bayesian information value as the optimal aggregation classification number.

The cluster center vector integrating module 1006 is configured to calculate the central vector set of the optimal aggregation classification number from the optimal aggregation classification number.

The dispersion calculating module 1007 is configured to calculate the spatial average Euclidean distance and the multi-dimensional spatial included angle from the central vector set of the optimal aggregation classification number, and to calculate the dispersion from the spatial average Euclidean distance and the multi-dimensional spatial included angle.

The output module 1008 is configured to output the central vector set of the optimal aggregation classification number and the dispersion as the result of seed clustering and synthesis.
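The data flow through the data obtaining module and the aggregation classification module can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes that the fuzzy clustering algorithm is the standard fuzzy c-means with fuzzifier 2, and it uses the stopping condition stated in claim 4 (iteration limit, or variance of the objective values held in a FIFO buffer below 0.001). The function names, the FIFO length of 5, the iteration limit, the deterministic initialization, and the seed measurements are all hypothetical.

```python
import math
import statistics
from collections import deque


def build_vector_set(feature_data_sets):
    """Turn M per-feature data sets (each of length N) into N vectors of dimension M."""
    if len({len(d) for d in feature_data_sets}) != 1:
        raise ValueError("every feature data set must cover the same N seeds")
    return [list(v) for v in zip(*feature_data_sets)]


def fuzzy_c_means(vectors, c, m=2.0, max_iter=100, fifo_len=5, tol=1e-3):
    """Minimal fuzzy c-means sketch; returns (central vectors, sub-vector sets)."""
    n, dim = len(vectors), len(vectors[0])
    # Assumed deterministic start: initial centers spread over the input order.
    centers = [list(vectors[(i * n) // c]) for i in range(c)]
    history = deque(maxlen=fifo_len)  # FIFO buffer of objective function values
    for _ in range(max_iter):
        # Membership update: u[i][j] is the degree of vector i in sub-cluster j.
        u = []
        for x in vectors:
            d = [max(math.dist(x, ctr), 1e-12) for ctr in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                      for j in range(c)])
        # Center update: membership-weighted mean of all vectors.
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            centers[j] = [sum(w[i] * vectors[i][t] for i in range(n)) / sum(w)
                          for t in range(dim)]
        # Record the objective value; stop once the buffered values have settled.
        obj = sum(u[i][j] ** m * math.dist(vectors[i], centers[j]) ** 2
                  for i in range(n) for j in range(c))
        history.append(obj)
        if len(history) == fifo_len and statistics.pvariance(history) < tol:
            break
    # Assign each vector to the sub-cluster in which its membership is highest.
    sub_clusters = [[] for _ in range(c)]
    for i, x in enumerate(vectors):
        sub_clusters[max(range(c), key=lambda j: u[i][j])].append(x)
    return centers, sub_clusters


# Hypothetical measurements: M = 2 features (length, width) for N = 6 seeds.
length = [5.0, 5.2, 5.1, 9.0, 9.1, 8.9]
width = [2.0, 2.1, 1.9, 4.0, 4.2, 3.9]
vectors = build_vector_set([length, width])
centers, sub_clusters = fuzzy_c_means(vectors, c=2)
```

In a full pipeline, `fuzzy_c_means` would be called once for each candidate aggregation classification number, and the resulting central vectors and sub-vector sets would be passed on to the Bayesian information calculation.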

It should be noted that, since the system for clustering and synthesizing based on seed multi-features in the present embodiment is based on the same inventive concept as the above-mentioned method for clustering and synthesizing based on seed multi-features, the corresponding contents of the method embodiment also apply to the present system embodiment and are not described in detail herein.

Referring to fig. 3, another embodiment of the present invention further provides an electronic device 6000, which may be any type of intelligent terminal, such as a mobile phone, a tablet computer, a personal computer, and the like.

Specifically, the electronic device 6000 includes one or more control processors 6001 and a memory 6002; Fig. 3 takes one control processor 6001 and one memory 6002 as an example. The control processor 6001 and the memory 6002 can be connected by a bus or by other means, as shown by way of example in Fig. 3.

The memory 6002 serves as a non-transitory computer-readable storage medium that can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the electronic device in an embodiment of the present invention.

The control processor 6001 executes the non-transitory software programs, instructions, and modules stored in the memory 6002 to perform the various functional applications and data processing of the seed multi-feature based clustering and synthesis method, i.e., to implement the seed multi-feature based clustering and synthesis method of the above-described method embodiments.

The memory 6002 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created when using the seed multi-feature based clustering and synthesis method, and the like. Further, the memory 6002 may include high-speed random access memory and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 6002 optionally includes memory that is remotely located from the control processor 6001, and such remote memory can be coupled to the electronic device 6000 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Stored in the memory 6002 are one or more modules that, when executed by the one or more control processors 6001, perform a method for seed multi-feature based clustering and synthesis in the above-described method embodiments, such as performing the method steps of fig. 1, 4, and 5 described above.

The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

It should be noted that, since an electronic device in the present embodiment is based on the same inventive concept as the above-mentioned method for clustering and synthesizing based on seed multi-features, the corresponding contents in the method embodiment are also applicable to the present apparatus embodiment, and are not described in detail herein.

An embodiment of the present invention also provides a computer-readable storage medium storing computer-executable instructions for performing the seed multi-feature based clustering and synthesis method of the above embodiments.

It should be noted that, since a computer-readable storage medium in the present embodiment and the above-mentioned method for clustering and synthesizing based on seed multi-features are based on the same inventive concept, the corresponding contents in the method embodiment are also applicable to the present apparatus embodiment, and are not described in detail herein.

One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of data such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired data and which can be accessed by the computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any data delivery media as known to one of ordinary skill in the art.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples" or the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A method for clustering and synthesizing based on seed multi-features, characterized by comprising the following steps:

；

From

and outputting the central vector set of the optimal aggregation classification number and the dispersion as the clustering and integrating result of the seeds.

2. The method for seed multi-feature based clustering and synthesis according to claim 1, wherein after the Kth feature parameters of N seeds are formed into a data set, the method further comprises the steps of:

and carrying out median filtering on the data set to remove the outlier data.

3. The seed multi-feature based clustering and synthesis method according to claim 1, wherein determining the maximum aggregation classification number from the preset aggregation classification numbers according to a threshold method comprises:

represents the preset aggregation classification number, and

represents a rounding function.

4. The method for seed multi-feature based clustering and synthesis according to claim 1, wherein the fuzzy clustering algorithm stopping condition comprises: the iteration times exceed a threshold value or the variance of all data in an FIFO buffer area is less than 0.001, and the FIFO buffer area is used for storing the objective function value obtained by each iteration calculation of the fuzzy clustering algorithm.

5. The method of claim 3, wherein the Bayesian information value calculation formula comprises:

represents the number of vector points of a sub-cluster,

represents the covariance of a sub-cluster,

represents the dimension, and

represents a penalty factor.

6. The method of claim 5, wherein the calculating a spatial mean Euclidean distance and a multidimensional spatial angle through the set of center vectors of the optimal aggregate classification number comprises:

calculating the spatially averaged euclidean distance by:

calculating the multi-dimensional spatial angle by:

wherein,

represents the mean center point,

represents the center point of the central vector of a sub-cluster;

and averaging the included angles between the central points of the central vectors of all the sub-clusters and the average central point to obtain the multi-dimensional space included angle.

7. The seed multi-feature based clustering and synthesis method of claim 6, wherein the dispersion is calculated by the following formula:

wherein,

represents the dispersion,

represents the spatial average Euclidean distance, and

represents the multi-dimensional spatial included angle.

8. A system for clustering and synthesis based on seed multi-features, comprising:

a maximum aggregation classification number selection module for determining the maximum aggregation classification number from the preset aggregation classification numbers according to a threshold method;

an aggregation classification module for sequentially selecting integers as the aggregation classification number, and performing an aggregation classification operation on the vector set according to the aggregation classification number and a fuzzy clustering algorithm to obtain the central vectors and the sub-vector sets of all sub-clusters of the aggregation classification number;

the optimal aggregation classification number selecting module is used for selecting the aggregation classification number corresponding to the maximum Bayesian information value as the optimal aggregation classification number;

the dispersion degree calculation module is used for calculating a space average Euclidean distance and a multi-dimensional space included angle through the central vector set of the optimal aggregation classification number; calculating to obtain dispersion according to the space average Euclidean distance and the multi-dimensional space included angle;

and the output module is used for outputting the central vector set of the optimal aggregation classification number and the dispersion as the clustering and integrating result of the seeds.

9. An electronic device, characterized in that: comprising at least one control processor and a memory for communicative connection with the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform the method of seed multi-feature based clustering and synthesis of any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that: the computer-readable storage medium stores computer-executable instructions for causing a computer to perform the method for seed multi-feature based clustering and synthesis of any one of claims 1 to 7.