CN115358349A - Data optimization clustering method

Info

Publication number: CN115358349A
Application number: CN202211277521.2A
Authority: CN
Inventors: 计爱幼
Original assignee: Jiangsu Yijiesi Information Technology Co ltd
Current assignee: Shenzhen Ruilian Credit Data Technology Co ltd
Priority date: 2022-10-19
Filing date: 2022-10-19
Publication date: 2022-11-18
Anticipated expiration: 2042-10-19
Also published as: CN115358349B

Abstract

The invention relates to the field of data processing, and in particular to a data optimization clustering method. The method comprises: acquiring data to obtain two-dimensional matrix data; obtaining an optimal segmentation threshold by maximizing the inter-class variance; segmenting the two-dimensional matrix data with the optimal segmentation threshold to obtain a binary matrix; obtaining row and column accumulation sum curves from the binary matrix; obtaining row and column characteristic peaks from those curves; analyzing each row and column characteristic peak to obtain its segmentation threshold; using these thresholds to obtain each row and column coordinate range; combining the row and column coordinate ranges to obtain each target area; setting an initial clustering center in each target area; and performing cluster analysis on the binary matrix data based on the initial clustering centers to obtain a set of classes. Because initial clustering centers set in this way lie closer to the center point of each class, the clustering computation becomes more efficient.

Description

Data optimization clustering method

Technical Field

The invention relates to the technical field of data processing, in particular to a data optimization clustering method.

Background

Traditional data processing usually relies on a clustering algorithm: dense areas of the data are located and corresponding regulation and control are carried out. In traditional cluster segmentation, several starting points are generally selected at random; to keep the initial seed points from being chosen too densely, a grid method may be used to select them. Data points are then clustered and merged around the initial seed points, finally achieving data segmentation and clustering. However, randomly selecting the initial seed points inevitably makes the algorithm computationally expensive, and many iterations are needed before the cluster segmentation result is reached.

In view of this, the invention provides a data optimization clustering method that analyzes the data, constructs row and column accumulation sum curves, and obtains the positions of the initial seed points from the variation of those curves, thereby achieving fast cluster segmentation with fewer iterations and less computation.

Disclosure of Invention

In order to solve the above technical problems, an object of the present invention is to provide a data optimized clustering method, which adopts the following technical solutions:

a method of optimized clustering of data, the method comprising:

collecting data to obtain a two-dimensional matrix;

obtaining a foreground data point set and a background data point set by performing threshold segmentation on the two-dimensional matrix; setting each data point in the two-dimensional matrix, which belongs to the foreground data point set, as 1, and setting each data point in the two-dimensional matrix data, which belongs to the background data point set, as 0 to obtain a binary matrix;

obtaining the abscissa range of each target according to the binary matrix, including: accumulating each row of data of the binary matrix to obtain row accumulation sums, recording the sequence formed by the row accumulation sums of all rows of the binary matrix as a row accumulation sum sequence, and constructing a row accumulation sum curve according to the row accumulation sum sequence; dividing the row accumulation sum curve to obtain a plurality of row characteristic peaks, obtaining a first slope and a second slope of each row characteristic peak, obtaining a first segmentation threshold of each row characteristic peak according to its first slope, its second slope and the peak itself, and obtaining a second segmentation threshold of each row characteristic peak according to its first segmentation threshold and the peak itself; drawing a straight line parallel to the horizontal axis at the second segmentation threshold of each row characteristic peak, wherein the straight line intersects the row characteristic peak at two points, and the abscissas of the two points form the abscissa range of each target;

similarly, obtaining the vertical coordinate range of each target according to the binary matrix; the abscissa range and the ordinate range of each target form a target range of each target;

and placing an initial clustering point in the target range of each target of the two-dimensional matrix, and performing mean shift clustering on foreground data points of the two-dimensional matrix based on the initial clustering points to obtain all categories.

Preferably, the method for obtaining the foreground data point set and the background data point set by performing threshold segmentation on the two-dimensional matrix includes:

respectively carrying out segmentation processing on the two-dimensional matrix data by utilizing different preset segmentation threshold values to obtain a first class and a second class, and calculating the class variance of the two classes according to the first class and the second class, wherein the calculation formula of the class variance of the two classes is as follows:

$$\sigma^2 = \omega_1 \omega_2 (\mu_1 - \mu_2)^2$$

wherein $\omega_1$ and $\omega_2$ represent the proportions of the number of data in the first class and in the second class, respectively, to the total number of data in the two-dimensional matrix; $\mu_1$ and $\mu_2$ represent the mean of the data in the first class and the mean of the data in the second class, respectively; and $\sigma^2$ represents the class variance between the first class and the second class;

and each preset segmentation threshold corresponds to one category variance, a preset segmentation threshold corresponding to the maximum value of the category variances is selected to be recorded as an optimal segmentation threshold, all data points in a first category obtained by segmenting the optimal segmentation threshold are recorded as a foreground data point set, and all data points in a second category obtained by segmenting the optimal segmentation threshold are recorded as a background data point set.

Preferably, the method for obtaining the first slope and the second slope of each row characteristic peak includes:

obtaining a maximum value, a first minimum value and a second minimum value of each row characteristic peak, and calculating a first slope of each row characteristic peak according to the maximum value and the first minimum value of each row characteristic peak, wherein the first slope calculation formula is as follows:

$$k_1 = \frac{y_{\max} - y_{\min,1}}{x_{\max} - x_{\min,1}}$$

wherein $x_{\max}$ represents the abscissa of the maximum of each row characteristic peak, $y_{\max}$ represents the ordinate of the maximum of each row characteristic peak, $x_{\min,1}$ represents the abscissa of the first minimum of each row characteristic peak, $y_{\min,1}$ represents the ordinate of the first minimum of each row characteristic peak, and $k_1$ represents the first slope of each row characteristic peak;

and similarly, calculating the second slope of each row characteristic peak according to the maximum value and the second minimum value of each row characteristic peak.

Preferably, the method for obtaining the first segmentation threshold of each row characteristic peak according to the first slope and the second slope of each row characteristic peak and each row characteristic peak includes:

for the maximum value, the first minimum value and the second minimum value of each row characteristic peak, acquiring the larger of the first minimum value and the second minimum value and recording it as the larger minimum value;

calculating the first segmentation threshold of each row characteristic peak from four quantities: the ordinate of the larger minimum value, the first slope of the row characteristic peak, the second slope of the row characteristic peak, and the ordinate of the maximum value; in the calculation formula, exp() denotes the exponential function with the natural constant as the base.

Preferably, the method for obtaining the second segmentation threshold of each row characteristic peak according to the first segmentation threshold of each row characteristic peak and each row characteristic peak includes: acquiring the maximum value of each row characteristic peak; and calculating the second segmentation threshold of each row characteristic peak from the first segmentation threshold of the row characteristic peak, the ordinate of the maximum of the row characteristic peak, a hyperparameter, an empirical scale factor, and the total number of columns of the binary matrix; in the calculation formula, exp() denotes the exponential function with the natural constant as the base.

The invention has the following beneficial effects. The embodiment of the invention obtains an optimal segmentation threshold by maximizing the between-class variance and uses it to segment the acquired two-dimensional matrix data into a binary matrix. Row and column accumulation sum curves are obtained by summing the data of each row and each column of the binary matrix, and the segmentation threshold of each characteristic peak is obtained by analyzing the characteristic peaks of these curves. The segmentation thresholds give the row and column coordinate ranges, from which all target ranges are obtained, and an initial clustering point is placed in each target range. Initial clustering points placed in this way lie closer to the center of each category, which saves clustering iteration time and improves clustering efficiency.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions and advantages of the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flowchart illustrating steps of a data optimized clustering method according to an embodiment of the present invention;

FIG. 2 is a statistical histogram of a data optimized clustering method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of foreground data points of a data optimized clustering method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a row accumulation sum curve of a data optimized clustering method according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a column accumulation sum curve of a data optimized clustering method according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a smoothed row accumulation sum curve of a data optimized clustering method according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a smoothed column accumulation sum curve of a data optimized clustering method according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of row characteristic peaks of a data optimized clustering method according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a segmented row characteristic peak of a data optimized clustering method according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a target range of a data optimized clustering method according to an embodiment of the present invention.

Detailed Description

To further explain the technical means adopted by the present invention to achieve its intended objects, and their effects, the data optimization clustering method according to the present invention, together with its specific implementation, structure, features and effects, is described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following describes a specific scheme of the data optimization clustering method provided by the invention in detail with reference to the accompanying drawings.

Referring to fig. 1, a flowchart of a data optimized clustering method according to an embodiment of the present invention is shown, where the method includes:

Step S001: Acquire data and construct a two-dimensional matrix.

The acquired data is the data whose cluster centers are to be found; it may be plain numerical data or image data.

If the data is plain numerical data, it is arranged into a two-dimensional matrix, which increases the relevance among the data. For example, vibration signal data is one-dimensional time-series data, and it is difficult to detect abnormalities accurately by analyzing the one-dimensional vibration signal directly. The two-dimensional matrix is obtained from the one-dimensional time-series data as follows: the one-dimensional time-series data is uniformly divided into $M$ subsequences of length $N$, and the $M$ subsequences of length $N$ are arranged into an $M \times N$ matrix, which is referred to as the two-dimensional matrix.
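The construction of the two-dimensional matrix from a one-dimensional series can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `to_matrix`, the sample signal, and the decision to drop trailing samples that do not fill a complete row are all assumptions.

```python
def to_matrix(series, row_length):
    """Split a 1-D sequence into consecutive rows of equal length.

    Assumption: trailing samples that do not fill a whole row are dropped.
    """
    n_rows = len(series) // row_length
    return [list(series[r * row_length:(r + 1) * row_length])
            for r in range(n_rows)]


# Hypothetical vibration samples, arranged into rows of 4 values each.
signal = [0.1, 0.2, 0.9, 0.8, 0.1, 0.0, 0.7, 0.9, 0.2, 0.1]
matrix = to_matrix(signal, row_length=4)
print(matrix)
```

With ten samples and a row length of four, the sketch yields two full rows and discards the last two samples; padding instead of truncating would be an equally plausible choice.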

If the data is image data, the image is itself a two-dimensional matrix and no matrix construction is needed. For example, in mildew detection on citrus, a layer of white lime is artificially applied to the fruit surface to prevent insects, while mildew produces white colonies; white points are obtained through threshold segmentation, dense areas are obtained through clustering, and mildew is judged by analyzing how densely the white pixels are distributed.

Step S002: Obtain a binary matrix from the two-dimensional matrix, and determine initial clustering points from the binary matrix.

1. Segmenting the two-dimensional matrix to obtain a foreground data point set and a background data point set:

Collected data is usually not analyzed in its entirety; often only the foreground data is analyzed, so the foreground data points must first be extracted from the collected data. For the data addressed by the invention, foreground and background data points differ markedly: for example, in a vibration signal, abnormal values are clearly greater than normal values, and lime or mildew on the surface of a citrus fruit is clearly distinguishable from the color of the peel. The two-dimensional matrix data is therefore segmented with a bimodal method: the data points in the two-dimensional matrix are counted to obtain a statistical histogram, as shown in FIG. 2, and an optimal segmentation threshold is obtained by maximizing the inter-class variance, specifically as follows:

Obtain $K$ preset segmentation thresholds $T_1, T_2, \ldots, T_K$. Taking the preset threshold $T_k$ as an example, data in the two-dimensional matrix greater than $T_k$ are assigned to the first category, and data smaller than $T_k$ are assigned to the second category. The same is done for each of the $K$ preset segmentation thresholds, yielding a first category and a second category for each threshold.

Calculating the category variance of the two categories according to the first category and the second category, wherein the calculation formula of the category variance of the two categories is as follows:

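$$\sigma^2 = \omega_1 \omega_2 (\mu_1 - \mu_2)^2$$

wherein $\omega_1$ and $\omega_2$ represent the proportions of the number of data in the first category and in the second category, respectively, to the total number of data in the two-dimensional matrix; $\mu_1$ and $\mu_2$ represent the mean of the data in the first category and the mean of the data in the second category, respectively; and $\sigma^2$ represents the category variance between the first category and the second category.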

Each preset threshold corresponds to one category variance, and the preset segmentation threshold corresponding to the maximum category variance is selected and recorded as the optimal segmentation threshold. All data points in the first category obtained with the optimal segmentation threshold are recorded as the foreground data point set, and all data points in the second category are recorded as the background data point set. The foreground data points are shown in FIG. 3, in which the abscissa represents the normalized row number and the ordinate represents the normalized column number. The normalized values are obtained as follows: the number of each row is divided by the total number of rows to give the normalized row number, and the number of each column is divided by the total number of columns to give the normalized column number.

And setting the value of the data point in the foreground data point set in the two-dimensional matrix as 1 and the value of the data point in the background data point set in the two-dimensional matrix as 0, thereby obtaining the binary matrix.
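The threshold search and binarization described above can be sketched in plain Python. The sketch assumes the category variance takes the usual Otsu form, the product of the two category proportions and the squared difference of the category means. The names `between_class_variance` and `binarize`, the sample matrix, and the candidate thresholds are illustrative only, and values exactly equal to a threshold are assigned to the second category, a case the text leaves open.

```python
def between_class_variance(values, threshold):
    """Otsu-style variance between values above and not above the threshold."""
    first = [v for v in values if v > threshold]      # first category
    second = [v for v in values if v <= threshold]    # second category (assumed to include ties)
    if not first or not second:
        return 0.0
    w1 = len(first) / len(values)
    w2 = len(second) / len(values)
    mu1 = sum(first) / len(first)
    mu2 = sum(second) / len(second)
    return w1 * w2 * (mu1 - mu2) ** 2


def binarize(matrix, candidate_thresholds):
    """Pick the candidate with the largest variance and build the binary matrix."""
    values = [v for row in matrix for v in row]
    best = max(candidate_thresholds,
               key=lambda t: between_class_variance(values, t))
    binary = [[1 if v > best else 0 for v in row] for row in matrix]
    return best, binary


# Hypothetical data: large values play the role of foreground points.
best, binary = binarize([[1, 2, 9], [8, 1, 2]], candidate_thresholds=[1, 3, 5, 7])
print(best, binary)
```

When several candidates give the same maximal variance, `max` returns the first of them; any other tie-breaking rule would serve equally well here.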

2. Constructing a row accumulation sum curve and a column accumulation sum curve:

For the foreground data points in the binary matrix, it is generally necessary to find regions where the data points are densely distributed. For example, the denser the distribution of data points with large vibration values in a vibration signal, the greater the probability that the corresponding device is abnormal; likewise, among the white pixels of a citrus image, a mildew region usually appears as a cluster, whereas surface lime forms an irregular sheet-like white region or discretely distributed white regions. Because cluster segmentation generally requires finding the dense areas of foreground pixels, a row accumulation sum curve and a column accumulation sum curve are constructed as follows:

accumulating each column of data in the binary matrix:

$$C_j = \sum_{i=1}^{M} b_{i,j}$$

in the formula, $C_j$ represents the accumulated sum of all data in the $j$-th column of the binary matrix, $M$ represents the number of rows of the binary matrix, and $b_{i,j}$ represents the value of the data point in the $j$-th column and $i$-th row of the binary matrix. All columns of the binary matrix together give the column accumulation sum sequence $\{C_1, C_2, \ldots, C_N\}$, wherein $C_j$ represents the accumulated value of the $j$-th column of data of the binary matrix and $N$ represents the total number of columns of the binary matrix.

Similarly, accumulating each row of data points in the binary matrix:

$$R_i = \sum_{j=1}^{N} b_{i,j}$$

in the formula, $R_i$ represents the accumulated sum of all data in the $i$-th row of the binary matrix, $N$ represents the number of columns of the binary matrix, and $b_{i,j}$ represents the value of the data point in the $i$-th row and $j$-th column of the binary matrix. All rows of the binary matrix together give the row accumulation sum sequence $\{R_1, R_2, \ldots, R_M\}$, wherein $R_i$ represents the accumulated value of the $i$-th row of the binary matrix and $M$ represents the number of rows.

Using the row accumulation sum sequence, a row accumulation sum curve is drawn with the normalized row number as the abscissa and the accumulation sum of each row as the ordinate, as shown in FIG. 4; using the column accumulation sum sequence, a column accumulation sum curve is drawn with the normalized column number as the abscissa and the accumulation sum of each column as the ordinate, as shown in FIG. 5. Because the density of the data point distribution varies across the binary matrix, some areas are dense and others sparse, and sparse areas contain few or no data points, so the value for some rows or columns of the accumulation sum curves is 0. The curves therefore fluctuate strongly, which makes them difficult to analyze and prone to large errors. The curves are therefore smoothed: smoothing removes small fluctuations while preserving the overall trend of the curve, and the smoothed-out parts have little effect on the subsequent selection of initial points. Gaussian smoothing is applied to the row accumulation sum curve and the column accumulation sum curve; the smoothed row accumulation sum curve is shown in FIG. 6 and the smoothed column accumulation sum curve in FIG. 7.
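The accumulation sums and the Gaussian smoothing can be illustrated with a short sketch. The text does not give the smoothing parameters, so the standard deviation `sigma`, the truncation of the kernel at three standard deviations, and the renormalization of the kernel at the ends of the curve are assumptions, as are the function names.

```python
import math


def row_and_column_sums(binary):
    """Row sums and column sums of a binary matrix given as a list of rows."""
    row_sums = [sum(row) for row in binary]
    col_sums = [sum(col) for col in zip(*binary)]
    return row_sums, col_sums


def gaussian_smooth(curve, sigma=1.0):
    """Smooth a curve with a truncated Gaussian kernel.

    Assumption: near the ends, only the kernel weights that fall inside
    the curve are used, and they are renormalized to sum to one.
    """
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    smoothed = []
    for i in range(len(curve)):
        num = 0.0
        den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(curve):
                num += w * curve[j]
                den += w
        smoothed.append(num / den)
    return smoothed


row_sums, col_sums = row_and_column_sums([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
spike = gaussian_smooth([0, 0, 6, 0, 0])
print(row_sums, col_sums, spike)
```

The smoothed spike is lower and wider than the original but keeps its maximum at the same position, which is the property the later peak analysis relies on.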

3. Analyzing the row accumulation sum curve and the column accumulation sum curve to obtain target areas

Analysis shows that when a row or column contains many target data points, the target data points in the corresponding part of the binary matrix are likely to be densely distributed; this corresponds to the higher portions of the row and column accumulation sum curves. Rows or columns with the most or the fewest target data points correspond to the maximum and minimum points of the curves. When a maximum point of a curve is large, the corresponding row or column most likely contains several dense areas. When a dense area exists in the binary matrix, several consecutive values of the row and column accumulation sum sequences are large and the corresponding curve shows a peak; the higher the peak, the more likely it is that several dense clusters exist, and when a dense cluster is sparsely distributed, the corresponding peak is wider. If the initial clustering points were determined only from the maximum points of the row and column accumulation sum curves, they would be inaccurate. It is therefore desirable to take the distribution of target data points in the binary matrix into account and determine several initial clustering points; that is, several initial clustering points are selected for a sparsely distributed dense cluster so that iteration reaches the dense center point faster. The curve is therefore first segmented at its local extreme points into several characteristic peaks, as follows:

(1) Obtaining characteristic peaks

The analysis is based on the smoothed row accumulation sum curve and column accumulation sum curve. Taking the row accumulation sum curve as an example, all minimum points of the curve are acquired and the curve is divided into a plurality of row characteristic peaks with the minimum points as division boundaries; the resulting row characteristic peaks are shown in FIG. 8.

And in the same way, the column accumulation sum curve is divided to obtain a plurality of column characteristic peaks.
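Dividing a smoothed curve into characteristic peaks at its local minima can be sketched as below. The function name `split_into_peaks` is hypothetical, and two details are assumptions not stated in the text: the two end points of the curve are treated as division boundaries, and a segment is kept only if its highest point lies strictly above both of its boundaries.

```python
def split_into_peaks(curve):
    """Return (left, top, right) index triples, one per characteristic peak."""
    n = len(curve)
    if n < 3:
        return []
    # Division boundaries: both end points plus every interior local minimum.
    boundaries = [0]
    for i in range(1, n - 1):
        if curve[i] <= curve[i - 1] and curve[i] < curve[i + 1]:
            boundaries.append(i)
    boundaries.append(n - 1)
    peaks = []
    for left, right in zip(boundaries, boundaries[1:]):
        top = max(range(left, right + 1), key=lambda i: curve[i])
        if curve[top] > max(curve[left], curve[right]):
            peaks.append((left, top, right))
    return peaks


# Hypothetical smoothed row accumulation sums with two humps.
curve = [0, 2, 5, 2, 1, 4, 7, 3, 0]
peaks = split_into_peaks(curve)
print(peaks)
```

Each triple gives the indices of the first minimum, the maximum, and the second minimum of one peak, which are the three points the subsequent slope calculation uses.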

(2) Determining a second segmentation threshold for each characteristic peak:

Taking row characteristic peaks as an example, the method for determining the second segmentation threshold of each characteristic peak is as follows:

each row characteristic peak has one maximum point and two minimum points. The first slope is calculated using the maximum value and the first of the two minimum values, and the calculation formula is as follows:

$$k_1 = \frac{y_{\max} - y_{\min,1}}{x_{\max} - x_{\min,1}}$$

wherein $k_1$ represents the first slope of each row characteristic peak, $x_{\max}$ represents the abscissa of the maximum point of each row characteristic peak, $y_{\max}$ represents the ordinate of the maximum point, and $x_{\min,1}$ and $y_{\min,1}$ represent the abscissa and ordinate of the first minimum point.

Similarly, the second slope $k_2$ is calculated using the maximum value and the second of the two minimum values.
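As an illustration, the two slopes of a peak can be computed as ordinary rise over run between the maximum point and each bounding minimum point. This reading is an assumption: under it the second slope comes out negative, and whether the original formula takes an absolute value cannot be determined from this text. The function name and sample values are hypothetical.

```python
def peak_slopes(xs, ys, left, top, right):
    """Slopes from each bounding minimum to the maximum of one peak.

    xs holds the normalized row numbers, ys the accumulation sums;
    left, top and right are indices of the first minimum, the maximum
    and the second minimum.
    """
    first = (ys[top] - ys[left]) / (xs[top] - xs[left])
    second = (ys[top] - ys[right]) / (xs[top] - xs[right])
    return first, second


# Eight rows, so row number i (1-based) is normalized to i / 8.
xs = [(i + 1) / 8 for i in range(8)]
ys = [0, 2, 5, 2, 1, 4, 7, 3]
k1, k2 = peak_slopes(xs, ys, left=0, top=2, right=4)
print(k1, k2)
```

A steep pair of slopes indicates a narrow peak, which the text associates with a concentrated cluster; a shallow pair indicates a wide peak and a more dispersed cluster.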

The threshold is selected adaptively according to the slope. The larger the slope of a characteristic peak, the more concentrated the distribution of target data points in the binary matrix, so the selection range for the corresponding initial point can be smaller. Conversely, the smaller the slope, the more dispersed the target data points in the binary matrix; in that case, to speed up clustering, more initial clustering points need to be selected, so the selection range for initial points can be larger. A first segmentation threshold is calculated for each row characteristic peak from four quantities: the first slope and the second slope of the peak, the ordinate of the maximum point of the peak, and the ordinate of the larger of the two local minimum points. The smaller the slope of the characteristic peak, the wider the peak, that is, the more likely the corresponding region is a sparser dense cluster.

At this point the initial segmentation is complete. For each characteristic peak, a different peak height reflects a different number of target data points in the corresponding rows or columns; the higher the peak, the more likely it is that several dense clusters lie in the same row or column. The initial segmentation threshold is therefore adjusted according to the peak height: the more clusters there are, the larger the range that must be considered, and the region obtained by the initial segmentation cannot adequately contain most of the data points of sparse clusters. The adjustment yields a second segmentation threshold for each row characteristic peak, calculated from the first segmentation threshold of the peak, the ordinate of the maximum point of the peak, the number of columns of the binary matrix, a hyperparameter set to an empirical value, and an empirical scale factor.

In the same way, a second segmentation threshold is obtained for each column characteristic peak.

(3) Obtaining each target area according to the second segmentation threshold of each characteristic peak

The row coordinate range of each row characteristic peak is obtained from the peak and the row accumulation sum curve: a straight line parallel to the horizontal axis is drawn on the row accumulation sum curve at the height of the second segmentation threshold of each row characteristic peak; this line intersects the row characteristic peak at two points, whose abscissas are $x_a$ and $x_b$ respectively. The image obtained by segmenting a characteristic peak with the second segmentation threshold is shown in FIG. 9. The range $[x_a, x_b]$ is taken as the row coordinate range of each row characteristic peak. Similarly, the column coordinate range of each column characteristic peak is obtained from the peak and the column accumulation sum curve.

A target area is obtained from each row coordinate range and column coordinate range, so all the row and column coordinate ranges together yield a plurality of target areas, as shown in FIG. 10.

(4) Determining initial clustering points according to target range

And placing an initial clustering point at the geometric center of each target range, and obtaining an initial clustering point set in all target ranges.
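The last two sub-steps can be sketched together: finding where a horizontal line at a given threshold crosses a peak, and placing one initial point at the center of each target range. The threshold is passed in as `level` because its formula is not reproduced here. The function names, the linear interpolation between samples, and the pairing of every row range with every column range are assumptions made for illustration.

```python
from itertools import product


def crossing_range(xs, ys, left, right, level):
    """Abscissa range where the peak between indices left..right crosses `level`.

    Crossings are located by linear interpolation between neighbouring samples.
    """
    crossings = []
    for i in range(left, right):
        y0, y1 = ys[i], ys[i + 1]
        if y0 != y1 and (y0 - level) * (y1 - level) <= 0:
            t = (level - y0) / (y1 - y0)
            crossings.append(xs[i] + t * (xs[i + 1] - xs[i]))
    return (min(crossings), max(crossings)) if crossings else None


def initial_points(row_ranges, col_ranges):
    """One point at the geometric centre of each row-range x column-range rectangle."""
    return [((r0 + r1) / 2, (c0 + c1) / 2)
            for (r0, r1), (c0, c1) in product(row_ranges, col_ranges)]


row_range = crossing_range([0.0, 1.0, 2.0, 3.0, 4.0], [0, 2, 6, 2, 0],
                           left=0, right=4, level=4)
seeds = initial_points([row_range], [(0.0, 1.0), (3.0, 5.0)])
print(row_range, seeds)
```

Pairing all row ranges with all column ranges can produce rectangles that contain no foreground points; a practical variant might discard such empty rectangles before seeding.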

Step S003: Perform clustering based on the initial clustering points.

Based on the initial clustering point set, clustering foreground data points in the binary matrix by means of mean shift clustering to obtain a plurality of category sets.
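A minimal sketch of mean shift clustering started from a given set of initial points follows. The text names mean shift but does not specify the kernel or its parameters, so the flat (uniform) kernel, the `bandwidth` value, the convergence tolerance, and the rule for merging points that converge to the same place are all assumptions. The sketch also assumes at least one initial point is supplied.

```python
import math


def mean_shift(points, seeds, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift from given seeds; returns centres and point labels."""
    centres = []
    for cx, cy in seeds:
        for _ in range(max_iter):
            near = [(x, y) for x, y in points
                    if math.hypot(x - cx, y - cy) <= bandwidth]
            if not near:
                break
            nx = sum(x for x, _ in near) / len(near)
            ny = sum(y for _, y in near) / len(near)
            shift = math.hypot(nx - cx, ny - cy)
            cx, cy = nx, ny
            if shift < tol:
                break
        # Merge seeds that converged to (nearly) the same centre.
        if not any(math.hypot(cx - mx, cy - my) < bandwidth / 2
                   for mx, my in centres):
            centres.append((cx, cy))
    # Assign every point to its nearest converged centre.
    labels = [min(range(len(centres)),
                  key=lambda k: math.hypot(x - centres[k][0], y - centres[k][1]))
              for x, y in points]
    return centres, labels


# Hypothetical foreground points forming two groups, with seeds near each group.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centres, labels = mean_shift(points, seeds=[(0.4, 0.6), (10.5, 10.5)], bandwidth=2)
print(centres, labels)
```

Because each seed already lies close to the center of its group, both seeds settle within two iterations, which illustrates the saving in iterations that the method aims for.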

In summary, the embodiments of the present invention obtain an optimal segmentation threshold by maximizing the inter-class variance and use it to segment the acquired two-dimensional matrix data into a binary matrix. Row and column accumulation sum curves are obtained by summing the data of each row and each column of the binary matrix, and the segmentation threshold of each characteristic peak is obtained by analyzing the characteristic peaks of these curves. The segmentation thresholds give the row and column coordinate ranges, from which all target ranges are obtained, and an initial clustering point is placed in each target range. Initial clustering points placed in this way lie closer to the center of each class, which saves clustering iteration time and improves clustering efficiency.

It should be noted that: the sequence of the above embodiments of the present invention is only for description, and does not represent the advantages or disadvantages of the embodiments. The processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The embodiments in the present specification are described in a progressive manner. For parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the other embodiments.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A data optimization clustering method, characterized by comprising the following steps:

collecting data to obtain a two-dimensional matrix;

2. The data optimization clustering method according to claim 1, wherein obtaining the foreground data point set and the background data point set by performing threshold segmentation on the two-dimensional matrix comprises:

segmenting the two-dimensional matrix data with each of a plurality of different preset segmentation thresholds to obtain a first category and a second category, and calculating the class variance between the first category and the second category, wherein the class variance is calculated as follows:

wherein the symbols in the formula represent, respectively, the mean of the data in the first category, the mean of the data in the second category, and the class variance between the first category and the second category;

each preset segmentation threshold corresponds to one class variance; the preset segmentation threshold corresponding to the maximum class variance is selected and recorded as the optimal segmentation threshold; all data points in the first category obtained by segmenting with the optimal segmentation threshold are recorded as the foreground data point set, and all data points in the second category obtained by segmenting with the optimal segmentation threshold are recorded as the background data point set.

3. The data optimization clustering method according to claim 1, wherein obtaining the first slope and the second slope of each row characteristic peak from each row characteristic peak comprises:

wherein the symbols in the formula represent, respectively, the abscissa of the maximum of each row characteristic peak, the ordinate of the maximum of each row characteristic peak, the abscissa of the first minimum of each row characteristic peak, the ordinate of the first minimum of each row characteristic peak, and the first slope of each row characteristic peak;

4. The data optimization clustering method according to claim 1, wherein obtaining the first segmentation threshold of each row characteristic peak from each row characteristic peak and its first slope and second slope comprises:

for the maximum value, the first minimum value and the second minimum value of each row characteristic peak, acquiring the larger of the first minimum value and the second minimum value and recording it as the larger minimum value;

wherein the symbols in the formula represent, respectively, the ordinate of the larger of the two minimum values, the first slope of each row characteristic peak, the second slope of each row characteristic peak, the ordinate of the maximum value, and the first segmentation threshold of each row characteristic peak; exp() denotes the exponential function with the natural constant as its base.

5. The data optimization clustering method according to claim 1, wherein obtaining the second segmentation threshold of each row characteristic peak from each row characteristic peak and its first segmentation threshold comprises: acquiring the maximum value of each row characteristic peak; and calculating the second segmentation threshold of each row characteristic peak from the maximum value of each row characteristic peak and the first segmentation threshold of each row characteristic peak according to the following formula:

wherein the symbols in the formula represent, respectively, the first segmentation threshold of each row characteristic peak, the ordinate of the maximum of each row characteristic peak, a hyper-parameter, an empirical scale factor,