CN108764991B

CN108764991B - Supply chain information analysis method based on K-means algorithm

Info

Publication number: CN108764991B
Application number: CN201810495753.2A
Authority: CN
Inventors: 彭力; 苏洪舰
Original assignee: Jiangnan University
Current assignee: Jiangnan University
Priority date: 2018-05-22
Filing date: 2018-05-22
Publication date: 2021-11-02
Anticipated expiration: 2038-05-22
Also published as: CN108764991A

Abstract

The invention relates to a supply chain information analysis method based on the K-means algorithm, comprising the following steps: cleaning the supply chain data and extracting a first matrix array, the first matrix array being a revenue-cost matrix array; clustering the first matrix array with a K-means algorithm using a first cluster number K1, and generating a first image in the form of a scatter plot; adding a first reference line and a second reference line to the first image; and dividing the first image into three parts along the first and second reference lines, wherein the middle region indicates stable operation of the enterprise and the upper and lower regions indicate abnormal operation. The method performs K-means cluster analysis on revenue, cost and gross profit, and adapts the K-means algorithm to data in the supply chain field by establishing reference lines.

Description

Supply chain information analysis method based on K-means algorithm

Technical Field

The invention relates to supply chain information analysis, in particular to a supply chain information analysis method based on a K-means algorithm.

Background

Current research has proposed many algorithms and experimental schemes for extracting and analyzing the large and complicated body of supply chain information. For the supply chain data of a single enterprise, the most typical approach is conventional data statistics, which is very easy to implement. Building on it, cross-comparison analysis cuts the data into several segments and analyzes them by comparing the segments. Faced with large and redundant supply chain data, most applications remain at the statistics stage: logistics data in the supply chain are integrated, counted and visualized so that the overall supply chain process runs more efficiently. Conventional statistical methods focus mainly on descriptive statistics, cross reports, hypothesis tests and similar problems. Data mining algorithms have gradually been applied to supply chain information; a naive Bayes classifier, for example, can classify supply chain data effectively, but the information it mines says little about the current operating situation of the enterprise.

The traditional technology has the following technical problems:

At present, data mining and analysis of supply chain information are common, and the usual analysis methods center on statistical and classification algorithms. Their results, however, are limited to improving the efficiency of the supply chain process; they make it difficult to assess the overall operating situation of a company or to support future business decisions.

Disclosure of Invention

It is therefore necessary to provide a supply chain information analysis method based on the K-means algorithm that addresses the above technical problems: it performs K-means cluster analysis on revenue, cost and gross profit, and adapts the K-means algorithm to data in the supply chain field by establishing reference lines.

A supply chain information analysis method based on a K-means algorithm comprises the following steps:

carrying out data cleaning on supply chain data, and extracting a first matrix array, wherein the first matrix array is a revenue-cost matrix array;

clustering the first matrix array by using a K-means algorithm with a first cluster number K1, and generating a first image in the form of a scatter plot;

adding a first reference line and a second reference line to the first image;

cutting the first image into three parts according to the first reference line and the second reference line, wherein the middle part area shows the stable operation condition of the enterprise, and the upper part and the lower part show the abnormal operation condition of the enterprise;

wherein the first reference line is y = ax, tan 50° < a < tan 60°;

the second reference line is y = bx, tan 30° < b < tan 40°.

In another embodiment, a = tan 55°.

In another embodiment, b = tan 35°.
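The three-way split produced by the two reference lines can be sketched as follows. This is a minimal illustration, assuming revenue is on the x-axis, cost is on the y-axis, and revenue is positive; the function name `classify_band` and the treatment of points lying exactly on a line (counted as the middle region) are assumptions, not part of the original text.

```python
import math

# Slopes from the embodiment: a = tan 55 deg, b = tan 35 deg.
A_SLOPE = math.tan(math.radians(55))
B_SLOPE = math.tan(math.radians(35))


def classify_band(revenue, cost, a=A_SLOPE, b=B_SLOPE):
    """Return the region of the first image that a (revenue, cost) point falls in.

    Assumes revenue > 0. Points exactly on a reference line are counted as
    "middle" (an assumption made for this sketch).
    """
    if cost > a * revenue:
        return "upper"   # above y = a*x: abnormal operation
    if cost < b * revenue:
        return "lower"   # below y = b*x: abnormal operation
    return "middle"      # between the two lines: stable operation


# Illustrative (revenue, cost) points, not real supply chain data.
points = [(100.0, 160.0), (100.0, 100.0), (100.0, 50.0)]
bands = [classify_band(x, y) for x, y in points]
print(bands)
```

Any slope pair inside the claimed ranges can be passed through the `a` and `b` parameters instead of the defaults.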

In another embodiment, a third reference line is further added to the first image according to the distribution of the K1 categories of data points in the first image;

the third reference line is x = L_k (1 ≤ k ≤ K1); the third reference lines divide the revenue into K1+1 segments, so that the profit and loss of the enterprise supply chain can be analyzed for each of the K1+1 parts.

A supply chain information analysis method based on a K-means algorithm comprises the following steps:

carrying out data cleaning on supply chain data, and extracting a second matrix array, wherein the second matrix array is a revenue-profit matrix array;

clustering the second matrix array by using a K-means algorithm with a second cluster number K2, and generating a second image in the form of a scatter plot;

adding a fourth reference line to the second image;

the area below the fourth reference line represents losses, and the current operating condition of the enterprise is determined from the density of the data points in that area;

the fourth reference line is y = 0.
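One simple way to quantify the density of points below the fourth reference line is the share of data points with negative gross profit. The sketch below assumes each point is a (revenue, gross profit) pair; the name `loss_share` and the choice of a plain fraction as the density measure are assumptions made for illustration.

```python
def loss_share(points):
    """Fraction of (revenue, gross_profit) points lying strictly below y = 0."""
    if not points:
        return 0.0
    losses = sum(1 for _, profit in points if profit < 0)
    return losses / len(points)


# Illustrative (revenue, gross profit) points, not real supply chain data.
sample = [(120.0, 30.0), (80.0, -5.0), (200.0, 0.0), (60.0, -12.0)]
share = loss_share(sample)
print(share)
```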

In another embodiment, a fifth reference line is further added to the second image according to the distribution of the K2 categories of data points in the second image;

the fifth reference line is x = L_m (1 ≤ m ≤ K2); the fifth reference lines divide the revenue into K2+1 segments, so that the risk and profitability of each of the K2+1 parts can be analyzed.

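In another embodiment, a computer device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of any of the above methods when executing the program.

In a further embodiment, a computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of any of the above methods.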

According to the supply chain information analysis method based on the K-means algorithm, the K-means algorithm clustering analysis is carried out on the revenue, the cost and the gross profit, and the K-means algorithm is improved by establishing the reference line, so that the algorithm is more suitable for the data in the field of the supply chain.

Drawings

Fig. 1 is a schematic diagram of a first image generated by a supply chain information analysis method based on a K-means algorithm according to an embodiment of the present application.

Fig. 2 is a schematic diagram of a second image generated by a supply chain information analysis method based on a K-means algorithm according to an embodiment of the present application.

Fig. 3 is a schematic diagram illustrating a result of clustering a first matrix sequence in a supply chain information analysis method based on a K-means algorithm according to an embodiment of the present application.

Fig. 4 is a schematic diagram illustrating a result of clustering a second matrix sequence in the supply chain information analysis method based on the K-means algorithm according to the embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

adding a first reference line and a second reference line to the first image;

wherein the first reference line is y = ax, tan 50° < a < tan 60°;

the second reference line is y = bx, tan 30° < b < tan 40°.

In another embodiment, a = tan 55°.

In another embodiment, b = tan 35°.

adding a fourth reference line to the second image;

the fourth reference line is y = 0.

The following describes a specific application scenario of the present invention:

(1) The supply chain data are first cleaned, and three columns of data (revenue, cost and gross profit) are extracted to form two matrix arrays: revenue-cost and revenue-gross profit.

(2) K-means modeling is performed separately on the two matrix arrays obtained in step (1), with the number of clusters selected according to the size of the data set. The clustering result alone does not clearly reveal the operating condition of the company. The invention therefore draws the two K-means results as scatter plots and establishes several reference lines, each chosen to reflect a different aspect of the company's operating condition.

(3) Finally, the current operating situation of the company is analyzed further by combining the reference lines with the K-means result; the improved K-means algorithm gives the overall operating situation of the company and helps the enterprise make business decisions more accurately.
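Step (1) can be sketched as below. The text does not say which cleaning rules are applied, so this example assumes the simplest one: rows whose revenue, cost or profit is missing or not a finite number are dropped. The field names `revenue`, `cost` and `profit` and the sample records are hypothetical.

```python
import math


def clean_records(records):
    """Keep rows whose revenue, cost and profit all parse as finite numbers."""
    cleaned = []
    for row in records:
        try:
            values = tuple(float(row[key]) for key in ("revenue", "cost", "profit"))
        except (KeyError, TypeError, ValueError):
            continue  # missing field, None, or non-numeric text
        if all(math.isfinite(v) for v in values):
            cleaned.append(values)
    return cleaned


# Hypothetical raw records; only the first and fourth survive cleaning.
records = [
    {"revenue": "100", "cost": "70", "profit": "30"},
    {"revenue": "", "cost": "50", "profit": "10"},
    {"revenue": "80", "cost": None, "profit": "5"},
    {"revenue": "60", "cost": "75", "profit": "-15"},
    {"revenue": "nan", "cost": "10", "profit": "1"},
]
rows = clean_records(records)

# The two matrix arrays: revenue-cost and revenue-gross profit.
revenue_cost = [(x, y) for x, y, _ in rows]
revenue_profit = [(x, z) for x, _, z in rows]
print(revenue_cost, revenue_profit)
```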

Taking the three columns revenue, cost and gross profit of the supply chain information as the data set of the algorithm, let the three columns be revenue (X₁, X₂, X₃, …, X_m), cost (Y₁, Y₂, Y₃, …, Y_m) and gross profit (Z₁, Z₂, Z₃, …, Z_m). The three columns are organized into the following two matrix arrays (revenue-cost, revenue-gross profit):

which are denoted (q₁, q₂, q₃, …, q_m) and (t₁, t₂, t₃, …, t_m) respectively, where q_i = (X_i, Y_i) and t_i = (X_i, Z_i).

Firstly, K-means modeling and calculation are performed on the first matrix array (revenue-cost). k initial center points are selected for the two matrix arrays and denoted u₁, u₂, u₃, …, u_k (k < m); their coordinates are expressed as:

with the aid of the Euclidean distance formula:

the objective function is chosen as the squared error, as shown in the following formula:

taking the derivative of the above formula with respect to u gives the following formula:

throughout the algorithm, each remaining point must first be assigned to its cluster center u_j:

label_i＝argmin||q_i-u_j||(1≤j≤k) (4)

In formula (4), u_j changes continuously: all k existing center points are traversed.
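For orientation, the distance, squared-error objective and center update used in this derivation have the following standard K-means forms. The notation is assumed rather than taken from the original: C_j denotes the set of points currently assigned to center u_j, and (X_i, Y_i) and (u_{j,1}, u_{j,2}) are the coordinates of q_i and u_j.

```latex
\begin{aligned}
d(q_i, u_j) &= \lVert q_i - u_j \rVert
             = \sqrt{(X_i - u_{j,1})^2 + (Y_i - u_{j,2})^2} \\
J &= \sum_{j=1}^{k} \sum_{q_i \in C_j} \lVert q_i - u_j \rVert^2 \\
\frac{\partial J}{\partial u_j}
  &= -2 \sum_{q_i \in C_j} (q_i - u_j) = 0
  \;\Longrightarrow\;
  u_j = \frac{1}{\lvert C_j \rvert} \sum_{q_i \in C_j} q_i
\end{aligned}
```

Setting the derivative to zero shows that each center moves to the mean of the points assigned to it, which is the update applied after every assignment pass.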

After each traversal the cluster centers must be adjusted and updated; the k centers are updated using formula (3), which follows from the objective function.

Formulas (4) and (3) are applied repeatedly until the objective function reaches its optimum or the maximum number of iterations is reached; the algorithm stops under the following condition:
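The alternation between the assignment step of formula (4) and the center update can be sketched in plain Python as below. This is a minimal illustration rather than the patented implementation: the stopping rule (centers no longer move, or `max_iter` passes have run), the handling of empty clusters, and all function names are assumptions.

```python
import math


def assign_labels(points, centers):
    """Formula (4): index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]


def update_centers(points, labels, centers):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centers.append(old)  # assumption: an empty cluster keeps its center
            continue
        new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centers


def kmeans(points, centers, max_iter=100):
    """Alternate assignment and update until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    labels = assign_labels(points, centers)
    for _ in range(max_iter):
        new_centers = update_centers(points, labels, centers)
        if new_centers == centers:
            break  # converged: no center changed in this pass
        centers = new_centers
        labels = assign_labels(points, centers)
    return labels, centers


# Illustrative (revenue, cost) points and initial centers, not real data.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
labels, centers = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels, centers)
```

The same functions apply unchanged to the revenue-gross profit points t_i, since nothing in them depends on which second coordinate is used.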

After the K-means algorithm is completed and an image is generated, two reference lines are added:

y＝tan 55°x (6)

y＝tan 35°x (7)

The image is divided into three parts along the reference lines: the middle region indicates stable operation of the enterprise, and the upper and lower regions indicate abnormal operation. In addition, K reference lines are added according to the initially selected value of K and the distribution of the K categories of data points:

x＝L_k(1≤k≤K) (8)

The K reference lines given by formula (8) divide the revenue into K+1 segments. With the K+2 reference lines constructed from formulas (6), (7) and (8), the profit and loss of the enterprise supply chain is analyzed for each of the K+1 revenue parts, so that the current business characteristics and the operating stability of the enterprise can be judged.
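The text does not state how the positions L_k are derived from the K clusters. The sketch below uses one hypothetical rule, placing a vertical line at the revenue coordinate of each cluster center, and then counts the points falling in each of the resulting K+1 revenue segments. The function names and sample values are illustrative only.

```python
from bisect import bisect_right
from collections import Counter


def reference_positions(centers):
    """Hypothetical rule: one vertical line x = L_k per cluster center."""
    return sorted(cx for cx, _ in centers)


def segment_counts(points, boundaries):
    """Count the points in each of the K + 1 revenue segments.

    A point lying exactly on a line x = L_k is placed in the segment to its
    right (an assumption made for this sketch).
    """
    counts = Counter(bisect_right(boundaries, x) for x, _ in points)
    return [counts.get(i, 0) for i in range(len(boundaries) + 1)]


# Illustrative cluster centers and (revenue, cost) points, not real data.
centers = [(300.0, 240.0), (100.0, 90.0)]
points = [(40.0, 30.0), (120.0, 100.0), (90.0, 95.0), (310.0, 250.0), (500.0, 380.0)]
boundaries = reference_positions(centers)
counts = segment_counts(points, boundaries)
print(boundaries, counts)
```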

Secondly, K-means modeling and calculation are performed on the second matrix array (revenue-gross profit).

With the aid of the Euclidean distance formula:

taking the derivative of the above formula with respect to u gives the following formula:

throughout the algorithm, each remaining point must first be assigned to its cluster center:

label_i＝argmin||t_i-u_j||(1≤j≤k) (12)

After each traversal the cluster centers must again be adjusted and updated, and the k centers are updated with formula (10) as the objective function.

Formulas (12) and (11) are applied repeatedly until the objective function reaches its optimum or the maximum number of iterations is reached; the algorithm stops under the following condition:

After the K-means algorithm is completed and an image is generated, one reference line is added:

y＝0 (14)

The area below the reference line represents losses, and the current operating condition of the enterprise can be read from the density of the data points in that area. In addition, K reference lines are added according to the initially selected value of K and the distribution of the K categories of data points:

x＝L_k(1≤k≤K) (15)

The K reference lines given by formula (15) divide the revenue into K+1 segments. With the K+1 reference lines constructed from formulas (14) and (15), the risk and profitability of each of the K+1 revenue parts are analyzed, and the return and the level of risk of each revenue segment are obtained from this classification of the revenue, thereby helping the enterprise adjust its strategic decisions.
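A per-segment summary of return and risk might look like the sketch below. The text does not define how risk is measured, so the population standard deviation of gross profit is used here as a stand-in, together with the share of points below y = 0; both choices, the function name `segment_profile` and the sample values are assumptions.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def segment_profile(points, boundaries):
    """Summarise gross profit in each of the K + 1 revenue segments.

    Each entry holds the mean gross profit, its population standard deviation
    (a stand-in for risk, assumed here) and the share of points below y = 0.
    Segments without data are reported as None.
    """
    groups = [[] for _ in range(len(boundaries) + 1)]
    for revenue, profit in points:
        groups[bisect_right(boundaries, revenue)].append(profit)
    profile = []
    for profits in groups:
        if not profits:
            profile.append(None)
            continue
        profile.append({
            "mean_profit": mean(profits),
            "risk": pstdev(profits),
            "loss_share": sum(p < 0 for p in profits) / len(profits),
        })
    return profile


# Illustrative boundary L_1 (K = 1) and (revenue, gross profit) points.
boundaries = [100.0]
points = [(50.0, 10.0), (80.0, -10.0), (150.0, 40.0), (200.0, 60.0)]
profile = segment_profile(points, boundaries)
print(profile)
```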

Figs. 1-4 show schematic diagrams of the analysis of the supply chain information of an enterprise. (Note: the underlying data are not provided in this embodiment because the original supply chain data set is too large.)

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above embodiments express only several implementations of the present invention, and although their description is specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A supply chain information analysis method based on a K-means algorithm is characterized by comprising the following steps:

adding a first reference line and a second reference line to the first image;

wherein the first reference line is y = ax, tan 50° < a < tan 60°;

the second reference line is y = bx, tan 30° < b < tan 40°.

2. The supply chain information analysis method based on the K-means algorithm as claimed in claim 1, wherein a = tan 55°.

3. The supply chain information analysis method based on the K-means algorithm as claimed in claim 1, wherein b = tan 35°.

4. The supply chain information analysis method based on the K-means algorithm as claimed in claim 1, characterized in that a third reference line is further added to the first image according to the distribution of the K1 categories of data points in the first image;

the third reference line is x = L_k, where 1 ≤ k ≤ K1; the third reference lines divide the revenue into K1+1 segments, so that the profit and loss of the enterprise supply chain can be analyzed for each of the K1+1 parts.

5. A supply chain information analysis method based on a K-means algorithm is characterized by comprising the following steps:

carrying out data cleaning on supply chain data, and extracting a second matrix array, wherein the second matrix array is a revenue-profit matrix array;

adding a fourth reference line to the second image;

the fourth reference line is y = 0.

6. The supply chain information analysis method based on the K-means algorithm as claimed in claim 5, characterized in that a fifth reference line is further added to the second image according to the distribution of the K2 categories of data points in the second image;

the fifth reference line is x = L_m, where 1 ≤ m ≤ K2; the fifth reference lines divide the revenue into K2+1 segments, so that the risk and profitability of each of the K2+1 parts can be analyzed.

7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1-6 are implemented when the program is executed by the processor.

8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.