CN116596570A

CN116596570A - Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm

Info

Publication number: CN116596570A
Application number: CN202310535737.2A
Authority: CN
Inventors: 聂放明; 王洪平
Original assignee: Guangdong Deao Smart Medical Technology Co ltd
Current assignee: Guangdong Deao Smart Medical Technology Co ltd
Priority date: 2023-05-11
Filing date: 2023-05-11
Publication date: 2023-08-15

Abstract

The information comparison system comprises a product information database module, a product big data analysis module and a multi-platform information comparison module. It addresses the problems that, in current cross-border transactions, raw-material differences between listings of the same cross-border e-commerce product are difficult to trace and the stated functions of the product are inconsistent. An information tracing database is combined with a big data analysis algorithm to analyze and compare information on the same product in real time, so that buyers obtain more transparent product rights and interests, and the dynamic information of the cross-border e-commerce product at the time of purchase is marked. The method has a wide application range and low economic cost, can be applied internationally, and brings good social and economic benefits.

Description

Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm

Technical Field

The invention relates to the field of cross-border e-commerce transactions, in particular to an information comparison system of the same product in different e-commerce platforms based on a big data analysis algorithm.

Background

Cross-border e-commerce refers to international business activity in which transaction parties belonging to different customs territories conclude transactions and settle payments electronically through an e-commerce platform, and deliver the goods through cross-border logistics and off-site warehousing. Cross-border e-commerce retail import policies have been extended and improved and their scope of application enlarged, so that greater openness stimulates consumption potential; the planning and construction of logistics hubs is being advanced to improve the quality and efficiency of the national economy. Cross-border e-commerce is a technical foundation for promoting economic integration and trade globalization and has great strategic significance. It breaks down barriers between countries, moves international trade toward borderless trade, and is causing a major transformation of world economic trade. For enterprises, the open, multidimensional, multilateral trade cooperation model built by cross-border e-commerce greatly widens the path into international markets and promotes the optimal allocation of multilateral resources and mutual benefit between enterprises. For consumers, cross-border e-commerce makes it easy to obtain information from other countries and to purchase good, inexpensive goods.
However, cross-border transactions still suffer from problems such as raw-material differences between listings of the same cross-border e-commerce product being difficult to trace and the stated functions of the product being inconsistent. The invention therefore combines an information tracing database with a big data analysis algorithm to analyze and compare information on the same cross-border e-commerce product in real time, so that buyers obtain more transparent product rights and interests, and the dynamic information of the product at the time of purchase is marked. The method has a wide application range and low economic cost, can be applied internationally, and brings good social and economic benefits.

Disclosure of Invention

The invention aims to provide an information comparison system of the same product in different E-commerce platforms based on a big data analysis algorithm so as to solve the problems in the background technology.

In order to achieve the above purpose, an information comparison system of the same product in different E-commerce platforms based on big data analysis algorithm is provided, which comprises a product information database module, a product big data analysis module and a multi-platform information comparison module; the specific process is described as follows:

S1, recording the original attribute information of the same cross-border product in batches, and marking the corresponding information on the different platforms in cross-border e-commerce;

S2, constructing an index over the product attribute information in a database, using a hash algorithm to compress the database storage volume and the accurate information digest, and on this basis building a system database that serves the different platforms within the same cross-border e-commerce system;

S3, using a big data analysis algorithm to extract, from the system database, the key features and categories of the sales information of the same product on different platforms, updating the feature information into the system database in the form of control variables, and classifying and analyzing the product information;

S4, using the system database as the associated base database that drives the cross-border e-commerce platforms, so that products of the same type obtain the analyzed feature information according to the index marks of the database;

S5, based on the feature analysis results, using a comparison analysis algorithm to obtain purchase recommendations and information comparisons for the product attributes the buyer cares about at the time of purchase, and providing purchase guidance.

Further, in S1, the original attribute information of the same cross-border product recorded in batches comprises a combination of the product's place of origin, material composition, consumption and price information.
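As a rough illustration of the S1 record, the sketch below models one listing of a product on one platform and groups listings of the same product for side-by-side comparison. The class name, field names and the `product_id` key are hypothetical; the source only names origin, material composition, consumption and price as the recorded attributes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ProductRecord:
    """Original attribute information for one listing on one platform."""
    product_id: str              # hypothetical key identifying the "same product"
    platform: str                # platform the listing was recorded from
    origin: str                  # place of origin
    materials: Tuple[str, ...]   # material composition
    consumption: str             # consumption information as stated in the listing
    price: float                 # listed price


def group_by_product(records: Iterable[ProductRecord]) -> Dict[str, Dict[str, ProductRecord]]:
    """Group listings by product, then by platform, for side-by-side comparison."""
    grouped: Dict[str, Dict[str, ProductRecord]] = {}
    for rec in records:
        grouped.setdefault(rec.product_id, {})[rec.platform] = rec
    return grouped
```

Grouping by a shared product key is only one possible way to mark "the corresponding information on the different platforms"; the source does not say how listings are matched.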

Further, in S2, the index over the product attribute information in the database is constructed as follows: before the index is created, a statistics component aggregates and records statistics corresponding to the information features; these statistics are then used, together with a ranking algorithm and a retrieval model, to determine the data the user wishes to obtain:

(1) Analyzing the query and the document: the query is analyzed and processed with the same steps as the document, that is, the words in the query are converted into the same form as the words generated when the document text is processed. The analysis is mainly lexical analysis, which identifies the morpheme, vocabulary and phrase information contained in the text; the result of analyzing a file is its structure and a representation of the related content;

(2) Removing stop words: stop words are high-frequency words or prepositions used in text and document files. Prepositions help sentence structure but contribute little to describing the topic of the product information; removing them reduces the size of the index and the memory it occupies, and improves the speed and effectiveness of indexing;

(3) Extracting word stems: during retrieval, stem extraction allows information retrieval to match related semantics. If a word is inflected or has multiple derived forms, the forms can be reduced to the same stem;

(4) Semantic matching: homonyms and differently pronounced synonyms of the same stem are recognized and matched with the information data in the system database;

(5) Result feedback: the query result is returned to the user through a UI.
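The steps above can be sketched as a small inverted index in which documents and queries pass through the same analysis chain (lexical analysis, stop-word removal, stemming) before matching. The stop-word list, the suffix rules and all function names are illustrative assumptions; the source does not specify a stemmer, a ranking algorithm or a retrieval model, and step (4) is reduced here to exact stem matching.

```python
import re
from collections import defaultdict

# Illustrative stop-word list and suffix rules; both are assumptions.
STOP_WORDS = {"a", "an", "the", "of", "for", "in", "on", "with", "and", "to"}
SUFFIXES = ("ing", "es", "s")


def tokenize(text):
    """Lexical analysis: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word):
    """Naive suffix stripping, standing in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Shared chain for documents and queries: tokenize, drop stop words, stem."""
    return [stem(tok) for tok in tokenize(text) if tok not in STOP_WORDS]


def build_index(documents):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Because the query goes through `analyze` just like the documents, a query for "cotton shirts" matches a listing that says "cotton shirt".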

Further, in S2, the database storage volume and the accurate information digest are compressed using a hash algorithm, as follows:

The data is subjected to standardized filtering, and the data preprocessing is defined as follows:

[formula not reproduced in the source text]

where $Q(a, b)$ is the data template for the data in row $a$ and column $b$ during hash processing; $Q^{(x)}(a_i, b_j)$ denotes the $(i, j)$-th datum of the $b$-type accurate information within the $a$-type storage volume of the $x$-th type of data; and $Q^{(1)}(a, b)$ is the data with actual constants, calculated as:

[formula not reproduced in the source text]

$Q^{(1)}(a_i, b_j)$ denotes the $(1, 1)$-th datum of the $b$-type accurate information within the $a$-type storage volume of the first type of data. When resource load balancing is adjusted, the amount of data changes to some extent, so the data format needs to be converted and the image data preprocessed. During preprocessing, the data is processed as single-instruction, multiple-data streams: several data processors are connected to the same controller so that the data can be processed in parallel. The low-rank representation of the hash algorithm is used to denoise the data, and the low-rank representation of the data is minimized:

[formula not reproduced in the source text]

where $R$ denotes the result of the low-rank representation minimization, $\|R\|$ is the processing matrix, and $n$ is the number of data items in the matrix. Letting the low-rank representation of the data $J$ be $I$, we have:

$$\min \|J\| + \alpha \|I\| = \operatorname{tr}[JI]$$

where $\alpha$ denotes the Lagrange multiplier, $\operatorname{tr}[JI]$ denotes the input data $J$, and $I$ is the hash code of the data under the low-rank representation of the hash algorithm. The improved hash algorithm mainly improves the hash-processing data template and the low-rank representation minimization used in data denoising. With these improvements, data of the magnitude handled by the invention can be denoised in a more targeted way, with a better denoising effect and more accurate data information.
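The improved hash algorithm is not specified in enough detail here to reproduce. As a much simpler stand-in, the sketch below only illustrates the general idea of a hash digest used as a compact, fixed-length summary of a product record, which allows duplicate records to be detected without comparing full contents. It uses SHA-256 from the Python standard library, which is a swapped-in technique and not the low-rank hashing described above; the function names are hypothetical.

```python
import hashlib
import json


def record_digest(attributes):
    """Return a fixed-length SHA-256 hex digest of a product attribute mapping.

    Keys are sorted so that logically equal records produce the same digest.
    """
    canonical = json.dumps(
        attributes, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep one copy of each distinct record, keyed by its digest."""
    unique = {}
    for rec in records:
        unique.setdefault(record_digest(rec), rec)
    return unique
```

Storing the 64-character digest alongside each record gives a constant-size key for equality checks, whatever the size of the record itself.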

Furthermore, the system database in S2 combines the characteristics of structured SQL and unstructured NoSQL databases and is built mainly on a MongoDB database.

Further, in S3, the key features and categories of the sales information of the same product on different platforms are extracted from the system database using a big data analysis algorithm, as follows:

The key features and categories of the product's sales information are processed with K-Means cluster analysis: they are clustered into sets, and each cluster center point serves as the representative of its data. K-Means is an unsupervised clustering algorithm. Assume a data set with $n$ samples:

$$X = \{x_1, x_2, \ldots, x_n\}$$

The goal of the algorithm is to cluster the data set into $k$ clusters $C = \{C_1, C_2, \ldots, C_k\}$. First, $k$ initial centroids are selected at random from the samples, and the distance $d_i = \|x - \mu_i\|^2$ between each sample point and each centroid is computed; each sample point is then assigned to the nearest cluster. Next, the cluster center $\mu_i$ is recalculated from the sample points assigned to each cluster. This process is repeated until the cluster centers converge.
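The iteration described above can be sketched in plain Python as follows. This is the standard K-Means loop (random initial centroids, nearest-centroid assignment by squared Euclidean distance, centroid recomputation until nothing changes); it does not include the α weighting mentioned elsewhere in the text, and the function names, `max_iter` and `seed` parameters are assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance ||p - q||^2 between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(samples, k, max_iter=100, seed=0):
    """Cluster samples (a list of tuples) into k clusters; return (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(samples, k)  # k initial centroids chosen from the samples
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda i: squared_distance(x, centroids[i]))
            for x in samples
        ]
        # Update step: recompute each centroid from its assigned samples.
        new_centroids = []
        for i in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == i]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[i])
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return centroids, labels
```

Keeping the previous centroid for an empty cluster is one simple convention; re-seeding the centroid from a random sample is another common choice.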

The total sum of squared errors $E$ is:

[formula not reproduced in the source text]

where $C_i$ is a constant set, $\alpha$ is the Lagrange multiplier introduced as the weight of the total sum of squared errors $E$, and $\mu_i$ is the centroid of cluster $C_i$:

$$\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$$

where $|C_i|$ is the number of samples in cluster $C_i$. This function can be used to evaluate the choice of $k$: $E$ is computed for different values of $k$ and the changes in $E$ between them are compared. If the decrease in $E$, which is large and rapid at first, drops off sharply at some $k$, that $k$ is the optimal value;
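For reference, the sketch below computes the standard, unweighted sum of squared errors, $E = \sum_{i=1}^{k} \sum_{x \in C_i} \|x - \mu_i\|^2$, for a given partition. The α-weighted variant referred to in the text is not reproduced because its formula is not available here; function names are assumptions.

```python
def centroid(cluster):
    """Centroid mu_i: the coordinate-wise mean of the samples in one cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def total_sse(clusters):
    """Standard sum of squared errors over a list of clusters (lists of tuples)."""
    total = 0.0
    for cluster in clusters:
        mu = centroid(cluster)
        total += sum(
            sum((a - b) ** 2 for a, b in zip(x, mu)) for x in cluster
        )
    return total
```

To choose $k$ with this measure, one would run the clustering for several values of $k$, compute `total_sse` for each result, and look for the point where further increases in $k$ stop reducing $E$ substantially.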

The invention uses an improved silhouette coefficient to evaluate the clustering effect and determine the optimal number of clusters. Because clustering is an unsupervised algorithm, it cannot be evaluated by cross-validation. The steps are as follows:

(1) Calculate the average distance $c(i)$ from sample $x_i$ to the other samples in the same cluster; $c(i)$ is defined as the intra-cluster dissimilarity of sample $x_i$. The smaller $c(i)$ is, the more sample $x_i$ belongs in that cluster;

(2) Calculate the maximum distance $d_{ij}$ from sample $x_i$ to all samples of another cluster $Y_j$, called the dissimilarity between sample $x_i$ and cluster $Y_j$; the inter-cluster dissimilarity $d(i)$ of sample $x_i$ is defined as:

[formula not reproduced in the source text]

(3) From the intra-cluster dissimilarity $c(i)$ and the inter-cluster dissimilarity $d(i)$ of sample $x_i$, the silhouette coefficient $s(i)$ of the sample is expressed as:

[formula not reproduced in the source text]

The mean of $s(i)$ over all samples is called the silhouette coefficient of the clustering result.

The invention uses the improved silhouette coefficient to evaluate the clustering effect, using the maximum distance to evaluate the dissimilarity of the samples. Compared with clustering by the traditional algorithm, clustering with the improved algorithm gives a better match and suits the application scenario of the invention.
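A minimal sketch of this evaluation follows, under two stated assumptions, since the corresponding formulas are not available in the text: the inter-cluster dissimilarity is taken as $d(i) = \min_j d_{ij}$, and the per-sample score uses the conventional silhouette form $s(i) = (d(i) - c(i)) / \max\{c(i), d(i)\}$. The only departure from the conventional coefficient is that $d_{ij}$ is the maximum, not the average, distance to the samples of another cluster, as the text describes.

```python
import math


def modified_silhouette(samples, labels):
    """Mean silhouette-style score with max-distance inter-cluster dissimilarity."""
    clusters = {}
    for x, lab in zip(samples, labels):
        clusters.setdefault(lab, []).append(x)

    scores = []
    for x, lab in zip(samples, labels):
        same = clusters[lab]
        if len(same) == 1 or len(clusters) < 2:
            scores.append(0.0)  # convention for singleton clusters or a single cluster
            continue
        # c(i): average distance to the other samples of the same cluster
        # (x itself contributes distance 0, so divide by len(same) - 1).
        c = sum(math.dist(x, y) for y in same) / (len(same) - 1)
        # d_ij: maximum distance to the samples of each other cluster;
        # d(i): smallest such value over the other clusters (assumption).
        d = min(
            max(math.dist(x, y) for y in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(c, d)
        scores.append((d - c) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

A well-separated partition scores close to 1, and a partition that splits natural groups scores noticeably lower, which is how the score can be used to compare candidate values of $k$.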

Further, in S4, products of the same type obtain the analyzed feature information according to the index marks of the database, as follows:

An improved Apriori algorithm based on a matrix and weights is applied to information comparison analysis, and a 0-1 matrix $M$ is constructed from the feature information set $I_1$:

[formula not reproduced in the source text]

where $T_i$ denotes the $i$-th piece of feature information, $i = 1, 2, 3, \ldots, m$; $j = 1, 2, 3, \ldots, n$; and $I = \{I_1, I_2, I_3, \ldots, I_N\}$ is the set of $N$ feature items. The probability that $I_j$ occurs in the feature information database is $p(I_j)$, calculated as:

$$p(I_j) = \frac{l}{m}$$

The weight of $I_j$, denoted $w(I_j)$, is related to $p(I_j)$ and is calculated as:

$$w(I_j) = \frac{1}{p(I_j)}$$

where $l$ is the number of times $I_j$ occurs in the feature information database, that is, the number of 1s in the $j$-th column of the matrix, and $m$ is the total number of pieces of feature information.
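A short sketch of the matrix and item weights follows. It assumes that entry $(i, j)$ of $M$ is 1 when the $i$-th piece of feature information contains item $I_j$ and 0 otherwise, and that $p(I_j)$ is the column count $l$ divided by the number of rows $m$; both readings follow from the definitions of $l$ and $m$ above but are not spelled out in the text. Function names and the sample data are hypothetical.

```python
def build_matrix(records, items):
    """0-1 matrix M: row i describes record T_i, column j is 1 if T_i contains I_j."""
    return [[1 if item in rec else 0 for item in items] for rec in records]


def item_weights(matrix, items):
    """Weight w(I_j) = 1 / p(I_j), with p(I_j) = l / m from the column counts."""
    m = len(matrix)  # total number of pieces of feature information
    weights = {}
    for j, item in enumerate(items):
        l = sum(row[j] for row in matrix)  # number of 1s in column j
        if l == 0:
            continue  # weight is undefined for an item that never occurs
        weights[item] = 1 / (l / m)
    return weights
```

With this weighting, rare feature items receive large weights and items that appear in every record receive weight 1, so infrequent but distinctive attributes are not drowned out by common ones.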

Further, the comparison analysis algorithm in S5 proceeds as follows:

Based on the K-Means cluster analysis method in which the Lagrange multiplier $\alpha$ is introduced as the weight of the total sum of squared errors $E$, the purchase recommendation and information comparison data are clustered into K clusters, and the optimal class features are then determined using any feature item in the feature information database.

Through comparison analysis, purchase recommendations and information comparisons for the product attributes the buyer cares about at the time of purchase are obtained, and purchase guidance is provided.

The beneficial effects of the invention are as follows. The invention discloses an information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm, comprising a product information database module, a product big data analysis module and a multi-platform information comparison module. The invention records the original attribute information of the same cross-border product in batches and marks the corresponding information on the different cross-border e-commerce platforms, including the combination of place of origin, material composition, consumption and price information. An index over this information is constructed in the database based on the product attribute information, and a hash algorithm is used to compress the database storage volume and the accurate information digest, which improves the processing efficiency of the system and allows massive data in the database to be processed simultaneously. The low-rank representation of the hash algorithm is used to denoise the data, and the low-rank representation of the data is minimized. On this basis, the characteristics of structured SQL and unstructured NoSQL databases are combined to build a system database, based mainly on MongoDB, that serves the different platforms within the same cross-border e-commerce system.
A big data analysis algorithm extracts from the system database the key features and categories of the sales information of the same product on different platforms; the feature information is updated into the system database in the form of control variables, and the features of the product information are classified and analyzed. The system database serves as the associated base database driving the cross-border e-commerce platforms, and an improved Apriori algorithm based on a matrix and weights is applied to information comparison analysis, so that products of the same type obtain the analyzed feature information according to the index marks of the database. A comparison analysis algorithm then yields purchase recommendations and information comparisons for the product attributes the buyer cares about at the time of purchase, and purchase guidance is provided. To address the problems that raw-material differences between listings of the same cross-border e-commerce product are difficult to trace and that the stated functions of the product are inconsistent in current cross-border transactions, the information tracing database is combined with a big data analysis algorithm to analyze and compare information on the same product in real time, so that buyers obtain more transparent product rights and interests, and the dynamic information of the cross-border e-commerce product at the time of purchase is marked. The method has a wide application range and low economic cost, can be applied internationally, and brings good social and economic benefits.

Drawings

The invention is further described with reference to the accompanying drawing. The embodiment shown in the drawing does not limit the invention in any way, and a person of ordinary skill in the art can obtain other drawings from the following drawing without inventive effort.

Fig. 1 is a schematic diagram of the structure of the present invention.

Detailed Description

The invention is further described in connection with the following examples.

Referring to fig. 1, the present invention aims to provide an information comparison system of the same product in different e-commerce platforms based on big data analysis algorithm, so as to solve the problems set forth in the above background art.

In order to achieve the above purpose, an information comparison system of the same product in different electronic commerce platforms based on big data analysis algorithm is provided, which comprises a product information database module, a product big data analysis module and a multi-platform information comparison module, and the specific process is as follows:

S1, recording the original attribute information of the same cross-border product in batches, and marking the corresponding information on the different platforms in cross-border e-commerce, the information comprising a combination of the product's place of origin, material composition, consumption and price information;

S2, constructing an index over the product attribute information in a database, and using a hash algorithm to compress the database storage volume and the accurate information digest, as follows:

Before the index is created, a statistics component aggregates and records statistics corresponding to the information features; these statistics are then used, together with a ranking algorithm and a retrieval model, to determine the data the user wishes to obtain.

The database stores a large amount of data; with a traditional processing algorithm the processing speed is slow and the load-bearing capacity of the whole system is reduced. Hashing is a novel technique, based on visual technology, that produces short digital sequences. It improves the processing efficiency of the system and allows massive data in the database to be processed simultaneously across multiple scenarios, so a hash algorithm is chosen as the basic algorithm for constructing the database load balancing model. To prevent the information from being distorted when data is compressed by resource adjustment in the database load balancing model, the data in the hash algorithm is preprocessed with bicubic linear interpolation and parameters such as information size are adjusted. The data is subjected to standardized filtering, and the data preprocessing is defined as follows:

[formula not reproduced in the source text]

$Q^{(1)}(a_i, b_j)$ denotes the $(1, 1)$-th datum of the $b$-type accurate information within the $a$-type storage volume of the first type of data. When resource load balancing is adjusted, the amount of data changes to some extent, so the data format needs to be converted and the image data preprocessed. During preprocessing, the data is processed as single-instruction, multiple-data streams: the data processors are connected to the same controller so that the data can be processed in parallel. The low-rank representation of the hash algorithm is used to denoise the data, and the low-rank representation of the data is minimized:

$$\min \|J\| + \alpha \|I\| = \operatorname{tr}[JI]$$

S3, using a big data analysis algorithm to extract, from the system database, the key features and categories of the sales information of the same product on different platforms, updating the feature information into the system database in the form of control variables, and classifying and analyzing the feature information, as follows:

$$X = \{x_1, x_2, \ldots, x_n\}$$

The goal of the algorithm is to cluster the data set into $k$ clusters $C = \{C_1, C_2, \ldots, C_k\}$. First, $k$ initial centroids are selected at random from the samples, and the distance $d_i = \|x - \mu_i\|^2$ between each sample point and each centroid is computed; each sample point is then assigned to the nearest cluster. Next, the cluster center $\mu_i$ is recalculated from the sample points assigned to each cluster. This process is repeated until the cluster centers converge.

The total sum of squared errors $E$ is:

[formula not reproduced in the source text]

where $\mu_i$ is the centroid of cluster $C_i$:

$$\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$$

where $|C_i|$ is the number of samples in cluster $C_i$. This function can be used to evaluate the choice of $k$: $E$ is computed for different values of $k$ and the changes in $E$ between them are compared. If the decrease in $E$, which is large and rapid at first, drops off sharply at some $k$, that $k$ is the optimal value;

(1) Calculate the average distance $c(i)$ from sample $x_i$ to the other samples in the same cluster; $c(i)$ is defined as the intra-cluster dissimilarity of sample $x_i$. The smaller $c(i)$ is, the more sample $x_i$ belongs in that cluster;

(2) Calculate the maximum distance $d_{ij}$ from sample $x_i$ to all samples of another cluster $Y_j$, called the dissimilarity between sample $x_i$ and cluster $Y_j$; the inter-cluster dissimilarity $d(i)$ of sample $x_i$ is defined as:

[formula not reproduced in the source text]

S4, using the system database as the associated base database that drives the cross-border e-commerce platforms, so that products of the same type obtain the analyzed feature information according to the index marks of the database, as follows:

[formula not reproduced in the source text]

where $T_i$ denotes the $i$-th piece of feature information, $i = 1, 2, 3, \ldots, m$; $j = 1, 2, 3, \ldots, n$; and $I = \{I_1, I_2, I_3, \ldots, I_N\}$ is the set of $N$ feature items. The probability that $I_j$ occurs in the feature information database is $p(I_j)$, calculated as:

$$p(I_j) = \frac{l}{m}$$

The weight of $I_j$, denoted $w(I_j)$, is related to $p(I_j)$ and is calculated as:

$$w(I_j) = \frac{1}{p(I_j)}$$

where $l$ is the number of times $I_j$ occurs in the feature information database, that is, the number of 1s in the $j$-th column of the matrix, and $m$ is the total number of pieces of feature information.

$T_k$ denotes the $k$-th piece of feature information in the data set. Its weight, denoted $wt(T_k)$, is the average weight of the feature items contained in the factor, that is, the average of $w(I_j)$ over all $j = 1, 2, 3, \ldots, n$ for which $a_{kj} = 1$:

$$wt(T_k) = \frac{\sum_{j=1}^{n} a_{kj}\, w(I_j)}{|T_k|}$$

where $|T_k|$ is the number of feature items contained in the factor $T_k$;

The weighted support of an item is denoted wtsupport. It represents the proportion of the weights of the factors containing the feature item to the weights of all factors. A reasonable threshold is then set on the weighted support of the feature items to form the optimal feature set. wtsupport(S) is calculated as:

[formula not reproduced in the source text]

where $S$ denotes any feature item in the feature information database and $T_{k+1}$ denotes the $(k+1)$-th piece of feature information in the data set.
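The two quantities described in prose above can be sketched as follows. The record weight is the average item weight of the record. The weighted support is implemented under the assumption that it equals the summed weights of the records containing the item set divided by the summed weights of all records, which is one reading of "the proportion of the weights of the factors containing the feature item to the weights of all factors"; the exact formula is not available in the text. Function names and sample data are hypothetical.

```python
def record_weight(record, weights):
    """wt(T_k): average weight w(I_j) of the feature items contained in one record."""
    return sum(weights[item] for item in record) / len(record)


def weighted_support(itemset, records, weights):
    """Share of total record weight carried by the records that contain itemset."""
    wts = [record_weight(rec, weights) for rec in records]
    containing = sum(wt for rec, wt in zip(records, wts) if itemset <= rec)
    return containing / sum(wts)


def frequent_items(items, records, weights, threshold):
    """Keep single items whose weighted support reaches the chosen threshold."""
    return [
        item for item in items
        if weighted_support({item}, records, weights) >= threshold
    ]
```

The threshold value is a tuning choice left open by the text; the sketch simply filters single items against it, as the first pass of an Apriori-style search would.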

S5, based on the feature analysis results, using a comparison analysis algorithm built on the K-Means cluster analysis method in which the Lagrange multiplier $\alpha$ is introduced as the weight of the total sum of squared errors $E$: the purchase recommendation and information comparison data are clustered into K clusters, and the optimal class features are then determined using any feature item in the feature information database.

The present invention also provides a computer readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the above-described method. The computer readable storage medium may be, among other things, ROM, random access memory, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc. The instructions stored therein may be loaded by a processor in the terminal and perform the methods described above.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An information comparison system of the same product in different E-commerce platforms based on a big data analysis algorithm comprises a product information database module, a product big data analysis module and a multi-platform information comparison module; the specific process is described as follows:

2. The information comparison system of the same product in different e-commerce platforms based on the big data analysis algorithm according to claim 1, wherein the original attribute information of the same cross-border product recorded in batches in S1 comprises a combination of the product's place of origin, material composition, consumption and price information.

3. The information comparison system of the same product in different e-commerce platforms based on the big data analysis algorithm according to claim 1, wherein in S2 the index over the product attribute information in the database is constructed as follows:

before the index is created, a statistics component aggregates and records statistics corresponding to the information features, and these statistics are then used, together with a ranking and retrieval model, to determine the data the user wishes to obtain: (1) analyzing query keywords: the query keywords are analyzed and processed with the same steps as the document, that is, the numbers and words in the query are converted into the same form as the words generated when the document text is processed; the analysis is mainly lexical analysis, which identifies the morpheme, vocabulary and phrase information contained in the text, and the result of analyzing a file is its structure and a representation of the related content; (2) removing stop words: stop words are high-frequency words or prepositions used in text and document files; prepositions help sentence structure but contribute little to describing the topic of the product information, and removing them reduces the size of the index and the memory it occupies and improves the speed and effectiveness of indexing; (3) extracting word stems: during retrieval, stem extraction allows information retrieval to match related semantics, and if a word is inflected or has multiple derived forms, the forms can be reduced to the same stem; (4) semantic matching: homonyms and differently pronounced synonyms of the same stem are recognized and matched with the information data in the system database; (5) result feedback: the query result is returned to the user through a UI.

4. The system for comparing information of the same product in different electronic commerce platforms based on the big data analysis algorithm according to claim 1, wherein in step S2 the improved hash algorithm is used to compress the database storage volume and the accurate information digest, and the detailed process is as follows:

wherein Q(a, b) is a data template for the data in row a and column b during hash processing; Q^(x)(a_i, b_j) denotes the (i, j)-th datum of b-type accurate information in the a-type storage volume of x-type data; Q^(0)(a, b) is the data with actual constants, and is calculated by:

Q^(1)(a, b) denotes the b-type accurate information in the a-type storage volume of first-type data; when resource load balancing is adjusted, the amount of data changes to some extent; during data preprocessing, the data are processed as single-instruction, multiple-data (SIMD) streams, with a plurality of data processors connected to the same controller for parallel processing; the low-rank representation of the hash algorithm is used to denoise the data, and the low-rank representation of the data is minimized:

wherein R denotes the result of the low-rank representation minimization, ‖R‖ is the processing matrix, n is the number of data items in the matrix, and, letting I be the low-rank representation of data J, we have:

min‖J‖+α‖I‖＝tr[JI]

where α is a Lagrange multiplier, J is the input data in tr[JI], and I is the hash code of the data under the low-rank representation of the hash algorithm.
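The improved hash algorithm of this claim is not reproduced here. As a loose illustration of how an information digest can reduce stored volume, the following sketch swaps in a standard SHA-256 digest over a canonical JSON form of a product record and keeps one copy per digest. The functions `record_digest` and `deduplicate` and the record fields are hypothetical, and the low-rank denoising step is not modeled.

```python
import hashlib
import json


def record_digest(record):
    """Return a SHA-256 hex digest of a product record.

    Keys are sorted so that two records with the same content always
    produce the same digest, regardless of key order.
    """
    canonical = json.dumps(
        record, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep one copy of each distinct record, keyed by its digest."""
    unique = {}
    for record in records:
        unique.setdefault(record_digest(record), record)
    return unique
```

Under these assumptions, the same listing collected twice from different platforms is stored once, and the fixed-length digest can serve as a compact lookup key.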

5. The information comparison system of the same product in different electronic commerce platforms based on the big data analysis algorithm according to claim 1, wherein the system database in S2 is built mainly on a MongoDB database, combining the characteristics of structured SQL storage and unstructured NoSQL storage.

6. The system for comparing information of the same product in different electronic commerce platforms based on the big data analysis algorithm according to claim 1, wherein in step S3 the key features and categories of the sales information of the same product on different platforms are extracted from the system database by using the big data analysis algorithm, and the detailed process is as follows:

the key features and categories of the product's sales information are processed with the K-Means clustering method: they are clustered into sets, with each cluster center serving as the representative of its data; assume a data set with n samples:

X = {x_1, x_2, ..., x_n}

the goal of the algorithm is to partition the data set into k clusters C = {C_1, C_2, ..., C_k}; first, k initial centroids are selected at random from the samples, the distance d_i = ‖x − μ_i‖² between each sample point and each centroid is calculated and compared, and the sample point is assigned to the nearest cluster; then the cluster center μ_i is recalculated from the sample points assigned to each cluster; this process is repeated until the cluster centers converge; the total sum of squared errors E is defined as:

wherein C_i is a constant set, and a Lagrange multiplier α is introduced as the weight of the total sum of squared errors E; E is calculated for each of several different k values, and the change in E between successive k values is compared; if the decrease in E, large and rapid at first, levels off at the current k value, then the current k value is the optimal k value;
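The clustering loop and the comparison of E across k values can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the standard unweighted sum of squared errors rather than the α-weighted form of the claim, and the function names and the `seed` and `max_iter` parameters are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Return (centroids, labels, sse) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k initial centroids drawn from the samples
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the centers have converged
        labels = new_labels
        # Recompute each center from the points assigned to it.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[i] = mean(members)
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse
```

Running `k_means` for k = 1, 2, 3, ... and recording the third return value gives the curve of E against k; the k at which the drop in E flattens is the candidate optimal value.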

the improved silhouette coefficient is used to evaluate the clustering effect and determine the optimal number of clusters, and the specific method is as follows:
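The improved coefficient of this claim is not reproduced here. For orientation only, the standard (unimproved) silhouette coefficient can be sketched as below; the function names are hypothetical, and the handling of singleton clusters follows the common convention of assigning them a score of 0.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_score(points, labels):
    """Mean standard silhouette coefficient over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so dividing by len - 1 is correct)
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)
```

The score lies between -1 and 1; computing it for the labelings obtained with several cluster counts and keeping the count with the highest score is one common way to choose the number of clusters.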

7. The information comparison system of the same product in different electronic commerce platforms based on the big data analysis algorithm according to claim 1, wherein in step S4 the analyzed feature information of the same product is obtained according to the index marks in the database, and the detailed process is as follows:

wherein a_ij is the judgment function; T_i denotes the i-th piece of feature information; i = 1, 2, 3, ..., m; j = 1, 2, 3, ..., n; I = {I_1, I_2, I_3, ..., I_N} is the set of N feature items; the probability that I_j occurs in the feature information database is p(I_j); the weight of I_j is denoted w(I_j) and is related to p(I_j); w(I_j) is calculated as follows:

w(I_j) = 1/p(I_j)

wherein l denotes the number of occurrences of I_j in the feature information database, i.e. the number of 1's in the j-th column of the M matrix, where M is the total number of pieces of feature information.
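The weight calculation can be illustrated with a short sketch. It assumes, from the definitions of l and M, that p(I_j) = l / M, so that w(I_j) = M / l; this reading, the function name `feature_weights` and the choice to return None for a feature that never occurs are assumptions, not part of the claim.

```python
def feature_weights(matrix):
    """Return w(I_j) = 1 / p(I_j) for each column j of a 0/1 matrix.

    Assumes p(I_j) = l / M, where l is the number of 1's in column j and
    M is the number of rows (pieces of feature information). A feature
    that never occurs gets weight None, since 1 / p is undefined for it.
    """
    total = len(matrix)
    weights = []
    for column in zip(*matrix):
        occurrences = sum(column)
        weights.append(total / occurrences if occurrences else None)
    return weights
```

Under this reading, a feature that appears in every record has weight 1, and rarer features receive proportionally larger weights.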

8. The information comparison system of the same product in different electronic commerce platforms based on the big data analysis algorithm according to claim 6, wherein the comparison analysis algorithm in S5 comprises the following detailed procedures: