CN116596570A - Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm - Google Patents

Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm

Info

Publication number
CN116596570A
Authority
CN
China
Prior art keywords
information
data
product
database
same
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310535737.2A
Other languages
Chinese (zh)
Inventor
聂放明
王洪平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Deao Smart Medical Technology Co ltd
Original Assignee
Guangdong Deao Smart Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Deao Smart Medical Technology Co ltd
Priority to CN202310535737.2A
Publication of CN116596570A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information comparison system for the same product on different e-commerce platforms, based on a big data analysis algorithm. The system comprises a product information database module, a product big data analysis module and a multi-platform information comparison module. It addresses the problems that, in current cross-border transactions, differences in the raw materials of the same cross-border e-commerce product are difficult to trace and the product's functions in use are inconsistent. An information tracing database is combined with the big data analysis algorithm to analyze and compare information on the same product in real time, so that buyers obtain transparent product rights and interests and the dynamic information of a cross-border e-commerce product at the time of purchase is marked. The method has a wide range of application and low economic cost, can be extended to international use, and brings good social and economic benefits.

Description

Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm
Technical Field
The invention relates to the field of cross-border e-commerce transactions, in particular to an information comparison system of the same product in different e-commerce platforms based on a big data analysis algorithm.
Background
Cross-border e-commerce is an international business activity in which transaction parties belonging to different customs territories conclude a transaction and settle payment electronically through an e-commerce platform, and the goods are delivered through cross-border logistics and off-site warehousing to complete the transaction. Policies on cross-border e-commerce retail imports have been extended and refined and their scope of application widened, opening up further to stimulate consumption potential; the planned construction of logistics hubs has been advanced to improve the quality and efficiency of the national economy. Cross-border e-commerce is a technical foundation for economic integration and trade globalization and is of great strategic significance. It breaks down barriers between countries, moves international trade toward borderless trade, and is driving major changes in world economic trade. For enterprises, the open, multidimensional model of multilateral trade cooperation built by cross-border e-commerce greatly widens the path into international markets and promotes the optimal allocation of multilateral resources and mutual benefit between enterprises. For consumers, cross-border e-commerce makes it easy to obtain information from other countries and to buy good, inexpensive goods.
However, current cross-border transactions still suffer from problems such as differences in the raw materials of the same cross-border e-commerce product being difficult to trace and the product's functions in use being inconsistent. The invention combines an information tracing database with a big data analysis algorithm to analyze and compare information on the same cross-border e-commerce product in real time, so that buyers obtain more transparent product rights and interests and the dynamic information of a cross-border e-commerce product at the time of purchase is marked. The method has a wide range of application and low economic cost, can be extended to international use, and brings good social and economic benefits.
Disclosure of Invention
The invention aims to provide an information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm, so as to solve the problems set out in the background above.
To this end, an information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm is provided, comprising a product information database module, a product big data analysis module and a multi-platform information comparison module. The specific process is as follows:
S1. Record the original attribute information of the same cross-border product in batches, and mark the corresponding information on the different cross-border e-commerce platforms;
S2. Construct an index over the product attribute information in the database, use a hash algorithm to compress the database storage volume and the accurate information digest, and on this basis build a system database that serves the systems of the different platforms within the same cross-border e-commerce system;
S3. Use a big data analysis algorithm to extract, from the system database, the key features and categories of the sales information of the same product on different platforms, write the feature information back to the system database in the form of control variables, and classify and analyze the product information;
S4. Use the system database as the associated base database that drives the cross-border e-commerce platforms, so that products of the same type obtain the analyzed feature information according to the index marks of the database;
S5. Based on the feature analysis results, use a comparison analysis algorithm to obtain purchase recommendations and information comparisons for the product attributes the buyer cares about at the time of purchase, and provide purchase guidance.
Further, in S1, the original attribute information recorded in batches for the same cross-border product includes a combination of the product's place of origin, material composition, usage amount and price information.
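As an illustration of the S1 record, the attribute information and its per-platform marks could be held in a small structure like the one below. This is a minimal sketch only: the class name `ProductRecord`, its field names, and the sample values are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProductRecord:
    """Original attribute information for one cross-border product (hypothetical layout)."""
    product_id: str
    origin: str                    # place of origin
    materials: Dict[str, float]    # material name -> share of the composition
    usage_amount: str              # usage amount, kept as free text
    # platform name -> price listed on that platform
    platform_prices: Dict[str, float] = field(default_factory=dict)

    def mark_platform(self, platform: str, price: float) -> None:
        """Mark the price information observed on one e-commerce platform."""
        self.platform_prices[platform] = price


record = ProductRecord("SKU-001", "CN", {"cotton": 0.8, "polyester": 0.2}, "1 unit per pack")
record.mark_platform("platform_a", 19.9)
record.mark_platform("platform_b", 21.5)
```

Keeping the per-platform marks in one record means later steps can compare the same product across platforms without a join.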
Further, in S2, the index over the product attribute information is constructed in the database as follows. Before the index is created, a statistics component aggregates and records the statistics corresponding to the information features; these statistics are then used, together with the ranking algorithm and the retrieval model, to determine the data the user wishes to obtain:
(1) Analyze the query and the document: the query is analyzed and processed with the same steps as the document, so that the words in the query are converted into the same form as the words produced when the document text is processed. The analysis consists mainly of lexical analysis, which identifies the morpheme, vocabulary and phrase information contained in the text. The result of document analysis is the structure of the document and a representation of its content.
(2) Remove stop words: stop words are high-frequency words or prepositions used in text and document files. Prepositions in a document serve the sentence structure but contribute little to describing the topic of the product information; removing them reduces the size of the index and the memory it occupies, and improves the speed and effectiveness of indexing.
(3) Extract word stems: during retrieval, stemming allows a query to match semantically related terms. If a word is inflected or derived in several forms, the forms can be reduced to the same stem.
(4) Semantic matching: homonyms and synonymous variants sharing the same stem are recognized and matched against the information data in the system database.
(5) Result feedback: the query result is returned to the user through a UI.
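The query and document processing steps above can be sketched as a tiny inverted index. This is an illustrative sketch, not the patent's implementation: the stop-word list, the crude suffix-stripping `stem` function, and the sample documents are all assumptions made for the example, and semantic matching of synonyms is not modelled.

```python
from collections import defaultdict

# Illustrative stop-word list (assumption); a real system would use a fuller list.
STOP_WORDS = {"the", "of", "in", "for", "a", "an", "and", "with"}


def stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Apply the same analysis to queries and documents: tokenize, drop stop words, stem."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    return [stem(t) for t in tokens if t and t not in STOP_WORDS]


def build_index(docs):
    """Map each normalized term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in normalize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every normalized query term."""
    terms = normalize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "p1": "Cotton shirts for the summer",
    "p2": "Leather shoes with cotton lining",
}
index = build_index(docs)
hits = search(index, "cotton shirt")
```

Because the query passes through the same `normalize` function as the documents, "shirt" in the query matches "shirts" in the document text.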
Further, in S2, the hash algorithm is used to compress the database storage volume and the accurate information digest; the detailed process is as follows:
The data undergoes standardized filtering, and the data preprocessing is defined as follows:
where $Q(a,b)$ is the data template for the data in row $a$ and column $b$ during hash processing, and $Q^{(x)}(a_i,b_j)$ denotes the $(i,j)$-th data item of the $b$-type accurate information in the $a$-type storage volume of the $x$-th class of data; $Q^{(1)}(a,b)$ is data with actual constants, and is calculated by:
$Q^{(1)}(a_i,b_j)$ denotes the $(1,1)$-th data item of the $b$-type accurate information in the $a$-type storage volume of the first class of data. When resource load balancing is adjusted, the amount of data changes to some extent. The data format therefore needs to be converted and the image data preprocessed. During preprocessing, the data is processed in single-instruction, multiple-data (SIMD) fashion: multiple data processors are connected to the same controller so that the data can be processed in parallel. The data is denoised using the low-rank representation of the hash algorithm, and the low-rank representation of the data is minimized:
where $R$ denotes the result of the low-rank representation minimization, $\|R\|$ is the processing matrix, and $n$ is the number of data items in the matrix. Letting the low-rank representation of data $J$ be $I$, we have:
$$\min \|J\| + \alpha \|I\| = \operatorname{tr}[JI]$$
where $\alpha$ denotes the Lagrange multiplier, $J$ in $\operatorname{tr}[JI]$ is the input data, and $I$ is the hash code of the data under the low-rank representation of the hash algorithm. The improved hash algorithm mainly improves the hash-processing data template and the low-rank representation minimization used in data denoising. With these improvements, data at the scale handled by the invention can be denoised in a more targeted way, with a better denoising effect and more accurate data information.
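To make the idea of a compact hash-based information digest concrete, the sketch below derives a fixed-length digest from an attribute record and uses it as a storage key, so that identical records collapse to one entry. It deliberately swaps in SHA-256 from the Python standard library; the patent's improved low-rank hash and its denoising step are not reproduced here, and the function names and sample records are hypothetical.

```python
import hashlib
import json


def record_digest(attributes, length=16):
    """Fixed-length digest of an attribute record (SHA-256 stand-in, not the patent's hash)."""
    # Serialize with sorted keys so that key order does not change the digest.
    canonical = json.dumps(attributes, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def deduplicate(records):
    """Store each distinct record once, keyed by its digest."""
    store = {}
    for rec in records:
        store.setdefault(record_digest(rec), rec)
    return store


records = [
    {"origin": "CN", "material": "cotton", "price": 19.9},
    {"price": 19.9, "material": "cotton", "origin": "CN"},  # same content, different key order
    {"origin": "CN", "material": "linen", "price": 25.0},
]
store = deduplicate(records)
```

Truncating the digest trades a shorter key for a higher collision risk; the 16-character length here is an arbitrary choice for the example.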
Furthermore, the system database in S2 combines the characteristics of structured SQL and unstructured NoSQL storage, and is built mainly on the MongoDB database.
Further, in S3, a big data analysis algorithm is used to extract, from the system database, the key features and categories of the sales information of the same product on different platforms; the detailed process is as follows:
The key features and categories of the product's sales information are processed with the K-Means cluster analysis method: they are clustered into sets, and each cluster center is taken as the representative of its data. K-Means is an unsupervised classification algorithm. Assume a data set with $n$ samples:
$$X=\{x_1,x_2,\ldots,x_n\}$$
The goal of the algorithm is to cluster the data set into $k$ clusters $C=\{C_1,C_2,\ldots,C_k\}$. First, $k$ initial centroids are selected at random from the samples, and the distance $d_i=\|x-\mu_i\|_2$ between each sample point and each centroid is computed; each sample point is then assigned to the nearest cluster. Next, the cluster center $\mu_i$ is recomputed from the sample points assigned to each cluster. The process is repeated until the cluster centers converge;
The total sum of squared errors $E$ is:
where $C_i$ is a constant set, the Lagrange multiplier $\alpha$ is introduced as the weight of the total sum of squared errors $E$, and $\mu_i$ is the centroid of cluster $C_i$;
where $|C_i|$ is the number of samples in cluster $C_i$. This function can be used to choose the value of $k$: $E$ is computed for several values of $k$ and the change in $E$ between them is compared. If the decrease in $E$, large at first, slows sharply at some $k$, that $k$ is taken as the optimal value;
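The K-Means procedure described above can be sketched in plain Python as follows. The formula for $E$ is not reproduced in this text, so the sketch assumes the standard unweighted within-cluster sum of squared errors; the $\alpha$ weight mentioned above is not modelled. The function names, the `seed` parameter, and the sample points are assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-Means; returns (labels, centroids, sse)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k initial centroids chosen at random from the samples
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign every sample to its nearest centroid.
        labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i]))
                  for p in points]
        # Recompute each centroid as the mean of its assigned samples.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep the old centroid for an empty cluster
        if new_centroids == centroids:  # centroids stopped moving: converged
            break
        centroids = new_centroids
    # Assumed standard SSE: squared distance of each sample to its cluster centroid.
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sse_by_k = {k: kmeans(points, k)[2] for k in (1, 2, 3)}
```

Comparing the entries of `sse_by_k` across values of `k` is the comparison of $E$ that the text uses to choose the number of clusters.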
Because clustering is an unsupervised algorithm, it cannot be evaluated by cross-validation. The invention therefore uses an improved silhouette coefficient to evaluate the clustering effect and determine the optimal number of clusters, as follows:
(1) Compute the average distance $c(i)$ from sample $x_i$ to the other samples in the same cluster. $c(i)$ is defined as the intra-cluster dissimilarity of sample $x_i$; the smaller $c(i)$ is, the more sample $x_i$ belongs in that cluster;
(2) Compute the maximum distance $d_{ij}$ from sample $x_i$ to all samples of another cluster $Y_j$, called the dissimilarity between sample $x_i$ and cluster $Y_j$; the inter-cluster dissimilarity of sample $x_i$ is defined on the basis of $d_{ij}$,
(3) From the intra-cluster dissimilarity $c(i)$ and the inter-cluster dissimilarity $d(i)$ of sample $x_i$, the silhouette coefficient $s(i)$ of the sample is expressed as:
The mean of $s(i)$ over all samples is called the silhouette coefficient of the clustering result. The improved silhouette coefficient uses the maximum distance to evaluate sample dissimilarity; compared with the traditional algorithm, clusters produced under the improved evaluation match more closely, which suits the application scenario of the invention.
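A possible reading of this maximum-distance silhouette evaluation is sketched below. The formula for $s(i)$ is not reproduced in this text, so the sketch makes two labelled assumptions: the inter-cluster dissimilarity $d(i)$ is taken as the smallest $d_{ij}$ over the other clusters, and $s(i)$ follows the standard silhouette form $(d(i)-c(i))/\max\{c(i),d(i)\}$. The function name and sample data are hypothetical.

```python
import math


def improved_silhouette(points, labels):
    """Mean silhouette using the maximum distance to each other cluster (assumed form)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # the coefficient is undefined for a single cluster; 0.0 is a convention
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        # c(i): average distance to the other samples in the same cluster.
        c = sum(math.dist(p, q) for q in own) / (len(own) - 1) if len(own) > 1 else 0.0
        # d_ij: maximum distance to the samples of another cluster Y_j.
        # d(i): assumed to be the smallest d_ij over all other clusters.
        d = min(max(math.dist(p, q) for q in members)
                for other, members in clusters.items() if other != lab)
        denom = max(c, d)
        scores.append((d - c) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
score = improved_silhouette(points, labels)
```

Under these assumptions a score near 1 indicates compact, well-separated clusters, so candidate values of $k$ can be compared by their mean score.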
Further, in S4, products of the same type obtain the analyzed feature information according to the index marks of the database; the detailed process is as follows:
An improved Apriori algorithm based on a matrix and weights is applied to information comparison analysis. A 0-1 matrix $M$ is constructed from the feature information set $I$:
where $T_i$ denotes the $i$-th piece of feature information; $i=1,2,3,\ldots,m$; $j=1,2,3,\ldots,n$; and $I=\{I_1,I_2,I_3,\ldots,I_n\}$ is the set of $n$ feature information items. The probability that $I_j$ occurs in the feature information database is $p(I_j)$, calculated as follows. The weight of $I_j$ is denoted $w(I_j)$ and is related to $p(I_j)$; $w(I_j)$ is calculated as:
$$w(I_j)=\frac{1}{p(I_j)}$$
where $l$ is the number of times $I_j$ occurs in the feature information database, i.e. the number of 1s in the $j$-th column of the matrix, and $m$ is the total number of pieces of feature information.
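The matrix and weight construction can be sketched as below. The formulas for $M$ and $p(I_j)$ are not reproduced in this text, so the sketch assumes that entry $(i,j)$ of $M$ is 1 when item $I_j$ occurs in $T_i$ and 0 otherwise, and that $p(I_j)=l/m$ with $l$ and $m$ as defined above; the function names and sample data are hypothetical.

```python
def build_matrix(transactions, items):
    """0-1 matrix: row i is T_i, column j is I_j; 1 if I_j occurs in T_i (assumed definition)."""
    return [[1 if item in t else 0 for item in items] for t in transactions]


def item_weights(matrix, items):
    """w(I_j) = 1 / p(I_j), with p(I_j) assumed to be l / m."""
    m = len(matrix)  # total number of pieces of feature information
    weights = {}
    for j, item in enumerate(items):
        l = sum(row[j] for row in matrix)  # number of 1s in column j
        if l:  # skip items that never occur, since their weight would be undefined
            weights[item] = m / l
    return weights


transactions = [
    {"origin", "price"},
    {"price"},
    {"price", "material"},
    {"origin", "price"},
]
items = ["origin", "material", "price"]
matrix = build_matrix(transactions, items)
weights = item_weights(matrix, items)
```

With this weighting, rarely occurring feature items receive larger weights than items that appear in almost every record.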
Further, the comparison analysis algorithm in S5 proceeds as follows:
Based on the K-Means cluster analysis method, with the Lagrange multiplier $\alpha$ introduced as the weight of the total sum of squared errors $E$, the purchase recommendation and information comparison data are clustered into $k$ clusters; the optimal class features are then determined using any feature item in the feature information database.
Through comparison analysis, purchase recommendations and information comparisons are obtained for the product attributes the buyer cares about at the time of purchase, providing purchase guidance.
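The final comparison step can be pictured with a very small sketch: collect the attributes a buyer cares about side by side for each platform, flag the ones that differ, and pick a platform by one simple criterion. The lowest-price rule, the function names, and the sample listings are assumptions for illustration; the text does not specify the patent's recommendation rule, and the clustering stage is omitted here.

```python
def compare_listings(listings, attributes):
    """Return {attribute: {platform: value}} for the attributes the buyer cares about."""
    return {a: {platform: info.get(a) for platform, info in listings.items()}
            for a in attributes}


def differing_attributes(comparison):
    """Attributes whose values are not identical across platforms."""
    return [a for a, values in comparison.items() if len(set(values.values())) > 1]


def cheapest_platform(listings):
    """Hypothetical recommendation rule: the platform with the lowest listed price."""
    return min(listings, key=lambda name: listings[name]["price"])


listings = {
    "platform_a": {"origin": "CN", "material": "cotton", "price": 19.9},
    "platform_b": {"origin": "CN", "material": "cotton blend", "price": 18.5},
}
comparison = compare_listings(listings, ["origin", "material"])
flagged = differing_attributes(comparison)
recommended = cheapest_platform(listings)
```

Flagging the differing attributes alongside the recommendation lets the buyer see that the cheaper listing is not necessarily the same material.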
The beneficial effects of the invention are as follows. The invention discloses an information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm, comprising a product information database module, a product big data analysis module and a multi-platform information comparison module. The invention records the original attribute information of the same cross-border product in batches and marks the corresponding information on the different cross-border e-commerce platforms, including the product's place of origin, material composition, usage amount and price information. It constructs an index over the product attribute information in the database and uses a hash algorithm to compress the database storage volume and the accurate information digest, which improves the processing efficiency of the system and allows massive data in the database to be processed simultaneously. The data is denoised using the low-rank representation of the hash algorithm, and the low-rank representation of the data is minimized. On this basis, a system database is constructed that combines the characteristics of structured SQL and unstructured NoSQL storage, is built mainly on MongoDB, and serves the different platforms within the same cross-border e-commerce system.
A big data analysis algorithm extracts, from the system database, the key features and categories of the sales information of the same product on different platforms; the feature information is written back to the system database in the form of control variables, and the features of the product information are classified and analyzed. The system database serves as the associated base database that drives the cross-border e-commerce platforms. An improved Apriori algorithm based on a matrix and weights is applied to information comparison analysis, and products of the same type obtain the analyzed feature information according to the index marks of the database. A comparison analysis algorithm then yields purchase recommendations and information comparisons for the product attributes the buyer cares about at the time of purchase, providing purchase guidance. To address the problems that, in current cross-border transactions, differences in the raw materials of the same cross-border e-commerce product are difficult to trace and the product's functions in use are inconsistent, the information tracing database is combined with the big data analysis algorithm to analyze and compare information on the same product in real time, so that buyers obtain more transparent product rights and interests and the dynamic information of a cross-border e-commerce product at the time of purchase is marked. The method has a wide range of application and low economic cost, can be extended to international use, and brings good social and economic benefits.
Drawings
The invention is further described with reference to the accompanying drawing. The embodiment shown in the drawing does not limit the invention in any way, and a person of ordinary skill in the art can obtain other drawings from the following drawing without creative effort.
Fig. 1 is a schematic diagram of the structure of the present invention.
Detailed Description
The invention is further described in connection with the following examples.
Referring to Fig. 1, the invention aims to provide an information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm, so as to solve the problems set out in the background above.
To this end, an information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm is provided, comprising a product information database module, a product big data analysis module and a multi-platform information comparison module. The specific process is as follows:
S1. Record the original attribute information of the same cross-border product in batches, and mark the corresponding information on the different cross-border e-commerce platforms, including the product's place of origin, material composition, usage amount and price information;
S2. Construct an index over the product attribute information in the database, and use a hash algorithm to compress the database storage volume and the accurate information digest. The detailed process is as follows:
Before the index is created, a statistics component aggregates and records the statistics corresponding to the information features; these statistics are then used, together with the ranking algorithm and the retrieval model, to determine the data the user wishes to obtain:
(1) Analyze the query and the document: the query is analyzed and processed with the same steps as the document, so that the words in the query are converted into the same form as the words produced when the document text is processed. The analysis consists mainly of lexical analysis, which identifies the morpheme, vocabulary and phrase information contained in the text. The result of document analysis is the structure of the document and a representation of its content.
(2) Remove stop words: stop words are high-frequency words or prepositions used in text and document files. Prepositions in a document serve the sentence structure but contribute little to describing the topic of the product information; removing them reduces the size of the index and the memory it occupies, and improves the speed and effectiveness of indexing.
(3) Extract word stems: during retrieval, stemming allows a query to match semantically related terms. If a word is inflected or derived in several forms, the forms can be reduced to the same stem.
(4) Semantic matching: homonyms and synonymous variants sharing the same stem are recognized and matched against the information data in the system database.
(5) Result feedback: the query result is returned to the user through a UI.
The database stores a large amount of data. With a traditional processing algorithm, processing is slow and the load capacity of the whole system is reduced. Hashing is a newer technique, based on visual techniques, that produces short digital sequences; it improves the processing efficiency of the system and allows massive data in the database to be processed simultaneously across multiple scenarios. A hash algorithm is therefore chosen as the base algorithm for building the database load balancing model. To prevent resource adjustment in the database load balancing model from compressing the data and distorting the information, the hash algorithm preprocesses the data with bicubic linear interpolation and adjusts parameters such as information size. The data undergoes standardized filtering, and the data preprocessing is defined as follows:
where $Q(a,b)$ is the data template for the data in row $a$ and column $b$ during hash processing, and $Q^{(x)}(a_i,b_j)$ denotes the $(i,j)$-th data item of the $b$-type accurate information in the $a$-type storage volume of the $x$-th class of data; $Q^{(1)}(a,b)$ is data with actual constants, and is calculated by:
$Q^{(1)}(a_i,b_j)$ denotes the $(1,1)$-th data item of the $b$-type accurate information in the $a$-type storage volume of the first class of data. When resource load balancing is adjusted, the amount of data changes to some extent. The data format therefore needs to be converted and the image data preprocessed. During preprocessing, the data is processed in single-instruction, multiple-data (SIMD) fashion: the data processors are connected to the same controller so that the data can be processed in parallel. The data is denoised using the low-rank representation of the hash algorithm, and the low-rank representation of the data is minimized:
where $R$ denotes the result of the low-rank representation minimization, $\|R\|$ is the processing matrix, and $n$ is the number of data items in the matrix. Letting the low-rank representation of data $J$ be $I$, we have:
$$\min \|J\| + \alpha \|I\| = \operatorname{tr}[JI]$$
where $\alpha$ denotes the Lagrange multiplier, $J$ in $\operatorname{tr}[JI]$ is the input data, and $I$ is the hash code of the data under the low-rank representation of the hash algorithm. The improved hash algorithm mainly improves the hash-processing data template and the low-rank representation minimization used in data denoising. With these improvements, data at the scale handled by the invention can be denoised in a more targeted way, with a better denoising effect and more accurate data information.
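The role of hashing in a database load balancing model can be illustrated with a simple hash-based partitioning sketch: each record key is hashed and assigned to one of a fixed number of storage shards. This is a generic technique offered as an assumption about what such a model might look like, not the patent's improved hash; the function name, the key format, and the shard count are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(key, shard_count):
    """Map a record key to a shard index using a stable hash (SHA-256 stand-in)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is identical across runs and machines.
    return int.from_bytes(digest[:8], "big") % shard_count


keys = [f"product-{i}" for i in range(1000)]
load = Counter(shard_for(k, 4) for k in keys)
```

A stable hash is used instead of Python's built-in `hash`, whose string results vary between interpreter runs and would move records between shards.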
S3. Use a big data analysis algorithm to extract, from the system database, the key features and categories of the sales information of the same product on different platforms, write the feature information back to the system database in the form of control variables, and classify and analyze the feature information. The detailed process is as follows:
The key features and categories of the product's sales information are processed with the K-Means cluster analysis method: they are clustered into sets, and each cluster center is taken as the representative of its data. K-Means is an unsupervised classification algorithm. Assume a data set with $n$ samples:
$$X=\{x_1,x_2,\ldots,x_n\}$$
The goal of the algorithm is to cluster the data set into $k$ clusters $C=\{C_1,C_2,\ldots,C_k\}$. First, $k$ initial centroids are selected at random from the samples, and the distance $d_i=\|x-\mu_i\|_2$ between each sample point and each centroid is computed; each sample point is then assigned to the nearest cluster. Next, the cluster center $\mu_i$ is recomputed from the sample points assigned to each cluster. The process is repeated until the cluster centers converge.
The total sum of squared errors $E$ is:
where $\mu_i$ is the centroid of cluster $C_i$;
where $|C_i|$ is the number of samples in cluster $C_i$. This function can be used to choose the value of $k$: $E$ is computed for several values of $k$ and the change in $E$ between them is compared. If the decrease in $E$, large at first, slows sharply at some $k$, that $k$ is taken as the optimal value;
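The selection of $k$ from the $E$ values can be sketched as a simple elbow heuristic: pick the $k$ at which the drop in $E$ slows the most. The second-difference rule below is one common way to express this and is an assumption, since the text does not give a numerical criterion; the function name and the sample $E$ values are made up for illustration.

```python
def elbow_k(sse_by_k):
    """Return the k where the decrease in E slows the most (largest second difference)."""
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = sse_by_k[prev_k] - sse_by_k[k]  # decrease in E on reaching k
        drop_after = sse_by_k[k] - sse_by_k[next_k]   # decrease in E on leaving k
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k  # None when fewer than three k values are supplied


# Illustrative E values only; in practice they come from running K-Means for each k.
example_sse = {1: 302.67, 2: 2.67, 3: 1.50, 4: 1.00}
chosen_k = elbow_k(example_sse)
```

The heuristic needs at least three candidate values of $k$, because the bend at $k$ compares the drop before it with the drop after it.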
The invention uses an improved contour (silhouette) coefficient to evaluate the clustering effect and determine the optimal number of clusters. Because clustering is an unsupervised algorithm, it cannot be evaluated by cross-validation. The steps are as follows:
(1) Calculate the average distance c(i) between sample $x_i$ and the other samples in the same cluster. c(i) is defined as the intra-cluster dissimilarity of sample $x_i$; the smaller c(i) is, the more sample $x_i$ belongs in that cluster;
(2) Calculate the maximum distance $d_{ij}$ from sample $x_i$ to all samples of another cluster $Y_j$, referred to as the dissimilarity between sample $x_i$ and cluster $Y_j$. The inter-cluster dissimilarity of sample $x_i$ is defined as $d(i)=\min_j\{d_{ij}\}$;
(3) From the intra-cluster dissimilarity c(i) and the inter-cluster dissimilarity d(i) of sample $x_i$, the contour coefficient s(i) of the sample is expressed as:
$$s(i)=\frac{d(i)-c(i)}{\max\{c(i),\,d(i)\}}$$
The mean of s(i) over all samples is called the contour coefficient of the clustering result.
The invention evaluates the clustering effect with the improved contour coefficient, which uses the maximum distance to measure the dissimilarity between samples. Compared with the traditional algorithm, clustering with the improved algorithm gives a stronger degree of matching and suits the application scenario of the invention.
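The three steps above can be sketched in plain Python as follows. This is a hedged illustration rather than the patent's code: the function name `improved_silhouette` is hypothetical, taking d(i) as the minimum of $d_{ij}$ over the other clusters is an assumption of this sketch, and so is scoring a sample that is alone in its cluster as 0. The function expects at least two non-empty clusters.

```python
import math

def improved_silhouette(clusters):
    """Mean contour (silhouette) coefficient using maximum inter-cluster distance.

    clusters: list of clusters, each a list of points (tuples of numbers).
    """
    scores = []
    for a, cluster in enumerate(clusters):
        for i, x in enumerate(cluster):
            rest = [y for j, y in enumerate(cluster) if j != i]
            if not rest:
                # A sample alone in its cluster gets s(i) = 0 (assumption).
                scores.append(0.0)
                continue
            # c(i): average distance to the other samples of the same cluster.
            c = sum(math.dist(x, y) for y in rest) / len(rest)
            # d_ij: maximum distance to the samples of another cluster Y_j;
            # d(i): smallest such value over all other clusters (assumption).
            d = min(
                max(math.dist(x, y) for y in other)
                for b, other in enumerate(clusters)
                if b != a and other
            )
            denom = max(c, d)
            scores.append((d - c) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy clusters (hypothetical data): two tight, well-separated groups.
clusters = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
print(improved_silhouette(clusters))
```

Values close to 1 indicate compact, well-separated clusters; comparing the mean coefficient across candidate cluster counts is how the optimal number of clusters would be chosen.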
S4. The system database serves as the associated base database driving the cross-border e-commerce platform, and the analyzed feature information of the same product is retrieved according to the index marks of the database. The detailed process is as follows:
An improved Apriori algorithm based on a matrix and weights is applied to information comparison analysis. A 0-1 matrix M is constructed from the feature information set I:
$$M=(a_{ij})_{m\times n},\qquad a_{ij}=\begin{cases}1, & I_j\in T_i\\ 0, & I_j\notin T_i\end{cases}$$
where $T_i$ denotes the i-th piece of feature information; i=1,2,3,…,m; j=1,2,3,…,n; $I=\{I_1, I_2, I_3, \ldots, I_N\}$ is the set of N feature items. The probability that $I_j$ occurs in the feature information database is $p(I_j)$, and the weight of $I_j$, denoted $w(I_j)$, is related to $p(I_j)$ and calculated as:
$$w(I_j)=\frac{1}{p(I_j)},\qquad p(I_j)=\frac{l}{m}$$
where l is the number of times $I_j$ occurs in the feature information database, i.e. the number of 1s in the j-th column of the matrix M, and m is the total number of pieces of feature information.
$T_k$ denotes the k-th piece of feature information in the data set. Its weight, denoted $wt(T_k)$, is the average weight of the feature items it contains, i.e. the average of $w(I_j)$ over all j with $a_{kj}=1$, where j=1,2,3,…,n:
$$wt(T_k)=\frac{\sum_{j=1}^{n}a_{kj}\,w(I_j)}{|T_k|}$$
where $|T_k|$ is the number of feature items contained in $T_k$.
The weighted support of a feature item, denoted wtupport, is the proportion of the weight of the pieces of feature information that contain the item to the weight of all pieces of feature information. A reasonable threshold is then set on the weighted support of the feature items to form the optimal feature set. wtupport(S) is calculated as shown in the following formula:
where S denotes any feature item in the feature information database, and $T_{k+1}$ denotes the (k+1)-th piece of feature information in the data set.
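The weighting scheme above can be illustrated with a short sketch in plain Python. It follows the prose definitions only: item weight as the reciprocal of occurrence probability, record weight as the average weight of the items a record contains, and weighted support as the share of total record weight held by records containing the item set. The function names, the sample attribute names, and the treatment of empty records are assumptions of this example, not part of the patent.

```python
def build_matrix(records, items):
    """0-1 matrix M: row i is record T_i, column j is feature item I_j."""
    return [[1 if item in record else 0 for item in items] for record in records]

def item_weights(matrix):
    """w(I_j) = 1 / p(I_j), with p(I_j) = l / m (column count over row count)."""
    m = len(matrix)
    weights = []
    for j in range(len(matrix[0])):
        l = sum(row[j] for row in matrix)
        weights.append(m / l if l else 0.0)  # unseen items get weight 0 here
    return weights

def record_weights(matrix, weights):
    """wt(T_k): average weight of the feature items contained in record T_k."""
    result = []
    for row in matrix:
        size = sum(row)
        total = sum(a * w for a, w in zip(row, weights))
        result.append(total / size if size else 0.0)
    return result

def weighted_support(itemset, records, rec_weights):
    """Share of total record weight held by records that contain the item set."""
    total = sum(rec_weights)
    covered = sum(w for record, w in zip(records, rec_weights)
                  if itemset <= record)
    return covered / total if total else 0.0

# Hypothetical feature items and records, for illustration only.
items = ["origin", "material", "price"]
records = [{"origin", "price"}, {"material", "price"},
           {"price"}, {"origin", "material", "price"}]

M = build_matrix(records, items)
w = item_weights(M)
wt = record_weights(M, w)
print(weighted_support({"origin"}, records, wt))
```

Rare feature items receive larger weights, so records containing them contribute more to the weighted support than they would under plain frequency counting.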
S5. Based on the feature analysis results, a comparison analysis algorithm is applied: using the K-Means cluster analysis method in which the Lagrange multiplier α is introduced as the weight of the total sum of squared errors E, the purchase recommendation and information comparison data are clustered into k clusters, and the optimal class features are then determined using any feature item in the feature information database.
Through comparison analysis, purchase recommendations and information comparisons for the product attributes that buyers care about when purchasing are obtained, and purchase guidance is provided.
Beneficial effects of the invention: the invention discloses an information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm, comprising a product information database module, a product big data analysis module and a multi-platform information comparison module. The invention records the original attribute information of the same cross-border product in batches, namely a combination of the product's place of origin, material composition, consumption and price information, and marks the corresponding information on the different cross-border e-commerce platforms. It constructs the database indexing scheme from the product attribute information and uses a hash algorithm to compress the database storage volume and produce accurate information digests, which improves the processing efficiency of the system and allows massive data to be processed simultaneously in the database. The low-rank expression of the hash algorithm is used to denoise the data by minimizing its low-rank expression. On this basis, a system database is built mainly on MongoDB, combining the characteristics of structured SQL and unstructured NoSQL storage, to serve the different platforms within the same cross-border e-commerce system.
A big data analysis algorithm extracts from the system database the key features and categories of the sales information of the same product on different platforms, updates the feature information into the system database in the form of control variables, and classifies and analyzes the product information features. The system database serves as the associated base database driving the cross-border e-commerce platform; an improved Apriori algorithm based on a matrix and weights is applied to information comparison analysis, and the analyzed feature information of the same product is retrieved according to the index marks of the database. A comparison analysis algorithm then yields purchase recommendations and information comparisons for the product attributes that buyers care about when purchasing, and provides purchase guidance. To address the current problems that raw-material differences are hard to trace and functional uses are inconsistent when the same cross-border e-commerce product is traded across countries, the information tracing database is combined with the big data analysis algorithm to analyze and compare information on the same product in real time, so that buyers obtain more transparent product rights and interests, and the dynamic information of cross-border e-commerce products at purchase time is marked. The method has a wide range of application and low economic cost, can be extended to international use, and brings good social and economic benefits.
The present invention also provides a computer readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the above-described method. The computer readable storage medium may be, among other things, ROM, random access memory, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc. The instructions stored therein may be loaded by a processor in the terminal and perform the methods described above.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. An information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm, comprising a product information database module, a product big data analysis module and a multi-platform information comparison module; the specific process is as follows:
S1, recording the original attribute information of the same cross-border product in batches, and marking the corresponding information on the different platforms in cross-border e-commerce;
S2, constructing the indexing scheme of the product attribute information in the database, using a hash algorithm to compress the database storage volume and produce accurate information digests, and on this basis constructing a system database that serves the different platforms within the same cross-border e-commerce system;
S3, extracting from the system database, by using a big data analysis algorithm, the key features and categories of the sales information of the same product on different platforms, updating the feature information into the system database in the form of control variables, and classifying and analyzing the product information;
S4, taking the system database as the associated base database driving the cross-border e-commerce platform, and retrieving the analyzed feature information of the same product according to the index marks of the database;
S5, based on the feature analysis results, using a comparison analysis algorithm to obtain purchase recommendations and information comparisons for the product attributes that buyers care about when purchasing, and providing purchase guidance.
2. The system for comparing information of the same product in different e-commerce platforms based on the big data analysis algorithm according to claim 1, wherein the original attribute information of the same cross-border product recorded in batches in S1 comprises a combination of the product's place of origin, material composition, consumption and price information.
3. The system for comparing information of the same product in different e-commerce platforms based on the big data analysis algorithm according to claim 1, wherein in step S2 the indexing scheme of the same product in the database is constructed based on the product attribute information, and the detailed process is as follows:
before the index is created, the statistics component aggregates and records the statistics corresponding to the information features, and these statistics are then used, according to the ranking and retrieval model, to determine the data the user wishes to obtain:
(1) analyzing query keywords: the query keywords are analyzed and processed in the same way as documents, i.e. the numbers and words in the query are converted into the same form as the words produced when the document text is processed; the analysis is mainly lexical analysis, i.e. identifying the morpheme, vocabulary and phrase information contained in the text, and the result of analyzing a file is a representation of its structure and related content;
(2) removing stop words: stop words are the high-frequency words and prepositions used in text information and document files; prepositions help form sentence structure but contribute little to describing the topic of the product information, so removing them reduces the size of the index and its memory footprint and improves the speed and effectiveness of indexing;
(3) extracting word stems: during retrieval, stemming allows information retrieval to match related semantics; if a word is inflected or derived in several forms, the forms can be reduced to the same stem;
(4) semantic matching: homonyms and synonyms (words with the same meaning but different pronunciation) of the same stem are recognized and matched against the information data in the system database;
(5) result feedback: the query result is returned to the user through a UI.
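Steps (1) to (3) and (5) of the query processing described above can be sketched in plain Python. This is only an illustrative sketch: the stop-word list, the naive suffix-stripping stemmer, the overlap-based ranking, and the sample product texts are all assumptions made for the example, and the semantic matching of step (4) is not covered.

```python
import re

# Small illustrative stop-word list (assumption; real lists are far larger).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "with", "and", "from"}
# Naive suffix stripping as a stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "s")

def tokenize(text):
    """Lexical analysis: lower-case the text and split it into word/number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize(text):
    """Apply the same processing to queries and documents."""
    return [stem(token) for token in tokenize(text) if token not in STOP_WORDS]

def search(query, documents):
    """Return ids of documents sharing stems with the query, best match first."""
    terms = set(normalize(query))
    scored = []
    for doc_id, text in documents.items():
        overlap = len(terms & set(normalize(text)))
        if overlap:
            scored.append((overlap, doc_id))
    return [doc_id for overlap, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]

# Hypothetical product descriptions, for illustration only.
documents = {
    "p1": "Cotton shirt, price 12 USD",
    "p2": "Leather shoes shipped from Italy",
}
print(search("shirts prices", documents))
```

Because queries and documents pass through the same `normalize` function, inflected forms such as "shirts" and "shirt" end up as the same index term.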
4. The system for comparing information of the same product in different e-commerce platforms based on the big data analysis algorithm according to claim 1, wherein in step S2 the improved hash algorithm is used to compress the database storage volume and produce accurate information digests, and the detailed process is as follows:
the data are subjected to standardized filtering, and the data preprocessing is defined as follows:
where Q(a, b) is the data template of the data in row a and column b during hash processing, and $Q^{(x)}(a_i, b_j)$ denotes the (i, j)-th data item of type-b accurate information in the type-a storage volume of type-x data; $Q^{(0)}(a, b)$ is data with actual constants and is calculated by:
$Q^{(1)}(a, b)$ denotes the data of type-b accurate information in the type-a storage volume of the first type of data. When resource load balancing is adjusted, the amount of data changes to some extent. During data preprocessing, the data are processed with single-instruction multiple-data (SIMD) streams: several data processors are connected to the same controller to process the data in parallel. The low-rank expression of the hash algorithm is used to denoise the data by minimizing the low-rank expression of the data:
where R denotes the result of the low-rank expression minimization, |R| is the processing matrix, and n is the number of data items in the matrix; letting the low-rank expression of the data J be I, we have:
min‖J‖+α‖I‖=tr[JI]
where α denotes the Lagrange multiplier, J in tr[JI] denotes the input data, and I is the hash code of the data under the low-rank expression of the hash algorithm.
5. The information comparison system of the same product in different e-commerce platforms based on the big data analysis algorithm according to claim 1, wherein the system database in S2 is built mainly on the MongoDB database, combining the characteristics of structured SQL and unstructured NoSQL storage.
6. The system for comparing information of the same product in different e-commerce platforms based on the big data analysis algorithm according to claim 1, wherein in step S3 the key features and categories of the sales information of the same product on different platforms are extracted from the system database by using the big data analysis algorithm, and the detailed process is as follows:
the key features and categories of the product's sales information are processed with the K-Means cluster analysis method: they are clustered into sets, and each cluster center point is used as the representative of its data; assume a data set with n samples:
$X=\{x_1, x_2, \ldots, x_n\}$
the goal of the algorithm is to cluster the data set into k clusters $C=\{C_1, C_2, \ldots, C_k\}$; first, k initial centroids are selected at random from the samples, and the distance $d_i=\|x-\mu_i\|_2$ between each sample point and each centroid is computed; each sample point is then assigned to the nearest cluster; next, the cluster center $\mu_i$ is recalculated from the sample points assigned to each cluster, and the process is repeated until the cluster centers converge; the total sum of squared errors E is set as:
where $C_i$ is a constant set, and the Lagrange multiplier α is introduced as the weight of the total sum of squared errors E; E is computed for different values of k and the change in E between them is compared, and the k value at which the decrease in E changes from large and rapid to gradual is taken as the optimal k value;
the improved contour (silhouette) coefficient is used to evaluate the clustering effect and determine the optimal number of clusters, and the specific method is as follows:
(1) calculating the average distance c(i) between sample $x_i$ and the other samples in the same cluster, c(i) being defined as the intra-cluster dissimilarity of sample $x_i$; the smaller c(i) is, the more sample $x_i$ belongs in that cluster;
(2) calculating the maximum distance $d_{ij}$ from sample $x_i$ to all samples of another cluster $Y_j$, referred to as the dissimilarity between sample $x_i$ and cluster $Y_j$, the inter-cluster dissimilarity of sample $x_i$ being defined as $d(i)=\min_j\{d_{ij}\}$;
(3) from the intra-cluster dissimilarity c(i) and the inter-cluster dissimilarity d(i) of sample $x_i$, the contour coefficient s(i) of the sample is expressed as:
$$s(i)=\frac{d(i)-c(i)}{\max\{c(i),\,d(i)\}}$$
the mean of s(i) over all samples is called the contour coefficient of the clustering result.
7. The information comparison system of the same product in different e-commerce platforms based on the big data analysis algorithm according to claim 1, wherein in step S4 the analyzed feature information of the same product is retrieved according to the index marks of the database, and the detailed process is as follows:
an improved Apriori algorithm based on a matrix and weights is applied to information comparison analysis, and a 0-1 matrix M is constructed from the feature information set I:
$$M=(a_{ij})_{m\times n},\qquad a_{ij}=\begin{cases}1, & I_j\in T_i\\ 0, & I_j\notin T_i\end{cases}$$
where $a_{ij}$ is the indicator function; $T_i$ denotes the i-th piece of feature information; i=1,2,3,…,m; j=1,2,3,…,n; $I=\{I_1, I_2, I_3, \ldots, I_N\}$ is the set of N feature items; the probability that $I_j$ occurs in the feature information database is $p(I_j)$, and the weight of $I_j$, denoted $w(I_j)$, is related to $p(I_j)$ and calculated as:
$$w(I_j)=\frac{1}{p(I_j)},\qquad p(I_j)=\frac{l}{m}$$
where l is the number of times $I_j$ occurs in the feature information database, i.e. the number of 1s in the j-th column of the matrix M, and m is the total number of pieces of feature information.
8. The information comparison system of the same product in different e-commerce platforms based on the big data analysis algorithm according to claim 6, wherein the comparison analysis algorithm in S5 comprises the following detailed process:
using the K-Means cluster analysis method in which the Lagrange multiplier α is introduced as the weight of the total sum of squared errors E, the purchase recommendation and information comparison data are clustered into k clusters, and the optimal class features are then determined using any feature item in the feature information database;
through comparison analysis, purchase recommendations and information comparisons for the product attributes that buyers care about when purchasing are obtained, and purchase guidance is provided.
CN202310535737.2A 2023-05-11 2023-05-11 Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm Pending CN116596570A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310535737.2A CN116596570A (en) 2023-05-11 2023-05-11 Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310535737.2A CN116596570A (en) 2023-05-11 2023-05-11 Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm

Publications (1)

Publication Number Publication Date
CN116596570A true CN116596570A (en) 2023-08-15

Family

ID=87605687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310535737.2A Pending CN116596570A (en) 2023-05-11 2023-05-11 Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm

Country Status (1)

Country Link
CN (1) CN116596570A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196776A (en) * 2023-09-09 2023-12-08 广东德澳智慧医疗科技有限公司 Cross-border electronic commerce product credit marking and after-sale system based on random gradient lifting tree algorithm
CN117726389A (en) * 2023-09-09 2024-03-19 广东德澳智慧医疗科技有限公司 Cross-border e-commerce product purchase guiding system based on clustering recommendation algorithm

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409121A (en) * 2021-06-29 2021-09-17 南京财经大学 Cross-border e-commerce recommendation method based on heterogeneous graph expression learning
CN114266594A (en) * 2021-12-22 2022-04-01 亿橡信息科技(苏州)有限公司 Big data analysis method based on southeast Asia cross-border e-commerce platform
CN115775173A (en) * 2023-02-13 2023-03-10 广东德澳智慧医疗科技有限公司 E-commerce data group purchase deduction system based on algorithm and artificial intelligence

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
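曾蔚: "A K-means-based user behavior clustering algorithm", 绵阳师范学院学报, no. 08, pages 2 - 3 *
李昌兵 et al.: "Research on online review mining based on improved feature extraction and clustering", 《现代情报》, pages 2 *
秦慧娟: "An SQL-based automatic index recommendation model for educational resource databases", 《计算机与通信技术》, pages 3 *
韦宁 et al.: "Construction of a cloud service platform load balancing model based on an improved hash algorithm", 《长江信息通信》, pages 1 *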

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination