CN116596570A - Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm
- Publication number
- CN116596570A (CN202310535737.2A)
- Authority
- CN
- China
- Prior art keywords
- information
- data
- product
- database
- same
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The information comparison system comprises a product information database module, a product big data analysis module and a multi-platform information comparison module. It addresses the problems that, in current cross-border transactions, raw material differences of the same cross-border e-commerce product are difficult to trace and its functions in use are not uniform. An information tracing database is combined with a big data analysis algorithm to analyze and compare information on the same product in real time, so that buyers obtain transparent product rights and interests, and the dynamic information of the cross-border e-commerce product at the time of purchase is marked. The method has a wide application range and low economic cost, can be extended to international use, and brings good social and economic benefits.
Description
Technical Field
The invention relates to the field of cross-border e-commerce transactions, in particular to an information comparison system of the same product in different e-commerce platforms based on a big data analysis algorithm.
Background
Cross-border e-commerce refers to an international business activity in which transaction parties belonging to different customs territories conclude transactions and settle payments electronically through an e-commerce platform, and deliver goods through cross-border logistics and off-site warehousing. The cross-border e-commerce retail import policy has been extended and improved and its scope of application enlarged, stimulating consumption potential through greater openness; the layout and construction of logistics hubs is being advanced to improve the quality and efficiency of the national economy. Cross-border e-commerce is a technical foundation for promoting economic integration and trade globalization and has very important strategic significance. It breaks through the barriers between countries, moves international trade toward borderless trade, and is causing a great transformation of world economy and trade. For enterprises, the open, multidimensional, multilateral trade cooperation model built by cross-border e-commerce greatly widens the path into the international market and greatly promotes the optimal allocation of multilateral resources and mutual benefit between enterprises. For consumers, cross-border e-commerce makes it very easy to obtain information from other countries and to purchase good and inexpensive goods.
However, in current cross-border transactions, raw material differences of the same cross-border e-commerce product are still difficult to trace and its functions in use are not uniform. The invention therefore combines an information tracing database with a big data analysis algorithm to analyze and compare information on the same cross-border e-commerce product in real time, so that buyers obtain more transparent product rights and interests, and the dynamic information of the cross-border e-commerce product at the time of purchase is marked. The method has a wide application range and low economic cost, can be extended to international use, and brings good social and economic benefits.
Disclosure of Invention
The invention aims to provide an information comparison system of the same product in different E-commerce platforms based on a big data analysis algorithm so as to solve the problems in the background technology.
To achieve the above purpose, an information comparison system of the same product in different e-commerce platforms based on a big data analysis algorithm is provided, comprising a product information database module, a product big data analysis module and a multi-platform information comparison module. The specific process is as follows:
S1, recording the original attribute information of the same cross-border product in batches, and marking the corresponding information on the different cross-border e-commerce platforms;
S2, constructing an index over the product attribute information in a database, using a hash algorithm to compress the database storage and to produce accurate information digests, and on this basis constructing a system database that serves the different platforms within the same cross-border e-commerce system;
S3, using a big data analysis algorithm to extract from the system database the key features and categories of the sales information of the same product on different platforms, updating the feature information into the system database in the form of control variables, and classifying and analyzing the product information;
S4, using the system database as the associated base database that drives the cross-border e-commerce platforms, with products of the same type obtaining the analyzed feature information according to the index marks of the database;
S5, based on the feature analysis results, using a comparison analysis algorithm to obtain purchase recommendations and an information comparison for the product attributes the buyer is concerned with at the time of purchase, and providing purchase guidance.
Further, in S1, the original attribute information recorded in batches for the same cross-border product comprises: the product's place of origin, material composition, consumption and price information, or a combination thereof.
Further, in step S2, the index in the database is constructed based on the product attribute information, and the detailed process is as follows: before the index is created, a statistics component aggregates and records the statistics corresponding to the information features; these statistics are then used, together with the ranking algorithm and the retrieval model, to determine the data the user wishes to obtain:
(1) Analyzing the query and the document: the query is analyzed and processed with the same steps as the document, that is, the words in the query are converted into the same form as the words produced when the document text is processed. The analysis consists mainly of lexical analysis, which identifies the morpheme, vocabulary and phrase information contained in the text. The result of analyzing a file is a representation of its structure and related content;
(2) Removing stop words: stop words are high-frequency words, such as prepositions, used in text information and document files. They help form sentence structure but contribute little to describing the topic of the product information. Removing them reduces the size of the index and the memory it occupies, and improves the speed and effectiveness of indexing;
(3) Extracting word stems: during retrieval, stem extraction allows information retrieval to match related semantics. Inflected forms of a word, or multiple forms derived from it, are reduced to the same stem;
(4) Semantic matching: homonyms and synonyms of the same stem are recognized and matched against the information data in the system database;
(5) Result feedback: the query result is returned to the user through a UI.
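Steps (1) to (3) and the lookup in step (4) can be sketched as a small inverted index. This is a minimal illustration, not the patented implementation: the stop-word list, the suffix-stripping rule, the function names (`stem`, `analyze`, `build_index`, `search`) and the sample listings are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Illustrative stop-word list and suffix rules; both are assumptions.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "with", "and", "to"}
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip one common English suffix (a crude stand-in for a real stemmer)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lexical analysis, stop-word removal and stemming.

    The same function is applied to queries and documents so that both
    end up in the same form.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(documents):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every stem of the query."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical listings of the same product on different platforms.
documents = {
    "platform_a/1001": "Insulated stainless steel bottles for hiking",
    "platform_b/2002": "Stainless steel insulated bottle, 500 ml",
    "platform_c/3003": "Cotton hiking socks",
}
index = build_index(documents)
print(sorted(search(index, "insulated bottle")))
```

Because "bottles" and "bottle" reduce to the same stem, the query matches the listings on both platforms even though their wording differs.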
Further, in step S2, the database storage is compressed and accurate information digests are produced using a hash algorithm, and the detailed process is as follows:
The data is standardized and filtered, and the data preprocessing is defined as follows:
wherein Q(a, b) is the data template for the data in row a and column b during hash processing; Q^(x)(a_i, b_j) denotes the (i, j)-th data item of the b-type accurate information in the a-type storage of the x-th class of data; Q^(1)(a, b) is data with actual constants, calculated as:
Q^(1)(a_i, b_j) denotes the (1, 1)-th data item of the b-type accurate information in the a-type storage of the first class of data. When resource load balancing is adjusted, the amount of data changes to some extent, so the data format needs to be converted and the image data preprocessed. During preprocessing, the data is processed with single-instruction, multiple-data streams: multiple data processors are connected to the same controller so that the data can be processed in parallel. The data is denoised using the low-rank representation of the hash algorithm, and the low-rank representation of the data is minimized:
wherein R denotes the result of the low-rank representation minimization, ‖R‖ is the processing matrix, and n is the number of data items in the matrix. Letting the low-rank representation of data J be I, then:
min‖J‖+α‖I‖=tr[JI]
wherein α denotes the Lagrange multiplier and, in tr[JI], J denotes the input data and I is the hash code of the data under the low-rank representation of the hash algorithm. The improved hash algorithm mainly improves the hash-processing data template and the low-rank minimization used in data denoising. With these improvements, data at the scale handled by the invention can be denoised in a more targeted way, with a better denoising effect and more accurate data information.
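The idea of an "accurate information digest" that also reduces storage can be illustrated with an ordinary cryptographic hash. The sketch below uses SHA-256 from the Python standard library as a stand-in; it does not implement the improved hash algorithm or the low-rank denoising described above, and the record fields and function names are hypothetical.

```python
import hashlib
import json


def record_digest(record):
    """Return a SHA-256 hex digest of a record, independent of key order."""
    canonical = json.dumps(
        record, sort_keys=True, ensure_ascii=False, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep one copy of each distinct record, keyed by its digest."""
    store = {}
    for record in records:
        store.setdefault(record_digest(record), record)
    return store


# Hypothetical attribute records collected from different platforms.
records = [
    {"origin": "DE", "material": "steel", "price": 19.9},
    {"price": 19.9, "material": "steel", "origin": "DE"},  # same content, different key order
    {"origin": "DE", "material": "steel", "price": 21.5},
]
store = deduplicate(records)
print(len(store))  # 2
```

Serializing with sorted keys before hashing means two records with identical content always share a digest, so duplicates from different platforms are stored once.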
Furthermore, the system database in S2 combines the characteristics of structured SQL storage and unstructured NoSQL storage, and is mainly based on the MongoDB database.
Further, in step S3, the key features and categories of the sales information of the same product on different platforms are extracted from the system database using a big data analysis algorithm, and the detailed process is as follows:
The key features and categories of the product's sales information are processed with the K-Means cluster analysis method: they are clustered into sets, and each cluster center is used to represent its data. K-Means is an unsupervised classification algorithm. Assume a data set with n samples:
X = {x_1, x_2, …, x_n}
The goal of the algorithm is to cluster the data set into k clusters C = {C_1, C_2, …, C_k}. First, k initial centroids are selected at random from the samples, and the distance d_i = ‖x − μ_i‖_2 between each sample point and each centroid is calculated and compared; each sample point is then assigned to its nearest cluster. Next, the cluster center μ_i is recalculated from the sample points assigned to each cluster. This process is repeated until the cluster centers converge;
The total sum of squared errors E is:
wherein C_i is the i-th cluster, taken as a constant set, a Lagrange multiplier α is introduced as the weight of the total sum of squared errors E, and μ_i is the centroid of cluster C_i;
wherein |C_i| is the number of samples in cluster C_i. This function can be used to choose the value of k: E is calculated for different values of k and the changes in E are compared; the k at which the decrease in E, rapid at first, begins to level off is taken as the optimal k;
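For reference, the textbook K-Means objective and centroid update, written with the symbols used here, are given below. This is the standard form and is only an assumption about the two formulas referred to above; in particular it does not include the weight α, whose exact placement cannot be inferred from the surrounding text. |C_i| denotes the number of samples in cluster C_i.

```latex
E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^{2},
\qquad
\mu_i = \frac{1}{\lvert C_i \rvert} \sum_{x \in C_i} x
```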
The invention uses an improved contour (silhouette) coefficient to evaluate the clustering effect and determine the optimal number of clusters. Because K-Means is an unsupervised algorithm, it cannot be evaluated by cross-validation. The steps are as follows:
(1) Calculate the average distance c(i) from sample x_i to the other samples in the same cluster; c(i) is defined as the intra-cluster dissimilarity of sample x_i. The smaller c(i) is, the more sample x_i belongs in that cluster;
(2) Calculate the maximum distance d_ij from sample x_i to all samples of another cluster Y_j, called the dissimilarity between sample x_i and cluster Y_j; the inter-cluster dissimilarity d(i) of sample x_i is defined from the d_ij,
(3) From the intra-cluster dissimilarity c(i) and the inter-cluster dissimilarity d(i) of sample x_i, the contour coefficient s(i) of the sample is expressed as:
The mean of s(i) over all samples is called the contour coefficient of the clustering result. The improved contour coefficient evaluates sample dissimilarity with the maximum distance; compared with the traditional algorithm, clustering evaluated in this way is better matched to the application scenario of the invention.
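Steps (1) to (3) can be sketched as follows. Two details are assumptions, because the corresponding formulas are not given in the text: d(i) is taken as the smallest d_ij over the other clusters, and s(i) uses the conventional silhouette form (d(i) − c(i)) / max{c(i), d(i)}. The function name and sample points are illustrative only.

```python
import math


def improved_silhouette(samples, labels):
    """Mean of s(i), using the maximum distance to another cluster as d_ij."""
    clusters = {}
    for x, label in zip(samples, labels):
        clusters.setdefault(label, []).append(x)

    scores = []
    for x, label in zip(samples, labels):
        same = clusters[label]
        if len(same) == 1 or len(clusters) < 2:
            scores.append(0.0)  # convention for degenerate cases (assumed)
            continue
        # c(i): average distance to the other samples of the same cluster.
        # The distance from x to itself is 0, so it does not affect the sum.
        c = sum(math.dist(x, y) for y in same) / (len(same) - 1)
        # d_ij: maximum distance from x to the samples of another cluster Y_j.
        # d(i): smallest d_ij over the other clusters (assumed).
        d = min(
            max(math.dist(x, y) for y in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(c, d)
        scores.append((d - c) / denom if denom else 0.0)
    return sum(scores) / len(scores)


samples = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
good = [0, 0, 0, 1, 1, 1]  # labels that follow the two visible groups
bad = [0, 1, 0, 1, 0, 1]   # labels that mix the groups
print(improved_silhouette(samples, good) > improved_silhouette(samples, bad))
```

A labeling that follows the natural groups scores close to 1, while a labeling that mixes them scores markedly lower, which is the property used to pick the number of clusters.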
Further, in step S4, products of the same type obtain the analyzed feature information according to the index marks of the database, and the detailed process is as follows:
An improved Apriori algorithm based on a matrix and weights is applied to information comparison analysis, and a 0-1 matrix M is constructed from the feature information set I:
wherein T_i denotes the i-th feature information record; i = 1, 2, 3, …, m; j = 1, 2, 3, …, n; I = {I_1, I_2, I_3, …, I_N} is the set of N feature information items; the probability that I_j occurs in the feature information database is p(I_j); and the weight of I_j, denoted w(I_j), depends on p(I_j) and is calculated as follows:
w(I_j) = 1/p(I_j)
wherein l denotes the number of times I_j occurs in the feature information database, that is, the number of 1s in the j-th column of the matrix, and m is the total number of feature information records.
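The matrix construction and weighting can be sketched as follows. The sketch assumes that M[i][j] is 1 when item I_j occurs in record T_i and 0 otherwise, and that p(I_j) = l/m, which follows from the definitions of l and m above but is not stated as a formula in the text. It covers only the matrix and weights, not the frequent-itemset search of Apriori; the item names and records are hypothetical.

```python
def build_matrix(transactions, items):
    """M[i][j] = 1 if item I_j occurs in feature record T_i, else 0."""
    return [[1 if item in t else 0 for item in items] for t in transactions]


def item_weights(matrix, items):
    """w(I_j) = 1 / p(I_j), with p(I_j) = l / m (assumed).

    l is the number of 1s in column j and m is the number of rows.
    """
    m = len(matrix)
    weights = {}
    for j, item in enumerate(items):
        count = sum(row[j] for row in matrix)
        if count:  # an item that never occurs has no defined weight
            weights[item] = m / count
    return weights


# Hypothetical feature records T_1..T_4 and candidate items I_1..I_4.
transactions = [
    {"origin", "material", "price"},
    {"origin", "price"},
    {"material", "price"},
    {"price"},
]
items = ["origin", "material", "price", "usage"]

M = build_matrix(transactions, items)
w = item_weights(M, items)
print(M)
print(w)
```

Under this weighting, an item that appears in every record gets weight 1, and rarer items get proportionally larger weights.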
Further, the detailed process of the comparison analysis algorithm in S5 is as follows:
Based on the K-Means cluster analysis method in which a Lagrange multiplier α is introduced as the weight of the total sum of squared errors E, the purchase recommendation and information comparison data are clustered into k clusters, and the optimal class features are then determined using any feature item in the feature information database.
Through comparison analysis, purchase recommendations and an information comparison are obtained for the product attributes the buyer is concerned with at the time of purchase, and purchase guidance is provided.
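The final comparison step can be illustrated, in a much simplified form, by ranking the listings of one product by the attribute the buyer cares about. This sketch covers only the comparison, not the clustering that precedes it; the function name, field names and sample listings are hypothetical.

```python
def compare_listings(listings, attribute, lower_is_better=True):
    """Rank the listings of one product by a buyer-chosen attribute.

    Listings that do not record the attribute are left out of the ranking.
    """
    usable = [item for item in listings if attribute in item]
    return sorted(
        usable,
        key=lambda item: item[attribute],
        reverse=not lower_is_better,
    )


# Hypothetical listings of the same product on three platforms.
listings = [
    {"platform": "platform_a", "origin": "DE", "price": 21.5},
    {"platform": "platform_b", "origin": "DE", "price": 19.9},
    {"platform": "platform_c", "origin": "CN"},  # no price recorded
]
ranked = compare_listings(listings, "price")
print([item["platform"] for item in ranked])  # ['platform_b', 'platform_a']
```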
The beneficial effects of the invention are as follows. The invention discloses an information comparison system of the same product in different e-commerce platforms based on a big data analysis algorithm, comprising a product information database module, a product big data analysis module and a multi-platform information comparison module. The invention records the original attribute information of the same cross-border product in batches and marks the corresponding information on the different cross-border e-commerce platforms, including the product's place of origin, material composition, consumption and price information, or a combination thereof. It constructs an index over this information in a database based on the product attribute information, and uses a hash algorithm to compress the database storage and to produce accurate information digests, which improves the processing efficiency of the system and allows mass data in the database to be processed simultaneously. The data is denoised using the low-rank representation of the hash algorithm, and the low-rank representation of the data is minimized. On this basis, the characteristics of structured SQL storage and unstructured NoSQL storage are combined to construct a system database, mainly based on MongoDB, that serves the different platforms within the same cross-border e-commerce system.
A big data analysis algorithm extracts from the system database the key features and categories of the sales information of the same product on different platforms; the feature information is updated into the system database in the form of control variables, and the product information is classified and analyzed. The system database serves as the associated base database that drives the cross-border e-commerce platforms. An improved Apriori algorithm based on a matrix and weights is applied to information comparison analysis, and products of the same type obtain the analyzed feature information according to the index marks of the database. A comparison analysis algorithm then yields purchase recommendations and an information comparison for the product attributes the buyer is concerned with at the time of purchase, and provides purchase guidance. To address the problems that raw material differences of the same cross-border e-commerce product are difficult to trace and its functions in use are not uniform in current cross-border transactions, the information tracing database is combined with a big data analysis algorithm to analyze and compare information on the same product in real time, so that buyers obtain more transparent product rights and interests, and the dynamic information of the cross-border e-commerce product at the time of purchase is marked. The method has a wide application range and low economic cost, can be extended to international use, and brings good social and economic benefits.
Drawings
The invention will be further described with reference to the accompanying drawings, in which embodiments do not constitute any limitation on the invention, and other drawings can be obtained by one of ordinary skill in the art without undue effort from the following drawings.
Fig. 1 is a schematic diagram of the structure of the present invention.
Detailed Description
The invention is further described in connection with the following examples.
Referring to fig. 1, the present invention aims to provide an information comparison system of the same product in different e-commerce platforms based on big data analysis algorithm, so as to solve the problems set forth in the above background art.
To achieve the above purpose, an information comparison system of the same product in different e-commerce platforms based on a big data analysis algorithm is provided, comprising a product information database module, a product big data analysis module and a multi-platform information comparison module. The specific process is as follows:
S1, recording the original attribute information of the same cross-border product in batches, and marking the corresponding information on the different cross-border e-commerce platforms, the information comprising the product's place of origin, material composition, consumption and price information, or a combination thereof;
S2, constructing an index over the product attribute information in a database, and using a hash algorithm to compress the database storage and to produce accurate information digests, wherein the detailed process is as follows:
Before the index is created, a statistics component aggregates and records the statistics corresponding to the information features; these statistics are then used, together with the ranking algorithm and the retrieval model, to determine the data the user wishes to obtain:
(1) Analyzing the query and the document: the query is analyzed and processed with the same steps as the document, that is, the words in the query are converted into the same form as the words produced when the document text is processed. The analysis consists mainly of lexical analysis, which identifies the morpheme, vocabulary and phrase information contained in the text. The result of analyzing a file is a representation of its structure and related content;
(2) Removing stop words: stop words are high-frequency words, such as prepositions, used in text information and document files. They help form sentence structure but contribute little to describing the topic of the product information. Removing them reduces the size of the index and the memory it occupies, and improves the speed and effectiveness of indexing;
(3) Extracting word stems: during retrieval, stem extraction allows information retrieval to match related semantics. Inflected forms of a word, or multiple forms derived from it, are reduced to the same stem;
(4) Semantic matching: homonyms and synonyms of the same stem are recognized and matched against the information data in the system database;
(5) Result feedback: the query result is returned to the user through a UI.
The database stores a large amount of data; with a traditional processing algorithm, processing is slow and the load capacity of the whole system is reduced. Hashing is a technique, originating in visual technology, that represents data by short numeric sequences. It improves the processing efficiency of the system and allows mass data in the database to be processed simultaneously across multiple scenarios, so a hash algorithm is selected as the basic algorithm for constructing the database load balancing model. To prevent information from being distorted when data is compressed during resource adjustment in the load balancing model, bicubic interpolation preprocessing is applied to the data in the hash algorithm, and parameters such as the information size are adjusted. The data is standardized and filtered, and the data preprocessing is defined as follows:
wherein Q(a, b) is the data template for the data in row a and column b during hash processing; Q^(x)(a_i, b_j) denotes the (i, j)-th data item of the b-type accurate information in the a-type storage of the x-th class of data; Q^(1)(a, b) is data with actual constants, calculated as:
Q^(1)(a_i, b_j) denotes the (1, 1)-th data item of the b-type accurate information in the a-type storage of the first class of data. When resource load balancing is adjusted, the amount of data changes to some extent, so the data format needs to be converted and the image data preprocessed. During preprocessing, the data is processed with single-instruction, multiple-data streams: the data processors are connected to the same controller so that the data can be processed in parallel. The data is denoised using the low-rank representation of the hash algorithm, and the low-rank representation of the data is minimized:
wherein R denotes the result of the low-rank representation minimization, ‖R‖ is the processing matrix, and n is the number of data items in the matrix. Letting the low-rank representation of data J be I, then:
min‖J‖+α‖I‖=tr[JI]
wherein α denotes the Lagrange multiplier and, in tr[JI], J denotes the input data and I is the hash code of the data under the low-rank representation of the hash algorithm. The improved hash algorithm mainly improves the hash-processing data template and the low-rank minimization used in data denoising. With these improvements, data at the scale handled by the invention can be denoised in a more targeted way, with a better denoising effect and more accurate data information.
S3. Using a big data analysis algorithm, the key features and categories of the sales information of the same product on different platforms are extracted from the system database, the feature information is updated into the system database in the form of control variables, and the feature information is classified and analyzed. The detailed process is as follows:
The key features and categories of the product's sales information are processed with the K-Means cluster analysis method: they are clustered into sets, and each cluster center point serves as the representative of its data. K-Means is an unsupervised classification algorithm. Assume a data set with $n$ samples:
$X = \{x_1, x_2, \ldots, x_n\}$
The goal of the algorithm is to cluster the data set into $k$ clusters $C = \{C_1, C_2, \ldots, C_k\}$. First, $k$ initial centroids are randomly selected from the samples, and the distance between each sample point and each centroid, $d_i = \|x - \mu_i\|^2$, is computed; each sample point is then assigned to the nearest cluster. Next, the cluster center $\mu_i$ is recalculated from the sample points assigned to each cluster. The process is repeated until the cluster centers converge.
The total sum of squared errors $E$ is:
wherein $\mu_i$ is the centroid of cluster $C_i$;
wherein $|C_i|$ is the number of samples in cluster $C_i$. This function can be used to evaluate the value of $k$: $E$ is calculated for different values of $k$, and the change in $E$ across those values is compared. If $E$ initially decreases sharply and rapidly up to the current $k$ value, that $k$ value is the optimal $k$ value;
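As a minimal sketch of the procedure described above, the following code runs plain K-Means and computes the total sum of squared errors for several values of $k$. It uses the standard unweighted error $E = \sum_i \sum_{x \in C_i} \|x - \mu_i\|^2$, which is an assumption, since the patent's own formula for $E$ is not reproduced in the text; the function names and the sample points are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def k_means(samples, k, seed=0, max_iter=100):
    """Plain K-Means: random initial centroids, assign, recompute, repeat."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(samples, k)]
    labels = []
    for _ in range(max_iter):
        # Assign every sample to its nearest centroid.
        labels = [min(range(k), key=lambda i: sq_dist(x, centroids[i])) for x in samples]
        new_centroids = []
        for i in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == i]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return labels, centroids

def total_sse(samples, labels, centroids):
    """Total sum of squared errors E over all clusters (standard form, assumed)."""
    return sum(sq_dist(x, centroids[lab]) for x, lab in zip(samples, labels))

# Hypothetical feature vectors forming two well-separated groups.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k in (1, 2, 3):
    labels, centroids = k_means(samples, k)
    print(k, round(total_sse(samples, labels, centroids), 3))
```

Printing $E$ for increasing $k$ makes the sharp initial drop visible; the $k$ at which the drop stops being large is the candidate optimum.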
Because K-Means is an unsupervised algorithm, it cannot be evaluated by cross-validation. The invention therefore uses an improved silhouette (contour) coefficient to evaluate the clustering effect and determine the optimal number of clusters, as follows:
(1) Calculate the average distance $c(i)$ from sample $x_i$ to the other samples in the same cluster. $c(i)$ is defined as the intra-cluster dissimilarity of sample $x_i$; the smaller $c(i)$ is, the more sample $x_i$ belongs in that cluster;
(2) Calculate the maximum distance $d_{ij}$ from sample $x_i$ to all samples of another cluster $Y_j$, called the dissimilarity between sample $x_i$ and cluster $Y_j$; the inter-cluster dissimilarity $d(i)$ of sample $x_i$ is defined in terms of $d_{ij}$,
(3) From the intra-cluster dissimilarity $c(i)$ and the inter-cluster dissimilarity $d(i)$ of sample $x_i$, the silhouette coefficient $s(i)$ of the sample is expressed as:
The mean of $s(i)$ over all samples is called the silhouette coefficient of the clustering result.
Because the improved silhouette coefficient uses the maximum distance to evaluate the dissimilarity between samples, the clustering produced by the improved algorithm matches better than that of the traditional algorithm and suits the application scenario of the invention.
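The three steps above can be sketched as follows. Two points are assumptions, because the corresponding formulas are not reproduced in the text: $d(i)$ is taken as the smallest $d_{ij}$ over all other clusters, and $s(i)$ uses the usual silhouette form $s(i) = (d(i) - c(i)) / \max\{c(i), d(i)\}$. The function name `improved_silhouette` is hypothetical.

```python
import math

def euclid(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def improved_silhouette(samples, labels):
    """Mean silhouette score using the MAXIMUM distance to each other cluster."""
    clusters = {}
    for x, lab in zip(samples, labels):
        clusters.setdefault(lab, []).append(x)
    scores = []
    for idx, (x, lab) in enumerate(zip(samples, labels)):
        own = [y for j, (y, l2) in enumerate(zip(samples, labels)) if l2 == lab and j != idx]
        if not own or len(clusters) < 2:
            scores.append(0.0)  # score is undefined for singletons or a single cluster
            continue
        # Step (1): intra-cluster dissimilarity c(i), the average distance within the cluster.
        c_i = sum(euclid(x, y) for y in own) / len(own)
        # Step (2): d_ij is the maximum distance to cluster j; d(i) is assumed to be the
        # smallest d_ij over the other clusters.
        d_i = min(
            max(euclid(x, y) for y in members)
            for other, members in clusters.items()
            if other != lab
        )
        # Step (3): assumed standard silhouette form.
        denom = max(c_i, d_i)
        scores.append((d_i - c_i) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical points: a sensible labelling versus a poor one.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = improved_silhouette(points, [0, 0, 1, 1])
poor = improved_silhouette(points, [0, 1, 0, 1])
```

A score near 1 indicates compact, well-separated clusters; comparing the score across candidate values of $k$ is one way to pick the number of clusters.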
S4. The system database serves as the associated base database driving the cross-border e-commerce platform, and the same product obtains the analyzed feature information according to the index marks of the database. The detailed process is as follows:
An improved matrix- and weight-based Apriori algorithm is applied to information comparison analysis, and a 0-1 matrix $M$ is constructed from the feature information set $I_1$:
wherein $T_i$ denotes the $i$-th piece of feature information; $i = 1, 2, 3, \ldots, m$; $j = 1, 2, 3, \ldots, n$; $I = \{I_1, I_2, I_3, \ldots, I_N\}$ is the set of $N$ items of feature information; the probability that $I_j$ occurs in the feature information database is $p(I_j)$; and the weight of $I_j$, denoted $w(I_j)$, is related to $p(I_j)$ and is calculated as follows:
$w(I_j) = 1/p(I_j)$
wherein $l$ is the number of times $I_j$ occurs in the feature information database, i.e. the number of 1's in the $j$-th column of the matrix $M$, and $m$ is the total number of pieces of feature information.
$T_k$ is the $k$-th piece of feature information in the data set. Its weight, denoted $wt(T_k)$, is the average weight of the feature items contained in that factor, i.e. the average of all $w(I_j)$ for which $a_{kj} = 1$, where $j = 1, 2, 3, \ldots, n$. It is calculated as follows:
wherein $|T_k|$ is the number of feature items contained in factor $T_k$;
The weighted support of an item, denoted wtupport, is the proportion of the weight of the factors containing the feature item to the weight of all factors. A reasonable threshold is then set on the weighted support of the feature items to form the optimal feature set. wtupport(S) is calculated as follows:
wherein $S$ is any feature item in the feature information database, and $T_{k+1}$ is the $(k+1)$-th piece of feature information in the data set.
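The weighting scheme above can be sketched as follows. Several details are assumptions, since the corresponding formulas are not reproduced in the text: $p(I_j)$ is taken as $l/m$, $wt(T_k)$ as the mean of $w(I_j)$ over the items present in row $k$, and the weighted support as the summed weight of the rows containing the itemset divided by the summed weight of all rows. All function names and the example matrix are hypothetical.

```python
def item_weights(M):
    """w(I_j) = 1 / p(I_j), with p(I_j) assumed to be l / m (column count over row count)."""
    m = len(M)
    weights = []
    for j in range(len(M[0])):
        l = sum(row[j] for row in M)  # number of 1's in column j
        weights.append(m / l if l else 0.0)
    return weights

def record_weight(row, weights):
    """wt(T_k): average weight of the feature items present in one row."""
    items = [j for j, a in enumerate(row) if a == 1]
    return sum(weights[j] for j in items) / len(items) if items else 0.0

def weighted_support(itemset, M, weights):
    """Weight of the rows containing every item in `itemset`, over the weight of all rows."""
    row_weights = [record_weight(row, weights) for row in M]
    total = sum(row_weights)
    covering = sum(
        wt for row, wt in zip(M, row_weights) if all(row[j] == 1 for j in itemset)
    )
    return covering / total if total else 0.0

# Hypothetical 0-1 matrix: 4 pieces of feature information (rows) by 3 feature items (columns).
M = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
w = item_weights(M)
support_item0 = weighted_support({0}, M, w)
support_pair = weighted_support({0, 1}, M, w)
```

Because rarer items receive larger weights, a row containing a rare item contributes more to the support of the itemsets it covers; itemsets whose weighted support exceeds the chosen threshold form the candidate feature set.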
S5. Based on the feature analysis result, a comparison analysis algorithm is applied: building on the K-Means cluster analysis method in which the Lagrange multiplier $\alpha$ is introduced as the weight of the total sum of squared errors $E$, the purchase recommendation and information comparison data are clustered into $K$ clusters, and the optimal class features are then determined using any feature item in the feature information database:
Through comparison analysis, purchase recommendations and information comparisons are obtained for the product attributes the buyer cares about when purchasing the product, and purchase guidance is provided.
The beneficial effects of the invention are as follows. The invention discloses an information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm, comprising a product information database module, a product big data analysis module, and a multi-platform information comparison module. The invention records the original attribute information of the same cross-border product in batches and marks the corresponding information on the different platforms in cross-border e-commerce, including the product's place of origin, material composition, and the combination of consumption and price information. Based on the product attribute information, it constructs the index mode of that information in the database and uses a hash algorithm to compress the database storage amount and the precise information digest, which improves the processing efficiency of the system and allows massive data in the database to be processed simultaneously. The data is denoised using the low-rank representation of the hash algorithm, and the low-rank representation of the data is minimized. On this basis, combining the structured characteristics of SQL with the unstructured characteristics of NoSQL, a system database built mainly on a MongoDB database is constructed to serve the systems deployed on different platforms within the same cross-border e-commerce system.
Using a big data analysis algorithm, the key features and categories of the sales information of the same product on different platforms are extracted from the system database, the feature information is updated into the system database in the form of control variables, and the features of the product information are classified and analyzed. The system database serves as the associated base database driving the cross-border e-commerce platform; an improved matrix- and weight-based Apriori algorithm is applied to information comparison analysis, and the same product obtains the analyzed feature information according to the index marks of the database. A comparison analysis algorithm then yields purchase recommendations and information comparisons for the product attributes the buyer cares about when purchasing the product, and provides purchase guidance. To address the current problems that raw-material differences are hard to trace and usage functions are inconsistent when the same cross-border e-commerce product is traded across countries, the information traceability database is combined with the big data analysis algorithm to analyze and compare information on the same product in real time, so that buyers obtain more transparent product rights and interests and the dynamic information of cross-border e-commerce products is marked at purchase time. The method has a wide range of application and low economic cost, can be extended to international use, and brings good social and economic benefits.
The present invention also provides a computer readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the above-described method. The computer readable storage medium may be, among other things, ROM, random access memory, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc. The instructions stored therein may be loaded by a processor in the terminal and perform the methods described above.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (8)
1. An information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm, comprising a product information database module, a product big data analysis module, and a multi-platform information comparison module, wherein the specific process is as follows:
S1. recording the original attribute information of the same cross-border product in batches, and marking the corresponding information on the different platforms in cross-border e-commerce;
S2. constructing the index mode of the product attribute information in a database, compressing the database storage amount and the precise information digest using a hash algorithm, and on this basis constructing a system database serving the systems deployed on different platforms within the same cross-border e-commerce system;
S3. extracting, from the system database and using a big data analysis algorithm, the key features and categories of the sales information of the same product on different platforms, updating the feature information into the system database in the form of control variables, and classifying and analyzing the product information;
S4. using the system database as the associated base database driving the cross-border e-commerce platform, with products of the same type obtaining the analyzed feature information according to the index marks of the database;
S5. based on the feature analysis result, using a comparison analysis algorithm to obtain purchase recommendations and information comparisons for the product attributes the buyer cares about when purchasing the product, and providing purchase guidance.
2. The information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm according to claim 1, wherein the original attribute information of the same cross-border product recorded in batches in S1 comprises: the product's place of origin, material composition, and the combination of consumption and price information.
3. The information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm according to claim 1, wherein in S2 the index mode of the product in the database is constructed based on the product attribute information, the detailed process being as follows:
before the index is created, a statistics component aggregates and records the statistics corresponding to the information features, and the ranking and retrieval model then uses these statistics to determine the data the user wishes to obtain:
(1) analyzing query keywords: the query keywords are analyzed and processed in the same way as documents, that is, the numbers and words in the query are converted into the same form as the terms produced when the document text is processed; the analysis consists mainly of lexical analysis, which identifies the morpheme, vocabulary, and phrase information contained in the text content, and the result of analyzing a file is a representation of the file's structure and related content;
(2) removing stop words: stop words are high-frequency words or prepositions used in text information and document files; prepositions help form sentence structure but contribute little to describing the topic of the product information, so removing them reduces the size of the index and the corresponding memory footprint and improves the speed and effectiveness of indexing;
(3) extracting word stems: during retrieval, stemming allows information retrieval to match related semantics; if a word is inflected or derived in several forms, all of those forms are reduced to the same stem;
(4) semantic matching: the homonyms and synonymous variant words of the same stem are recognized and matched against the information data in the system database;
(5) result feedback: the query result is fed back to the user through a UI.
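Steps (1) to (3) above, followed by a simple lookup, can be sketched as below. The stop-word list, the suffix-stripping rule, and all function names are illustrative assumptions rather than the components the patent actually uses; semantic matching (4) and the UI feedback (5) are not covered.

```python
import re

# Illustrative stop-word list (assumption); a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "for", "to", "and", "with"}

def naive_stem(word):
    """Very rough stemmer: strips a trailing 'ing' or 's' when enough letters remain."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def analyze(text):
    """Lexical analysis, stop-word removal, and stemming, applied alike to documents and queries."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

def build_index(docs):
    """Inverted index mapping each stem to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every stem of the query."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical product descriptions from different platforms.
docs = {
    "p1": "Running shoes for the trail",
    "p2": "Leather shoe with laces",
    "p3": "Trail running jacket",
}
index = build_index(docs)
```

Because the query goes through the same `analyze` function as the documents, a query such as "shoes" is reduced to the same stem as "shoe" in the index, which is the point of processing queries and documents identically.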
4. The information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm according to claim 1, wherein in S2 the database storage amount and the precise information digest are compressed using the improved hash algorithm, the detailed process being as follows:
the data is subjected to standardized filtering, and the data preprocessing is defined as follows:
wherein $Q(a,b)$ is the data template for the data in row $a$ and column $b$ during hash processing; $Q^{(x)}(a_i,b_j)$ denotes the $(i,j)$-th data item of the type-$b$ precise information in the type-$a$ storage amount of type-$x$ data; and $Q^{(0)}(a,b)$ is data with actual constants, calculated by:
$Q^{(1)}(a,b)$ denotes the data of the type-$b$ precise information in the type-$a$ storage amount of the first-type data; when resource load balancing is adjusted, the amount of data changes to some extent; during data preprocessing, the data is processed with single-instruction, multiple-data (SIMD) streams, multiple data processors being connected to the same controller so that the data is processed in parallel; the data is denoised using the low-rank representation of the hash algorithm, and the low-rank representation of the data is minimized:
wherein $R$ denotes the result of the low-rank representation minimization, $\|R\|$ is the processing matrix, and $n$ is the number of data items in the matrix; letting the low-rank representation of data $J$ be $I$, we have:
$\min \|J\| + \alpha\|I\| = \operatorname{tr}[JI]$
wherein $\alpha$ is the Lagrange multiplier and, in $\operatorname{tr}[JI]$, $J$ is the input data and $I$ is the hash code of the data in the low-rank representation of the hash algorithm.
5. The information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm according to claim 1, wherein the system database in S2 is built mainly on a MongoDB database, combining the structured characteristics of SQL with the unstructured characteristics of NoSQL.
6. The information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm according to claim 1, wherein in S3 the key features and categories of the sales information of the same product on different platforms are extracted from the system database using the big data analysis algorithm, the detailed process being as follows:
the key features and categories of the product's sales information are processed with the K-Means cluster analysis method: they are clustered into sets, and each cluster center point serves as the representative of its data; assume a data set with $n$ samples:
$X = \{x_1, x_2, \ldots, x_n\}$
the goal of the algorithm is to cluster the data set into $k$ clusters $C = \{C_1, C_2, \ldots, C_k\}$; first, $k$ initial centroids are randomly selected from the samples, and the distance between each sample point and each centroid, $d_i = \|x - \mu_i\|^2$, is computed; each sample point is then assigned to the nearest cluster; next, the cluster center $\mu_i$ is recalculated from the sample points assigned to each cluster, and the process is repeated until the cluster centers converge; the total sum of squared errors $E$ is set as:
wherein $C_i$ is a constant set, and the Lagrange multiplier $\alpha$ is introduced as the weight of the total sum of squared errors $E$; $E$ is calculated for different values of $k$, and the change in $E$ across those values is compared; if $E$ initially decreases sharply and rapidly up to the current $k$ value, that $k$ value is the optimal $k$ value;
an improved silhouette (contour) coefficient is used to evaluate the clustering effect and determine the optimal number of clusters, as follows:
(1) calculate the average distance $c(i)$ from sample $x_i$ to the other samples in the same cluster; $c(i)$ is defined as the intra-cluster dissimilarity of sample $x_i$, and the smaller $c(i)$ is, the more sample $x_i$ belongs in that cluster;
(2) calculate the maximum distance $d_{ij}$ from sample $x_i$ to all samples of another cluster $Y_j$, called the dissimilarity between sample $x_i$ and cluster $Y_j$; the inter-cluster dissimilarity $d(i)$ of sample $x_i$ is defined in terms of $d_{ij}$,
(3) from the intra-cluster dissimilarity $c(i)$ and the inter-cluster dissimilarity $d(i)$ of sample $x_i$, the silhouette coefficient $s(i)$ of the sample is expressed as:
the mean of $s(i)$ over all samples is called the silhouette coefficient of the clustering result.
7. The information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm according to claim 1, wherein in S4 the same product obtains the analyzed feature information according to the index marks of the database, the detailed process being as follows:
an improved matrix- and weight-based Apriori algorithm is applied to information comparison analysis, and a 0-1 matrix $M$ is constructed from the feature information set $I_1$:
wherein $a_{ij}$ is an indicator function taking the value 0 or 1; $T_i$ denotes the $i$-th piece of feature information; $i = 1, 2, 3, \ldots, m$; $j = 1, 2, 3, \ldots, n$; $I = \{I_1, I_2, I_3, \ldots, I_N\}$ is the set of $N$ items of feature information; the probability that $I_j$ occurs in the feature information database is $p(I_j)$; and the weight of $I_j$, denoted $w(I_j)$, is related to $p(I_j)$ and is calculated as follows:
$w(I_j) = 1/p(I_j)$
wherein $l$ is the number of times $I_j$ occurs in the feature information database, i.e. the number of 1's in the $j$-th column of the matrix $M$, and $m$ is the total number of pieces of feature information.
8. The information comparison system for the same product on different e-commerce platforms based on a big data analysis algorithm according to claim 6, wherein the detailed process of the comparison analysis algorithm in S5 is as follows:
building on the K-Means cluster analysis method in which the Lagrange multiplier $\alpha$ is introduced as the weight of the total sum of squared errors $E$, the purchase recommendation and information comparison data are clustered into $K$ clusters, and the optimal class features are then determined using any feature item in the feature information database:
through comparison analysis, purchase recommendations and information comparisons are obtained for the product attributes the buyer cares about when purchasing the product, and purchase guidance is provided.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310535737.2A CN116596570A (en) | 2023-05-11 | 2023-05-11 | Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310535737.2A CN116596570A (en) | 2023-05-11 | 2023-05-11 | Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116596570A true CN116596570A (en) | 2023-08-15 |
Family
ID=87605687
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310535737.2A Pending CN116596570A (en) | 2023-05-11 | 2023-05-11 | Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116596570A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117196776A (en) * | 2023-09-09 | 2023-12-08 | 广东德澳智慧医疗科技有限公司 | Cross-border electronic commerce product credit marking and after-sale system based on random gradient lifting tree algorithm |
CN117726389A (en) * | 2023-09-09 | 2024-03-19 | 广东德澳智慧医疗科技有限公司 | Cross-border e-commerce product purchase guiding system based on clustering recommendation algorithm |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113409121A (en) * | 2021-06-29 | 2021-09-17 | 南京财经大学 | Cross-border e-commerce recommendation method based on heterogeneous graph expression learning |
CN114266594A (en) * | 2021-12-22 | 2022-04-01 | 亿橡信息科技(苏州)有限公司 | Big data analysis method based on southeast Asia cross-border e-commerce platform |
CN115775173A (en) * | 2023-02-13 | 2023-03-10 | 广东德澳智慧医疗科技有限公司 | E-commerce data group purchase deduction system based on algorithm and artificial intelligence |
Non-Patent Citations (4)
Title |
---|
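曾蔚: "A K-Means-Based User Behavior Clustering Algorithm", 绵阳师范学院学报, no. 08, pages 2 - 3 * |
李昌兵 et al.: "Research on Online Review Mining Based on Improved Feature Extraction and Clustering", 《现代情报》, pages 2 * |
秦慧娟: "An SQL-Based Automatic Index Recommendation Model for Educational Resource Databases", 《计算机与通信技术》, pages 3 * |
韦宁 et al.: "Construction of a Cloud Service Platform Load Balancing Model Based on an Improved Hash Algorithm", 《长江信息通信》, pages 1 * |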
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103870973B (en) | Information push, searching method and the device of keyword extraction based on electronic information | |
CN116596570A (en) | Information comparison system of same product in different E-commerce platforms based on big data analysis algorithm | |
US20170329804A1 (en) | Method And Apparatus Of Generating Image Characteristic Representation Of Query, And Image Search Method And Apparatus | |
CN109064285B (en) | Commodity recommendation sequence and commodity recommendation method | |
US20150317390A1 (en) | Computer-implemented systems and methods for taxonomy development | |
CN108846047A (en) | A kind of picture retrieval method and system based on convolution feature | |
CN112100512A (en) | Collaborative filtering recommendation method based on user clustering and project association analysis | |
CN114741603A (en) | Mixed collaborative filtering recommendation algorithm based on user clustering and commodity clustering | |
CN112102029A (en) | Knowledge graph-based long-tail recommendation calculation method | |
CN111209469A (en) | Personalized recommendation method and device, computer equipment and storage medium | |
CN112906396A (en) | Cross-platform commodity matching method and system based on natural language processing | |
CN113159892A (en) | Commodity recommendation method based on multi-mode commodity feature fusion | |
CN112632373A (en) | Personalized recommendation method and device based on probability matrix decomposition and computer readable storage medium | |
CN114588633A (en) | Content recommendation method | |
US11526756B1 (en) | Artificial intelligence system with composite models for multiple response-string queries | |
CN104965928A (en) | Chinese character image retrieval method based on shape matching | |
CN111651477A (en) | Multi-source heterogeneous commodity consistency judging method and device based on semantic similarity | |
CN116127194A (en) | Enterprise recommendation method | |
CN112381627B (en) | Commodity scoring processing recommendation method and device under child-care knowledge | |
Yu et al. | Computer image content retrieval considering k-means clustering algorithm | |
CN114861079A (en) | Collaborative filtering recommendation method and system fusing commodity features | |
CN113435713A (en) | Risk map compiling method and system based on GIS technology and two-model fusion | |
CN111382265B (en) | Searching method, device, equipment and medium | |
CN113111257A (en) | Collaborative filtering-based recommendation method for fusing multi-source heterogeneous information | |
CN112650869A (en) | Image retrieval reordering method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||