CN114676123A - E-commerce data analysis method - Google Patents

Publication number
CN114676123A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210367597.8A
Other languages
Chinese (zh)
Inventor
金和
王振宇
杨克杰
金开子
周建清
郑祥智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Newford Research Institute Of Advanced Technology
Original Assignee
Newford Research Institute Of Advanced Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-04-08
Filing date
2022-04-08
Publication date
2022-06-28
Application filed by Newford Research Institute Of Advanced Technology
Priority to CN202210367597.8A
Publication of CN114676123A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An e-commerce data analysis method comprises the following steps: (1) data acquisition: collecting Internet e-commerce transaction data; (2) data storage; (3) data processing: first assessing the quality and characteristics of the data, then cleaning, integrating, and standardizing it; (4) data analysis: analyzing the collected data with a dedicated tool; (5) visual display: presenting the data and the analysis results through front-end display software. Beneficial effects: by analyzing and mining Internet e-commerce transaction big data, the method provides information along different dimensions to support government regulation of regional e-commerce and early warning of trends.

Description

E-commerce data analysis method
Technical Field
The invention relates to an e-commerce data analysis method.
Background
With the vigorous development of electronic commerce, online shopping has become a necessity, and e-commerce platforms generate massive amounts of data as they grow. Advances in big data and artificial intelligence technology have brought new models and new value to e-commerce platforms. However, the data accumulated in the user data warehouses of existing e-commerce platforms is not fully utilized, so a unified and complete data view of the whole e-commerce industry is lacking.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an e-commerce data analysis method that analyzes and mines collected Internet e-commerce transaction big data and provides information along different dimensions to support government regulation of regional e-commerce and early warning of trends.
To achieve this purpose, the invention adopts the following technical solution: an e-commerce data analysis method comprising the following steps:
(1) data acquisition: collecting Internet e-commerce transaction data;
(2) data storage;
(3) data processing: first assessing the quality and characteristics of the data, then cleaning, integrating, and standardizing it;
(4) data analysis: analyzing the collected data with a dedicated tool;
(5) visual display: presenting the data and the analysis results through front-end display software.
Further, data acquisition covers the time, conditions, format, content, length, and constraints of data generation.
Further, data storage: Hadoop-based parallel processing technology and a non-relational database are used to build an unstructured data management cloud platform that meets big-data requirements for mass storage, highly concurrent reads and writes, data reliability, complex association analysis, and high scalability.
Further, data processing: first assessing the quality and characteristics of the data, then cleaning, integrating, and standardizing it;
data quality is assessed mainly by checking whether the raw data contains dirty data;
cleaning replaces or deletes missing values, abnormal values, inconsistent values, duplicate records, and records containing special symbols;
integration merges the different information collection tables according to defined rules to produce a complete information table for subsequent processing;
standardization mainly processes business registration address information, dividing the collected enterprise addresses according to the standard province, city, and county administrative divisions.
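The cleaning step described above can be illustrated with a short pandas sketch. It is only one possible reading of the text: the column names (`shop_id`, `month`, `shop_name`, `sales`), the rule that negative sales count as abnormal, and the choice to delete rather than replace bad rows are all assumptions made for the example, not details given in the patent.

```python
import pandas as pd


def clean_shop_records(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean raw shop records; all column names are hypothetical."""
    df = raw.copy()
    # Strip special symbols (anything that is not a word character or whitespace).
    df["shop_name"] = (
        df["shop_name"].str.replace(r"[^\w\s]", "", regex=True).str.strip()
    )
    # Turn non-numeric sales into NaN so they are handled as missing values.
    df["sales"] = pd.to_numeric(df["sales"], errors="coerce")
    # Delete rows with missing sales or negative sales (assumed to be abnormal).
    df = df[df["sales"].notna() & (df["sales"] >= 0)]
    # Remove repeated records for the same shop and month, keeping the first.
    df = df.drop_duplicates(subset=["shop_id", "month"], keep="first")
    return df.reset_index(drop=True)


raw = pd.DataFrame({
    "shop_id": [1, 1, 2, 3, 4],
    "month": ["2022-03"] * 5,
    "shop_name": ["Tea#House", "Tea#House", "Rice★Shop", "Fruit Co", "Nut Shop"],
    "sales": ["100.5", "100.5", "-5", None, "80"],
})
cleaned = clean_shop_records(raw)
print(cleaned)
```

In this toy input, the duplicated row, the negative-sales row, and the row with missing sales are dropped, and the `#` symbol is removed from the remaining shop name.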
Further, data processing includes handling order brushing (fake orders) by shops, garbled text, outliers, and missing values, matching shop sales to commodity sales, and matching shops to enterprise information.
Further, data analysis: the data is analyzed with Python, including overall analysis, B2B analysis, and B2C analysis, and the obtained e-commerce data is analyzed along the platform, region, field, industry, brand, commodity category, and time dimensions to generate an analysis report.
Further, the front-end display software includes Excel, PPT (PowerPoint), and Word.
Further, existing enterprise information in the database is updated and replaced with the most recently collected enterprise information; enterprise records that are not yet in the database are divided by address according to the standard province, city, and county divisions and then stored in the database;
commodity information is preprocessed and agricultural product data is separated: order-brushing records are deleted from the collected commodity information, and commodity records belonging to agricultural products are separated by commodity category id and stored separately;
the month-over-month sales change for the current month is calculated from the shop information of each platform, and the shop, enterprise, and commodity information is summarized;
for each platform, sales are calculated from the processed shop table, shop grade table, commodity table, and enterprise information table according to the commodities belonging to each shop, and each shop's enterprise name and enterprise information are merged in to form a complete platform shop sales information table;
abnormal shops are processed: the top 100 shops by sales amount in the current month are screened, their commodities are checked for live-streamed or order-brushing items, and any such items are then processed;
after the shop sales information has been checked for abnormal shop commodities, the processed commodities are recorded and the corresponding shop sales are corrected. Each platform is then checked as a whole for abnormalities in the sales trend of the shops collected in the current month; any abnormality found is investigated and resolved.
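As a rough illustration of the first two steps in this passage (updating enterprise records and separating agricultural products), the following plain-Python sketch may help. The record fields, the in-memory dictionary standing in for the database, and the category id `501` are all hypothetical; the patent does not specify a schema.

```python
def upsert_enterprises(database, latest_records):
    """Update enterprises already stored and insert ones not yet present.

    `database` maps a hypothetical enterprise_id to its stored record.
    """
    for record in latest_records:
        key = record["enterprise_id"]
        # Newly collected fields overwrite stored ones; unknown ids are inserted.
        database[key] = {**database.get(key, {}), **record}
    return database


def split_agricultural(commodities, agri_category_ids):
    """Separate agricultural commodities by category id (the ids are assumed)."""
    agri = [c for c in commodities if c["category_id"] in agri_category_ids]
    other = [c for c in commodities if c["category_id"] not in agri_category_ids]
    return agri, other


database = {"E1": {"enterprise_id": "E1", "name": "Old Tea Co", "county": "Chun'an"}}
latest = [
    {"enterprise_id": "E1", "name": "New Tea Co"},
    {"enterprise_id": "E2", "name": "Rice Farm", "county": "Yinzhou"},
]
database = upsert_enterprises(database, latest)

commodities = [
    {"commodity_id": 10, "category_id": 501},
    {"commodity_id": 11, "category_id": 102},
]
agri, other = split_agricultural(commodities, agri_category_ids={501})
```

In a real deployment the same update-or-insert logic would more likely be expressed against the database itself; the dictionary is used here only to keep the sketch self-contained.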
Further, the collected and summarized data is imported into the corresponding database, where it is analyzed as a whole to check for abnormalities.
Further, different reports are output for different dimensions, and dimension information tables are then generated in the database by address, shop main business, agriculture, and shop rating for subsequent use.
The beneficial effects of the invention are as follows: the large volume of collected data is analyzed with appropriate statistical methods and then summarized and interpreted so as to extract as much value from the data as possible. Internet e-commerce transaction big data is analyzed and mined, and the collected e-commerce data is matched with basic enterprise information to complete information such as the sales amount and sales quantity of e-commerce enterprises and the overall state of regional e-commerce. The data is classified and tallied along different dimensions, including but not limited to region, field, platform, industry, brand, and commodity category.
By analyzing and mining Internet e-commerce transaction big data, the method provides information along different dimensions to support government regulation of regional e-commerce and early warning of trends.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The technical solution of the embodiment of the invention is described clearly and completely below with reference to FIG. 1. The described embodiment is only one of the embodiments of the invention, not all of them.
Embodiment 1: an e-commerce data analysis method comprising the following steps:
(1) data acquisition: collecting Internet e-commerce transaction data;
(2) data storage;
(3) data processing: first assessing the quality and characteristics of the data, then cleaning, integrating, and standardizing it;
(4) data analysis: analyzing the collected data with a dedicated tool;
(5) visual display: presenting the data and the analysis results through front-end display software.
Data acquisition covers the time, conditions, format, content, length, and constraints of data generation.
Data storage: Hadoop-based parallel processing technology and a non-relational database are used to build an unstructured data management cloud platform that meets big-data requirements for mass storage, highly concurrent reads and writes, data reliability, complex association analysis, and high scalability.
Data processing: first assessing the quality and characteristics of the data, then cleaning, integrating, and standardizing it;
data quality is assessed mainly by checking whether the raw data contains dirty data;
cleaning replaces or deletes missing values, abnormal values, inconsistent values, duplicate records, and records containing special symbols;
integration merges the different information collection tables according to defined rules to produce a complete information table for subsequent processing;
standardization mainly processes business registration address information, dividing the collected enterprise addresses according to the standard province, city, and county administrative divisions.
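The integration and standardization steps might look roughly like the sketch below, which merges a shop table with an enterprise table and splits each address into province, city, and county. The table layouts, the join key `enterprise_id`, and the regular expression are assumptions for illustration; the pattern only handles addresses of the simple form "X省Y市Z县/区" and would need extending for municipalities, autonomous regions, and county-level cities.

```python
import re

import pandas as pd

# Simplified pattern: province ending in 省, city ending in 市, county ending in 县 or 区.
ADDRESS_PATTERN = re.compile(
    r"^(?P<province>[^省]+省)(?P<city>[^市]+市)(?P<county>[^县区]+[县区])"
)


def split_address(address):
    """Split an address into province, city, and county; None when it does not fit."""
    empty = {"province": None, "city": None, "county": None}
    if not isinstance(address, str):
        return empty
    match = ADDRESS_PATTERN.match(address)
    return match.groupdict() if match else empty


# Hypothetical collection tables.
shops = pd.DataFrame({
    "shop_id": [1, 2],
    "enterprise_id": ["E1", "E2"],
    "sales": [100.0, 250.0],
})
enterprises = pd.DataFrame({
    "enterprise_id": ["E1", "E2"],
    "address": ["浙江省杭州市淳安县千岛湖镇", "浙江省宁波市鄞州区首南街道"],
})

# Integration: combine the tables on the shared key into one information table.
merged = shops.merge(enterprises, on="enterprise_id", how="left")
# Standardization: add province, city, and county columns derived from the address.
parts = pd.DataFrame([split_address(a) for a in merged["address"]], index=merged.index)
complete = pd.concat([merged, parts], axis=1)
print(complete[["shop_id", "province", "city", "county"]])
```

A production version would more plausibly match addresses against an official administrative division list than rely on a pattern like this one.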
Data processing includes handling order brushing (fake orders) by shops, garbled text, outliers, and missing values, matching shop sales to commodity sales, and matching shops to enterprise information.
Data analysis: the data is analyzed with Python, including overall analysis, B2B analysis, and B2C analysis, and the obtained e-commerce data is analyzed along the platform, region, field, industry, brand, commodity category, and time dimensions to generate an analysis report.
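Analysis along different dimensions can be sketched with a pandas group-by. The data, the column names, and the helper `summarize` are invented for the example; the patent names the dimensions but not how the aggregation is implemented.

```python
import pandas as pd

# Hypothetical transaction-level data with a few of the named dimensions.
sales = pd.DataFrame({
    "platform": ["A", "A", "B", "B", "B"],
    "region": ["East", "West", "East", "East", "West"],
    "category": ["tea", "tea", "rice", "tea", "rice"],
    "amount": [100.0, 50.0, 70.0, 30.0, 20.0],
})


def summarize(df, dimensions):
    """Total sales amount for each combination of the given dimensions."""
    return (
        df.groupby(dimensions, as_index=False)["amount"]
        .sum()
        .sort_values("amount", ascending=False)
        .reset_index(drop=True)
    )


by_platform = summarize(sales, ["platform"])
by_region_category = summarize(sales, ["region", "category"])
print(by_platform)
print(by_region_category)
```

The same helper can be called with any subset of dimension columns, which is one simple way to produce a separate table per reporting dimension.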
The front-end display software includes Excel, PPT (PowerPoint), and Word.
Existing enterprise information in the database is updated and replaced with the most recently collected enterprise information; enterprise records that are not yet in the database are divided by address according to the standard province, city, and county divisions and then stored in the database;
commodity information is preprocessed and agricultural product data is separated: order-brushing records are deleted from the collected commodity information, and commodity records belonging to agricultural products are separated by commodity category id and stored separately;
the month-over-month sales change for the current month is calculated from the shop information of each platform, and the shop, enterprise, and commodity information is summarized;
for each platform, sales are calculated from the processed shop table, shop grade table, commodity table, and enterprise information table according to the commodities belonging to each shop, and each shop's enterprise name and enterprise information are merged in to form a complete platform shop sales information table;
abnormal shops are processed: the top 100 shops by sales amount in the current month are screened, their commodities are checked for live-streamed or order-brushing items, and any such items are then processed;
after the shop sales information has been checked for abnormal shop commodities, the processed commodities are recorded and the corresponding shop sales are corrected. Each platform is then checked as a whole for abnormalities in the sales trend of the shops collected in the current month; any abnormality found is investigated and resolved.
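The month-over-month calculation and the top-shop screening in the steps above could be sketched as follows. The table layout and function names are hypothetical, and the sketch only produces the candidate list for review; deciding whether a commodity is live-streamed or order-brushed is left to whatever rules or manual checks the operator applies.

```python
import pandas as pd

# Hypothetical monthly sales per shop and platform.
monthly = pd.DataFrame({
    "platform": ["A", "A", "A", "A", "B", "B"],
    "shop_id": [1, 1, 2, 2, 3, 3],
    "month": ["2022-02", "2022-03"] * 3,
    "sales": [100.0, 150.0, 200.0, 100.0, 50.0, 55.0],
})


def add_month_over_month(df):
    """Add the relative change in sales versus the previous month for each shop."""
    out = df.sort_values(["platform", "shop_id", "month"]).copy()
    previous = out.groupby(["platform", "shop_id"])["sales"].shift(1)
    out["mom_change"] = out["sales"] / previous - 1
    return out


def top_shops(df, month, n=100):
    """Return the n shops with the highest sales in the given month."""
    current = df[df["month"] == month]
    return current.nlargest(n, "sales")


with_mom = add_month_over_month(monthly)
march_top = top_shops(with_mom, "2022-03", n=2)  # n=100 in the described method
print(march_top)
```

A large positive or negative `mom_change` among the top shops is the kind of signal that would prompt the commodity-level check described in the text.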
The collected and summarized data is imported into the corresponding database, where it is analyzed as a whole to check for abnormalities.
Different reports are output for different dimensions, and dimension information tables are then generated in the database by address, shop main business, agriculture, and shop rating for subsequent use.
The large volume of collected data is analyzed with appropriate statistical methods and then summarized and interpreted so as to extract as much value from the data as possible. Internet e-commerce transaction big data is analyzed and mined, and the collected e-commerce data is matched with basic enterprise information to complete information such as the sales amount and sales quantity of e-commerce enterprises and the overall state of regional e-commerce. The data is classified and tallied along different dimensions, including but not limited to region, field, platform, industry, brand, and commodity category.
By analyzing and mining Internet e-commerce transaction big data, the method provides information along different dimensions to support government regulation of regional e-commerce and early warning of trends.
Python was designed in the early 1990s by Guido van Rossum at the Dutch national research institute for mathematics and computer science (CWI) as a successor to the ABC language. Python provides efficient high-level data structures and supports simple, effective object-oriented programming. Its syntax, dynamic typing, and interpreted nature make it well suited to scripting and rapid application development on most platforms; as new versions and language features have been added, it is increasingly used for standalone, large-scale projects.
NumPy (Numerical Python) is an open-source numerical computing extension for Python. It can store and process large matrices far more efficiently than Python's own nested list structure (which can also represent matrices), supports large multidimensional arrays and matrix operations, and provides an extensive library of mathematical functions for array operations.
pandas is a NumPy-based tool created for data analysis tasks. It incorporates a number of libraries and standard data models and provides the tools needed to manipulate large datasets efficiently.
MySQL is a relational database management system (RDBMS), one of the most popular, and is regarded as one of the best RDBMS application software packages for web applications.
Structured Query Language (SQL) is a special-purpose programming language: a database query and programming language used to access data and to query, update, and manage relational database systems.
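To make the role of SQL concrete, here is a small sketch that stores shop sales in a relational table, corrects one shop's sales, and aggregates by province. It uses Python's built-in `sqlite3` module with an in-memory database purely so the example is self-contained; the patent names MySQL, and the table and column names here are assumptions.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL database named in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shop_sales (shop_id INTEGER, province TEXT, month TEXT, sales REAL)"
)
conn.executemany(
    "INSERT INTO shop_sales VALUES (?, ?, ?, ?)",
    [
        (1, "Zhejiang", "2022-03", 100.0),
        (2, "Zhejiang", "2022-03", 900.0),
        (3, "Jiangsu", "2022-03", 60.0),
    ],
)

# Update: correct the sales of a shop after abnormal commodities were handled.
conn.execute(
    "UPDATE shop_sales SET sales = ? WHERE shop_id = ? AND month = ?",
    (90.0, 2, "2022-03"),
)

# Query: total sales per province for the month.
rows = conn.execute(
    "SELECT province, SUM(sales) FROM shop_sales "
    "WHERE month = ? GROUP BY province ORDER BY province",
    ("2022-03",),
).fetchall()
print(rows)
conn.close()
```

The SQL statements themselves are standard and would run largely unchanged on MySQL, although the Python driver and its parameter placeholder style would differ.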
VBA (Visual Basic for Applications) is a macro language of Visual Basic, a programming language for performing common automation (OLE) tasks in Microsoft's desktop applications. It is mainly used to extend the functionality of Windows applications, in particular Microsoft Office software. It can also be described as a Visual Basic scripting language embedded in applications.
The Pearson correlation coefficient measures how closely two data sets fall on a straight line, that is, the strength of the linear relationship between two interval-scaled variables.
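For reference, the coefficient is the covariance of the two variables divided by the product of their standard deviations, which gives a value between -1 and 1. The sketch below computes it directly from that definition in plain Python; the function name `pearson` and the sample data are illustrative, and the patent does not say which variables it correlates.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-deviations and squared deviations (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly figures: page views and sales that rise together.
views = [100, 200, 300, 400]
sales = [12, 24, 36, 48]
r_rising = pearson(views, sales)
r_falling = pearson([1, 2, 3], [3, 2, 1])
print(r_rising, r_falling)
```

With NumPy, the same quantity is available from `numpy.corrcoef`, which returns the full correlation matrix.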
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the invention, according to its technical solution and inventive concept, shall fall within the scope of protection of the invention.

Claims (10)

1. An e-commerce data analysis method, characterized in that the method comprises the following steps:
(1) data acquisition: collecting Internet e-commerce transaction data;
(2) data storage;
(3) data processing: first assessing the quality and characteristics of the data, then cleaning, integrating, and standardizing it;
(4) data analysis: analyzing the collected data with a dedicated tool;
(5) visual display: presenting the data and the analysis results through front-end display software.
2. The e-commerce data analysis method according to claim 1, characterized in that data acquisition covers the time, conditions, format, content, length, and constraints of data generation.
3. The e-commerce data analysis method according to claim 1, characterized in that, for data storage, Hadoop-based parallel processing technology and a non-relational database are used to build an unstructured data management cloud platform that meets big-data requirements for mass storage, highly concurrent reads and writes, data reliability, complex association analysis, and high scalability.
4. The e-commerce data analysis method according to claim 1, characterized in that data processing comprises first assessing the quality and characteristics of the data, then cleaning, integrating, and standardizing it;
data quality is assessed mainly by checking whether the raw data contains dirty data;
cleaning replaces or deletes missing values, abnormal values, inconsistent values, duplicate records, and records containing special symbols;
integration merges the different information collection tables according to defined rules to produce a complete information table for subsequent processing;
standardization mainly processes business registration address information, dividing the collected enterprise addresses according to the standard province, city, and county administrative divisions.
5. The e-commerce data analysis method according to claim 4, characterized in that data processing includes handling order brushing (fake orders) by shops, garbled text, outliers, and missing values, matching shop sales to commodity sales, and matching shops to enterprise information.
6. The e-commerce data analysis method according to claim 1, characterized in that, for data analysis, the data is analyzed with Python, including overall analysis, B2B analysis, and B2C analysis, and the obtained e-commerce data is analyzed along the platform, region, field, industry, brand, commodity category, and time dimensions to generate an analysis report.
7. The e-commerce data analysis method according to claim 1, characterized in that the front-end display software includes Excel, PPT (PowerPoint), and Word.
8. The e-commerce data analysis method according to claim 5, characterized in that existing enterprise information in the database is updated and replaced with the most recently collected enterprise information; enterprise records that are not yet in the database are divided by address according to the standard province, city, and county divisions and then stored in the database;
commodity information is preprocessed and agricultural product data is separated: order-brushing records are deleted from the collected commodity information, and commodity records belonging to agricultural products are separated by commodity category id and stored separately;
the month-over-month sales change for the current month is calculated from the shop information of each platform, and the shop, enterprise, and commodity information is summarized;
for each platform, sales are calculated from the processed shop table, shop grade table, commodity table, and enterprise information table according to the commodities belonging to each shop, and each shop's enterprise name and enterprise information are merged in to form a complete platform shop sales information table;
abnormal shops are processed: the top 100 shops by sales amount in the current month are screened, their commodities are checked for live-streamed or order-brushing items, and any such items are then processed;
after the shop sales information has been checked for abnormal shop commodities, the processed commodities are recorded and the corresponding shop sales are corrected. Each platform is then checked as a whole for abnormalities in the sales trend of the shops collected in the current month; any abnormality found is investigated and resolved.
9. The e-commerce data analysis method according to claim 4, characterized in that the collected data is imported into the corresponding database, where it is analyzed as a whole to check for abnormalities.
10. The e-commerce data analysis method according to claim 7, characterized in that different reports are output for different dimensions, and dimension information tables are then generated in the database by address, shop main business, agriculture, and shop rating for subsequent use.
CN202210367597.8A 2022-04-08 2022-04-08 E-commerce data analysis method Pending CN114676123A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210367597.8A CN114676123A (en) 2022-04-08 2022-04-08 E-commerce data analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210367597.8A CN114676123A (en) 2022-04-08 2022-04-08 E-commerce data analysis method

Publications (1)

Publication Number Publication Date
CN114676123A (en) 2022-06-28

Family

ID=82078523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210367597.8A Pending CN114676123A (en) 2022-04-08 2022-04-08 E-commerce data analysis method

Country Status (1)

Country Link
CN (1) CN114676123A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091175A (en) * 2023-04-10 2023-05-09 南京航空航天大学 Transaction information data management system and method based on big data
CN116091175B (en) * 2023-04-10 2023-08-22 南京航空航天大学 Transaction information data management system and method based on big data
CN116342230A (en) * 2023-05-31 2023-06-27 深圳洽客科技有限公司 Electronic commerce data storage platform based on big data analysis
CN116342230B (en) * 2023-05-31 2023-08-08 深圳洽客科技有限公司 Electronic commerce data storage platform based on big data analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination