CN112131302A - Business data analysis method and platform - Google Patents
Business data analysis method and platform
- Publication number
- CN112131302A (application number CN202010936462.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- business
- processing
- analysis
- service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a business data analysis method. The basic data required for each dimension is generated by batch preprocessing of the most basic business flow data. The sharding (fragmentation) technology of the open-source Elastic Job middleware is used to formulate an optimal sharding strategy, so that each task can process millions to tens of millions of records in batches: the mass data is split into configurable small blocks that are batch-processed independently of one another and, combined with multithreading, the analysis of each day's mass data is completed. By combining distributed task technology with the high-performance read and write characteristics of MongoDB big-data storage, the method achieves the cleaning and analysis of big data.
Description
Technical Field
The invention relates to the field of big data analysis, in particular to a business data analysis method.
Background
As the company's business has grown, its data has reached the scale of hundreds of millions of records. Data of different dimensions must be cleaned, analyzed, and extracted from this mass data in a targeted manner to support macro analysis, data mining, business decisions, data services for each line of business, and so on.
How to effectively clean, filter, analyze, process and store mass data is a problem which needs to be solved urgently.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a business data analysis method that combines distributed task technology with the high-performance read and write characteristics of MongoDB big-data storage to clean and analyze big data.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A business data analysis method, comprising:
S1, data collection: the data comes from the main business flow tables of each business group;
S2, data preprocessing: screening and integrating the business data sets collected from one or more data sources to ensure the validity and value of the data to be analyzed;
S3, data processing and analysis: analyzing and processing the mass data in parallel, mining the data associations in the big data set, forming a profile of the description patterns or attribute rules of an object, and constructing data models and massive training data to improve the accuracy of data analysis and prediction;
S4, data storage: storing the processed data in a MongoDB database;
S5, data visualization and application: the data is presented to the user intuitively as computer graphics or images and can be processed interactively with the user, and the data results are provided externally as an API service to meet application scenarios.
Further, in step S1, the main business flow tables record the earliest, most original data flow of each business function.
Further, in step S1, the main business flow tables comprise an order flow table, a transaction flow table, and a user behavior log table.
Further, in step S2, the data sources comprise homogeneous or heterogeneous databases, file systems, and service interfaces.
Further, in step S3, distributed processing technology and storage are used to analyze and process the mass data in parallel, performing distributed statistical analysis on various structured and unstructured data and distributed mining of unknown data.
Further, in step S3, the parallel analysis and processing of the mass data comprises sorting, statistics, processing, clustering and classification, and correlation analysis.
Further, in step S4, once the data is stored, an efficient query service can be provided.
The invention also discloses a business data analysis platform, comprising:
a data collection module, used for collecting data from the main business flow tables of each business group;
a data preprocessing module, used for screening and integrating the business data sets collected from one or more data sources to ensure the validity and value of the data to be analyzed;
a data processing and analysis module, used for analyzing and processing the mass data in parallel, mining the data associations in the big data set, forming a profile of the description patterns or attribute rules of an object, and constructing data models and massive training data to improve the accuracy of data analysis and prediction;
a data storage module, used for storing the processed data in a MongoDB database;
a data visualization and application module, used for presenting the data to the user intuitively as computer graphics or images, supporting interactive processing with the user, and providing the data results externally as an API service to meet application scenarios.
Further, the main business flow tables record the earliest, most original data flow of each business function.
Further, the main business flow tables comprise an order flow table, a transaction flow table, and a user behavior log table.
Further, the data sources comprise homogeneous or heterogeneous databases, file systems, and service interfaces.
Further, distributed processing technology and storage are used to analyze and process the mass data in parallel, performing distributed statistical analysis on various structured and unstructured data and distributed mining of unknown data.
Further, the parallel analysis and processing of the mass data comprises sorting, statistics, processing, clustering and classification, and correlation analysis.
Further, once the data is stored, an efficient query service can be provided.
The invention has the beneficial effects that:
1. The technical scheme uses distributed, highly concurrent execution of data in multiple batches, which markedly improves overall data production performance and ensures that the failure of a single server does not affect the cluster as a whole. A real case from the production environment: before the scheme was adopted, producing the basic data of a given business for a given day could take 3 to 5 hours; with the scheme, production completes within 30 minutes (each batch processes flow data on the order of millions of records, and only about 20 minutes are needed to analyze it through the model into hundreds of thousands of basic data records and store them), which ensures efficient data production as well as reliable and accurate data.
2. The scheme produces the data required by business scenarios in advance and provides the corresponding data query and operation functions using a suitable combination of MongoDB, MySQL, and Redis storage. Because of this preprocessing, whenever the business side needs data it can obtain the produced data immediately, achieving millisecond-level data responses for business scenarios, improving front-end business processing capacity and user experience, and strengthening the product's brand.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a schematic diagram of a big data processing flow structure of a business data analysis method of the present invention;
FIG. 2 is a schematic diagram of the weekly report board in an embodiment of a business data analysis method of the present invention;
FIG. 3 is a schematic diagram of the maximum value in an embodiment of a business data analysis method of the present invention;
FIG. 4 is a schematic diagram of the maximum time period in an embodiment of a business data analysis method of the present invention;
FIG. 5 is a schematic diagram of the comparison values in an embodiment of a business data analysis method of the present invention;
FIG. 6 is a schematic diagram of a one-week transaction summary in an embodiment of a business data analysis method of the present invention;
FIG. 7 is a schematic diagram of the payment collection methods in an embodiment of a business data analysis method of the present invention;
FIG. 8 is a flow chart of the management of batch tasks for a business data analysis method of the present invention;
FIG. 9 is a task-specific data production flow diagram of a business data analysis method of the present invention;
FIG. 10 is an Elastic Job task segmentation graph of a business data analysis method of the present invention.
Detailed Description
The conception, specific structure, and technical effects of the present invention will be described clearly and completely below in conjunction with the embodiments and the accompanying drawings, so that the objects, features, and effects of the present invention can be fully understood. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention; other embodiments obtained by those skilled in the art without inventive effort on the basis of these embodiments all fall within the protection scope of the present invention. In addition, the coupling/connection relations referred to in this patent do not mean only that components are directly connected, but that a better connection structure can be formed by adding or removing connection auxiliaries according to the specific implementation. The technical features of the invention can be combined with one another provided they do not conflict.
Abbreviations and key term definitions:
BDA: abbreviation of the Business Data Analysis platform system;
Elastic Job: a third-party distributed task technology;
MongoDB: big-data storage database software;
Redis: a cache database;
Spring: a third-party open-source Java framework.
The invention discloses a business data analysis method that produces and processes targeted analysis data from mass original data. As shown in FIG. 1, it comprises the following steps:
1. Data collection
The data comes from the main business flow tables of each business group, such as order flow tables, transaction flow tables, and user behavior log tables, which record the earliest, most original data flow of each business function.
2. Data preprocessing
The big data collection process usually involves one or more data sources, including homogeneous or heterogeneous databases, file systems, and service interfaces, and is affected by factors such as business controls and data conflicts. The collected big data set therefore needs to be preprocessed first to ensure the accuracy and value of the big data analysis and prediction results.
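The screening and integration described in this step can be sketched as follows. This is a minimal illustration only: the record fields (`tx_id`, `merchant_id`, `amount`) and the validity rules are assumptions made for the example, not rules stated in the patent.

```python
def preprocess(*sources):
    """Merge records from several sources, dropping invalid and duplicate ones.

    Field names and validity rules are hypothetical, chosen for illustration.
    """
    seen_ids = set()
    clean = []
    for source in sources:
        for record in source:
            tx_id = record.get("tx_id")
            amount = record.get("amount")
            # Screening: require identifiers and a positive numeric amount.
            if not tx_id or not record.get("merchant_id"):
                continue
            if not isinstance(amount, (int, float)) or amount <= 0:
                continue
            # Integration: the same transaction may arrive from two sources.
            if tx_id in seen_ids:
                continue
            seen_ids.add(tx_id)
            clean.append(record)
    return clean
```

Each source can be any iterable of dictionaries, for example rows read from a database and rows returned by a service interface.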
3. Data processing and analysis
Depending on the storage form, the type of business data, and so on, distributed processing technology is used to analyze and process the mass data in parallel, performing distributed statistical analysis on various structured and unstructured data and distributed mining of unknown data. This includes sorting, statistics, processing, clustering and classification, and association analysis. The data associations in the big data set are mined, a profile of the description patterns or attribute rules of an object is formed, and data models and massive training data are constructed to improve the accuracy of data analysis and prediction.
4. Data storage
A MongoDB database stores the processed data and provides an efficient query service.
5. Data visualization and application
5.1 Data visualization presents the data results to the user intuitively as computer graphics or images and allows interactive processing with the user. Data visualization helps uncover regularities hidden in large volumes of business data to support management decisions, and also makes the results of big data analysis far more intuitive and easier for users to understand and use.
5.2 The data results are provided externally as an API service to satisfy various application scenarios.
Because business scenarios are diverse and the data is voluminous and complex, the invention cannot be fully illustrated through a complete product; instead, a real business scenario is used as an example. The product case is as follows:
Product case requirement: to serve merchants better, a merchant weekly report function is launched that provides merchants with a summary of the data for the previous calendar week and offers them business tips based on big data analysis.
A. Weekly report board, as shown in FIG. 2:
Requirement analysis:
First-level data: transaction amount, number of transactions.
Calculated data: average amount per transaction.
Analysis data: percentile rank among merchants in the same city.
Implementation principle: 1. The original transaction flow contains the most basic transaction information, including merchant number, transaction amount, transaction time, and so on. 2. All transaction flows of each merchant within the week are summarized to obtain the first-level data (the week's total transaction amount and number of transactions), from which the average is calculated. 3. Each merchant has its own UnionPay area code. Once the first-level data of all merchants with the same area code has been summarized (generated statistically according to the business model rules), all merchants with that area code are sorted and screened according to the business rule model to obtain an analysis value (second-level analysis data produced on the basis of the first-level data). Finally, a rule calculation on the analysis data yields the result the product requires: "exceeds xx% of merchants in xx city".
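The three steps of case A can be sketched in a few lines. This is a hedged illustration, not the patented business rule model: the field names (`merchant_id`, `area_code`, `amount`) are assumed, and "exceeds xx% of merchants" is interpreted here as the share of other merchants in the same area code whose weekly total is strictly lower.

```python
from collections import defaultdict


def weekly_board(transactions):
    """Return per-merchant weekly totals, averages, and same-area ranking."""
    # Step 2: first-level data (weekly total amount and transaction count).
    totals = defaultdict(lambda: {"amount": 0, "count": 0, "area_code": None})
    for tx in transactions:
        entry = totals[tx["merchant_id"]]
        entry["amount"] += tx["amount"]
        entry["count"] += 1
        entry["area_code"] = tx["area_code"]

    # Step 3: group the weekly totals of merchants sharing an area code.
    by_area = defaultdict(list)
    for entry in totals.values():
        by_area[entry["area_code"]].append(entry["amount"])

    board = {}
    for merchant_id, entry in totals.items():
        peers = by_area[entry["area_code"]]
        others = len(peers) - 1  # exclude the merchant itself
        beaten = sum(1 for amount in peers if amount < entry["amount"])
        board[merchant_id] = {
            "total_amount": entry["amount"],
            "tx_count": entry["count"],
            "avg_per_tx": entry["amount"] / entry["count"],
            "exceeds_pct": 100.0 * beaten / others if others else 0.0,
        }
    return board
```

Amounts are best passed as integers in the smallest currency unit so that the totals stay exact.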
B. Maximum value, as shown in FIG. 3:
Requirement analysis:
First-level data: the transaction flow record of the largest single transaction in the week.
Implementation principle: as above; the largest transaction is recorded and updated.
C. Peak time period, as shown in FIG. 4:
Requirement analysis:
Second-level data: transaction summary data for each of the 24 hourly time periods over the week.
Implementation principle: 1. Obtain the data of each time period for each day; 2. summarize the value of each time period; 3. record and update the time period with the maximum value.
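A minimal sketch of the peak time period calculation in case C follows. It assumes each transaction carries a `time` (a `datetime`) and an `amount`, both hypothetical field names, and that a time period is one clock hour; ties are resolved in favor of the earlier hour, which is a choice made for this example.

```python
from collections import Counter
from datetime import datetime


def peak_hour(transactions):
    """Return (hour, total_amount) for the busiest hour of day over the week."""
    hourly = Counter()
    for tx in transactions:
        # Steps 1 and 2: bucket by hour of day and sum across all days.
        hourly[tx["time"].hour] += tx["amount"]
    if not hourly:
        return None
    # Step 3: keep the hour with the largest total; earlier hour wins ties.
    return max(hourly.items(), key=lambda item: (item[1], -item[0]))
```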
D. Comparison values, as shown in FIG. 5:
Requirement analysis:
Historical data: calculate the percentage difference compared with the previous week's transactions;
Implementation principle: only the historical production data needs to be queried.
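The percentage difference in case D is simple arithmetic over two already-produced weekly totals. The sketch below assumes the totals are passed in directly; returning `None` when the previous week has no transactions is a choice made for this example, since the patent does not say how that case is displayed.

```python
def week_over_week_pct(current_total, previous_total):
    """Percentage change of this week's total relative to last week's."""
    if previous_total == 0:
        return None  # no baseline to compare against
    return 100.0 * (current_total - previous_total) / previous_total
```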
E. One-week transaction summary, as shown in FIG. 6:
Requirement analysis:
First-level data: transaction information for each day of the week;
Implementation principle: the same as A. 1. Summarize the data for each day; 2. record and update the date with the largest transaction volume.
F. Payment collection methods, as shown in FIG. 7:
Requirement analysis:
Second-level calculated data: the share of each payment collection method;
Implementation principle: the same as A. 1. Summarize each payment collection method; 2. calculate each share.
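Case F can be sketched as a grouped sum followed by a division. The `method` and `amount` field names are assumptions, and the share is computed here by amount; computing it by number of transactions would be an equally plausible reading of the text.

```python
from collections import Counter


def payment_method_shares(transactions):
    """Return each payment collection method's share of the total amount."""
    amounts = Counter()
    for tx in transactions:
        # Step 1: summarize each payment collection method.
        amounts[tx["method"]] += tx["amount"]
    total = sum(amounts.values())
    if total == 0:
        return {}
    # Step 2: calculate each share as a fraction of the total.
    return {method: amount / total for method, amount in amounts.items()}
```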
The above shows the data produced by the technical scheme applied to a single point of one product. Actual products have many, very complex application scenarios, and the implementation logic is correspondingly complex; only a simple product point is used here as the simplest illustration.
The technical scheme generates the basic data required for each dimension by batch preprocessing of the most basic business flow data; the main implementation flow can be understood with reference to FIG. 8 and FIG. 9.
The scheme uses the sharding technology of the open-source Elastic Job middleware to formulate an optimal sharding strategy for processing each day's mass data. Each task divides the mass data to be processed into a configurable number of small pieces that are batch-processed independently of one another, and efficient, fast data processing and analysis is achieved by running them in parallel across multiple servers and multiple threads, as shown in FIG. 10.
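The sharding idea can be illustrated independently of Elastic Job. The sketch below does not use the Elastic Job API: it assumes a simple modulo strategy over a numeric record `id` and runs the shards on a local thread pool, whereas the scheme described here distributes shards across servers. All function and parameter names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_of(record_id, shard_total):
    """Assign a record to a shard; modulo is one possible sharding strategy."""
    return record_id % shard_total


def run_shard(shard_item, shard_total, records, handler):
    """Process only the records that belong to this shard."""
    return [
        handler(record)
        for record in records
        if shard_of(record["id"], shard_total) == shard_item
    ]


def run_batch(records, shard_total, handler, max_workers=4):
    """Run every shard independently and return one result list per shard."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_shard, item, shard_total, records, handler)
            for item in range(shard_total)
        ]
        return [future.result() for future in futures]
```

Because no shard reads another shard's records, the number of shards can be changed through configuration without touching the handler.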
The system is highly decoupled and does not depend on other systems; tasks are independent of one another and are processed according to their front-to-back association.
Batch tasks are executed when server load is lowest, as determined by system monitoring, so that the pressure on the servers and the database is minimized.
Multiple scheduled tasks, combined with fault-tolerance mechanisms such as manual tasks and reruns, ensure that each task accurately produces the specified basic data.
By combining MongoDB's fast reads and writes, MySQL's relational characteristics, and Redis caching, the system can batch-produce data using the design best suited to each business analysis model dimension.
The scheme establishes maintainability and extensibility of the data and traceability of historical data. For example, when the business model analyzes statistical data for a business dimension over a time interval, the work can be divided into daily, monthly, and annual batch runs, and so on. If a data problem is found or a batch task fails, the scheme provides complete fault-tolerant rerun processing that regenerates the corresponding data in a targeted manner, so the system has high traceability in data production.
In terms of performance during data production, batch task load can be spread evenly across different servers according to different sharding strategies and the focus of different tasks, making full use of the servers' CPU and memory.
The invention uses distributed task technology to split a task horizontally into multiple subtasks, which greatly improves efficiency and makes full use of system resources. The subtasks are independent of one another and do not affect each other. In addition, the distributed mode provides disaster tolerance when a server goes down and keeps the service stable while analysis data production is running.
FIG. 8 is a flow chart of batch task management, comprising:
1. To produce a piece of business data, the total task for that production is generated first and its task state is recorded;
2. The detail tasks to be executed are then generated according to the business rule model, with each valid data item forming one detail task, and their task states are recorded.
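The total-task and detail-task bookkeeping, together with the rerun of failed work mentioned earlier, might look like the sketch below. The class names, the status values (`PENDING`, `DONE`, `FAILED`), and the in-memory storage are all assumptions for illustration; the patent does not specify how task states are represented or persisted.

```python
from dataclasses import dataclass, field


@dataclass
class DetailTask:
    key: str  # identifies one valid data item, e.g. a merchant number
    status: str = "PENDING"


@dataclass
class TotalTask:
    name: str
    details: list = field(default_factory=list)
    status: str = "PENDING"


def create_total_task(name, valid_keys):
    """Generate the total task and one detail task per valid data item."""
    return TotalTask(name, [DetailTask(key) for key in valid_keys])


def run_total_task(task, produce):
    """Run all unfinished detail tasks; calling it again reruns only failures."""
    for detail in task.details:
        if detail.status == "DONE":
            continue
        try:
            produce(detail.key)
            detail.status = "DONE"
        except Exception:
            detail.status = "FAILED"
    all_done = all(detail.status == "DONE" for detail in task.details)
    task.status = "DONE" if all_done else "FAILED"
    return task.status
```

Recording a state per detail task is what makes a targeted rerun possible: a second call skips items already marked `DONE` and retries only the rest.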
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (14)
1. A business data analysis method, comprising:
s1, data collection: the data comes from the main service flow water meter of each service group;
s2, preprocessing data: screening and integrating business data sets collected from one or more data sources to ensure the validity and the valuability of data needing to be analyzed;
s3, data processing and analysis: adopting a fragmentation technology of Elastic Job open-source middleware and formulating an optimal fragmentation strategy, processing millions of data in batches of millions and tens of millions of data for each task, splitting mass data into configurable small blocks to perform mutually independent batch processing, performing parallel analysis and processing on the mass data, mining data relevance in a large data set, forming an image of a description mode or an attribute rule of an object, and constructing a data model and mass training data to improve the accuracy of data analysis and prediction;
s4, data storage: storing the processed data by using a mongoDB database;
s5, data visualization and application: the data is displayed to the user in a visual mode of computer graphics or images, interactive processing can be carried out between the data and the user, and data results are provided externally in an API service mode to meet application scenes.
2. The business data analysis method of claim 1, wherein in the step S1: and the main service flow table records the most initial and most original data flow of each service function.
3. The business data analysis method of claim 1, wherein in the step S1: the main service flow meter comprises a order flow water meter, a transaction flow water meter and a user behavior log table.
4. The business data analysis method of claim 1, wherein in the step S2: the data source comprises a homogeneous or heterogeneous database, a file system and a service interface.
5. The business data analysis method of claim 1, wherein in step S3: distributed processing and distributed storage are adopted, the mass data is analyzed and processed in parallel, distributed statistical analysis is performed on various structured and unstructured data, and distributed mining is performed on unknown data.
6. The business data analysis method of claim 1, wherein in step S3: the parallel analysis and processing of the mass data comprises sorting, statistics, processing, clustering, classification and correlation analysis.
7. The business data analysis method of claim 1, wherein in the step S4: after the data is stored, efficient query service can be provided.
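As an illustration of the sharded batch processing recited in step S3 of claim 1, the following is a minimal Python sketch. It does not use the Elastic Job API; it only mimics the general idea of giving each job instance a shard index and a shard total, then splitting that shard's records into configurable batches. The modulo assignment rule, the function names (`records_for_shard`, `in_batches`, `process_shard`, `run_all_shards`) and the use of threads as stand-ins for separate job instances are all assumptions made for illustration, not details taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Iterator, List


def records_for_shard(record_ids: Iterable[int], shard_item: int, shard_total: int) -> Iterator[int]:
    """Yield the record IDs owned by one shard (modulo assignment, an assumed strategy)."""
    for rid in record_ids:
        if rid % shard_total == shard_item:
            yield rid


def in_batches(items: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Split a stream of records into configurable, independent batches."""
    batch: List[int] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def process_shard(record_ids: Iterable[int], shard_item: int, shard_total: int, batch_size: int) -> int:
    """Process one shard batch by batch; 'processing' here is only counting records."""
    processed = 0
    for batch in in_batches(records_for_shard(record_ids, shard_item, shard_total), batch_size):
        processed += len(batch)  # placeholder for the real analysis work
    return processed


def run_all_shards(record_ids: range, shard_total: int, batch_size: int) -> List[int]:
    """Run every shard concurrently and return the number of records each one handled."""
    with ThreadPoolExecutor(max_workers=shard_total) as pool:
        futures = [
            pool.submit(process_shard, record_ids, item, shard_total, batch_size)
            for item in range(shard_total)
        ]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Hypothetical workload: 100,000 record IDs, 4 shards, batches of 10,000.
    print(run_all_shards(range(100_000), shard_total=4, batch_size=10_000))
```

Because each shard selects a disjoint subset of record IDs, the shards never touch the same record and can run independently; the batch size and shard count are the two configurable parameters in this sketch.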
8. A business data analytics platform, comprising:
a data collection module: used for collecting data from the main service flow tables of each service group;
a data preprocessing module: used for screening and integrating the business data sets collected from one or more data sources to ensure the validity and value of the data to be analyzed;
a data processing and analysis module: used for analyzing and processing mass data in parallel, mining data correlations in the large data set to form a profile describing the patterns or attribute rules of an object, and constructing a data model with mass training data to improve the accuracy of data analysis and prediction;
a data storage module: used for storing the processed data in a MongoDB database;
a data visualization and application module: used for presenting the data to the user visually by means of computer graphics or images, supporting interactive processing with the user, and providing data results externally as an API service to meet application scenarios.
9. A business data analytics platform as claimed in claim 8, wherein: the main service flow table records the earliest and most original data flow of each service function.
10. A business data analytics platform as claimed in claim 8, wherein: the main service flow tables comprise an order flow table, a transaction flow table and a user behavior log table.
11. A business data analytics platform as claimed in claim 8, wherein: the data source comprises a homogeneous or heterogeneous database, a file system and a service interface.
12. A business data analytics platform as claimed in claim 8, wherein: by adopting a distributed processing technology and a storage form, the massive data is analyzed and processed in parallel, distributed statistical analysis is performed on various structured and unstructured data, and distributed mining is performed on unknown data.
13. A business data analytics platform as claimed in claim 8, wherein: the parallel analysis and processing of the mass data comprises sorting, statistics, processing, clustering, classification and correlation analysis.
14. A business data analytics platform as claimed in claim 8, wherein: after the data is stored, efficient query service can be provided.
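The module chain of claim 8 (collection, preprocessing, processing and analysis, storage, visualization and application) can be sketched as a small Python pipeline. This is only an illustrative outline under stated assumptions: the record fields (`user_id`, `amount`, `event`), the screening rule, the per-user count used as the "analysis", and the `InMemoryStore` class standing in for the MongoDB database are all hypothetical and do not come from the patent.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

Record = Dict[str, object]


def collect(flow_tables: Dict[str, List[Record]]) -> List[Record]:
    """Data collection: gather rows from each flow table and tag them with their source."""
    return [dict(row, source=name) for name, rows in flow_tables.items() for row in rows]


def preprocess(records: Iterable[Record]) -> List[Record]:
    """Data preprocessing: screen out rows without a user_id and drop exact duplicates."""
    seen = set()
    clean: List[Record] = []
    for rec in records:
        if rec.get("user_id") is None:
            continue  # assumed validity rule: a row must identify a user
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            clean.append(rec)
    return clean


def analyze(records: Iterable[Record]) -> Dict[str, int]:
    """Data processing and analysis: count records per user (a stand-in statistic)."""
    return dict(Counter(str(rec["user_id"]) for rec in records))


class InMemoryStore:
    """Data storage: a dictionary-backed stand-in for the MongoDB database."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, int]] = {}

    def save(self, key: str, doc: Dict[str, int]) -> None:
        self._docs[key] = doc

    def find(self, key: str) -> Optional[Dict[str, int]]:
        return self._docs.get(key)


def api_result(store: InMemoryStore, key: str) -> Dict[str, object]:
    """Application: return a stored result in an API-style payload."""
    return {"key": key, "data": store.find(key)}


if __name__ == "__main__":
    tables = {
        "order_flow": [{"user_id": 1, "amount": 10}, {"user_id": None, "amount": 5}],
        "behavior_log": [{"user_id": 1, "event": "click"}, {"user_id": 2, "event": "view"}],
    }
    store = InMemoryStore()
    store.save("records_per_user", analyze(preprocess(collect(tables))))
    print(api_result(store, "records_per_user"))
```

Keeping each module as a separate function with plain data in and out mirrors the claim's separation of concerns, so any one stage could be swapped for a distributed implementation without changing the others.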
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010936462.XA CN112131302B (en) | 2020-09-08 | 2020-09-08 | Commercial data analysis method and platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112131302A (en) | 2020-12-25 |
CN112131302B (en) | 2024-05-07 |
Family ID: 73845088
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202010936462.XA | Active | CN112131302B (en) | 2020-09-08 | 2020-09-08 | Commercial data analysis method and platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112131302B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104915793A (en) * | 2015-06-30 | 2015-09-16 | 北京西塔网络科技股份有限公司 | Public information intelligent analysis platform based on big data analysis and mining |
US20170235466A1 (en) * | 2015-06-17 | 2017-08-17 | NetSuite Inc. | System and Method to Generate Interactive User Interface for Visualizing and Navigating Data or Information |
CN110618983A (en) * | 2019-08-15 | 2019-12-27 | 复旦大学 | JSON document structure-based industrial big data multidimensional analysis and visualization method |
CN111158672A (en) * | 2019-12-31 | 2020-05-15 | 浪潮云信息技术有限公司 | Integrated interactive Elastic MapReduce job management method |
Also Published As
Publication number | Publication date |
---|---|
CN112131302B (en) | 2024-05-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||