CN115309749A - Big data experiment system for scientific and technological service - Google Patents

Big data experiment system for scientific and technological service

Info

Publication number
CN115309749A
Authority
CN
China
Prior art keywords
data
scientific
module
technological
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211030785.8A
Other languages
Chinese (zh)
Inventor
费敏锐
李晨辉
周文举
徐昱琳
王海宽
易开祥
吕泽昊
沈赟怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN202211030785.8A
Publication of CN115309749A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a big data experiment system for scientific and technological services, comprising a data import module, a data cleaning module, a data warehouse construction module and a data association analysis module. The data import module imports users' behavior logs on a scientific and technological service platform, together with the platform's business database, into a distributed file storage system. The data cleaning module extracts, transforms and loads (ETL) the data in the distributed file storage system. The data warehouse construction module builds a star model from the relationships between the fact table and the dimension tables and organizes the data warehouse into layers. The data association analysis module applies an efficient association processing technique to the data warehouse to support multi-dimensional, highly associated data analysis operations in online analytical processing (OLAP). By covering the full data management life cycle of the platform, including data acquisition, cleaning, storage and analysis, the invention meets the requirement for sub-second analytical queries in the large-scale OLAP production environment of the scientific and technological service platform.

Description

Big data experiment system for scientific and technological service
Technical Field
The invention relates to the technical field of scientific and technological services, in particular to the technical field of big data of scientific and technological service platforms, and specifically relates to a big data experiment system for scientific and technological services.
Background
In recent years, China's scientific and technological service industry has entered a new phase of steady growth, and scientific and technological service platforms have been launched in many regions. In their early stage of development, however, these platforms had limited influence and few enterprise users, so most of them were built on a front-end-centric architecture. As a platform's influence grows through long-term operation and large numbers of enterprise users bring in scientific and technological resources, more and more data is generated, including user behavior data and business system data. Traditional relational databases struggle with association operations on, and storage of, data at this scale, so the further development of such platforms depends on big data technology. Meanwhile, the growth of data and its multi-source heterogeneity make data management harder. More importantly, data should flow in an orderly way through the whole analysis process so that its full path can be clearly understood and used; the layered construction of a scientific and technological service data warehouse plays an important role here. A current scientific and technological service big data platform should support complex analysis operations, emphasize decision support, provide intuitive query results, mine the value of the data, and provide a basis for decisions in operating the platform.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a big data experiment system for scientific and technological services, which has the advantages of abundant data, relative stability and wider application range.
To achieve the above object, the big data experiment system for scientific and technological services of the present invention is configured as follows:
To address the above technical problems, the invention discloses a big data experiment system for scientific and technological services, which comprises a data import module, a data cleaning module, a data warehouse construction module and a data association analysis module, wherein
the data import module is connected with the scientific and technological service platform and is used for collecting users' behavior logs on the platform into the distributed file storage system and for importing data from the platform's business database into the distributed file storage system;
the data cleaning module is connected with the data import module and is used for extracting, transforming and loading the imported data in the distributed file storage system;
the data warehouse construction module is connected with the data cleaning module and is used for building a star model based on the relationships between the fact table and the dimension tables of the data and for organizing the data warehouse into layers;
and the data association analysis module is connected with the data warehouse construction module and is used for performing association processing on the established data warehouse and supporting multi-dimensional, highly associated data analysis operations in online analytical processing.
Preferably, the data importing module includes:
a tracking-point trigger program part, connected with the data cleaning module, for instrumenting key events in the scientific and technological service platform (event tracking, also called buried points) by embedding a front-end tracking-point trigger unit based on a JS SDK and a back-end event trigger unit based on a Java SDK to track user behavior;
a data acquisition part, connected with the data cleaning module, for collecting behavior log data into the distributed file storage system in real time through the Flume tool and loading the behavior log data into the data warehouse in batches;
and a data import part, connected with the data cleaning module, for directly importing the relational business database data, comprising a scientific and technological resource data table, a user information table and an order information table, into the data warehouse through the Sqoop tool.
Preferably, the data warehouse construction module comprises an original data layer, a detail data layer, a service data layer and an application data layer, wherein
the original data layer stores the original data, including user behavior log data and system business data;
the detail data layer stores the cleaned data, including data obtained by applying validity checks and duplicate filtering to the original data and data obtained by dimension reduction and dimension degeneration of the resource classification table;
the service data layer stores lightly aggregated data and obtains preliminary results from the tables involved in business associations;
and the application data layer stores the data required by specific business needs.
Preferably, the data association analysis module includes:
an association calculation engine unit for performing association operations on table data across the layers of the scientific and technological service data warehouse;
an analysis business unit for performing association optimization based on an OLAP analytical data warehouse engine and carrying out the relevant statistical work and data mining tasks according to actual business needs;
and a business display unit for providing lightweight data query and visualization for the result data, including a query dashboard and a graphical interface.
By adopting the big data experiment system for scientific and technological service, the invention has the following beneficial effects:
1. The invention is built around a scientific and technological service platform and aims to develop that platform. The experiment system targets four industries (modern services, marine economy, cultural tourism and international medical care), brings together diversified scientific and technological service resources, and provides nine kinds of service to government departments, parks, enterprises and universities: research and development, intellectual property, scientific and technological finance, technology transfer, entrepreneurship incubation, scientific and technological consulting, science popularization, inspection, testing and certification, and comprehensive scientific and technological services. It thereby supports the high-quality development of the scientific and technological service platform and of the scientific and technological service industry.
2. The invention has rich data sources. It is not limited to the business database but also collects more valuable user behavior data into the system, meeting the platform's requirements for multi-source heterogeneous data. The data also flows in an orderly way through the subsequent cleaning, storage and analysis stages, and sub-second analytical queries can be achieved in the large-scale OLAP production environment of a scientific and technological service platform.
3. The scientific and technological service data warehouse is built on the scientific and technological service platform and supports the platform's management decision process. The data warehouse is managed in an integrated, layered manner; it is relatively stable, reflects historical changes and is oriented to scientific and technological service subjects. It supports complex analysis operations, provides intuitive query results, mines the value of the data and explores users' potential needs.
Drawings
FIG. 1 is a schematic diagram illustrating an operation process of a big data experiment system for scientific and technical services according to the present invention.
FIG. 2 is a diagram of a data acquisition architecture of a big data experiment system for scientific and technical services according to the present invention.
FIG. 3 is a dimensional modeling diagram of a big data experiment system for scientific and technical services according to the present invention.
FIG. 4 is a diagram of a scientific and technological service data warehouse hierarchy of the big data experiment system for scientific and technological services of the present invention.
Detailed Description
In order to more clearly describe the technical contents of the present invention, the following further description is given in conjunction with specific embodiments.
The big data experiment system for scientific and technological service comprises a data import module, a data cleaning module, a data warehouse construction module and a data association analysis module,
the data import module is connected with the scientific and technological service platform and is used for collecting users' behavior logs on the platform into the distributed file storage system and for importing data from the platform's business database into the distributed file storage system;
the data cleaning module is connected with the data import module and is used for extracting, converting and loading import data in the distributed file storage system;
the data warehouse building module is connected with the data cleaning module and used for building a star model based on the relationship between the fact table and the dimension table of the data and carrying out layered processing on the data warehouse;
and the data association analysis module is connected with the data warehouse construction module and is used for performing association processing on the established data warehouse and performing multi-dimensional and high-association data analysis operation in online analysis processing.
As a preferred embodiment of the present invention, the data importing module includes:
a tracking-point trigger program part, connected with the data cleaning module, for instrumenting key events in the scientific and technological service platform (event tracking, also called buried points) by embedding a front-end tracking-point trigger unit based on a JS SDK and a back-end event trigger unit based on a Java SDK to track user behavior;
a data acquisition part, connected with the data cleaning module, for collecting behavior log data into the distributed file storage system in real time through the Flume tool and loading the behavior log data into the data warehouse in batches;
and a data import part, connected with the data cleaning module, for directly importing the relational business database data, comprising a scientific and technological resource data table, a user information table and an order information table, into the data warehouse through the Sqoop tool.
In a preferred embodiment of the present invention, the data warehouse construction module comprises an original data layer, a detail data layer, a service data layer and an application data layer,
the original data layer stores original data including user behavior log data and system service data;
the detail data layer stores the cleaned data, including data obtained by applying validity checks and duplicate filtering to the original data and data obtained by dimension reduction and dimension degeneration of the resource classification table;
the service data layer stores lightly aggregated data and obtains preliminary results from the tables involved in business associations;
and the application data layer stores related data according to specific required services.
As a preferred embodiment of the present invention, the data association analysis module includes:
an association calculation engine unit for performing association operations on table data across the layers of the scientific and technological service data warehouse;
an analysis business unit for performing association optimization based on an OLAP analytical data warehouse engine and carrying out the relevant statistical work and data mining tasks according to actual business needs;
and a business display unit for providing lightweight data query and visualization for the result data, including a query dashboard and a graphical interface.
The invention provides a service recommendation experiment system and method for scientific and technological services, which comprises a data import module, a data processing module, a data cleaning module, a data warehouse construction module and a data association analysis module. To further clarify the technical solution, the invention is explained below with reference to the accompanying drawings. As shown in FIG. 1, the invention is implemented in the following steps:
First, importing the scientific and technological service resource data.
The resource data includes user behavior data and the relational business database, each of which is described in detail below.
1. User behavior data collection
Step 1: Set up a front-end tracking-point trigger program based on the JS SDK to collect basic user information, region information, browser information, external link data, order information and the like. The events mainly comprise Launch events, pageview events, chargeRequest events and Event events.
Step 2: Set up a back-end event trigger program based on the Java SDK, which sends payment-success and refund-success information to the Nginx server for orders generated on the scientific and technological service platform. The events comprise the chargeSuccess event and the chargeRefund event.
Step 3: Set up Nginx local log storage, storing behavior data of a user browsing scientific and technological service platform in an access.
Step 4: Set up Flume to collect the user behavior logs. As shown in FIG. 2, the behavior logs generated by users are collected by Flume in real time into HDFS, the distributed file storage system.
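The collection steps above can be illustrated with a minimal Python sketch of how a tracking event might be encoded into a request path that the Nginx server logs, and decoded again downstream. The event names come from the description; the endpoint `/log.gif` and the parameter names `en`, `uid` and `ts` are hypothetical, since the source does not specify the wire format.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Event names taken from the description; the split into front-end and
# back-end sets follows Steps 1 and 2.
FRONT_END_EVENTS = {"Launch", "pageview", "chargeRequest", "Event"}
BACK_END_EVENTS = {"chargeSuccess", "chargeRefund"}
ALL_EVENTS = FRONT_END_EVENTS | BACK_END_EVENTS


def build_tracking_url(event, user_id, timestamp_ms, extra=None):
    """Build the request path a tracking SDK might send to the Nginx server.

    The endpoint "/log.gif" and the parameter names "en", "uid" and "ts"
    are hypothetical; the source does not specify them.
    """
    if event not in ALL_EVENTS:
        raise ValueError(f"unknown event type: {event}")
    params = {"en": event, "uid": str(user_id), "ts": str(timestamp_ms)}
    params.update(extra or {})
    return "/log.gif?" + urlencode(params)


def parse_tracking_url(url):
    """Recover the event parameters from a logged request path."""
    query = urlsplit(url).query
    # parse_qs returns a list of values per key; keep the first one.
    return {key: values[0] for key, values in parse_qs(query).items()}


url = build_tracking_url("pageview", 42, 1661400000000, {"page": "/resources/7"})
event = parse_tracking_url(url)
```

Encoding each event as a query string means the web server's ordinary request log already contains every field, so a log collector only has to ship lines and needs no knowledge of the event schema.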
2. Relational business database data import
The relational business database data comprises a scientific and technological resource data table, a user information table and an order information table, which are imported directly into Hive through the Sqoop tool. The metadata formats of these tables are shown in Tables 1, 2 and 3.
Table 1 Scientific and technological resource data table
Field name | Field type | Field description
servi | boolean | Service type, e.g. 0 (supply), 1 (demand)
sid | int | ID of the scientific and technological resource
name | string | Name of the supplied/demanded scientific and technological resource
descri | string | Description of the scientific and technological resource
indu | string | Industry of the scientific and technological resource, e.g. cultural tourism
type | string | Type of the scientific and technological resource, e.g. research and development
issue | string | Release time of the scientific and technological resource
price | double | Price of the scientific and technological resource
ins | string | Name of the supplying/demanding organization
Table 2 User information table
Field name | Field type | Field description
uid | int | User ID
uname | string | User name
uins | string | Organization of the user
city | string | Region of the user
uiph | string | User mobile phone number
passward | string | User password (note: irreversibly encrypted)
first | boolean | Whether this is the first login, e.g. 0 (no), 1 (yes)
time | long | User creation time (timestamp format)
Table 3 Order information table
Field name | Field type | Field description
oid | string | Order ID
order | string | Order name
cua | string | Payment amount
pm | string | Payment mode
uid | int | User ID
uname | string | User name
sid | int | ID of the scientific and technological resource
name | string | Name of the supplied/demanded scientific and technological resource
otime | long | Order creation time
Second, cleaning the data.
Step 1: Split the collected HDFS user behavior log data by its separators and filter out records that do not meet the requirements, for example records whose event parameter does not belong to one of the 6 event types.
Step 2: Convert and extract the relevant data, for example converting IP addresses to regions, converting timestamps to a time representation, and extracting browser-related information.
Step 3: Import the behavior log in HDFS directly into Hive. The Hive metadata format is shown in Table 4.
Table 4 User behavior log table
[Table 4 appears as an image (BDA0003817259150000061) in the original publication and is not reproduced here.]
Step 4: Build feature engineering for the subsequent data analysis and mining, including an implicit rating model of users for scientific and technological service products and One-Hot encoding of data labels.
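As an illustration of the cleaning steps above, the following Python sketch splits a log line, filters by event type, converts the timestamp, and shows One-Hot encoding and a very simple implicit rating. It is a sketch under stated assumptions, not the patented implementation: the tab separator, the four-field layout, and the event weights are all hypothetical, and IP-to-region conversion is omitted because it needs an external lookup database.

```python
from datetime import datetime, timezone

# The six event types named in the description.
VALID_EVENTS = {"Launch", "pageview", "chargeRequest", "Event",
                "chargeSuccess", "chargeRefund"}
SEPARATOR = "\t"  # hypothetical; the source does not name the separator

# Hypothetical weights for turning behavior into an implicit rating.
EVENT_WEIGHTS = {"pageview": 1.0, "chargeRequest": 3.0, "chargeSuccess": 5.0}


def clean_line(line):
    """Split one raw log line and return a cleaned record, or None to drop it.

    Assumed field layout: ip, timestamp in milliseconds, event, resource type.
    """
    parts = line.rstrip("\n").split(SEPARATOR)
    if len(parts) != 4:
        return None
    ip, ts_ms, event, resource_type = parts
    if event not in VALID_EVENTS or not ts_ms.isdigit():
        return None
    when = datetime.fromtimestamp(int(ts_ms) / 1000, tz=timezone.utc)
    return {
        "ip": ip,
        "time": when.strftime("%Y-%m-%d %H:%M:%S"),  # UTC time string
        "event": event,
        "type": resource_type,
    }


def one_hot(value, categories):
    """One-Hot encode a label against a fixed, ordered category list."""
    return [1 if value == category else 0 for category in categories]


def implicit_rating(events):
    """Rate a user-resource pair by the strongest event observed for it."""
    return max((EVENT_WEIGHTS.get(e, 0.0) for e in events), default=0.0)


record = clean_line("10.0.0.1\t1661400000000\tpageview\tresearch")
encoded = one_hot("research", ["research", "finance", "consulting"])
rating = implicit_rating(["pageview", "chargeRequest"])
```

Taking the maximum weight rather than a sum keeps the rating bounded no matter how many times a user views the same resource; a production model could equally use a capped or time-decayed sum.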
Third, building the scientific and technological service data warehouse.
The star model is built from the relationships between the fact table and the dimension tables of the data, as shown in FIG. 3. The fact table is the platform order table, which contains fields such as order ID, time, region, resource/demand party, order quantity, order amount and payment mode. The information on time, region, resource and resource/demand party can each be expanded in detail, so a dimension table is established for each of them, giving a star model in which every dimension table connects directly to the central fact table. Although this model introduces some data redundancy, it has low complexity, is easy to understand, is cheap to maintain and performs well in association analysis.
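A star model of this shape can be sketched with Python's built-in `sqlite3` module. SQLite stands in for Hive here purely so the example is self-contained; the table names, column names and sample rows are hypothetical, and only two of the dimensions are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per region / per resource.
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_resource (
    resource_id INTEGER PRIMARY KEY, name TEXT, industry TEXT
);
-- Fact table: one row per order, pointing directly at each dimension.
CREATE TABLE fact_order (
    order_id TEXT PRIMARY KEY,
    region_id INTEGER REFERENCES dim_region(region_id),
    resource_id INTEGER REFERENCES dim_resource(resource_id),
    amount REAL
);
-- Made-up sample rows for illustration only.
INSERT INTO dim_region VALUES (1, 'Shanghai'), (2, 'Hangzhou');
INSERT INTO dim_resource VALUES
    (10, 'Patent search', 'modern services'),
    (11, 'Lab testing', 'marine economy');
INSERT INTO fact_order VALUES
    ('o1', 1, 10, 100.0), ('o2', 1, 11, 50.0), ('o3', 2, 10, 70.0);
""")

# A typical star-model query: one join per dimension, then aggregate.
rows = conn.execute("""
    SELECT r.city, SUM(f.amount)
    FROM fact_order AS f
    JOIN dim_region AS r ON r.region_id = f.region_id
    GROUP BY r.city
    ORDER BY r.city
""").fetchall()
```

Because every dimension is exactly one join away from the fact table, any analytical query needs at most one join per dimension it uses, which is the source of the low complexity noted above.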
FIG. 4 is a layered architecture diagram of the scientific and technological service data warehouse of the invention, which comprises an original data layer (ODS), a detail data layer (DWD), a service data layer (DWS) and an application data layer (ADS). A layered management architecture gives the platform a clear data structure, isolates the original data, increases data reusability and breaks complex problems into simpler ones.
The ODS layer stores the original data, including user behavior log data and system business data. Its fields are exactly the same as those in HDFS and in the business database, and the business database data is imported into Hive with the Sqoop tool. Table names in the ODS layer are formed by adding the prefix ODS_ before the original table name.
The DWD layer stores the cleaned data, produced by applying validity checks and duplicate filtering to the ODS layer data and dimension reduction and dimension degeneration to the resource classification table; the specific cleaning process is described in the second step above. Table names in the DWD layer are formed by adding the prefix DWD_ before the original table name.
The DWS layer stores lightly aggregated data. For example, the total order amount and the number of orders are obtained from the order table dwd_order_com, the payment amount and the number of payments from the order base table dwd_event_com, and the click activity from the user click table dwd_user_log; these are finally aggregated by user_id into a detail table. Table names in the DWS layer are formed by adding the prefix DWS_ before the original table name.
The ADS layer stores the data required by specific business needs. For example, for the total transaction amount, it suffices to group the user behavior wide table DWS_user_action in the DWS layer by statistical date and apply the sum function; the final result is exported to the business database to facilitate subsequent data visualization. Table names in the ADS layer are formed by adding the prefix ADS_ before the original table name.
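The DWS and ADS aggregations described above can be sketched in plain Python. This is only an illustration of the layering idea: the record fields `user_id`, `date` and `amount` and the sample rows are hypothetical, and keying the wide table by user and statistical date is an assumption made so that the ADS step can group by date.

```python
from collections import defaultdict

# Hypothetical cleaned (DWD-level) order records.
dwd_orders = [
    {"user_id": 1, "date": "2022-08-01", "amount": 100.0},
    {"user_id": 1, "date": "2022-08-02", "amount": 50.0},
    {"user_id": 2, "date": "2022-08-01", "amount": 70.0},
]


def build_dws_user_action(orders):
    """DWS step: light aggregation into one row per (user, date)."""
    wide = defaultdict(lambda: {"order_count": 0, "order_amount": 0.0})
    for order in orders:
        row = wide[(order["user_id"], order["date"])]
        row["order_count"] += 1
        row["order_amount"] += order["amount"]
    return dict(wide)


def build_ads_daily_total(dws_rows):
    """ADS step: group the wide table by date and sum the order amount."""
    totals = defaultdict(float)
    for (_user_id, date), row in dws_rows.items():
        totals[date] += row["order_amount"]
    return dict(totals)


dws_user_action = build_dws_user_action(dwd_orders)
ads_daily_total = build_ads_daily_total(dws_user_action)
```

The ADS step reads only the already aggregated DWS rows and never the detail records, which is what makes the intermediate layer reusable across several business questions.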
The data warehouse is managed in an integrated, layered manner; it is relatively stable, reflects historical changes and is oriented to scientific and technological service subjects. It supports complex analysis operations, provides intuitive query results, mines the value of the data and explores users' potential needs.
Fourth, establishing the data association analysis service.
Most of the business analysis in the invention can be completed through the layered construction of the data warehouse. For the complex multi-dimensional association operations on table data across the layers of the scientific and technological service data warehouse, however, the invention relies on the Kylin framework engine. The rough flow is as follows: (1) pre-compute the metrics used in multi-dimensional analysis; (2) convert operations such as high-dimensional, complex multi-table joins and aggregations into lookups of the pre-computed results; (3) store the pre-computed results in a distributed storage system for fast access at query time. By trading space for time, the data association analysis service module achieves fast queries and high concurrency, supporting multi-dimensional, highly associated OLAP operations such as drill-down, roll-up, slicing and pivoting, and meeting the requirement for sub-second analytical queries over massive data.
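The space-for-time idea behind this flow can be sketched in a few lines of Python. This is not Kylin's API or storage format; it is a toy illustration, with hypothetical dimension names and sample rows, of pre-computing an aggregate for every combination of dimensions so that a query becomes a dictionary lookup.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("date", "region", "industry")  # hypothetical dimensions


def build_cube(rows, dimensions, measure):
    """Pre-compute SUM(measure) for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(float)
            for row in rows:
                cuboid[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


def query_cube(cube, dimensions, filters):
    """Answer an equality-filter query by lookup, without scanning rows."""
    dims = tuple(d for d in dimensions if d in filters)
    key = tuple(filters[d] for d in dims)
    return cube[dims].get(key, 0.0)


# Made-up sample rows for illustration only.
orders = [
    {"date": "2022-08-01", "region": "Shanghai",
     "industry": "marine economy", "amount": 50.0},
    {"date": "2022-08-01", "region": "Shanghai",
     "industry": "cultural tourism", "amount": 100.0},
    {"date": "2022-08-02", "region": "Hangzhou",
     "industry": "cultural tourism", "amount": 70.0},
]
cube = build_cube(orders, DIMENSIONS, "amount")
total = query_cube(cube, DIMENSIONS, {})
shanghai = query_cube(cube, DIMENSIONS, {"region": "Shanghai"})
```

With n dimensions the full cube holds 2^n pre-aggregated tables, so storage grows quickly while query cost stays constant; dropping a filter corresponds to a roll-up and adding one to a drill-down or slice.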
The analysis business part performs association optimization based on an OLAP analytical data warehouse engine and carries out the relevant statistical work and data mining tasks according to actual business needs. The platform's statistical work, which involves multi-table association queries, includes transaction statistics, active-user retention, and daily/weekly/monthly resource popularity rankings. The platform's data mining tasks, which involve machine learning, include user profile generation, click-based recommendation, targeted pushing, resource matching and resource similarity analysis.
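Two of the statistical tasks named above, the resource popularity ranking and active-user retention, can be sketched in Python as follows. The click log layout, the identifiers and the definition of retention (the share of one day's active users who are active again on a later day) are assumptions for illustration; the source does not define them.

```python
from collections import Counter

# Hypothetical click log: (user, resource) pairs for one statistical period.
clicks = [
    ("u1", "r1"), ("u2", "r1"), ("u1", "r2"),
    ("u3", "r1"), ("u2", "r2"), ("u3", "r3"),
]


def heat_ranking(click_log, top_n):
    """Rank resources by click count, most clicked first."""
    counts = Counter(resource for _user, resource in click_log)
    return counts.most_common(top_n)


def retention_rate(active_before, active_after):
    """Share of users active in the first period who return in the second."""
    if not active_before:
        return 0.0
    return len(active_before & active_after) / len(active_before)


top_two = heat_ranking(clicks, 2)
rate = retention_rate({"u1", "u2"}, {"u2", "u3"})
```

Running the same ranking function over a day, a week or a month of click records yields the daily, weekly and monthly lists respectively.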
The business display part is based on the BI tool Superset and provides a lightweight data query and visualization scheme for the result data. The operation flow is as follows: (1) log in to Superset; (2) click the data source; (3) select a database; (4) add a MySQL data source; (5) add a database table; (6) edit the table format; (7) draw the chart. This provides a visual query dashboard and a graphical interface, giving a basis for decisions in operating the scientific and technological service platform.
For a specific implementation of this embodiment, reference may be made to the relevant description in the above embodiments, which is not described herein again.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present invention, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution device. For example, if implemented in hardware, as in another embodiment, any one or combination of the following technologies, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware; the program may be stored in a computer readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description of the specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
By adopting the big data experiment system for scientific and technological service, the invention has the following beneficial effects:
1. The invention is built around a scientific and technological service platform and aims to advance that platform. The experiment system targets four industries (modern services, marine economy, cultural tourism and international medical care), brings together diversified scientific and technological service resources, and provides nine services to government departments, parks, enterprises and universities: research and development, intellectual property, scientific and technological finance, technology transfer, entrepreneurship incubation, scientific and technological consulting, science popularization, inspection, testing and certification, and comprehensive scientific and technological services. This benefits the high-quality development of scientific and technological service platforms and of the scientific and technological service industry.
2. The system has rich data sources. It is not limited to the business database: it also collects the more valuable user behavior data, which meets the scientific and technological service platform's requirement for multi-source heterogeneous data. The data also flows in an orderly way through the subsequent cleaning, storage and analysis stages, and sub-second analytical queries can be achieved in the platform's large-scale OLAP production environment.
3. The scientific and technological service data warehouse is built on the scientific and technological service platform and supports the platform's management decision process. The data warehouse is managed in integrated layers, is relatively stable, reflects historical changes, and is oriented to scientific and technological service subjects. It supports complex analysis operations, provides visual query results, mines the value of the data, and explores the potential needs of users.
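As an illustration only, the layered management of the data warehouse can be sketched in a few lines of Python. The four stages follow the raw, detail, service and application data layers named in the claims; the record fields, the sample values and the function names are hypothetical, and a production system would run these steps on the distributed storage and OLAP tools rather than on in-memory lists.

```python
from collections import defaultdict

# Hypothetical raw-layer records: behavior-log events exactly as collected,
# including a duplicate and a record with a missing field.
RAW_EVENTS = [
    {"user_id": "u1", "service": "intellectual_property", "event": "view"},
    {"user_id": "u1", "service": "intellectual_property", "event": "view"},
    {"user_id": None, "service": "technology_transfer", "event": "view"},
    {"user_id": "u2", "service": "technology_transfer", "event": "order"},
    {"user_id": "u3", "service": "intellectual_property", "event": "order"},
]


def build_detail_layer(raw):
    """Detail layer: drop records with missing fields, then exact duplicates."""
    seen = set()
    detail = []
    for rec in raw:
        if any(value is None for value in rec.values()):
            continue
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        detail.append(rec)
    return detail


def build_service_layer(detail):
    """Service layer: light aggregation, one count per (service, event) pair."""
    counts = defaultdict(int)
    for rec in detail:
        counts[(rec["service"], rec["event"])] += 1
    return dict(counts)


def build_application_layer(service_counts, event="order"):
    """Application layer: keep only what one specific report needs."""
    return {svc: n for (svc, ev), n in service_counts.items() if ev == event}


detail = build_detail_layer(RAW_EVENTS)
service = build_service_layer(detail)
orders_by_service = build_application_layer(service)
```

Each layer reads only from the layer beneath it, so the raw data stays untouched and any downstream table can be rebuilt if a cleaning rule changes.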
In this specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (4)

1. A big data experiment system for scientific and technological services is characterized by comprising a data import module, a data cleaning module, a data warehouse construction module and a data association analysis module,
the data import module is connected with the scientific and technological service platform and is used for collecting users' behavior logs on the scientific and technological service platform into a distributed storage system, and for importing data from the business database of the scientific and technological service platform into the distributed storage system;
the data cleaning module is connected with the data import module and is used for extracting, transforming and loading the imported data in the distributed file storage system;
the data warehouse construction module is connected with the data cleaning module and is used for building a star model based on the relationship between the fact tables and the dimension tables of the data, and for organizing the data warehouse into layers;
and the data association analysis module is connected with the data warehouse construction module and is used for performing association processing on the established data warehouse and performing multi-dimensional, highly associated data analysis operations in online analytical processing (OLAP).
2. The big data experiment system for scientific and technological services as claimed in claim 1, wherein said data import module includes:
the event-tracking trigger program part is connected with the data cleaning module and is used for instrumenting key events in the scientific and technological service platform, embedding both a front-end tracking trigger unit based on a JS SDK and a back-end event trigger unit based on a Java SDK, so as to track user behavior;
the data acquisition part is connected with the data cleaning module and is used for collecting behavior log data into the distributed file storage system in real time through the Flume tool and loading the behavior log data into the data warehouse in batches;
and the data import part is connected with the data cleaning module and is used for directly importing relational business database data, comprising a scientific and technological resource data table, a user information table and an order information table, into the data warehouse through the Sqoop tool.
3. The big data experiment system for scientific and technological services as claimed in claim 1, wherein the data warehouse construction module includes a raw data layer, a detail data layer, a service data layer and an application data layer,
the raw data layer stores raw data including user behavior log data and system business data;
the detail data layer stores cleaned data, including data obtained by checking and de-duplicating the raw data and data obtained by reducing and degenerating the dimensions of the resource classification table;
the service data layer stores lightly aggregated data and obtains preliminary results from the tables associated with the business;
and the application data layer stores the data relevant to each specific business requirement.
4. The big data experiment system for scientific and technological services as claimed in claim 1, wherein the data association analysis module comprises:
the association calculation engine unit is used for performing association operations on table data across the layers of the scientific and technological service data warehouse;
the analysis business unit is used for performing association optimization based on an OLAP analytical data warehouse tool engine, and for carrying out the relevant statistical work and data mining tasks according to the actual business;
and the service display unit is used for providing lightweight data query and visualization services for the data results, and for providing a query dashboard and a graphical interface.
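By way of a non-limiting sketch, the star model of claim 1 and the association analysis of claim 4 can be illustrated with one fact table joined to two dimension tables. The table names (`fact_order`, `dim_user`, `dim_resource`), their columns and the sample rows are hypothetical, and Python's built-in `sqlite3` module stands in here for the OLAP data warehouse engine, which the claims do not name.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes.
    CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, user_type TEXT);
    CREATE TABLE dim_resource (resource_id INTEGER PRIMARY KEY, category TEXT);

    -- Fact table: one row per order, with foreign keys to each dimension.
    CREATE TABLE fact_order (
        order_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES dim_user(user_id),
        resource_id INTEGER REFERENCES dim_resource(resource_id),
        amount REAL
    );

    INSERT INTO dim_user VALUES (1, 'enterprise'), (2, 'university');
    INSERT INTO dim_resource VALUES
        (10, 'intellectual_property'), (11, 'technology_transfer');
    INSERT INTO fact_order VALUES
        (100, 1, 10, 500.0),
        (101, 1, 11, 300.0),
        (102, 2, 10, 200.0);
    """
)


def order_totals(connection):
    """Multi-dimensional query: order count and amount by user type and category."""
    return connection.execute(
        """
        SELECT u.user_type, r.category, COUNT(*) AS orders, SUM(f.amount) AS total
        FROM fact_order AS f
        JOIN dim_user AS u ON u.user_id = f.user_id
        JOIN dim_resource AS r ON r.resource_id = f.resource_id
        GROUP BY u.user_type, r.category
        ORDER BY u.user_type, r.category
        """
    ).fetchall()


totals = order_totals(conn)
```

Because every dimension joins directly to the fact table, adding a further analysis dimension only requires one more dimension table and one more join, which is the usual motivation for a star model.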
CN202211030785.8A 2022-08-26 2022-08-26 Big data experiment system for scientific and technological service Pending CN115309749A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211030785.8A CN115309749A (en) 2022-08-26 2022-08-26 Big data experiment system for scientific and technological service

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211030785.8A CN115309749A (en) 2022-08-26 2022-08-26 Big data experiment system for scientific and technological service

Publications (1)

Publication Number Publication Date
CN115309749A (en) 2022-11-08

Family

ID=83865442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211030785.8A Pending CN115309749A (en) 2022-08-26 2022-08-26 Big data experiment system for scientific and technological service

Country Status (1)

Country Link
CN (1) CN115309749A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116226894A (en) * 2023-05-10 2023-06-06 杭州比智科技有限公司 Data security treatment system and method based on meta bin
CN116578547A (en) * 2023-05-16 2023-08-11 佛山众陶联供应链服务有限公司 Data warehouse modeling method and system for ceramic production line

Similar Documents

Publication Publication Date Title
CN109446344B (en) Intelligent analysis report automatic generation system based on big data
CN115309749A (en) Big data experiment system for scientific and technological service
CN104899199A (en) Data processing method and system for data warehouse
CN112347071B (en) Power distribution network cloud platform data fusion method and power distribution network cloud platform
CN102917009B (en) A kind of stock certificate data collection based on cloud computing technology and storage means and system
CN102880687A (en) Personal interactive data retrieval method and system based on tag technology
CN108446391A (en) Processing method, device, electronic equipment and the computer-readable medium of data
CN109669975B (en) Industrial big data processing system and method
CN112163017B (en) Knowledge mining system and method
CN110737729A (en) Engineering map data information management method based on knowledge map concept and technology
CN111242559A (en) Data resource management platform and method
CN113722564A (en) Visualization method and device for energy and material supply chain based on space map convolution
CN113254517A (en) Service providing method based on internet big data
CN111125045B (en) Lightweight ETL processing platform
CN117573646A (en) Data management method and system based on dimension modeling
CN116127047B (en) Method and device for establishing enterprise information base
CN116881376A (en) Automatic exploration method for enterprise data assets
US20140067874A1 (en) Performing predictive analysis
CN114817226A (en) Government data processing method and device
CN113779215A (en) Data processing platform
CN113590684A (en) Non-tax payment big data analysis system
CN113111244A (en) Multisource heterogeneous big data fusion system based on traditional Chinese medicine knowledge large-scale popularization
CN111382149A (en) Financial data analysis system and method
CN110990745A (en) Method for automatically synchronizing similar public cloud resources
CN112131302B (en) Commercial data analysis method and platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination