CN113486014A - Data collection method and management system for data aggregation platform - Google Patents

Data collection method and management system for data aggregation platform

Info

Publication number
CN113486014A
Authority
CN
China
Prior art keywords
data
collection method
cleaning
registration information
aggregation platform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110773485.8A
Other languages
Chinese (zh)
Inventor
杨炳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huisheng Information Technology Co ltd
Original Assignee
Huisheng Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huisheng Information Technology Co ltd
Priority to CN202110773485.8A
Publication of CN113486014A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data collection method and a management system for a data aggregation platform, belonging to the field of data collection. The data collection method comprises the following steps: S1: receiving data uploaded by each business system, wherein the received data carries an identification code; S2: creating a data table in a temporary data pool to store the data received in S1; S3: reading the data in the data table and parsing the data by means of the identification code; S4: cleaning the parsed data and storing the cleaned data. The invention can improve the concurrency of the system interface, reduce the server's I/O access volume, relieve pressure on the acquisition server, and reduce the possibility of data loss.

Description

Data collection method and management system for data aggregation platform
Technical Field
The invention belongs to the field of data acquisition, and particularly relates to a data acquisition method and a management system of a data aggregation platform.
Background
Data collection on a data aggregation platform relies mainly on data integration. Typically, a data acquisition system customizes update interfaces for the various kinds of data according to different update strategies, such as batch update, incremental update, real-time update, and data synchronization. It offers collection modes such as manual entry and interface-based collection, provides strict quality inspection tools, collects and updates the data of the data center, and ensures the timeliness, authority, and consistency of the data resource center database.
A data aggregation platform collects many types of data. Each type has its own attributes, format, and update strategy, so each type requires a custom collection interface. The interface must parse the data according to given rules, perform simple cleaning to eliminate records that lack important data items, and finally store the data in the database.
Data collection is a continuous process because business systems are updated iteratively. The types of collected data keep growing, and each new data type generally requires its own custom interface, which makes data collection time-consuming and laborious.
In addition, the interface generally has to adjust the amount of data transmitted each time according to the number of data items of the data type. If the data volume is too large, inserting or modifying the data occupies a large share of the server's system resources, so the response time of the interface becomes too long, the transmission and collection of subsequent data are affected, and data is lost.
In view of the above, the present invention is proposed.
Disclosure of Invention
The invention aims to provide a data collection method for a data aggregation platform and a management system thereof, which can improve the concurrency of the system interface, reduce the server's I/O access volume, relieve pressure on the acquisition server, and reduce the possibility of data loss.
To achieve the above object, the data collection method for a data aggregation platform provided by the present invention is applied to a data acquisition system and comprises the following steps:
S1: receiving data uploaded by each business system, wherein the received data carries an identification code;
S2: creating a data table in a temporary data pool to store the data received in S1;
S3: reading the data in the data table and parsing the data by means of the identification code;
S4: cleaning the parsed data and storing the cleaned data.
Further, step S1 runs between 1:00 and 5:00 a.m.
Further, the identification code includes a system source number and a sequence number of the data structure.
Further, in step S3, parsing the data by means of the identification code includes: at a time outside 1:00 to 5:00 a.m., querying the registration information stored in the background server using the unique identification code, parsing the data stored in step S2 according to the rule in the registration information, and converting the data into a JSON object; the registration information stores the rule, the names of the data structure attributes, and the types of the attributes.
Further, in step S4, cleaning the data includes the following steps:
S401: integrity cleaning: checking whether the key fields of each record hold valid values;
S402: uniformity cleaning: unifying the data dictionary codes of each record;
S403: error and duplicate cleaning: removing duplicate records and removing erroneous records whose key fields fall outside a limited range.
The invention also provides a management system for the data collection method of a data aggregation platform, which is connected with each business system and comprises a data acquisition system and a data center,
wherein the data acquisition system includes: a unified interface for receiving the data uploaded by the business systems, wherein the received data carries an identification code; a temporary data pool for creating a data table to store the data of the business systems; and a processor for parsing the data in the data table and cleaning the parsed data;
and the data center is used for storing the data processed by the data acquisition system.
Further, the unified interface receives data between 1:00 and 5:00 a.m.
Further, the identification code includes a system source number and a sequence number of the data structure.
Further, the processor is configured to query the registration information stored in the background server using the unique identification code at a time outside 1:00 to 5:00 a.m., parse the data stored in step S2 according to the rule in the registration information, and convert the data into a JSON object; the registration information stores the rule, the names of the data structure attributes, and the types of the attributes.
Further, the processor is configured to perform integrity cleaning, checking whether the key fields of each record hold valid values; uniformity cleaning, unifying the data dictionary codes of each record; and error and duplicate cleaning, removing duplicate records and removing erroneous records whose key fields fall outside a limited range.
The data aggregation platform data acquisition method and the management system thereof provided by the invention have the following beneficial effects:
1. The data collection process is divided into two steps. In the first step, the uploaded messages are stored in the temporary data pool (an in-memory database); in the second step, when the data acquisition system is idle, the messages in the temporary data pool are parsed and stored in the data center. This improves data processing efficiency.
2. Compared with the prior art, the temporary data pool established in the technical scheme of the invention significantly improves receiving efficiency: previously, no more than 5,000 records could be submitted under normal load, whereas 50,000 records can now be submitted under normal load.
Drawings
Fig. 1 is a flowchart of a data aggregation platform data acquisition method in this embodiment.
Fig. 2 is a schematic diagram of a management system of the data aggregation platform data collection method in this embodiment.
Detailed Description
The present invention is described in further detail below with reference to specific embodiments, so that those skilled in the art can better understand its technical scheme.
In a smart-community big data project, data collection is scheduled between 1:00 and 5:00 a.m. so that it does not affect daytime network traffic or server load. During this window, the systems operated by the smart community (the property system, the user APP, the Internet of Things, online news about the community, and other data sources) upload large amounts of data to the acquisition system, which would otherwise have to parse all of it and store it in the database at the same time.
Slow database writes severely degrade server performance, and the server often times out, reports errors, and loses collected data. A temporary data pool is therefore added to the data acquisition system, and data collection is divided into two steps. In the first step, the data is collected: all data is converted into character strings without parsing and stored directly in the temporary data pool. In the second step, after the collection task of the first step is complete, the character strings are taken out of the temporary data pool (an in-memory database), parsed, and stored in the data center. This greatly improves the operating efficiency of the server and avoids the bottleneck in the data collection process.
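The two-step split described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name TwoPhaseCollector, its method names, and the use of plain lists in place of an in-memory database and the data center are all assumptions made for illustration, as is treating 5:00 a.m. itself as outside the window.

```python
from datetime import time

COLLECTION_START = time(1, 0)  # 1:00 a.m., taken from the embodiment
COLLECTION_END = time(5, 0)    # 5:00 a.m.; exclusive end is an assumption


def in_collection_window(t: time) -> bool:
    """Return True when t falls inside the nightly collection window."""
    return COLLECTION_START <= t < COLLECTION_END


class TwoPhaseCollector:
    """Hypothetical stand-in for the data acquisition system."""

    def __init__(self) -> None:
        self.temporary_pool: list[str] = []  # raw strings, never parsed here
        self.data_center: list[str] = []     # simulated final store

    def receive(self, message: str, now: time) -> bool:
        """Step 1: keep the raw string only; no parsing, no database write."""
        if not in_collection_window(now):
            return False
        self.temporary_pool.append(message)
        return True

    def process_pending(self, now: time) -> int:
        """Step 2: when idle, move pooled strings on to the data center."""
        if in_collection_window(now):
            return 0  # still collecting, so defer the expensive work
        processed = len(self.temporary_pool)
        # Parsing and cleaning would happen here; this sketch only moves data.
        self.data_center.extend(self.temporary_pool)
        self.temporary_pool.clear()
        return processed
```

The point of the design is that receive does constant-time work per message, so the interface stays responsive during the upload burst, while the slower parsing and storage run only once the window has closed.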
As shown in Fig. 1, a data collection method for a data aggregation platform, applied to a data acquisition system, includes the following steps:
S1: receiving data uploaded by each business system, wherein the received data carries an identification code;
wherein the time for receiving data is set to between 1:00 and 5:00 a.m.
Specifically, before data collection (in the registration or maintenance phase), each data type structure of a business system, such as the property system or the user APP, obtains an identification code that is unique within the system. The identification code is assigned as follows: the business systems operated by the smart community are first registered or maintained on a management page of the data acquisition system, and each data type structure of a registered or maintained business system obtains a unique identification code consisting of two parts: the system source number (4 characters) and the sequence number of the data structure (5 digits).
For example, the system source number for data type structures from the property system is "WYXT", and that for the user APP is "APPD". The last 5 characters are the sequence number of the data structure, assigned in order starting from "00001". The identification codes of the property system's data structures are therefore "WYXT00001", "WYXT00002", and so on.
The business systems operated by the smart community send data through the unified interface of the data acquisition system, prepending the unique identification code to each data message. Because the code has a fixed length of 9 characters, the first 9 characters of a data message are taken to be the identification code by default.
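As a rough illustration of the identification-code layout, the following Python sketch builds a code from a 4-character source number and a 5-digit sequence number, and splits an incoming message into its 9-character prefix and payload. The function names and the validation rules are hypothetical; only the 4 + 5 layout and the "WYXT" and "APPD" examples come from the text.

```python
SOURCE_LEN = 4    # system source number, e.g. "WYXT"
SEQUENCE_LEN = 5  # data-structure sequence number, e.g. "00001"
CODE_LEN = SOURCE_LEN + SEQUENCE_LEN  # fixed 9-character prefix


def make_identification_code(source: str, sequence: int) -> str:
    """Join a source number and a zero-padded sequence number."""
    if len(source) != SOURCE_LEN:
        raise ValueError("system source number must be 4 characters")
    if not 1 <= sequence <= 99999:
        raise ValueError("sequence number must fit in 5 digits")
    return f"{source}{sequence:0{SEQUENCE_LEN}d}"


def split_message(message: str) -> tuple[str, str]:
    """Split a data message into (identification code, payload)."""
    if len(message) < CODE_LEN:
        raise ValueError("message is shorter than the identification code")
    return message[:CODE_LEN], message[CODE_LEN:]
```

Because the prefix has a fixed length, the receiving interface can route a message by slicing alone, without inspecting or parsing the payload.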
S2: creating a data table in a temporary data pool to store the data received in step S1;
A data table named after the current date is created in the temporary data pool of the data acquisition system. The data received that day is converted into character strings and stored directly in the newly created table. The temporary data pool can be an in-memory database (e.g., MongoDB, Redis), and at other times (outside 1:00 to 5:00 a.m.) the data acquisition system automatically deletes data tables that are more than 7 days old.
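The date-named tables and the 7-day retention rule might be sketched as follows. A plain dictionary stands in for the in-memory database, and the class name TemporaryDataPool, the YYYYMMDD table-name format, and the choice to keep a table that is exactly 7 days old are assumptions not stated in the text.

```python
from datetime import date, timedelta

RETENTION_DAYS = 7  # from the embodiment


class TemporaryDataPool:
    """Dictionary-backed stand-in for an in-memory database."""

    def __init__(self) -> None:
        self.tables: dict[str, list[str]] = {}

    @staticmethod
    def table_name(day: date) -> str:
        # Naming tables YYYYMMDD is an assumption; the text only says
        # the current date is used as the table name.
        return day.strftime("%Y%m%d")

    def store(self, message: str, today: date) -> None:
        """Append the raw string to today's table, creating it if needed."""
        self.tables.setdefault(self.table_name(today), []).append(message)

    def purge_old_tables(self, today: date) -> list[str]:
        """Delete tables dated more than RETENTION_DAYS before today."""
        cutoff = self.table_name(today - timedelta(days=RETENTION_DAYS))
        # YYYYMMDD names sort chronologically, so string comparison works.
        expired = sorted(name for name in self.tables if name < cutoff)
        for name in expired:
            del self.tables[name]
        return expired
```

In use, store would be called during the collection window and purge_old_tables at some other time of day, matching the schedule described above.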
S3: reading the data in the data table, using the identification code to obtain the registration information (that is, the rule of the registration information) from the background server, and parsing the data accordingly;
Specifically, when the system is idle (outside 1:00 to 5:00 a.m.), the registration information stored in the background server is queried using the unique identification code. The registration information stores the rule, the names of the data structure attributes, and the types of the attributes. The data character string stored in step S2 is parsed according to the rule and converted into a JSON object, and the values in the JSON object are read according to the attribute names and types in the registration information.
Through these steps, the rule in the background server is looked up by identification code, the business data is converted into a JSON object, and the data uploaded by the business system can then be extracted according to the attribute names and types given by the rule.
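The lookup-and-parse step can be sketched in Python as below. The text does not specify how registration information is structured, so the REGISTRY dictionary, its "rule" and "attributes" keys, the attribute names, and the type names are all hypothetical; the payload is assumed to already be JSON text.

```python
import json

# Hypothetical registration record, keyed by identification code.
REGISTRY = {
    "WYXT00001": {
        "rule": "json",  # assumed rule: the payload is JSON text
        "attributes": {"owner": "str", "area": "float", "floor": "int"},
    }
}

# Maps the registered type names to Python conversion functions.
CASTS = {"str": str, "int": int, "float": float}


def parse_message(message: str, registry: dict = REGISTRY) -> dict:
    """Parse one pooled string using the rule registered for its code."""
    code, payload = message[:9], message[9:]
    registration = registry[code]  # raises KeyError for an unregistered code
    if registration["rule"] != "json":
        raise ValueError(f"unsupported rule: {registration['rule']}")
    obj = json.loads(payload)
    # Read only the registered attributes, converting each to its type.
    return {
        name: CASTS[type_name](obj[name])
        for name, type_name in registration["attributes"].items()
    }
```

Because the attribute names and types live in the registry rather than in code, supporting a new data type under this scheme would mean adding a registry entry instead of writing a new interface.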
S4: cleaning the data parsed in step S3 and storing the cleaned data in the data center.
Specifically, cleaning the data comprises the following steps:
S401: integrity cleaning: the key fields of each record are checked for valid values. A record that lacks key-field data is removed from the data set and handled separately.
S402: uniformity cleaning: the data dictionary codes of each record are unified. For example, different systems encode "male" and "female" differently, and the system must convert these into the unified codes of the data center.
S403: error and duplicate cleaning: duplicate records are removed, erroneous records whose key fields fall outside a limited range are removed, and the data is processed accordingly.
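The three cleaning steps can be sketched as a single pass over the parsed records. The key fields, the gender code table, and the age range below are hypothetical examples chosen for illustration; the text names only the three kinds of cleaning, not the concrete fields or limits.

```python
KEY_FIELDS = ("resident_id", "age")  # hypothetical key fields
GENDER_CODES = {                      # hypothetical unified dictionary
    "male": "1", "m": "1", "男": "1",
    "female": "2", "f": "2", "女": "2",
}
AGE_RANGE = (0, 150)                  # hypothetical valid range


def clean(records: list[dict]) -> list[dict]:
    """Apply S401 to S403 in order and return the surviving records."""
    cleaned, seen = [], set()
    for record in records:
        # S401: drop records whose key fields are missing or empty.
        if any(record.get(field) in (None, "") for field in KEY_FIELDS):
            continue
        # S402: map source-specific gender values to one unified code.
        record = dict(record)  # copy so the caller's data is untouched
        gender = str(record.get("gender", "")).strip().lower()
        record["gender"] = GENDER_CODES.get(gender, gender)
        # S403 (errors): drop records whose key field is out of range.
        if not AGE_RANGE[0] <= record["age"] <= AGE_RANGE[1]:
            continue
        # S403 (duplicates): drop records identical to an earlier one.
        fingerprint = tuple(sorted(record.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(record)
    return cleaned
```

Running uniformity cleaning before duplicate detection matters in this sketch: two records that differ only in how a source system spells a dictionary value are recognized as duplicates once both carry the unified code.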
As shown in Fig. 2, the present invention further provides a management system for the data collection method of a data aggregation platform, which is connected to each business system and includes a data acquisition system and a data center,
wherein the data acquisition system includes:
a unified interface for receiving the data uploaded by the business systems, wherein the received data carries an identification code. The identification code consists of two parts: the system source number (4 characters) and the sequence number of the data structure (5 digits). The unified interface receives data between 1:00 and 5:00 a.m.
a temporary data pool for creating data tables that temporarily store the data of the business systems, wherein all stored data carries an identification code unique within the system. The temporary data pool may be an in-memory database, and it automatically deletes data tables that are more than 7 days old.
a processor for parsing the data in the temporary data pool and cleaning the parsed data.
Specifically, when idle (outside 1:00 to 5:00 a.m.), the processor queries the registration information stored in the background server using the unique identification code and, according to the rule of the registration information, converts the data character strings into JSON objects.
The data cleaning comprises the following steps:
S401: integrity cleaning: the key fields of each record are checked for valid values. A record that lacks key-field data is removed from the data set and handled separately.
S402: uniformity cleaning: the data dictionary codes of each record are unified. For example, different systems encode "male" and "female" differently, and the system must convert these into the unified codes of the data center.
S403: error and duplicate cleaning: duplicate records are removed, erroneous records whose key fields fall outside a limited range are removed, and the data is processed accordingly.
The data center is used for storing the data processed by the processor.
The inventive concept is explained in detail herein using specific examples, which are given only to aid in understanding the core concepts of the invention. It should be understood that any obvious modifications, equivalents and other improvements made by those skilled in the art without departing from the spirit of the present invention are included in the scope of the present invention.

Claims (10)

1. A data collection method for a data aggregation platform, applied to a data acquisition system, characterized by comprising the following steps:
S1: receiving data uploaded by each business system, wherein the received data carries an identification code;
S2: creating a data table in a temporary data pool to store the data received in S1;
S3: reading the data in the data table and parsing the data by means of the identification code;
S4: cleaning the parsed data and storing the cleaned data.
2. The data collection method for a data aggregation platform of claim 1, wherein step S1 runs between 1:00 and 5:00 a.m.
3. The data collection method of claim 1, wherein the identification code comprises a system source number and a sequence number of the data structure.
4. The data collection method of claim 1, wherein parsing the data by means of the identification code in step S3 includes: at a time outside 1:00 to 5:00 a.m., querying the registration information stored in the background server using the unique identification code, parsing the data stored in step S2 according to the rule in the registration information, and converting the data into a JSON object; the registration information stores the rule, the names of the data structure attributes, and the types of the attributes.
5. The data collection method for a data aggregation platform of claim 1, wherein in step S4, cleaning the data comprises the following steps:
S401: integrity cleaning: checking whether the key fields of each record hold valid values;
S402: uniformity cleaning: unifying the data dictionary codes of each record;
S403: error and duplicate cleaning: removing duplicate records and removing erroneous records whose key fields fall outside a limited range.
6. A management system for the data collection method of a data aggregation platform, connected with each business system, characterized by comprising a data acquisition system and a data center,
wherein the data acquisition system includes: a unified interface for receiving the data uploaded by the business systems, wherein the received data carries an identification code; a temporary data pool for creating a data table to store the data of the business systems; and a processor for parsing the data in the data table and cleaning the parsed data;
and the data center is used for storing the data processed by the data acquisition system.
7. The management system of claim 6, wherein the unified interface receives data between 1:00 and 5:00 a.m.
8. The management system of claim 6, wherein the identification code comprises a system source number and a sequence number of the data structure.
9. The management system of claim 6, wherein the processor is configured to query the registration information stored in the background server using the unique identification code at a time outside 1:00 to 5:00 a.m., parse the data stored in step S2 according to the rule in the registration information, and convert the data into a JSON object; the registration information stores the rule, the names of the data structure attributes, and the types of the attributes.
10. The management system of claim 6, wherein the processor is configured to perform integrity cleaning, checking whether the key fields of each record hold valid values; uniformity cleaning, unifying the data dictionary codes of each record; and error and duplicate cleaning, removing duplicate records and removing erroneous records whose key fields fall outside a limited range.
CN202110773485.8A 2021-07-08 2021-07-08 Data collection method and management system for data aggregation platform Pending CN113486014A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110773485.8A CN113486014A (en) 2021-07-08 2021-07-08 Data collection method and management system for data aggregation platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110773485.8A CN113486014A (en) 2021-07-08 2021-07-08 Data collection method and management system for data aggregation platform

Publications (1)

Publication Number Publication Date
CN113486014A (en) 2021-10-08

Family

ID=77938077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110773485.8A Pending CN113486014A (en) 2021-07-08 2021-07-08 Data collection method and management system for data aggregation platform

Country Status (1)

Country Link
CN (1) CN113486014A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101697223A (en) * 2009-09-23 2010-04-21 易程科技股份有限公司 Passenger flow management system for comprehensive transportation hub
CN101763622A (en) * 2009-09-23 2010-06-30 易程科技股份有限公司 Public information platform system
CN104915909A (en) * 2015-07-01 2015-09-16 深圳市申泓科技有限公司 Data aggregation platform
CN107016056A (en) * 2017-03-07 2017-08-04 西安电子科技大学 The distributed memory system and method for magnanimity heterogeneous sensor data in a kind of Internet of Things
CN112422626A (en) * 2020-10-15 2021-02-26 山东汇金海智慧农业研究院有限公司 Device independence internet of things data acquisition, analysis and forwarding method based on coder and decoder

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李佐军: "大数据的架构技术与应用实践的探究", 长春:东北师范大学出版社 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211008