CN113486014A - Data collection method and management system for data aggregation platform

Info

Publication number: CN113486014A
Application number: CN202110773485.8A
Authority: CN
Inventors: 杨炳
Original assignee: Huisheng Information Technology Co ltd
Current assignee: Huisheng Information Technology Co ltd
Priority date: 2021-07-08
Filing date: 2021-07-08
Publication date: 2021-10-08

Abstract

The invention discloses a data collection method and a management system for a data aggregation platform, belonging to the field of data collection. The data collection method comprises the following steps: S1: receiving data uploaded by each service system, wherein the received data carry identification codes; S2: creating, in a temporary data pool, a data table for storing the data received in S1; S3: reading the data in the data table and parsing the data by means of the identification codes; S4: cleaning the parsed data and storing the cleaned data. The invention can improve the concurrency efficiency of the system interface, reduce the I/O load on the server, relieve the pressure on the acquisition server, and reduce the possibility of data loss.

Description

Data collection method and management system for data aggregation platform

Technical Field

The invention belongs to the field of data acquisition, and particularly relates to a data acquisition method and a management system of a data aggregation platform.

Background

Data collection for a data aggregation platform relies mainly on data integration. Typically, the data acquisition system customizes update interfaces for the various kinds of data according to different update strategies, such as batch update, incremental update, real-time update, and data synchronization. It provides collection modes such as manual entry and interface-service acquisition, together with strict quality inspection tools, so that the various data of the data center are collected and updated and the timeliness, authority, and consistency of the data resource center database are ensured.

The data aggregation platform collects many types of data. Each type has its own data attributes, format, and update strategy, so a separate interface must be customized for each type. The interface parses the data according to certain rules, performs simple cleaning to eliminate records that lack important data items, and finally stores the data in the database.

Data acquisition is a continuous process, because the business systems are updated iteratively. The types of collected data keep growing, and in general each newly added data type requires a dedicated, custom interface. This makes data acquisition time-consuming and laborious.

Generally, the interface must adjust the amount of data transmitted in each call according to the number of data items of the data type. If the data volume is too large, inserting or modifying the data occupies a large share of the server's system resources, so the response time of the interface becomes too long, which affects the transmission and acquisition of subsequent data and causes data loss.

In view of the above, the present invention is particularly proposed.

Disclosure of Invention

The invention aims to provide a data collection method for a data aggregation platform and a management system thereof, which can improve the concurrency efficiency of the system interface, reduce the I/O load on the server, relieve the pressure on the acquisition server, and reduce the possibility of data loss.

To achieve the above object, the data collection method for a data aggregation platform provided by the present invention is applied to a data acquisition system and comprises the following steps:

S1: receiving data uploaded by each service system, wherein the received data carry identification codes;

S2: creating, in a temporary data pool, a data table for storing the data received in S1;

S3: reading the data in the data table and parsing the data by means of the identification codes;

S4: cleaning the parsed data and storing the cleaned data.

Further, step S1 is performed between 1:00 and 5:00 in the morning.

Further, the identification code includes a system source number and a sequence number of the data structure.

Further, in step S3, parsing the data by means of the identification code comprises: at times other than 1:00 to 5:00 in the morning, querying the registration information stored in the background server by means of the unique identification code, parsing the data stored in step S2 according to the rules of the registration information, and converting the data into a json object; wherein the registration information stores the rules of the registration information, the names of the data structure attributes, and the types of the attributes.

Further, in step S4, cleaning the data comprises the following steps:

S401: data integrity cleaning: querying the key fields of each piece of data and judging whether each key field has a valid value;

S402: data uniformity cleaning: unifying the data dictionary coding of each piece of data;

S403: data error and duplicate cleaning: removing duplicate data, and removing erroneous data whose key fields exceed the permitted range.

The invention also provides a management system for the data collection method of the data aggregation platform. The management system is connected to each service system and comprises a data acquisition system and a data center,

wherein the data acquisition system comprises: a unified interface, used for receiving the data uploaded by the service systems, wherein the received data carry identification codes; a temporary data pool, used for creating a data table to store the data of the service systems; and a processor, used for parsing the data in the data table and cleaning the parsed data;

and the data center is used for storing the data processed by the data acquisition system.

Further, the unified interface receives data between 1:00 and 5:00 in the morning.

Further, the processor is configured to query, at times other than 1:00 to 5:00 in the morning, the registration information stored in the background server by means of the unique identification code, to parse the data stored in step S2 according to the rules of the registration information, and to convert the data into a json object; wherein the registration information stores the rules of the registration information, the names of the data structure attributes, and the types of the attributes.

Further, the processor is configured to perform integrity cleaning on the data by querying the key fields of each piece of data and judging whether each key field has a valid value; to perform uniformity cleaning on the data by unifying the data dictionary coding of each piece of data; and to perform error and duplicate cleaning on the data by removing duplicate data and removing erroneous data whose key fields exceed the permitted range.

The data aggregation platform data acquisition method and the management system thereof provided by the invention have the following beneficial effects:

1. The data acquisition process is divided into two steps. In the first step, uploaded messages are stored in the in-memory database (the temporary data pool). In the second step, when the data acquisition system is idle, the messages in the temporary data pool are parsed and stored in the data center. This improves data processing efficiency.

2. Compared with the prior art, the technical scheme provided by the invention establishes a data pool, which markedly improves receiving efficiency: previously, no more than 5,000 records could be submitted under normal load, whereas 50,000 records can now be submitted under normal load.

Drawings

Fig. 1 is a flowchart of a data aggregation platform data acquisition method in this embodiment.

Fig. 2 is a schematic diagram of a management system of the data aggregation platform data collection method in this embodiment.

Detailed Description

To enable those skilled in the art to better understand the scheme of the present invention, the invention is described in further detail below with reference to specific embodiments.

In a big data project for a smart community, data collection is scheduled between 1:00 and 5:00 in the morning so that it does not affect daytime network traffic or server load. During this window, the multiple systems operated by the smart community (property management, the user APP, Internet of Things devices, online news about the community, and other data sources) upload large amounts of data to the acquisition system at the same time, and the acquisition system must parse a large amount of data and store it in the database simultaneously.

Database write speed can severely degrade server performance, and the server often fails on timeouts and loses collected data. For this reason, a temporary data pool is added to the data acquisition system and data collection is divided into two steps. In the first step, the data are collected: all data are converted into character strings without parsing and stored directly in the temporary data pool. In the second step, after the collection task of the first step has completed, the character strings are taken out of the in-memory database, parsed, and stored in the data center, which completes data parsing and storage. This approach greatly improves the operating efficiency of the server and shields the data collection process from the storage bottleneck.
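The two-step split can be illustrated with a short Python sketch. It is an illustration only, not the implementation of the invention: the class `TwoPhaseCollector`, its methods, the use of plain lists as the temporary data pool and the data center, the half-open treatment of the 1:00 to 5:00 window, and the rejection of messages outside the window are all assumptions made for the example.

```python
from datetime import datetime, time

# Collection window from the description: 1:00 to 5:00 in the morning.
# Treating the end of the window as exclusive is an assumption.
WINDOW_START = time(1, 0)
WINDOW_END = time(5, 0)


def in_collection_window(now: datetime) -> bool:
    """Return True when `now` falls inside the collection window."""
    return WINDOW_START <= now.time() < WINDOW_END


class TwoPhaseCollector:
    """Hypothetical collector: step 1 buffers raw strings, step 2 processes them."""

    def __init__(self):
        self.pool = []    # stand-in for the temporary data pool
        self.center = []  # stand-in for the data center

    def receive(self, message: str, now: datetime) -> bool:
        """Step 1: store the message unparsed (assumed to be refused outside the window)."""
        if not in_collection_window(now):
            return False
        self.pool.append(message)
        return True

    def process_pending(self, now: datetime) -> int:
        """Step 2: drain the pool, but only when idle (outside the window)."""
        if in_collection_window(now):
            return 0
        processed = 0
        while self.pool:
            raw = self.pool.pop(0)
            # Parsing and cleaning would happen here; the sketch only moves the data.
            self.center.append({"raw": raw})
            processed += 1
        return processed
```

Because `receive` only appends a string, the cost of accepting a message does not depend on database write speed, which is the point of deferring the parsing and storage work to idle time.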

As shown in Fig. 1, a data collection method for a data aggregation platform, applied to a data acquisition system, includes the following steps:

S1: receiving data uploaded by each service system, wherein the received data carry identification codes;

wherein the time for receiving data is set between 1:00 and 5:00 in the morning.

Specifically, before data collection (in the registration or maintenance phase), each data type structure of a service system, such as the property system or the user side, obtains an identification code that is unique within the system. The identification codes are assigned as follows: the service systems operated in the smart community are first registered or maintained on a management page of the data acquisition system, and each data type structure of a registered or maintained service system then receives a unique identification code. The identification code has two parts: the system source number (4 characters) and the sequence number of the data structure (5 digits).

For example, the system source number of data type structures from the property system is "WYXT", and the system source number of data type structures from the user APP is "APPD". The last 5 characters are the sequence number of the data structure, assigned in order starting from "00001". The identification codes of the data structures of the property system are therefore "WYXT00001", "WYXT00002", and so on.
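As an illustration of this layout, the following Python sketch builds and splits identification codes. The helper names `make_id_code` and `split_id_code` are hypothetical, and restricting the source number to four uppercase letters is an assumption drawn from the "WYXT" and "APPD" examples; the description itself only fixes the lengths.

```python
import re
from typing import Tuple

# 4-character system source number followed by a 5-digit sequence number.
# Allowing only uppercase letters in the source number is an assumption.
ID_CODE_PATTERN = re.compile(r"([A-Z]{4})(\d{5})")


def make_id_code(source: str, sequence: int) -> str:
    """Build a 9-character identification code (hypothetical helper)."""
    if not re.fullmatch(r"[A-Z]{4}", source):
        raise ValueError("source number must be 4 uppercase letters")
    if not 1 <= sequence <= 99999:
        raise ValueError("sequence number must fit in 5 digits")
    return f"{source}{sequence:05d}"


def split_id_code(code: str) -> Tuple[str, int]:
    """Split an identification code into (source number, sequence number)."""
    match = ID_CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"malformed identification code: {code!r}")
    return match.group(1), int(match.group(2))
```

For instance, `make_id_code("WYXT", 1)` yields `"WYXT00001"`, the first property-system code mentioned above.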

The business systems operating in the smart community send data through the unified interface of the data acquisition system. When sending, each system prepends the unique identification code to the data message. Because the code has a fixed length of 9 characters, the first 9 characters of every data message are taken by default to be the unique identification code.
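The fixed-length prefix makes the message easy to split without parsing its body. A minimal sketch follows; the function names `frame_message` and `unframe_message` are hypothetical, and the JSON-like payload in the usage note is an assumption, since the description does not fix the payload format.

```python
from typing import Tuple

ID_CODE_LENGTH = 9  # fixed length stated in the description


def frame_message(id_code: str, payload: str) -> str:
    """Sender side: prepend the identification code to the payload."""
    if len(id_code) != ID_CODE_LENGTH:
        raise ValueError("identification code must be exactly 9 characters")
    return id_code + payload


def unframe_message(message: str) -> Tuple[str, str]:
    """Receiver side: the first 9 characters are the code, the rest is the payload."""
    if len(message) < ID_CODE_LENGTH:
        raise ValueError("message is shorter than the identification code")
    return message[:ID_CODE_LENGTH], message[ID_CODE_LENGTH:]
```

For example, framing the hypothetical payload `{"room": "3-201"}` with the code `WYXT00001` gives `WYXT00001{"room": "3-201"}`, and `unframe_message` recovers both parts by position alone.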

S2: creating a data table for storing the data received in step S1 in a temporary data pool;

A data table named after the current date is created in the temporary data pool of the data acquisition system. The data received that day are converted into character strings and stored directly in the newly created table. The temporary data pool may be an in-memory database (e.g., MongoDB or Redis). At other times (outside 1:00 to 5:00 in the morning), the data acquisition system automatically deletes data tables that are more than 7 days old.
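The per-day table and the 7-day retention rule can be sketched as follows. The sketch uses an in-memory SQLite database purely as a stand-in for the in-memory database named in the description; the `pool_` table prefix, the function names, and the choice to keep a table that is exactly 7 days old are assumptions made for the example.

```python
import sqlite3
from datetime import date, timedelta

RETENTION_DAYS = 7      # retention period from the description
TABLE_PREFIX = "pool_"  # hypothetical prefix so the name is a valid SQL identifier


def table_name(day: date) -> str:
    """Name the daily table after its date, e.g. pool_20210708."""
    return f"{TABLE_PREFIX}{day:%Y%m%d}"


def store_raw(conn: sqlite3.Connection, day: date, message: str) -> None:
    """Append one unparsed message string to the table for `day`."""
    name = table_name(day)
    # The name is derived only from a date, so interpolating it is safe here.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {name} (id INTEGER PRIMARY KEY, raw TEXT NOT NULL)"
    )
    conn.execute(f"INSERT INTO {name} (raw) VALUES (?)", (message,))


def purge_old_tables(conn: sqlite3.Connection, today: date) -> list:
    """Drop daily tables dated more than RETENTION_DAYS before `today`."""
    cutoff = table_name(today - timedelta(days=RETENTION_DAYS))
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    # Fixed-width YYYYMMDD names sort chronologically, so string comparison suffices.
    dropped = sorted(n for (n,) in rows if n.startswith(TABLE_PREFIX) and n < cutoff)
    for name in dropped:
        conn.execute(f"DROP TABLE {name}")
    return dropped
```

Dropping a whole table per day keeps the purge cheap, since no per-row deletion or scan of the stored strings is needed.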

S3: reading the data in the data table, and parsing the data using the registration information (namely, the rules of the registration information) obtained from the background server by means of the identification code;

Specifically, when the system is idle (at times other than 1:00 to 5:00 in the morning), the registration information stored in the background server is queried by means of the unique identification code. The registration information stores the rules of the registration information, the names of the data structure attributes, and the types of the attributes. The data character string stored in step S2 is parsed according to the rules of the registration information and converted into a json object, and the values in the json object are read according to the names and types of the data structure attributes in the registration information.

Through the above steps, the rules stored in the background server are looked up by identification code, the service data are converted into a json object, and the data uploaded by the service system can then be extracted according to the attribute names and types given in the rules.
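The lookup-then-extract flow can be sketched in Python as below. The in-process dictionary `REGISTRY` stands in for the registration information held by the background server, and its attribute names (`room`, `fee`, `months`) are invented for the example. Treating the message body as JSON text and coercing each value to its registered type are also assumptions; the description does not specify the wire format or the coercion rules.

```python
import json

ID_CODE_LENGTH = 9

# Hypothetical registration records keyed by identification code. Each maps
# the attribute names of one registered data structure to their types.
REGISTRY = {
    "WYXT00001": {"room": str, "fee": float, "months": int},
}


def parse_message(message: str, registry: dict = REGISTRY) -> dict:
    """Look up the rule by identification code and extract typed attribute values."""
    id_code, body = message[:ID_CODE_LENGTH], message[ID_CODE_LENGTH:]
    rule = registry.get(id_code)
    if rule is None:
        raise KeyError(f"no registration information for {id_code}")
    obj = json.loads(body)  # assumed: the stored string body is JSON text
    if not isinstance(obj, dict):
        raise ValueError("message body must be a json object")
    record = {}
    for name, expected_type in rule.items():
        value = obj.get(name)
        # Missing attributes stay None so that later cleaning can detect them.
        record[name] = expected_type(value) if value is not None else None
    return record
```

With this arrangement, supporting a new data type only requires adding a registration record; the receiving interface and the parsing code stay unchanged.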

S4: cleaning the data parsed in step S3 and storing the cleaned data in the data center.

Specifically, the step of cleaning the data comprises the following steps:

S401: Data integrity cleaning: query the key fields of each piece of data and judge whether each key field has a valid value. If data for a key field is missing, the record is removed from the data set and handled separately.

S402: Data uniformity cleaning: unify the data dictionary coding of each piece of data. For example, different systems encode "male" and "female" differently, and the system needs to convert these codes into a single data center coding.

S403: Data error and duplicate cleaning: remove duplicate data, remove erroneous data whose key fields exceed the permitted range, and handle the removed data separately.
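The three cleaning steps can be sketched as a single pass over the parsed records. Everything specific in the sketch is an assumption made for illustration: the function name `clean`, the `GENDER_CODES` mapping, the field names, the use of inclusive numeric ranges as the "permitted range", and the restriction to flat records with hashable values.

```python
# Hypothetical mapping from source-system gender codes to one data center code;
# the description only states that the codes must be unified.
GENDER_CODES = {"male": "M", "m": "M", "1": "M", "female": "F", "f": "F", "2": "F"}


def clean(records, key_fields, limits):
    """Apply S401 to S403 in order and return (kept, rejected).

    `limits` maps a key field to an inclusive (low, high) range.
    Records are assumed to be flat dicts with hashable values.
    """
    if not set(limits) <= set(key_fields):
        raise ValueError("range limits may only be set on key fields")
    kept, rejected, seen = [], [], set()
    for original in records:
        # S401: integrity - every key field must hold a value
        if any(original.get(f) in (None, "") for f in key_fields):
            rejected.append(original)
            continue
        # S402: uniformity - map dictionary codes to the unified coding
        rec = dict(original)
        if "gender" in rec:
            rec["gender"] = GENDER_CODES.get(str(rec["gender"]).lower(), rec["gender"])
        # S403 (errors): key fields outside their permitted range
        if any(not (low <= rec[f] <= high) for f, (low, high) in limits.items()):
            rejected.append(rec)
            continue
        # S403 (duplicates): skip records already seen after unification
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        kept.append(rec)
    return kept, rejected
```

Running uniformity cleaning before duplicate detection matters: two records that differ only in how a dictionary value is coded are recognized as duplicates only after their codes have been unified. The `rejected` list corresponds to the data that the description says is removed and handled separately.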

As shown in Fig. 2, the present invention further provides a management system for the data collection method of the data aggregation platform. The management system is connected to each service system and comprises a data acquisition system and a data center,

wherein the data acquisition system comprises:

a unified interface, used for receiving the data uploaded by the service systems, wherein the received data carry identification codes. The identification code comprises two parts: the system source number (4 characters) and the sequence number of the data structure (5 digits). The unified interface receives data between 1:00 and 5:00 in the morning.

a temporary data pool, used for creating a data table to temporarily store the data of the service systems, wherein all stored data carry an identification code that is unique within the system. The temporary data pool may be an in-memory database, and it automatically deletes data tables that are more than 7 days old.

a processor, used for parsing the data in the temporary data pool and cleaning the parsed data.

Specifically, when idle (at times other than 1:00 to 5:00 in the morning), the processor queries the registration information stored in the background server by means of the unique identification code, and converts the stored data character strings into json objects according to the rules of the registration information.

The data cleaning comprises the integrity cleaning, uniformity cleaning, and error and duplicate cleaning described in steps S401 to S403.

a data center, used for storing the data processed by the processor.

The inventive concept is explained in detail herein using specific examples, which are given only to aid in understanding the core concepts of the invention. It should be understood that any obvious modifications, equivalents and other improvements made by those skilled in the art without departing from the spirit of the present invention are included in the scope of the present invention.

Claims

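1. A data collection method for a data aggregation platform, applied to a data acquisition system, characterized by comprising the following steps:

S1: receiving data uploaded by each service system, wherein the received data carry identification codes;

S2: creating, in a temporary data pool, a data table for storing the data received in S1;

S3: reading the data in the data table and parsing the data by means of the identification codes;

S4: cleaning the parsed data and storing the cleaned data.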

2. The data collection method of the data aggregation platform of claim 1, wherein step S1 is performed between 1:00 and 5:00 in the morning.

3. The data collection method of claim 1, wherein the identification code comprises a system source number and a sequence number of the data structure.

4. The data collection method of claim 1, wherein parsing the data by means of the identification code in step S3 comprises: at times other than 1:00 to 5:00 in the morning, querying the registration information stored in the background server by means of the unique identification code, parsing the data stored in step S2 according to the rules of the registration information, and converting the data into a json object; wherein the registration information stores the rules of the registration information, the names of the data structure attributes, and the types of the attributes.

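5. The data collection method of the data aggregation platform of claim 1, wherein in step S4, cleaning the data comprises the following steps:

S401: data integrity cleaning: querying the key fields of each piece of data and judging whether each key field has a valid value;

S402: data uniformity cleaning: unifying the data dictionary coding of each piece of data;

S403: data error and duplicate cleaning: removing duplicate data, and removing erroneous data whose key fields exceed the permitted range.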

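6. A management system for a data collection method of a data aggregation platform, connected to each business system, characterized by comprising a data acquisition system and a data center,

wherein the data acquisition system comprises: a unified interface, used for receiving the data uploaded by the service systems, wherein the received data carry identification codes; a temporary data pool, used for creating a data table to store the data of the service systems; and a processor, used for parsing the data in the data table and cleaning the parsed data;

and the data center is used for storing the data processed by the data acquisition system.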

7. The management system of the data collection method of the data aggregation platform as claimed in claim 6, wherein the unified interface receives data between 1:00 and 5:00 in the morning.

8. The management system of the data aggregation platform data collection method of claim 6, wherein the identification code comprises a system source number and a sequence number of the data structure.

9. The management system of the data collection method of the data aggregation platform according to claim 6, wherein the processor is configured to query, at times other than 1:00 to 5:00 in the morning, the registration information stored in the background server by means of the unique identification code, to parse the data stored in step S2 according to the rules of the registration information, and to convert the data into a json object; wherein the registration information stores the rules of the registration information, the names of the data structure attributes, and the types of the attributes.

10. The management system of the data collection method of the data aggregation platform as claimed in claim 6, wherein the processor is configured to perform integrity cleaning on the data by querying the key fields of each piece of data and judging whether each key field has a valid value; to perform uniformity cleaning on the data by unifying the data dictionary coding of each piece of data; and to perform error and duplicate cleaning on the data by removing duplicate data and removing erroneous data whose key fields exceed the permitted range.