WO2022178976A1 - Information processing method and apparatus based on big data, and related devices

Info

Publication number: WO2022178976A1
Application number: PCT/CN2021/090464
Authority: WO
Inventors: 刘耀晖
Original assignee: 平安科技（深圳）有限公司
Priority date: 2021-02-26
Filing date: 2021-04-28
Publication date: 2022-09-01
Also published as: CN112948382A

Abstract

The present application relates to data processing technology. Provided are an information processing method and apparatus based on big data, and a computer device and a storage medium. The method comprises: acquiring a target clustered table structure corresponding to several target data collection points; according to the target clustered table structure, performing aggregation processing on data collected at the several target data collection points, so as to obtain target clustered table data; calling a virtual management node to parse a data query request, so as to obtain a table identifier to be queried, and determining a target virtual data node according to the table identifier to be queried; acquiring a first data version of the target virtual data node and a second data version of a virtual data node in a corresponding node group; detecting whether the first data version is consistent with the second data version; and when the detection result is 'yes', calling the target virtual data node to obtain node data, and aggregating the node data according to an aggregation rule, so as to obtain target node data. By means of the present application, the efficiency of information processing can be improved, and the rapid development of smart cities can be promoted.

Description

Information processing method, device and related equipment based on big data

This application claims the priority of the Chinese patent application filed on February 26, 2021 with the application number 202110219983.8 titled "information processing method, device and related equipment based on big data", the entire contents of which are incorporated by reference in this application.

Technical field

The present application relates to the technical field of data processing, and in particular, to an information processing method, apparatus, computer equipment and medium based on big data.

Background

With the vigorous development of the mobile Internet and the Internet of Things, and the emergence of various sensors and smart devices, mobile phones, computers, wristbands, shared bicycles, taxis, electricity meters, environmental monitoring equipment, large-scale equipment, industrial production lines and the like generate massive amounts of real-time data and send it to the cloud. This data can help enterprises monitor the operation of their business and equipment in real time, generate reports, make predictions and issue early warnings through big data analysis and machine learning, and so make scientific decisions, save costs and create new value.

In the process of realizing the present application, the inventor found that the prior art has the following technical problems: due to the huge number of data records, the real-time writing of data becomes a bottleneck, and the data processing is extremely slow. Traditional relational databases or NoSQL databases and streaming computing engines do not make full use of the characteristics of time-series space big data, and their performance improvement is extremely limited. They can only rely on cluster architecture and invest more computing and storage resources, which greatly increases enterprise costs.

Therefore, it is necessary to provide an information processing method based on big data, which can improve the efficiency of information processing.

Summary of the invention

In view of the above, it is necessary to propose an information processing method based on big data, an information processing apparatus, computer equipment and medium based on big data, which can improve the efficiency of information processing.

A first aspect of the embodiments of the present application provides a big data-based information processing method, where the big data-based information processing method includes:

acquiring a target aggregation table structure corresponding to several target data collection points;

performing aggregation processing on the data collected by the several target data collection points according to the target aggregation table structure to obtain target aggregation table data, and storing the target aggregation table data in a preset database;

when a data query request is received, calling a virtual management node to parse the data query request to obtain an identifier of the table to be queried corresponding to the data query request, and determining a target virtual data node according to the identifier of the table to be queried;

acquiring a first data version of the target virtual data node and a second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node;

detecting whether the first data version is consistent with the second data version;

when the detection result is that the first data version is consistent with the second data version, calling the target virtual data node to obtain node data, and aggregating the node data according to an aggregation rule to obtain target node data.

A second aspect of the embodiments of the present application further provides an apparatus for information processing based on big data, where the apparatus for information processing based on big data includes:

an aggregation table acquisition module, configured to acquire the target aggregation table structure corresponding to several target data collection points;

a data storage module, configured to perform aggregation processing on the data collected by the several target data collection points according to the target aggregation table structure to obtain target aggregation table data, and store the target aggregation table data in a preset database;

a request parsing module, configured to, when receiving a data query request, call the virtual management node to parse the data query request, obtain the identifier of the table to be queried corresponding to the data query request, and determine the target virtual data node according to the identifier of the table to be queried;

a version obtaining module, configured to obtain the first data version of the target virtual data node and the second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node;

a version detection module, configured to detect whether the first data version is consistent with the second data version;

a data aggregation module, configured to call the target virtual data node to obtain node data when the detection result is that the first data version is consistent with the second data version, and aggregate the node data according to an aggregation rule to obtain target node data.

A third aspect of the embodiments of the present application further provides a computer device, wherein the computer device includes a processor, and the processor is configured to execute computer-readable instructions stored in a memory to implement the following steps:

A fourth aspect of the embodiments of the present application further provides a computer-readable storage medium, where computer-readable instructions are stored on the computer-readable storage medium, wherein when the computer-readable instructions are executed by a processor, the following steps are implemented:

In the big data-based information processing method, apparatus, computer device and computer-readable storage medium provided in the embodiments of the present application, before data is collected at several target data collection points, the same aggregation table structure is created for collection points that collect data of the same type. This avoids the huge number of tables that would result from building a separate table for each data collection point, which reduces memory usage and improves information processing efficiency. When a data query request is received from an application, the virtual management node is called to parse the data query request and obtain the identifier of the table to be queried corresponding to the data query request, the target virtual data node is determined according to the identifier of the table to be queried, and the data query request is executed by the target virtual data node, which improves the efficiency of information query. In addition, the present application compares the first data version of the target virtual data node with the second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node, so that it can be found in time whether the data version of the target virtual data node that is performing the task is the latest version, thereby ensuring the accuracy of data processing. The present application can be applied to various functional modules of smart cities, such as smart government affairs and smart transportation, for example a big data-based information processing module for smart government affairs, and can promote the rapid development of smart cities.

Description of drawings

FIG. 1 is a flowchart of a big data-based information processing method provided in Embodiment 1 of the present application.

FIG. 2 is a structural diagram of a big data-based information processing apparatus provided in Embodiment 2 of the present application.

FIG. 3 is a schematic structural diagram of a computer device provided in Embodiment 3 of the present application.

The following specific embodiments will further illustrate the present application in conjunction with the above drawings.

Detailed description

In order to more clearly understand the above objects, features and advantages of the present application, the present application will be described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other in the case of no conflict.

In the following description, many specific details are set forth in order to facilitate a full understanding of the present application, and the described embodiments are some, but not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field to which this application belongs. The terms used herein in the specification of the application are for the purpose of describing specific embodiments only, and are not intended to limit the application.

The big data-based information processing methods provided in the embodiments of the present application are executed by computer equipment, and correspondingly, the big data-based information processing apparatuses run in the computer equipment.

FIG. 1 is a flowchart of a big data-based information processing method according to the first embodiment of the present application. The big data-based information processing method can be applied in a distributed architecture. As shown in FIG. 1, the big data-based information processing method may include the following steps. According to different requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.

S11. Acquire a target aggregation table structure corresponding to several target data collection points.

In at least one embodiment of the present application, the target data collection point may refer to a data collection terminal, for example a terminal capable of collecting data such as a wristband, a sensor or an electric meter, which is not limited herein. The target data collection points may collect the same type of data or different types of data. For example, when 300 wristbands are used to collect the heart rate information of all employees on a certain floor of an office building, the 300 data collection points are all of the same type; when 300 wristbands A are used to collect heart rate information and 100 wristbands B are used to collect ECG information of a residential building, wristbands A and B are different types of collection points.

Optionally, before acquiring the target aggregation table structure corresponding to several target data collection points, the method further includes:

Obtain the data collection type of each of the target data collection points;

Detecting whether the data collection types are consistent;

When the detection result is that the data collection types are consistent, determine the target aggregation table structure of the data collection types;

When the detection result is that the data collection types are inconsistent, a separate table structure is created for each of the data collection types.

When the detection result is that the data collection types are consistent, the target aggregation table structure of that data collection type is determined. For example, the heart rate information collected by the 300 wristbands A can be stored in the same target aggregation table structure. When the detection result is that the data collection types are inconsistent, a separate table structure is created for each data collection type to store the time series data collected by the corresponding data collection points, where the table structure refers to an aggregation table structure that matches the format of the time series data collected by the data collection points. Exemplarily, when 300 wristbands A are used to collect the heart rate information of all employees on a certain floor of an office building and 100 wristbands B are used to collect the electrocardiogram information of a certain residential building, wristbands A and B are different types of collection points, so table structure A and table structure B are created for the data collected by wristbands A and wristbands B respectively: all the data collected by wristbands A are stored in table structure A, and all the data collected by wristbands B are stored in table structure B. In the present application, because a table structure is created separately for each data collection type, data can be written in a lock-free manner, which avoids the heavy overhead caused by locking and greatly improves the speed at which data is written into the distributed architecture.
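
As an illustration only, the type consistency check and the creation of one shared table per data collection type described above can be sketched in Python as follows. The function names and the representation of a collection point as an (ID, type) pair are assumptions made for this sketch and are not part of the present application.

```python
from collections import defaultdict


def types_consistent(points):
    """Return True when every collection point reports the same type."""
    return len({collection_type for _, collection_type in points}) <= 1


def group_points_by_type(points):
    """Map each data collection type to the IDs of the points that share its table."""
    groups = defaultdict(list)
    for point_id, collection_type in points:
        groups[collection_type].append(point_id)
    return dict(groups)


# 300 wristbands A (heart rate) and 100 wristbands B (ECG), as in the example above.
points = [(f"A-{i}", "heart_rate") for i in range(300)]
points += [(f"B-{i}", "ecg") for i in range(100)]

# One table structure per type instead of one table per collection point.
tables = group_points_by_type(points)
```

With these inputs, 400 collection points map onto two table structures rather than 400 separate tables.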

In an embodiment, the aggregation table structures corresponding to different data collection types may or may not be the same. Optionally, the acquiring the target aggregation table structure corresponding to several target data collection points may include:

obtaining the data collection type of the target data collection point;

Analyzing the data collection type to obtain items to be collected and attribute information corresponding to each item to be collected;

Create a target aggregation table structure according to the item to be collected and the attribute information.

The data collection type includes information such as the collected data content and data attributes, and the attribute information may include data attributes such as data length or data type, which is not limited herein. The items to be collected are arranged in the table structure according to a preset method. For example, the items to be collected can be arranged according to the importance of the data or the frequency with which the data is queried, and the corresponding attribute information is added to each item to be collected, so that the initial data collected for each item can be filtered to obtain data that meets the requirements.
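
A minimal sketch of such a table structure is given below, assuming that items are ordered by query frequency and that the attribute information consists of a data type and a maximum length. The `Column` class, its fields and the two helper functions are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    dtype: type
    max_length: int       # assumed attribute: maximum length of str(value)
    query_frequency: int  # assumed ordering key


def build_table_structure(columns):
    """Arrange the items to be collected, most frequently queried first."""
    return sorted(columns, key=lambda c: c.query_frequency, reverse=True)


def row_is_valid(structure, row):
    """Keep a row only if every item matches its declared type and length."""
    return all(
        isinstance(row.get(c.name), c.dtype) and len(str(row[c.name])) <= c.max_length
        for c in structure
    )


structure = build_table_structure(
    [Column("device_id", str, 16, 5), Column("heart_rate", int, 3, 50)]
)
```

Rows that fail `row_is_valid` would be filtered out before they are added to the aggregation table.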

In one embodiment, the target aggregation table structure can be stored in the distributed architecture in the form of a snapshot, which can avoid problems such as storage errors of the target aggregation table structure and improve data storage reliability.

Optionally, when invoking a known target aggregation table structure, the method further includes:

obtaining the data collection type of the target data collection point;

Determine the snapshot of the target aggregation table structure according to the data collection type;

Scan the snapshot of the target aggregation table structure to obtain the target aggregation table structure.

There is a mapping relationship between the data collection type and the snapshot of the target aggregation table structure, and by querying the mapping relationship, the snapshot corresponding to a data collection type can be determined. It can be understood that when an update of the target aggregation table structure corresponding to a certain data collection type is detected, the snapshot of that structure in the distributed architecture can be directly replaced, which can improve the data update rate and thus the information processing rate.
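
The mapping between data collection types and snapshots can be illustrated with the following sketch. It keeps the snapshots in an in-memory dictionary, which is an assumption made for brevity; the `SnapshotStore` class and its methods are hypothetical.

```python
import copy


class SnapshotStore:
    """Maps each data collection type to a snapshot of its table structure."""

    def __init__(self):
        self._snapshots = {}

    def put(self, collection_type, structure):
        # Storing a deep copy means an update replaces the snapshot in one step.
        self._snapshots[collection_type] = copy.deepcopy(structure)

    def load(self, collection_type):
        # Return a copy so that callers cannot corrupt the stored snapshot.
        return copy.deepcopy(self._snapshots[collection_type])


store = SnapshotStore()
store.put("heart_rate", ["ts", "bpm"])
loaded = store.load("heart_rate")
loaded.append("scratch")                          # does not affect the snapshot
store.put("heart_rate", ["ts", "bpm", "device"])  # direct replacement on update
```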

S12. Perform aggregation processing on the data collected by a plurality of the target data collection points according to the target aggregation table structure to obtain target aggregation table data, and store the target aggregation table data in a preset database.

In at least one embodiment of the present application, the data collected at each target data collection point is added to the target aggregation table structure to obtain initial aggregation table data, and the initial aggregation table data obtained for the several target data collection points is labeled and then aggregated to obtain the target aggregation table data.

Optionally, performing aggregation processing on the data collected by several of the target data collection points according to the target aggregation table structure, and obtaining the target aggregation table data may include:

Acquire the data collected by each of the target data collection points, and fill the data into the target aggregation table structure to obtain initial aggregation table data;

Construct a preset label corresponding to the identification information of the target data collection point, and add the label to the initial aggregation table data;

Aggregate the initial aggregation table data to obtain target aggregation table data.

The identification information of the target data collection point refers to an identifier used to identify the characteristics of the collection point, and may be an ID or a code. The preset label is used to identify the initial aggregation table data, and according to the preset label, it can be determined to which data collection point the initial aggregation table data belongs. The preset label may be a number label, a color label or a letter label.
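
The fill, label and aggregate steps can be sketched as follows. The sketch assumes that each reading is a dictionary containing a `ts` timestamp and that aggregation means merging the labeled rows in time order; both are assumptions for illustration, and the function names are hypothetical.

```python
def fill_table(structure, readings):
    """Project raw readings onto the table columns (missing items become None)."""
    return [{col: reading.get(col) for col in structure} for reading in readings]


def label_rows(rows, point_id):
    """Attach a preset label derived from the collection point's ID."""
    return [{**row, "label": point_id} for row in rows]


def aggregate(per_point_rows):
    """Merge the labeled rows of all collection points, ordered by timestamp."""
    merged = [row for rows in per_point_rows for row in rows]
    return sorted(merged, key=lambda row: row["ts"])


structure = ["ts", "bpm"]
rows_a = label_rows(fill_table(structure, [{"ts": 2, "bpm": 70, "noise": 1}]), "A-1")
rows_b = label_rows(fill_table(structure, [{"ts": 1, "bpm": 65}]), "A-2")
target_table = aggregate([rows_a, rows_b])
```

The label makes it possible to tell which collection point each row of the merged table came from.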

In one embodiment, after obtaining the target aggregation table data, the method further includes: detecting whether abnormal data exists in the target aggregation table data. The abnormal data may include data with a null value or data that falls far outside a preset reasonable range, where the preset reasonable range refers to a preset range of values. Due to problems with the network or the collection point, the collection of a certain item to be collected may fail, so that no data is collected for the item and its value is empty; the data collected for an item may also deviate from the normal value and fall far outside the preset reasonable range. When the detection result is that there is abnormal data in the target aggregation table data, the data volume of the abnormal data is determined, and whether the data volume exceeds a preset data volume threshold is detected. When the data volume exceeds the preset data volume threshold, the historical time series data corresponding to the abnormal data is obtained, a reasonable value is fitted according to the historical time series data, and the abnormal data is replaced with the reasonable value; when the data volume does not exceed the preset data volume threshold, the abnormal data is set to empty. The preset data volume threshold is a preset value. The reasonable value may be fitted by using a pre-trained reasonable value estimation model to process the historical time series data. The training process of the reasonable value estimation model is prior art, and details are not described here. In the present application, abnormal data is detected and processed in time before the target aggregation table data is stored in the preset database, which helps ensure that the data stored in the preset database is correct and thereby improves the accuracy of information processing.
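
The abnormal data handling can be illustrated with the sketch below. The mean of the historical values is used purely as a stand-in for the pre-trained reasonable value estimation model, which the application does not specify; the function names and parameters are likewise assumptions.

```python
from statistics import mean


def is_abnormal(value, low, high):
    """A value is abnormal if it is empty or outside the preset reasonable range."""
    return value is None or not (low <= value <= high)


def repair(values, history, low, high, volume_threshold):
    """Replace abnormal values with a fitted value, or blank them if they are few."""
    abnormal = [i for i, v in enumerate(values) if is_abnormal(v, low, high)]
    if len(abnormal) > volume_threshold:
        fill = mean(history)  # stand-in for the reasonable value estimation model
    else:
        fill = None           # below the threshold: set abnormal data to empty
    repaired = list(values)
    for i in abnormal:
        repaired[i] = fill
    return repaired


many = repair([70, None, 300, 72], history=[68, 72], low=30, high=220, volume_threshold=1)
few = repair([70, None, 72], history=[68, 72], low=30, high=220, volume_threshold=1)
```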

In at least one embodiment of the present application, after the target aggregation table data is obtained, it is stored in a preset database, and the preset database may be the memory in a distributed architecture. In order to reduce memory overhead and deal effectively with out-of-order timestamps, data in memory is stored in row format, a skip list is used to build the index, and the memory is managed on a first-in, first-out basis. In order to make full use of the characteristics of time series data, column storage is used for persistence, and the physical layout is contiguous within each block, which improves the compression rate and reading speed. Each data block is pre-computed to improve the speed of data analysis.

Optionally, the data volume of the target aggregation table data stored in the distributed architecture increases as the data collected by the data collection points increases, and when the data volume of the target aggregation table data is large, the method further includes:

Get the remaining space value in the preset database;

monitoring whether the remaining space value satisfies a preset space critical value;

When the monitoring result is that the remaining space value meets the preset space critical value, select the target data in the preset database;

Migrate the target data to the hard disk.

The memory of the present application is managed as a first-in, first-out queue to ensure that the most recently collected data stays in memory. The target data refers to the portion of data in the preset database that exceeds the preset space critical value and was collected earliest. Optionally, migrating the target data to the hard disk may include: determining the load information of the transmission channel and the data volume information of the target data to be transmitted; determining an optimal value for a single transmission according to the load information and the data volume information; and migrating the target data in batches according to the optimal value for a single transmission. The optimal value for a single transmission can be calculated by a pre-trained optimal value determination model according to the load information and the data volume information, and refers to a value that can ensure fast data transmission. The training process of the optimal value determination model is prior art and will not be repeated here. The target data is written to the hard disk by appending to a log, which can improve the speed of writing to disk.

Through the above method, data is stored on different physical media according to its age: for example, new data is stored in memory and old data is stored on large-capacity, slower hard disks, which greatly reduces the random read overhead of the hard disks and improves write and query efficiency.
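
A minimal sketch of the first-in, first-out memory management is given below. It assumes that the space critical value is a minimum number of free slots and represents the hard disk as an append-only list; batching by an optimal transmission size is omitted. The `TieredStore` class and its fields are hypothetical.

```python
from collections import deque


class TieredStore:
    """Keeps the newest records in memory and migrates the oldest to a disk log."""

    def __init__(self, capacity, min_free):
        self.capacity = capacity
        self.min_free = min_free  # assumed form of the preset space critical value
        self.memory = deque()     # oldest record on the left
        self.disk_log = []        # stand-in for an append-only file on disk

    def write(self, record):
        self.memory.append(record)
        # Migrate the earliest records once free space drops below the critical value.
        while self.capacity - len(self.memory) < self.min_free:
            self.disk_log.append(self.memory.popleft())


store = TieredStore(capacity=4, min_free=1)
for record in range(6):
    store.write(record)
```

After the six writes, the three newest records remain in memory and the three oldest have been appended to the disk log in collection order.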

S13. When a data query request is received, the virtual management node is invoked to parse the data query request to obtain an identifier of the table to be queried corresponding to the data query request, and a target virtual data node is determined according to the identifier of the table to be queried.

In at least one embodiment of the present application, the data query request may be a request sent by an application to query aggregation table data in the preset database. The data query request carries the identifier of the table to be queried, and the identifier of the table to be queried includes information blocks such as the name or ID of the collection point, the start and end time of data collection, and several query items, where the query items correspond to the items to be collected. The identifier of the table to be queried corresponds to the aggregation table data in the preset database, and the target aggregation table data can be obtained by traversing the preset database according to the identifier of the table to be queried. The preset database includes several data nodes; a data node is a running instance on a physical machine, a virtual machine or a container, and a working system has at least one data node. Each data node includes several virtual data nodes and at most one virtual management node.

The virtual management node is responsible for the collection, load balancing, and metadata management of all nodes' running states. When the application needs to query a table, it obtains information by connecting to the management node, and obtains which data node the table is located on. The virtual data node is responsible for storing specific time series data, and query operations for the time series data are all performed on the virtual data node, and virtual data nodes located on different physical machines can form a virtual data node group.

The virtual management node is used for storing metadata and, at the same time, performing load balancing according to the state of each virtual data node. The metadata may refer to metadata such as the start time of data collection, the number of data points, and the compression algorithm. Since the amount of metadata is not large, it is stored entirely in memory to ensure efficient query operations. On the application side, in order to avoid accessing the virtual management node for each data operation, the driver saves the necessary metadata locally and accesses the virtual management node only when the required metadata does not exist or is invalid, which improves system performance.

Optionally, the invoking the virtual management node to parse the data query request to obtain an identifier of the table to be queried corresponding to the data query request, and determining the target virtual data node according to the identifier of the table to be queried may include:

Parse the identifier of the table to be queried to obtain a target information block;

Traverse metadata according to the target information block to obtain target metadata;

The virtual data node corresponding to the target metadata is determined as the target virtual data node.
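
The steps above, together with the driver-side metadata cache, can be sketched as follows. The identifier is modeled as a dictionary whose `name` block is the lookup key, and the metadata as a name-to-node mapping; these representations and the class names are assumptions for illustration.

```python
class ManagementNode:
    """Stand-in for the virtual management node: table name -> virtual data node."""

    def __init__(self, metadata):
        self.metadata = metadata
        self.lookups = 0

    def locate(self, table_name):
        self.lookups += 1
        return self.metadata[table_name]


class Driver:
    """Application-side driver that caches metadata locally."""

    def __init__(self, management_node):
        self.management_node = management_node
        self.cache = {}

    def target_vnode(self, table_identifier):
        # Parse the identifier into information blocks; only the name is used here.
        table_name = table_identifier["name"]
        if table_name not in self.cache:  # contact the management node on a miss only
            self.cache[table_name] = self.management_node.locate(table_name)
        return self.cache[table_name]


mnode = ManagementNode({"heart_rate": "vnode-2"})
driver = Driver(mnode)
first = driver.target_vnode({"name": "heart_rate", "start": 0, "end": 10})
second = driver.target_vnode({"name": "heart_rate", "start": 10, "end": 20})
```

The second call is served from the local cache, so the management node is consulted only once.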

The data in the virtual data node group can be synchronized through asynchronous replication to achieve eventual consistency and to ensure that each piece of data has copies on multiple physical machines, so that the virtual data nodes on any of these machines can process query requests, which ensures high reliability of system operation. Multiple virtual management nodes may form a virtual management node group. The number of virtual management nodes in the virtual management node group may be determined according to the number of virtual data nodes. Optionally, determining the number of virtual management nodes may include: acquiring a first number of virtual data nodes; traversing the preset quantitative relationship between virtual data nodes and virtual management nodes according to the first number to obtain a second number of virtual management nodes corresponding to the first number; and constructing a node tree between the virtual management nodes and the virtual data nodes. In the node tree, the virtual management node is the parent node, and the virtual data nodes it manages are child nodes.

In one embodiment, the Master-Slave synchronous replication mode is used to synchronize the data of the virtual management nodes. In this mode, the virtual management nodes include a dominant virtual management node (also called the Master node) and several subordinate virtual management nodes (also called Slave nodes). The Master node is a task scheduler that assigns computing tasks to multiple Slave nodes, and when all Slave nodes have completed their tasks, the Master node aggregates the results. When performing a write operation, the Master node returns success only after the Slave nodes have written successfully, thereby ensuring strong data consistency. If the Master node goes down, the system has a mechanism to ensure that one of the Slave nodes is immediately elected as the new Master, thus ensuring the high reliability of system write operations.

Optionally, the method further includes:

Detect whether the dominant virtual management node in the virtual management node group composed of virtual management nodes is abnormal;

When the detection result is that the dominant virtual management node is abnormal, acquire a node tree, and calculate the number of child nodes in each of the node trees;

Determine the node tree with the smallest number of child nodes as the target node tree;

The parent node corresponding to the target node tree is selected as the new dominant virtual management node.

Wherein, detecting whether the dominant virtual management node is abnormal means detecting whether the dominant virtual management node is down.
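
The selection of a new dominant virtual management node can be sketched as follows. The node trees are modeled as a mapping from each management node to its child data nodes, and the failed node is excluded from the candidates; both choices are assumptions for this illustration.

```python
def elect_new_master(node_trees, failed_master):
    """Pick the surviving management node whose tree has the fewest child nodes."""
    candidates = {
        parent: children
        for parent, children in node_trees.items()
        if parent != failed_master
    }
    # min() returns the first candidate in insertion order when counts tie.
    return min(candidates, key=lambda parent: len(candidates[parent]))


node_trees = {
    "mnode-0": ["vnode-1", "vnode-2"],
    "mnode-1": ["vnode-3", "vnode-4", "vnode-5"],
    "mnode-2": ["vnode-6"],
}
new_master = elect_new_master(node_trees, failed_master="mnode-0")
```

Choosing the tree with the fewest child nodes gives the additional scheduling work to the least loaded management node.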

In one embodiment, within a virtual data node group, the virtual data nodes learn each other's status through heartbeat packets. If a virtual data node receives a data write request, the request is immediately forwarded to the other virtual data nodes and then stored locally. When the application wants to operate on any aggregation table data, the system provides the application with the IP address of each virtual data node in the virtual data node group to which the table belongs. If the connection to one of them fails or the operation fails, the application tries the second, the third and so on, and a failure is returned only if all nodes fail. This ensures that the failure of any single machine in the virtual data node group does not affect external services.
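
The retry behavior on the application side can be illustrated with the following sketch. The `send` callable, the `fake_send` transport and the IP addresses are hypothetical and stand in for whatever connection mechanism the system uses.

```python
def query_with_failover(addresses, send):
    """Try each virtual data node in order; fail only if every node fails."""
    failures = []
    for address in addresses:
        try:
            return send(address)
        except ConnectionError as exc:
            failures.append((address, exc))
    raise ConnectionError(f"all {len(failures)} nodes failed")


def fake_send(address):
    """Hypothetical transport: only the third node is reachable."""
    if address != "10.0.0.3":
        raise ConnectionError(address)
    return f"rows from {address}"


result = query_with_failover(["10.0.0.1", "10.0.0.2", "10.0.0.3"], fake_send)
```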

Optionally, the target virtual data node may have an exception before executing the data query, or may have an exception during the execution of the data query. For the above two situations, the method further includes:

obtaining a virtual data node group corresponding to the target virtual data node;

calling the virtual data node in the virtual data node group to receive the heartbeat packet of the target virtual data node;

Parse and detect whether the heartbeat packet is in an abnormal state;

When the detection result is that the heartbeat packet is in an abnormal state, other virtual data nodes are determined from the virtual data node group for performing data query.

When it is determined from the heartbeat packet that the target virtual data node is abnormal before executing the data query, another virtual data node is directly and randomly selected from the virtual data node group to execute the data query. When it is determined from the heartbeat packet that the target virtual data node has become abnormal while executing the data query, the heartbeat packet, which carries the data information already queried by the target virtual data node, is parsed, and another virtual data node is randomly selected from the virtual data node group to perform the remaining data query work. Carrying too much queried data information in the heartbeat packet can slow down its transmission; the following embodiments address this problem.

In one embodiment, the method further includes: acquiring data information queried by the target virtual data node; compressing the data information to a preset size; and storing the compressed data information in a heartbeat packet. The preset size is a preset compression amount size. By compressing the queried data information, the amount of data information carried in the heartbeat packet can be reduced, and the transmission rate of the heartbeat packet can be improved.
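As a hedged illustration of this embodiment, the sketch below compresses the already-queried rows before they are placed in a heartbeat packet. The JSON encoding, the use of `zlib`, the limit `MAX_PAYLOAD_BYTES`, and the function names are assumptions of the sketch. Because lossless compression cannot guarantee an exact output size, the "preset size" is interpreted here as an upper bound that is checked after compression.

```python
import json
import zlib

MAX_PAYLOAD_BYTES = 1024  # hypothetical value for the "preset size"


def pack_queried_data(rows, max_bytes=MAX_PAYLOAD_BYTES):
    """Compress already-queried rows for inclusion in a heartbeat packet."""
    raw = json.dumps(rows, separators=(",", ":")).encode("utf-8")
    packed = zlib.compress(raw, 9)  # 9 = strongest compression level
    if len(packed) > max_bytes:
        raise ValueError(
            f"compressed payload is {len(packed)} bytes, limit is {max_bytes}")
    return packed


def unpack_queried_data(packed):
    """Restore the rows on the node that takes over the remaining query."""
    return json.loads(zlib.decompress(packed).decode("utf-8"))


rows = [{"ts": i, "value": 20.5} for i in range(50)]
payload = pack_queried_data(rows)
restored = unpack_queried_data(payload)
```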

In another embodiment, the method further includes: acquiring data information queried by the target virtual data node; constructing a data link for the queried data information; and storing the data link in the heartbeat packet. Methods of constructing a data link are known in the prior art, and details are not described here. By establishing a data link for the queried data information, the amount of data information carried in the heartbeat packet can be reduced, and the transmission rate of the heartbeat packet can be improved. By carrying in the heartbeat packet the data information already queried by the target virtual data node in which the abnormality occurs, repeated execution of data query work can be avoided, and the efficiency of information processing can be improved.

S14. Acquire the first data version of the target virtual data node and the second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node.

In at least one embodiment of the present application, when a host is restarted, each virtual data node checks whether the version of its own data is consistent with that of the other virtual data nodes in the corresponding node group. If the data versions are inconsistent, the node must synchronize before it can provide external services. During operation, data may also fall out of synchronization for various reasons; such a discrepancy is discovered when a forwarded write request is received. Once it is discovered, the virtual data node with the lower data version immediately stops external services and enters the synchronization process, and resumes external services only after synchronization is complete. During the synchronization process, nodes with higher data versions can continue to provide services normally.
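A minimal sketch of this synchronization behavior is given below for illustration only. The `VirtualDataNode` class, the integer version numbers, and the copy-from-newest strategy are assumptions; the application does not describe the synchronization protocol itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VirtualDataNode:
    name: str
    version: int                       # assumed: larger means newer
    data: Dict[str, str] = field(default_factory=dict)
    serving: bool = True


def synchronize_group(nodes: List[VirtualDataNode]) -> int:
    """Bring every lower-version node up to the highest version in the group."""
    newest = max(nodes, key=lambda n: n.version)
    for node in nodes:
        if node.version < newest.version:
            node.serving = False             # stop external service first
            node.data = dict(newest.data)    # copy data from the newest peer
            node.version = newest.version
            node.serving = True              # resume only after sync is done
    return newest.version


nodes = [
    VirtualDataNode("vnode-1", 3, {"k": "new"}),
    VirtualDataNode("vnode-2", 2, {"k": "old"}),
]
group_version = synchronize_group(nodes)
```

Nodes that already hold the highest version are never taken out of service in this sketch, consistent with the statement that they continue to provide services during synchronization.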

Optionally, the data version identifies the freshness of the data stored in the target virtual data node: the higher the data version, the newer the corresponding stored data. The data version can therefore be used to determine whether the stored data is the latest version. The number of second data versions of the virtual data nodes in the node group of the target virtual data node may be one or more. When there are multiple second data versions, the method further includes:

obtaining the number of second data versions of each virtual data node in the node group of the target virtual data node;

detecting whether the number of the second data versions exceeds one;

When the detection result is that the number of the second data versions exceeds one, the release time of each second data version is acquired, and the second data version with the latest release time is selected as the latest version of the aggregate table data.
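The selection steps above can be illustrated with the following sketch. The `DataVersion` record, its field names, and the example dates are hypothetical and are used only to show the comparison by release time.

```python
from datetime import datetime
from typing import List, NamedTuple


class DataVersion(NamedTuple):
    label: str             # hypothetical version identifier
    released_at: datetime  # release time of this version


def pick_latest_version(second_versions: List[DataVersion]) -> DataVersion:
    """Return the second data version with the latest release time."""
    if not second_versions:
        raise ValueError("the node group reported no data version")
    distinct = set(second_versions)
    if len(distinct) == 1:
        # Only one second data version exists, so no comparison is needed.
        return second_versions[0]
    return max(distinct, key=lambda v: v.released_at)


reported = [
    DataVersion("v7", datetime(2021, 2, 20, 8, 0)),
    DataVersion("v8", datetime(2021, 2, 25, 8, 0)),
    DataVersion("v8", datetime(2021, 2, 25, 8, 0)),
]
latest = pick_latest_version(reported)
```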

S15. Detect whether the first data version is consistent with the second data version.

In at least one embodiment of the present application, whether the first data version is consistent with the second data version is detected to determine whether the data stored in the target virtual data node is the latest version. When the detection result is that the first data version is consistent with the second data version, it is determined that the data stored in the target virtual data node is the latest version, and the target virtual data node can continue to perform the data query operation. When the detection result is that the first data version is inconsistent with the second data version, it is determined that the data stored in the target virtual data node is not the latest version, and the latest version of the data needs to be acquired to update the data stored in the target virtual data node. While the data stored in the target virtual data node is being updated, the data query request may be allocated to another virtual data node in the node group that holds the latest version for execution. This ensures that the data query process is not affected when the data stored in the target virtual data node is not the latest version, which can improve the reliability and efficiency of information processing.
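One possible reading of this routing decision is sketched below. The function name `route_query`, the integer versions, and the rule of picking the first up-to-date peer by name are assumptions made for the illustration, not details from the application.

```python
from typing import Dict, Tuple


def route_query(target: str, versions: Dict[str, int]) -> Tuple[str, bool]:
    """Return (node that executes the query, whether the target must update).

    `versions` maps every node in the group, including the target, to its
    data version; integer versions are an assumption of this sketch.
    """
    latest = max(versions.values())
    if versions[target] == latest:
        # Versions are consistent: the target keeps the query.
        return target, False
    # Otherwise hand the query to a peer that already holds the latest data.
    up_to_date = sorted(n for n, v in versions.items() if v == latest)
    return up_to_date[0], True


group_versions = {"vnode-1": 4, "vnode-2": 5, "vnode-3": 5}
executor, target_needs_update = route_query("vnode-1", group_versions)
```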

S16. When the detection result is that the first data version is consistent with the second data version, call the target virtual data node to obtain node data, and aggregate the node data according to an aggregation rule to obtain target node data.

In at least one embodiment of the present application, when the detection result is that the first data version is consistent with the second data version, the target virtual data node is called to obtain node data, and the node data is aggregated according to an aggregation rule to obtain the target node data. Here, the node data refers to the aggregation table data, stored in the preset database, that is requested by the data query request.

In one embodiment, when the data query request requests at least two sets of aggregation table data, after the corresponding aggregation table data is acquired, it is processed according to the aggregation rule to obtain aggregated time series data. The aggregation rule may be obtained by parsing the aggregation condition carried in the data query request and structuring the result. The aggregation condition may be an average value, a maximum value, a minimum value, or the like, of the requested aggregation table data.
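For illustration, the sketch below applies an average, maximum, or minimum condition to two sets of aggregation table data. Representing each table as a mapping from timestamp to value and aligning the tables by timestamp are assumptions of this sketch, as are the condition keywords `"avg"`, `"max"`, and `"min"`.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical mapping from an aggregation condition to a Python function.
AGGREGATORS = {"avg": mean, "max": max, "min": min}


def aggregate_tables(tables: List[Dict[int, float]],
                     condition: str) -> Dict[int, float]:
    """Combine values that share a timestamp across the requested tables."""
    if condition not in AGGREGATORS:
        raise ValueError(f"unsupported aggregation condition: {condition}")
    func = AGGREGATORS[condition]
    merged: Dict[int, List[float]] = {}
    for table in tables:
        for timestamp, value in table.items():
            merged.setdefault(timestamp, []).append(value)
    return {ts: func(values) for ts, values in sorted(merged.items())}


table_a = {1: 10.0, 2: 20.0}
table_b = {1: 30.0, 2: 40.0}
averaged = aggregate_tables([table_a, table_b], "avg")
peaks = aggregate_tables([table_a, table_b], "max")
```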

In the big data-based information processing method provided by the embodiments of the present application, before data is collected at the several target data collection points, the same aggregation table structure is created for collection points with the same data collection type. This avoids the huge number of tables that results from building a separate table for each data collection point, which can reduce memory usage and improve information processing efficiency. When a data query request sent by an application is received, the virtual management node is called to parse the data query request and obtain the identifier of the table to be queried corresponding to the request; the target virtual data node is determined according to that identifier and executes the data query request, which can improve information query efficiency. In addition, the first data version of the target virtual data node is compared with the second data version of each virtual data node in the corresponding node group, so that it can be found in time whether the data version of the target virtual data node executing the task is the latest version, thereby ensuring the accuracy of data processing.
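The table-count saving mentioned above can be illustrated with a short sketch. The collection point identifiers, the type names, and the function `plan_aggregation_tables` are hypothetical; the sketch only shows that points sharing a data collection type map to one shared table.

```python
from collections import defaultdict
from typing import Dict, List


def plan_aggregation_tables(point_types: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each data collection type to the collection points sharing its table."""
    tables: Dict[str, List[str]] = defaultdict(list)
    for point_id, collection_type in point_types.items():
        tables[collection_type].append(point_id)
    return dict(tables)


points = {
    "meter-001": "electricity",
    "meter-002": "electricity",
    "sensor-001": "temperature",
    "sensor-002": "temperature",
}
plan = plan_aggregation_tables(points)
```

With these four example points, two shared tables are planned instead of four per-point tables.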

In some embodiments, the big data-based information processing apparatus 20 may include a plurality of functional modules composed of computer program segments. The computer program of each program segment in the big data-based information processing apparatus 20 can be stored in the memory of the computer device and executed by at least one processor to perform the big data-based information processing function (described in detail with reference to FIG. 1).

In this embodiment, the big data-based information processing apparatus 20 can be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: an aggregation table acquisition module 201, a data storage module 202, a request parsing module 203, a version obtaining module 204, a version detection module 205 and a data aggregation module 206. A module referred to in this application is a series of computer-readable instruction segments that can be executed by at least one processor, can perform fixed functions, and are stored in a memory. In this embodiment, the functions of each module will be described in detail in subsequent embodiments.

The aggregation table acquisition module 201 may be configured to obtain a target aggregation table structure corresponding to several target data collection points.

The data storage module 202 may be configured to perform aggregation processing on the data collected by the several target data collection points according to the target aggregation table structure to obtain target aggregation table data, and store the target aggregation table data in a preset database.

The request parsing module 203 may be configured to, when receiving a data query request, call the virtual management node to parse the data query request, obtain the identifier of the table to be queried corresponding to the data query request, and determine the target virtual data node according to the identifier of the table to be queried.

The version obtaining module 204 may be configured to obtain the first data version of the target virtual data node and the second data version of each virtual data node in the node group corresponding to the target virtual data node.

The version detection module 205 may be configured to detect whether the first data version is consistent with the second data version.

The data aggregation module 206 may be configured to call the target virtual data node to obtain node data when the detection result is that the first data version is consistent with the second data version, and aggregate the node data according to an aggregation rule to obtain the target node data.

Referring to FIG. 3 , it is a schematic structural diagram of a computer device according to Embodiment 3 of the present application. In a preferred embodiment of the present application, the computer device 3 includes a memory 31 , at least one processor 32 , at least one communication bus 33 and a transceiver 34 .

Those skilled in the art should understand that the structure of the computer device shown in FIG. 3 does not constitute a limitation of the embodiments of the present application; it may be a bus-type structure or a star-shaped structure, and the computer device 3 may include more or fewer other hardware or software components than shown, or a different arrangement of components.

In some embodiments, the computer device 3 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits, programmable gate arrays, digital processors, embedded devices, and the like. The computer device 3 may also include client equipment, including but not limited to any electronic product that can interact with a user through a keyboard, a mouse, a remote control, a touchpad, a voice-activated device, or the like, for example, personal computers, tablets, smartphones, and digital cameras.

It should be noted that the computer device 3 is only an example, and other existing or future electronic products, if applicable to the present application, should also be included within the protection scope of the present application, and are incorporated herein by reference.

In some embodiments, a computer program is stored in the memory 31, and when the computer program is executed by the at least one processor 32, all or part of the steps in the above-mentioned big data-based information processing method are implemented. Exemplarily, the computer program may be divided into one or more modules/units, and the one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, the instruction segments being used to describe the execution process of the computer program in the computer device. For example, each module described in FIG. 2 is a computer program stored in the memory 31 and executed by the at least one processor 32, thereby realizing the functions of the various modules to achieve the goal of information processing based on big data. The memory 31 includes read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable Read-Only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), one-time programmable read-only memory (One-time Programmable Read-Only Memory, OTPROM), electronically-erasable programmable read-only memory (Electronically-Erasable Programmable Read-Only Memory, EEPROM), compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.

Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function, and the like; the storage data area may store data created through the use of the node, and the like. The computer-readable storage medium may be non-volatile or volatile.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain, essentially a decentralized database, is a series of data blocks associated with one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block. The blockchain can include the underlying blockchain platform, the platform product service layer, and the application service layer.

In some embodiments, the at least one processor 32 is the control core (Control Unit) of the computer device 3. It uses various interfaces and lines to connect the components of the entire computer device 3, and performs the various functions of the computer device 3 and processes data by running or executing the programs or modules stored in the memory 31 and calling the data stored in the memory 31. For example, when the at least one processor 32 executes the computer program stored in the memory, all or part of the steps of the big data-based information processing method described in the embodiments of the present application are implemented, or all or part of the functions of the big data-based information processing apparatus are implemented. The at least one processor 32 may be composed of integrated circuits, for example, a single packaged integrated circuit, or multiple packaged integrated circuits with the same function or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like.

In some embodiments, the at least one communication bus 33 is configured to enable connection communication between the memory 31 and the at least one processor 32 and the like.

Although not shown, the computer device 3 may also include a power source (such as a battery) for supplying power to the various components. Preferably, the power source may be logically connected to the at least one processor 32 through a power management device, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management device. The power source may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other such components. The computer device 3 may also include a variety of sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described here.

The above-mentioned integrated units implemented in the form of software functional modules may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a computer device, a network device, or the like) or a processor to execute part of the methods described in the various embodiments of the present application.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative. For example, the division of the modules is only a logical function division, and there may be other division manners in actual implementation.

The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, and may be located in one place or distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

In addition, each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.

It will be apparent to those skilled in the art that the present application is not limited to the details of the above-described exemplary embodiments, and that the present application can be implemented in other specific forms without departing from the spirit or essential characteristics of the present application. Accordingly, the embodiments are to be regarded in all respects as illustrative and not restrictive, and the scope of the application is defined by the appended claims rather than by the foregoing description; all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be included in this application. Any reference signs in the claims shall not be construed as limiting the claim involved. Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices stated in the specification can also be implemented by one unit or device through software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present application without departing from the spirit and scope of the technical solutions of the present application.

Claims

A big data-based information processing method, wherein the big data-based information processing method comprises:

acquiring a target aggregation table structure corresponding to several target data collection points;

Perform aggregation processing on the data collected by several of the target data collection points according to the target aggregation table structure to obtain target aggregation table data, and store the target aggregation table data in a preset database;

When a data query request is received, the virtual management node is called to parse the data query request, to obtain an identifier of the table to be queried corresponding to the data query request, and to determine a target virtual data node according to the identifier of the table to be queried;

acquiring the first data version of the target virtual data node and the second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node;

detecting whether the first data version is consistent with the second data version;

When the detection result is that the first data version is consistent with the second data version, the target virtual data node is called to obtain node data, and the node data is aggregated according to an aggregation rule to obtain target node data.
The big data-based information processing method according to claim 1, wherein, before acquiring the target aggregation table structure corresponding to several target data collection points, the method further comprises:

Obtain the data collection type of each of the target data collection points;

Detecting whether the data collection types are consistent;

When the detection result is that the data collection types are consistent, determine the target aggregation table structure of the data collection types;

When the detection result is that the data collection types are inconsistent, a separate table structure is created for each of the data collection types.
The big data-based information processing method according to claim 1, wherein the acquiring a target aggregation table structure corresponding to several target data collection points comprises:

obtaining the data collection type of the target data collection point;

Analyzing the data collection type to obtain items to be collected and attribute information corresponding to each item to be collected;

Create a target aggregation table structure according to the item to be collected and the attribute information.
The big data-based information processing method according to claim 1 , wherein, according to the target aggregation table structure, performing aggregation processing on the data collected by a plurality of the target data collection points, and obtaining the target aggregation table data comprises:

acquiring the data collected by each of the target data collection points, and filling the data into the target aggregation table structure to obtain initial aggregation table data;

constructing a preset label corresponding to the identification information of the target data collection point, and adding the label to the initial aggregation table data;

aggregating the initial aggregation table data to obtain target aggregation table data.
The big data-based information processing method according to claim 1, wherein the method further comprises:

obtaining the remaining space value in the preset database;

monitoring whether the remaining space value satisfies a preset space critical value;

When the monitoring result is that the remaining space value meets the preset space critical value, select the target data in the preset database;

Migrate the target data to the hard disk.
The big data-based information processing method according to claim 1, wherein the method further comprises:

Detect whether the dominant virtual management node in the virtual management node group composed of virtual management nodes is abnormal;

When the detection result is that the dominant virtual management node is abnormal, acquire a node tree, and calculate the number of child nodes in each of the node trees;

Determine the node tree with the smallest number of child nodes as the target node tree;

The parent node corresponding to the target node tree is selected as the new dominant virtual management node.
The big data-based information processing method according to claim 1, wherein the method further comprises:

obtaining a virtual data node group corresponding to the target virtual data node;

calling the virtual data node in the virtual data node group to receive the heartbeat packet of the target virtual data node;

Parse and detect whether the heartbeat packet is in an abnormal state;

When the detection result is that the heartbeat packet is in an abnormal state, other virtual data nodes are determined from the virtual data node group for performing data query.
A big data-based information processing device, wherein the big data-based information processing device comprises:

The aggregation table acquisition module is used to acquire the target aggregation table structure corresponding to several target data collection points;

a data storage module, configured to perform aggregation processing on the data collected by a plurality of the target data collection points according to the target aggregation table structure to obtain target aggregation table data, and store the target aggregation table data in a preset database;

a request parsing module, configured to, when receiving a data query request, call the virtual management node to parse the data query request, obtain the identifier of the table to be queried corresponding to the data query request, and determine the target virtual data node according to the identifier of the table to be queried;

a version obtaining module, configured to obtain the first data version of the target virtual data node and the second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node;

a version detection module, configured to detect whether the first data version is consistent with the second data version;

a data aggregation module, configured to call the target virtual data node to obtain node data when the detection result is that the first data version is consistent with the second data version, and aggregate the node data according to an aggregation rule to obtain target node data.
A computer device, wherein the computer device includes a processor for executing computer-readable instructions stored in a memory to implement the following steps:

acquiring a target aggregation table structure corresponding to several target data collection points;

Perform aggregation processing on the data collected by several of the target data collection points according to the target aggregation table structure to obtain target aggregation table data, and store the target aggregation table data in a preset database;

When a data query request is received, the virtual management node is called to parse the data query request, to obtain an identifier of the table to be queried corresponding to the data query request, and to determine a target virtual data node according to the identifier of the table to be queried;

acquiring the first data version of the target virtual data node and the second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node;

detecting whether the first data version is consistent with the second data version;

When the detection result is that the first data version is consistent with the second data version, the target virtual data node is called to obtain node data, and the node data is aggregated according to an aggregation rule to obtain target node data.
The computer device according to claim 9, wherein before acquiring the target aggregation table structure corresponding to several target data collection points, the processor executes the computer-readable instructions to further implement the following steps:

Obtain the data collection type of each of the target data collection points;

Detecting whether the data collection types are consistent;

When the detection result is that the data collection types are consistent, determine the target aggregation table structure of the data collection types;

When the detection result is that the data collection types are inconsistent, a separate table structure is created for each of the data collection types.
The computer device according to claim 9, wherein, when the processor executes the computer-readable instructions to realize the obtaining of the target aggregation table structure corresponding to several target data collection points, the method comprises:

obtaining the data collection type of the target data collection point;

Analyzing the data collection type to obtain items to be collected and attribute information corresponding to each item to be collected;

Create a target aggregation table structure according to the item to be collected and the attribute information.
The computer device according to claim 9, wherein, when the processor executes the computer-readable instructions to implement the aggregation processing of the data collected by a plurality of the target data collection points according to the target aggregation table structure to obtain the target aggregation table data, the following steps are included:

acquiring the data collected by each of the target data collection points, and filling the data into the target aggregation table structure to obtain initial aggregation table data;

constructing a preset label corresponding to the identification information of the target data collection point, and adding the label to the initial aggregation table data;

aggregating the initial aggregation table data to obtain target aggregation table data.
The computer device of claim 9, wherein the processor executes the computer-readable instructions to further implement the following steps:

obtaining the remaining space value in the preset database;

monitoring whether the remaining space value satisfies a preset space critical value;

When the monitoring result is that the remaining space value meets the preset space critical value, select the target data in the preset database;

Migrate the target data to the hard disk.
The computer device of claim 9, wherein the processor executes the computer-readable instructions to further implement the following steps:

Detect whether the dominant virtual management node in the virtual management node group composed of virtual management nodes is abnormal;

When the detection result is that the dominant virtual management node is abnormal, acquire a node tree, and calculate the number of child nodes in each of the node trees;

Determine the node tree with the smallest number of child nodes as the target node tree;

The parent node corresponding to the target node tree is selected as the new dominant virtual management node.
The computer device of claim 9, wherein the processor executes the computer-readable instructions to further implement the following steps:

obtaining a virtual data node group corresponding to the target virtual data node;

calling the virtual data node in the virtual data node group to receive the heartbeat packet of the target virtual data node;

Parse and detect whether the heartbeat packet is in an abnormal state;

When the detection result is that the heartbeat packet is in an abnormal state, other virtual data nodes are determined from the virtual data node group for performing data query.
A computer-readable storage medium storing computer-readable instructions on the computer-readable storage medium, wherein the computer-readable instructions realize the following steps when executed by a processor:

acquiring a target aggregation table structure corresponding to several target data collection points;

Perform aggregation processing on the data collected by several of the target data collection points according to the target aggregation table structure to obtain target aggregation table data, and store the target aggregation table data in a preset database;

When a data query request is received, the virtual management node is called to parse the data query request, to obtain an identifier of the table to be queried corresponding to the data query request, and to determine a target virtual data node according to the identifier of the table to be queried;

acquiring the first data version of the target virtual data node and the second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node;

detecting whether the first data version is consistent with the second data version;

When the detection result is that the first data version is consistent with the second data version, the target virtual data node is called to obtain node data, and the node data is aggregated according to an aggregation rule to obtain target node data.
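The version check and query steps above can be sketched roughly as follows. This is a hedged illustration: the version numbers are modeled as integers, `read_node` stands in for calling the target virtual data node, and the aggregation rule defaults to a sum. All of these, including the function name, are assumptions, since the claim does not specify them.

```python
from typing import Callable, Dict, List, Optional


def query_if_consistent(
    target: str,
    group: List[str],
    versions: Dict[str, int],
    read_node: Callable[[str], List[float]],
    aggregate: Callable[[List[float]], float] = sum,
) -> Optional[float]:
    """Query the target node only if its data version matches the whole group."""
    first_version = versions[target]
    # Compare against the second data version of every node in the group.
    if any(versions[node] != first_version for node in group):
        return None  # this claim does not state what happens on a mismatch
    node_data = read_node(target)
    return aggregate(node_data)
```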
The computer-readable storage medium according to claim 16, wherein before acquiring the target clustered table structure corresponding to the several target data collection points, the computer-readable instructions, when executed by the processor, further implement the following steps:

Obtain the data collection type of each of the target data collection points;

Detecting whether the data collection types are consistent;

When the detection result is that the data collection types are consistent, determine the target clustered table structure corresponding to the data collection type;

When the detection result is that the data collection types are inconsistent, a separate table structure is created for each of the data collection types.
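One way to picture the consistency check above is to group collection points by their data collection type. In this sketch, which uses the hypothetical function name `plan_table_structures` and plain strings for point identifiers and types, a result with a single key corresponds to the consistent case, and several keys correspond to one separate table structure per type.

```python
from typing import Dict, List


def plan_table_structures(collection_types: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each data collection type to the collection points that share it."""
    plan: Dict[str, List[str]] = {}
    for point_id, collection_type in collection_types.items():
        # Points with the same type share one table structure.
        plan.setdefault(collection_type, []).append(point_id)
    return plan
```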
The computer-readable storage medium according to claim 16, wherein, when executed by the processor to acquire the target clustered table structure corresponding to the several target data collection points, the computer-readable instructions implement the following steps:

obtaining the data collection type of the target data collection point;

Analyzing the data collection type to obtain items to be collected and attribute information corresponding to each item to be collected;

Create the target clustered table structure according to the items to be collected and the attribute information.
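As a rough sketch of the structure-creation steps above, the example below derives a table definition from a data collection type. It assumes the attribute information of each item can be expressed as an SQL column type. The catalogue contents, the `_clustered` table suffix, and the function name are invented for illustration and are not taken from the application.

```python
from typing import Dict, List, Tuple

# Hypothetical catalogue: collection type -> (item to be collected, attribute info).
ITEM_CATALOGUE: Dict[str, List[Tuple[str, str]]] = {
    "traffic": [("vehicle_count", "INTEGER"), ("avg_speed_kmh", "REAL")],
}


def build_table_structure(collection_type: str) -> str:
    """Return a CREATE TABLE statement for the given data collection type."""
    items = ITEM_CATALOGUE[collection_type]
    # Each item becomes one column; its attribute info becomes the column type.
    columns = ", ".join(f"{name} {sql_type}" for name, sql_type in items)
    return f"CREATE TABLE {collection_type}_clustered ({columns})"
```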
The computer-readable storage medium according to claim 16, wherein, when executed by the processor to perform aggregation processing on the data collected by the several target data collection points according to the target clustered table structure to obtain the target clustered table data, the computer-readable instructions implement the following steps:

Acquire the data collected by each of the target data collection points, and fill the data into the target clustered table structure to obtain initial clustered table data;

Construct a preset label corresponding to the identification information of the target data collection point, and add the label to the initial clustered table data;

Aggregate and process the initial clustered table data to obtain target clustered table data.
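The fill, label, and aggregate steps above might look like the following sketch. It assumes rows are plain dictionaries and that aggregation simply merges the labeled rows of all collection points into one list. The `source_tag` column, the `point:` label format, and the function name are hypothetical.

```python
from typing import Dict, List


def build_clustered_rows(
    readings: Dict[str, List[Dict[str, float]]]
) -> List[Dict[str, object]]:
    """Fill rows per collection point, label each row, and merge them."""
    clustered: List[Dict[str, object]] = []
    for point_id, rows in readings.items():
        for row in rows:
            labeled: Dict[str, object] = dict(row)  # copy of the initial row
            # Label derived from the collection point's identification information.
            labeled["source_tag"] = f"point:{point_id}"
            clustered.append(labeled)
    return clustered
```

Keeping the label on every row means a later query can still tell which collection point a value came from after the data has been merged.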
The computer-readable storage medium of claim 16, wherein the computer-readable instructions are executed by the processor to further implement the steps of:

obtaining the remaining space value in the preset database;

monitoring whether the remaining space value satisfies a preset space critical value;

When the monitoring result is that the remaining space value satisfies the preset space critical value, select target data in the preset database;

Migrate the target data to the hard disk.
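A minimal sketch of the space monitoring and migration steps above, under several assumptions that the claim leaves open: "satisfies the preset space critical value" is read as the remaining space being at or below a threshold, the oldest entries (by insertion order) are chosen as the target data, and migration to the hard disk is represented only by returning the migrated keys. The function name and the size bookkeeping are hypothetical.

```python
from typing import Dict, List


def migrate_if_low_space(database: Dict[str, int], capacity: int,
                         threshold: int) -> List[str]:
    """Remove oldest entries until remaining space exceeds the threshold.

    `database` maps an entry key to its size; it is modified in place.
    Returns the keys that would be migrated to the hard disk.
    """
    remaining = capacity - sum(database.values())
    migrated: List[str] = []
    for key in list(database):  # dicts keep insertion order, oldest first
        if remaining > threshold:
            break
        remaining += database.pop(key)
        migrated.append(key)
    return migrated
```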