WO2022178976A1 - Information processing method and apparatus based on big data, and related devices - Google Patents

Information processing method and apparatus based on big data, and related devices

Info

Publication number
WO2022178976A1
WO2022178976A1 PCT/CN2021/090464 CN2021090464W WO2022178976A1 WO 2022178976 A1 WO2022178976 A1 WO 2022178976A1 CN 2021090464 W CN2021090464 W CN 2021090464W WO 2022178976 A1 WO2022178976 A1 WO 2022178976A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
node
virtual
aggregation
Prior art date
Application number
PCT/CN2021/090464
Other languages
French (fr)
Chinese (zh)
Inventor
刘耀晖
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022178976A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination

Definitions

  • the present application relates to the technical field of data processing, and in particular, to an information processing method, apparatus, computer equipment and medium based on big data.
  • a first aspect of the embodiments of the present application provides a big data-based information processing method, where the big data-based information processing method includes:
  • when a data query request is received, the virtual management node is called to parse the data query request to obtain an identifier of the table to be queried corresponding to the data query request, and a target virtual data node is determined according to the identifier of the table to be queried;
  • the target virtual data node is called to obtain node data, and the node data is aggregated according to an aggregation rule to obtain target node data.
  • a second aspect of the embodiments of the present application further provides an apparatus for information processing based on big data, where the apparatus for information processing based on big data includes:
  • the aggregation table acquisition module is used to acquire the target aggregation table structure corresponding to several target data collection points;
  • a data storage module configured to perform aggregation processing on the data collected by a plurality of the target data collection points according to the target aggregation table structure to obtain target aggregation table data, and store the target aggregation table data in a preset database;
  • the request parsing module is configured to, when receiving a data query request, call the virtual management node to parse the data query request, obtain the identifier of the table to be queried corresponding to the data query request, and determine the target virtual data node according to the identifier of the table to be queried;
  • a version obtaining module configured to obtain the first data version of the target virtual data node and the second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node;
  • a version detection module configured to detect whether the first data version is consistent with the second data version;
  • a data aggregation module configured to call the target virtual data node to obtain node data when the detection result is that the first data version is consistent with the second data version, and aggregate the node data according to an aggregation rule to obtain target node data.
  • a third aspect of the embodiments of the present application further provides a computer device, wherein the computer device includes a processor, and the processor is configured to execute computer-readable instructions stored in a memory to implement the following steps:
  • when a data query request is received, the virtual management node is called to parse the data query request to obtain an identifier of the table to be queried corresponding to the data query request, and a target virtual data node is determined according to the identifier of the table to be queried;
  • the target virtual data node is called to obtain node data, and the node data is aggregated according to an aggregation rule to obtain target node data.
  • a fourth aspect of the embodiments of the present application further provides a computer-readable storage medium, where computer-readable instructions are stored on the computer-readable storage medium, wherein when the computer-readable instructions are executed by a processor, the following steps are implemented:
  • when a data query request is received, the virtual management node is called to parse the data query request to obtain an identifier of the table to be queried corresponding to the data query request, and a target virtual data node is determined according to the identifier of the table to be queried;
  • the target virtual data node is called to obtain node data, and the node data is aggregated according to an aggregation rule to obtain target node data.
  • in the above-mentioned big data-based information processing method, big data-based information processing apparatus, computer device, and computer-readable storage medium provided in the embodiments of the present application, before data is collected at several target data collection points, the same aggregation table structure is created for collection points with the same data collection type. This avoids the huge number of tables caused by building a separate table for each data collection point, which can reduce memory usage and improve information processing efficiency. When a data query request sent by an application is received, the virtual management node is called to parse the data query request and obtain the identifier of the table to be queried corresponding to the data query request, the target virtual data node is determined according to the identifier of the table to be queried, and the target virtual data node executes the data query request, which can improve the efficiency of information query. In addition, the present application compares the first data version of the target virtual data node with the second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node, so that it can be found in time whether the data version of the target virtual data node that is performing the task is the latest version, thereby ensuring the accuracy of data processing.
  • the present application can be applied to various functional modules of smart cities such as smart government affairs and smart transportation, such as the big data-based information processing modules of smart government affairs, etc., which can promote the rapid development of smart cities.
  • FIG. 1 is a flowchart of a big data-based information processing method provided in Embodiment 1 of the present application.
  • FIG. 2 is a structural diagram of a big data-based information processing apparatus provided in Embodiment 2 of the present application.
  • FIG. 3 is a schematic structural diagram of a computer device provided in Embodiment 3 of the present application.
  • the big data-based information processing methods provided in the embodiments of the present application are executed by computer equipment, and correspondingly, the big data-based information processing apparatuses run in the computer equipment.
  • FIG. 1 is a flowchart of a big data-based information processing method according to the first embodiment of the present application.
  • the big data-based information processing method can be applied in a distributed architecture.
  • the big data-based information processing method may include the following steps. According to different requirements, the order of the steps in the flowchart may be changed, and some may be omitted.
  • the target data collection point may refer to a data collection terminal, for example, a terminal capable of collecting data such as a wristband, a sensor, and an electric meter, which is not limited herein.
  • the target data collection points may collect the same type of data, or may collect different types of data.
  • for example, when 300 wristbands A are used to collect the heart rate information of all employees on a certain floor of an office building, the 300 data collection points are all of the same type;
  • when 100 wristbands B are additionally used to collect the ECG information of a residential building, wristbands A and B are different types of collection points.
  • the method before acquiring the target aggregation table structure corresponding to several target data collection points, the method further includes:
  • the target aggregation table structure of the data collection types is determined.
  • the heart rate information collected by the 300 wristbands A can be stored in the same target aggregation table structure.
  • a separate table structure is created for each data collection type to store the time series data collected by the data collection points, and the table structure refers to an aggregation table structure that matches the format of the time series data collected by the data collection points.
  • for example, when 300 wristbands A are used to collect the heart rate information of all employees on a certain floor of an office building, and 100 wristbands B are used to collect the electrocardiogram information of a certain residential building, wristbands A and B are different types of collection points.
  • respective table structures A and B are created for the data collected by wristbands A and wristbands B, wherein all the data collected by wristbands A are stored in the same table structure A, and all the data collected by wristbands B are stored in the same table structure B.
  • by creating a table structure for each of the data collection types separately, data can be written in a lock-free manner, which avoids the heavy overhead caused by locking and greatly improves the speed at which data is written into the distributed architecture.
  • the aggregation table structures corresponding to different data collection types may or may not be the same.
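  • the one-table-structure-per-collection-type idea can be illustrated with a minimal Python sketch. The registry `TABLE_STRUCTURES`, the column names and the function `group_points_by_table` are hypothetical names introduced only for illustration; the application does not specify them.

```python
from collections import defaultdict

# Hypothetical registry: one shared table structure per data collection type.
# The column names are illustrative assumptions, not taken from the application.
TABLE_STRUCTURES = {
    "heart_rate": ("ts", "point_id", "bpm"),
    "ecg": ("ts", "point_id", "voltage_mv"),
}


def group_points_by_table(points):
    """Map each shared table structure to the collection points that use it."""
    tables = defaultdict(list)
    for point_id, collection_type in points:
        tables[TABLE_STRUCTURES[collection_type]].append(point_id)
    return dict(tables)


# 300 wristbands A (heart rate) and 100 wristbands B (ECG), as in the example above.
points = [(f"A{i}", "heart_rate") for i in range(300)]
points += [(f"B{i}", "ecg") for i in range(100)]
tables = group_points_by_table(points)
```

  • with 400 collection points the sketch ends up with two table structures rather than 400 separate tables, which is the saving the paragraph above describes.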
  • the acquiring the target aggregation table structure corresponding to several target data collection points may include:
  • the data collection type includes information such as collected data content and data attributes, and the attribute information may include data attributes such as data length or data type, which is not limited herein.
  • the items to be collected are arranged in the table structure according to a preset method. For example, the items to be collected can be arranged according to the importance of the data or the frequency with which the data is queried, and the corresponding attribute information is added to each item to be collected, so that the initial data collected for each item can be filtered to obtain data that meets the requirements.
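  • a minimal sketch of this arrangement and filtering step, assuming that items are ordered by query frequency and that the attribute information consists of a data type and a maximum length. The names `Item`, `build_table_structure` and `filter_record` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """An item to be collected plus its attribute information (assumed fields)."""
    name: str
    data_type: type
    max_length: int
    query_frequency: int


def build_table_structure(items):
    """Arrange items so that the most frequently queried ones come first."""
    return sorted(items, key=lambda item: item.query_frequency, reverse=True)


def filter_record(structure, record):
    """Keep only the values whose type and length match the attribute information."""
    kept = {}
    for item in structure:
        value = record.get(item.name)
        if isinstance(value, item.data_type) and len(str(value)) <= item.max_length:
            kept[item.name] = value
    return kept


items = [
    Item("ts", int, 10, 5),
    Item("bpm", int, 3, 9),
    Item("note", str, 16, 1),
]
structure = build_table_structure(items)
clean = filter_record(structure, {"ts": 1700000000, "bpm": 72, "note": "x" * 50})
```

  • here the over-long `note` value is dropped while `ts` and `bpm` pass the filter.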
  • the target aggregation table structure can be stored in the distributed architecture in the form of snapshots, which can avoid problems such as storage errors of the target aggregation table structure and improve data storage reliability.
  • when invoking a known target aggregation table structure, the method further includes:
  • a mapping relationship is maintained between the data collection type and the target aggregation table structure snapshot, and by querying the mapping relationship, the target aggregation table structure snapshot corresponding to the data collection type can be determined. It can be understood that when an update of the target aggregation table structure corresponding to a certain data collection type is detected, the snapshot of the target aggregation table structure in the distributed architecture can be directly replaced, which can improve the data update rate and thus the information processing rate.
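  • the mapping from data collection type to table structure snapshot can be sketched as follows. `SnapshotRegistry` is a hypothetical in-memory stand-in for the distributed storage; the sketch only shows that a lookup is a mapping query and that an update is a whole-snapshot replacement.

```python
import copy


class SnapshotRegistry:
    """Hypothetical mapping: data collection type -> table structure snapshot."""

    def __init__(self):
        self._snapshots = {}

    def put(self, collection_type, structure):
        # Store an independent copy; an update simply replaces the old snapshot.
        self._snapshots[collection_type] = copy.deepcopy(structure)

    def get(self, collection_type):
        # Return a copy so callers cannot modify the stored snapshot in place.
        return copy.deepcopy(self._snapshots[collection_type])


registry = SnapshotRegistry()
live_structure = ["ts", "bpm"]
registry.put("heart_rate", live_structure)
live_structure.append("scratch")          # later edits do not touch the snapshot
before_update = registry.get("heart_rate")
registry.put("heart_rate", ["ts", "bpm", "spo2"])  # direct replacement on update
after_update = registry.get("heart_rate")
```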
  • the data collected at each target data collection point is added to the target aggregation table structure to obtain initial aggregation table data, and the initial aggregation table data obtained from the several target data collection points is labeled and then aggregated to obtain the target aggregation table data.
  • performing aggregation processing on the data collected by several of the target data collection points according to the target aggregation table structure, and obtaining the target aggregation table data may include:
  • the identification information of the target data collection point refers to an identification used to identify the characteristics of the collection point, and the identification information may be an ID identification or a code identification.
  • the preset label is used to identify the initial aggregation table data, and according to the preset label, it can be determined to which data collection point the initial aggregation table data belongs.
  • the preset label may be a number label, a color label or a letter label.
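  • labeling and aggregating can be sketched as below, assuming the label is simply the identification of the collection point and that aggregation means merging the labeled rows into one time-ordered table. `tag_rows` and `aggregate` are hypothetical names.

```python
def tag_rows(point_id, rows):
    """Attach a label to every row so its collection point can be recovered later."""
    return [{"tag": point_id, **row} for row in rows]


def aggregate(tagged_by_point):
    """Merge the labeled rows of all collection points into one table, ordered by time."""
    merged = [row for rows in tagged_by_point for row in rows]
    return sorted(merged, key=lambda row: (row["ts"], row["tag"]))


rows_a1 = tag_rows("A1", [{"ts": 2, "bpm": 71}, {"ts": 1, "bpm": 70}])
rows_a2 = tag_rows("A2", [{"ts": 1, "bpm": 65}])
target_table = aggregate([rows_a1, rows_a2])
```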
  • the method further includes: detecting whether abnormal data exists in the target aggregation table data.
  • the abnormal data may include data with a null value or data that falls far outside a preset reasonable range. Due to network or collection point problems, the collection of a certain item to be collected may fail, so that no data is collected for the item and its value is empty; alternatively, the data collected for a certain item may deviate from the normal value and fall far outside the preset reasonable range.
  • the preset reasonable range refers to a preset range value.
  • the data volume of the abnormal data is determined; whether the data volume exceeds the preset data volume threshold range is detected; when the detection result is that the data volume exceeds the preset data volume threshold range, the historical time series data corresponding to the abnormal data is obtained, a reasonable value is fitted according to the historical time series data, and the abnormal data is replaced with the reasonable value; when the detection result is that the data volume does not exceed the preset data volume threshold range, the abnormal data is set to empty.
  • the preset data volume threshold is a preset value.
  • the fitting of the reasonable value according to the historical time series data may be by using a pre-trained reasonable value estimation model to process the historical time series data to obtain the reasonable value.
  • abnormal data is detected before the target aggregation table data is stored in the preset database, and the abnormal data is processed in time, so as to ensure that the data stored in the preset database is always correct, thereby improving the accuracy of information processing.
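  • the abnormal-data branch can be sketched as follows. The application fits the reasonable value with a pre-trained estimation model; this sketch substitutes the mean of the historical values purely as a placeholder, and `is_abnormal` and `repair` are hypothetical names.

```python
from statistics import mean


def is_abnormal(value, low, high):
    """A value is abnormal if it is empty or outside the preset reasonable range."""
    return value is None or not (low <= value <= high)


def repair(values, history, low, high, volume_threshold):
    """Replace abnormal values following the two branches described above."""
    abnormal_idx = [i for i, v in enumerate(values) if is_abnormal(v, low, high)]
    if len(abnormal_idx) > volume_threshold:
        # Placeholder for the pre-trained model: fit from historical data.
        replacement = mean(history)
    else:
        # Below the threshold the abnormal data is simply set to empty.
        replacement = None
    repaired = list(values)
    for i in abnormal_idx:
        repaired[i] = replacement
    return repaired


readings = [70, None, 400, 72]
fitted = repair(readings, history=[68, 70, 72], low=30, high=220, volume_threshold=1)
emptied = repair(readings, history=[68, 70, 72], low=30, high=220, volume_threshold=2)
```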
  • the target aggregation table data is stored in a preset database, and the preset database may be a memory in a distributed architecture.
  • the row storage mode is adopted, a skip list is used to build the index, and the memory is managed in a first-in, first-out manner.
  • column storage is used for persistence, and the physical structure is continuous in blocks, which improves the compression rate and reading speed. Each data block is pre-computed to improve the speed of data analysis.
  • the data volume of the target aggregation table data stored in the distributed architecture increases as the data collected by the data collection points increases, and when the data volume of the target aggregation table data is large, the method further includes:
  • the memory management of the present application adopts a first-in, first-out queue management to ensure that the newly collected data is in the memory.
  • the target data refers to the portion of data in the preset database that exceeds a preset space threshold and was collected earliest.
  • migrating the target data to the hard disk for storage may include: determining the load information of the transmission channel and the data volume information of the target data to be transmitted; determining an optimal value for a single transmission according to the load information and the data volume information; and migrating the target data in batches according to the optimal value for a single transmission.
  • the optimal value for a single transmission can be calculated by a pre-trained optimal value determination model according to the load information and the data volume information, and the optimal value for a single transmission refers to a value that can ensure fast data transmission.
  • the training process of the optimal value determination model is in the prior art and will not be repeated here.
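  • batch migration driven by a single-transmission value can be sketched as follows. The application obtains that value from a pre-trained model; the spare-capacity heuristic in `single_transfer_size` is only an assumed placeholder, and all names and parameters are hypothetical.

```python
def single_transfer_size(channel_load, pending_records, max_records_per_transfer):
    """Placeholder for the pre-trained model: use the spare share of the channel.

    channel_load is a fraction in [0, 1]; the result is a record count >= 1.
    """
    spare = int(max_records_per_transfer * (1.0 - channel_load))
    return max(1, min(spare, pending_records))


def migrate_in_batches(records, batch_size, write_batch):
    """Hand the records to write_batch in consecutive slices of batch_size."""
    for start in range(0, len(records), batch_size):
        write_batch(records[start:start + batch_size])


batch_size = single_transfer_size(0.75, pending_records=1000, max_records_per_transfer=400)
written = []
migrate_in_batches(list(range(10)), batch_size=4, write_batch=written.append)
```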
  • the target data is written to the hard disk by appending to a log, which can improve the speed of writing to disk.
  • data is stored on different physical media according to the degree of freshness, for example, new data is stored in memory, and old data is stored in large-capacity slow hard disks, which greatly reduces the random read consumption of hard disks and improves write and query efficiency.
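  • the first-in, first-out tiering between memory and disk can be sketched with a double-ended queue. `TieredStore` is a hypothetical class, and the Python list `disk_log` stands in for an append-only file on the hard disk.

```python
from collections import deque


class TieredStore:
    """Keep the newest records in memory; spill the oldest to an append-only log."""

    def __init__(self, memory_limit):
        self.memory = deque()
        self.disk_log = []  # stand-in for an append-only file on a slow disk
        self.memory_limit = memory_limit

    def write(self, record):
        self.memory.append(record)
        # First in, first out: the earliest-collected records leave memory first.
        while len(self.memory) > self.memory_limit:
            self.disk_log.append(self.memory.popleft())


store = TieredStore(memory_limit=3)
for reading in range(5):
    store.write(reading)
```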
  • the virtual management node is invoked to parse the data query request to obtain an identifier of the table to be queried corresponding to the data query request, and a target virtual data node is determined according to the identifier of the table to be queried.
  • the data query request may be a request sent by an application to query aggregate table data in the preset database.
  • the data query request carries the identifier of the table to be queried, and the identifier of the table to be queried includes information blocks such as the name or ID of the collection point, the start and end time of data collection, and several query items, and the query items correspond to the items to be collected.
  • the identifier of the table to be queried corresponds to the aggregation table data in the preset database, and the target aggregation table data can be obtained by traversing the preset database according to the identifier of the table to be queried.
  • the preset database includes several data nodes; each data node is a running instance on a physical machine, in a virtual machine, or in a container, and a working system has at least one data node.
  • each data node includes several virtual data nodes and at most one virtual management node.
  • the virtual management node is responsible for collecting the running states of all nodes, for load balancing, and for metadata management. When the application needs to query a table, it connects to the management node to find out which data node the table is located on.
  • the virtual data node is responsible for storing specific time series data, and query operations for the time series data are all performed on the virtual data node, and virtual data nodes located on different physical machines can form a virtual data node group.
  • the virtual management node is used for storing metadata, and at the same time performing load balancing according to the state of each virtual data node.
  • the metadata may include the start time of data collection, the number of data points, and the compression algorithm. Since the amount of metadata is not large, it is stored entirely in memory to ensure efficient query operations.
  • the driver saves the necessary metadata locally, and accesses the virtual management node only when the required metadata does not exist or is invalid, which improves system performance.
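  • the driver-side metadata cache can be sketched as follows. `MetadataCache` and the callback `fetch_from_management_node` are hypothetical; the sketch only shows that the management node is contacted when an entry is missing or has been invalidated.

```python
class MetadataCache:
    """Local cache in the driver; falls back to the management node on a miss."""

    def __init__(self, fetch_from_management_node):
        self._fetch = fetch_from_management_node
        self._cache = {}

    def get(self, table_id):
        entry = self._cache.get(table_id)
        if entry is None or not entry["valid"]:
            # Missing or invalid: this is the only path that reaches the management node.
            entry = {"meta": self._fetch(table_id), "valid": True}
            self._cache[table_id] = entry
        return entry["meta"]

    def invalidate(self, table_id):
        if table_id in self._cache:
            self._cache[table_id]["valid"] = False


remote_calls = []


def fake_management_node(table_id):
    # Test double standing in for a network call to the virtual management node.
    remote_calls.append(table_id)
    return {"table": table_id, "vnode": "vnode-1"}


cache = MetadataCache(fake_management_node)
first = cache.get("t1")
second = cache.get("t1")      # served locally
cache.invalidate("t1")
third = cache.get("t1")       # fetched again after invalidation
```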
  • the invoking the virtual management node to parse the data query request to obtain an identifier of the table to be queried corresponding to the data query request, and determining the target virtual data node according to the identifier of the table to be queried may include:
  • the virtual data node corresponding to the target metadata is determined as the target virtual data node.
  • the data in the virtual data node group can be synchronized through asynchronous replication to achieve eventual consistency of the data and to ensure that a piece of data has copies on multiple physical machines.
  • the virtual data nodes on the computer can process query requests to ensure high reliability of system operation.
  • multiple virtual management nodes may form a virtual management node group.
  • the number of virtual management nodes in the virtual management node group may be determined according to the number of virtual data nodes.
  • determining the number of the virtual management nodes may include: acquiring a first number of virtual data nodes; traversing the preset quantitative relationship between the virtual data nodes and the virtual management nodes according to the first number, and obtaining a second number of the virtual management nodes corresponding to the first number; constructing a node tree between the virtual management nodes and the virtual data nodes.
  • the virtual management node is a parent node, and the virtual data node managed by it is a child node.
  • the Master-Slave synchronous replication mode is used to realize the data synchronization of the virtual management nodes.
  • the virtual management nodes include a dominant virtual management node (also called the Master node) and several subordinate virtual management nodes (also called Slave nodes). The Master node is a task scheduler that assigns computing tasks to multiple Slave nodes; when all Slave nodes have completed their tasks, the Master node aggregates the results.
  • the Master node returns success only after the Slave nodes have written successfully, thereby ensuring strong data consistency. If the Master node goes down, the system has a mechanism to ensure that one of the Slave nodes is immediately elected as the Master, thus ensuring the high reliability of system write operations.
  • the method further includes:
  • the parent node corresponding to the target node tree is selected as the new dominant virtual management node.
  • detecting whether the dominant virtual management node is abnormal means detecting whether the dominant virtual management node is down.
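  • a minimal sketch of synchronous Master-Slave writes with failover. The election rule used here (promote the first Slave that is still alive) is an assumption made for illustration; the application selects the new dominant node from the node tree, and `ManagementNode` and `ManagementGroup` are hypothetical names.

```python
class ManagementNode:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.metadata = {}

    def write(self, key, value):
        if not self.alive:
            return False
        self.metadata[key] = value
        return True


class ManagementGroup:
    def __init__(self, nodes):
        self.master = nodes[0]
        self.slaves = list(nodes[1:])

    def write(self, key, value):
        # Synchronous replication: success only if the Master and every Slave wrote.
        # A real system would also roll back partial writes; this sketch does not.
        if not self.master.write(key, value):
            return False
        return all(slave.write(key, value) for slave in self.slaves)

    def fail_over(self):
        # Assumed election rule: promote the first Slave that is still alive.
        live = [slave for slave in self.slaves if slave.alive]
        if not live:
            raise RuntimeError("no live management node to promote")
        self.master = live[0]
        self.slaves = live[1:]


nodes = [ManagementNode("m0"), ManagementNode("m1"), ManagementNode("m2")]
group = ManagementGroup(nodes)
ok_before = group.write("t1", "vnode-3")
nodes[0].alive = False                    # the Master goes down
failed = group.write("t2", "vnode-5")
group.fail_over()
ok_after = group.write("t2", "vnode-5")
```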
  • the virtual data nodes learn each other's status through heartbeat packets. If a virtual data node receives a data write request, the request is immediately forwarded to the other virtual data nodes and then stored locally.
  • when the application wants to operate on any aggregation table data, the system provides the application with the IP addresses of each virtual data node in the virtual data node group to which the table belongs. If the connection to one of them fails or the operation fails, the application tries the second and then the third, and failure is returned only if all nodes fail. This ensures that the failure of any machine in the virtual data node group does not affect external services.
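  • the client-side retry order can be sketched as follows. `execute_with_failover` is a hypothetical helper, the IP addresses are made-up examples, and `ConnectionError` stands in for whatever failure the real driver would report.

```python
def execute_with_failover(addresses, operation):
    """Try each node of the group in order; fail only if every node fails."""
    failures = []
    for address in addresses:
        try:
            return operation(address)
        except ConnectionError as exc:
            failures.append((address, exc))
    raise ConnectionError(f"all {len(failures)} nodes in the group failed")


attempts = []


def query(address):
    # Test double: only the third node of the group is reachable.
    attempts.append(address)
    if address != "10.0.0.3":
        raise ConnectionError(address)
    return "rows from " + address


result = execute_with_failover(["10.0.0.1", "10.0.0.2", "10.0.0.3"], query)
```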
  • the target virtual data node may have an exception before executing the data query, or may have an exception during the execution of the data query.
  • the method further includes:
  • when it is determined by detecting the heartbeat packet that the target virtual data node is abnormal before executing the data query, another virtual data node is randomly selected directly from the virtual data node group to execute the data query.
  • when it is determined that the target virtual data node becomes abnormal in the process of executing the data query, the heartbeat packet, which carries the data information already queried by the target virtual data node, is parsed, and another virtual data node is randomly selected from the virtual data node group to perform the remaining data query work.
  • in order to avoid the problem that the transmission rate of the heartbeat packet is slow because it carries too much queried data information, the method further includes: acquiring data information queried by the target virtual data node; compressing the data information to a preset size; and storing the compressed data information in the heartbeat packet.
  • the preset size is a preset compression amount size.
  • the method further includes: acquiring data information queried by the target virtual data node; constructing a data link for the queried data information; and storing the data link in the heartbeat packet.
  • the method of constructing the data link is in the prior art, and details are not described here.
  • when a host is restarted, each virtual data node checks whether the version of its own data is consistent with that of the other virtual data nodes in the corresponding node group; if the data versions are inconsistent, the node must synchronize before it can provide external services. During operation, data can fall out of synchronization for various reasons, and this is discovered when a forwarded write request is received. Once it is discovered, the virtual data node with the lower data version immediately stops external services and enters the synchronization process, and resumes external services only after synchronization is complete. During the synchronization process, nodes with higher data versions can provide services normally.
  • the data version is used to identify the freshness of the data stored in the target virtual data node.
  • the number of the second data versions of the virtual data node in the node group of the target virtual data node may be one, or may be multiple. When the number of the second data versions is multiple, the method further includes:
  • the release time of each second data version is acquired, and the second data version with the latest release time is selected as the latest version of the aggregate table data.
  • it is detected whether the first data version is consistent with the second data version to determine whether the data stored in the target virtual data node is the latest version.
  • when the detection result is that the first data version is consistent with the second data version, it is determined that the data stored in the target virtual data node is the latest version, and the target virtual data node can continue to perform data query operations;
  • when the detection result is that the first data version is inconsistent with the second data version, it is determined that the data stored in the target virtual data node is not the latest version, and the data of the latest version needs to be acquired to update the data stored in the target virtual data node.
  • the data query request may be allocated to other virtual data nodes in the node group that hold the latest version for execution, so that the data query process is not affected when the data stored in the target virtual data node is not the latest version, which can improve the reliability and efficiency of information processing.
  • when the detection result is that the first data version is consistent with the second data version, the target virtual data node is called to obtain node data, and the node data is aggregated according to an aggregation rule to obtain the target node data.
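  • the version check and rerouting can be sketched as follows, assuming each data version is an integer where a larger number is newer (the application instead picks the latest version by release time). `pick_query_node` is a hypothetical name.

```python
def pick_query_node(target, group_versions):
    """Return the node that should run the query.

    group_versions maps node name -> data version (assumed: larger is newer).
    The target keeps the query only if it already holds the latest version.
    """
    latest = max(group_versions.values())
    if group_versions[target] == latest:
        return target
    # Otherwise hand the query to a group member that holds the latest version.
    return next(name for name, version in sorted(group_versions.items())
                if version == latest)


consistent = pick_query_node("v1", {"v1": 7, "v2": 7, "v3": 7})
rerouted = pick_query_node("v1", {"v1": 6, "v2": 7, "v3": 7})
```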
  • the node data refers to the aggregation table data stored in the preset database that is requested by the data query request.
  • the aggregation rule may be obtained by parsing the aggregation condition carried in the data query request and converting it into a structured form.
  • the aggregation condition may be, for example, computing the average value, the maximum value, or the minimum value of the requested aggregation table data.
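  • applying an aggregation condition to node data can be sketched as follows. The condition keywords `avg`, `max` and `min` and the function `aggregate_node_data` are hypothetical; empty values are skipped as an assumed convention.

```python
from statistics import mean

# Assumed mapping from an aggregation condition keyword to a function.
AGGREGATORS = {"avg": mean, "max": max, "min": min}


def aggregate_node_data(rows, field, condition):
    """Apply the requested aggregation to one field, ignoring empty values."""
    values = [row[field] for row in rows if row.get(field) is not None]
    return AGGREGATORS[condition](values)


node_data = [{"ts": 1, "bpm": 60}, {"ts": 2, "bpm": 80}, {"ts": 3, "bpm": None}]
average = aggregate_node_data(node_data, "bpm", "avg")
highest = aggregate_node_data(node_data, "bpm", "max")
lowest = aggregate_node_data(node_data, "bpm", "min")
```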
  • in the big data-based information processing method provided by this embodiment, before data is collected at several target data collection points, the same aggregation table structure is created for collection points with the same data collection type, which avoids the huge number of tables caused by building a separate table for each data collection point, and can reduce memory usage and improve information processing efficiency;
  • when a data query request sent by an application is received, the virtual management node is called to parse the data query request and obtain the identifier of the table to be queried corresponding to the data query request, the target virtual data node is determined according to the identifier of the table to be queried, and the target virtual data node executes the data query request, which can improve the information query efficiency;
  • in addition, the first data version of the target virtual data node is compared with the second data version of each virtual data node in the node group corresponding to the target virtual data node, so that it can be found in time whether the data version of the target virtual data node that is executing the task is the latest version, thereby ensuring the accuracy of data processing.
  • FIG. 2 is a structural diagram of a big data-based information processing apparatus provided in Embodiment 2 of the present application.
  • the big data-based information processing apparatus 20 may include a plurality of functional modules composed of computer program segments.
  • the computer program of each program segment in the big data-based information processing apparatus 20 can be stored in the memory of the computer device and executed by at least one processor to perform the big data-based information processing function (see the description of FIG. 1 for details).
  • the big data-based information processing apparatus 20 can be divided into a plurality of functional modules according to the functions it performs.
  • the functional modules may include: an aggregation table acquisition module 201, a data storage module 202, a request parsing module 203, a version obtaining module 204, a version detection module 205 and a data aggregation module 206.
  • a module referred to in this application refers to a series of computer-readable instruction segments that can be executed by at least one processor and can perform fixed functions, and are stored in a memory. In this embodiment, the functions of each module will be described in detail in subsequent embodiments.
  • the aggregation table obtaining module 201 may be configured to obtain a target aggregation table structure corresponding to several target data collection points.
  • the data storage module 202 can be configured to perform aggregation processing on the data collected by several of the target data collection points according to the target aggregation table structure to obtain target aggregation table data, and store the target aggregation table data in a preset database. middle.
  • the request parsing module 203 may be configured to, when receiving a data query request, call the virtual management node to parse the data query request, obtain the identifier of the table to be queried corresponding to the data query request, and determine the target virtual data node according to the identifier of the table to be queried.
  • the version obtaining module 204 may be configured to obtain the first data version of the target virtual data node and the second data version of each virtual data node in the node group corresponding to the target virtual data node.
  • the version detection module 205 may be configured to detect whether the first data version is consistent with the second data version.
  • the data aggregation module 206 may be configured to call the target virtual data node to obtain node data when the detection result is that the first data version is consistent with the second data version, and aggregate the node data according to an aggregation rule to obtain the target node data.
  • the computer device 3 includes a memory 31 , at least one processor 32 , at least one communication bus 33 and a transceiver 34 .
  • the structure of the computer device shown in FIG. 3 does not constitute a limitation of the embodiments of the present application; it may be a bus-type structure or a star-shaped structure, and the computer device 3 may include more or fewer hardware or software components than shown, or a different arrangement of components.
  • the computer device 3 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits, programmable gate arrays, digital processors and embedded devices.
  • the computer device 3 may also include client equipment, including but not limited to any electronic product that can interact with a user through a keyboard, a mouse, a remote control, a touchpad or a voice-activated device, for example personal computers, tablets, smartphones and digital cameras.
  • a computer program is stored in the memory 31, and when the computer program is executed by the at least one processor 32, all or part of the steps in the above-mentioned big data-based information processing method are implemented.
  • the computer program may be divided into one or more modules/units, and the one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, the instruction segments being used to describe the execution process of the computer program in the computer device.
  • each module described in FIG. 2 is a computer program stored in the memory 31 and executed by the at least one processor 32, so as to realize the functions of the modules and achieve the purpose of big data-based information processing.
  • the memory 31 includes read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
  • the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function, and the like; the storage data area may store data created through use of the node, and the like.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the at least one processor 32 is the control core (control unit) of the computer device 3; it connects the components of the entire computer device 3 through various interfaces and lines, and performs the functions of the computer device 3 and processes data by running or executing the programs or modules stored in the memory 31 and calling the data stored in the memory 31.
  • when the at least one processor 32 executes the computer program stored in the memory, all or part of the steps of the big data-based information processing method described in the embodiments of the present application are implemented, or all or part of the functions of the big data-based information processing apparatus are implemented.
  • the at least one processor 32 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.
  • the at least one communication bus 33 is configured to enable connection communication between the memory 31 and the at least one processor 32 and the like.
  • the computer device 3 may also include a power source (such as a battery) for supplying power to various components.
  • the power source may be logically connected to the at least one processor 32 through a power management device, so that charging, discharging and power consumption management are handled by the power management device.
  • the power source may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components.
  • the computer device 3 may also include a variety of sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • the above-mentioned integrated units implemented in the form of software functional modules may be stored in a computer-readable storage medium.
  • the above-mentioned software function modules are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a computer device, a network device, or the like) or a processor to execute part of the methods described in the various embodiments of the present application.
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, and may be located in one place or distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to data processing technology. Provided are an information processing method and apparatus based on big data, and a computer device and a storage medium. The method comprises: acquiring a target clustered table structure corresponding to several target data collection points; according to the target clustered table structure, performing aggregation processing on data collected at the several target data collection points, so as to obtain target clustered table data; calling a virtual management node to parse a data query request, so as to obtain a table identifier to be queried, and determining a target virtual data node according to the table identifier to be queried; acquiring a first data version of the target virtual data node and a second data version of a virtual data node in a corresponding node group; detecting whether the first data version is consistent with the second data version; and when the detection result is 'yes', calling the target virtual data node to obtain node data, and aggregating the node data according to an aggregation rule, so as to obtain target node data. By means of the present application, the efficiency of information processing can be improved, and the rapid development of smart cities can be promoted.

Description

Information processing method and apparatus based on big data, and related devices
This application claims priority to Chinese patent application No. 202110219983.8, filed with the China Patent Office on February 26, 2021 and entitled "Information processing method, apparatus and related devices based on big data", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of data processing, and in particular to a big data-based information processing method, apparatus, computer device and medium.
Background
With the rapid growth of the mobile Internet and the Internet of Things, and the emergence of all kinds of sensors and smart devices, mobile phones, computers, wristbands, shared bicycles, taxis, electricity meters, environmental monitoring equipment, large-scale equipment, industrial production lines and the like are continuously generating massive amounts of real-time data and sending it to the cloud. This data can help enterprises monitor the operation of their business and equipment in real time, generate reports, forecast and issue early warnings through big data analysis and machine learning, make well-founded decisions, save costs and create new value.
In the course of implementing the present application, the inventor found the following technical problems in the prior art: because the number of data records is huge, real-time writing of data becomes a bottleneck and data processing is extremely slow. Traditional relational databases, NoSQL databases and stream computing engines do not make full use of the characteristics of time-series spatial big data, so their performance gains are very limited; they can only rely on a cluster architecture and invest more computing and storage resources, which greatly increases enterprise costs.
Therefore, it is necessary to provide a big data-based information processing method that can improve the efficiency of information processing.
Summary of the Invention
In view of the above, it is necessary to provide a big data-based information processing method, a big data-based information processing apparatus, a computer device and a medium that can improve the efficiency of information processing.
A first aspect of the embodiments of the present application provides a big data-based information processing method, the method including:
acquiring a target aggregation table structure corresponding to several target data collection points;
performing aggregation processing on the data collected by the several target data collection points according to the target aggregation table structure to obtain target aggregation table data, and storing the target aggregation table data in a preset database;
when a data query request is received, calling a virtual management node to parse the data query request to obtain the identifier of the table to be queried corresponding to the data query request, and determining a target virtual data node according to the identifier of the table to be queried;
acquiring a first data version of the target virtual data node and a second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node;
detecting whether the first data version is consistent with the second data version;
when the detection result is that the first data version is consistent with the second data version, calling the target virtual data node to obtain node data, and aggregating the node data according to an aggregation rule to obtain target node data.
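The request-parsing step above can be illustrated with a minimal Python sketch. The source does not specify the request format or how the virtual management node stores its routing information, so the class names, the `{"table": ...}` request shape and the in-memory routing dictionary below are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualDataNode:
    """Hypothetical stand-in for a virtual data node holding some tables."""
    node_id: str
    tables: dict = field(default_factory=dict)  # table identifier -> list of rows


class VirtualManagementNode:
    """Hypothetical management node that maps table identifiers to data nodes."""

    def __init__(self):
        self._routing = {}  # table identifier -> VirtualDataNode

    def register(self, table_id, node):
        self._routing[table_id] = node

    def parse_request(self, request):
        # Assumed request shape: {"table": "<identifier of the table to be queried>"}
        return request["table"]

    def resolve(self, request):
        """Parse the request and return (table identifier, target virtual data node)."""
        table_id = self.parse_request(request)
        if table_id not in self._routing:
            raise KeyError(f"no virtual data node registered for table {table_id!r}")
        return table_id, self._routing[table_id]


manager = VirtualManagementNode()
node_a = VirtualDataNode("vnode-1", {"heart_rate": [{"device": "A-001", "bpm": 72}]})
manager.register("heart_rate", node_a)

table_id, target = manager.resolve({"table": "heart_rate"})
```

Keeping the routing table in the management node means the caller never needs to know which data node holds a given table; it only supplies the table identifier.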
A second aspect of the embodiments of the present application further provides a big data-based information processing apparatus, the apparatus including:
an aggregation table obtaining module, configured to acquire a target aggregation table structure corresponding to several target data collection points;
a data storage module, configured to perform aggregation processing on the data collected by the several target data collection points according to the target aggregation table structure to obtain target aggregation table data, and store the target aggregation table data in a preset database;
a request parsing module, configured to, when a data query request is received, call a virtual management node to parse the data query request to obtain the identifier of the table to be queried corresponding to the data query request, and determine a target virtual data node according to the identifier of the table to be queried;
a version obtaining module, configured to acquire a first data version of the target virtual data node and a second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node;
a version detection module, configured to detect whether the first data version is consistent with the second data version;
a data aggregation module, configured to, when the detection result is that the first data version is consistent with the second data version, call the target virtual data node to obtain node data, and aggregate the node data according to an aggregation rule to obtain target node data.
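How the version obtaining, version detection and data aggregation modules fit together can be sketched as follows. This is only an illustration under stated assumptions: nodes are modelled as plain dictionaries with `version` and `rows` keys, the aggregation rule is passed in as a function, and returning `None` for the inconsistent case is a placeholder, since this passage only defines what happens when the versions are consistent.

```python
def versions_consistent(first_version, second_versions):
    """Return True when every node in the group reports the target's version."""
    return all(v == first_version for v in second_versions)


def query_node_data(target, group, aggregate):
    """Aggregate the target node's rows only if the group agrees on the version.

    `target` and each entry of `group` are dicts with 'version' and 'rows'
    keys (an assumed shape, not taken from the source).
    """
    first_version = target["version"]
    second_versions = [node["version"] for node in group]
    if not versions_consistent(first_version, second_versions):
        # Placeholder: the inconsistent case is not defined in this passage.
        return None
    return aggregate(target["rows"])


target = {"version": 7, "rows": [72, 80, 76]}
group_ok = [{"version": 7, "rows": []}, {"version": 7, "rows": []}]
group_stale = [{"version": 7, "rows": []}, {"version": 8, "rows": []}]


def mean(rows):
    # Example aggregation rule: arithmetic mean of the readings.
    return sum(rows) / len(rows)


result_ok = query_node_data(target, group_ok, mean)
result_stale = query_node_data(target, group_stale, mean)
```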
A third aspect of the embodiments of the present application further provides a computer device, wherein the computer device includes a processor, and the processor is configured to execute computer-readable instructions stored in a memory to implement the following steps:
acquiring a target aggregation table structure corresponding to several target data collection points;
performing aggregation processing on the data collected by the several target data collection points according to the target aggregation table structure to obtain target aggregation table data, and storing the target aggregation table data in a preset database;
when a data query request is received, calling a virtual management node to parse the data query request to obtain the identifier of the table to be queried corresponding to the data query request, and determining a target virtual data node according to the identifier of the table to be queried;
acquiring a first data version of the target virtual data node and a second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node;
detecting whether the first data version is consistent with the second data version;
when the detection result is that the first data version is consistent with the second data version, calling the target virtual data node to obtain node data, and aggregating the node data according to an aggregation rule to obtain target node data.
A fourth aspect of the embodiments of the present application further provides a computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
acquiring a target aggregation table structure corresponding to several target data collection points;
performing aggregation processing on the data collected by the several target data collection points according to the target aggregation table structure to obtain target aggregation table data, and storing the target aggregation table data in a preset database;
when a data query request is received, calling a virtual management node to parse the data query request to obtain the identifier of the table to be queried corresponding to the data query request, and determining a target virtual data node according to the identifier of the table to be queried;
acquiring a first data version of the target virtual data node and a second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node;
detecting whether the first data version is consistent with the second data version;
when the detection result is that the first data version is consistent with the second data version, calling the target virtual data node to obtain node data, and aggregating the node data according to an aggregation rule to obtain target node data.
With the big data-based information processing method, big data-based information processing apparatus, computer device and computer-readable storage medium provided in the embodiments of the present application, before several target data collection points collect data, the same aggregation table structure is created for collection points of the same data collection type. This avoids the huge number of tables that results from creating a separate table for each data collection point, reduces memory usage and improves information processing efficiency. In addition, when a data query request sent by an application is received, the virtual management node is called to parse the data query request to obtain the identifier of the table to be queried, the target virtual data node is determined according to that identifier, and the target virtual data node executes the data query request, which improves query efficiency. Furthermore, the first data version of the target virtual data node is compared with the second data version of each virtual data node in the corresponding virtual data node group, so that it can be found in time whether the data version of the target virtual data node executing the task is the latest version, thereby ensuring the accuracy of data processing. The present application can be applied to functional modules of smart cities such as smart government affairs and smart transportation, for example a big data-based information processing module for smart government affairs, and can promote the rapid development of smart cities.
Description of Drawings
FIG. 1 is a flowchart of a big data-based information processing method provided in Embodiment 1 of the present application.
FIG. 2 is a structural diagram of a big data-based information processing apparatus provided in Embodiment 2 of the present application.
FIG. 3 is a schematic structural diagram of a computer device provided in Embodiment 3 of the present application.
The following detailed description further illustrates the present application with reference to the above drawings.
Detailed Description
In order that the above objects, features and advantages of the present application may be understood more clearly, the present application is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other.
Many specific details are set forth in the following description to facilitate a full understanding of the present application. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which the present application belongs. The terms used in the specification of the present application are for the purpose of describing specific embodiments only and are not intended to limit the present application.
The big data-based information processing method provided in the embodiments of the present application is executed by a computer device, and correspondingly, the big data-based information processing apparatus runs in the computer device.
FIG. 1 is a flowchart of the big data-based information processing method according to the first embodiment of the present application. The method can be applied in a distributed architecture. As shown in FIG. 1, the method may include the following steps; depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.
S11. Acquire a target aggregation table structure corresponding to several target data collection points.
In at least one embodiment of the present application, a target data collection point may be a data collection terminal, for example a wristband, a sensor, an electricity meter or another terminal capable of collecting data, which is not limited here. The target data collection points may collect the same type of data or different types of data. For example, when 300 wristbands are used to collect the heart rate information of all employees on a floor of an office building, the 300 data collection points are all of the same type; when 300 wristbands A are used to collect the heart rate information of all employees on a floor of an office building and 100 wristbands B are used to collect electrocardiogram information in a residential building, wristbands A and wristbands B are collection points of different types.
Optionally, before acquiring the target aggregation table structure corresponding to several target data collection points, the method further includes:
acquiring the data collection type of each target data collection point;
detecting whether the data collection types are consistent;
when the detection result is that the data collection types are consistent, determining the target aggregation table structure for the data collection type;
when the detection result is that the data collection types are inconsistent, creating a separate table structure for each data collection type.
When the detection result is that the data collection types are consistent, the target aggregation table structure for that data collection type is determined. For example, when 300 wristbands A are used to collect the heart rate information of all employees on a floor of an office building, the heart rate information collected by the 300 wristbands A can all be stored in the same target aggregation table structure. When the detection result is that the data collection types are inconsistent, a separate table structure is created for each data collection type to store the time-series data collected by the corresponding data collection points; the table structure is an aggregation table structure that matches the format of the time-series data collected by those data collection points. For example, when 300 wristbands A are used to collect the heart rate information of all employees on a floor of an office building and 100 wristbands B are used to collect electrocardiogram information in a residential building, wristbands A and wristbands B are collection points of different types, so table structure A and table structure B are created for the data collected by wristbands A and wristbands B respectively; all data collected by wristbands A is stored in table structure A, and all data collected by wristbands B is stored in table structure B. By creating a separate table structure for each data collection type, the present application can write data in a lock-free manner, avoiding the heavy overhead of locking and greatly increasing the speed at which data is written into the distributed architecture.
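The wristband example above can be sketched in a few lines of Python. The point identifiers and the type labels `heart_rate` and `ecg` are hypothetical names chosen for illustration; the sketch only shows that grouping by data collection type yields one table structure per type rather than one table per device.

```python
from collections import defaultdict


def group_points_by_type(points):
    """Group collection points by data collection type.

    `points` is an iterable of (point_id, collection_type) pairs
    (an assumed shape, not taken from the source).
    """
    groups = defaultdict(list)
    for point_id, collection_type in points:
        groups[collection_type].append(point_id)
    return dict(groups)


# 300 wristbands A collecting heart rate, 100 wristbands B collecting ECG data.
points = [(f"A-{i:03d}", "heart_rate") for i in range(300)]
points += [(f"B-{i:03d}", "ecg") for i in range(100)]

tables = group_points_by_type(points)
# One table structure per collection type, instead of 400 per-device tables.
table_count = len(tables)
```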
In an embodiment, the aggregation table structures corresponding to different data collection types may or may not be the same. Optionally, acquiring the target aggregation table structure corresponding to several target data collection points may include:
acquiring the data collection type of the target data collection points;
parsing the data collection type to obtain the items to be collected and the attribute information corresponding to each item to be collected;
creating the target aggregation table structure according to the items to be collected and the attribute information.
The data collection type includes information such as the content of the collected data and its data attributes, and the attribute information may include attributes such as data length or data type, which is not limited here. The items to be collected are arranged in the table structure in a preset manner, for example by the importance of the data or by how frequently the data is queried, and the corresponding attribute information is attached to each item to be collected so that the initial data collected for each item can be filtered to obtain data that meets the requirements.
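A minimal sketch of building a table structure from items and their attributes, and of using those attributes to filter raw data, might look like this. The `Column` class, the `query_frequency` ordering key and the sample rows are hypothetical; the source only states that items may be ordered by importance or query frequency and that attributes such as data type or length are used for filtering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    """One item to be collected, with its attribute information."""
    name: str
    data_type: type
    max_length: Optional[int] = None  # only meaningful for sized values such as str
    query_frequency: int = 0          # hypothetical ordering key


def build_table_structure(items):
    """Order the items so that the most frequently queried come first."""
    return sorted(items, key=lambda c: c.query_frequency, reverse=True)


def row_is_valid(structure, row):
    """Check a raw row against each column's type and length attributes."""
    for col in structure:
        value = row.get(col.name)
        if not isinstance(value, col.data_type):
            return False
        if col.max_length is not None and len(value) > col.max_length:
            return False
    return True


structure = build_table_structure([
    Column("device_id", str, max_length=8, query_frequency=5),
    Column("bpm", int, query_frequency=9),
])
column_order = [c.name for c in structure]

raw = [
    {"device_id": "A-001", "bpm": 72},
    {"device_id": "A-002", "bpm": "n/a"},    # rejected: wrong data type
    {"device_id": "A-0000003", "bpm": 75},   # rejected: identifier too long
]
valid_rows = [r for r in raw if row_is_valid(structure, r)]
```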
In an embodiment, the target aggregation table structure can be stored in the distributed architecture in the form of a snapshot, which avoids problems such as errors in storing the target aggregation table structure and improves the reliability of data storage.
可选地,对于调用已知的目标聚表结构时,所述方法还包括:Optionally, when invoking a known target aggregation table structure, the method further includes:
获取所述目标数据采集点的数据采集类型;obtaining the data collection type of the target data collection point;
根据所述数据采集类型确定目标聚表结构快照;Determine the snapshot of the target cluster table structure according to the data collection type;
扫描所述目标聚表结构快照,得到目标聚表结构。Scan the snapshot of the target cluster table structure to obtain the target cluster table structure.
其中,所述数据采集类型与所述目标聚表结构快照间存在映射关系,通过查询所述映射关系,能够确定对应所述数据采集类型的目标聚表结构快照。可以理解的是,当检测到某一数据采集类型对应的目标聚表结构存在更新时,可直接替换分布式架构中目标聚表结构快照即可,能够提高数据更新速率,进而提高信息处理速率。There is a mapping relationship between the data collection type and the target cluster table structure snapshot, and by querying the mapping relationship, the target cluster table structure snapshot corresponding to the data collection type can be determined. It can be understood that when an update of the target cluster table structure corresponding to a certain data collection type is detected, the snapshot of the target cluster table structure in the distributed architecture can be directly replaced, which can improve the data update rate and thus the information processing rate.
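A minimal sketch of the mapping between data collection types and structure snapshots is given below. The `SnapshotStore` class is a hypothetical in-memory stand-in for the distributed architecture, and the column names are illustrative only.

```python
import copy


class SnapshotStore:
    """In-memory stand-in for snapshots kept in the distributed architecture."""

    def __init__(self):
        self._snapshots = {}  # data collection type -> structure snapshot

    def put(self, collection_type, structure):
        # Updating a structure is a plain replacement of its snapshot.
        self._snapshots[collection_type] = copy.deepcopy(structure)

    def load(self, collection_type):
        # Return a copy so callers cannot corrupt the stored snapshot.
        return copy.deepcopy(self._snapshots[collection_type])


store = SnapshotStore()
store.put("smart_meter", ["ts", "voltage"])
store.put("smart_meter", ["ts", "voltage", "current"])  # structure updated
```

Because a snapshot is replaced as a whole, every collection point of that type sees the new structure on its next lookup.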
S12. Aggregate the data collected by the several target data collection points according to the target aggregation table structure to obtain target aggregation table data, and store the target aggregation table data in a preset database.
In at least one embodiment of the present application, the data collected by each target data collection point is added to the target aggregation table structure to obtain initial aggregation table data, and the initial aggregation table data obtained from the several target data collection points is labeled and then aggregated to obtain the target aggregation table data.
Optionally, aggregating the data collected by the several target data collection points according to the target aggregation table structure to obtain the target aggregation table data may include:
acquiring the data collected by each target data collection point, and filling the data into the target aggregation table structure to obtain initial aggregation table data;
constructing a preset label corresponding to the identification information of the target data collection point, and adding the label to the initial aggregation table data;
aggregating the initial aggregation table data to obtain the target aggregation table data.
Here, the identification information of a target data collection point is an identifier that characterizes the collection point, and may be an ID or a code. The preset label identifies the initial aggregation table data, so that the data collection point to which the initial aggregation table data belongs can be determined from the label. The preset label may be a numeric label, a color label, or a letter label.
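The fill, label, and aggregate steps might look like the sketch below. It assumes that "aggregating" means merging the labeled rows of all collection points into one table; the function name `build_target_table` and the sample data are hypothetical.

```python
def fill(structure, raw):
    """Fill collected data into the structure; items not collected become None."""
    return {column: raw.get(column) for column in structure}


def build_target_table(structure, collected):
    """Label each point's rows with the point ID, then merge them into one table."""
    target = []
    for point_id, rows in collected.items():
        for raw in rows:
            row = fill(structure, raw)          # initial aggregation table data
            target.append({"tag": point_id, **row})
    return target


structure = ["ts", "voltage"]
collected = {
    "meter-01": [{"ts": 1, "voltage": 220.1}],
    "meter-02": [{"ts": 1, "voltage": 219.8, "noise": 3}],  # extra field is dropped
}
target = build_target_table(structure, collected)
```

The `tag` field is what later allows a row in the shared table to be traced back to its collection point.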
In an embodiment, after the target aggregation table data is obtained, the method further includes detecting whether the target aggregation table data contains abnormal data. Abnormal data may include data whose value is empty or data that falls far outside a preset reasonable range. Because of network or collection point problems, a collection point may fail to collect an item, leaving that item's value empty, or it may collect a value that deviates from the normal value and falls far outside the preset reasonable range. The preset reasonable range is a range of values set in advance. When the detection result is that the target aggregation table data contains abnormal data, the data volume of the abnormal data is determined, and whether that data volume exceeds a preset data volume threshold is detected. When the data volume exceeds the preset data volume threshold, the historical time series data corresponding to the abnormal data is acquired, a reasonable value is fitted from the historical time series data, and the abnormal data is replaced with the reasonable value. When the data volume does not exceed the preset data volume threshold, the abnormal data is set to empty. The preset data volume threshold is a value set in advance. Fitting a reasonable value from the historical time series data may consist of processing the historical time series data with a pre-trained reasonable value estimation model. The training process of the reasonable value estimation model is prior art and is not described here. By detecting abnormal data before the target aggregation table data is stored in the preset database and handling it promptly, the present application helps ensure that the data stored in the preset database is correct, thereby improving the accuracy of information processing.
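The branching described above can be illustrated as follows. The pre-trained estimation model is not specified in the text, so the mean of the historical series is used here purely as a stand-in; `repair`, `max_abnormal`, and the range bounds are names assumed for this sketch.

```python
from statistics import mean


def is_abnormal(value, low, high):
    """A value is abnormal if it is empty or outside the preset reasonable range."""
    return value is None or not (low <= value <= high)


def repair(values, history, low, high, max_abnormal):
    abnormal = [i for i, v in enumerate(values) if is_abnormal(v, low, high)]
    if len(abnormal) > max_abnormal:
        # Above the threshold: replace with a value fitted from history.
        # mean() is only a placeholder for the pre-trained estimation model.
        replacement = mean(history)
    else:
        # At or below the threshold: set the abnormal data to empty.
        replacement = None
    repaired = list(values)
    for i in abnormal:
        repaired[i] = replacement
    return repaired


many = repair([220.0, None, 900.0], history=[219.0, 221.0], low=200, high=240, max_abnormal=1)
few = repair([220.0, 900.0], history=[219.0, 221.0], low=200, high=240, max_abnormal=1)
```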
In at least one embodiment of the present application, after the target aggregation table data is obtained, it is stored in a preset database, which may be memory in the distributed architecture. To reduce memory overhead and handle out-of-order timestamps effectively, data in memory is stored in row format, indexed with a skip list, and managed first in, first out. To make full use of the characteristics of time series data, persistence uses columnar storage with physically contiguous blocks, which improves the compression ratio and read speed, and each data block is precomputed to speed up data analysis.
Optionally, the volume of target aggregation table data stored in the distributed architecture grows as the data collection points collect more data. When the volume of target aggregation table data is large, the method further includes:
acquiring the remaining space value of the preset database;
monitoring whether the remaining space value reaches a preset space critical value;
when the monitoring result is that the remaining space value reaches the preset space critical value, selecting target data in the preset database;
migrating the target data to a hard disk for storage.
Here, memory in the present application is managed as a first-in, first-out queue, which keeps newly collected data in memory. The target data is a certain amount of data in the preset database that exceeds the preset space critical value and has the earliest collection time. Optionally, migrating the target data to the hard disk may include: determining the load information of the transmission channel and the data volume information of the target data to be transmitted; determining an optimal single-transfer size according to the load information and the data volume information; and migrating the target data in batches of that size. The optimal single-transfer size may be calculated from the load information and the data volume information by a pre-trained optimal value determination model; it is the size that allows the data to be transferred quickly. The training process of the optimal value determination model is prior art and is not repeated here. The target data is written to the hard disk by appending to a log, which increases the speed at which data is flushed to disk.
With the above method, data is stored on different physical media according to its age: for example, new data is kept in memory and old data on a large-capacity, slower hard disk. This greatly reduces the cost of random reads on the hard disk and improves write and query efficiency.
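The memory-to-disk tiering can be sketched as below. The `TieredStore` class is hypothetical; a Python list stands in for the append-only log on disk, and the fixed `batch_size` stands in for the single-transfer size that the text obtains from a pre-trained model.

```python
from collections import deque


class TieredStore:
    def __init__(self, capacity, critical_free, batch_size):
        self.memory = deque()       # FIFO queue: newest rows stay in memory
        self.disk_log = []          # stand-in for an append-only log on disk
        self.capacity = capacity
        self.critical_free = critical_free
        self.batch_size = batch_size  # placeholder for the model-derived size

    def write(self, row):
        self.memory.append(row)
        remaining = self.capacity - len(self.memory)
        if remaining <= self.critical_free:
            self._migrate_oldest()

    def _migrate_oldest(self):
        # Move one batch of the earliest-collected rows to disk.
        for _ in range(min(self.batch_size, len(self.memory))):
            self.disk_log.append(self.memory.popleft())


tiered = TieredStore(capacity=4, critical_free=1, batch_size=2)
for row in range(5):
    tiered.write(row)
```

After the five writes, the four oldest rows have been appended to the log in collection order and only the newest row remains in memory.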
S13. When a data query request is received, invoke a virtual management node to parse the data query request to obtain the identifier of the table to be queried corresponding to the data query request, and determine a target virtual data node according to the identifier of the table to be queried.
In at least one embodiment of the present application, the data query request may be a request issued by an application to query aggregation table data in the preset database. The data query request carries the identifier of the table to be queried, which includes information blocks such as the name or ID of the collection point, the start and end times of data collection, and several query items, where the query items correspond to the items to be collected. The identifier of the table to be queried corresponds to aggregation table data in the preset database, and the target aggregation table data can be obtained by traversing the preset database according to the identifier. The preset database includes several data nodes. A data node is a running instance on a physical machine, virtual machine, or container, and a working system has at least one data node. A data node contains several virtual data nodes and at most one virtual management node.
Here, the virtual management node is responsible for collecting the running status of all nodes, for load balancing, and for metadata management. When an application needs to query a table, it connects to the management node to find out which data node holds the table. The virtual data node is responsible for storing the actual time series data, and all query operations on time series data are performed on virtual data nodes. Virtual data nodes located on different physical machines may form a virtual data node group.
The virtual management node stores metadata and balances load according to the state of each virtual data node. The metadata may include the start time of data collection, the number of data points, the compression algorithm, and the like. Because the volume of metadata is small, it is kept entirely in memory to keep query operations efficient. On the application side, to avoid accessing the virtual management node on every data operation, the driver stores the necessary metadata locally and accesses the virtual management node only when the required metadata is missing or invalid, which improves system performance.
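The driver-side metadata cache described above might be sketched like this. `CachedMetadataClient` and the `fetch` callback are assumptions made for illustration; the callback stands in for a round trip to the virtual management node.

```python
class CachedMetadataClient:
    """Driver-side cache: contact the management node only on a miss."""

    def __init__(self, fetch_from_management_node):
        self._fetch = fetch_from_management_node
        self._cache = {}

    def lookup(self, table):
        if table not in self._cache:          # metadata missing locally
            self._cache[table] = self._fetch(table)
        return self._cache[table]

    def invalidate(self, table):
        # Called when cached metadata is found to be stale.
        self._cache.pop(table, None)


calls = []


def fetch(table):
    calls.append(table)                       # record each management node access
    return {"vnode": "vnode-1"}


client = CachedMetadataClient(fetch)
client.lookup("t1")
client.lookup("t1")                           # served from the local cache
```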
Optionally, invoking the virtual management node to parse the data query request to obtain the identifier of the table to be queried, and determining the target virtual data node according to that identifier, may include:
parsing the identifier of the table to be queried to obtain target information blocks;
traversing the metadata according to the target information blocks to obtain target metadata;
determining the virtual data node corresponding to the target metadata as the target virtual data node.
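As an illustration of these steps, the sketch below matches the information blocks of a table identifier against metadata entries. The field names (`point`, `start`, `end`, `vnode`) and the overlap test on time ranges are assumptions; the text does not define the metadata layout.

```python
def find_target_vnode(table_id, metadata):
    """Return the virtual data node whose metadata matches the query, or None."""
    point, start, end = table_id["point"], table_id["start"], table_id["end"]
    for entry in metadata:
        same_point = entry["point"] == point
        overlaps = entry["start"] <= end and start <= entry["end"]
        if same_point and overlaps:
            return entry["vnode"]
    return None


metadata = [
    {"point": "meter-01", "start": 0, "end": 99, "vnode": "vnode-1"},
    {"point": "meter-01", "start": 100, "end": 199, "vnode": "vnode-2"},
]
table_id = {"point": "meter-01", "start": 120, "end": 150, "items": ["voltage"]}
target_vnode = find_target_vnode(table_id, metadata)
```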
The data in a virtual data node group may be synchronized by asynchronous replication to achieve eventual consistency, ensuring that each piece of data has copies on multiple physical machines. Even if one physical machine goes down, a virtual data node on another physical machine can still handle query requests, which keeps the system highly reliable. Multiple virtual management nodes may form a virtual management node group, and the number of virtual management nodes in the group may be determined from the number of virtual data nodes. Optionally, determining the number of virtual management nodes may include: acquiring a first number, which is the number of virtual data nodes; looking up, according to the first number, a preset quantitative relationship between virtual data nodes and virtual management nodes to obtain a second number, which is the number of virtual management nodes corresponding to the first number; and constructing node trees relating the virtual management nodes to the virtual data nodes. In a node tree, the virtual management node is the parent node, and the virtual data nodes it manages are the child nodes.
In an embodiment, data synchronization among virtual management nodes uses a Master-Slave synchronous replication mode. In this mode, the virtual management nodes include one leading virtual management node (also called the Master node) and several subordinate virtual management nodes (also called Slave nodes). The Master node is the task scheduler and assigns computing tasks to the Slave nodes; once all Slave nodes have completed their tasks, the Master node gathers the results. For a write operation, the Master node returns success only after the Slave nodes have written successfully, which guarantees strong data consistency. If the Master node goes down, the system has a mechanism to ensure that one of the Slave nodes is immediately elected as the new Master, which keeps write operations highly reliable.
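The write rule of the synchronous replication mode, in which success is reported only after the Slave nodes have written, can be sketched as below. The `Replica` class is hypothetical, and the sketch deliberately omits rollback of Slaves that wrote before another Slave failed.

```python
class Replica:
    """Hypothetical management node replica with an in-memory log."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.log = []

    def write(self, record):
        if not self.healthy:
            return False
        self.log.append(record)
        return True


def replicated_write(master, slaves, record):
    """Report success only if every Slave, and then the Master, has written."""
    for slave in slaves:
        if not slave.write(record):
            return False          # simplification: no rollback of earlier Slaves
    return master.write(record)


master, slave_a, slave_b = Replica(), Replica(), Replica()
ok = replicated_write(master, [slave_a, slave_b], "meta-1")

broken = Replica(healthy=False)
failed = replicated_write(master, [broken], "meta-2")
```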
Optionally, the method further includes:
detecting whether the leading virtual management node in the virtual management node group composed of virtual management nodes is abnormal;
when the detection result is that the leading virtual management node is abnormal, acquiring the node trees and calculating the number of child nodes in each node tree;
determining the node tree with the fewest child nodes as the target node tree;
selecting the parent node of the target node tree as the new leading virtual management node. Here, detecting whether the leading virtual management node is abnormal means detecting whether it is down.
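The selection rule above can be written in a few lines. Representing the node trees as a dictionary from parent to child list, and excluding the failed Master from the candidates, are assumptions of this sketch.

```python
def elect_new_master(node_trees, failed_master):
    """Pick the parent of the node tree with the fewest child nodes."""
    candidates = {
        parent: children
        for parent, children in node_trees.items()
        if parent != failed_master          # assumption: the failed node is skipped
    }
    return min(candidates, key=lambda parent: len(candidates[parent]))


node_trees = {
    "mnode-1": ["vnode-1"],
    "mnode-2": ["vnode-2", "vnode-3", "vnode-4"],
    "mnode-3": ["vnode-5", "vnode-6"],
}
new_master = elect_new_master(node_trees, failed_master="mnode-1")
```

Choosing the parent with the fewest children gives the leading role to the management node that currently carries the lightest load.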
In an embodiment, within a virtual data node group, the virtual data nodes learn one another's status through heartbeat packets. When a virtual data node receives a data write request, the request is immediately forwarded to the other virtual data nodes and then stored locally. When an application needs to operate on any aggregation table data, the system provides the application with the IP address of each virtual data node in the virtual data node group to which the table belongs. If connecting to one of them fails or the operation fails, the application tries the second, the third, and so on, and returns failure only if all nodes fail. This ensures that the failure of any single machine in the virtual data node group does not affect external service.
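The application-side retry behavior can be illustrated as follows. The `send` callback, the addresses, and the use of `ConnectionError` to signal a failed node are assumptions made for this sketch.

```python
def query_with_failover(addresses, send):
    """Try each node of the group in turn; fail only if every node fails."""
    for address in addresses:
        try:
            return send(address)
        except ConnectionError:
            continue                      # try the next node in the group
    raise ConnectionError(f"all {len(addresses)} nodes failed")


def fake_send(address):
    # Hypothetical transport: the first node is down.
    if address == "10.0.0.1":
        raise ConnectionError("node down")
    return f"rows from {address}"


result = query_with_failover(["10.0.0.1", "10.0.0.2", "10.0.0.3"], fake_send)
```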
Optionally, the target virtual data node may already be abnormal before the data query is executed, or it may become abnormal while the data query is being executed. To handle both cases, the method further includes:
acquiring the virtual data node group corresponding to the target virtual data node;
invoking a virtual data node in the virtual data node group to receive the heartbeat packet of the target virtual data node;
parsing the heartbeat packet and detecting whether it indicates an abnormal state;
when the detection result is that the heartbeat packet indicates an abnormal state, determining another virtual data node from the virtual data node group to execute the data query.
When the heartbeat packet shows that the target virtual data node was already abnormal before the data query was executed, another virtual data node is chosen at random from the virtual data node group to execute the data query. When the heartbeat packet shows that the target virtual data node became abnormal while executing the data query, the heartbeat packet, which carries the data information the target virtual data node has already queried, is parsed, and another virtual data node is chosen at random from the virtual data node group to carry out the remaining query work. Carrying too much queried data information in the heartbeat packet would slow its transmission; the following embodiments address this problem.
In an embodiment, the method further includes: acquiring the data information already queried by the target virtual data node; compressing the data information to a preset size; and storing the compressed data information in the heartbeat packet. The preset size is a compressed size set in advance. Compressing the queried data information reduces the amount of data carried in the heartbeat packet and increases its transmission rate.
In another embodiment, the method further includes: acquiring the data information already queried by the target virtual data node; constructing a data link for the queried data information; and storing the data link in the heartbeat packet. The manner of constructing a data link is prior art and is not described here. Establishing a data link for the queried data information reduces the amount of data carried in the heartbeat packet and increases its transmission rate. Because the heartbeat packet carries the data information already queried by the abnormal target virtual data node, the query work does not have to be repeated, which improves the efficiency of information processing.
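The compression variant can be sketched with the standard `zlib` and `json` modules. The heartbeat layout and the representation of progress as a list of already-queried ranges are assumptions; note also that `zlib` reduces the payload but does not guarantee any particular preset size.

```python
import json
import zlib


def make_heartbeat(node_id, healthy, queried_ranges):
    """Build a heartbeat that carries compressed query progress."""
    payload = zlib.compress(json.dumps(queried_ranges).encode("utf-8"))
    return {"node": node_id, "healthy": healthy, "progress": payload}


def remaining_work(heartbeat, all_ranges):
    """Decode the progress of a failed node and return what is still to be queried."""
    raw = zlib.decompress(heartbeat["progress"]).decode("utf-8")
    done = {tuple(r) for r in json.loads(raw)}
    return [r for r in all_ranges if tuple(r) not in done]


heartbeat = make_heartbeat("vnode-1", healthy=False, queried_ranges=[[0, 10], [10, 20]])
todo = remaining_work(heartbeat, [(0, 10), (10, 20), (20, 30)])
```

A peer that takes over from the failed node would execute only the ranges in `todo` instead of repeating the whole query.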
S14. Acquire a first data version of the target virtual data node and a second data version of each virtual data node in the virtual data node group corresponding to the target virtual data node.
In at least one embodiment of the present application, when a host restarts, each virtual data node checks whether the version of its own data is consistent with that of the other virtual data nodes in its node group. If the data versions are inconsistent, the node must synchronize before it can serve external requests. During operation, data may fall out of sync for various reasons. Such a discrepancy is discovered when a forwarded write request is received; once it is discovered, the virtual data node with the lower data version immediately stops serving external requests and enters the synchronization process, resuming external service only after synchronization is complete. During synchronization, nodes with the higher data version continue to provide service normally.
Optionally, the data version indicates how recent the data stored in the target virtual data node is: the higher the data version, the newer the corresponding stored data. Whether the data stored in the target virtual data node is the latest version can be determined from the data version. The virtual data nodes in the node group of the target virtual data node may have a single second data version or several. When there are several second data versions, the method further includes:
acquiring the number of second data versions of the virtual data nodes in the node group of the target virtual data node;
detecting whether the number of second data versions exceeds one;
when the detection result is that the number of second data versions exceeds one, acquiring the release time of each second data version, and selecting the second data version with the most recent release time as the latest version of the aggregation table data.
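A short sketch of the selection among several second data versions follows. The record fields `version` and `released_at` are hypothetical names; the text only states that each version has a release time.

```python
def latest_data_version(peer_versions):
    """Pick the second data version with the most recent release time."""
    distinct = {v["version"]: v for v in peer_versions}
    if len(distinct) == 1:                 # only one second data version exists
        return next(iter(distinct))
    newest = max(distinct.values(), key=lambda v: v["released_at"])
    return newest["version"]


peer_versions = [
    {"version": "v6", "released_at": 100},
    {"version": "v7", "released_at": 250},
    {"version": "v7", "released_at": 250},
]
latest = latest_data_version(peer_versions)
```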
S15. Detect whether the first data version is consistent with the second data version.
In at least one embodiment of the present application, whether the first data version is consistent with the second data version is detected in order to determine whether the data stored in the target virtual data node is the latest version. When the first data version is consistent with the second data version, the data stored in the target virtual data node is determined to be the latest version, and the target virtual data node may go on to execute the data query. When the first data version is inconsistent with the second data version, the data stored in the target virtual data node is determined not to be the latest version; the latest version of the data must be acquired and the data stored in the target virtual data node updated. While the data stored in the target virtual data node is being updated, the data query request may be assigned to another virtual data node in the node group that holds the latest version, so that an outdated target virtual data node does not affect the query, which improves the reliability and efficiency of information processing.
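The consistency check and rerouting might be sketched as below, assuming integer data versions where a higher number means newer data. The function `route_query` and the node dictionaries are hypothetical.

```python
def route_query(target, peers):
    """Return (node that executes the query, node that must synchronize or None)."""
    latest = max([target["version"]] + [p["version"] for p in peers])
    if target["version"] == latest:
        return target["name"], None        # versions consistent: target serves
    # Target is stale: hand the query to a peer that holds the latest version.
    fallback = next(p for p in peers if p["version"] == latest)
    return fallback["name"], target["name"]


peers = [{"name": "vnode-2", "version": 7}, {"name": "vnode-3", "version": 7}]
stale = route_query({"name": "vnode-1", "version": 6}, peers)
fresh = route_query({"name": "vnode-1", "version": 7}, peers)
```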
S16. When the detection result is that the first data version is consistent with the second data version, invoke the target virtual data node to acquire node data, and aggregate the node data according to an aggregation rule to obtain target node data.
In at least one embodiment of the present application, when the detection result is that the first data version is consistent with the second data version, the target virtual data node is invoked to acquire node data, and the node data is aggregated according to an aggregation rule to obtain target node data. The node data is the aggregation table data, stored in the preset database, that the data query request asks for.
In an embodiment, when the data query request asks for at least two pieces of aggregation table data, once the corresponding pieces have been acquired they are processed according to the aggregation rule to obtain aggregated time series data. The aggregation rule may be obtained by parsing the aggregation condition carried in the data query request and structuring the result. The aggregation condition may be, for example, taking the average, maximum, or minimum of the requested aggregation table data.
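A minimal sketch of turning an aggregation condition into an aggregation rule is shown below. The condition keywords (`avg`, `max`, `min`) and the row layout are assumptions introduced for illustration.

```python
from statistics import mean

# Hypothetical mapping from the condition carried in the request to a rule.
AGGREGATORS = {"avg": mean, "max": max, "min": min}


def aggregate_node_data(tables, condition, column):
    """Apply the requested aggregation over one column of several tables."""
    values = [row[column] for table in tables for row in table]
    return AGGREGATORS[condition](values)


tables = [
    [{"voltage": 1.0}, {"voltage": 3.0}],   # first aggregation table
    [{"voltage": 5.0}],                     # second aggregation table
]
average = aggregate_node_data(tables, "avg", "voltage")
highest = aggregate_node_data(tables, "max", "voltage")
```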
In the big data-based information processing method provided by the embodiments of the present application, before the several target data collection points collect data, the same aggregation table structure is created for collection points of the same data collection type. This avoids the very large number of tables that results from building a separate table for each data collection point, reduces memory usage, and improves information processing efficiency. In addition, when a data query request issued by an application is received, the virtual management node is invoked to parse the request and obtain the identifier of the table to be queried, the target virtual data node is determined from that identifier, and the target virtual data node executes the data query request, which improves query efficiency. Furthermore, the present application compares the first data version of the target virtual data node with the second data version of each virtual data node in the corresponding node group, so that it can be discovered promptly whether the data held by the target virtual data node executing the task is the latest version, thereby ensuring the accuracy of data processing.
图2是本申请实施例二提供的基于大数据的信息处理装置的结构图。FIG. 2 is a structural diagram of a big data-based information processing apparatus provided in Embodiment 2 of the present application.
在一些实施例中,所述基于大数据的信息处理装置20可以包括多个由计算机程序段所组成的功能模块。所述基于大数据的信息处理装置20中的各个程序段的计算机程序可以存储于计算机设备的存储器中,并由至少一个处理器所执行,以执行(详见图1描述)基于大数据的信息处理的功能。In some embodiments, the big data-based information processing apparatus 20 may include a plurality of functional modules composed of computer program segments. The computer program of each program segment in the big data-based information processing apparatus 20 can be stored in the memory of the computer device and executed by at least one processor to execute (details described in FIG. 1 ) the big data-based information processing function.
本实施例中,所述基于大数据的信息处理装置20根据其所执行的功能,可以被划分为多个功能模块。所述功能模块可以包括:聚表获取模块201、数据存储模块202、请求解析模块203、版本获取模块204、版本检测模块205以及数据聚合模块206。本申请所称的模块是指一种能够被至少一个处理器所执行并且能够完成固定功能的一系列计算机可读指令段,其存储在存储器中。在本实施例中,关于各模块的功能将在后续的实施例中详述。In this embodiment, the big data-based information processing apparatus 20 can be divided into a plurality of functional modules according to the functions performed by the information processing apparatus 20 . The functional modules may include: a table aggregation acquisition module 201 , a data storage module 202 , a request analysis module 203 , a version acquisition module 204 , a version detection module 205 and a data aggregation module 206 . A module referred to in this application refers to a series of computer-readable instruction segments that can be executed by at least one processor and can perform fixed functions, and are stored in a memory. In this embodiment, the functions of each module will be described in detail in subsequent embodiments.
The aggregation table acquisition module 201 may be configured to acquire a target aggregation table structure corresponding to a plurality of target data collection points.
The data storage module 202 may be configured to aggregate, according to the target aggregation table structure, the data collected by the plurality of target data collection points to obtain target aggregation table data, and to store the target aggregation table data in a preset database.
The request parsing module 203 may be configured to, when a data query request is received, invoke a virtual management node to parse the data query request to obtain a to-be-queried table identifier corresponding to the data query request, and to determine a target virtual data node according to the to-be-queried table identifier.
The version acquisition module 204 may be configured to acquire a first data version of the target virtual data node and a second data version of each virtual data node in the node group corresponding to the target virtual data node.
The version detection module 205 may be configured to detect whether the first data version is consistent with the second data version.
The data aggregation module 206 may be configured to, when the detection result is that the first data version is consistent with the second data version, invoke the target virtual data node to acquire node data, and to aggregate the node data according to an aggregation rule to obtain target node data.
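By way of a non-limiting illustration, the cooperation of the version acquisition module 204, the version detection module 205, and the data aggregation module 206 might be sketched as follows. The class `VirtualDataNode`, the integer version numbers, and the use of `sum` as the aggregation rule are assumptions made for this sketch and are not specified in the application.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualDataNode:
    """Hypothetical stand-in for a virtual data node holding versioned rows."""
    name: str
    data_version: int
    rows: list = field(default_factory=list)


def versions_consistent(target, group):
    """Return True when every node in the group reports the target's data version."""
    return all(node.data_version == target.data_version for node in group)


def query_target_node(target, group, aggregate):
    """Read and aggregate the target node's data only if the version check passes."""
    if not versions_consistent(target, group):
        # The text above does not say what happens on a mismatch;
        # this sketch simply reports that no data was read.
        return None
    return aggregate(target.rows)


target = VirtualDataNode("vnode-a", data_version=7, rows=[3, 4, 5])
group = [VirtualDataNode("vnode-b", 7), VirtualDataNode("vnode-c", 7)]
total = query_target_node(target, group, aggregate=sum)
```

Passing the aggregation rule in as a callable keeps the version check independent of whichever rule a given deployment applies.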
FIG. 3 is a schematic structural diagram of a computer device according to Embodiment 3 of the present application. In a preferred embodiment of the present application, the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
Those skilled in the art should understand that the structure of the computer device shown in FIG. 3 does not limit the embodiments of the present application. The structure may be a bus topology or a star topology, and the computer device 3 may include more or fewer hardware or software components than shown, or a different arrangement of components.
In some embodiments, the computer device 3 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits, programmable gate arrays, digital processors, and embedded devices. The computer device 3 may also include a client device, which includes, but is not limited to, any electronic product that can interact with a user through a keyboard, mouse, remote control, touchpad, or voice-control device, for example a personal computer, tablet computer, smartphone, or digital camera.
It should be noted that the computer device 3 is only an example. Other existing or future electronic products that are applicable to the present application should also fall within the protection scope of the present application and are incorporated herein by reference.
In some embodiments, a computer program is stored in the memory 31, and when the computer program is executed by the at least one processor 32, all or some of the steps of the big data-based information processing method described above are implemented. For example, the computer program may be divided into one or more modules/units, each of which may be a series of computer-readable instruction segments capable of performing a specific function; the instruction segments describe the execution of the computer program in the computer device. For example, each module shown in FIG. 2 is a computer program stored in the memory 31 and executed by the at least one processor 32, so as to implement the functions of the modules and achieve big data-based information processing. The memory 31 includes read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
Further, the computer-readable storage medium may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function, and the like; the data storage area may store data created through the use of a blockchain node, and the like. The computer-readable storage medium may be non-volatile or volatile.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods. Each data block contains information about a batch of network transactions, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
In some embodiments, the at least one processor 32 is the control core (control unit) of the computer device 3. It connects the components of the entire computer device 3 through various interfaces and lines, and performs the various functions of the computer device 3 and processes data by running or executing the programs or modules stored in the memory 31 and invoking the data stored in the memory 31. For example, when the at least one processor 32 executes the computer program stored in the memory, it implements all or some of the steps of the big data-based information processing method described in the embodiments of the present application, or implements all or some of the functions of the big data-based information processing apparatus. The at least one processor 32 may be composed of integrated circuits, for example a single packaged integrated circuit or a plurality of packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.
In some embodiments, the at least one communication bus 33 is configured to provide connection and communication between the memory 31, the at least one processor 32, and other components.
Although not shown, the computer device 3 may also include a power supply (such as a battery) that supplies power to the components. Preferably, the power supply may be logically connected to the at least one processor 32 through a power management device, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management device. The power supply may also include one or more DC or AC power sources, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and any other such components. The computer device 3 may also include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described further here.
The above integrated units, when implemented in the form of software functional modules, may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include a number of instructions that cause a computer device (which may be a personal computer, a computer device, a network device, or the like) or a processor to execute parts of the methods described in the embodiments of the present application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be apparent to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that the present application can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive. The scope of the application is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced by this application. No reference sign in the claims should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units, and the singular does not exclude the plural. A plurality of units or devices recited in the specification may also be implemented by one unit or device through software or hardware. Words such as "first" and "second" are used as names and do not indicate any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application and not to limit them. Although the present application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present application may be modified or replaced with equivalents without departing from the spirit and scope of those technical solutions.

Claims (20)

  1. A big data-based information processing method, wherein the big data-based information processing method comprises:
    acquiring a target aggregation table structure corresponding to a plurality of target data collection points;
    aggregating, according to the target aggregation table structure, the data collected by the plurality of target data collection points to obtain target aggregation table data, and storing the target aggregation table data in a preset database;
    when a data query request is received, invoking a virtual management node to parse the data query request to obtain a to-be-queried table identifier corresponding to the data query request, and determining a target virtual data node according to the to-be-queried table identifier;
    acquiring a first data version of the target virtual data node and a second data version of each virtual data node in a virtual data node group corresponding to the target virtual data node;
    detecting whether the first data version is consistent with the second data version;
    when the detection result is that the first data version is consistent with the second data version, invoking the target virtual data node to acquire node data, and aggregating the node data according to an aggregation rule to obtain target node data.
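As a non-limiting illustration of the request parsing step of claim 1, a virtual management node might resolve a to-be-queried table identifier to a target virtual data node as sketched below. The class name `VirtualManagementNode`, the dictionary-shaped request, and the key `"table_id"` are hypothetical and are not part of the claimed subject matter.

```python
class VirtualManagementNode:
    """Hypothetical management node that maps table identifiers to data nodes."""

    def __init__(self, routing_table):
        # routing_table: table identifier -> name of the virtual data node.
        self.routing_table = dict(routing_table)

    def parse_table_id(self, query_request):
        # Assumption: the request is a dict carrying the table identifier
        # under the key "table_id".
        return query_request["table_id"]

    def resolve_target_node(self, query_request):
        """Parse the request and look up the node responsible for its table."""
        table_id = self.parse_table_id(query_request)
        if table_id not in self.routing_table:
            raise KeyError(f"no virtual data node registered for table {table_id!r}")
        return self.routing_table[table_id]


manager = VirtualManagementNode({"agg_temperature": "vnode-a"})
target_node = manager.resolve_target_node(
    {"table_id": "agg_temperature", "fields": ["value"]}
)
```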
  2. The big data-based information processing method according to claim 1, wherein, before the acquiring of the target aggregation table structure corresponding to the plurality of target data collection points, the method further comprises:
    acquiring the data collection type of each of the target data collection points;
    detecting whether the data collection types are consistent;
    when the detection result is that the data collection types are consistent, determining the target aggregation table structure for the data collection type;
    when the detection result is that the data collection types are inconsistent, creating a separate table structure for each of the data collection types.
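The consistency check of claim 2 can be illustrated, without limitation, by the following sketch. The function name `plan_table_structures`, the string-valued collection types, and the naming of the resulting tables are assumptions introduced only for this example.

```python
def plan_table_structures(collection_types):
    """Decide between one aggregation table and one table per collection type.

    collection_types maps a collection point identifier to its data
    collection type (assumed here to be a plain string).
    """
    distinct = set(collection_types.values())
    if len(distinct) == 1:
        # All points collect the same type: one shared aggregation table.
        only_type = next(iter(distinct))
        return {"aggregated": only_type}
    # Types differ: a separate table structure for each type.
    return {f"table_{t}": t for t in sorted(distinct)}


same = plan_table_structures({"p1": "temperature", "p2": "temperature"})
mixed = plan_table_structures({"p1": "temperature", "p2": "humidity"})
```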
  3. The big data-based information processing method according to claim 1, wherein the acquiring of the target aggregation table structure corresponding to the plurality of target data collection points comprises:
    acquiring the data collection type of the target data collection points;
    parsing the data collection type to obtain to-be-collected items and attribute information corresponding to each of the to-be-collected items;
    creating the target aggregation table structure according to the to-be-collected items and the attribute information.
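As one possible, non-limiting reading of claim 3, the to-be-collected items may be treated as column names and the attribute information as column types. The sketch below uses Python's standard `sqlite3` module; the mapping `COLLECTION_TYPES`, the table name, and the choice of SQLite are assumptions and are not stated in the application.

```python
import sqlite3

# Hypothetical registry: data collection type -> {item: attribute (column type)}.
COLLECTION_TYPES = {
    "temperature": {"ts": "INTEGER", "value": "REAL"},
}


def build_create_statement(table_name, collection_type):
    """Turn the parsed items and attributes into a CREATE TABLE statement."""
    items = COLLECTION_TYPES[collection_type]
    columns = ", ".join(f"{item} {attr}" for item, attr in items.items())
    # Names come from the trusted registry above, not from user input.
    return f"CREATE TABLE {table_name} ({columns})"


conn = sqlite3.connect(":memory:")
conn.execute(build_create_statement("agg_temperature", "temperature"))
# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
column_names = [row[1] for row in conn.execute("PRAGMA table_info(agg_temperature)")]
```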
  4. The big data-based information processing method according to claim 1, wherein the aggregating, according to the target aggregation table structure, of the data collected by the plurality of target data collection points to obtain the target aggregation table data comprises:
    acquiring the data collected by each of the target data collection points, and filling the data into the target aggregation table structure to obtain initial aggregation table data;
    constructing a preset tag corresponding to the identification information of the target data collection point, and adding the tag to the initial aggregation table data;
    aggregating the initial aggregation table data to obtain the target aggregation table data.
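The filling, tagging, and aggregating steps of claim 4 might be sketched as follows, purely for illustration. The tag format `point:<id>`, the column names, and the interpretation of "aggregating" as merging all rows into a single list ordered by a `ts` column are assumptions of this sketch.

```python
def build_aggregated_rows(readings_by_point, columns):
    """Fill readings into the table columns, tag each row, and merge the rows."""
    rows = []
    for point_id, readings in readings_by_point.items():
        tag = f"point:{point_id}"  # hypothetical tag derived from the point id
        for reading in readings:
            # Fill the reading into the aggregation table structure;
            # items missing from a reading are stored as None.
            row = {col: reading.get(col) for col in columns}
            row["tag"] = tag
            rows.append(row)
    # Assumed aggregation rule: a single table ordered by timestamp.
    rows.sort(key=lambda r: r["ts"])
    return rows


readings = {
    "p1": [{"ts": 2, "value": 20.5}],
    "p2": [{"ts": 1, "value": 19.0}],
}
aggregated = build_aggregated_rows(readings, ["ts", "value"])
```

Because every row carries the tag of its collection point, rows from different points can share one table and still be told apart at query time.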
  5. The big data-based information processing method according to claim 1, wherein the method further comprises:
    acquiring a remaining space value of the preset database;
    monitoring whether the remaining space value meets a preset space threshold;
    when the monitoring result is that the remaining space value meets the preset space threshold, selecting target data in the preset database;
    migrating the target data to a hard disk for storage.
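A minimal, non-limiting sketch of the monitoring and migration steps of claim 5 is given below. It assumes that the preset database is an in-memory dictionary keyed by insertion sequence number, that "meeting the threshold" means the remaining space is at or below it, and that the oldest entries are the target data; none of these choices is stated in the claim.

```python
import json
import tempfile
from pathlib import Path


def migrate_if_low(store, capacity, threshold, disk_dir, batch=2):
    """Move the oldest entries to disk when remaining space reaches the threshold."""
    remaining = capacity - len(store)
    if remaining > threshold:
        return []  # enough space left, nothing to migrate
    oldest_keys = sorted(store)[:batch]  # assumed selection rule: oldest first
    for key in oldest_keys:
        path = Path(disk_dir) / f"{key}.json"
        path.write_text(json.dumps(store.pop(key)))
    return oldest_keys


with tempfile.TemporaryDirectory() as disk_dir:
    store = {1: {"value": 20.5}, 2: {"value": 21.0}, 3: {"value": 19.8}}
    migrated = migrate_if_low(store, capacity=4, threshold=1, disk_dir=disk_dir)
```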
  6. The big data-based information processing method according to claim 1, wherein the method further comprises:
    detecting whether the leading virtual management node in a virtual management node group composed of virtual management nodes is abnormal;
    when the detection result is that the leading virtual management node is abnormal, acquiring node trees, and counting the number of child nodes in each of the node trees;
    determining the node tree with the smallest number of child nodes as the target node tree;
    selecting the parent node corresponding to the target node tree as the new leading virtual management node.
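The selection of a new leading virtual management node in claim 6 may be illustrated by the following sketch. The nested-dictionary tree representation, the counting of all descendants as "child nodes", and the treatment of each tree's root as its "parent node" are assumptions made only for this example.

```python
def count_children(tree):
    """Count every descendant of the root in a {'node', 'children'} tree."""
    return sum(1 + count_children(child) for child in tree["children"])


def elect_new_leader(node_trees):
    """Pick the root of the tree with the fewest child nodes as the new leader."""
    target_tree = min(node_trees, key=count_children)
    return target_tree["node"]


node_trees = [
    {"node": "m1", "children": [
        {"node": "d1", "children": []},
        {"node": "d2", "children": []},
    ]},
    {"node": "m2", "children": [
        {"node": "d3", "children": []},
    ]},
]
new_leader = elect_new_leader(node_trees)
```

Choosing the tree with the fewest child nodes plausibly favors the least-loaded management node, although the claim itself does not state the rationale.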
  7. The big data-based information processing method according to claim 1, wherein the method further comprises:
    acquiring the virtual data node group corresponding to the target virtual data node;
    invoking the virtual data nodes in the virtual data node group to receive a heartbeat packet from the target virtual data node;
    parsing the heartbeat packet and detecting whether it indicates an abnormal state;
    when the detection result is that the heartbeat packet indicates an abnormal state, determining another virtual data node from the virtual data node group to perform the data query.
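The heartbeat-based failover of claim 7 might, for illustration only, be sketched as follows. The heartbeat fields `status` and `sent_at`, the five-second timeout, and the additional health check on the replacement node are assumptions that do not appear in the claim.

```python
def heartbeat_is_abnormal(heartbeat, now, timeout=5.0):
    """Treat a heartbeat as abnormal if it reports an error or is too old."""
    return heartbeat.get("status") != "ok" or now - heartbeat["sent_at"] > timeout


def choose_query_node(target, group, heartbeats, now):
    """Return the target if healthy, otherwise another healthy node in its group."""
    if not heartbeat_is_abnormal(heartbeats[target], now):
        return target
    for node in group:
        # Assumption: a replacement must itself have a normal heartbeat.
        if node != target and not heartbeat_is_abnormal(heartbeats[node], now):
            return node
    return None  # no healthy node is available in the group


heartbeats = {
    "vnode-a": {"status": "error", "sent_at": 100.0},
    "vnode-b": {"status": "ok", "sent_at": 100.0},
}
query_node = choose_query_node("vnode-a", ["vnode-a", "vnode-b"], heartbeats, now=101.0)
```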
  8. A big data-based information processing apparatus, wherein the big data-based information processing apparatus comprises:
    an aggregation table acquisition module, configured to acquire a target aggregation table structure corresponding to a plurality of target data collection points;
    a data storage module, configured to aggregate, according to the target aggregation table structure, the data collected by the plurality of target data collection points to obtain target aggregation table data, and to store the target aggregation table data in a preset database;
    a request parsing module, configured to, when a data query request is received, invoke a virtual management node to parse the data query request to obtain a to-be-queried table identifier corresponding to the data query request, and to determine a target virtual data node according to the to-be-queried table identifier;
    a version acquisition module, configured to acquire a first data version of the target virtual data node and a second data version of each virtual data node in a virtual data node group corresponding to the target virtual data node;
    a version detection module, configured to detect whether the first data version is consistent with the second data version;
    a data aggregation module, configured to, when the detection result is that the first data version is consistent with the second data version, invoke the target virtual data node to acquire node data, and to aggregate the node data according to an aggregation rule to obtain target node data.
  9. A computer device, wherein the computer device comprises a processor configured to execute computer-readable instructions stored in a memory to implement the following steps:
    acquiring a target aggregation table structure corresponding to a plurality of target data collection points;
    aggregating, according to the target aggregation table structure, the data collected by the plurality of target data collection points to obtain target aggregation table data, and storing the target aggregation table data in a preset database;
    when a data query request is received, invoking a virtual management node to parse the data query request to obtain a to-be-queried table identifier corresponding to the data query request, and determining a target virtual data node according to the to-be-queried table identifier;
    acquiring a first data version of the target virtual data node and a second data version of each virtual data node in a virtual data node group corresponding to the target virtual data node;
    detecting whether the first data version is consistent with the second data version;
    when the detection result is that the first data version is consistent with the second data version, invoking the target virtual data node to acquire node data, and aggregating the node data according to an aggregation rule to obtain target node data.
  10. The computer device according to claim 9, wherein, before the acquiring of the target aggregation table structure corresponding to the plurality of target data collection points, the processor executes the computer-readable instructions to further implement the following steps:
    acquiring the data collection type of each of the target data collection points;
    detecting whether the data collection types are consistent;
    when the detection result is that the data collection types are consistent, determining the target aggregation table structure for the data collection type;
    when the detection result is that the data collection types are inconsistent, creating a separate table structure for each of the data collection types.
  11. The computer device according to claim 9, wherein, when the processor executes the computer-readable instructions to implement the acquiring of the target aggregation table structure corresponding to the plurality of target data collection points, the following steps are included:
    acquiring the data collection type of the target data collection points;
    parsing the data collection type to obtain to-be-collected items and attribute information corresponding to each of the to-be-collected items;
    creating the target aggregation table structure according to the to-be-collected items and the attribute information.
  12. The computer device according to claim 9, wherein, when the processor executes the computer-readable instructions to implement the aggregating, according to the target aggregation table structure, of the data collected by the plurality of target data collection points to obtain the target aggregation table data, the following steps are included:
    acquiring the data collected by each of the target data collection points, and filling the data into the target aggregation table structure to obtain initial aggregation table data;
    constructing a preset tag corresponding to the identification information of the target data collection point, and adding the tag to the initial aggregation table data;
    aggregating the initial aggregation table data to obtain the target aggregation table data.
  13. The computer device according to claim 9, wherein the processor executes the computer-readable instructions to further implement the following steps:
    acquiring a remaining space value of the preset database;
    monitoring whether the remaining space value meets a preset space threshold;
    when the monitoring result is that the remaining space value meets the preset space threshold, selecting target data in the preset database;
    migrating the target data to a hard disk for storage.
  14. The computer device according to claim 9, wherein the processor executes the computer-readable instructions to further implement the following steps:
    detecting whether the leading virtual management node in a virtual management node group composed of virtual management nodes is abnormal;
    when the detection result is that the leading virtual management node is abnormal, acquiring node trees, and counting the number of child nodes in each of the node trees;
    determining the node tree with the smallest number of child nodes as the target node tree;
    selecting the parent node corresponding to the target node tree as the new leading virtual management node.
  15. The computer device according to claim 9, wherein the processor executes the computer-readable instructions to further implement the following steps:
    acquiring the virtual data node group corresponding to the target virtual data node;
    invoking the virtual data nodes in the virtual data node group to receive a heartbeat packet from the target virtual data node;
    parsing the heartbeat packet and detecting whether it indicates an abnormal state;
    when the detection result is that the heartbeat packet indicates an abnormal state, determining another virtual data node from the virtual data node group to perform the data query.
  16. A computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    acquiring a target aggregation table structure corresponding to a plurality of target data collection points;
    aggregating, according to the target aggregation table structure, the data collected by the plurality of target data collection points to obtain target aggregation table data, and storing the target aggregation table data in a preset database;
    when a data query request is received, invoking a virtual management node to parse the data query request to obtain a to-be-queried table identifier corresponding to the data query request, and determining a target virtual data node according to the to-be-queried table identifier;
    acquiring a first data version of the target virtual data node and a second data version of each virtual data node in a virtual data node group corresponding to the target virtual data node;
    detecting whether the first data version is consistent with the second data version;
    when the detection result is that the first data version is consistent with the second data version, invoking the target virtual data node to acquire node data, and aggregating the node data according to an aggregation rule to obtain target node data.
  17. The computer-readable storage medium according to claim 16, wherein, before the acquiring of the target aggregation table structure corresponding to the plurality of target data collection points, the computer-readable instructions, when executed by the processor, further implement the following steps:
    acquiring the data collection type of each of the target data collection points;
    detecting whether the data collection types are consistent;
    when the detection result is that the data collection types are consistent, determining the target aggregation table structure for the data collection type;
    when the detection result is that the data collection types are inconsistent, creating a separate table structure for each of the data collection types.
  18. The computer-readable storage medium according to claim 16, wherein, when the computer-readable instructions are executed by the processor to implement the acquiring of the target aggregation table structure corresponding to the plurality of target data collection points, the following steps are included:
    acquiring the data collection type of the target data collection points;
    parsing the data collection type to obtain to-be-collected items and attribute information corresponding to each of the to-be-collected items;
    creating the target aggregation table structure according to the to-be-collected items and the attribute information.
  19. The computer-readable storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor to implement the aggregating of the data collected by the plurality of target data collection points according to the target aggregation table structure to obtain the target aggregation table data, comprise:
    acquiring the data collected by each of the target data collection points, and filling the data into the target aggregation table structure to obtain initial aggregation table data;
    constructing a preset label corresponding to the identification information of the target data collection point, and adding the label to the initial aggregation table data;
    aggregating the initial aggregation table data to obtain the target aggregation table data.
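The fill, label, and aggregate steps of claim 19 can be sketched in Python. The claim does not say what the aggregation consists of, so this sketch assumes it means merging the labelled rows from all collection points into one ordered table. The column list, the `point:<id>` label format, and the function names are hypothetical.

```python
# Hypothetical target aggregation table structure (column names only).
COLUMNS = ["collected_at", "cpu_usage"]


def fill_rows(point_id, records, columns):
    """Fill one point's records into the table structure and label each row."""
    label = f"point:{point_id}"  # label derived from the point's identifier
    rows = []
    for record in records:
        # Keep only the columns defined by the table structure.
        row = {column: record.get(column) for column in columns}
        row["label"] = label
        rows.append(row)
    return rows


def aggregate(points, columns):
    """Merge the labelled rows of every collection point into one table."""
    initial_rows = []
    for point_id, records in points.items():
        initial_rows.extend(fill_rows(point_id, records, columns))
    # Assumed aggregation: a single table ordered by time, then by label.
    return sorted(initial_rows, key=lambda r: (r["collected_at"], r["label"]))
```

Because every row carries the label of its originating point, the merged table can still be filtered per collection point after aggregation.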
  20. The computer-readable storage medium according to claim 16, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    acquiring the remaining space value of the preset database;
    monitoring whether the remaining space value satisfies a preset space threshold;
    when the monitoring result is that the remaining space value satisfies the preset space threshold, selecting target data in the preset database;
    migrating the target data to a hard disk for storage.
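The monitoring and migration steps of claim 20 can be sketched as follows. Several points are assumptions rather than content of the claim: "satisfies the preset space threshold" is read as the remaining space being at or below the threshold, the oldest rows are chosen as the target data, space is counted in rows, and the hard disk is represented by JSON files in a directory. `MemoryStore` and `migrate_if_low` are hypothetical names.

```python
import json
from pathlib import Path


class MemoryStore:
    """Toy stand-in for the preset database, with capacity counted in rows."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = {}  # key -> row; insertion order doubles as age

    def remaining_space(self):
        return self.capacity - len(self.rows)


def migrate_if_low(store, threshold, batch_size, disk_dir):
    """Move the oldest rows to disk when remaining space hits the threshold."""
    if store.remaining_space() > threshold:
        return []  # enough space left, nothing to migrate
    disk_dir = Path(disk_dir)
    disk_dir.mkdir(parents=True, exist_ok=True)
    migrated = []
    # Assumed selection policy: the oldest `batch_size` rows are the targets.
    for key in list(store.rows)[:batch_size]:
        (disk_dir / f"{key}.json").write_text(json.dumps(store.rows[key]))
        del store.rows[key]
        migrated.append(key)
    return migrated
```

A real deployment would measure bytes rather than rows and would likely select target data by access frequency or age policy; the sketch only shows the control flow of check, select, and migrate.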
PCT/CN2021/090464 2021-02-26 2021-04-28 Information processing method and apparatus based on big data, and related devices WO2022178976A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110219983.8A CN112948382A (en) 2021-02-26 2021-02-26 Information processing method and device based on big data and related equipment
CN202110219983.8 2021-02-26

Publications (1)

Publication Number Publication Date
WO2022178976A1 (en) 2022-09-01

Family

ID=76246608

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090464 WO2022178976A1 (en) 2021-02-26 2021-04-28 Information processing method and apparatus based on big data, and related devices

Country Status (2)

Country Link
CN (1) CN112948382A (en)
WO (1) WO2022178976A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113868335A (en) * 2021-09-15 2021-12-31 威讯柏睿数据科技(北京)有限公司 Method and equipment for expanding distributed clusters of memory database

Citations (6)

Publication number Priority date Publication date Assignee Title
CN103123652A (en) * 2013-03-14 2013-05-29 曙光信息产业(北京)有限公司 Data query method and cluster database system
CN103237060A (en) * 2013-04-08 2013-08-07 北京小米科技有限责任公司 Method, device and system for data object acquisition
CN107784044A (en) * 2016-08-31 2018-03-09 华为技术有限公司 Table data query method and device
CN110008257A (en) * 2019-04-10 2019-07-12 深圳市腾讯计算机系统有限公司 Data processing method, device, system, computer equipment and storage medium
WO2020078381A1 (en) * 2018-10-16 2020-04-23 杭州海康威视数字技术股份有限公司 Data aggregation method, device, equipment, storage medium and system
CN112115147A (en) * 2020-09-25 2020-12-22 北京百度网讯科技有限公司 Data processing method, device, equipment and storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
EP3171282A4 (en) * 2014-11-19 2017-12-06 Informex Inc. Data retrieval apparatus, program and recording medium
CN104679897A (en) * 2015-03-18 2015-06-03 成都金本华科技股份有限公司 Data retrieval method under big data environment
CN105701240A (en) * 2016-02-24 2016-06-22 中国联合网络通信集团有限公司 Wearable device data processing method, device and system
CN110633096B (en) * 2018-06-21 2023-09-15 阿里巴巴集团控股有限公司 Node control method and device, version control method and device and distributed system
CN110502513A (en) * 2019-08-15 2019-11-26 中国平安财产保险股份有限公司 Collecting method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN112948382A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
US10262032B2 (en) Cache based efficient access scheduling for super scaled stream processing systems
US9652287B2 (en) Using databases for both transactions and analysis
KR102013004B1 (en) Dynamic load balancing in a scalable environment
US9699017B1 (en) Dynamic utilization of bandwidth for a quorum-based distributed storage system
CN104965850B (en) A kind of database high availability implementation method based on open source technology
US10877810B2 (en) Object storage system with metadata operation priority processing
CN107809467B (en) Method for deleting container mirror image data in cloud environment
US10409804B2 (en) Reducing I/O operations for on-demand demand data page generation
CN112445854A (en) Multi-source business data real-time processing method and device, terminal and storage medium
EP3172682B1 (en) Distributing and processing streams over one or more networks for on-the-fly schema evolution
WO2022257575A1 (en) Data processing method, apparatus, and device
US11816511B1 (en) Virtual partitioning of a shared message bus
WO2022237506A1 (en) Method, apparatus, and device for monitoring online diagnosis service, and storage medium
WO2022178976A1 (en) Information processing method and apparatus based on big data, and related devices
CN115344207A (en) Data processing method and device, electronic equipment and storage medium
CN114691050A (en) Cloud native storage method, device, equipment and medium based on kubernets
Matri et al. Týr: blob storage meets built-in transactions
US11853364B2 (en) Level-based queries in a database system and methods for use therewith
CN114925075B (en) Real-time dynamic fusion method for multi-source time-space monitoring information
US11888938B2 (en) Systems and methods for optimizing distributed computing systems including server architectures and client drivers
CN110837970A (en) Regional health platform quality control method and system
CN113688009B (en) Cloud host monitoring data acquisition method, system and equipment of cloud platform
Chen et al. Big data storage architecture design in cloud computing
CN106557492A (en) A kind of method of data synchronization and device
CN113157645B (en) Cluster data migration method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21927409; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 21927409; Country of ref document: EP; Kind code of ref document: A1