CN112650755A - Data storage method, method for querying data, database and readable medium - Google Patents

Data storage method, method for querying data, database and readable medium

Info

Publication number
CN112650755A
Authority
CN
China
Prior art keywords
data
edge node
stored
information
queried
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011563163.2A
Other languages
Chinese (zh)
Inventor
黄松
沈达宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011563163.2A
Publication of CN112650755A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/2455 Query execution

Abstract

The disclosure provides a data storage method, a data query method, a database and a readable medium, and relates to the technical field of computers, in particular to the technical fields of cloud computing and edge computing. The data storage method may be performed at an edge node, and includes: receiving data to be stored; caching the received data; compressing data of the same time series in the cached data; and writing the compressed data, wherein the written data includes a timestamp associated with the received data. By using the data storage method and the data query method provided by the disclosure, time series data can be efficiently stored and queried at the edge node.

Description

Data storage method, method for querying data, database and readable medium
Technical Field
The present disclosure relates to the field of computer technologies, in particular to the fields of edge computing and cloud computing, and more specifically to a data storage method, a data query method, an apparatus, and a readable medium.
Background
With the development of the internet of things and chip technology, the processing capability of the edge device is stronger and stronger, so that more and more business processes are processed by the edge device. In the related art, the main functions of the edge computing platform are focused on the aspects of equipment management, cloud-edge cooperation, application delivery and the like.
Disclosure of Invention
According to an aspect of the embodiments of the present disclosure, there is provided a data storage method performed at an edge node, including: receiving data to be stored; caching the received data; compressing the data of the same time sequence in the cached data; writing the compressed data, wherein the written data includes a timestamp associated with the received data.
According to another aspect of the present disclosure, there is also provided a method for querying data, including: generating a query request, wherein the query request includes a time range of data to be queried; parsing the query request to determine edge nodes associated with data to be queried; obtaining the data to query based on the information of the associated edge node and the time range.
According to another aspect of the present disclosure, there is also provided a data storage apparatus at an edge node, including: a receiving unit configured to receive data to be stored; a cache unit configured to cache the data to be stored; a compression unit configured to compress the data of the same time series in the cached data; and a time series data storage unit configured to write compressed data, wherein the written data includes a time stamp associated with the written data.
According to another aspect of the present disclosure, there is also provided an apparatus for querying data, including: a query request generating unit configured to generate a query request, wherein the query request includes a time range of data to be queried; a parsing unit configured to parse the query request to determine an edge node associated with data to be queried; a query data obtaining unit configured to obtain the data to be queried based on the information of the associated edge node and the time range.
According to another aspect of the present disclosure, there is also provided a database including: a log storage unit configured to save data to be stored in a form of a log; a cache unit configured to cache the data to be stored; a data storage unit comprising: a meta information storing subunit configured to store meta information associated with data to be stored; an index information storage subunit configured to store index information associated with data to be stored; a time series data storage unit configured to store key-value pairs made up of the data to be stored and a time stamp associated with the data to be stored.
According to another aspect of the present disclosure, there is also provided a computer device including: a memory, a processor and a computer program stored on the memory, wherein the processor is configured to execute the computer program to implement the steps of the method as described before.
According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method as previously described.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program realizes the steps of the method as described before when executed by a processor.
By means of the scheme of the embodiment of the disclosure, the time sequence data collected by the edge device can be stored at the edge device, and the data transmission pressure and the network requirement between the edge device and the cloud end are reduced. In addition, the time sequence data are stored at the edge device, so that the stored time sequence data can be acquired without accessing the cloud under some conditions, and the data processing pressure of the cloud is further reduced.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the embodiments and, together with the description, serve to explain the exemplary implementations of the embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements, in which:
FIG. 1 shows a schematic diagram of an exemplary system in which various methods and apparatus described herein may be implemented, according to some exemplary embodiments of the present disclosure;
FIG. 2 shows a schematic flow diagram of a data storage method performed at an edge node according to an embodiment of the present disclosure;
FIG. 3 shows another schematic flow diagram of a data storage method performed at an edge node according to an embodiment of the present disclosure;
FIG. 4 shows a flow diagram of a process for querying data, in accordance with an embodiment of the present disclosure;
FIG. 5 shows a schematic flow of a method of edge-cloud collaborative data querying according to an embodiment of the present disclosure;
FIG. 6 shows a schematic block diagram of a data storage apparatus at an edge node according to an embodiment of the present disclosure;
FIG. 7 shows a schematic block diagram of an apparatus for querying data according to an embodiment of the present disclosure;
FIG. 8 shows a schematic block diagram of a database according to an embodiment of the present disclosure;
FIG. 9 shows another schematic block diagram of a database in accordance with an embodiment of the present disclosure; and
FIG. 10 shows a schematic block diagram of an example computing device, according to an example embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure. It is to be understood that the described embodiments are merely a subset of the disclosed embodiments and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The terms "first" and "second," and the like in the description and claims of the present disclosure and the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Cloud computing refers to a technology architecture that accesses a flexibly extensible pool of shared physical or virtual resources through a network, where the resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and may be deployed and managed in an on-demand, self-service manner. Cloud computing can provide efficient and powerful data processing capacity for technical applications such as artificial intelligence and blockchain, and for model training. Cloud computing is deployed in the core network of a data center: terminal data is collected layer by layer through network equipment, and big data analysis is carried out by means of the strong storage and computing capacity of the cloud.
In contrast to cloud computing, edge computing provides cloud services and IT environment services for application developers and service providers on the edge side of a network, with the goal of providing computing, storage, and network bandwidth near the data input or the users. Edge computing can analyze real-time and short-period data, can perform real-time intelligent processing and execution on local data more efficiently, and relieves the data traffic in the network and the workload of the cloud.
Time series data storage offers efficient reads and writes and a high compression ratio. In data acquisition scenarios for Internet of Things devices, time series data storage can address the high storage cost and the low write and query-analysis efficiency caused by the huge number of device acquisition points and the high data acquisition frequency. By analyzing the time series formed by the time series data, the statistical characteristics and the development patterns of the time series in a sample can be identified. Time series data therefore has important application value in the field of the Internet of Things.
With the development of Internet of Things and chip technology, the processing capability of devices (e.g., edge devices) is becoming stronger, and more tasks are being handed to edge devices for processing. However, in the related art, an edge device needs to transmit the time series data it collects or generates to a cloud server, where the data is stored in the time series database of the cloud server. This requires the edge device and the cloud server to maintain a network connection, and the time series data at the edge device to be written directly into the cloud time series database through that connection. Since the network connection at the edge device cannot be kept stable at all times, and the cloud server itself may also fail, there is a risk that data to be written into the cloud by the edge device is lost.
Once the network connection between the edge device and the cloud or the server itself of the cloud fails, the risk of data write failure needs to be reduced by adding retry or caching logic in the code. Although this method can reduce the risk of write failure to some extent, data loss cannot be avoided if a network failure occurs for a long time.
In addition, data can also be stored by building a database on the edge device. However, due to the limited storage and computation capabilities of the edge devices, it is difficult to efficiently store and query massive amounts of time series data at the edge devices.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. As shown in fig. 1, the system 100 includes a user terminal 101, a server 102, and an edge device 103.
Illustratively, the server 102 may provide other services or software applications that may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, provided to the user terminal 101 under a software as a service (SaaS) model. In some examples, the server 102 may be an edge computing system cloud.
In the configuration shown in fig. 1, the server 102 may include one or more components that implement the functions performed by the server 102. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user of the user terminal 101 may, in turn, utilize one or more client applications to interact with the server 102 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user can use the user terminal 101 to upload and manage application modules and the like. The user terminal 101 may provide an interface that enables a user of the user terminal 101 to interact with the user terminal 101. The user terminal 101 may also output information to the user via the interface.
Illustratively, the user terminal 101 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as Microsoft Windows, Apple iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., Google Chrome OS); or include various Mobile operating systems, such as Microsoft Windows Mobile OS, iOS, Windows Phone, Android. Portable handheld devices may include cellular telephones, smart phones, tablets, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head mounted displays and other devices. The gaming system may include a variety of handheld gaming devices, internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), Short Message Service (SMS) applications, and may use a variety of communication protocols.
Illustratively, the server 102 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 102 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 102 may run one or more services or software applications that provide the functionality described below.
Illustratively, the computing units in server 102 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 102 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
Illustratively, the server 102 may include one or more applications to analyze and consolidate data feeds and/or event updates received from a user of the user terminal 101. The server 102 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of the user terminal 101.
Illustratively, edge devices 103 are devices that provide an entry point to an enterprise or service provider core network, which may be, for example, routers, routing switches, Integrated Access Devices (IADs), multiplexers, and various Metropolitan Area Network (MAN) and Wide Area Network (WAN) access devices. In other examples, the edge device 103 may also include, but is not limited to, for example, a smart router, a smart stereo, a Network Attached Storage (NAS), a webcam, a storable network device, a smart watch, a smart television, a monitor, and the like.
It is understood that the user terminal 101, the server 102, and the edge device 103 may communicate over a network. The network may be any type of network known to those skilled in the art that may support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, the network may be a Local Area Network (LAN), an Ethernet-based network, a token ring, a Wide Area Network (WAN), the Internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., Bluetooth, WIFI), and/or any combination of these and/or other networks.
Fig. 2 shows a schematic flow diagram of a data storage method performed at an edge node according to an embodiment of the present disclosure. With the method 200 illustrated in fig. 2, the time series data generated at the edge node may be stored in the form of a time series database. In some embodiments, the data stored using the method 200 may be data generated or collected by a single edge node. In other embodiments, the data stored using the method 200 may be data generated by a plurality of edge nodes within a certain distance.
In step S202, data to be stored may be received. The received data to be stored may be time series data. Time series data is time-dependent information that reflects how the data changes over time. For example, the time series data may be monitoring data for certain information.
In some embodiments, the received data to be stored may be monitoring data generated at or collected by an edge device that is an edge node. For example, the edge device may be an onboard electronic system, and the data to be stored may be various vehicle travel data (e.g., vehicle travel speed, acceleration, travel route, etc.) collected by the onboard electronic system through sensors. For another example, the edge device may be a wearable device, and the data to be stored may be various physiological information of the human body (such as blood pressure, body temperature, heartbeat, etc. of the human body) collected by the wearable device through a sensor.
In other embodiments, the data to be stored may be sent by another edge node to the current edge node for storage.
In step S204, the received data may be buffered.
In some embodiments, the data received by the edge node may be cached in the memory first, and the data in the cache is written to the disk after the data is accumulated to the predetermined data amount. Compared with the writing of the memory, the writing of the disk needs to consume more resources, so that the writing performance of the edge device can be improved by reducing the times of writing the disk in the above mode.
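As a non-limiting illustration, the caching described above can be sketched as follows. This is a minimal sketch and not the implementation of the disclosure: the class name `WriteBuffer`, the parameter `flush_threshold`, the callback `flush`, and the choice of counting points rather than bytes are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical point layout: (series id, timestamp, value).
Point = Tuple[str, int, float]


class WriteBuffer:
    """Accumulates points in memory and hands them over in one batch."""

    def __init__(self, flush_threshold: int,
                 flush: Callable[[List[Point]], None]) -> None:
        self.flush_threshold = flush_threshold  # assumed: number of points
        self.flush = flush                      # stands in for a disk write
        self.points: List[Point] = []

    def add(self, point: Point) -> None:
        self.points.append(point)
        # Only touch the "disk" once enough points have accumulated.
        if len(self.points) >= self.flush_threshold:
            batch, self.points = self.points, []
            self.flush(batch)


# Usage: seven points with a threshold of three produce two batches.
flushed: List[List[Point]] = []
buf = WriteBuffer(flush_threshold=3, flush=flushed.append)
for ts in range(7):
    buf.add(("sensor-1", ts, 20.0 + ts))
```

In this sketch the seventh point stays in memory until the next threshold is reached, which is why a separate durable log is useful for recovery.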
In addition, by buffering the received data in the memory, data having the same property can be aggregated together. For example, data received at the same time may be aggregated together, or data from the same object may be aggregated together.
In step S206, the data of the same time series in the cached data may be compressed.
In some embodiments, the data of the same time series may be data from the same object or data having the same property. When data is queried, data analysis processing can be provided for data of the same time series to obtain a data analysis result for a certain object or a certain characteristic. For example, statistics such as an average value and a sum of data in the same time series can be obtained by data analysis processing.
In some implementations, various efficient coding schemes may be employed to compress the data. For example, data compression may be achieved using predictive coding, transform coding, vector quantization coding, subband coding, neural network coding, and the like.
The compression rate of data can be improved by combining the data of the same time sequence together for compression, so that the data is more suitable for being stored at the edge node with limited storage capacity and computing capacity.
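As a non-limiting illustration of compressing data of the same time series together, the sketch below groups cached points by series, delta-encodes the timestamps (a very simple form of predictive coding), and deflates the result with the standard `zlib` module. The disclosure does not name a specific codec; the functions `group_by_series`, `compress_series`, and `decompress_series`, the point layout, and the use of `zlib` are assumptions made for the example.

```python
import struct
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[str, int, float]   # hypothetical: (series id, timestamp, value)
Sample = Tuple[int, float]       # (timestamp, value)


def group_by_series(points: Iterable[Point]) -> Dict[str, List[Sample]]:
    """Collect the samples of each series and sort them by timestamp."""
    groups: Dict[str, List[Sample]] = defaultdict(list)
    for series_id, ts, value in points:
        groups[series_id].append((ts, value))
    return {series_id: sorted(samples) for series_id, samples in groups.items()}


def compress_series(samples: List[Sample]) -> bytes:
    """Delta-encode timestamps, pack the samples, then deflate the block."""
    raw = bytearray()
    prev = 0
    for ts, value in samples:
        raw += struct.pack("<qd", ts - prev, value)  # int64 delta, float64 value
        prev = ts
    return zlib.compress(bytes(raw))


def decompress_series(blob: bytes) -> List[Sample]:
    """Inverse of compress_series."""
    samples: List[Sample] = []
    prev = 0
    for delta, value in struct.iter_unpack("<qd", zlib.decompress(blob)):
        prev += delta
        samples.append((prev, value))
    return samples


# Usage: two interleaved series are separated before compression.
points = [("a", 1, 1.0), ("b", 1, 9.0), ("a", 2, 1.5), ("b", 2, 9.5)]
blobs = {sid: compress_series(s) for sid, s in group_by_series(points).items()}
```

Regularly sampled series yield identical timestamp deltas, which is the reason grouping by series before compression tends to help.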
In step S208, the compressed data may be written, where the written data may include a timestamp associated with the received data.
In some embodiments, the timestamp associated with the data received in step S202 may be stored in association with the numerical value of the data.
By using the data storage method executed at the edge node, the time sequence data can be stored at the edge node, so that the problem that the data at the edge node cannot be synchronized to the cloud when the network connection or the cloud breaks down can be solved. By caching the data in the memory, the read-write operation at the edge node can be reduced, and the write-in performance at the edge node is improved. By compressing the data of the same time series, the storage capacity at the edge node can be improved.
Fig. 3 shows another schematic flow diagram of a data storage method performed at an edge node according to an embodiment of the present disclosure.
As shown in fig. 3, in step S302, time-series data may be written. The received data to be stored may be processed using steps S202 to S208 described in conjunction with fig. 2, and the processed time-series data may be written to a disk at the edge node.
Various processes can be performed on the data stored at the edge node using steps S304 to S308. Although various operations performed on the written time series data are illustrated in the form of the flowchart 300 in fig. 3, a person skilled in the art may perform steps S304 to S308 in a different order according to the actual situation, or repeat or omit at least one of steps S304, S306, and S308 according to the actual situation.
In some embodiments, the written time series data may include key-value pairs, each made up of a timestamp associated with the data to be stored and the numerical value of that data. The key-value pair form allows the time series data to be stored efficiently and facilitates subsequent queries for the time series data.
In some embodiments, the written data may also include meta information and index information associated with the received data. The meta information can be used for fast screening of the data to be queried. For example, the meta information of a query can be used to quickly determine whether data having the same meta information as the data to be queried is stored in the edge node. If it is determined that the meta information of the data stored in the edge node is different from the meta information of the data to be queried, the step of reading the data stored on the disk may be omitted. The index information may be used at query time to quickly locate, among the stored data, data having the same properties as the data to be queried. The meta information may include, among other things, metric (Metric) information and field (Field) information of the received data. Metric information in a time series database is similar to a table (Table) in a relational database, and field information may be used to identify the fields that hold the measured values of the data. The index information may include identification information associated with the received data. For example, the index information may be used to represent identification (ID) information of a device used to collect or generate the received data. In some examples, time series data having the same index information may be stored sequentially and may be compressed by various efficient data encoding schemes.
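As a non-limiting illustration, the roles of the key-value pairs, the meta information, and the index information can be sketched with an in-memory structure. The class `EdgeStore`, its method names, and the choice of a Python set and dictionary are assumptions made for the example; an actual embodiment would persist these structures on disk.

```python
from bisect import bisect_left, bisect_right, insort
from typing import Dict, List, Set, Tuple

Sample = Tuple[int, float]  # key-value pair: (timestamp, value)


class EdgeStore:
    """Toy store with meta information, index information, and samples."""

    def __init__(self) -> None:
        # Meta information: which (metric, field) combinations exist at all.
        self.meta: Set[Tuple[str, str]] = set()
        # Index information: (metric, field, device id) -> ordered samples.
        self.index: Dict[Tuple[str, str, str], List[Sample]] = {}

    def write(self, metric: str, field: str, device_id: str,
              ts: int, value: float) -> None:
        self.meta.add((metric, field))
        series = self.index.setdefault((metric, field, device_id), [])
        insort(series, (ts, value))  # keep samples ordered by timestamp

    def query(self, metric: str, field: str, device_id: str,
              start: int, end: int) -> List[Sample]:
        # Fast screening: skip reading any samples if the meta information
        # shows that nothing matching the query is stored on this node.
        if (metric, field) not in self.meta:
            return []
        series = self.index.get((metric, field, device_id), [])
        lo = bisect_left(series, (start, float("-inf")))
        hi = bisect_right(series, (end, float("inf")))
        return series[lo:hi]


store = EdgeStore()
for ts in range(1, 6):
    store.write("temperature", "value", "dev-1", ts, 20.0 + ts)
```

Because samples with the same index key are kept contiguous and ordered, a time-range query reduces to two binary searches.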
In step S304, the received data may be synchronized to the cloud. Synchronizing the received data to the cloud makes it possible to reduce the amount of data stored at the edge node and to alleviate the storage pressure at the edge node. Further, the rate at which the received data is synchronized to the cloud may be determined based on the usage information of the edge node. By performing synchronization in a period in which the operating pressure of the edge node is small, the data read-write pressure at the edge node can be reduced.
In some embodiments, the synchronized data is in log form. In some implementations, in addition to storing data received at an edge node as time series data according to the method described in connection with fig. 2, data received at the edge node may be written to disk in the form of a log. For example, the received data may be written to disk based on Write Ahead Logging (WAL) techniques. Such log-form data retains the original information of the received data. Since the log-form data is not subjected to complicated encoding and compression processes, various information (such as meta information, index information, etc.) associated with the data can be easily extracted from the log-form data.
In addition, the data written to the disk in log form can also be used as a backup for the time series data stored at the edge node. If the equipment at the edge node encounters a fault and needs to be restarted, the data in the cache and the data which is not completely compressed and encoded can be lost. In this case, the data can be recovered from the log.
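As a non-limiting illustration, writing received data in log form and recovering it after a restart can be sketched as follows. The JSON-lines log format and the function names `append_to_log` and `replay_log` are assumptions made for the example and are not specified by the disclosure.

```python
import json
import os
import tempfile
from typing import Dict, Iterator


def append_to_log(path: str, record: Dict) -> None:
    """Append one record and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())


def replay_log(path: str) -> Iterator[Dict]:
    """Yield the records of the log in the order they were written."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as log:
        for line in log:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                break  # a torn final write: stop at the last complete record


# Usage: records survive even if the in-memory cache is lost.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "edge.wal")
    append_to_log(log_path, {"device": "sensor-1", "ts": 1, "value": 20.5})
    append_to_log(log_path, {"device": "sensor-1", "ts": 2, "value": 20.7})
    recovered = list(replay_log(log_path))
```

Since each record keeps its original fields, the same log could also be parsed on the cloud side to extract meta information and index information.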
In some examples, the synchronization rate may be set to a lower value during busy periods of the edge nodes and to a higher value during idle periods of the edge nodes.
As previously described, in embodiments provided by the present disclosure, data received by an edge node may be stored at the edge node in the form of time series data without being transmitted in real time to the cloud for storage. Therefore, the data synchronization rate can be set according to the usage information of the edge node, so that the read-write pressure of the edge node is reduced. For example, the synchronization rate may be set to a lower value during the daytime and to a higher value during the night.
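As a non-limiting illustration, one way to derive a synchronization rate from usage information is a linear interpolation between a maximum and a minimum rate. The function `sync_rate`, the use of CPU utilization as the usage information, the default bounds, and the linear mapping are all assumptions made for the example.

```python
def sync_rate(cpu_utilization: float,
              min_rate: int = 10, max_rate: int = 1000) -> int:
    """Records per second to send to the cloud; lower when the node is busy.

    cpu_utilization is assumed to be a fraction between 0.0 and 1.0 and is
    clamped to that range before use.
    """
    load = min(max(cpu_utilization, 0.0), 1.0)
    return round(max_rate - load * (max_rate - min_rate))


idle_rate = sync_rate(0.0)   # idle node: synchronize at the maximum rate
busy_rate = sync_rate(1.0)   # saturated node: fall back to the minimum rate
```

A schedule based on the time of day, as in the example above, could replace the utilization input without changing the rest of the synchronization logic.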
In some implementations, the log data can be sent to the cloud in a streaming manner. The cloud can parse the log data and write the data into a time series database in the cloud. The time series database in the cloud may be any time series database, such as InfluxDB, OpenTSDB, and the like.
In step S306, data before a predetermined time among the written data is cleared by an expiration mechanism. By reducing the data amount of data stored at the edge node, the storage pressure at the edge node can be reduced.
Because the storage capacity of the device at the edge node is limited, massive data cannot be stored like a cloud time sequence database, and the device generally does not have capacity expansion capacity, so that part of data stored at the edge node can be cleared through a preset expiration mechanism.
In some embodiments, the preset expiration mechanism may include clearing data older than a predetermined time at predetermined intervals. For example, portions of the data stored at the edge nodes (e.g., data stored more than a week ago) may be purged daily, every third day, or weekly (or at any other predefined interval).
In other embodiments, the preset expiration mechanism may include performing a purge of data prior to a predetermined time when the amount of stored data exceeds a predetermined data amount threshold.
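As a non-limiting illustration, the two expiration variants described above (periodic clearing and clearing triggered by a data amount threshold) can be sketched as follows. The function names, the sample layout, and the use of a sample count as the data amount are assumptions made for the example.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)


def purge_expired(samples: List[Sample], now: int,
                  retention: int) -> List[Sample]:
    """Keep only samples that are at most `retention` seconds old."""
    cutoff = now - retention
    return [sample for sample in samples if sample[0] >= cutoff]


def purge_if_over_limit(samples: List[Sample], now: int, retention: int,
                        max_samples: int) -> List[Sample]:
    """Purge old samples only once the stored amount exceeds a threshold."""
    if len(samples) <= max_samples:
        return samples
    return purge_expired(samples, now, retention)


WEEK = 7 * 24 * 3600
now = 10 * WEEK
samples = [(now - 2 * WEEK, 1.0), (now - WEEK, 2.0), (now - 60, 3.0)]
kept = purge_expired(samples, now, retention=WEEK)
```

The periodic variant would call `purge_expired` from a timer, while the threshold variant would call `purge_if_over_limit` after each write.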
In step S308, the method 300 may further include: determining a plurality of files smaller than a predetermined file size included in the written data, and merging the plurality of files smaller than the predetermined file size. By combining a plurality of small files into one large file, the number of storage files traversed during query can be reduced, and the query efficiency is improved.
In some cases, if little data is received at the edge node, the data may be written to disk in the form of multiple small files. In this case, since the number of files is large, the number of files to be traversed when performing a data query is also large, which reduces query efficiency. To improve the efficiency of querying data, a plurality of files smaller than a predetermined file size may be merged into one file when storing data, reducing the number of stored files.
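A minimal sketch of the merge step follows. It assumes, purely for illustration, that the stored files can be concatenated byte for byte; a real time-series file format would also need its index rebuilt. The function name, the `merged.dat` file name, and the size parameter are hypothetical.

```python
from pathlib import Path
from typing import List


def merge_small_files(directory: Path,
                      max_small_bytes: int,
                      merged_name: str = "merged.dat") -> List[Path]:
    """Concatenate every file smaller than `max_small_bytes` into one file.

    Returns the list of files that were merged and then deleted.
    """
    small = sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.name != merged_name
        and p.stat().st_size < max_small_bytes
    )
    if len(small) < 2:
        return []  # fewer than two small files: nothing to merge
    with open(directory / merged_name, "ab") as out:
        for path in small:
            out.write(path.read_bytes())
    for path in small:
        path.unlink()  # remove the originals only after the merged file is written
    return small
```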
FIG. 4 shows a flow diagram of a process for querying data, in accordance with an embodiment of the present disclosure. The data query flow 400 shown in fig. 4 may be implemented using an edge device as an edge node.
As shown in fig. 4, in step S402, a query request may be generated, wherein the query request includes a time range of data to be queried.
In some embodiments, the query request may include identification information associated with the data to be queried. In some implementations, the query request can be used to query a value of data within a predetermined period of time. In other implementations, the query request may be for numerical processing results (e.g., sums, averages, standard deviations, etc.) for data over a predetermined period of time.
In some embodiments, the query request may be generated in response to input from a user or in response to a request from a user terminal. Parameters of the query request, such as a time range associated with the query request, identification information, etc., may be determined in response to content input by the user or a request from the user terminal. In other embodiments, the query request may be generated based on a preset rule. For example, a command for generating a query request may be written within an application installed in an edge device or a user terminal to generate a query request for data at a predetermined timing or in response to the occurrence of a preset event.
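As an illustration of such a query request, the sketch below models a request carrying identification information, a time range, and an optional numerical processing operation. The class, the field names, and the aggregate keywords are assumptions made for this example; the disclosure does not define a request format.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class QueryRequest:
    device_id: str                   # identification information
    start: float                     # inclusive start of the time range
    end: float                       # inclusive end of the time range
    aggregate: Optional[str] = None  # None, "sum", "avg", or "stddev"


def run_query(request: QueryRequest,
              series: Dict[float, float]) -> Union[List[float], float]:
    """Evaluate a request against one device's timestamp -> value mapping."""
    values = [v for ts, v in sorted(series.items())
              if request.start <= ts <= request.end]
    if request.aggregate is None:
        return values                        # raw values in time order
    if request.aggregate == "sum":
        return sum(values)
    if request.aggregate == "avg":
        return statistics.mean(values)
    if request.aggregate == "stddev":
        return statistics.pstdev(values)     # population standard deviation
    raise ValueError(f"unsupported aggregate: {request.aggregate}")
```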
In step S404, the query request may be parsed to determine edge nodes associated with the data to be queried.
In some embodiments, the edge node may obtain parameters of the query request, such as a time range, identification information, and/or numerical processing results associated with the query request, by parsing the query request.
In some implementations, the edge node associated with the data to be queried can be determined based on identification information associated with the query request. In some examples, the identification information associated with the query request indicates that the data to be queried was collected or generated by the device indicated by the identification information. Therefore, the data to be queried may be stored in the disk of the device indicated by the identification information.
In step S406, data to be queried may be obtained based on the information of the associated edge node and the time range.
In response to determining that the edge node associated with the data to be queried includes at least two edge nodes, a cloud corresponding to the determined at least two edge nodes is accessed to obtain the data to be queried. Since the data to be queried involves at least two edge nodes, it is difficult to obtain the data to be queried by accessing the database at a single edge node. In this case, the data to be queried may be obtained by accessing the cloud corresponding to the determined at least two edge nodes.
In response to determining that the time range exceeds a preset time range threshold, a cloud corresponding to the edge node is accessed to obtain the data to be queried. As mentioned before, due to the limited storage capacity of the edge node, data stored in the edge node before a predetermined time may be cleared based on an expiration mechanism. Thus, if the time range associated with the query request exceeds the preset time range threshold, the data to be queried may already have been cleared from the edge node. In this case, the data to be queried can be obtained by accessing the cloud corresponding to the edge node.
In response to determining that the edge node associated with the data to be queried comprises a single edge node and that the time range does not exceed a preset time range threshold, accessing the single edge node to obtain the data to be queried. As previously described, in the case where the data to be queried is associated with a single edge node and the time range does not exceed the preset time range threshold, the data to be queried may be acquired at the associated edge node.
In the case where data to be queried is stored at an edge node, the data to be queried is acquired by accessing the edge node without accessing a cloud server. Because the network delay of the user for accessing the edge node is less than the network delay of the user for accessing the cloud server, better query performance can be achieved.
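The three cases of step S406 can be condensed into one routing function, sketched below. Interpreting "exceeds a preset time range threshold" as "reaches further back than the edge node's retention window" is an assumption of this example, as are all names.

```python
from enum import Enum
from typing import Sequence


class Target(Enum):
    CLOUD = "cloud"
    EDGE = "edge"


def route_query(edge_nodes: Sequence[str],
                range_start: float,
                now: float,
                retention_seconds: float) -> Target:
    """Decide where to fetch the data to be queried.

    edge_nodes: nodes associated with the data (from the parsed request).
    range_start: earliest timestamp requested (Unix seconds).
    retention_seconds: assumed stand-in for the preset time range threshold.
    """
    if len(edge_nodes) >= 2:
        return Target.CLOUD   # data spans several nodes
    if now - range_start > retention_seconds:
        return Target.CLOUD   # data may already be purged at the edge
    return Target.EDGE        # single node, recent data: lower latency
```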
Fig. 5 shows a schematic flow of an edge cloud collaborative data query method according to an embodiment of the present disclosure. In some embodiments, the steps shown in fig. 5 may be performed by an edge node.
As shown in fig. 5, at step S501, the query starts.
At step S502, it may be determined, by parsing the query request, whether the query request is directed to the time-series database in the cloud or to the edge time-series database at the edge node.
In a case where it is determined that the query request is directed to the cloud time-series database, the method may proceed to step S503, where the query is initiated to the cloud time-series database.
In the event that it is determined that the query request is not directed to the cloud time-series database, the method may proceed to step S504, where the query is initiated to the edge time-series database.
In step S505, it may be determined whether the data to be queried by the query request involves multiple edge nodes.
In the event that it is determined that the data to be queried involves multiple edge nodes, the method may proceed to step S503, initiating a query to the time-series database in the cloud.
In the event that it is determined that the data to be queried does not involve multiple edge nodes, the method may proceed to step S506.
In step S506, it may be determined whether the time range to be queried by the query request exceeds the data expiration time of the edge node.
In the event that it is determined that the time range to be queried exceeds the data expiration time of the edge node, the method may proceed to step S503, initiating a query to the time-series database in the cloud.
In the event that it is determined that the time range to be queried does not exceed the data expiration time of the edge node, the method may proceed to step S507.
In step S507, it may be determined whether data to be queried is stored at an edge node currently executing the method.
In the case that it is determined that the data to be queried is stored in the current node, the method may proceed to step S508, where the time series data storage engine of the current node is queried to obtain a query result.
In the event that it is determined that the data to be queried is stored at another edge node, the method may proceed to step S509, where a query is initiated to the other edge node indicated in the query request.
In step S510, a query result may be returned to the calling end. For example, the query results may be sent to an output device of the user terminal or edge node.
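The flow of Fig. 5 can be traced with the sketch below, which returns the sequence of steps visited for a given request. It assumes, consistent with step S406, that a query involving multiple edge nodes is sent to the cloud, and that every branch ends by returning a result at step S510. The function and parameter names are hypothetical.

```python
from typing import List, Sequence


def fig5_steps(explicit_cloud: bool,
               edge_nodes: Sequence[str],
               exceeds_expiration: bool,
               current_node: str) -> List[str]:
    """Return the step labels visited for one query.

    edge_nodes must contain at least one node name.
    """
    steps = ["S501", "S502"]
    if explicit_cloud:                      # request targets the cloud database
        return steps + ["S503", "S510"]
    steps += ["S504", "S505"]
    if len(edge_nodes) > 1:                 # data spans multiple edge nodes
        return steps + ["S503", "S510"]
    steps.append("S506")
    if exceeds_expiration:                  # data may be purged at the edge
        return steps + ["S503", "S510"]
    steps.append("S507")
    # Query the local storage engine, or forward to the node holding the data.
    steps.append("S508" if edge_nodes[0] == current_node else "S509")
    return steps + ["S510"]
```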
Fig. 6 shows a schematic block diagram of a data storage device at an edge node according to an embodiment of the present disclosure. As shown in fig. 6, the data storage device 600 may include a receiving unit 610, a buffering unit 620, a compressing unit 630, and a time-series data storage unit 640.
The receiving unit 610 may be configured to receive data to be stored. The buffering unit 620 may be configured to buffer the received data. The compression unit 630 may be configured to compress the same time series of data among the buffered data. The time series data storage unit 640 may be configured to write compressed data, where the written data includes a timestamp associated with the received data.
The operations of the units 610-640 of the data storage device 600 are similar to the operations of the steps S202-S208 described above, and are not described again.
By utilizing the data storage device executed at the edge node, time-series data can be stored at the edge node, which addresses the problem that data at the edge node cannot be synchronized to the cloud when the network connection or the cloud fails. By caching the data in memory, the number of read-write operations at the edge node can be reduced, improving write performance at the edge node. By compressing data of the same time series, the effective storage capacity at the edge node can be increased.
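The receive, buffer, compress, and write stages can be sketched together as follows. The disclosure does not name a compression algorithm; this example assumes JSON serialization followed by zlib compression, and an in-memory list stands in for disk blocks. The class name and the flush threshold are illustrative.

```python
import json
import zlib
from collections import defaultdict
from typing import List, Tuple

Point = Tuple[str, int, float]  # (series key, timestamp, value)


class EdgeWriter:
    def __init__(self, flush_threshold: int = 4):
        self.flush_threshold = flush_threshold
        self.buffer: List[Point] = []
        self.blocks: List[Tuple[str, bytes]] = []  # stand-in for on-disk blocks

    def receive(self, series: str, timestamp: int, value: float) -> None:
        """Buffer a point; flush once enough points have accumulated."""
        self.buffer.append((series, timestamp, value))
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Group buffered points by series, compress each group, write it."""
        grouped = defaultdict(list)
        for series, ts, value in self.buffer:
            grouped[series].append((ts, value))
        for series, points in grouped.items():
            payload = json.dumps(sorted(points)).encode("utf-8")
            self.blocks.append((series, zlib.compress(payload)))
        self.buffer.clear()


def read_block(block: bytes) -> list:
    """Decompress one block back into [timestamp, value] pairs."""
    return json.loads(zlib.decompress(block).decode("utf-8"))
```

Grouping by series before compression places similar values next to each other, which is generally what makes per-series compression effective.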
Fig. 7 shows a schematic block diagram of an apparatus for querying data according to an embodiment of the present disclosure. As shown in fig. 7, an apparatus 700 may comprise a query request generating unit 710, a parsing unit 720, and a query data acquisition unit 730.
The query request generating unit 710 may be configured to generate a query request, wherein the query request comprises a time range of the data to be queried. The parsing unit 720 may be configured to parse the query request to determine edge nodes associated with the data to be queried. The query data acquisition unit 730 may be configured to acquire data to be queried based on the information of the associated edge node and the time range.
The operations of the units 710-730 of the apparatus 700 are similar to the operations of the steps S402-S406, and are not repeated herein.
Fig. 8 shows a schematic block diagram of a database according to an embodiment of the present disclosure. The database shown in fig. 8 may be provided at an edge node and may be used to store data collected or generated by at least one edge node.
As shown in fig. 8, the database 800 may include a log storage unit 810. The log storage unit 810 may write the received data in the form of a log. For example, the received data may be written to disk based on Write Ahead Logging (WAL) techniques. The data stored in the log storage unit 810 can be used for backup and for synchronization to the cloud.
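A minimal sketch of write-ahead logging follows: each record is appended to a log file and forced to disk before the in-memory state is updated, so the state can be rebuilt after a restart. The JSON-lines log format and the `WalStore` class are assumptions for illustration, not the format used by the disclosed database.

```python
import json
import os
from typing import List, Tuple


class WalStore:
    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.memory: List[Tuple[int, float]] = []

    def write(self, timestamp: int, value: float) -> None:
        # 1. Append the record to the log and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([timestamp, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply it to the in-memory state.
        self.memory.append((timestamp, value))

    def recover(self) -> None:
        """Rebuild the in-memory state by replaying the log."""
        self.memory = []
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as log:
            for line in log:
                ts, value = json.loads(line)
                self.memory.append((ts, value))
```

The same log file could serve as the source for synchronization to the cloud, since it holds every accepted record in arrival order.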
The database 800 may further include a caching unit 820. The caching unit 820 may be used to cache the received data. In some embodiments, the data received by the edge node may first be cached in memory, and the cached data is written to disk after it has accumulated to a predetermined data amount.
The database 800 may also include a data storage unit 830. The data storage unit 830 may include a meta information storage sub-unit 831, an index information storage sub-unit 832, and a time-series data storage sub-unit 833. The meta information may include metric (Metric) information and domain (Field) information of the received data. The index information may include identification information associated with the received data. For example, the index information may represent identification (ID) information of a device used to collect or generate the received data. The time-series data may include key-value pairs, each consisting of a timestamp associated with the data to be stored and the numerical value of that data.
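The three sub-units can be pictured with the data model below. The class names, the dictionary layout, and the single default field name "value" are assumptions for illustration; the disclosure only specifies what kind of information each sub-unit holds.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MetaInfo:
    metric: str        # e.g. "temperature"
    fields: List[str]  # field names recorded for the metric


@dataclass
class DataStore:
    # Meta information: metric name -> metric and field description.
    meta: Dict[str, MetaInfo] = field(default_factory=dict)
    # Index information: device ID -> metrics reported by that device.
    index: Dict[str, List[str]] = field(default_factory=dict)
    # Time-series data: (device ID, metric) -> {timestamp: value}.
    series: Dict[Tuple[str, str], Dict[int, float]] = field(default_factory=dict)

    def write(self, device_id: str, metric: str,
              timestamp: int, value: float) -> None:
        self.meta.setdefault(metric, MetaInfo(metric, ["value"]))
        metrics = self.index.setdefault(device_id, [])
        if metric not in metrics:
            metrics.append(metric)
        self.series.setdefault((device_id, metric), {})[timestamp] = value
```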
Fig. 9 shows another schematic block diagram of a database according to an embodiment of the present disclosure.
As shown in fig. 9, the database 900 may include a log storage unit 910, a cache unit 920, a data storage unit 930, an edge cloud coordination unit 940, and a data maintenance unit 950. The log storage unit 910, the cache unit 920, and the data storage unit 930 may be implemented by the log storage unit 810, the cache unit 820, and the data storage unit 830 shown in fig. 8, and are not described herein again.
The edge cloud coordination unit 940 may be configured to synchronize data stored in the log storage unit 910 to a server in the cloud. A synchronization rate for synchronizing the received data to the cloud may be determined based on the usage information of the edge nodes.
The data maintenance unit 950 may be used to clear, through an expiration mechanism, data before a predetermined time from the written data. In some embodiments, the data maintenance unit 950 may also determine a plurality of files smaller than a predetermined file size included in the written data and merge the plurality of files smaller than the predetermined file size.
An exemplary embodiment of the present disclosure also provides a computer apparatus including: a memory, a processor and a computer program stored on the memory, wherein the processor is configured to execute the computer program to implement the steps of the method as described before.
The exemplary embodiments of the present disclosure also provide a non-transitory computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method as previously described.
Exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program realizes the steps of the method as described before when being executed by a processor.
Examples of such electronic devices and computer-readable storage media are described below with reference to fig. 10.
Fig. 10 illustrates an example configuration of a computing device 1000 as an electronic device that may be used to implement the modules and functions described herein. Computing device 1000 may be a variety of different types of devices, such as a server of a service provider, a device associated with a user terminal (e.g., a client device), a system on a chip, and/or any other suitable computing device or computing system. Examples of computing device 1000 include, but are not limited to: a desktop computer, a server computer, a notebook or netbook computer, a mobile device (e.g., a tablet or phablet device, a cellular or other wireless phone (e.g., a smartphone), a notepad computer, a mobile station), a wearable device (e.g., glasses, a watch), an entertainment device (e.g., an entertainment appliance, a set-top box communicatively coupled to a display device, a game console), a television or other display device, an automotive computer, and so forth. Thus, the computing device 1000 may range from a full resource device with substantial memory and processor resources (e.g., personal computers, game consoles) to a low-resource device with limited memory and/or processing resources (e.g., traditional set-top boxes, hand-held game consoles).
Computing device 1000 may include at least one processor 1002, memory 1004, communication interface(s) 1006, display device 1008, other input/output (I/O) devices 1010, and one or more mass storage devices 1012, capable of communication with each other, such as by system bus 1014 or other appropriate connection.
The processor 1002 may be a single processing unit or multiple processing units, all of which may include single or multiple computing units or multiple cores. The processor 1002 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitry, and/or any devices that manipulate signals based on operational instructions. The processor 1002 may be configured to retrieve and execute computer-readable instructions, such as program code for an operating system 1016, program code for an application 1018, program code for other programs 1020, and so forth, stored in the memory 1004, mass storage device 1012, or other computer-readable medium, among other capabilities.
The memory 1004 and mass storage devices 1012 are examples of computer storage media for storing instructions that are executed by the processor 1002 to implement the various functions described above. By way of example, the memory 1004 may generally include both volatile and nonvolatile memory (e.g., RAM, ROM, and the like). In addition, mass storage devices 1012 may generally include hard disk drives, solid state drives, removable media, including external and removable drives, memory cards, flash memory, floppy disks, optical disks (e.g., CDs, DVDs), storage arrays, network attached storage, storage area networks, and the like. Memory 1004 and mass storage 1012 may both be referred to herein collectively as memory or computer storage media, and may be non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code that may be executed by processor 1002 as a particular machine configured to implement the operations and functions described in the examples herein.
A number of program modules may be stored on the mass storage device 1012. These programs include an operating system 1016, one or more application programs 1018, other programs 1020, and program data 1022, and can be loaded into memory 1004 for execution. Examples of such applications or program modules may include, for instance, computer program logic (e.g., computer program code or instructions) for implementing the following components/functions: the receiving unit 610, the caching unit 620, the compressing unit 630, and the time series data storage unit 640 described in conjunction with fig. 6, the query request generating unit 710, the parsing unit 720, and the query data obtaining unit 730 described in conjunction with fig. 7, the log storage unit 810, the caching unit 820, and the data storage unit 830 described in conjunction with fig. 8, the log storage unit 910, the caching unit 920, the data storage unit 930, the edge cloud coordination unit 940, and the data maintenance unit 950 described in conjunction with fig. 9, the method 200, the method 300, the method 400 described in conjunction with fig. 2-4, and/or further embodiments described herein.
Although illustrated in fig. 10 as being stored in memory 1004 of computing device 1000, modules 1016, 1018, 1020, and 1022, or portions thereof, may be implemented using any form of computer-readable media that is accessible by computing device 1000. As used herein, "computer-readable media" includes at least two types of computer-readable media, namely computer storage media and communication media.
Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device.
In contrast, communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism. Computer storage media, as defined herein, does not include communication media.
Computing device 1000 may also include one or more communication interfaces 1006 for exchanging data with other devices, such as over a network, direct connection, etc., as discussed above. Such communication interfaces may be one or more of the following: any type of network interface (e.g., a Network Interface Card (NIC)), a wired or wireless interface (such as an IEEE 802.11 wireless LAN (WLAN) interface), a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth. Communication interface 1006 may facilitate communications within a variety of networks and protocol types, including wired networks (e.g., LAN, cable, etc.) and wireless networks (e.g., WLAN, cellular, satellite, etc.), the Internet, and so forth. Communication interface 1006 may also provide for communication with external storage devices (not shown), such as in a storage array, network attached storage, storage area network, or the like.
In some examples, a display device 1008, such as a monitor, may be included for displaying information and images to a user. Other I/O devices 1010 may be devices that receive various inputs from a user and provide various outputs to the user, and may include touch input devices, gesture input devices, cameras, keyboards, remote controls, mice, printers, audio input/output devices, and so forth.
While the disclosure has been illustrated and described in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative and exemplary and not restrictive; the present disclosure is not limited to the disclosed embodiments. Variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps not listed, the indefinite article "a" or "an" does not exclude a plurality, and the term "a plurality" means two or more. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims (18)

1. A data storage method performed at an edge node, comprising:
receiving data to be stored;
caching the received data;
compressing the data of the same time sequence in the cached data;
writing the compressed data, wherein the written data includes a timestamp associated with the received data.
2. The method of claim 1, wherein the written data comprises a key-value pair consisting of the timestamp and a value of the received data.
3. The method of claim 1, wherein the written data further comprises meta information and index information associated with the received data, wherein the meta information comprises metric information and domain information of the received data, and the index information comprises identification information associated with the received data.
4. The method of any of claims 1-3, further comprising:
synchronizing the received data to a cloud, wherein a synchronization rate for synchronizing the received data to the cloud is determined based on the usage information of the edge node.
5. The method of any of claims 1-3, further comprising:
clearing, through an expiration mechanism, data before a predetermined time among the written data.
6. The method of any of claims 1-3, further comprising:
determining a plurality of files smaller than a predetermined file size included in the written data;
and merging the plurality of files smaller than the predetermined file size.
7. A method for querying data, comprising:
generating a query request, wherein the query request includes a time range of data to be queried;
parsing the query request to determine edge nodes associated with data to be queried;
obtaining the data to query based on the information of the associated edge node and the time range.
8. The method of claim 7, wherein obtaining the data to query based on the information of the associated edge node and the time range comprises:
in response to determining that the edge nodes associated with the data to be queried include at least two edge nodes, accessing a cloud corresponding to the edge nodes to obtain the data to be queried.
9. The method of claim 7, wherein obtaining the data to query based on the information of the associated edge node and the time range comprises:
in response to determining that the time range exceeds a preset time range threshold, accessing a cloud corresponding to the edge node to obtain the data to be queried.
10. The method of claim 7, wherein obtaining the data to query based on the information of the associated edge node and the time range comprises:
in response to determining that the edge node with which the data to be queried is associated comprises a single edge node and that the time range does not exceed a preset time range threshold, accessing the single edge node to obtain the data to be queried.
11. A data storage device at an edge node, comprising:
a receiving unit configured to receive data to be stored;
a cache unit configured to cache the data to be stored;
a compression unit configured to compress the data of the same time sequence in the cached data;
a time series data storage unit configured to write compressed data, wherein the written data includes a time stamp associated with the written data.
12. An apparatus for querying data, comprising:
a query request generating unit configured to generate a query request, wherein the query request includes a time range of data to be queried;
a parsing unit configured to parse the query request to determine an edge node associated with data to be queried;
a query data obtaining unit configured to obtain the data to be queried based on the information of the associated edge node and the time range.
13. A database, comprising:
a log storage unit configured to save data to be stored in a form of a log;
a cache unit configured to cache the data to be stored;
a data storage unit comprising:
a meta information storing subunit configured to store meta information associated with data to be stored;
an index information storage subunit configured to store index information associated with data to be stored;
a time series data storage subunit configured to store key-value pairs made up of the data to be stored and a time stamp associated with the data to be stored.
14. The database of claim 13, further comprising:
an edge cloud synchronization unit configured to synchronize the data to be stored to a cloud.
15. The database of claim 13 or 14, further comprising:
a data maintenance unit configured to clear, through an expiration mechanism, data before a predetermined time among the written data.
16. A computer device, comprising:
a memory, a processor, and a computer program stored on the memory,
wherein the processor is configured to execute the computer program to implement the steps of the method of any one of claims 1-10.
17. A non-transitory computer readable storage medium having a computer program stored thereon, wherein the computer program when executed by a processor implements the steps of the method of any of claims 1-10.
18. A computer program product comprising a computer program, wherein the computer program realizes the steps of the method of any one of claims 1-10 when executed by a processor.
CN202011563163.2A 2020-12-25 2020-12-25 Data storage method, method for querying data, database and readable medium Pending CN112650755A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011563163.2A CN112650755A (en) 2020-12-25 2020-12-25 Data storage method, method for querying data, database and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011563163.2A CN112650755A (en) 2020-12-25 2020-12-25 Data storage method, method for querying data, database and readable medium

Publications (1)

Publication Number Publication Date
CN112650755A true CN112650755A (en) 2021-04-13

Family

ID=75363003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011563163.2A Pending CN112650755A (en) 2020-12-25 2020-12-25 Data storage method, method for querying data, database and readable medium

Country Status (1)

Country Link
CN (1) CN112650755A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113507369A (en) * 2021-06-18 2021-10-15 深圳先进技术研究院 Black box data access method based on block chain and cloud storage
CN113806307A (en) * 2021-08-09 2021-12-17 阿里巴巴(中国)有限公司 Data processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003273952A (en) * 2002-03-13 2003-09-26 Yazaki Corp Data transmitter and data transmission method
CN108399263A (en) * 2018-03-15 2018-08-14 北京大众益康科技有限公司 The storage of time series data and querying method and storage and processing platform
CN111309720A (en) * 2018-12-11 2020-06-19 北京京东尚科信息技术有限公司 Time sequence data storage method, time sequence data reading method, time sequence data storage device, time sequence data reading device, electronic equipment and storage medium
CN111552687A (en) * 2020-03-10 2020-08-18 远景智能国际私人投资有限公司 Time sequence data storage method, query method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination