CN111291047A - Space-time data storage method and device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN111291047A
Authority
CN
China
Prior art keywords
data
spatiotemporal
lake
storing
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010048464.5A
Other languages
Chinese (zh)
Inventor
李蔚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN202010048464.5A priority Critical patent/CN111291047A/en
Publication of CN111291047A publication Critical patent/CN111291047A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a spatiotemporal data storage method and apparatus, a storage medium, and an electronic device. Spatiotemporal data is first cached in a Kafka queue and then stored from the Kafka queue into a data lake through a storage component. Only one set of storage components needs to be developed, which reduces development and maintenance costs and simplifies the storage architecture, making data import simpler. The data in Kafka does not need structural conversion, which avoids data loss. In addition, no data schema has to be preset in the data lake: the received raw data is stored directly in the data storage layer, which provides a convenient way to export data and makes the raw data available for future data modeling and machine learning.

Description

Space-time data storage method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of data storage, and in particular, to a method and an apparatus for storing spatiotemporal data, a storage medium, and an electronic device.
Background
Spatiotemporal data generally consists of elements such as time, space (place), and things (people). Because of its particular characteristics, it can be point location data or trajectory data, and it is generated in many ways: for example, data collected by cameras in key places such as communities, railway stations, and ports, data generated by WiFi probes, data generated by fingerprint clock-in, and data collected via IMSI. The output forms are equally varied: some data is JSON or text files, some is relational database tables, some is real-time data streams, and some is compressed and encoded data. How to store this data so that it supports later data analysis, machine learning, and deep learning is a pressing problem.
In existing data storage approaches, batch processing and real-time stream processing are handled separately when data is imported into a data warehouse. Different components and development packages must be introduced, and two different programs may even have to be written, which makes implementation and maintenance harder and quietly increases the risk of architectural problems. Secondly, spatiotemporal data sources are diverse and include structured, unstructured, and semi-structured data, whereas a data warehouse can only store structured data. Data in these formats must therefore be converted into structured data. The conversion sometimes costs performance and sometimes loses precision, and some data cannot be converted into structured data at all and can only be discarded, which creates difficulties for later data analysis and data mining.
Disclosure of Invention
An object of the present application is to provide a spatio-temporal data storage method, apparatus, storage medium and electronic device, which solve the above problems.
In order to achieve the above purpose, the embodiments of the present application employ the following technical solutions:
In a first aspect, an embodiment of the present application provides a spatiotemporal data storage method, the method including:
caching spatiotemporal data into a Kafka queue, wherein the spatiotemporal data includes batch processing data and stream processing data; and
storing the spatiotemporal data from the Kafka queue into a data lake through a storage component, wherein the data lake is used for storing the batch processing data and the stream processing data while keeping the original format of the stored data.
In a second aspect, an embodiment of the present application provides a spatiotemporal data storage apparatus, the apparatus including:
a data caching unit, configured to cache spatiotemporal data into a Kafka queue, wherein the spatiotemporal data includes batch processing data and stream processing data; and
a processing unit, configured to store the spatiotemporal data from the Kafka queue into a data lake through a storage component, wherein the data lake is used for storing the batch processing data and the stream processing data while keeping the original format of the stored data.
In a third aspect, the present application provides a storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the method described above.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor and memory for storing one or more programs; the one or more programs, when executed by the processor, implement the methods described above.
Compared with the prior art, the spatiotemporal data storage method, apparatus, storage medium, and electronic device provided by the embodiments of the present application have the following beneficial effects. Spatiotemporal data is first cached in a Kafka queue and then stored from the Kafka queue into a data lake through a storage component. Only one set of storage components needs to be developed, which reduces development and maintenance costs and simplifies the storage architecture, making data import simpler. The data in Kafka does not need structural conversion, which avoids data loss. In addition, no data schema has to be preset in the data lake: the received raw data is stored directly in the data storage layer, which provides a convenient way to export data and makes the raw data available for future data modeling and machine learning.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and it will be apparent to those skilled in the art that other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a diagram of a conventional spatiotemporal data storage architecture provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a spatiotemporal data storage method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a spatiotemporal data storage method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the substeps of S103 according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of a spatiotemporal data storage method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the substeps of S102 according to an embodiment of the present application;
FIG. 8 is a schematic diagram of another substep of S102 according to an embodiment of the present application;
FIG. 9 is a schematic diagram of the units of a spatiotemporal data storage apparatus according to an embodiment of the present application.
In the figures: 10, processor; 11, memory; 12, bus; 13, communication interface; 201, data caching unit; 202, processing unit.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In the description of the present application, it should be noted that the terms "upper", "lower", "inner", "outer", and the like indicate orientations or positional relationships based on orientations or positional relationships shown in the drawings or orientations or positional relationships conventionally found in use of products of the application, and are used only for convenience in describing the present application and for simplification of description, but do not indicate or imply that the referred devices or elements must have a specific orientation, be constructed in a specific orientation, and be operated, and thus should not be construed as limiting the present application.
In the description of the present application, it is also to be noted that, unless otherwise explicitly specified or limited, the terms "disposed" and "connected" are to be interpreted broadly, e.g., as being either fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present application can be understood in a specific case by those of ordinary skill in the art.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Spatiotemporal data is data that has both temporal and spatial dimensions. It generally contains the following items. Entity ID: an identifier that uniquely identifies the entity, such as the entity's number. Time: the time at which the event occurred, which may include a start time and an end time. Position: the place where the event occurred, which can be represented by a place ID or by a specific longitude and latitude. Spatiotemporal data is generated in many ways: for example, data collected by cameras in key places such as communities, railway stations, and ports, data generated by WiFi probes, data generated by fingerprint clock-in, and data collected via IMSI. The output forms are equally varied: some data is JSON or text files, some is relational database tables, some is real-time data streams, and some is compressed and encoded data. How to store this data so that it supports later data analysis, machine learning, and deep learning is a pressing problem. IMSI (international mobile subscriber identity) is an identifier that distinguishes different users in a mobile network and is never repeated; JSON is a format for representing objects.
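The three items described above (entity ID, time, position) can be sketched as a small record type. This is only an illustration: the class name, field names, and validation rules are assumptions introduced here and do not come from the application.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SpatioTemporalRecord:
    """One spatiotemporal observation: who, when, and where (hypothetical layout)."""
    entity_id: str                        # uniquely identifies the entity
    start_time: datetime                  # when the event began
    end_time: Optional[datetime] = None   # optional end of the event
    place_id: Optional[str] = None        # position given as a place ID ...
    longitude: Optional[float] = None     # ... or as explicit coordinates
    latitude: Optional[float] = None

    def __post_init__(self) -> None:
        has_coords = self.longitude is not None and self.latitude is not None
        if self.place_id is None and not has_coords:
            raise ValueError("a record needs a place ID or a longitude/latitude pair")
        if self.end_time is not None and self.end_time < self.start_time:
            raise ValueError("end_time must not precede start_time")


# A point observation located by coordinates (made-up values).
sighting = SpatioTemporalRecord(
    entity_id="device-001",
    start_time=datetime(2020, 1, 16, 8, 30),
    longitude=116.40,
    latitude=39.90,
)
```

A trajectory could then be modelled as a time-ordered list of such records for one entity.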
The prior art provides one technical solution. Referring to FIG. 1, which shows an existing spatiotemporal data storage architecture: first, multi-source spatiotemporal data is connected through an ETL module. File-based data sources can be collected with a tool such as Kettle, and tables stored in a database can be collected by reading logs through Canal; both can be understood as batch processing. Stream data generated by sensing devices and the like can be received passively, by starting a socket service through Netty or a web service through an HTTP component, or pulled actively by a socket or HTTP client. Data in other formats can of course be collected by custom means. The collected data needs basic filtering and cleaning; this is not business-level filtering but removal of obviously dirty data. Furthermore, because all data stored in the data warehouse is structured, the data must be structured in advance. Data marts are built on top of the data warehouse: each is extracted from the data warehouse for a particular department, and each department's applications perform analysis, exploration, and data mining on their own data mart. The ETL module implements extraction, transformation, and loading from a source to a destination; Kettle is an open-source ETL tool; Canal is an open-source data synchronization component that works by parsing database logs; Netty is an open-source network communication component; a socket is an abstraction layer through which an application sends and receives data over a network.
After extensive practice and careful analysis, the inventor found that, in the existing storage process, batch processing and real-time stream processing are handled separately when ETL imports data into the data warehouse. Different components and development packages must be introduced, and two different programs may even have to be written. This makes implementation and maintenance harder and quietly increases the risk of architectural problems.
Secondly, spatiotemporal data sources are diverse: there is structured data (such as tables in a relational database), unstructured data (such as pictures and documents), and semi-structured data (such as XML and JSON files), whereas a data warehouse can only store structured data. Data in these formats must therefore be converted into structured data. The conversion sometimes costs performance and sometimes loses precision, and some data cannot be converted into structured data at all and can only be discarded, which creates difficulties for later data analysis and data mining.
Furthermore, the data schema in the data warehouse is designed according to specific requirements. During the data import phase, however, it may not yet be clear how the data will be used; the data simply has to be imported into the data warehouse. This raises the question of how to design tables and models so that they best meet the requirements. One approach is to build the base table structures and models in the data warehouse from past experience; the other is to put all information from the source data into one large wide table that copies as much of the original content as possible. Neither is optimal. Past experience does not necessarily cover new business and new requirements, so requirements may have to be cut, or can be met only through compromises that sacrifice performance. A large wide table retains the information of the raw data to some extent but can hardly replace it. This is especially evident when machine learning and deep learning extract features: once information in the raw data has been converted into the content of a structured wide table, it may differ considerably from the original, and a model trained on features extracted from the converted content will inevitably predict much less accurately.
An embodiment of the present application provides an electronic device, which may be a server. FIG. 2 is a schematic structural diagram of the electronic device. The electronic device includes a processor 10, a memory 11, and a bus 12. The processor 10 and the memory 11 are connected by the bus 12, and the processor 10 is configured to execute an executable module, such as a computer program, stored in the memory 11.
The processor 10 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the spatiotemporal data storage method may be completed by integrated logic circuits in hardware or by instructions in the form of software in the processor 10. The processor 10 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The memory 11 may include high-speed random access memory (RAM) and may further include non-volatile memory, such as at least one disk memory.
The bus 12 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Only one bidirectional arrow is shown in FIG. 2, but this does not mean that there is only one bus 12 or one type of bus 12.
The memory 11 is used for storing programs, such as the program corresponding to the spatiotemporal data storage apparatus. The spatiotemporal data storage apparatus includes at least one software functional module that may be stored in the memory 11 in the form of software or firmware or built into the operating system (OS) of the electronic device. Upon receiving an execution instruction, the processor 10 executes the program to implement the spatiotemporal data storage method.
Optionally, the electronic device provided by the embodiments of the present application further includes a communication interface 13. The communication interface 13 is connected to the processor 10 via the bus. The electronic device may receive data transmitted by other devices through the communication interface 13.
It should be understood that the structure shown in fig. 2 is merely a structural schematic diagram of a portion of an electronic device, which may also include more or fewer components than shown in fig. 2, or have a different configuration than shown in fig. 2. The components shown in fig. 2 may be implemented in hardware, software, or a combination thereof.
The spatiotemporal data storage method provided by the embodiments of the present application can be applied to, but is not limited to, the electronic device shown in FIG. 2. Referring to FIG. 3, the method includes:
S101, caching the spatiotemporal data into a Kafka queue.
Specifically, because spatiotemporal data comes from different sources, its types are not fully consistent. Spatiotemporal data includes batch processing data and stream processing data. Batch processing data includes, for example, file-type data and database-table-type data; stream processing data is mainly data collected in real time, such as portrait capture data, WiFi probe data, and IMSI data.
The Kafka queue can aggregate batch processing data and stream processing data and distinguish between them by using different topics.
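The routing idea in S101 can be sketched without a Kafka cluster. The following minimal sketch uses an in-memory stand-in for the queue; the topic names, the class, and the sample payloads are hypothetical, since the text only states that different topics are used to tell batch and stream data apart.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical topic names; the source only says that different topics are used.
BATCH_TOPIC = "spatiotemporal-batch"
STREAM_TOPIC = "spatiotemporal-stream"


class InMemoryQueue:
    """Stand-in for a Kafka cluster: each topic is an append-only list of raw payloads."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[bytes]] = defaultdict(list)

    def send(self, topic: str, payload: bytes) -> int:
        """Append the payload unchanged and return its offset within the topic."""
        self.topics[topic].append(payload)
        return len(self.topics[topic]) - 1


def cache_record(queue: InMemoryQueue, payload: bytes, is_stream: bool) -> int:
    """Route a raw record to the stream or batch topic without altering it."""
    topic = STREAM_TOPIC if is_stream else BATCH_TOPIC
    return queue.send(topic, payload)


queue = InMemoryQueue()
# Made-up sample payloads: one real-time event and one fragment of a file export.
cache_record(queue, b'{"probe": "wifi-7", "ts": 1579163400}', is_stream=True)
cache_record(queue, b"id,time,place\n7,2020-01-16T08:30:00,station-3\n", is_stream=False)
```

With a real cluster, `InMemoryQueue.send` would be replaced by a producer client call; the payload bytes would still be passed through unchanged.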
S102, storing the spatiotemporal data from the Kafka queue into a data lake through a storage component.
The data lake is used for storing the batch processing data and the stream processing data while keeping the original format of the stored data.
Specifically, the data lake can store both batch processing data and stream processing data. The storage component imports the data in the Kafka queue into the data lake without distinguishing between the two. The data in Kafka does not need structural conversion, which avoids data loss. Moreover, no data schema has to be preset in the data lake; the received raw data is stored directly in the data storage layer, which provides a convenient way to export data and makes the raw data available for future data modeling and machine learning.
In the prior art, importing batch data and stream data into a data warehouse requires different components. In the embodiments of the present application, only one set of data import components (namely, the storage component) needs to be developed, which reduces development and maintenance costs and simplifies the storage architecture, making data import simpler.
In summary, in the spatiotemporal data storage method provided by the embodiments of the present application, spatiotemporal data is first cached in a Kafka queue and then stored from the Kafka queue into a data lake through a storage component. Only one set of storage components needs to be developed, which reduces development and maintenance costs and simplifies the storage architecture, making data import simpler. The data in Kafka does not need structural conversion, which avoids data loss. In addition, no data schema has to be preset in the data lake: the received raw data is stored directly in the data storage layer, which provides a convenient way to export data and makes the raw data available for future data modeling and machine learning.
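The point that a single component can store heterogeneous payloads without a preset schema can be illustrated with a minimal sketch. It writes bytes to a local directory standing in for the data storage layer; the function name, file naming, and sample payloads are assumptions made for illustration only.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def store_raw(payloads: Iterable[bytes], lake_dir: Path, prefix: str) -> List[Path]:
    """Write each payload byte-for-byte into the lake directory; no schema is applied."""
    lake_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for offset, payload in enumerate(payloads):
        path = lake_dir / f"{prefix}-{offset:08d}.raw"
        path.write_bytes(payload)
        written.append(path)
    return written


# The same function handles a JSON event, a CSV fragment, and opaque binary data.
mixed = [
    b'{"entity": "cam-12", "ts": 1579163400}',
    b"id,time\n7,2020-01-16\n",
    bytes([0x1F, 0x8B, 0x08]),
]
with tempfile.TemporaryDirectory() as tmp:
    paths = store_raw(mixed, Path(tmp) / "lake", prefix="demo")
    round_trip = [p.read_bytes() for p in paths]
```

Because nothing is parsed on the way in, what is read back is identical to what was cached, which is the property the text relies on for later modeling and machine learning.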
Optionally, the data lake in the embodiments of the present application is implemented with the open-source Spark Delta Lake component. A data lake is a large repository that stores the various raw data of an enterprise, where the data can be accessed, processed, analyzed, and transmitted. A data lake can bring different kinds of data together, and the data in it can be analyzed without a predefined model (or schema). It can handle all types of data, including structured, unstructured, and semi-structured data, and has sufficient computing power to process and analyze them; it contains more relevant information, and that information is more likely to be accessed. Spark Delta Lake is an open-source storage-layer component for data lakes built on the Spark big data computing engine.
On the basis of fig. 3, for spatiotemporal data stored in a data lake, a possible implementation manner is further provided in the embodiments of the present application, please refer to fig. 4, where the spatiotemporal data storage method further includes:
S103, storing the spatiotemporal data in the data lake into an HDFS cluster.
The Hadoop Distributed File System (HDFS) cluster is used for storing various data for convenient user access. Storing the spatiotemporal data in the HDFS cluster and backing it up prevents data loss and keeps the data safe.
On the basis of fig. 4, for the content in S103, the embodiment of the present application further provides a possible implementation manner, please refer to fig. 5, where S103 includes:
S103-1, organizing the spatiotemporal data in the data lake into Parquet files.
Parquet is an efficient columnar storage format designed for querying. At query time, partition filtering and column pruning can be performed on specific fields, compression is supported, and each column of data can also store statistics such as its size, maximum value, and minimum value. These statistics can be returned directly without computation, which makes access convenient for users.
S103-2, storing the Parquet files into the HDFS cluster.
In one possible implementation, content in a Parquet file cannot be deleted in place. The Parquet files in the data lake can be modified only by deleting files or adding files.
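The benefit of per-column statistics can be shown with a toy columnar layout. This sketch is not the Parquet format itself and uses no Parquet library; the function names and sample rows are hypothetical. It only demonstrates how stored count/min/max values let a reader answer simple questions, or skip a chunk, without scanning the values.

```python
from typing import Any, Dict, List


def to_columnar(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Pivot row dicts into columns, attaching count/min/max statistics to each column."""
    columns: Dict[str, Dict[str, Any]] = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        columns[name] = {
            "values": values,
            "stats": {"count": len(values), "min": min(values), "max": max(values)},
        }
    return columns


def may_contain(chunk: Dict[str, Dict[str, Any]], column: str, low: Any, high: Any) -> bool:
    """Use only the stored min/max to decide whether a chunk can hold values in [low, high]."""
    stats = chunk[column]["stats"]
    return stats["max"] >= low and stats["min"] <= high


rows = [
    {"entity_id": "a", "ts": 100, "lon": 116.3},
    {"entity_id": "b", "ts": 160, "lon": 116.5},
    {"entity_id": "c", "ts": 130, "lon": 116.4},
]
chunk = to_columnar(rows)
latest_ts = chunk["ts"]["stats"]["max"]           # answered from statistics alone
skip_chunk = not may_contain(chunk, "ts", 200, 300)  # True: no value can match
```

Column pruning follows from the same layout: a query that needs only `ts` reads `chunk["ts"]` and never touches the other columns.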
In the scheme of the present application, the data lake stores not only the Parquet files but also logs. The purpose is to implement ACID transaction properties by means of the log, and thereby remedy the data inconsistency and low data quality found in data lakes. ACID denotes the four basic properties of database transactions: atomicity, consistency, isolation, and durability.
Optionally, every user operation on the stored data is recorded as a JSON file (log) and is used for rollback, data recovery, and returning to a previous version of the data. As the number of updates grows, however, so does the number of log files, and a large number of small files degrades system performance in big data processing. The old files therefore need to be merged: for example, when the number of log files reaches 10, a merge operation is invoked to combine the old files into one large file, which preserves system performance.
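A log of this kind can be sketched as numbered JSON commit files that record which data files were added or removed. The sketch below is an assumption-laden simplification, not Delta Lake's actual log protocol: the class, the entry layout, and the merge behaviour are hypothetical, and only the threshold of 10 is taken from the example in the text.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional, Set

CHECKPOINT_EVERY = 10  # merge threshold taken from the example in the text


class TransactionLog:
    """Append-only JSON commit log that tracks which data files make up the table."""

    def __init__(self, log_dir: Path) -> None:
        self.log_dir = log_dir
        self.log_dir.mkdir(parents=True, exist_ok=True)
        self.version = -1

    def commit(self, add: List[str], remove: List[str]) -> int:
        """Record one change as a new numbered JSON file, then merge if needed."""
        self.version += 1
        entry = {"version": self.version, "add": add, "remove": remove}
        (self.log_dir / f"{self.version:020d}.json").write_text(json.dumps(entry))
        if len(list(self.log_dir.glob("*.json"))) >= CHECKPOINT_EVERY:
            self._merge()
        return self.version

    def _merge(self) -> None:
        """Fold all small commit files into one snapshot (older versions are dropped here)."""
        snapshot = sorted(self.live_files())
        for path in self.log_dir.glob("*.json"):
            path.unlink()
        entry = {"version": self.version, "add": snapshot, "remove": []}
        (self.log_dir / f"{self.version:020d}.json").write_text(json.dumps(entry))

    def live_files(self, as_of: Optional[int] = None) -> Set[str]:
        """Replay the log in order to find the files visible at a given version."""
        live: Set[str] = set()
        for path in sorted(self.log_dir.glob("*.json")):
            entry = json.loads(path.read_text())
            if as_of is not None and entry["version"] > as_of:
                break
            live |= set(entry["add"])
            live -= set(entry["remove"])
        return live


with tempfile.TemporaryDirectory() as tmp:
    log = TransactionLog(Path(tmp) / "_log")
    log.commit(add=["part-0.parquet"], remove=[])
    log.commit(add=["part-1.parquet"], remove=["part-0.parquet"])  # rewrite = remove + add
    before_rewrite = log.live_files(as_of=0)   # view of the previous version
    current = log.live_files()
    for i in range(2, 10):
        log.commit(add=[f"part-{i}.parquet"], remove=[])
    log_files_after_merge = len(list(log.log_dir.glob("*.json")))
    files_after_merge = log.live_files()
```

In this simplified version the merge discards the per-commit history; a production design would typically retain old entries for some period so that earlier versions stay reachable.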
On the basis of fig. 4, regarding how to query the spatiotemporal data, the embodiment of the present application further provides a possible implementation manner, please refer to fig. 6, where the spatiotemporal data storage method further includes:
S104, establishing an index relation for the spatiotemporal data through an index component.
The spatiotemporal data includes at least time information and spatial information, and the index relation includes the correspondence between the time information and the spatial information.
Optionally, the index component is a GeoMesa component. GeoMesa reduces the three-dimensional data formed by the time information and the spatial information (location: longitude and latitude) to one-dimensional data by means of a hash algorithm, and the one-dimensional data is stored as the rowkey in HBase. Other data corresponding to the rowkey can be stored as columns in a column family, and lookups of spatiotemporal data are accelerated by HBase's fast location of data rows by rowkey.
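One common way to reduce longitude, latitude, and time to a single sortable key is Z-order (Morton) bit interleaving. The sketch below uses that technique as an illustration; it is not presented as the hash algorithm the text refers to, nor as GeoMesa's internal implementation. The function names, the 16-bit resolution, the weekly time bucket, and the key layout are all assumptions.

```python
def quantize(value: float, low: float, high: float, bits: int) -> int:
    """Map a value in [low, high] onto an integer grid cell in [0, 2**bits - 1]."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside [{low}, {high}]")
    cells = (1 << bits) - 1
    return int((value - low) / (high - low) * cells)


def interleave3(x: int, y: int, t: int, bits: int) -> int:
    """Interleave the bits of three integers into one Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((t >> i) & 1) << (3 * i + 2)
    return code


def row_key(lon: float, lat: float, epoch_s: int, bits: int = 16) -> str:
    """Build a fixed-width rowkey from longitude, latitude, and time (hypothetical layout)."""
    week = 7 * 24 * 3600  # assumed time bucket; time is encoded relative to it
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    t = quantize(epoch_s % week, 0, week, bits)
    # Week number first, then the interleaved code as 12 hex digits (3 * 16 bits).
    return f"{epoch_s // week:05d}-{interleave3(x, y, t, bits):012x}"


key = row_key(116.40, 39.90, 1579163400)
```

Because the bits of all three dimensions are interleaved, records that are close in space and time tend to share key prefixes, which is what makes range scans over a sorted rowkey store effective.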
Of course, a space-time data index structure (index component) can be customized to complete fast positioning of space-time data such as bit data or trajectory data, and creation and query of the space-time index are completed by means of an API for data import and analysis provided by Spark Delta Lake and Spark components.
On the basis of fig. 3, possibly, the storage component is a Spark Structured Streaming component, and for the content in S102, the embodiment of the present application further provides a possible implementation manner, please refer to fig. 7, where S102 includes:
S102-1, retrieving the stream processing data in the kafka queue.
Specifically, when stream processing data is buffered in the kafka queue, the storage component reads it. When no stream processing data is cached in the kafka queue, this action is not executed, which reduces the occupation of the resources of the processor 10.
S102-2, loading the stream processing data into DataFrame data.
The DataFrame is the abstract representation of a distributed data set in the Spark big data computing engine; it can be registered in Spark as a temporary view so that the data can be queried with SQL.
S102-3, storing the DataFrame data into the data lake.
Specifically, the storage component reads the stream processing data in the kafka queue, and stores the read data into the data lake through the storage API of the Spark Delta Lake component.
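Steps S102-1 to S102-3 can be sketched with the PySpark Structured Streaming API as follows. This is a minimal sketch under assumptions, not the applicant's implementation: the function name, the server address, topic, and paths are hypothetical, and `spark` is assumed to be an existing SparkSession started with the Kafka source and Delta Lake packages available.

```python
def stream_kafka_to_lake(spark, bootstrap_servers, topic,
                         lake_path, checkpoint_path):
    """Start a streaming query that copies Kafka records into a Delta table.

    All argument values are supplied by the caller; nothing here is
    prescribed by the source text.
    """
    # S102-1: subscribe to the topic; records are consumed as they arrive.
    stream_df = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )

    # S102-2: the Kafka source already yields a DataFrame; keep the payload
    # as a string so the original record content is preserved.
    records = stream_df.selectExpr("CAST(value AS STRING) AS value",
                                   "timestamp")

    # S102-3: append the DataFrame to the data lake in Delta format.
    return (
        records.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .start(lake_path)
    )
```

A caller would typically keep the returned query handle and call `awaitTermination()` on it; the checkpoint location lets the query resume from where it stopped after a restart.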
On the basis of fig. 3, possibly, the storage component is a Spark Structured Streaming component, and for the content in S102, the embodiment of the present application further provides a possible implementation manner, please refer to fig. 8, where S102 includes:
S102-4, storing the batch processing data within the preset interval of the kafka queue into the data lake at every preset time interval.
In the prior art, batch processing data has to be handled by simulating the processing flow of stream processing data: even if no batch processing data is cached in the kafka queue, repeated read attempts keep occupying the resources of the processor 10 without releasing them, which degrades the performance of the processor 10. In the embodiment of the present application, the batch processing data within the preset interval of the kafka queue is stored into the data lake at a preset time interval, so the resources of the processor 10 are not occupied continuously and the performance of the processor 10 is improved.
The start address and the end address of the preset interval can be set by a user instruction or determined according to a preset rule, which is not limited herein.
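A bounded batch read of this kind can be sketched with Spark's Kafka batch source, which accepts explicit starting and ending offsets as JSON. Treating the start and end addresses of the preset interval as per-partition Kafka offsets is an assumption, as are the function names and all argument values; `spark` is again assumed to be an existing SparkSession with the Kafka and Delta Lake packages available.

```python
import json


def offsets_json(topic: str, offsets_by_partition: dict) -> str:
    """Build the JSON string the Kafka source expects for explicit offsets,
    e.g. {"topic": {"0": 100, "1": 250}}."""
    partitions = {str(p): o for p, o in sorted(offsets_by_partition.items())}
    return json.dumps({topic: partitions})


def store_batch_interval(spark, bootstrap_servers, topic,
                         start_offsets, end_offsets, lake_path):
    """Read one bounded offset range from Kafka and append it to the lake.

    The job runs once and then returns, so it holds no resources between
    runs; no polling happens while the queue is idle.
    """
    batch_df = (
        spark.read.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", offsets_json(topic, start_offsets))
        .option("endingOffsets", offsets_json(topic, end_offsets))
        .load()
    )
    (
        batch_df.selectExpr("CAST(value AS STRING) AS value")
        .write.format("delta")
        .mode("append")
        .save(lake_path)
    )
```

Running the function every preset time is left to an external scheduler (for example a cron job); each run would pass the previous end offsets as the new start offsets.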
In one possible implementation, the data lake can export data to third party data platforms, data warehouses, machine learning platforms, and data analysis platforms by way of a data source through a Spark Structured Streaming component.
The Spark Structured Streaming component in the embodiment of the present application is a processing component in the Spark big data computing engine ecosystem that provides a unified API for batch and stream processing.
Referring to fig. 9, fig. 9 is a block diagram of a spatiotemporal data storage device according to an embodiment of the present application. Optionally, the spatiotemporal data storage device is applied to the electronic device described above.
The spatiotemporal data storage device includes: a data caching unit 201 and a processing unit 202.
The data caching unit 201 is used for caching spatio-temporal data into the kafka queue, wherein the spatio-temporal data comprises batch processing data and stream processing data. Specifically, the data caching unit 201 may perform S101 described above.
The processing unit 202 is used for storing the spatio-temporal data from the kafka queue into a data lake through the storage component, wherein the data lake is used for storing the batch processing data and the stream processing data while maintaining the original format of the stored data. Specifically, the processing unit 202 may execute S102 described above.
In one possible implementation, the processing unit 202 is further configured to store spatio-temporal data in the data lake into the HDFS cluster. Specifically, the processing unit 202 may execute S103 described above.
It should be noted that the spatiotemporal data storage device provided in this embodiment may execute the method flows shown in the above method flow embodiments to achieve the corresponding technical effects. For the sake of brevity, the corresponding contents in the above embodiments may be referred to where not mentioned in this embodiment.
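The division of work between unit 201 and unit 202 can be illustrated with a small in-memory sketch. It is only a structural illustration: a `deque` stands in for the kafka queue, plain lists stand in for the data lake and the HDFS cluster, and every class and method name is hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DataBufferUnit:
    """Stand-in for unit 201: buffers records in a queue (S101)."""
    queue: deque = field(default_factory=deque)

    def buffer(self, record: bytes, kind: str) -> None:
        if kind not in ("batch", "stream"):
            raise ValueError("kind must be 'batch' or 'stream'")
        self.queue.append((kind, record))


@dataclass
class ProcessingUnit:
    """Stand-in for unit 202: moves records into the lake (S102) and
    optionally copies the lake to long-term storage (S103)."""
    lake: list = field(default_factory=list)
    archive: list = field(default_factory=list)

    def store(self, buffer_unit: DataBufferUnit) -> int:
        """Drain the queue into the lake; records are stored unchanged."""
        moved = 0
        while buffer_unit.queue:
            self.lake.append(buffer_unit.queue.popleft())
            moved += 1
        return moved

    def archive_lake(self) -> int:
        """Copy the current lake content to the archive store."""
        self.archive = list(self.lake)
        return len(self.archive)
```

Both batch and stream records pass through the same queue and land in the same lake as raw bytes, which mirrors the requirement that the original format of the stored data is maintained.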
The embodiment of the invention also provides a storage medium, wherein the storage medium stores computer instructions and programs, and the computer instructions and the programs execute the space-time data storage method of the embodiment when being read and run. The storage medium may include memory, flash memory, registers, or a combination thereof, etc.
An electronic device, which may be a server, is provided below. The electronic device can implement the spatiotemporal data storage method as shown in fig. 2. Specifically, the electronic device includes: processor 10, memory 11, bus 12. The processor 10 may be a CPU. The memory 11 is used to store one or more programs which, when executed by the processor 10, perform the spatiotemporal data storage methods of the above embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (10)

1. A method of spatiotemporal data storage, the method comprising:
caching spatiotemporal data into a kafka queue, wherein the spatiotemporal data comprises batch processing data and stream processing data;
storing the spatiotemporal data from the kafka queue into a data lake through a storage component, wherein the data lake is used for storing the batch processing data and the stream processing data, and the original format of the stored data is maintained.
2. The spatiotemporal data storage method of claim 1, wherein after storing the spatiotemporal data from the kafka queue into a data lake through a storage component, the method further comprises:
and storing the spatiotemporal data in the data lake into the HDFS cluster.
3. The spatiotemporal data storage method of claim 2, wherein the step of storing spatiotemporal data in the data lake into an HDFS cluster comprises:
arranging the spatio-temporal data in the data lake, by partition, into a columnar file;
and storing the columnar file into the HDFS cluster.
4. The spatiotemporal data storage method of claim 2, wherein after storing the spatiotemporal data in the data lake into an HDFS cluster, the method further comprises:
and establishing an index relationship of the spatio-temporal data through an index component, wherein the spatio-temporal data at least comprises time information and space information, and the index relationship comprises a corresponding relationship between the time information and the space information.
5. The spatiotemporal data storage method of claim 1, wherein the step of storing the spatiotemporal data from the kafka queue into a data lake via a storage component comprises:
retrieving the stream processing data in the kafka queue;
loading the stream processing data into DataFrame data;
and storing the DataFrame data into the data lake.
6. The spatiotemporal data storage method of claim 1, wherein the step of storing the spatiotemporal data from the kafka queue into a data lake via a storage component comprises:
and storing the batch processing data in the kafka queue in a preset interval into the data lake every other preset time.
7. A spatiotemporal data storage device, the device comprising:
the data caching unit is used for caching spatio-temporal data into a kafka queue, wherein the spatio-temporal data comprises batch processing data and stream processing data;
and the processing unit is used for storing the spatiotemporal data from the kafka queue into a data lake through a storage component, wherein the data lake is used for storing the batch processing data and the stream processing data, and the original format of the stored data is maintained.
8. The spatiotemporal data storage device according to claim 7, wherein the processing unit is further used for storing the spatiotemporal data in the data lake into an HDFS cluster.
9. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-6.
10. An electronic device, comprising: a processor and memory for storing one or more programs; the one or more programs, when executed by the processor, implement the method of any of claims 1-6.
CN202010048464.5A 2020-01-16 2020-01-16 Space-time data storage method and device, storage medium and electronic equipment Pending CN111291047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010048464.5A CN111291047A (en) 2020-01-16 2020-01-16 Space-time data storage method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111291047A true CN111291047A (en) 2020-06-16

Family

ID=71025447

Country Status (1)

Country Link
CN (1) CN111291047A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200616