CN113254445B - Real-time data storage method, device, computer equipment and storage medium

Publication number
CN113254445B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110575138.4A
Other languages
Chinese (zh)
Other versions
CN113254445A (en)
Inventor
王慧云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heliang Technology Shanghai Co ltd
Shenzhen Lian Intellectual Property Service Center
Original Assignee
Heliang Technology Shanghai Co ltd
Application filed by Heliang Technology Shanghai Co ltd
Priority to CN202110575138.4A
Publication of CN113254445A
Application granted
Publication of CN113254445B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application belong to the field of big data and relate to a real-time data storage method, a device, computer equipment and a storage medium. The method comprises the following steps: receiving real-time data, and performing Json formatting on the real-time data to obtain a Json string message; inputting the Json string message into a message publishing system based on Kafka, and storing the Json string message in a Topic pre-created in the message publishing system based on Kafka; creating a Kafka consumer and setting a consumption Topic of the Kafka consumer, the consumption Topic pointing to the pre-created Topic; configuring a stream computing execution environment of a flink data stream API, and configuring the Kafka consumer as a data source in the stream computing execution environment; and calling the flink data stream API, and storing the real-time data into a preset Hbase database and a preset ES database through the data source. This flink-based, flexibly scalable method of consuming Kafka messages and storing the data in HBase and ES supports automatic expansion, shortens data insertion time, and makes indexing more convenient.

Description

Real-time data storage method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of big data technologies, and in particular, to a method and apparatus for storing real-time data, a computer device, and a storage medium.
Background
Streaming-data scenarios are increasingly common across industries: data reports, advertisement delivery and business process requirements in e-commerce and marketing; real-time sensor data acquisition and display, real-time alarms and transportation in the Internet of Things; base station traffic allocation in the telecommunications industry; and real-time settlement and notification pushing, real-time detection of abnormal behavior and the like in banking and finance. All of these require real-time processing of data.
Traditional real-time data processing technology suffers from long delays, inconvenient expansion, slow indexing and a large database footprint.
Disclosure of Invention
An embodiment of the application aims to provide a real-time data storage method, a device, computer equipment and a storage medium, so as to solve the problems of long delay time and inconvenient expansion of real-time data processing.
In order to solve the above technical problems, the embodiments of the present application provide a real-time data storage method, which adopts the following technical schemes:
receiving real-time data, and performing Json formatting on the real-time data to obtain a Json string message;
inputting the Json string message into a message publishing system based on Kafka and storing the Json string message in a Topic pre-created in the message publishing system based on Kafka;
creating a Kafka consumer and setting a consumption Topic of the Kafka consumer, the consumption Topic pointing to the pre-created Topic;
configuring a stream computing execution environment of a flink data stream API, and configuring the Kafka consumer as a data source in the stream computing execution environment;
and calling the flink data stream API, and storing the real-time data to a preset Hbase database and a preset ES database through the data source.
Further, before the step of calling the flink data stream API and saving the real-time data to a preset Hbase database and a preset ES database through the data source, the method further includes:
filtering the Json string message according to preset filtering conditions to obtain a filtered Json string message;
and calling the flink data stream API, and storing the filtered Json string message into a preset Hbase database and a preset ES database.
Further, before the step of calling the flink data stream API and saving the real-time data to a preset Hbase database and a preset ES database through the data source, the method further includes:
creating an ES index;
and calling the flink data stream API, and storing a field corresponding to the ES index in the Json string message to a preset ES database.
Further, the Json string message includes an ID field, the ID field is defined as a primary key, and the step of calling the flink data stream API and storing a field corresponding to the ES index in the Json string message in a preset ES database includes:
converting the ID into an ES-ID of the Json string message in the ES database through a hash algorithm;
and inserting a field corresponding to the ES index of the Json string message into a record corresponding to the ES-ID in the ES database.
Further, the Json string message includes an ID field, and before the step of calling the flink data stream API and saving the real-time data to a preset Hbase database and a preset ES database through the data source, the method further includes:
comparing an ID field in the Json string message with a preset ID validity rule;
and when the ID field accords with the preset ID validity rule, storing the Json string message to a preset Hbase database and a preset ES database through the data source.
Further, after the step of receiving the real-time data and performing Json formatting on the real-time data to obtain the Json string message, the method further includes:
the Json string message is stored into a blockchain.
In order to solve the above technical problems, the embodiments of the present application further provide a real-time data storage device, which adopts the following technical scheme:
the receiving module is used for receiving real-time data, and carrying out Json formatting on the real-time data to obtain a Json string message;
the message storage module is used for inputting the Json string message into a message publishing system based on Kafka and storing the Json string message in a Topic pre-created in the message publishing system based on Kafka;
the creating module is used for creating a Kafka consumer and setting a consumption Topic of the Kafka consumer, wherein the consumption Topic points to the pre-created Topic;
the configuration module is used for configuring a stream computing execution environment of the flink data stream API and configuring the Kafka consumer as a data source in the stream computing execution environment;
and the data storage module is used for calling the flink data stream API and storing the real-time data to a preset Hbase database and a preset ES database through the data source.
Further, the real-time data storage device further includes:
the first filtering submodule is used for filtering the Json string message according to preset filtering conditions to obtain a filtered Json string message;
and the first storage submodule is used for calling the flink data stream API and storing the filtered Json string message into a preset Hbase database and a preset ES database.
Further, the real-time data storage device further includes:
a first creating sub-module for creating an ES index;
and the second storage submodule is used for calling the flink data stream API and storing a field corresponding to the ES index in the Json string message into a preset ES database.
Further, the second storage sub-module further includes:
the first conversion sub-module is used for converting the ID into an ES-ID of the Json string message in the ES database through a hash algorithm;
and the first inserting sub-module is used for inserting the field corresponding to the ES index of the Json string message into the record corresponding to the ES-ID in the ES database.
Further, the real-time data storage device further includes:
the first comparison sub-module is used for comparing the ID field in the Json string message with a preset ID validity rule;
and the third storage submodule is used for storing the Json string message to a preset Hbase database and a preset ES database through the data source when the ID field accords with the preset ID validity rule.
Further, the real-time data storage device further includes:
a fourth storage sub-module for storing the Json string message into a blockchain.
In order to solve the above technical problems, the embodiments of the present application further provide a computer device, which adopts the following technical schemes:
a computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, implement the steps of the real-time data storage method as described above.
In order to solve the above technical problems, embodiments of the present application further provide a computer readable storage medium, which adopts the following technical solutions:
a computer readable storage medium having computer readable instructions stored thereon which when executed by a processor perform the steps of the real-time data storage method as described above.
Compared with the prior art, the embodiments of the application have the following main beneficial effects: real-time data is received and Json-formatted to obtain a Json string message; the Json string message is input into a message publishing system based on Kafka and stored in a Topic pre-created in that system; a Kafka consumer is created and its consumption Topic is set to point to the pre-created Topic; a stream computing execution environment of the flink data stream API is configured, with the Kafka consumer configured as a data source in that environment; and the flink data stream API is called to store the real-time data in a preset Hbase database and a preset ES database through the data source. This flink-based, flexibly scalable method of consuming Kafka messages and storing the data in HBase and ES supports automatic expansion, shortens data insertion time, and makes indexing more convenient.
Drawings
For a clearer description of the solution in the present application, a brief description will be given below of the drawings that are needed in the description of the embodiments of the present application, it being obvious that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a real-time data storage method according to the present application;
FIG. 3 is a schematic structural view of one embodiment of a real-time data storage device according to the present application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device according to the present application.
Description of the embodiments
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the description of the figures above are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to better understand the technical solutions of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the real-time data storage method provided by the embodiments of the present application is generally executed by the server/terminal device; accordingly, the real-time data storage device is generally arranged in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of a method of real-time data storage according to the present application is shown. The real-time data storage method comprises the following steps:
step S201, receiving real-time data, and performing Json formatting on the real-time data to obtain a Json string message.
In this embodiment, the electronic device on which the real-time data storage method operates (e.g., the server/terminal device shown in FIG. 1) may receive the real-time data through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G connections, WiFi connections, Bluetooth connections, WiMAX connections, ZigBee connections, UWB (ultra wideband) connections, and other now known or later developed wireless connection means.
The real-time data may be, for example, real-time sensor data acquisition and display or real-time alarms in the Internet of Things, real-time traffic flow information in the transportation industry, real-time base station traffic allocation in the telecommunications industry, or real-time settlement and notification pushing in banking and finance. The real-time data is formatted as Json, a lightweight data interchange format that stores and represents data as text, completely independent of any programming language. Its compact and clear hierarchical structure makes Json an ideal data exchange language: it is easy for people to read and write, easy for machines to parse and generate, and effectively improves network transmission efficiency. Json formatting can be implemented with general-purpose software.
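As a minimal sketch of the Json formatting in step S201, the following Python snippet serializes one record into a compact Json string message using only the standard library. The function name `to_json_message` and the sample fields `sensor` and `value` are illustrative assumptions and do not come from the application.

```python
import json


def to_json_message(record: dict) -> str:
    """Serialize one real-time record into a compact Json string message."""
    # ensure_ascii=False keeps non-ASCII text readable in the message;
    # sort_keys=True makes the output deterministic for the same record.
    return json.dumps(record, ensure_ascii=False, sort_keys=True,
                      separators=(",", ":"))


# Hypothetical sensor reading used only for illustration.
reading = {"Id": "56411767", "sensor": "temp-01", "value": 21.5}
message = to_json_message(reading)
print(message)
```

The resulting string can be parsed back with `json.loads(message)` on the consuming side, which is what makes the format independent of the producer's programming language.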
Step S202, inputting the Json string message into a message publishing system based on Kafka, and storing the Json string message in a Topic pre-created in the message publishing system based on Kafka.
Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the activity stream data of consumers on a website. One purpose of Kafka is to unify online and offline message processing through Hadoop's parallel loading mechanism. Each message published to the Kafka cluster has a category called a Topic. Physically, messages of different Topics are stored separately; logically, although the messages of one Topic are stored on one or more servers, a user only needs to specify the Topic of a message to produce or consume data, without caring where the data is stored. The Json string message is input to the Kafka-based message publishing system and stored in a Topic pre-created in that system.
In step S203, a Kafka consumer is created and a consumption Topic of the Kafka consumer is set, the consumption Topic pointing to the pre-created Topic.
A Kafka consumer (kafkaConsumer) is newly created and its consumption Topic is set to point to the pre-created Topic, meaning that the Kafka consumer fetches the data published in the pre-created Topic. Other attributes may also be set, including the Kafka broker address, i.e., the address of the server where the Topic is stored, and the consumption policy of the Kafka consumer, for example earliest. Each Topic contains one or more Partitions. Under the earliest policy, when a Partition has a committed offset (the offset being the message consumption position), consumption starts from the committed offset; when there is no committed offset, consumption starts from the beginning. Using this policy therefore does not lose data when the program is restarted.
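The consumer attributes and the earliest policy described above can be sketched as follows. The property keys `bootstrap.servers`, `group.id` and `auto.offset.reset` are standard Kafka consumer settings, but the values (broker address, group name) are hypothetical placeholders, and `starting_offset` is an illustrative helper that models the policy rather than Kafka's own code.

```python
from typing import Optional

# Consumer properties; the values are placeholders, not taken from the application.
consumer_properties = {
    "bootstrap.servers": "broker-host:9092",   # Kafka broker address (hypothetical)
    "group.id": "realtime-storage",            # consumer group (hypothetical)
    "auto.offset.reset": "earliest",           # consumption policy
}


def starting_offset(committed: Optional[int], log_start: int) -> int:
    """Model where one partition is read from under the 'earliest' policy."""
    if committed is not None:
        # A committed offset exists: resume from the consumption position.
        return committed
    # No committed offset: start from the beginning of the partition.
    return log_start


print(starting_offset(committed=42, log_start=0))
print(starting_offset(committed=None, log_start=0))
```

Because a restarted program either resumes at its committed offset or re-reads from the start of the partition, no message is skipped.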
Step S204, a stream computing execution environment of the flink data stream API is configured, and the Kafka consumer is configured as a data source in the stream computing execution environment.
The flink data stream API supports transformations (e.g., filters, aggregations, and window functions) on bounded or unbounded data streams and can be used from Java and Scala. Configuring the stream computing execution environment means declaring and assigning the variables, parameters and functions used by the flink data stream API calls. This includes configuring the data source parallelism, which is the maximum number of instructions or data items executed in parallel.
Step S205, call the said flink data stream API, save the said real-time data to the pre-set Hbase database and pre-set ES database through the said data source.
The flink data stream API is then called. The Kafka consumer has been defined as the data source and points to the pre-created Topic in the Kafka-based message publishing system, in which the real-time data is stored after Json formatting. The flink data stream API executes the sink operation from the data source to the destination, that is, the real-time data is stored in a preset Hbase database and a preset ES (Elasticsearch) database.
In the present application, real-time data is received and Json-formatted to obtain a Json string message; the Json string message is input into a message publishing system based on Kafka and stored in a Topic pre-created in that system; a Kafka consumer is created and its consumption Topic is set to point to the pre-created Topic; a stream computing execution environment of the flink data stream API is configured, with the Kafka consumer configured as a data source in that environment; and the flink data stream API is called to store the real-time data in a preset Hbase database and a preset ES database through the data source. This flink-based, flexibly scalable method of consuming Kafka messages and storing the data in HBase and ES supports automatic expansion, shortens data insertion time, and makes indexing more convenient.
In some optional implementations of this embodiment, before step S204, the electronic device may further perform the following steps:
and according to the partition number of the consumption Topic, configuring the parallelism of the data source to be consistent with the partition number.
Parallelism refers to the maximum number of instructions or data items executed in parallel. If resources allow, it is best set equal to the number of partitions of the Kafka Topic to be consumed, so that each parallel instance reads the data of one partition.
A flink program executes in a parallel, distributed manner.
During execution, a stream contains one or more stream partitions, and each operator contains one or more operator subtasks, which execute independently of each other in different threads, on different physical machines, or in different containers.
The number of subtasks of a particular operator is called its parallelism. The parallelism of a stream is always equal to that of the operator producing it.
Parallelism is a dynamic concept, i.e., the concurrency actually used by a TaskManager when running a program, and it can be configured through a parallelism parameter.
In the present application, the parallelism of the data source is configured to match the number of partitions of the consumption Topic, which improves the processing speed.
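To illustrate why matching the source parallelism to the partition count is a sensible default, the sketch below distributes partitions over source subtasks. The round-robin rule and the function name `assign_partitions` are assumptions made for illustration; they are not claimed to be the assignment algorithm Flink itself uses.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, parallelism: int) -> Dict[int, List[int]]:
    """Distribute partition indices over source subtasks round-robin (illustrative)."""
    assignment: Dict[int, List[int]] = {subtask: [] for subtask in range(parallelism)}
    for partition in range(num_partitions):
        assignment[partition % parallelism].append(partition)
    return assignment


# Parallelism equal to the partition count: one partition per subtask.
print(assign_partitions(num_partitions=4, parallelism=4))
# Lower parallelism: each subtask reads several partitions.
print(assign_partitions(num_partitions=4, parallelism=2))
# Higher parallelism: some subtasks stay idle.
print(assign_partitions(num_partitions=2, parallelism=4))
```

With fewer subtasks than partitions each subtask reads more than one partition, and with more subtasks than partitions some subtasks have nothing to read, so equal values use the resources most evenly.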
In some optional implementations, in step S204, the electronic device may perform the following steps:
checkpoints of the flink data stream API are enabled and configured.
Checkpoints are the core mechanism by which flink implements fault tolerance. Once enabled and configured, checkpoints periodically generate snapshots of the stream state according to the configuration, so that the state data is persisted at regular intervals. When a flink program crashes unexpectedly, it can be rerun and optionally recovered from these snapshots, correcting program data anomalies caused by the fault. The checkpoint mode is EXACTLY_ONCE by default. An HDFS path for saving checkpoints is set (it is created automatically when the program starts), and when the task is restarted, consumption of subsequent data can continue from the checkpoint, without relying on any other dedicated cache component for saving.
Where a Checkpoint is stored depends on the configured State Backend. By default, State is stored in the TaskManager memory and Checkpoints are stored in the JobManager memory. Flink also supports storing State and Checkpoints in other State Backends, which can be configured with StreamExecutionEnvironment.setStateBackend().
Flink provides different State Backends, which support different State storage modes and locations. By default, the option specified in the configuration file flink-conf.yaml is used.
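The snapshot-and-recover idea behind checkpoints can be sketched with a toy model in plain Python. This is not Flink code: the class `CheckpointedCounter` is hypothetical, the snapshot is kept in memory rather than on HDFS, and a checkpoint is triggered every `interval` messages instead of on a timer, purely to keep the example deterministic.

```python
import copy


class CheckpointedCounter:
    """Toy job that counts messages and snapshots its state periodically."""

    def __init__(self, interval: int) -> None:
        self.interval = interval
        self.state = {"offset": 0, "count": 0}
        self.snapshot = copy.deepcopy(self.state)  # last persisted checkpoint

    def process(self, messages) -> None:
        for _ in messages:
            self.state["offset"] += 1
            self.state["count"] += 1
            if self.state["offset"] % self.interval == 0:
                # Persist a consistent copy of the state.
                self.snapshot = copy.deepcopy(self.state)

    def recover(self) -> int:
        """Restore the last snapshot and return the offset to resume from."""
        self.state = copy.deepcopy(self.snapshot)
        return self.state["offset"]


job = CheckpointedCounter(interval=3)
job.process(range(7))          # snapshots are taken after messages 3 and 6
resume_from = job.recover()    # simulate a crash followed by a restart
print(resume_from, job.state)
```

After recovery the job resumes from offset 6, so only the one message processed after the last snapshot has to be read again.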
In some alternative implementations, before step S205, the electronic device may perform the following steps:
filtering the Json string message according to preset filtering conditions to obtain a filtered Json string message;
and calling the flink data stream API, and storing the filtered Json string message into a preset Hbase database and a preset ES database.
In this embodiment, the filtering condition for the Json string message is set according to actual requirements. For example, given the message format { "Id": "56411767", "name": "Zhang Sanj" }, where Id is the primary key of each message, the filtering condition is set to "ID is valid": after filtering, only Json string messages with a valid Id are retained, and the remaining messages are discarded.
In some embodiments, the ID field in the Json string message is compared with a preset ID validity rule, for example that the ID contains no special characters, forbidden words and the like; when the ID field conforms to the preset ID validity rule, the Json string message is stored into a preset Hbase database and a preset ES database through the data source, and otherwise the Json string message is discarded.
By setting filtering conditions and filtering the Json string messages accordingly, redundant invalid data is removed, storage space is saved, and data storage efficiency is improved.
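The filtering step can be sketched as a predicate applied to each Json string message before it reaches the sinks. The concrete rule below (IDs made only of ASCII letters and digits, plus a small forbidden-word set) is a hypothetical example of a "preset ID validity rule"; the application does not specify the rule, and the names `ID_RULE`, `FORBIDDEN_IDS`, `is_valid` and `filter_messages` are illustrative.

```python
import json
import re
from typing import List

# Hypothetical validity rule: only ASCII letters and digits, no special characters.
ID_RULE = re.compile(r"[0-9A-Za-z]+")
# Hypothetical forbidden words.
FORBIDDEN_IDS = {"null", "test"}


def is_valid(message: str) -> bool:
    """Return True if the message parses as Json and its Id passes the rule."""
    try:
        record = json.loads(message)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    id_value = record.get("Id")
    if not isinstance(id_value, str):
        return False
    return bool(ID_RULE.fullmatch(id_value)) and id_value.lower() not in FORBIDDEN_IDS


def filter_messages(messages: List[str]) -> List[str]:
    """Keep only the messages whose Id is valid; the rest are discarded."""
    return [m for m in messages if is_valid(m)]


incoming = [
    '{"Id": "56411767", "name": "a"}',
    '{"Id": "56#11", "name": "b"}',   # special character in the Id
    '{"name": "c"}',                  # missing Id
    'not json',                       # malformed message
]
kept = filter_messages(incoming)
print(kept)
```

In a stream job the same predicate would be applied as a filter transformation ahead of both sinks, so invalid records never reach Hbase or ES.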
In some alternative implementations, before step S205, the electronic device may perform the following steps:
creating an ES index;
and calling the flink data stream API, and storing a field corresponding to the ES index in the Json string message to a preset ES database.
In this embodiment, the fields to be saved in ES may be configured in a configuration file, and before the program runs, an ES index is created according to the actual type requirements.
If a field is added to subsequent messages, the corresponding field information is stored automatically once an ES mapping is added for that field and the field is added to the index fields in the configuration file.
Implementing the multi-field index by means of ES saves storage compared with secondary indexes based on HBase alone, shortens data insertion time, and makes indexing convenient.
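A minimal sketch of selecting only the configured index fields from a message before it is written to ES is shown below. The list `INDEX_FIELDS` stands in for the index-field entry of the configuration file, and `es_document` is a hypothetical helper; neither name comes from the application, and no ES client is involved.

```python
import json
from typing import Dict, List

# Stand-in for the index fields listed in the configuration file (hypothetical).
INDEX_FIELDS: List[str] = ["Id", "name"]


def es_document(message: str, index_fields: List[str]) -> Dict[str, object]:
    """Project a Json string message onto the fields that are indexed in ES."""
    record = json.loads(message)
    return {field: record[field] for field in index_fields if field in record}


message = '{"Id": "56411767", "name": "a", "payload": "large body kept only in Hbase"}'
print(es_document(message, INDEX_FIELDS))

# Adding a field later only requires extending the configured list
# (together with an ES mapping for that field on the ES side).
print(es_document(message, INDEX_FIELDS + ["payload"]))
```

Keeping only the searchable fields in ES while the full record goes to Hbase is what lets ES act as a compact multi-field index.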
In some alternative implementations, the electronic device may further perform the steps of:
converting the ID into an ES-ID of the Json string message in the ES database through a hash algorithm;
and inserting a field corresponding to the ES index of the Json string message into a record corresponding to the ES-ID in the ES database.
A hash algorithm is a mapping rule that maps binary strings of arbitrary length to binary strings of fixed length. A hash table is an array-based storage structure consisting mainly of a hash function and an array: when a data item is to be stored, its address is first computed with the hash function, and the item is then stored at that address in the array. The main characteristics of a hash algorithm are that the original data cannot be derived from the hash value, that it is sensitive to the input (changing even 1 bit of the input yields a different hash value), and that the probability of hash collision is small. Converting the ID of the Json string message into its ES-ID in the ES database through a hash algorithm allows the Json string message to be uniquely identified in the ES database, with hash collisions unlikely.
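The conversion from message ID to ES-ID can be sketched as follows. The application does not name a specific hash algorithm, so SHA-256 is an assumption chosen for illustration, and the in-memory dictionary `es_store` merely stands in for the ES database.

```python
import hashlib
from typing import Dict


def es_id(message_id: str) -> str:
    """Derive a fixed-length ES-ID from the message ID (SHA-256 assumed)."""
    return hashlib.sha256(message_id.encode("utf-8")).hexdigest()


# In-memory stand-in for the ES database, keyed by ES-ID.
es_store: Dict[str, dict] = {}


def upsert(message_id: str, indexed_fields: dict) -> str:
    """Insert the indexed fields into the record identified by the ES-ID."""
    key = es_id(message_id)
    es_store.setdefault(key, {}).update(indexed_fields)
    return key


first = upsert("56411767", {"name": "a"})
second = upsert("56411767", {"city": "b"})   # same ID maps to the same record
other = upsert("56411768", {"name": "c"})    # a different ID maps elsewhere
print(first == second, first == other, len(es_store))
```

Because the ES-ID is a deterministic function of the primary key, repeated messages with the same ID update one record instead of creating duplicates.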
It should be emphasized that, to further ensure the privacy and security of the real-time data, the Json string message representing real-time information may also be stored in a node of a blockchain.
The blockchain referred to in the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions, used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
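How cryptographically linked blocks make stored messages tamper-evident can be sketched with a simple hash chain. This toy model omits consensus, networking and everything else a real blockchain platform provides; the block layout and the names `make_block` and `verify` are illustrative assumptions, and SHA-256 is chosen only as an example.

```python
import hashlib
import json
from typing import List

GENESIS_HASH = "0" * 64  # placeholder hash for the block before the first one


def make_block(prev_hash: str, messages: List[str]) -> dict:
    """Build a block whose hash covers its messages and the previous hash."""
    body = json.dumps({"prev": prev_hash, "messages": messages}, sort_keys=True)
    return {
        "prev": prev_hash,
        "messages": messages,
        "hash": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }


def verify(chain: List[dict]) -> bool:
    """Check every block's link to its predecessor and its own hash."""
    prev = GENESIS_HASH
    for block in chain:
        expected = make_block(prev, block["messages"])["hash"]
        if block["prev"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True


block1 = make_block(GENESIS_HASH, ['{"Id": "56411767"}'])
block2 = make_block(block1["hash"], ['{"Id": "56411768"}'])
chain = [block1, block2]
print(verify(chain))

# Altering a stored message breaks the recomputed hash of its block.
chain[0]["messages"][0] = '{"Id": "00000000"}'
print(verify(chain))
```

Any change to an earlier message changes that block's hash, which in turn no longer matches the link recorded in the next block.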
The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by computer readable instructions stored in a computer readable storage medium, and that these instructions, when executed, may carry out the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (Read-Only Memory, ROM), or it may be a random access memory (Random Access Memory, RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of the steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a real-time data storage device, where the embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device may be specifically applied to various electronic devices.
As shown in fig. 3, the real-time data storage device 300 according to the present embodiment includes: a receiving module 301, a message storage module 302, a creating module 303, a configuring module 304 and a data storage module 305. Wherein:
the receiving module 301 is configured to receive real-time data, and perform Json formatting on the real-time data to obtain a Json string message;
a message storage module 302, configured to input the Json string message to a Kafka-based message publishing system, and store the Json string message in a Topic pre-created in the Kafka-based message publishing system;
a creation module 303, configured to create a Kafka consumer and set a consumption Topic of the Kafka consumer, where the consumption Topic points to the pre-created Topic;
a configuration module 304, configured to configure a stream computing execution environment of a flink data stream API, and configure the Kafka consumer as a data source in the stream computing execution environment;
and a data storage module 305, configured to call the flink data stream API and store the real-time data to a preset Hbase database and a preset ES database through the data source.
In this embodiment, real-time data is received and Json-formatted to obtain a Json string message; the Json string message is input into a Kafka-based message publishing system and stored in a Topic pre-created in that system; a Kafka consumer is created and its consumption Topic is set to point to the pre-created Topic; a stream computing execution environment of the flink data stream API is configured, and the Kafka consumer is configured as a data source in that environment; and the flink data stream API is called to store the real-time data to a preset Hbase database and a preset ES database through the data source. This flink-based method of consuming Kafka messages and storing the data in Hbase and ES is flexibly extensible and supports automatic expansion, shortens the time taken to insert data, and makes indexing more convenient.
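The data flow summarized above can be traced with a small in-memory sketch. It uses no Kafka, flink, Hbase, or ES client: a `deque` stands in for the pre-created Topic and two dictionaries stand in for the preset databases, so it mimics only the order of the steps, not the behavior of those systems. All names, and the assumption that every record carries an `id` key, are hypothetical.

```python
import json
from collections import deque

topic = deque()   # stand-in for the pre-created Kafka Topic
hbase_rows = {}   # stand-in for the preset Hbase database: row key -> Json string
es_docs = {}      # stand-in for the preset ES database: document ID -> parsed fields


def publish(record: dict) -> None:
    """Json-format a real-time record and append it to the topic."""
    topic.append(json.dumps(record, sort_keys=True))


def consume_and_store() -> int:
    """Drain the topic and write every message to both stores."""
    stored = 0
    while topic:
        message = topic.popleft()
        doc = json.loads(message)
        hbase_rows[doc["id"]] = message  # full Json string message
        es_docs[doc["id"]] = doc         # fields made available for indexing
        stored += 1
    return stored
```

Calling `publish` for each incoming record and then `consume_and_store` leaves one entry per message ID in each stand-in store.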
In some optional implementations of the present embodiment, the configuration module 304 includes:
the first filtering submodule is used for filtering the Json string message according to preset filtering conditions to obtain a filtered Json string message;
and the first storage submodule is used for calling the flink data stream API and storing the filtered Json string message into a preset Hbase database and a preset ES database.
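The filtering step performed before storage can be sketched as a predicate applied to each Json string message. The text does not specify the preset filtering conditions; the `has_positive_amount` rule and the `amount` field below are hypothetical examples, and dropping unparseable messages is an assumption of this sketch.

```python
import json
from typing import Callable, Iterable, Iterator


def filter_messages(
    messages: Iterable[str], keep: Callable[[dict], bool]
) -> Iterator[str]:
    """Yield only the Json string messages whose parsed content satisfies `keep`."""
    for message in messages:
        try:
            record = json.loads(message)
        except json.JSONDecodeError:
            continue  # assumption: messages that are not valid Json are dropped
        if isinstance(record, dict) and keep(record):
            yield message


def has_positive_amount(record: dict) -> bool:
    """Hypothetical preset condition: keep records with a positive numeric amount."""
    amount = record.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return False
    return amount > 0
```

Only the messages that pass the condition would then be handed to the storage step.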
In some optional implementations of the present embodiment, the configuration module 304 includes:
a first creating sub-module for creating an ES index;
and the second storage submodule is used for calling the flink data stream API and storing a field corresponding to the ES index in the Json string message into a preset ES database.
In some optional implementations of this embodiment, the second storage sub-module further includes:
the first conversion sub-module is used for converting the ID of the Json string message into an ES-ID of the Json string message in the ES database through a hash algorithm;
and the first inserting sub-module is used for inserting the field corresponding to the ES index of the Json string message into the record corresponding to the ES-ID in the ES database.
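The cooperation of the conversion and inserting sub-modules can be sketched with a dictionary standing in for the ES database. The index fields in `INDEX_FIELDS`, the use of SHA-256, and the choice to merge new fields into an existing record are all assumptions made for illustration; no Elasticsearch client is involved.

```python
import hashlib
import json

# Hypothetical set of fields covered by the ES index.
INDEX_FIELDS = ("id", "name", "status")


def to_es_id(message_id: str) -> str:
    """Hash the message ID into a fixed-length ES-ID (SHA-256 assumed)."""
    return hashlib.sha256(message_id.encode("utf-8")).hexdigest()


def upsert(es_store: dict, message: str) -> str:
    """Write the indexed fields of a Json string message into the record for its ES-ID."""
    record = json.loads(message)
    es_id = to_es_id(str(record["id"]))
    fields = {key: record[key] for key in INDEX_FIELDS if key in record}
    es_store.setdefault(es_id, {}).update(fields)  # merge is an assumption of this sketch
    return es_id
```

Because the ES-ID is derived deterministically from the ID, two messages carrying the same ID land in the same record instead of creating duplicates.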
In some alternative implementations of the present embodiment, the real-time data storage device 300 further includes:
the first comparison sub-module is used for comparing the ID field in the Json string message with a preset ID validity rule;
and the third storage submodule is used for storing the Json string message to a preset Hbase database and a preset ES database through the data source when the ID field accords with the preset ID validity rule.
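The ID validity check can be sketched as a rule applied to the ID field before a message is stored. The text does not define the preset ID validity rule; the regular expression `ID_RULE` and the field name `id` below are hypothetical.

```python
import json
import re

# Hypothetical validity rule: 1 to 64 letters, digits, underscores, or hyphens.
ID_RULE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def has_valid_id(message: str) -> bool:
    """Return True when the message parses as a Json object whose ID matches the rule."""
    try:
        record = json.loads(message)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    value = record.get("id")
    return isinstance(value, str) and ID_RULE.fullmatch(value) is not None
```

A message would be passed on to the Hbase and ES storage step only when `has_valid_id` returns `True`.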
Further, the real-time data storage device 300 further includes:
a fourth storage sub-module for storing the Json string message into a blockchain.
To solve the above technical problems, an embodiment of the present application further provides a computer device. Referring specifically to fig. 4, fig. 4 is a block diagram of the basic structure of the computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It should be noted that the figure shows only the computer device 4 with components 41-43, but it should be understood that not all of the illustrated components are required and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculations and/or information processing in accordance with predetermined or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, etc.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is typically used to store the operating system and the various application software installed on the computer device 4, such as the computer readable instructions of the real-time data storage method. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 42 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the computer readable instructions stored in the memory 41 or to process data, for example to execute the computer readable instructions of the real-time data storage method.
The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.
In this embodiment, real-time data is received and Json-formatted to obtain a Json string message; the Json string message is input into a Kafka-based message publishing system and stored in a Topic pre-created in that system; a Kafka consumer is created and its consumption Topic is set to point to the pre-created Topic; a stream computing execution environment of the flink data stream API is configured, and the Kafka consumer is configured as a data source in that environment; and the flink data stream API is called to store the real-time data to a preset Hbase database and a preset ES database through the data source. This flink-based method of consuming Kafka messages and storing the data in Hbase and ES is flexibly extensible and supports automatic expansion, shortens the time taken to insert data, and makes indexing more convenient.
The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of a real-time data storage method as described above.
In this embodiment, real-time data is received and Json-formatted to obtain a Json string message; the Json string message is input into a Kafka-based message publishing system and stored in a Topic pre-created in that system; a Kafka consumer is created and its consumption Topic is set to point to the pre-created Topic; a stream computing execution environment of the flink data stream API is configured, and the Kafka consumer is configured as a data source in that environment; and the flink data stream API is called to store the real-time data to a preset Hbase database and a preset ES database through the data source. This flink-based method of consuming Kafka messages and storing the data in Hbase and ES is flexibly extensible and supports automatic expansion, shortens the time taken to insert data, and makes indexing more convenient.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the embodiments of the present application.
It is apparent that the embodiments described above are only some, not all, of the embodiments of the present application. The drawings show preferred embodiments of the application but do not limit its patent scope. This application may be embodied in many different forms; the embodiments are provided so that the disclosure of this application will be understood more thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or substitute equivalents for some of their technical features. Any equivalent structure made using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, likewise falls within the protection scope of this application.

Claims (7)

1. A method of real-time data storage comprising the steps of:
receiving real-time data, and performing Json formatting on the real-time data to obtain a Json string message;
inputting the Json string message into a message publishing system based on Kafka and storing the Json string message in a Topic pre-created in the message publishing system based on Kafka;
creating a Kafka consumer and setting a consumption Topic of the Kafka consumer, the consumption Topic pointing to the pre-created Topic;
configuring a stream computing execution environment of a flink data stream API, and configuring the Kafka consumer as a data source in the stream computing execution environment;
filtering the Json string message according to preset filtering conditions to obtain a filtered Json string message; calling the flink data stream API, and storing the filtered Json string message into a preset Hbase database and a preset ES database; or, creating an ES index; calling the flink data stream API, and storing a field corresponding to the ES index in the Json string message to a preset ES database;
wherein the Json string message includes an ID field, the ID field being defined as a primary key, and the step of calling the flink data stream API and storing the field corresponding to the ES index in the Json string message to a preset ES database comprises:
converting the ID of the Json string message into the ES-ID of the Json string message in the ES database through a hash algorithm;
and inserting a field corresponding to the ES index of the Json string message into a record corresponding to the ES-ID in the ES database.
2. The method for storing real-time data according to claim 1, wherein the Json string message includes an ID field, and before the step of calling the flink data stream API to store the real-time data to a preset Hbase database and a preset ES database through the data source, the method further comprises:
comparing an ID field in the Json string message with a preset ID validity rule;
and when the ID field accords with the preset ID validity rule, storing the Json string message to a preset Hbase database and a preset ES database through the data source.
3. The method for storing real-time data according to claim 1, further comprising, after the step of receiving real-time data and performing Json formatting on the real-time data to obtain a Json string message:
storing the Json string message into a blockchain.
4. A real-time data storage device, comprising:
the receiving module is used for receiving real-time data, and carrying out Json formatting on the real-time data to obtain a Json string message;
the message storage module is used for inputting the Json string message into a message issuing system based on Kafka and storing the Json string message in a Topic pre-created in the message issuing system based on Kafka;
the creating module is used for creating a Kafka consumer and setting a consumption Topic of the Kafka consumer, wherein the consumption Topic points to the pre-created Topic;
the configuration module is used for configuring a stream computing execution environment of the flink data stream API and configuring the Kafka consumer as a data source in the stream computing execution environment;
the configuration module comprises: the first filtering submodule is used for filtering the Json string message according to preset filtering conditions to obtain a filtered Json string message; the first storage submodule is used for calling the flink data stream API and storing the filtered Json string message into a preset Hbase database and a preset ES database; or, the configuration module includes: a first creating sub-module for creating an ES index; the second storage sub-module is used for calling the flink data stream API and storing a field corresponding to the ES index in the Json string message to a preset ES database; the first conversion sub-module is used for converting the ID of the Json string message into the ES-ID of the Json string message in the ES database through a hash algorithm; and the first inserting sub-module is used for inserting the field corresponding to the ES index of the Json string message into the record corresponding to the ES-ID in the ES database.
5. The real-time data storage device of claim 4, wherein the configuration module comprises:
the first configuration submodule is used for configuring the parallelism of the data source to be consistent with the partition number according to the partition number of the consumption Topic;
and the second configuration sub-module is used for enabling and configuring check points of the flink data stream API.
6. A computer device comprising a memory having stored therein computer readable instructions which when executed by a processor implement the steps of the real time data storage method of any of claims 1 to 3.
7. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the real time data storage method of any of claims 1 to 3.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110575138.4A CN113254445B (en) 2021-05-26 2021-05-26 Real-time data storage method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110575138.4A CN113254445B (en) 2021-05-26 2021-05-26 Real-time data storage method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113254445A CN113254445A (en) 2021-08-13
CN113254445B true CN113254445B (en) 2024-01-05

Family

ID=77184427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110575138.4A Active CN113254445B (en) 2021-05-26 2021-05-26 Real-time data storage method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113254445B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360292B (en) * 2021-06-01 2024-03-15 北京百度网讯科技有限公司 Message processing method, device, electronic equipment, storage medium and program product
CN114610765B (en) * 2022-03-14 2024-05-03 平安国际智慧城市科技股份有限公司 Stream calculation method, device, equipment and storage medium
CN115460222A (en) * 2022-09-05 2022-12-09 蚂蚁区块链科技(上海)有限公司 Block chain data flow calculating device
CN115617495A (en) * 2022-12-06 2023-01-17 深圳安德空间技术有限公司 Ground penetrating radar data reasoning method and system based on distributed architecture

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681489A (en) * 2018-05-25 2018-10-19 西安交通大学 It is a kind of it is super calculate environment under mass data in real time acquisition and processing method
CN110502510A (en) * 2019-08-28 2019-11-26 南威软件股份有限公司 A kind of real-time analysis of WIFI terminal equipment track data and De-weight method and system
CN110928954A (en) * 2019-12-04 2020-03-27 深圳前海环融联易信息科技服务有限公司 HBase index synchronization method, HBase index synchronization device, computer equipment and storage medium
CN111966943A (en) * 2020-08-13 2020-11-20 上海哔哩哔哩科技有限公司 Streaming data distribution method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10783270B2 (en) * 2018-08-30 2020-09-22 Netskope, Inc. Methods and systems for securing and retrieving sensitive data using indexable databases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231206

Address after: Room A-1591, Building 3, No. 888, Jianhai Road, Chenjia Town, Chongming District, Shanghai 200085 (Shanghai Smart Island Data Industry Park)

Applicant after: Heliang Technology (Shanghai) Co.,Ltd.

Address before: 518000 Room 202, block B, aerospace micromotor building, No.7, Langshan No.2 Road, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: Shenzhen LIAN intellectual property service center

Effective date of registration: 20231206

Address after: 518000 Room 202, block B, aerospace micromotor building, No.7, Langshan No.2 Road, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Applicant after: Shenzhen LIAN intellectual property service center

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: PING AN PUHUI ENTERPRISE MANAGEMENT Co.,Ltd.

GR01 Patent grant