CN112182036A - Data sending and writing method and device, electronic equipment and readable storage medium - Google Patents

Info

Publication number
CN112182036A
Authority
CN
China
Prior art keywords
data
schema
source data
serialized
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010968620.XA
Other languages
Chinese (zh)
Inventor
林伟泽
孙藜
金山城
侯武庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Citic Bank Corp Ltd
Original Assignee
China Citic Bank Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Citic Bank Corp Ltd
Priority to CN202010968620.XA
Publication of CN112182036A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a data sending and writing method and apparatus, an electronic device, and a readable storage medium. The writing method comprises: obtaining serialized data from Kafka; determining the schema corresponding to the identification information carried by the serialized data; parsing the serialized data based on the schema to obtain source data; and determining whether an ES model corresponding to the source data exists and, if so, writing the source data into the ES model after format conversion. With this scheme, serialized data can be obtained from Kafka in real time, the corresponding schema is determined from the identification information, the serialized data is parsed based on that schema to obtain the source data, and the source data is format-converted and written into the corresponding ES model according to a mapping rule. This keeps the data format consistent from acquisition through processing and enables real-time parsing and conversion of business data.

Description

Data sending and writing method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for sending and writing data, an electronic device, and a readable storage medium.
Background
Business processes in a financial system are complex, and the business node systems generate a huge volume of business data. This data needs to be acquired promptly for processing and analysis in order to support operations.
At present, business data is mostly acquired periodically in batches for processing. This approach has poor timeliness and cannot meet operational requirements. In addition, the data collected from different systems may have different data structures, which makes the business data inconvenient to process and analyze and hinders its use.
Disclosure of Invention
The present application aims to solve at least one of the above technical drawbacks. The technical scheme adopted by the application is as follows:
In a first aspect, an embodiment of the present application provides a data sending method, where the method includes:
acquiring binlog data from a database of a monitored system;
parsing the binlog data to obtain source data;
serializing the source data, based on whether a schema matching the source data exists, to obtain serialized data, wherein the serialized data contains identification information of the schema;
publishing the serialized data to Kafka.
Optionally, serializing the source data based on whether a schema matching the source data exists includes:
determining whether a schema matching the source data exists in the local cache;
if so, serializing the source data using the schema;
if not, registering the schema and serializing the source data using the schema.
Optionally, registering the schema includes:
determining whether a registered schema with the same name as the schema exists;
if not, registering the schema;
if so, determining whether the data structure corresponding to the registered schema is consistent with the data structure of the source data; if not, registering the schema; and if so, using the registered schema as the schema.
In a second aspect, an embodiment of the present application provides a method for writing data, where the method includes:
obtaining serialized data from Kafka;
determining the schema corresponding to the identification information carried by the serialized data;
parsing the serialized data based on the schema to obtain source data;
determining whether an ES model corresponding to the source data exists and, if so, writing the source data into the ES model after format conversion.
Optionally, the writing method further includes:
determining whether the source data contains a target data field that does not exist in the ES model;
if so, extending the ES model with the target data field.
Optionally, the writing method further includes:
if no ES model corresponding to the source data exists, determining the source data to be abnormal data.
Optionally, the writing method further includes:
determining, according to a preset retry strategy, whether an ES model corresponding to the abnormal data exists.
In a third aspect, an embodiment of the present application provides an apparatus for transmitting data, where the apparatus includes:
a data acquisition module, configured to acquire binlog data from a database of a monitored system;
a source data acquisition module, configured to parse the binlog data to obtain source data;
a serialization module, configured to serialize the source data, based on whether a schema matching the source data exists, to obtain serialized data, wherein the serialized data contains identification information of the schema;
a data publishing module, configured to publish the serialized data to Kafka.
Optionally, the serialization module is specifically configured to:
determine whether a schema matching the source data exists in the local cache;
if so, serialize the source data using the schema;
if not, register the schema and serialize the source data using the schema.
Optionally, when registering the schema, the serialization module is specifically configured to:
determine whether a registered schema with the same name as the schema exists;
if not, register the schema;
if so, determine whether the data structure corresponding to the registered schema is consistent with the data structure of the source data; if not, register the schema; and if so, use the registered schema as the schema.
In a fourth aspect, an embodiment of the present application provides an apparatus for writing data, where the apparatus includes:
a data acquisition module, configured to obtain serialized data from Kafka;
a schema determining module, configured to determine the schema corresponding to the identification information carried by the serialized data;
a data parsing module, configured to parse the serialized data based on the schema to obtain source data;
a data writing module, configured to determine whether an ES model corresponding to the source data exists and, if so, write the source data into the ES model after format conversion.
Optionally, the apparatus further includes a field extension module, where the field extension module is configured to:
determine whether the source data contains a target data field that does not exist in the ES model;
if so, extend the ES model with the target data field.
Optionally, the apparatus further comprises:
an abnormal data determining module, configured to determine the source data to be abnormal data when no ES model corresponding to the source data exists.
Optionally, the apparatus further comprises:
an abnormal data retry module, configured to determine, according to a preset retry strategy, whether an ES model corresponding to the abnormal data exists.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory;
a memory for storing operating instructions;
a processor configured to perform the method as shown in any implementation of the first aspect or any implementation of the second aspect of the present application by calling an operation instruction.
In a sixth aspect, embodiments of the present application provide a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method shown in any implementation of the first aspect or any implementation of the second aspect of the present application.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
According to the scheme provided by the embodiments of the present application, serialized data is obtained from Kafka, and the schema corresponding to the identification information carried by the serialized data is determined; the serialized data is parsed based on the schema to obtain source data; and it is determined whether an ES model corresponding to the source data exists and, if so, the source data is written into the ES model after format conversion. With this scheme, serialized data can be obtained from Kafka in real time, the corresponding schema is determined from the identification information, the serialized data is parsed based on that schema to obtain the source data, and the source data is format-converted and written into the corresponding ES model according to a mapping rule. This keeps the data format consistent from acquisition through processing and enables real-time parsing and conversion of business data.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a data transmission method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a specific implementation of a data transmission method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of a data writing method according to an embodiment of the present application;
Fig. 4 is a block diagram of the overall logic architecture for data collection, transmission, and writing provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a data transmitting apparatus according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a data writing apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any one of, and all combinations of, one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
In the prior art, collected source data is generally a JavaScript Object Notation (JSON) string whose structure is not fixed, which makes the data inconvenient to parse and use, and changes in the data structure cannot be monitored during transmission.
In addition, in the prior art business data is mostly acquired periodically in batches for processing, which has poor timeliness and cannot meet operational requirements.
The embodiments of the present application provide a method and an apparatus for sending and writing data, an electronic device, and a readable storage medium, which aim to solve at least one of the above technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic flow diagram of a data transmission method provided in an embodiment of the present application, and as shown in fig. 1, the method mainly includes:
Step S110: obtaining binlog (binary log) data from a database of a monitored system;
Step S120: parsing the binlog data to obtain source data;
Step S130: serializing the source data, based on whether a schema matching the source data exists, to obtain serialized data, wherein the serialized data contains identification information of the schema;
Step S140: publishing the serialized data to Kafka.
In the embodiments of the present application, data changes in the database of the monitored system can be monitored in real time using Canal, and the source data can be obtained by parsing the binlog.
In the embodiments of the present application, after the source data has been parsed, Avro serialization can be performed according to the schema matching the source data to obtain the serialized data, and the serialized data is published to Kafka.
In the embodiments of the present application, the identification information identifies the schema, so that when the serialized data is deserialized, the schema can be obtained through the identification information and the serialized data can be parsed to recover the source data.
In the embodiments of the present application, the serialized data is published to Kafka so that a data consumer can obtain the data promptly, which improves the timeliness of data acquisition and meets operational requirements.
In the method provided by the embodiments of the present application, binlog data acquired from the database of the monitored system is parsed to obtain source data, and the source data is serialized based on the schema that matches it, yielding serialized data that is published to Kafka. Because the data is serialized before being published to Kafka and the schema used for serialization is identified in the data, the timeliness of business data improves, and a consumer has the basis it needs to obtain the serialized data from Kafka, parse it with the corresponding schema, and convert the source data into the required data format, which facilitates processing, analysis, and use of the business data.
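As an illustrative sketch only, the way serialized data can carry the identification information of its schema may be pictured as a small header placed in front of the encoded record. The frame layout below (one marker byte followed by a 4-byte schema ID), the function name `frame_record`, and the use of JSON in place of Avro binary encoding are assumptions made for illustration and are not taken from the present application.

```python
import json
import struct

# Hypothetical frame layout (an assumption, not specified by the application):
# 1 marker byte, a 4-byte big-endian schema ID, then the encoded record.
MARKER = 0
HEADER = struct.Struct(">bI")


def frame_record(schema_id: int, record: dict) -> bytes:
    """Prefix the encoded record with the ID of the schema used to encode it."""
    # JSON stands in for Avro binary encoding to keep the sketch dependency-free.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return HEADER.pack(MARKER, schema_id) + payload


message = frame_record(7, {"id": 1, "name": "alice"})
```

In a deployment, the framed bytes would then be handed to a Kafka producer; that step is omitted here because it depends on the client library in use.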
In an optional manner of the embodiment of the present application, serializing source data based on whether there is a schema matched with the source data includes:
determining whether a schema matching the source data exists in the local cache;
if so, serializing the source data using the schema;
if not, registering the schema and serializing the source data using the schema.
In the embodiments of the present application, when the source data is serialized based on a schema, it can first be determined whether a schema matching the source data exists in the local cache; if it does, the cached schema is used for processing.
If no schema matching the source data exists in the local cache, a schema matching the source data can be registered, and the source data is serialized using the registered schema. In practice, the registered schema can be stored in the local cache to avoid registering the same schema repeatedly.
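The cache-first lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `SchemaCache`, the `register` callback, and the use of the canonical JSON text of the schema as part of the cache key are hypothetical and are not prescribed by the present application.

```python
import json
from typing import Callable, Dict, Tuple


class SchemaCache:
    """Local cache that calls the registry only for schemas it has not seen."""

    def __init__(self, register: Callable[[str, dict], int]) -> None:
        # `register` is a hypothetical callback that registers a schema and
        # returns its identification information (here, an integer ID).
        self._register = register
        self._ids: Dict[Tuple[str, str], int] = {}

    def schema_id_for(self, name: str, schema: dict) -> int:
        # Key on name plus structure so a changed table triggers registration.
        key = (name, json.dumps(schema, sort_keys=True))
        if key not in self._ids:
            self._ids[key] = self._register(name, schema)
        return self._ids[key]
```

With such a cache, the registry is contacted once per distinct schema structure, and later records with the same structure are served locally.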
In an optional mode of the embodiment of the present application, registering the schema includes:
determining whether a registered schema with the same name as the schema exists;
if not, registering the schema;
if so, determining whether the data structure corresponding to the registered schema is consistent with the data structure of the source data; if not, registering the schema; and if so, using the registered schema as the schema.
In the embodiments of the present application, the current schema can be registered through a schema registry. Specifically, it may be determined whether a registered schema with the same name as the current schema exists in the schema registry. If not, the current schema is registered. If so, it is determined whether the data structure corresponding to the registered schema is consistent with the data structure of the source data corresponding to the current schema; if it is consistent, the registered schema is used for serialization, and if it is not, the current schema is registered.
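The registration decision above can be sketched with an in-memory stand-in for the schema registry. The class `InMemorySchemaRegistry` and its per-name version list are assumptions for illustration; a real schema registry exposes its own API, which is not reproduced here.

```python
from typing import Dict, List


class InMemorySchemaRegistry:
    """Hypothetical stand-in for a schema registry; versions are kept per name."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[dict]] = {}

    def register(self, name: str, schema: dict) -> int:
        """Return the 1-based version of `schema` under `name`.

        A schema is registered only if no registered schema with the same
        name has the same structure; otherwise the existing one is reused.
        """
        versions = self._versions.setdefault(name, [])
        for version, registered in enumerate(versions, start=1):
            if registered == schema:
                return version  # same name, same structure: reuse
        versions.append(schema)  # new name, or same name with a new structure
        return len(versions)
```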
Fig. 2 is a flowchart of a specific implementation of the data sending method according to an embodiment of the present application. The steps shown in Fig. 2 correspond to the method as follows. Target MySQL: changed data is collected from the database of the monitored system. Canal collects data: source data is obtained by acquiring and parsing the binlog. Obtain the database name, table name, operation type, and similar information: this is the information needed to match a schema. Obtain the schema used this time: the schema matching the format of the source data is obtained. Fill the data into the schema: the source data is filled into the schema. Perform Avro encoding: the source data filled into the schema is serialized. Send to Kafka: the serialized data is published to Kafka.
One table corresponds to one schema, and the schema name follows the rule database name_table name; that is, the table structure of each kind of source data and its schema are both named by database name and table name. All columns in the data are filled into newschema, and the schema name is used to query whether an oldschema exists for the table; that is, when source data is matched to a schema, all fields of the source data are filled into a new schema named after the source data, and the cache is searched by that name. If a schema with the same name exists in the cache, the hash codes of oldschema and newschema are compared; if they are the same, the schema in the cache is used to fill the source data.
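The naming rule and the comparison between the cached schema and the newly built schema can be sketched as follows. The helper names, the record layout of the schema dictionary, and the use of a SHA-256 digest of canonical JSON in place of an object hash code are assumptions made for illustration only.

```python
import hashlib
import json
from typing import Dict, Optional


def schema_name(database: str, table: str) -> str:
    """Apply the naming rule: database name, underscore, table name."""
    return f"{database}_{table}"


def build_schema(database: str, table: str, columns: Dict[str, str]) -> dict:
    """Fill all columns (name -> type) into a new schema for the table."""
    fields = [{"name": n, "type": t} for n, t in sorted(columns.items())]
    return {"type": "record", "name": schema_name(database, table), "fields": fields}


def fingerprint(schema: dict) -> str:
    # SHA-256 of canonical JSON stands in for the hash code comparison.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def can_reuse(old_schema: Optional[dict], new_schema: dict) -> bool:
    """True when a cached schema exists and has the same structure."""
    return old_schema is not None and fingerprint(old_schema) == fingerprint(new_schema)
```

When `can_reuse` returns False, the flow would fall through to the registration path, either because nothing is cached under that name or because the table structure has changed.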
If the cache contains no schema with the same name as the schema corresponding to the source data, or contains a schema with the same name but a different hash value, the schema can be registered with the schema registry. First, the registry is asked whether it holds a version whose structure is identical to that of newschema; that is, the registry is queried for the version information of a schema with the same structure as the schema corresponding to the source data. If such a version exists, the registered existschema is obtained through that version; that is, the schema matching the source data is obtained through the version information. If the registry holds no such version, the schema corresponding to the source data is registered. After the schema registry completes the registration, the source data can be filled into the schema.
Fig. 3 shows a schematic flowchart of a data writing method provided in an embodiment of the present application, and as shown in fig. 3, the method mainly includes:
Step S210: obtaining serialized data from Kafka;
Step S220: determining the schema corresponding to the identification information carried by the serialized data;
Step S230: parsing the serialized data based on the schema to obtain source data;
Step S240: determining whether an ES (Elasticsearch) model corresponding to the source data exists and, if so, writing the source data into the ES model after format conversion.
In the embodiments of the present application, serialized data can be obtained from Kafka, and the corresponding schema is obtained through the identification information, so that the serialized data is deserialized based on the schema to obtain the source data.
In the embodiments of the present application, the identification information identifies the schema, and the schema matching the source data can be obtained from the correspondence between the identification information and the schema, so that the source data can be parsed.
In practice, deserialization of the serialized data can be implemented based on KafkaDeserializationSchema.
In the embodiments of the present application, the serialized data can be obtained from Kafka and parsed promptly to obtain the source data, so that a data consumer can obtain the source data in time, which improves the timeliness of data acquisition and meets operational requirements.
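As a sketch of how a consumer might resolve the schema from the identification information before parsing, the example below reads a schema ID from the front of each message and looks it up in a table. The frame layout (one marker byte plus a 4-byte schema ID), the `SCHEMAS_BY_ID` table, and the use of JSON in place of Avro decoding are illustrative assumptions, not details given in the present application.

```python
import json
import struct

# Hypothetical frame layout: 1 marker byte, 4-byte big-endian schema ID, payload.
HEADER = struct.Struct(">bI")

# Hypothetical correspondence between identification information and schemas.
SCHEMAS_BY_ID = {7: {"name": "shop_orders", "fields": ["id", "name"]}}


def parse_message(message: bytes) -> dict:
    """Resolve the schema from the header, then decode the payload with it."""
    marker, schema_id = HEADER.unpack_from(message)
    if marker != 0:
        raise ValueError("unexpected marker byte")
    schema = SCHEMAS_BY_ID[schema_id]  # KeyError if the schema is unknown
    # JSON stands in for Avro decoding; only fields named by the schema are kept.
    payload = json.loads(message[HEADER.size:].decode("utf-8"))
    return {field: payload.get(field) for field in schema["fields"]}
```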
In the embodiments of the present application, after the source data is obtained, if an ES model corresponding to the source data exists, the source data may be written into the ES model after format conversion.
When the source data is written into the ES model, Flink stream windows combined with watermarks can be used to guarantee the time order of the data within each window, and the source data is written into the ES model in batches.
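The idea of ordering data within a window and releasing it in batches once a watermark has passed can be illustrated with a small simulation. This is plain Python and not Flink code; the class `WindowedBatcher`, the tumbling-window assignment, and the fixed lateness allowance are assumptions chosen to keep the sketch short.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, dict]  # (event time in ms, record)


class WindowedBatcher:
    """Buffers events per tumbling window and emits time-ordered batches."""

    def __init__(self, window_ms: int, max_lateness_ms: int) -> None:
        self.window_ms = window_ms
        self.max_lateness_ms = max_lateness_ms
        self._windows: Dict[int, List[Event]] = defaultdict(list)
        self._max_seen = 0

    def add(self, event_time: int, record: dict) -> List[List[dict]]:
        """Buffer one event; return batches for windows the watermark has closed."""
        start = event_time - event_time % self.window_ms
        self._windows[start].append((event_time, record))
        self._max_seen = max(self._max_seen, event_time)
        # The watermark trails the largest event time seen so far.
        watermark = self._max_seen - self.max_lateness_ms
        closed = sorted(s for s in self._windows if s + self.window_ms <= watermark)
        batches = []
        for s in closed:
            events = sorted(self._windows.pop(s), key=lambda e: e[0])
            batches.append([rec for _, rec in events])
        return batches
```

Each returned batch is sorted by event time and could be passed to a bulk write. In this simplified version, an event that arrives after its window has already been emitted is released as a separate batch rather than dropped.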
Because the source data is written into the ES model after being subjected to format conversion, the data formats are unified, the data can be obtained from the ES model for processing and analysis, and the use of the data is facilitated.
The method provided by the embodiments of the present application obtains serialized data from Kafka and determines the schema corresponding to the identification information carried by the serialized data; parses the serialized data based on the schema to obtain source data; and determines whether an ES model corresponding to the source data exists and, if so, writes the source data into the ES model after format conversion. With this scheme, serialized data can be obtained from Kafka in real time, the corresponding schema is determined from the identification information, the serialized data is parsed based on that schema to obtain the source data, and the source data is format-converted and written into the corresponding ES model according to a mapping rule. This keeps the data format consistent from acquisition through processing and enables real-time parsing and conversion of business data.
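Format conversion according to a mapping rule can be sketched as a table that maps each source column to a target field name and a converter. The mapping `ORDER_MAPPING`, its column and field names, and the choice of converters are hypothetical examples; the present application does not specify the content of the mapping rule.

```python
from datetime import datetime, timezone


def _millis_to_iso(ms) -> str:
    """Convert epoch milliseconds to an ISO 8601 UTC timestamp string."""
    return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc).isoformat()


# Hypothetical mapping rule: source column -> (target field name, converter).
ORDER_MAPPING = {
    "order_id": ("orderId", int),
    "amount": ("amount", float),
    "created_at": ("createdAt", _millis_to_iso),
}


def to_es_document(source: dict, mapping: dict) -> dict:
    """Build the document to write by applying the mapping rule to source data."""
    doc = {}
    for column, (field, convert) in mapping.items():
        if source.get(column) is not None:
            doc[field] = convert(source[column])
    return doc
```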
In an optional manner of the embodiment of the present application, the method further includes:
determining whether the source data contains a target data field that does not exist in the ES model;
if so, extending the ES model with the target data field.
In the embodiments of the present application, when the source data contains a target data field that the ES model does not contain, the ES model can be extended with the target data field.
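Field extension can be sketched as computing the fields present in the source data but absent from the model and adding them. The function names and the simple value-based type inference below are assumptions for illustration; how the field type is actually chosen is not stated in the present application.

```python
def infer_type(value) -> str:
    """Guess a field type from a sample value (an illustrative heuristic)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "keyword"


def missing_fields(source: dict, model_properties: dict) -> dict:
    """Return definitions for source fields that the model does not yet contain."""
    return {
        name: {"type": infer_type(value)}
        for name, value in source.items()
        if name not in model_properties
    }


def extend_model(model_properties: dict, source: dict) -> dict:
    """Return a copy of the model properties extended with the target fields."""
    extended = dict(model_properties)
    extended.update(missing_fields(source, model_properties))
    return extended
```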
In an optional manner of the embodiment of the present application, the method further includes:
if no ES model corresponding to the source data exists, determining the source data to be abnormal data.
In the embodiments of the present application, a correspondence between source data and ES models is configured in advance. If no ES model corresponding to the source data exists, the source data can be determined to be abnormal data. Abnormal data cannot be written directly into an ES model.
In an optional manner of the embodiment of the present application, the method further includes:
determining, according to a preset retry strategy, whether an ES model corresponding to the abnormal data exists.
In the embodiments of the present application, a retry strategy may be preset for abnormal data. For example, after each preset interval, it is determined again whether an ES model corresponding to the abnormal data exists; if the abnormal data still cannot be written into the ES model after the number of retries exceeds a specified number, the abnormal data may be classified as dirty data.
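A retry strategy of this kind can be sketched as a bounded loop. The function `retry_abnormal`, the `model_exists` and `wait` callbacks, and the returned labels are hypothetical names introduced for illustration only.

```python
from typing import Callable


def retry_abnormal(
    record: dict,
    model_exists: Callable[[dict], bool],
    max_retries: int,
    wait: Callable[[], None],
) -> str:
    """Re-check for a matching model up to `max_retries` times.

    Returns "writable" once a model is found, or "dirty" when every retry
    has failed. `wait` is called before each check to model the preset
    interval (for example, a function that sleeps for a fixed duration).
    """
    for _ in range(max_retries):
        wait()
        if model_exists(record):
            return "writable"
    return "dirty"
```

Passing `wait` as a callback keeps the sketch testable without real delays; in practice it might wrap `time.sleep` with the preset duration.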
In the embodiments of the present application, a data management system can be deployed to manage data acquisition and writing into the ES model, to monitor changes in the data structure, and to display the current data structure of the data.
Fig. 4 is a diagram illustrating an overall logical architecture of data collection, transmission, and writing provided by an embodiment of the present application.
The data source system includes a plurality of business subsystems that generate source data during business processing. The data acquisition and sending module subscribes to and monitors the source data provided by the data source system, corresponding to the data sending method provided by the embodiments of the present application. The source data is serialized and published to the Kafka acquisition bus. The Flink data parsing and writing module obtains the serialized data stream by subscribing to the Kafka bus, deserializes it, and writes the data that matches an ES model under the metadata rules into an ES index, corresponding to the data writing method provided by the embodiments of the present application. After the data has been processed by the data writing method, the batch processing module provides secondary extraction of data from the ES library, and the oas-app-application data service module can obtain real-time data from the ES library to provide data services externally.
Based on the same principle as the method shown in fig. 1, fig. 5 shows a schematic structural diagram of a data transmitting apparatus provided in an embodiment of the present application, and as shown in fig. 5, the data transmitting apparatus 30 may include:
the data acquisition module 310 is configured to obtain binlog data from a database of the monitored system;
a source data obtaining module 320, configured to analyze binlog data to obtain source data;
the serialization module 330 is configured to serialize the source data to obtain serialized data based on whether a schema matched with the source data exists, where identification information of the schema exists in the serialized data;
and a data publishing module 340, configured to publish the serialized data to Kafka.
In the apparatus provided by the embodiments of the present application, binlog data acquired from the database of the monitored system is parsed to obtain source data, and the source data is serialized based on the schema that matches it, yielding serialized data that is published to Kafka. Because the data is serialized before being published to Kafka and the schema used for serialization is identified in the data, the timeliness of business data improves, and a consumer has the basis it needs to obtain the serialized data from Kafka, parse it with the corresponding schema, and convert the source data into the required data format, which facilitates processing, analysis, and use of the business data.
Optionally, the serialization module is specifically configured to:
determining whether a schema matched with the source data exists in the local cache;
if so, serializing the source data by using the schema;
and if not, registering the schema, and serializing the source data by using the schema.
Optionally, when registering the schema, the serialization module is specifically configured to:
determining whether version information corresponding to the schema exists;
if yes, registering the schema through version information;
and if the version information does not exist, creating version information corresponding to the schema, and registering the schema through the version information.
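One possible reading of the version-based registration above is sketched below. The per-subject version history, the 1-based version numbers, and the name `register_with_version` are assumptions for illustration; the application does not specify how version information is stored.

```python
# Hypothetical in-memory registry: subject name -> list of schema versions.
_versions = {}


def register_with_version(subject, schema):
    """Register `schema` under `subject` and return (subject, version number)."""
    history = _versions.get(subject)
    if history is None:
        # No version information yet: create it with an empty history.
        history = _versions[subject] = []
    if schema not in history:
        history.append(schema)
    # Versions are 1-based positions in the subject's history.
    return subject, history.index(schema) + 1
```

Re-registering a schema that is already in the history returns its existing version instead of creating a new one.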
It is to be understood that the above modules of the data transmission apparatus in the present embodiment have functions of implementing the corresponding steps of the data transmission method in the embodiment shown in fig. 1. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above. The modules can be software and/or hardware, and each module can be implemented independently or by integrating a plurality of modules. For the functional description of each module of the above data sending apparatus, reference may be specifically made to the corresponding description of the data sending method in the embodiment shown in fig. 1, and details are not repeated here.
Based on the same principle as the method shown in fig. 3, fig. 6 shows a schematic structural diagram of a data writing device provided by an embodiment of the present application, and as shown in fig. 6, the data writing device 40 may include:
a data acquisition module 410 for acquiring serialized data from Kafka;
the schema determining module 420 is configured to determine a schema corresponding to the identification information carried by the serialized data;
the data analysis module 430 is configured to analyze the serialized data based on the schema to obtain source data;
the data writing module 440 is configured to determine whether an ES model corresponding to the source data exists, and if so, perform format conversion on the source data and write the source data into the ES model.
The apparatus provided in this embodiment of the present application obtains serialized data from Kafka and determines the schema corresponding to the identification information carried by the serialized data; it then analyzes the serialized data based on the schema to obtain source data, determines whether an ES model corresponding to the source data exists, and if so, performs format conversion on the source data and writes it into the ES model. With this scheme, serialized data can be obtained from Kafka in real time, the corresponding schema can be determined from the identification information, and the source data obtained by analysis can be format-converted and written into the corresponding ES model according to the mapping rule. This keeps the data format consistent from acquisition through processing and enables real-time analysis and conversion of the service data.
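The writing flow described above can be sketched as follows. This is only an illustration under stated assumptions: the message layout (a 4-byte schema ID followed by a JSON array), the `SCHEMAS` and `ES_MODELS` dictionaries, and the `es_index` list that stands in for an ES index are all hypothetical, and no Kafka or Elasticsearch client is used.

```python
import json
import struct

# Hypothetical stand-ins for the schema registry and the ES models (index mappings).
SCHEMAS = {1: {"table": "account", "fields": ["id", "name"]}}
ES_MODELS = {"account": {"id": int, "name": str}}
es_index = {"account": []}   # stand-in for the ES indices


def write_message(message):
    """Deserialize one message and write it to the matching ES model, if any."""
    # Determine the schema from the identification information in the message.
    (schema_id,) = struct.unpack(">I", message[:4])
    schema = SCHEMAS[schema_id]
    values = json.loads(message[4:].decode("utf-8"))
    source = dict(zip(schema["fields"], values))
    model = ES_MODELS.get(schema["table"])
    if model is None:
        return False  # no ES model corresponds to this source data
    # Format conversion: coerce each value to the type the model declares.
    doc = {name: cast(source[name]) for name, cast in model.items() if name in source}
    es_index[schema["table"]].append(doc)
    return True
```

Returning `False` rather than raising lets the caller decide how to treat source data that has no ES model.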
Optionally, the apparatus further includes a field extension module, where the field extension module is configured to:
determining whether a target data field exists in the source data, wherein the target data field does not exist in the ES model;
and if so, expanding the target data field of the ES model.
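The field extension step can be illustrated with a short sketch. The dictionary-based model and the rule of inferring the new field's type from the sample value are assumptions for this example; the application does not state how the type of an expanded field is chosen.

```python
def extend_model(model, source):
    """Add to `model` every field of `source` it lacks; return the added names."""
    added = [name for name in source if name not in model]
    for name in added:
        # Assumption: the new field's type is inferred from the sample value.
        model[name] = type(source[name]).__name__
    return added
```

For example, a model holding `id` and `name` gains an `email` field the first time a source record carries one, and later records with the same fields leave the model unchanged.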
Optionally, the apparatus further comprises:
and the abnormal data determining module is used for determining the source data as abnormal data when the ES model corresponding to the source data does not exist.
Optionally, the apparatus further comprises:
and the abnormal data retry module is used for determining whether an ES model corresponding to the abnormal data exists or not according to a preset retry strategy.
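A minimal sketch of handling abnormal data under a retry strategy follows. The attempt counter, the `max_attempts` limit, and the three-way split of the result are assumptions chosen for illustration; the application only states that the check is repeated according to a preset retry strategy.

```python
def retry_abnormal(abnormal, models, max_attempts=3):
    """Re-check abnormal records; return (resolved, pending, exhausted) lists."""
    resolved, pending, exhausted = [], [], []
    for record in abnormal:
        record["attempts"] = record.get("attempts", 0) + 1
        if record["table"] in models:
            resolved.append(record)    # an ES model now exists for this record
        elif record["attempts"] < max_attempts:
            pending.append(record)     # check again on the next retry pass
        else:
            exhausted.append(record)   # retry budget used up; left for manual handling
    return resolved, pending, exhausted
```

Each call represents one retry pass: resolved records can be written to their ES model, and pending records are fed into the next pass.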
It is to be understood that the above modules of the data writing apparatus in the present embodiment have functions of implementing the corresponding steps of the data writing method in the embodiment shown in fig. 3. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above. The modules can be software and/or hardware, and each module can be implemented independently or by integrating a plurality of modules. For the functional description of each module of the data writing device, reference may be specifically made to the corresponding description of the data writing method in the embodiment shown in fig. 3, and details are not repeated here.
The embodiment of the application provides an electronic device, which comprises a processor and a memory;
a memory for storing operating instructions;
and the processor is used for executing the method provided by any embodiment of the application by calling the operation instruction.
As an example, fig. 7 shows a schematic structural diagram of an electronic device to which an embodiment of the present application is applicable. As shown in fig. 7, the electronic device 2000 includes a processor 2001 and a memory 2003. The processor 2001 is coupled to the memory 2003, for example via a bus 2002. Optionally, the electronic device 2000 may also include a transceiver 2004. It should be noted that, in practical applications, the transceiver 2004 is not limited to one, and the structure of the electronic device 2000 does not constitute a limitation on the embodiments of the present application.
The processor 2001 is applied in the embodiments of the present application to implement the methods shown in the above method embodiments. The transceiver 2004 may include a receiver and a transmitter, and is applied in the embodiments of the present application to enable the electronic device to communicate with other devices.
The processor 2001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor 2001 may also be a combination that implements computing functions, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 2002 may include a path that conveys information between the aforementioned components. The bus 2002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 2002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
The memory 2003 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
Optionally, the memory 2003 is used for storing application program code for performing the aspects of the present application, and its execution is controlled by the processor 2001. The processor 2001 is used to execute the application program code stored in the memory 2003 to implement the methods provided in any of the embodiments of the present application.
The electronic device provided by the embodiment of the application is applicable to any embodiment of the method, and is not described herein again.
Compared with the prior art, the electronic device provided in this embodiment of the present application obtains serialized data from Kafka and determines the schema corresponding to the identification information carried by the serialized data; it then analyzes the serialized data based on the schema to obtain source data, determines whether an ES model corresponding to the source data exists, and if so, performs format conversion on the source data and writes it into the ES model. With this scheme, serialized data can be obtained from Kafka in real time, the corresponding schema can be determined from the identification information, and the source data obtained by analysis can be format-converted and written into the corresponding ES model according to the mapping rule. This keeps the data format consistent from acquisition through processing and enables real-time analysis and conversion of the service data.
The present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the method shown in the above method embodiments.
The computer-readable storage medium provided in the embodiments of the present application is applicable to any of the embodiments of the foregoing method, and is not described herein again.
Compared with the prior art, the computer-readable storage medium provided in this embodiment of the present application is used for obtaining serialized data from Kafka and determining the schema corresponding to the identification information carried by the serialized data; analyzing the serialized data based on the schema to obtain source data; and determining whether an ES model corresponding to the source data exists, and if so, performing format conversion on the source data and writing it into the ES model. With this scheme, serialized data can be obtained from Kafka in real time, the corresponding schema can be determined from the identification information, and the source data obtained by analysis can be format-converted and written into the corresponding ES model according to the mapping rule. This keeps the data format consistent from acquisition through processing and enables real-time analysis and conversion of the service data.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims (11)

1. A method for transmitting data, comprising:
acquiring binary log (binlog) data from a database of a monitored system;
analyzing the binlog data to obtain source data;
serializing the source data to obtain serialized data based on whether a schema matched with the source data exists, wherein the serialized data contains identification information of the schema;
and publishing the serialized data to Kafka.
2. The method of claim 1, wherein serializing the source data based on whether there is a schema that matches the source data comprises:
determining whether the schema matched with the source data exists in the local cache;
if so, serializing the source data by using the schema;
and if not, registering the schema, and serializing the source data by using the schema.
3. The method of claim 2, wherein registering the schema comprises:
determining whether a registered schema with the same name of the schema exists;
if not, registering the schema;
if yes, determining whether a data structure corresponding to the registered schema is consistent with the data structure of the source data, and if not, registering the schema; and if so, taking the registered schema as the schema.
4. A method for writing data, comprising:
obtaining serialized data from Kafka;
determining a schema corresponding to the identification information carried by the serialized data;
analyzing the serialized data based on the schema to obtain source data;
and determining whether a data copy ES model corresponding to the source data exists, if so, performing format conversion on the source data and writing the source data into the ES model.
5. The method of claim 4, further comprising:
determining whether a target data field is present in the source data, the target data field not being present in the ES model;
and if so, expanding the target data field of the ES model.
6. The method according to claim 4 or 5, characterized in that the method further comprises:
and if the ES model corresponding to the source data does not exist, determining the source data as abnormal data.
7. The method of claim 6, further comprising:
and determining whether an ES model corresponding to the abnormal data exists or not according to a preset retry strategy.
8. An apparatus for transmitting data, comprising:
the data acquisition module is used for acquiring binlog data from a database of the monitored system;
the source data acquisition module is used for analyzing the binlog data to acquire source data;
the serialization module is used for serializing the source data to obtain serialized data based on whether the schema matched with the source data exists or not, wherein the serialized data contains the identification information of the schema;
and the data issuing module is used for issuing the serialized data to the Kafka.
9. An apparatus for writing data, comprising:
a data acquisition module for acquiring serialized data from Kafka;
the schema determining module is used for determining a schema corresponding to the identification information carried by the serialized data;
the data analysis module is used for analyzing the serialized data based on the schema to obtain source data;
and the data writing module is used for determining whether an ES model corresponding to the source data exists, and if so, performing format conversion on the source data and writing the source data into the ES model.
10. An electronic device comprising a processor and a memory;
the memory is used for storing operation instructions;
the processor is used for executing the method of any one of claims 1-7 by calling the operation instruction.
11. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the method of any one of claims 1-7.
CN202010968620.XA 2020-09-15 2020-09-15 Data sending and writing method and device, electronic equipment and readable storage medium Pending CN112182036A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010968620.XA CN112182036A (en) 2020-09-15 2020-09-15 Data sending and writing method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010968620.XA CN112182036A (en) 2020-09-15 2020-09-15 Data sending and writing method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112182036A true CN112182036A (en) 2021-01-05

Family

ID=73921176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010968620.XA Pending CN112182036A (en) 2020-09-15 2020-09-15 Data sending and writing method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112182036A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112685426A (en) * 2021-01-21 2021-04-20 浪潮云信息技术股份公司 NiFi-based Kafka consumption NewSQL CDC stream data conversion method
CN113190528A (en) * 2021-04-21 2021-07-30 中国海洋大学 Parallel distributed big data architecture construction method and system
CN113704320A (en) * 2021-08-09 2021-11-26 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN113821532A (en) * 2021-09-29 2021-12-21 重庆富民银行股份有限公司 System and method for synchronizing data to heterogeneous data source based on mysql
CN113836579A (en) * 2021-09-26 2021-12-24 多点生活(成都)科技有限公司 Data processing method and device, electronic equipment and storage medium
US11861425B2 (en) 2021-05-19 2024-01-02 Red Hat, Inc. Runtime mapping of asynchronous application programming interface messaging topics and schemas

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109254982A (en) * 2018-08-31 2019-01-22 杭州安恒信息技术股份有限公司 A kind of stream data processing method, system, device and computer readable storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112685426A (en) * 2021-01-21 2021-04-20 浪潮云信息技术股份公司 NiFi-based Kafka consumption NewSQL CDC stream data conversion method
CN113190528A (en) * 2021-04-21 2021-07-30 中国海洋大学 Parallel distributed big data architecture construction method and system
CN113190528B (en) * 2021-04-21 2022-12-06 中国海洋大学 Parallel distributed big data architecture construction method and system
US11861425B2 (en) 2021-05-19 2024-01-02 Red Hat, Inc. Runtime mapping of asynchronous application programming interface messaging topics and schemas
CN113704320A (en) * 2021-08-09 2021-11-26 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium
CN113704320B (en) * 2021-08-09 2024-01-02 北京达佳互联信息技术有限公司 Data processing method, device, electronic equipment and storage medium
CN113836579A (en) * 2021-09-26 2021-12-24 多点生活(成都)科技有限公司 Data processing method and device, electronic equipment and storage medium
CN113836579B (en) * 2021-09-26 2024-04-09 多点生活(成都)科技有限公司 Data processing method and device, electronic equipment and storage medium
CN113821532A (en) * 2021-09-29 2021-12-21 重庆富民银行股份有限公司 System and method for synchronizing data to heterogeneous data source based on mysql

Similar Documents

Publication Publication Date Title
CN112182036A (en) Data sending and writing method and device, electronic equipment and readable storage medium
CN108932313B (en) Data processing method and device, electronic equipment and storage medium
EP3767483A1 (en) Method, device, system, and server for image retrieval, and storage medium
CN108334609B (en) Method, device, equipment and storage medium for realizing JSON format data access in Oracle
CN110263222B (en) Data acquisition method, device, equipment and medium
CN106648569B (en) Target serialization realization method and device
CN111143446A (en) Data structure conversion processing method and device of data object and electronic equipment
CN111737564A (en) Information query method, device, equipment and medium
CN111950857A (en) Index system management method and device based on service indexes and electronic equipment
US20190197123A1 (en) Metadata storage method, device and server
CN111258905A (en) Defect positioning method and device, electronic equipment and computer readable storage medium
CN113407565B (en) Cross-database data query method, device and equipment
CN114116842A (en) Multi-dimensional medical data real-time acquisition method and device, electronic equipment and storage medium
CN112860802A (en) Database operation statement processing method and device and electronic equipment
CN112506490A (en) Interface generation method and device, electronic equipment and storage medium
CN111241137A (en) Data processing method and device, electronic equipment and storage medium
CN115114297A (en) Data lightweight storage and search method and device, electronic equipment and storage medium
CN114328981A (en) Knowledge graph establishing and data obtaining method and device based on mode mapping
CN110471708B (en) Method and device for acquiring configuration items based on reusable components
CN108629003B (en) Content loading method and device
CN110569243A (en) data query method, data query plug-in and data query server
CN110866005A (en) Internet of things data acquisition management method and system, storage medium and terminal
CN112181943A (en) Characteristic data acquisition method and device, electronic equipment and readable storage medium
CN111274051B (en) Data processing method and device, electronic equipment and computer readable storage medium
CN113486627B (en) Single number generation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination