CN116303717A - Data acquisition format protocol conversion method and device - Google Patents


Info

Publication number
CN116303717A
Authority
CN
China
Prior art keywords
data
format
protocol
csv
serialization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310246430.0A
Other languages
Chinese (zh)
Inventor
张志华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd filed Critical Tianyi Cloud Technology Co Ltd
Priority to CN202310246430.0A priority Critical patent/CN116303717A/en
Publication of CN116303717A publication Critical patent/CN116303717A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Communication Control (AREA)

Abstract

The application relates to a data acquisition format protocol conversion method, a device, electronic equipment and a medium. The method comprises: converting a protocol of a first acquisition format into a protocol of the DATA-CSV format by constructing a DATA-CSV format that comprises Kafka serialization and deserialization and a Flink format; wherein constructing the Kafka serialization and deserialization comprises: constructing, through a serialization construction module, serialization and deserialization classes for a Kafka data stream; obtaining a created custom data format through a POJO serialization module, and converting the custom data format into a POJO class whose field names correspond to the custom data format; and converting the POJO class into the DATA-CSV format through a conversion module. The first acquisition format protocol includes at least the canal_json, maxwell_json and debezium_json protocols. The method can improve the disk utilization of data acquisition and storage and improve development efficiency.

Description

Data acquisition format protocol conversion method and device
Technical Field
The present disclosure relates to the field of data protocols, and in particular, to a method, an apparatus, an electronic device, and a medium for converting a data acquisition format protocol.
Background
Regardless of the stage of an enterprise's digital transformation, data acquisition and synchronization is its most practical and most frequent requirement.
On the one hand, the demand of refined enterprise operations for real-time data keeps growing. Real-time data helps enterprises collect data at the fastest speed from industrial sensors (such as machine rotation speed, temperature, pressure and flow), stock quotations, server logs, traditional databases and even Hadoop systems. Mining valuable information in real time or near real time is of great significance for enterprises to make decisions quickly.
On the other hand, with the intelligent upgrading of production equipment and related technologies and the constantly changing demands of the global market, the industry's requirements for real-time data acquisition and computation have been raised to the second level. The current batch-processing data architecture can hardly cope, and a new generation of real-time data architecture needs to be built to achieve a 'gear shift and acceleration'.
In real-time data acquisition and synchronization schemes, message middleware is used for peak shaving and valley filling, and many-to-many data synchronization scenarios exist. Three data acquisition formats are currently on the market: canal_json, maxwell_json and debezium_json. Their drawback is that the data redundancy is relatively large and unnecessary disk space is occupied. How to effectively simplify disk reads and writes, reduce disk IO and reduce memory consumption, thereby reducing the consumption of the server's overall resource pool, is a problem to be solved.
Disclosure of Invention
Based on the above problems, the application provides a data acquisition format protocol conversion method, a device, an electronic device and a medium.
In a first aspect, an embodiment of the present application provides a data acquisition format protocol conversion method, comprising:
converting a protocol of a first acquisition format into a protocol of the DATA-CSV format by constructing a DATA-CSV format that comprises Kafka serialization and deserialization and a Flink format;
wherein constructing the Kafka serialization and deserialization comprises:
constructing, through a serialization construction module, serialization and deserialization classes for a Kafka data stream; obtaining a created custom data format through a POJO serialization module, and converting the custom data format into a POJO class whose field names correspond to the custom data format; converting the POJO class into the DATA-CSV format through a conversion module;
the first acquisition format protocol includes at least: the canal_json, maxwell_json and debezium_json protocols.
Further, in the above data acquisition format protocol conversion method, constructing the Flink format of the DATA-CSV format comprises:
extracting data in the protocol of the first acquisition format through the formats and connectors of the Flink Table API;
converting the data in the first acquisition format protocol through Flink SQL to obtain DATA-CSV data;
performing SQL processing on the DATA-CSV data;
wherein the DATA-CSV format in the protocol of the DATA-CSV format is a delimited data format.
Further, in the above data acquisition format protocol conversion method, the rules of the protocol of the DATA-CSV format include at least the following seven:
1. the file does not begin with blank space and is organized in units of rows;
2. column names may or may not be included; if included, they are identified as column names;
3. one row of data does not span multiple lines, and there are no empty lines;
4. an invisible character is used as the separator, and an empty column must still be represented;
5. if an ASCII code exists in the column content, it is replaced with an escape sequence, and the field value is enclosed in half-width quotation marks;
6. when the file is read and written, the ASCII code handling rules are inverse to each other;
7. the internal encoding is not restricted.
Further, in the above data acquisition format protocol conversion method, the escape requirements in the protocol of the DATA-CSV format include at least the following three:
1. for a field containing an ASCII code corresponding to a type, an ASCII code corresponding to a key, or a line feed character, an escape character is added in front;
2. inside a field, an escape character is added in front of the ASCII code corresponding to a type and the ASCII code corresponding to a key, to escape the character quotation marks;
3. each specific field required for synchronization is mapped one-to-one to a corresponding ASCII code.
Further, in the above data acquisition format protocol conversion method, mapping each specific field required for synchronization one-to-one to a corresponding ASCII code comprises:
the data source corresponds to (ASCII code 0x0B);
the log acquisition time ts_ms corresponds to (ASCII code 0x0C);
the operation type op corresponds to (ASCII code 0x0D);
the metadata schema corresponds to (ASCII code 0x0F).
Further, in the above data acquisition format protocol conversion method, a file in the protocol of the DATA-CSV format is a text file delimited by line feed characters;
the text file stores tabular data in plain text form;
a text file is a sequence of characters;
the text file consists of a plurality of records separated by a line feed character; each record consists of fields, and the separator between fields is a character or a character string; all records have the same field sequence;
the text file can be opened with WordPad or Notepad;
each record in the text file is one line, terminated by a line feed (ASCII code 0x0A) or by a carriage return followed by a line feed (ASCII codes 0x0D 0x0A);
wherein 0x0A represents the character '\n' and 0x0D0A represents the character string "\r\n" in C#.
Further, in the above data acquisition format protocol conversion method, field values in the protocol of the DATA-CSV format are of multiple types, and each type is assigned a corresponding ASCII code;
a field value in the protocol of the DATA-CSV format is enclosed by the ASCII code corresponding to its type, and an empty item in a row is likewise enclosed by the ASCII code corresponding to its type;
if a field in the protocol of the DATA-CSV format contains the ASCII code corresponding to a type, the field is enclosed by the ASCII code corresponding to the type, and the contained ASCII code is escaped;
if the value of a field in the protocol of the DATA-CSV format contains the ASCII code corresponding to a type, that ASCII code is written twice.
In a second aspect, an embodiment of the present application further provides a data acquisition format protocol conversion device, comprising: a construction module,
the construction module is configured to construct a DATA-CSV format that comprises Kafka serialization and deserialization and a Flink format, and to convert a protocol of a first acquisition format into a protocol of the DATA-CSV format;
wherein the construction module, for constructing the Kafka serialization and deserialization, comprises:
a serialization construction module, configured to construct serialization and deserialization classes for a Kafka data stream; a POJO serialization module, configured to obtain a created custom data format and convert the custom data format into a POJO class whose field names correspond to the custom data format; a conversion module, configured to convert the POJO class into the DATA-CSV format;
the first acquisition format protocol includes at least: the canal_json, maxwell_json and debezium_json protocols.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: a processor and a memory;
the processor is configured to execute a data acquisition format protocol conversion method according to any one of the above claims by calling a program or instructions stored in the memory.
In a fourth aspect, embodiments of the present invention further provide a computer-readable storage medium storing a program or instructions that cause a computer to perform a data acquisition format protocol conversion method as set forth in any one of the above.
The embodiments of the application have the following advantages: the data acquisition format protocol based on data acquisition synchronization can be converted from the maxwell-json, canal-json and debezium-json protocols, so that it is compatible with acquisition and synchronization framework protocols and improves the transmission efficiency of data synchronization; the data acquisition format protocol based on data acquisition synchronization solves the problems, in data synchronization and acquisition systems of the related art, of high network bandwidth consumption, high disk read/write rates and a high disk storage footprint; and the Kafka serialization and deserialization method based on data synchronization can further reduce disk space usage through data serialization, thereby improving the transmission efficiency of data synchronization.
Drawings
In order to more clearly illustrate the technical solutions of embodiments or conventional techniques of the present application, the drawings required for the descriptions of the embodiments or conventional techniques will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 is a first schematic diagram of a data acquisition format protocol conversion method provided in an embodiment of the present application;
Fig. 2 is a second schematic diagram of a data acquisition format protocol conversion method provided in an embodiment of the present application;
Fig. 3 is a schematic diagram of a data acquisition format protocol conversion device provided in an embodiment of the present application;
Fig. 4 is a schematic block diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the above objects, features and advantages of the present application more comprehensible, embodiments accompanied with figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. This application is, however, susceptible of embodiment in many other ways than those herein described and similar modifications can be made by those skilled in the art without departing from the spirit of the application, and therefore the application is not limited to the specific embodiments disclosed below.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
The consistent hashing algorithm referred to in this application is first described below.
Fig. 1 is a schematic diagram of a data acquisition format protocol conversion method according to an embodiment of the present application.
In a first aspect, an embodiment of the present application provides a data acquisition format protocol conversion method, comprising:
converting a protocol of a first acquisition format into a protocol of the DATA-CSV format by constructing a DATA-CSV format that comprises Kafka serialization and deserialization and a Flink format;
specifically, in the embodiment of the application, the DATA-CSV format, constructed to comprise Kafka serialization and deserialization and a Flink format, is a format optimization for storage in the message middleware used for data acquisition and synchronization, and is subsequently to be open-sourced to the Apache community.
Wherein constructing the Kafka serialization and deserialization, with reference to Fig. 1, comprises:
S101: constructing, through a serialization construction module, serialization and deserialization classes for a Kafka data stream;
S102: obtaining a created custom data format through a POJO serialization module, and converting the custom data format into a POJO class whose field names correspond to the custom data format;
S103: converting the POJO class into the DATA-CSV format through a conversion module;
the first acquisition format protocol includes at least: the canal_json, maxwell_json and debezium_json protocols.
Specifically, in the embodiment of the application, by constructing the Kafka serialization and deserialization, disk space usage can be reduced through data serialization, thereby improving the transmission efficiency of data synchronization.
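Steps S101 to S103 can be illustrated with a minimal Python sketch. It is not the implementation of the application: the record class `ChangeRecord` (standing in for the POJO class), the functions `to_record`, `serialize` and `deserialize`, and the chosen field order are all hypothetical. The JSON field names follow the commonly documented layout of the three source formats in simplified form, and a canal_json event is assumed to carry a single row. Escaping is omitted here.

```python
import json
from dataclasses import dataclass
from typing import List

FIELD_SEP = "\x05"   # field/column separator stated in the description
RECORD_SEP = "\x0a"  # record/line separator (line feed)

@dataclass
class ChangeRecord:
    """Plain record object standing in for the POJO class (hypothetical layout)."""
    source: str
    ts_ms: int
    op: str
    values: List[str]

# Assumed mapping from canal/maxwell operation names to short op codes.
_OPS = {"insert": "c", "update": "u", "delete": "d"}

def to_record(fmt: str, payload: str) -> ChangeRecord:
    """Normalize one change event of a first acquisition format into a record."""
    e = json.loads(payload)
    if fmt == "debezium_json":
        row = e.get("after") or e.get("before") or {}
        src = e["source"]
        return ChangeRecord(f"{src['db']}.{src['table']}", e["ts_ms"], e["op"],
                            [str(v) for v in row.values()])
    if fmt == "maxwell_json":
        # Assumption: maxwell reports "ts" in seconds.
        return ChangeRecord(f"{e['database']}.{e['table']}", e["ts"] * 1000,
                            _OPS[e["type"]], [str(v) for v in e["data"].values()])
    if fmt == "canal_json":
        row = e["data"][0]  # simplification: only the first row is converted
        return ChangeRecord(f"{e['database']}.{e['table']}", e["ts"],
                            _OPS[e["type"].lower()], [str(v) for v in row.values()])
    raise ValueError(f"unsupported format: {fmt}")

def serialize(record: ChangeRecord) -> bytes:
    """Turn a record into one delimited line, as a Kafka value serializer might."""
    fields = [record.source, str(record.ts_ms), record.op, *record.values]
    return (FIELD_SEP.join(fields) + RECORD_SEP).encode("utf-8")

def deserialize(data: bytes) -> ChangeRecord:
    """Inverse of serialize for a single line."""
    fields = data.decode("utf-8").rstrip(RECORD_SEP).split(FIELD_SEP)
    return ChangeRecord(fields[0], int(fields[1]), fields[2], fields[3:])
```

In this sketch the JSON keys of the source event are no longer repeated in every message; only the values and the separators are stored, which is the kind of redundancy reduction the description aims at.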
Fig. 2 is a second schematic diagram of a data acquisition format protocol conversion method according to an embodiment of the present application.
Further, in the above data acquisition format protocol conversion method, constructing the Flink format of the DATA-CSV format, with reference to Fig. 2, comprises three steps S201 to S203:
S201: extracting data in the protocol of the first acquisition format through the formats and connectors of the Flink Table API;
S202: converting the data in the first acquisition format protocol through Flink SQL to obtain DATA-CSV data;
S203: performing SQL processing on the DATA-CSV data;
wherein the DATA-CSV format in the protocol of the DATA-CSV format is a delimited data format.
Specifically, the DATA-CSV format in the embodiment of the present application is a delimited data format that uses a field/column separator character (ASCII code 0x05) and a record/line separator, the line feed character (ASCII code 0x0A).
Further, in the above data acquisition format protocol conversion method, the rules of the protocol of the DATA-CSV format include at least the following seven:
1. the file does not begin with blank space and is organized in units of rows;
2. column names may or may not be included; if included, they are identified as column names;
3. one row of data does not span multiple lines, and there are no empty lines;
4. an invisible character is used as the separator, and an empty column must still be represented;
5. if an ASCII code exists in the column content, it is replaced with an escape sequence, and the field value is enclosed in half-width quotation marks;
6. when the file is read and written, the ASCII code handling rules are inverse to each other;
7. the internal encoding is not restricted.
Specifically, in the embodiment of the present application: for the second rule, column names, when included, use (ASCII code 0x0F); for the fourth rule, an invisible character, namely (ASCII code 0x05), is used as the separator, and an empty column must still be represented; for the fifth rule, if an ASCII code exists in the column content, it is replaced with the escape character \ followed by the ASCII code, namely (\ASCII code), and the field value is enclosed in half-width quotation marks (namely ""); for the seventh rule, the internal encoding is not restricted and may be ASCII, Unicode or another encoding.
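A minimal Python sketch of rules 2, 3, 4 and 6 follows. The function names `write_table` and `read_table` are hypothetical. As a simplification, the optional column names are written as an ordinary first row and signalled by a flag, rather than being marked with the ASCII code 0x0F. Escaping (rule 5) is left out.

```python
from typing import List, Optional, Tuple

FIELD_SEP = "\x05"   # invisible field separator (rule 4)
RECORD_SEP = "\x0a"  # one record per line (rules 1 and 3)

def write_table(rows: List[List[str]], header: Optional[List[str]] = None) -> str:
    """Write rows, optionally preceded by a row of column names (rule 2)."""
    lines = ([header] if header else []) + rows
    return "".join(FIELD_SEP.join(line) + RECORD_SEP for line in lines)

def read_table(text: str, has_header: bool = False) -> Tuple[Optional[List[str]], List[List[str]]]:
    """Inverse of write_table (rule 6); empty lines are skipped (rule 3)."""
    lines = [line.split(FIELD_SEP) for line in text.split(RECORD_SEP) if line]
    if has_header and lines:
        return lines[0], lines[1:]
    return None, lines
```

An empty column is still present in the output as two adjacent separators, so reading the text back yields the same number of fields per row that was written.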
Further, in the above data acquisition format protocol conversion method, the escape requirements in the protocol of the DATA-CSV format include at least the following three:
1. for a field containing an ASCII code corresponding to a type, an ASCII code corresponding to a key, or a line feed character, an escape character is added in front;
2. inside a field, an escape character is added in front of the ASCII code corresponding to a type and the ASCII code corresponding to a key, to escape the character quotation marks;
3. each specific field required for synchronization is mapped one-to-one to a corresponding ASCII code.
Specifically, in the embodiment of the present application: for the first escape requirement, a field containing the ASCII code corresponding to a type, the ASCII code corresponding to a key, or a line feed character must have an escape character added in front; the escape character can be defined by the user, and the user-defined character is used to escape special symbols. For the second escape requirement, inside a field, an escape character, which can likewise be user-defined, is added in front of the ASCII code corresponding to a type and the ASCII code corresponding to a key, to escape the character quotation marks. For the third escape requirement, the specific fields required for synchronization, namely the metadata schema, the operation type op, the data source, and the log collection time ts_ms, are each mapped one-to-one to a corresponding ASCII code.
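The first two escape requirements can be sketched in Python as follows. The function names, the default escape character (a backslash) and the default set of special characters are assumptions. Escaping the escape character itself is also an assumption, added here so that the operation can be reversed unambiguously.

```python
from typing import Iterable

FIELD_SEP = "\x05"
LINE_FEED = "\x0a"

def escape(value: str, esc: str = "\\",
           special: Iterable[str] = (FIELD_SEP, LINE_FEED)) -> str:
    """Put the (user-definable) escape character in front of every special character."""
    special = set(special)
    out = []
    for ch in value:
        if ch == esc or ch in special:
            out.append(esc)
        out.append(ch)
    return "".join(out)

def unescape(value: str, esc: str = "\\") -> str:
    """Inverse of escape: drop each escape character and keep the character after it."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == esc:
            ch = next(chars, "")
        out.append(ch)
    return "".join(out)
```

Note that an escaped line feed is still physically present in the text, so a reader that splits records on line feeds has to skip those preceded by the escape character.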
Further, in the above data acquisition format protocol conversion method, mapping each specific field required for synchronization one-to-one to a corresponding ASCII code comprises:
the data source corresponds to (ASCII code 0x0B);
the log acquisition time ts_ms corresponds to (ASCII code 0x0C);
the operation type op corresponds to (ASCII code 0x0D);
the metadata schema corresponds to (ASCII code 0x0F).
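The description does not state how these codes are laid out within a record. The Python sketch below shows one possible reading, stated here as an assumption: each synchronization field is written as its one-byte code followed by its value, so that the code takes the place of a spelled-out key. The names `MARKERS`, `tag_fields` and `parse_fields` are hypothetical, and values are assumed not to contain any of the marker codes.

```python
from typing import Dict

# One-to-one mapping taken from the list above.
MARKERS = {"source": "\x0b", "ts_ms": "\x0c", "op": "\x0d", "schema": "\x0f"}
BY_CODE = {code: name for name, code in MARKERS.items()}

def tag_fields(meta: Dict[str, object]) -> str:
    """Write each known field as <code><value> (assumed layout)."""
    return "".join(MARKERS[name] + str(meta[name]) for name in MARKERS if name in meta)

def parse_fields(text: str) -> Dict[str, str]:
    """Split a tagged string back into field names and string values."""
    result: Dict[str, str] = {}
    name = None
    for ch in text:
        if ch in BY_CODE:
            name = BY_CODE[ch]
            result[name] = ""
        elif name is not None:
            result[name] += ch
    return result
```

Under this reading, a key such as "ts_ms" costs one byte per record instead of a quoted name. Since 0x0D is also the carriage return character, a real implementation would need to keep the op marker apart from CRLF line endings.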
Further, in the above data acquisition format protocol conversion method, a file in the protocol of the DATA-CSV format is a text file delimited by line feed characters;
the text file stores tabular data in plain text form;
a text file is a sequence of characters;
the text file consists of a plurality of records separated by a line feed character; each record consists of fields, and the separator between fields is a character or a character string; all records have the same field sequence;
the text file can be opened with WordPad or Notepad;
each record in the text file is one line, terminated by a line feed (ASCII code 0x0A) or by a carriage return followed by a line feed (ASCII codes 0x0D 0x0A);
wherein 0x0A (line feed) represents the character '\n' in C#, and 0x0D0A (carriage return and line feed) represents the character string "\r\n" in C#.
Specifically, in the embodiment of the present application, a file in the DATA-CSV format is essentially a text file delimited by the field separator (ASCII code 0x05) and the line feed (ASCII code 0x0A), that is, a separated-value file format. The separator within each line of a DATA-CSV file must be (ASCII code 0x05). Like comma-separated values (CSV, sometimes called character-separated values), the file stores tabular data in plain text form, and the contents of the table are numbers and text. A text file is a sequence of characters that contains no data that must be interpreted as binary numbers. The separator between fields is a character or a character string, most commonly (ASCII code 0x05).
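Reading such a file can be sketched in Python as follows. The function name `read_records` is hypothetical and escaping is ignored. The sketch splits on the line feed explicitly and strips one trailing carriage return, because Python's `str.splitlines()` would also break lines at 0x0B and 0x0C, which this format uses as field codes.

```python
from typing import List

FIELD_SEP = "\x05"

def read_records(text: str) -> List[List[str]]:
    """Split text into records terminated by LF or CRLF, then into fields."""
    records = []
    for line in text.split("\n"):
        if line.endswith("\r"):   # CRLF-terminated record
            line = line[:-1]
        if line:                  # the format has no empty lines
            records.append(line.split(FIELD_SEP))
    return records
```

A caller could additionally check that all returned records have the same number of fields, since the description requires an identical field sequence for every record.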
Further, in the above data acquisition format protocol conversion method, field values in the protocol of the DATA-CSV format are of multiple types, and each type is assigned a corresponding ASCII code;
a field value in the protocol of the DATA-CSV format is enclosed by the ASCII code corresponding to its type, and an empty item in a row is likewise enclosed by the ASCII code corresponding to its type;
if a field in the protocol of the DATA-CSV format contains the ASCII code corresponding to a type, the field is enclosed by the ASCII code corresponding to the type, and the contained ASCII code is escaped;
if the value of a field in the protocol of the DATA-CSV format contains the ASCII code corresponding to a type, that ASCII code is written twice.
Specifically, in the embodiment of the present application, each type is assigned a corresponding ASCII code; for example, the String type corresponds to (ASCII code 0x02). The user can also define the ASCII codes, with each type corresponding to one ASCII code. An empty item in a row can also be enclosed by the ASCII code corresponding to its type. When a field contains a special character, namely the ASCII code corresponding to a type, the field must be enclosed by the ASCII code corresponding to the type, and the contained ASCII code is escaped. When the value of a field contains the special character, the ASCII code corresponding to the type can be written twice, so that the first occurrence acts as an escape for the second.
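The enclosing and doubling rule works like quote doubling in ordinary CSV and can be sketched in Python. The function names `bracket` and `unbracket` are hypothetical; the String code 0x02 is the example given above, and the sketch assumes that the same code opens and closes the value.

```python
STRING_CODE = "\x02"  # example type code for String given in the description

def bracket(value: str, code: str = STRING_CODE) -> str:
    """Enclose a value in its type code, doubling any occurrence inside the value."""
    return code + value.replace(code, code + code) + code

def unbracket(field: str, code: str = STRING_CODE) -> str:
    """Inverse of bracket: strip the enclosing codes and undo the doubling."""
    n = len(code)
    if len(field) < 2 * n or not (field.startswith(code) and field.endswith(code)):
        raise ValueError("field is not enclosed by its type code")
    return field[n:-n].replace(code + code, code)
```

An empty item becomes two adjacent type codes, so it remains distinguishable from a missing field, and its type can still be read from the code.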
Fig. 3 is a schematic diagram of a protocol conversion device for data acquisition format according to an embodiment of the present application.
In a second aspect, an embodiment of the present application further provides a data acquisition format protocol conversion device, comprising: a construction module 300,
the construction module 300 is configured to construct a DATA-CSV format that comprises Kafka serialization and deserialization and a Flink format, and to convert a protocol of a first acquisition format into a protocol of the DATA-CSV format;
wherein the construction module 300, for constructing the Kafka serialization and deserialization, comprises:
a serialization construction module 301, configured to construct serialization and deserialization classes for a Kafka data stream; a POJO serialization module 302, configured to obtain a created custom data format and convert the custom data format into a POJO class whose field names correspond to the custom data format; a conversion module 303, configured to convert the POJO class into the DATA-CSV format;
the first acquisition format protocol includes at least: the canal_json, maxwell_json and debezium_json protocols.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: a processor and a memory;
the processor is configured to execute a data acquisition format protocol conversion method according to any one of the above claims by calling a program or instructions stored in the memory.
In a fourth aspect, embodiments of the present invention further provide a computer-readable storage medium storing a program or instructions that cause a computer to perform a data acquisition format protocol conversion method as set forth in any one of the above.
Fig. 4 is a schematic block diagram of an electronic device provided by an embodiment of the present disclosure.
As shown in Fig. 4, the electronic device includes: at least one processor 401, at least one memory 402, and at least one communication interface 403. The various components in the electronic device are coupled together by a bus system 404. The communication interface 403 is used for information transmission with external devices. It is appreciated that the bus system 404 is used to enable connection and communication between these components. The bus system 404 includes a power bus, a control bus, and a status signal bus in addition to the data bus. For clarity of illustration, the various buses are all labeled as bus system 404 in Fig. 4.
It will be appreciated that the memory 402 in this embodiment can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
In some implementations, the memory 402 stores the following elements, executable units or data structures, or a subset thereof, or an extended set thereof: an operating system and application programs.
The operating system includes various system programs, such as a framework layer, a core library layer, a driving layer, and the like, and is used for realizing various basic services and processing hardware-based tasks. Applications, including various applications such as Media Player (Media Player), browser (Browser), etc., are used to implement various application services. The program for implementing any one of the data acquisition format protocol conversion methods provided in the embodiments of the present application may be included in the application program.
In the embodiment of the present application, the processor 401 is configured to execute the steps of each embodiment of the data acquisition format protocol conversion method provided in the embodiment of the present application by calling a program or an instruction stored in the memory 402, specifically, a program or an instruction stored in an application program.
Converting a protocol of a first acquisition format into a protocol of the DATA-CSV format by constructing a DATA-CSV format that comprises Kafka serialization and deserialization and a Flink format;
wherein constructing the Kafka serialization and deserialization comprises:
constructing, through a serialization construction module, serialization and deserialization classes for a Kafka data stream; obtaining a created custom data format through a POJO serialization module, and converting the custom data format into a POJO class whose field names correspond to the custom data format; converting the POJO class into the DATA-CSV format through a conversion module;
the first acquisition format protocol includes at least: the canal_json, maxwell_json and debezium_json protocols.
Any one of the data acquisition format protocol conversion methods provided in the embodiments of the present application may be applied to the processor 401 or implemented by the processor 401. The processor 401 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 401 or by instructions in the form of software. The processor 401 may be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The steps of any one of the data acquisition format protocol conversion methods provided in the embodiments of the present application may be directly embodied in the execution of a hardware decoding processor, or may be executed by a combination of hardware and software units in the decoding processor. The software elements may be located in a random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory 402, and the processor 401 reads the information in the memory 402, and in combination with its hardware, performs the steps of a data acquisition format protocol conversion method.
Those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the present application and form different embodiments.
Those skilled in the art will appreciate that the description of each embodiment has its own emphasis; for portions not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
The present invention is not limited to the above embodiments, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and these modifications and substitutions are intended to be included in the scope of the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A data acquisition format protocol conversion method, comprising:
converting the protocol of the first acquisition format into a protocol of the DATA-CSV format by constructing a DATA-CSV format that includes Kafka serialization and deserialization, and by constructing a DATA-CSV format for Flink;
wherein constructing the Kafka serialization and deserialization comprises:
constructing, by a construction serialization module, serialization and deserialization classes for the Kafka data stream; obtaining, by a POJO serialization module, a created custom data format and converting it into a POJO class whose field names correspond to the custom data format; and converting, by a conversion module, the POJO class into the DATA-CSV format;
the first acquisition format protocol includes at least the ca_json, maxwell_json and debezium_json protocols.
2. The data acquisition format protocol conversion method according to claim 1, wherein constructing the DATA-CSV format for Flink comprises:
extracting the data in the protocol of the first acquisition format through the formats and connectors of the Flink Table API;
converting the data in the first acquisition format protocol through Flink SQL to obtain DATA-CSV data;
performing SQL processing on the DATA-CSV data;
wherein the DATA-CSV format in the DATA-CSV format protocol is a separate data format.
3. The method according to claim 1, wherein the rules of the protocol of the DATA-CSV format include at least the following seven:
one: the file begins without leading blank space and is organized in units of rows;
two: column names may or may not be included;
three: a row of data does not span lines, and there are no empty rows;
four: a non-visible character is used as the separator, and a column must still be expressed even when it is empty;
five: if the ASCII code exists in the column content, it is replaced with an escaped form, and the field value is enclosed in half-width quotation marks;
six: when the file is read and written, the operation rules for the ASCII codes are the inverse of each other;
seven: the internal code (character encoding) format is not limited.
4. The method according to claim 1, wherein the escape requirements in the protocol of the DATA-CSV format include at least the following three:
1. for a field containing the ASCII code corresponding to a type, the ASCII code corresponding to a key, or a line feed, an escape character is added in front of each such ASCII code or line feed in the field;
2. within a field, an escape character is added in front of the ASCII code corresponding to the type and the ASCII code corresponding to the key, so as to escape the character quotation marks;
3. each specific field required for synchronization uses its own corresponding ASCII code, in one-to-one correspondence.
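As an illustration of escape requirements 1 and 2, the following Python sketch prefixes every special character in a field with an escape character and reverses the operation on read. The claim does not name the escape character or the concrete type and key codes, so the backslash and the caller-supplied `special` set are assumptions of this sketch.

```python
# Assumption: the source does not name the escape character; a backslash is
# used here purely for illustration.
ESCAPE = "\\"


def escape_field(value: str, special: set) -> str:
    """Prefix every special character (and the escape itself) with ESCAPE."""
    out = []
    for ch in value:
        if ch in special or ch == ESCAPE:
            out.append(ESCAPE)
        out.append(ch)
    return "".join(out)


def unescape_field(value: str) -> str:
    """Inverse of escape_field: drop each ESCAPE and keep the character after it."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == ESCAPE:
            # The escaped character is taken literally; a trailing lone
            # ESCAPE yields nothing.
            ch = next(chars, "")
        out.append(ch)
    return "".join(out)
```

Escaping the escape character itself is a design choice of the sketch, not something the claim states; without it the two functions would not be exact inverses for values that already contain a backslash.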
5. The data acquisition format protocol conversion method according to claim 4, wherein each specific field required for synchronization using its own corresponding ASCII code in one-to-one correspondence comprises:
the data source corresponds to ASCII code 0x0B;
the log acquisition time ts_ms corresponds to ASCII code 0x0C;
the operation type op corresponds to ASCII code 0x0D;
the metadata schema corresponds to ASCII code 0x0F.
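The one-to-one mapping in this claim can be written down as a lookup table. The claim gives only the four codes; how they are placed in a record is not stated, so the layout below (each code written immediately before the value of its field) and the names `SYNC_FIELD_CODES`, `tag_sync_fields` and `parse_sync_fields` are assumptions of this Python sketch.

```python
# Codes as listed in the claim: source, ts_ms, op, schema.
SYNC_FIELD_CODES = {
    "source": 0x0B,
    "ts_ms": 0x0C,
    "op": 0x0D,
    "schema": 0x0F,
}


def tag_sync_fields(fields: dict) -> str:
    """Write each present synchronization field as <code><value> (assumed layout)."""
    return "".join(
        chr(code) + str(fields[name])
        for name, code in SYNC_FIELD_CODES.items()
        if name in fields
    )


def parse_sync_fields(text: str) -> dict:
    """Recover the field values from a string produced by tag_sync_fields."""
    code_to_name = {chr(code): name for name, code in SYNC_FIELD_CODES.items()}
    result = {}
    current = None
    for ch in text:
        if ch in code_to_name:
            current = code_to_name[ch]
            result[current] = ""
        elif current is not None:
            result[current] += ch
    return result
```

Because each field has a distinct code, a reader can identify the four synchronization fields without relying on their position in the record.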
6. The data acquisition format protocol conversion method according to claim 1, wherein the file in the protocol of the DATA-CSV format is a text file delimited by line feeds;
the text file stores tabular data in plain text form;
the text file is a sequence of characters;
the text file consists of a plurality of records, and the records are separated by a line feed; each record consists of fields, and the separators between the fields are characters or character strings; the plurality of records have identical field sequences;
the records of the text file can be opened with WordPad or Notepad;
each record in the text file is one line, terminated by a line feed (ASCII LF = 0x0A) or a carriage return plus line feed (ASCII CRLF = 0x0D0A);
wherein, in C#, 0x0A represents the character '\n' and 0x0D0A represents the character string "\r\n".
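A reader for such a file has to split records on LF or CRLF only. The Python sketch below does this with an explicit pattern; the function name `split_records` is hypothetical, and the sketch assumes that fields contain no raw line feeds (escaped line feeds inside fields are not handled here).

```python
import re

# Only LF (0x0A) and CRLF (0x0D0A) terminate a record.
RECORD_TERMINATOR = re.compile(r"\r\n|\n")


def split_records(text: str) -> list:
    """Split file content into records on LF or CRLF, dropping the final empty piece."""
    records = RECORD_TERMINATOR.split(text)
    # A file that ends with a terminator yields one trailing empty string.
    if records and records[-1] == "":
        records.pop()
    return records
```

The explicit pattern is deliberate: Python's built-in `str.splitlines()` also treats 0x0B and 0x0C as line boundaries, so it would cut a record apart wherever those control codes appear as field markers.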
7. The method according to claim 1, wherein the field values in the protocol of the DATA-CSV format are of a plurality of types, each type being provided with a corresponding ASCII code;
a field value in the protocol of the DATA-CSV format is enclosed by the ASCII code corresponding to its type; when an item in a row is empty, its field value is still enclosed by the ASCII code corresponding to the type;
if a field in the protocol of the DATA-CSV format contains the ASCII code corresponding to the type, the field is enclosed by the ASCII code corresponding to the type, and the ASCII codes corresponding to the type inside the field are escaped;
and if the value of a field in the protocol of the DATA-CSV format contains the ASCII code, the ASCII code corresponding to the type is written twice.
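The enclosing and doubling rules of this claim mirror ordinary CSV quoting, with the type's ASCII code playing the role of the quotation mark. The Python sketch below shows the write and read directions as exact inverses; the function names are hypothetical and the enclosing code is passed in by the caller, since the claim does not list the concrete code for each type.

```python
def enclose(value: str, quote: str) -> str:
    """Enclose a field value in `quote`, doubling any `quote` found inside it."""
    return quote + value.replace(quote, quote * 2) + quote


def unenclose(field: str, quote: str) -> str:
    """Inverse of enclose: strip the outer codes and collapse doubled ones."""
    if len(field) < 2 or not (field.startswith(quote) and field.endswith(quote)):
        raise ValueError("field is not enclosed by the expected code")
    return field[1:-1].replace(quote * 2, quote)
```

For example, with a hypothetical type code of 0x02, an empty item is written as two consecutive 0x02 characters, so the column remains visible in the record even though it carries no value.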
8. A data acquisition format protocol conversion device, comprising: a construction module,
the construction module being configured to construct a DATA-CSV format that includes Kafka serialization and deserialization and to construct a DATA-CSV format for Flink, so as to convert the protocol of the first acquisition format into a protocol of the DATA-CSV format;
wherein the construction module, in constructing the Kafka serialization and deserialization, comprises:
a construction serialization module, configured to construct serialization and deserialization classes for the Kafka data stream; a POJO serialization module, configured to obtain a created custom data format and convert it into a POJO class whose field names correspond to the custom data format; and a conversion module, configured to convert the POJO class into the DATA-CSV format;
the first acquisition format protocol includes at least the ca_json, maxwell_json and debezium_json protocols.
9. An electronic device, comprising: a processor and a memory;
the processor is configured to execute a data acquisition format protocol conversion method according to any one of claims 1 to 7 by calling a program or instructions stored in the memory.
10. A computer-readable storage medium storing a program or instructions that cause a computer to perform a data acquisition format protocol conversion method according to any one of claims 1 to 7.
CN202310246430.0A 2023-03-10 2023-03-10 Data acquisition format protocol conversion method and device Pending CN116303717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310246430.0A CN116303717A (en) 2023-03-10 2023-03-10 Data acquisition format protocol conversion method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310246430.0A CN116303717A (en) 2023-03-10 2023-03-10 Data acquisition format protocol conversion method and device

Publications (1)

Publication Number Publication Date
CN116303717A (en) 2023-06-23

Family

ID=86816310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310246430.0A Pending CN116303717A (en) 2023-03-10 2023-03-10 Data acquisition format protocol conversion method and device

Country Status (1)

Country Link
CN (1) CN116303717A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination