CN113901090A

CN113901090A - Real-time situation data access and processing method

Info

Publication number: CN113901090A
Application number: CN202111266120.2A
Authority: CN
Inventors: 高峰; 余曾文; 徐星旺
Original assignee: Beijing Institute of Computer Technology and Applications
Current assignee: Beijing Institute of Computer Technology and Applications
Priority date: 2021-10-28
Filing date: 2021-10-28
Publication date: 2022-01-07

Abstract

The invention relates to a method for real-time situation data access and processing, and belongs to the field of radar data processing. The method comprises the following steps: accessing real-time data; preprocessing the data, including non-null checks and Chinese-English field conversion; unifying the tail/hull numbers in the data and supplementing missing attributes, in which the dynamic stream data are associated and matched with static dictionary data and the matched static dictionary information is filled back into the real-time situation stream; storing and pushing the data, in which one part of the matched real-time data is persisted and the other part is lightly processed and pushed into a message queue for application systems to consume; and dynamically maintaining and refreshing the static dictionary. The method is easy to connect to data sources, its processing functions can be extended flexibly by dynamically adding stream-task SQL, and it yields fast and accurate processing results. It has important application value in scenarios that require cleaning real-time data, matching it against dictionaries in real time, and storing and pushing stream data.

Description

Real-time situation data access and processing method

Technical Field

The invention belongs to the field of radar data processing, and particularly relates to a method for accessing and processing real-time situation data.

Background

With the advent of the big data era, the data generated in daily life come in many formats, and efficient storage and fast, accurate processing of data have become increasingly important. Real-time data processing is an important part of big data processing. Faced with massive streaming data, a stream processing system must above all address stability and reliability.

Real-time situation data are collected from different radar signal sources, and their information is supplemented both by automatic program judgment and by human intervention, so the same object (a target, an airport, a buoy, and so on) may appear under different names. In the prior art, real-time streams are generally accessed through hand-written programs and matched one-to-one against a simple dictionary, so many-to-one dictionary-table matching is not possible.

Disclosure of Invention

(I) Technical problem to be solved

The technical problem addressed by the invention is how to provide a method for accessing and processing real-time situation data that overcomes the shortcomings of the prior art, which generally accesses real-time streams through hand-written programs, matches them one-to-one against a simple dictionary, and cannot perform many-to-one dictionary-table matching.

(II) Technical solution

To solve the above technical problem, the present invention provides a method for accessing and processing real-time situation data, comprising the following steps:

S1, real-time stream access step: creating an input stream, connecting to the data source, and configuring the connection parameters of the message queue;

S2, data preprocessing step: creating a derived stream that processes the input stream created in S1, the processing comprising data filtering and Chinese-English conversion;

S3, creating a dynamic stream task and unifying the key attributes of the derived stream according to the mapping dictionary table: the target name in the stream data is compared with the target names and keywords in the mapping dictionary table, and if it matches either, the tail/hull number of the stream data is replaced with the tail/hull number from the dictionary table, so that tail/hull numbers are unified;

S4, data persistence and pushing step: the stream data produced in S3 are stored in an HBase table, and the processed stream data are published through a message queue for use.

Further, in step S1, the supported message queues include Kafka and RabbitMQ.

Further, step S1 specifically includes: using Hive SQL to configure the ZooKeeper and Kafka nodes, where Kafka is the message queue that receives the data and ZooKeeper is the distributed coordination service on which the message queue depends and which stores part of Kafka's metadata; and configuring the serialization encoding and the Kafka topic of the received character stream, and splitting the received character stream into individual records at line breaks.

Further, step S2 specifically includes: using Hive SQL to create a derived stream, parsing the Chinese-named fields of the S1 input stream with Hive's JSON parsing function json_tuple(), and using the AS keyword to rename them to English codes, which completes the Chinese-to-English conversion. The derived stream reads from the input stream created in S1, and its filter conditions are that the JSON string is not null and the target name in the input stream is not null.

Further, creating the dynamic stream task in step S3 specifically includes: creating a dynamic stream task whose FROM source is the derived stream created in the previous step.

Further, in step S3, comparing the target name in the stream data with the target names or keywords in the mapping dictionary table and, on a successful match, replacing and unifying the tail/hull number specifically includes: associating the dynamic stream with the static mapping dictionary table by a LEFT JOIN and first checking whether the target name in the stream data equals a target name in the mapping dictionary table; if not, checking whether the target name in the stream data is contained in the keywords of the mapping dictionary table, where the keywords hold all aliases of the same target; and, if either check succeeds, replacing the tail/hull number of the stream data with the tail/hull number from the dictionary table, so that tail/hull numbers are unified.

Further, step S3 also includes: if matching fails, because the dictionary table has no corresponding entry or the target has no matchable alias, the SQL labels the record as unmatched, and a dictionary entry is later selected manually for it based on historical data.

Further, step S4 specifically includes: first, using Hive SQL to create a stream task whose FROM clause names the stream created in S3, and mapping all fields of that stream into a pre-created HBase table for persistent storage; then creating a second stream task, also reading FROM the stream created in S3, which takes the data from that stream, restores the English field names to the original Chinese field names, concatenates the fields into a JSON string, and publishes the JSON string.

Further, step S4 is followed by step S5, a dictionary online maintenance step: if target data are added to the dictionary, the relevant dictionary tables in the basic dictionary library are selected, and the same attribute fields in the different dictionary tables are mapped into the mapping dictionary table.

Further, step S5 specifically includes: clicking to add a new mapping dictionary target in the situation access system, whereupon the system selects several isomorphic tables from the basic dictionary library and displays their targets as a tree; these tables share a tree structure and have similar field contents, so their fields are mapped uniformly into the mapping dictionary table, and the keywords in the dictionary table are extended at the same time. For the extended keywords, when a dictionary target is newly added, the keyword information from the original dictionary table is copied to the new table; manual editing of the dictionary-table keywords is supported, and any target alias that needs to be added is added to the keywords of the mapping dictionary.

(III) Advantageous effects

The invention provides a method for accessing and processing real-time situation data that uses Hive SQL, which is relatively mature at present, as the real-time access and processing pipeline, giving high flexibility and adaptability. Several dictionary tables are mapped into one unified dictionary table whose keywords can be maintained flexibly, so that the real-time stream picks up dictionary changes promptly without a complex service restart. The method is easy to connect to data sources, its processing functions can be extended flexibly by dynamically adding stream-task SQL, and it yields fast and accurate processing results. It has important application value in scenarios that require cleaning real-time data, matching it against dictionaries in real time, and storing and pushing stream data.

Drawings

FIG. 1 is a flowchart of the real-time access method according to the present invention.

Detailed Description

To make the objects, content, and advantages of the present invention clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings and examples.

The invention discloses a method for accessing and processing real-time data, comprising the following steps: (1) accessing real-time data; (2) preprocessing data, including non-null checks and Chinese-English conversion; (3) unifying tail/hull numbers and supplementing missing attributes, in which the dynamic stream data are associated and matched with the static dictionary data and the matched static dictionary information is filled back into the real-time situation stream; (4) storing and pushing data, in which one part of the matched real-time data is persisted and the other part is lightly processed and pushed into a message queue for application systems to consume; and (5) dynamically maintaining and refreshing the static dictionary, in which the data of the standard dictionary tables are collected into one mapping dictionary and the real-time stream is then refreshed automatically, so that the association matching in step (3) always uses the latest dictionary data. The method is easy to connect to data sources, its processing functions can be extended flexibly by dynamically adding stream-task SQL, and it yields fast and accurate processing results. It has important application value in scenarios that require cleaning real-time data, matching it against dictionaries in real time, and storing and pushing stream data.

Real-time data processing in the invention is divided into real-time data access, data governance, and data usage. Real-time access provides general-purpose, low-latency data access; data governance offers a flexible governance model in which custom stream-processing rules can be configured quickly; and data usage provides petabyte-scale mass data storage as well as low-latency data push services.

The purpose of the invention is to provide a method for real-time access of situation data that meets the requirements for fast, accurate real-time data and high-quality data processing.

To achieve the above purpose, the present invention provides a method for accessing and processing real-time data, comprising:

(1) a real-time stream access step;

(2) a data preprocessing step;

(3) a step of unifying tail/hull numbers and supplementing attributes;

(4) a data persistence and pushing step; and

(5) an online dictionary maintenance step.

FIG. 1 is a flowchart of the real-time access method of the present invention. As shown in FIG. 1, the method includes:

S1, real-time stream access step. Mainstream message queues such as Kafka and RabbitMQ are supported. An input stream is created, connected to the data source, and configured with the connection parameters of the message queue.

In a specific implementation, Hive SQL is used to configure the message queue information, such as the ZooKeeper and Kafka nodes, where Kafka is the message queue that receives the data and ZooKeeper is the distributed coordination service on which the message queue depends and which stores part of Kafka's metadata. The serialization encoding and the Kafka topic of the received character stream are then configured, and the received character stream is split into individual records at line breaks.
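The splitting of the received character stream at line breaks can be illustrated outside Hive. The following is a minimal Python sketch, not the patent's implementation: the class name `LineSplitter`, the default UTF-8 encoding, and the choice to discard empty lines are assumptions. An incremental decoder is used so that a multi-byte Chinese character split across two message chunks is still decoded correctly.

```python
import codecs
from typing import List


class LineSplitter:
    """Hypothetical helper: turns byte chunks from a message queue into line records."""

    def __init__(self, encoding: str = "utf-8") -> None:
        # The encoding stands in for the serialization encoding configured in S1 (assumed UTF-8).
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self._buffer = ""

    def feed(self, chunk: bytes) -> List[str]:
        """Decodes one chunk and returns every complete line it closes."""
        # The incremental decoder holds partial multi-byte characters until the next chunk.
        self._buffer += self._decoder.decode(chunk)
        *complete, self._buffer = self._buffer.split("\n")
        return self._clean(complete)

    def flush(self) -> List[str]:
        """Returns the trailing line when the stream ends without a final line break."""
        self._buffer += self._decoder.decode(b"", final=True)
        rest, self._buffer = self._buffer, ""
        return self._clean([rest])

    @staticmethod
    def _clean(lines: List[str]) -> List[str]:
        # Strips Windows-style "\r" and drops empty lines (an assumption, not from the source).
        return [line.rstrip("\r") for line in lines if line.strip()]
```

Feeding the chunks of a message one after another and calling `flush()` at the end yields one string per JSON record, regardless of where the chunk boundaries fall.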

S2, data preprocessing step. A derived stream is created to process the input stream created in S1, including data filtering and Chinese-English conversion.

In a specific implementation, Hive SQL is used to create the derived stream: Hive's JSON parsing function json_tuple() parses the Chinese-named fields of the S1 input stream, and the AS keyword renames them to English codes, completing the Chinese-to-English conversion.

Further, the derived stream reads from the input stream created in S1, and its filter conditions are that the JSON string is not null and the target name in the input stream is not null.
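The effect of json_tuple() with AS aliasing, followed by the two null filters, can be approximated in Python as below. The Chinese field names and English codes in `FIELD_MAP` are hypothetical placeholders, since the source does not list the actual fields; treating an empty target name as null and dropping malformed JSON are also assumptions.

```python
import json
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical Chinese field names -> English codes (the real schema is not given).
FIELD_MAP: Dict[str, str] = {
    "目标名称": "target_name",
    "机舷号": "hull_number",
    "经度": "longitude",
    "纬度": "latitude",
}


def parse_record(line: Optional[str]) -> Optional[Dict[str, object]]:
    """Approximates json_tuple(line, <Chinese keys>) AS (<English codes>) plus the S2 filters."""
    if line is None or not line.strip():
        return None  # filter 1: the JSON string must not be null
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return None  # assumption: malformed records are dropped
    if not isinstance(obj, dict):
        return None
    # Missing keys become None, much as json_tuple yields NULL for absent keys.
    row = {english: obj.get(chinese) for chinese, english in FIELD_MAP.items()}
    if row["target_name"] in (None, ""):
        return None  # filter 2: the target name must not be null (empty treated as null)
    return row


def derive_stream(lines: Iterable[Optional[str]]) -> Iterator[Dict[str, object]]:
    """Yields the English-keyed records that pass both filters."""
    for line in lines:
        row = parse_record(line)
        if row is not None:
            yield row
```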

S3, a dynamic stream task is created, and the key attributes of the derived stream are unified according to the mapping dictionary table: the target name in the stream data is compared with the target names and keywords in the mapping dictionary table, and if it matches either, the tail/hull number of the stream data is replaced with the tail/hull number from the dictionary table, so that tail/hull numbers are unified.

In a specific implementation, in the primary-key consistency check stage, a dynamic stream task is created whose FROM source is the derived stream from the previous step.

The dynamic stream is associated with the static mapping dictionary table by a LEFT JOIN, and a dictionary match is first determined by checking whether the target name in the stream data equals a target name in the mapping dictionary table;

if that check fails, the task then checks whether the target name in the stream data is contained in the keywords of the mapping dictionary table. The mapping dictionary table is maintained in S5, and its keywords hold all aliases of the same target.

If either check succeeds, the tail/hull number of the stream data is replaced with the tail/hull number from the dictionary table, so that tail/hull numbers are unified.

Otherwise, if matching fails, because the dictionary table has no corresponding entry or the target has no matchable alias, the SQL labels the record as unmatched, and a dictionary entry is later selected manually for it based on historical data.
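The two-stage matching (exact target name first, then alias keywords) with LEFT JOIN semantics can be sketched as follows. Assumptions: the dictionary's `keywords` column stores aliases as a comma-separated string; "contained in the keywords" is read as exact membership in that alias list rather than substring matching; and an unmatched record is kept and labeled with a hypothetical `matched` flag instead of being dropped. Hash lookups stand in for the join, and the names `DictEntry`, `build_indexes`, and `unify` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Tuple


@dataclass(frozen=True)
class DictEntry:
    """One row of the mapping dictionary table (column names are hypothetical)."""
    target_name: str
    hull_number: str
    keywords: str  # assumed format: aliases separated by commas


def build_indexes(entries: Iterable[DictEntry]) -> Tuple[Dict[str, DictEntry], Dict[str, DictEntry]]:
    """Indexes dictionary rows by exact target name and by each alias keyword."""
    by_name: Dict[str, DictEntry] = {}
    by_alias: Dict[str, DictEntry] = {}
    for entry in entries:
        by_name.setdefault(entry.target_name, entry)
        for alias in entry.keywords.split(","):
            alias = alias.strip()
            if alias:
                by_alias.setdefault(alias, entry)  # first entry wins on duplicate aliases
    return by_name, by_alias


def unify(row: Mapping[str, object], by_name: Dict[str, DictEntry],
          by_alias: Dict[str, DictEntry]) -> Dict[str, object]:
    """Left-join style match: every input row is returned, matched or not."""
    name = row.get("target_name")
    entry = by_name.get(name)  # stage 1: exact target-name match
    if entry is None:
        entry = by_alias.get(name)  # stage 2: alias (keyword) match
    out = dict(row)
    if entry is not None:
        out["hull_number"] = entry.hull_number  # unify the tail/hull number
        out["matched"] = True
    else:
        out["matched"] = False  # hypothetical label for later manual matching
    return out
```

Indexing aliases once per dictionary load keeps each stream record to two dictionary lookups, which is the many-to-one behavior the simple one-to-one dictionary of the prior art lacks.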

S4, data persistence and pushing. The stream data produced in S3 are stored in an HBase table, and the processed stream data are published through a message queue for use.

In a specific implementation, Hive SQL is used to create a stream task whose FROM clause names the stream created in S3, and all fields of that stream are mapped into a pre-created HBase table for persistent storage.
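One way to picture mapping every stream field into a pre-created HBase table is to build, per record, a row key and a set of `family:qualifier` cells. The column family `info` and the row key built from the tail/hull number and a timestamp are hypothetical choices, because the source does not specify the table schema. The returned `(row_key, cells)` pair has the shape that Python HBase clients such as happybase accept in `Table.put`, but no client is called here.

```python
from typing import Dict, Mapping, Sequence, Tuple

COLUMN_FAMILY = "info"  # hypothetical column family of the pre-created HBase table


def to_hbase_put(row: Mapping[str, object],
                 key_fields: Sequence[str] = ("hull_number", "timestamp")) -> Tuple[bytes, Dict[bytes, bytes]]:
    """Builds a hypothetical row key and one 'family:qualifier' cell per non-null field."""
    # Assumed row key design: tail/hull number and timestamp joined by "_".
    row_key = "_".join(str(row.get(field, "")) for field in key_fields).encode("utf-8")
    cells = {
        f"{COLUMN_FAMILY}:{name}".encode("utf-8"): str(value).encode("utf-8")
        for name, value in row.items()
        if value is not None  # HBase is sparse, so null fields are simply not written
    }
    return row_key, cells
```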

A second stream task is then created, also reading FROM the stream created in S3. It takes the data from that stream, restores the English field names to the original Chinese field names, concatenates the fields into a JSON string, and publishes the JSON string.
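The second S4 task, which restores Chinese field names and publishes a JSON string, reduces to an inverse field mapping plus serialization. `EN_TO_ZH` is a hypothetical inverse of the preprocessing mapping; passing through fields that have no Chinese name (for example a `matched` label added during matching) unchanged is an assumption.

```python
import json
from typing import Dict, Mapping

# Hypothetical inverse of the preprocessing field map (English code -> Chinese name).
EN_TO_ZH: Dict[str, str] = {
    "target_name": "目标名称",
    "hull_number": "机舷号",
    "longitude": "经度",
    "latitude": "纬度",
}


def to_publish_json(row: Mapping[str, object]) -> str:
    """Restores Chinese field names and serializes the record as a compact JSON string."""
    # Fields without a Chinese name are passed through unchanged (an assumption).
    restored = {EN_TO_ZH.get(name, name): value for name, value in row.items()}
    # ensure_ascii=False keeps the Chinese keys and values readable in the message.
    return json.dumps(restored, ensure_ascii=False, separators=(",", ":"))
```

Keeping the inverse map next to the forward map used in S2 makes it easy to check that every English code round-trips to the Chinese name that downstream application systems expect.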

S5, online dictionary maintenance. If target data are added to the dictionary, the relevant dictionary tables in the basic dictionary library are selected, and the same attribute fields in the different dictionary tables are mapped into the mapping dictionary table.

In a specific implementation, the user clicks to add a new mapping dictionary target in the situation access system, and the system selects several isomorphic tables (such as a general equipment table) from the basic dictionary library and displays their targets as a tree. These tables share a tree structure and have similar field contents, so the fields of these isomorphic tables are mapped uniformly into the mapping dictionary table, and the keywords in the dictionary table are extended at the same time.

Furthermore, for the extended keywords, when a dictionary target is newly added, the keyword information from the original dictionary table is copied to the new table, and manual editing of the dictionary-table keywords is supported. Any target alias that needs to be added is added to the keywords of the mapping dictionary.
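Mapping several isomorphic dictionary tables into one mapping dictionary, and extending a target's keywords without losing the existing ones, can be sketched as follows. The table names, source column names, and comma-separated keyword format in `FIELD_MAPPINGS` are hypothetical; the source only states that same-meaning fields are mapped uniformly and that existing keywords are copied before new aliases are added.

```python
from typing import Dict, Iterable, List, Mapping

# Hypothetical column mappings: isomorphic source table -> {source column: mapping-dictionary column}.
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "aircraft_table": {"name": "target_name", "tail_no": "hull_number", "alias": "keywords"},
    "vessel_table": {"ship_name": "target_name", "hull_no": "hull_number", "other_names": "keywords"},
}


def map_tables(tables: Mapping[str, Iterable[Mapping[str, str]]]) -> List[Dict[str, str]]:
    """Maps same-meaning fields of several isomorphic tables into one mapping dictionary."""
    unified: List[Dict[str, str]] = []
    for table_name, rows in tables.items():
        mapping = FIELD_MAPPINGS[table_name]
        for row in rows:
            entry = {target: row.get(source, "") for source, target in mapping.items()}
            entry["source_table"] = table_name  # remembers where the entry came from
            unified.append(entry)
    return unified


def add_aliases(entry: Mapping[str, str], new_aliases: Iterable[str]) -> Dict[str, str]:
    """Copies the existing keywords and appends new aliases, skipping blanks and duplicates."""
    keywords = [k.strip() for k in entry.get("keywords", "").split(",") if k.strip()]
    for alias in new_aliases:
        alias = alias.strip()
        if alias and alias not in keywords:
            keywords.append(alias)
    updated = dict(entry)  # the original entry is left untouched
    updated["keywords"] = ",".join(keywords)
    return updated
```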

The invention provides a method for accessing real-time situation data and processing it online, with the following features:

(1) Mainstream message queues are supported.

(2) Stream data are processed flexibly using SQL.

(3) Dynamic stream data are matched against the static dictionary in real time.

(4) The stream matching state changes dynamically as the dictionary is maintained in real time.

Further, the supported mainstream message queues include Kafka, RabbitMQ, RocketMQ, and the like.

Furthermore, because stream data are processed with SQL, the data in the stream can be processed flexibly without affecting performance, which makes the data easy to inspect and modify.

Further, in the real-time matching of dynamic stream data against the static dictionary, field values in the dynamic stream and in the static dictionary table are matched flexibly using Hive SQL.

Further, because the dictionary is maintained in real time and the stream matching state changes dynamically, the static dictionary table can be modified externally; the real-time stream is then refreshed automatically, and the updated static dictionary takes effect for stream matching.
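Refreshing the stream's view of the dictionary without restarting the service can be approximated by checking a version marker and reloading only when it changes. This is a sketch under assumptions: the source does not say how a dictionary change is detected, so the `get_version` and `load` callables (for example, a last-modified counter and a query against the mapping dictionary table) and the class name `RefreshingDictionary` are hypothetical.

```python
import threading
from typing import Callable, Dict, Hashable, Optional


class RefreshingDictionary:
    """Hypothetical alias -> tail/hull number lookup that reloads when the dictionary changes."""

    def __init__(self, get_version: Callable[[], Hashable],
                 load: Callable[[], Dict[str, str]]) -> None:
        self._get_version = get_version  # cheap change marker, e.g. a version counter
        self._load = load  # full reload of the mapping dictionary table
        self._lock = threading.Lock()
        self._version = get_version()
        self._table = load()

    def _refresh_if_changed(self) -> None:
        current = self._get_version()
        if current != self._version:
            with self._lock:
                # Re-check under the lock so concurrent callers reload only once.
                if current != self._version:
                    self._table = self._load()
                    self._version = current

    def lookup(self, name: str) -> Optional[str]:
        """Returns the tail/hull number for a name, using the latest dictionary."""
        self._refresh_if_changed()
        return self._table.get(name)
```

Checking a cheap version marker on each lookup keeps the common path to a single comparison; a production stream engine might instead poll on a timer so that lookups never wait on a reload.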

The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and variations without departing from the technical principle of the present invention, and these improvements and variations should also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for accessing and processing real-time situation data, characterized by comprising the following steps:

2. The method for accessing and processing real-time situation data according to claim 1, wherein in step S1 the supported message queues include Kafka and RabbitMQ.

3. The method for accessing and processing real-time situation data according to claim 1, wherein step S1 specifically includes: using Hive SQL to configure the ZooKeeper and Kafka nodes, where Kafka is the message queue that receives the data and ZooKeeper is the distributed coordination service on which the message queue depends and which stores part of Kafka's metadata; and configuring the serialization encoding and the Kafka topic of the received character stream, and splitting the received character stream into individual records at line breaks.

4. The method for accessing and processing real-time situation data according to any one of claims 1 to 3, wherein step S2 specifically includes: using Hive SQL to create a derived stream, parsing the Chinese-named fields of the S1 input stream with Hive's JSON parsing function json_tuple(), and using the AS keyword to rename them to English codes, which completes the Chinese-to-English conversion; the derived stream reads from the input stream created in S1, and its filter conditions are that the JSON string is not null and the target name in the input stream is not null.

5. The method for accessing and processing real-time situation data according to claim 1, wherein creating the dynamic stream task in step S3 specifically includes: creating a dynamic stream task whose FROM source is the derived stream created in the previous step.

6. The method for accessing and processing real-time situation data according to claim 5, wherein, in step S3, comparing the target name in the stream data with the target names or keywords in the mapping dictionary table and, on a successful match, replacing the tail/hull number of the stream data with the tail/hull number in the dictionary table and unifying the tail/hull numbers specifically includes: associating the dynamic stream with the static mapping dictionary table by a LEFT JOIN and first checking whether the target name in the stream data equals a target name in the mapping dictionary table; if not, checking whether the target name in the stream data is contained in the keywords of the mapping dictionary table, where the keywords hold all aliases of the same target; and, if either check succeeds, replacing the tail/hull number of the stream data with the tail/hull number from the dictionary table, so that tail/hull numbers are unified.

7. The method for accessing and processing real-time situation data according to claim 6, wherein step S3 further comprises: if matching fails, because the dictionary table has no corresponding entry or the target has no matchable alias, the SQL labels the record as unmatched, and a dictionary entry is later selected manually for it based on historical data.

8. The method for accessing and processing real-time situation data according to any one of claims 5 to 7, wherein step S4 specifically includes: first, using Hive SQL to create a stream task whose FROM clause names the stream created in S3, and mapping all fields of that stream into a pre-created HBase table for persistent storage; then creating a second stream task, also reading FROM the stream created in S3, which takes the data from that stream, restores the English field names to the original Chinese field names, concatenates the fields into a JSON string, and publishes the JSON string.

9. The method for accessing and processing real-time situation data according to claim 8, wherein step S4 is followed by step S5, a dictionary online maintenance step: if target data are added to the dictionary, the relevant dictionary tables in the basic dictionary library are selected, and the same attribute fields in the different dictionary tables are mapped into the mapping dictionary table.

10. The method for accessing and processing real-time situation data according to claim 9, wherein step S5 specifically includes: clicking to add a new mapping dictionary target in the situation access system, whereupon the system selects several isomorphic tables from the basic dictionary library and displays their targets as a tree; these tables share a tree structure and have similar field contents, so their fields are mapped uniformly into the mapping dictionary table, and the keywords in the dictionary table are extended at the same time; for the extended keywords, when a dictionary target is newly added, the keyword information from the original dictionary table is copied to the new table, manual editing of the dictionary-table keywords is supported, and any target alias that needs to be added is added to the keywords of the mapping dictionary.