CN118051495B - Flight data storage and reading method and system - Google Patents


Info

Publication number
CN118051495B
Authority
CN
China
Legal status
Active
Application number
CN202410451486.4A
Other languages
Chinese (zh)
Other versions
CN118051495A (en)
Inventor
王祺
李嘉艺
Current Assignee
Shenzhen Ruida Flight Technology Co ltd
Original Assignee
Shenzhen Ruida Flight Technology Co ltd
Application filed by Shenzhen Ruida Flight Technology Co ltd
Priority to CN202410451486.4A
Publication of CN118051495A
Application granted
Publication of CN118051495B


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for storing and reading flight data. The method comprises the following steps:

1. A data storage server acquires flight parameter data based on a distributed file system.
2. The flight parameter data is classified into a parameter classification table and a data type table.
3. The storage format of the flight parameter data is converted, and the converted data is stored as columnar files.
4. The storage path of the flight parameter data is extracted, a mapping relation between the storage path and the flight ID is established, and the mapping relation is sent to a data access server.
5. The data access server builds a storage information table with a lightweight database and uses it to store and update the mapping relation.
6. When the data access server receives a data reading instruction from a user, it reads the target flight parameter data from the data storage server and returns it to the user.

The invention is lightweight, high-performance and widely applicable, and it has low deployment requirements.

Description

Flight data storage and reading method and system
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and a system for storing and reading flight data.
Background
In aviation, it is becoming increasingly common to mine value from flight parameters across multiple dimensions. This requires airlines to store large amounts of historical flight data for historical computation and extraction, and for optimizing or improving safety, energy efficiency and the like. However, storing a large number of historical flights involves high redundancy and demanding performance requirements, which traditional databases struggle to meet. Big data technologies such as Hadoop-based HBase can meet them, but they incur high hardware costs and cluster operation and maintenance costs.
Flight parameters are time series data and are mostly studied with the flight as the basic unit of granularity, so the data volume of a flight is proportional to its flight time. The flight parameters of a typical civil aviation flight consist of originally recorded parameters and derived calculated parameters, totalling roughly 1500-4000 parameters. Take a 2.5-hour flight (9000 s) as an example and assume every parameter is recorded once per second: one flight then contains between 13.5 million and 36 million values.

In practice, many core parameters are recorded at higher frequencies, such as 4, 8, 16 or even 32 times per second. This multiplies the data volume of those parameters and makes storage considerably harder. In addition, the number of parameters for the same aircraft is not fixed. It is adjusted dynamically according to business requirements, for example when new parameters are added, so the parameter set changes gradually over time. The storage scheme must also adapt to this.
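The per-flight figures above follow from simple multiplication (2.5 h = 9000 s). The sketch below reproduces them in Python. The function names are illustrative. The mixed-rate split of 1500 parameters (how many are recorded at 4, 8, 16 or 32 Hz) is a hypothetical example, not data from the source.

```python
# Back-of-the-envelope value counts for one flight, following the text's arithmetic.
# The mixed-rate split used below is a hypothetical illustration only.


def samples_per_flight(num_params: int, hours: float, hz: int = 1) -> int:
    """Values recorded when every parameter is sampled `hz` times per second."""
    seconds = int(hours * 3600)
    return num_params * seconds * hz


def mixed_rate_samples(rate_counts: dict, hours: float) -> int:
    """Values recorded when rate_counts maps sampling rate (Hz) -> number of parameters."""
    seconds = int(hours * 3600)
    return seconds * sum(hz * count for hz, count in rate_counts.items())


low = samples_per_flight(1500, 2.5)   # 1500 parameters at 1 Hz
high = samples_per_flight(4000, 2.5)  # 4000 parameters at 1 Hz
print(f"{low:,} to {high:,} values per 2.5 h flight")

# Hypothetical: 1500 parameters in total, 200 of them recorded at 4-32 Hz.
mixed = mixed_rate_samples({1: 1300, 4: 100, 8: 50, 16: 30, 32: 20}, 2.5)
print(f"{mixed:,} values with the hypothetical mixed rates")
```

Even this modest hypothetical share of high-rate parameters roughly doubles the 1 Hz figure for 1500 parameters.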
To address these problems, the industry currently takes two basic approaches.

The first uses a KeyValue/NoSQL database. For example, Hadoop's HBase stores flight parameters directly as binary data, optionally combined with compression. A KeyValue database supports a dynamic schema and therefore suits cases where the stored parameter names change. The drawback is that HBase depends on a heavyweight Hadoop cluster. This greatly raises the hardware and cluster operation and maintenance threshold for airlines and thus limits its range of application.

The second uses a relational database. The flight parameters are again converted directly to binary (optionally compressed), and dynamically added or removed parameters are handled by transposing columns into rows keyed by uuid + parameter name. The drawback is that relational databases are not good at storing such byte data, and performance degrades significantly when flights are stored in large quantities. Relational databases are commonly deployed in master-slave or active-standby configurations, which cannot fully exploit multiple servers to increase throughput. Using distributed sharded databases instead would seriously raise the hardware and operation and maintenance thresholds.
Disclosure of Invention
The embodiments of the invention provide a method and a system for storing and reading flight data. The aim is a storage and reading scheme that is lightweight, high-performance and widely applicable, with low deployment requirements.
In a first aspect, an embodiment of the present invention provides a method for storing and reading flight data, including:
The data storage server acquires flight data based on a distributed file system, and performs decoding processing and derivative parameter processing on the flight data to obtain corresponding flight parameter data; wherein the flight parameter data includes a flight ID;
Carrying out parameter classification on the flight parameter data, and dividing the flight parameter data into a preset parameter classification table and a data type table according to the parameter classification result;
Based on the flight parameter data in the parameter classification table and the data type table, converting the storage format of the flight parameter data, and storing the converted flight parameter data as columnar files;
extracting a storage path of the flight parameter data, establishing a mapping relation between the storage path and a flight ID, and then sending the mapping relation to a plurality of data access servers;
Each data access server constructs a storage information table in advance using a lightweight database, and uses the storage information table to store and update the received mapping relation;
When any data access server receives a data reading instruction of a user, corresponding target flight parameter data is read from the data storage server according to the data reading instruction, and the target flight parameter data is returned to the user.
In a second aspect, an embodiment of the present invention provides a system for storing and reading flight data, including a data storage server and a plurality of data access servers;
the data storage server is used for acquiring flight data based on a distributed file system, and performing decoding processing and derivative parameter processing on the flight data to obtain corresponding flight parameter data; wherein the flight parameter data includes a flight ID;
Carrying out parameter classification on the flight parameter data, and dividing the flight parameter data into a preset parameter classification table and a data type table according to the parameter classification result;
Based on the flight parameter data in the parameter classification table and the data type table, converting the storage format of the flight parameter data, and storing the converted flight parameter data as columnar files;
extracting a storage path of the flight parameter data, establishing a mapping relation between the storage path and a flight ID, and then sending the mapping relation to a plurality of data access servers;
Each data access server is used for constructing a storage information table in advance using a lightweight database, and for storing and updating the received mapping relation using the storage information table;
When any data access server receives a data reading instruction of a user, corresponding target flight parameter data is read from the data storage server according to the data reading instruction, and the target flight parameter data is returned to the user.
The embodiments of the invention provide a method and a system for storing and reading flight data. The data storage server stores parameters as columnar files on a distributed file system mounted through the kernel. Reading a parameter column therefore generates I/O only for the selected columns, which greatly improves disk I/O throughput and thus storage performance.

The data access server achieves lightweight storage by using a lightweight database, which improves access and reading efficiency. Each read only needs to obtain the flight file path from memory and then read the requested parameter columns of that flight directly. The only I/O generated is for the data being read, so the performance of the distributed file system is fully exploited.

In addition, the storage and reading method of the embodiments has a low barrier to use, is easy to maintain, and avoids the deployment difficulty of the prior art.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for storing and reading flight data according to an embodiment of the present invention;
FIG. 2 is a network architecture diagram of a system for storing and reading flight data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating storage conversion in a method for storing and reading flight data according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the data storage server writing out data in a method for storing and reading flight data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of query update in a method for storing and reading flight data according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the data access server in a method for storing and reading flight data according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to FIG. 1, FIG. 1 is a flow chart of a method for storing and reading flight data according to an embodiment of the present invention, which specifically includes steps S101-S106.
S101, a data storage server acquires flight data based on a distributed file system, and performs decoding processing and derivative parameter processing on the flight data to obtain corresponding flight parameter data; wherein the flight parameter data includes a flight ID;
S102, carrying out parameter classification on the flight parameter data, and dividing the flight parameter data into a preset parameter classification table and a data type table according to the result of the parameter classification;
S103, based on the flight parameter data in the parameter classification table and the data type table, converting the storage format of the flight parameter data and storing the converted flight parameter data as columnar files;
S104, extracting a storage path of the flight parameter data, establishing a mapping relation between the storage path and the flight ID, and then sending the mapping relation to a plurality of data access servers;
S105, each data access server constructs a storage information table in advance using a lightweight database, and uses the storage information table to store and update the received mapping relation;
S106, when any data access server receives a data reading instruction from a user, reading the corresponding target flight parameter data from the data storage server according to the data reading instruction, and returning the target flight parameter data to the user.
In this embodiment, storage of the flight parameter data is handled by the data storage server, and access to and retrieval of the flight parameter data are handled by the data access servers.

Specifically, the data storage server acquires flight data on the distributed file system, decodes it and computes derived parameters to obtain the flight parameter data. It classifies the flight parameter data into the corresponding parameter classification table and data type table, and then achieves columnar storage by converting the storage format of the flight parameter data.

After storage is complete, the data storage server sends the mapping between the storage path and the flight ID of the flight parameter data to the data access servers. The data access servers store and update the mapping through the storage information table. When a user needs to read flight parameter data, a data access server retrieves the corresponding target flight parameter data from the data storage server.
In this embodiment, the data storage server stores parameters as columnar files on a distributed file system mounted through the kernel. Reading a parameter column therefore generates I/O only for the selected column data, which greatly improves disk I/O throughput and storage performance.

The data access server achieves lightweight storage with a lightweight database, which improves access and reading efficiency. Each read only needs to obtain the flight file path from memory and then read the requested parameter columns of the flight directly. The only I/O generated is for the data being read, so the performance of the distributed file system is fully exploited.

In addition, the storage and reading method of this embodiment has a low barrier to use, is easy to maintain, and avoids the deployment difficulty of the prior art.
It will be appreciated that the flight data comes from the flight recorder. Decoding software such as AGS or AirFASE decodes it and computes derived parameters, producing a full-parameter flight data file, i.e., the flight parameter file. This embodiment takes integration with AGS HDF5 output as an example: the AGS decoding software provides all flight parameters, the definition of each parameter and the basic flight information.
In a practical application scenario, Ceph can be selected as the storage file system. Ceph is a high-performance, easily scalable file storage system that supports kernel-level mounting. It can run on a single machine or in a distributed cluster, so it meets the storage requirements of different customers and scenarios.

After Ceph is installed on a server node, it is mounted through the kernel. Accessing a Parquet columnar file then generates I/O only for the data actually read, which effectively improves I/O throughput. Specifically, the file system can be mounted at the /data directory of the Linux system, and all subsequent files are stored under this directory.
In an embodiment, classifying the flight parameter data and dividing it into the preset parameter classification table and data type table according to the parameter classification result includes:
Storing the flight parameter data into a parameter classification table according to ptype types, and storing the flight parameter data into a data type table according to dtype types;
In the parameter classification table, flight parameter data are divided and stored according to the status code type, the numerical value type and the character type;
in the data type table, flight parameter data are divided and stored according to preset data types; the preset data type is obtained according to the flight data characteristics.
In this embodiment, the parameter classification table is used for marking ptype, and according to the definition of the parameters, the parameters are classified into three types as described in table 1:
TABLE 1
The data type table is used for marking dtype, and since the pandas library and the numpy library are used for data operation in this embodiment, 18 data types as shown in table 2 are divided by referring to the numpy library in combination with the characteristics of the flight data itself:
TABLE 2
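The content of Tables 1 and 2 is not reproduced here, so the sketch below uses hypothetical code tables. Only the two values stated in the text come from the source: numerical ptype 2 and unsigned short dtype 5, both for TAS. All other numbers, and the `param_kinds` input, are assumptions for illustration. The sketch shows how each DataFrame column could be assigned its (ptype, dtype) pair.

```python
import numpy as np
import pandas as pd

# Hypothetical code tables. From the text: numerical ptype -> 2 and
# unsigned short (uint16) dtype -> 5. Every other number here is made up.
PTYPE_CODES = {"status": 1, "numeric": 2, "character": 3}
DTYPE_CODES = {"bool": 0, "int8": 1, "uint8": 2, "int16": 3, "int32": 4,
               "uint16": 5, "float32": 6, "float64": 7, "object": 8}


def classify(df: pd.DataFrame, param_kinds: dict) -> dict:
    """Map each column to (ptype code, dtype code).

    param_kinds (hypothetical input) gives each parameter's kind as it would be
    read from the parameter definitions: 'status', 'numeric' or 'character'.
    """
    codes = {}
    for col in df.columns:
        ptype = PTYPE_CODES[param_kinds[col]]
        dtype = DTYPE_CODES[str(df[col].dtype)]  # e.g. 'uint16'
        codes[col] = (ptype, dtype)
    return codes


df = pd.DataFrame({
    "TAS": np.arange(110, 190, 10, dtype=np.uint16),
    "flightphase": np.array([1, 1, 2, 2, 3, 3, 4, 4], dtype=np.uint8),
})
print(classify(df, {"TAS": "numeric", "flightphase": "status"}))
```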
In an embodiment, converting the storage format of the flight parameter data based on the flight parameter data in the parameter classification table and the data type table, and storing the converted flight parameter data as columnar files, further includes:
Reading the flight parameter data with the pandas library into a columnar DataFrame serving as the initial data structure;
acquiring the parameter sequence number of the flight parameter data in the parameter classification table and the type sequence number in the data type table, using the pandas API (application programming interface);
converting each column of data in the initial data structure into a byte string according to the parameter sequence number and the type sequence number;
and constructing a target data structure, and using it to store the byte strings in columnar form as a parquet file.
In this embodiment, the HDF5 output of AGS, which is a general standard format, is taken as an example. The file is parsed with Python's pandas library, and all parameters are read into a pandas DataFrame, i.e., the initial data structure, hereinafter abbreviated as df. A flight generally has about 1000-4000 parameters; two are used here as an example to show the DataFrame structure. Assuming the processed parameters are true airspeed (mnemonic TAS) and flight phase (mnemonic flightphase), a two-dimensional table as shown in Table 3 is obtained:
TABLE 3
Next, in this embodiment, an empty DataFrame, i.e., the target data structure (abbreviated new_df), is created to hold the data. The conversion then proceeds as follows:

1. The parameter type is looked up by parameter name, and its sequence number ptype is obtained from the parameter classification table. For example, TAS (true airspeed) is a numerical parameter with ptype 2.
2. The pandas API is called to get the data type. For example, df.TAS.dtype returns the data type, from which the dtype sequence number is obtained via the data type table. For example, TAS has dtype 5, unsigned short integer.
3. The column data is converted into bytes. For TAS, df.TAS.to_numpy().tobytes() yields the byte data b'nx\x82\x8c\x96\xa0\xaa\xb4'.
4. The ptype and dtype sequence numbers are spliced with the data to form a continuous byte string.
5. The byte string is stored into new_df through the pandas API.
In a specific embodiment, converting each column of data in the initial data structure into a byte string according to the parameter sequence number and the type sequence number includes:
acquiring the current version number of the flight parameter data and converting it to the unsigned int type to obtain the first field;
converting the parameter sequence number to the unsigned int type to obtain the second field;
converting the type sequence number to the unsigned int type to obtain the third field;
converting the flight parameter data into bytes according to its dtype type, which follow the third field in sequence up to the last byte;
and concatenating all the obtained bytes in order into the byte string.
In this embodiment, the current version number is converted to the unsigned int type to obtain the first field. The parameter sequence number and the type sequence number are converted to unsigned int to obtain the second and third fields. The flight parameter data is converted into bytes according to its dtype to form all subsequent bytes. When the data is read back, it can be restored from these bytes according to dtype with numpy's np.frombuffer function. Because each byte string carries a version number, this embodiment can support multiple format version definitions, which broadens the applicability of the method.
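The reverse direction, restoring values with `np.frombuffer`, can be sketched as follows. This sketch assumes the three header fields are little-endian 32-bit unsigned integers, reading "unsigned int" as numpy's uint32; the text does not fix the width or byte order. The dtype lookup table and the version check are likewise illustrative; only dtype code 5 (unsigned short) comes from the source.

```python
import struct

import numpy as np

# Assumed layout: three little-endian uint32 fields (version, ptype, dtype)
# followed by the raw values. Only dtype code 5 -> unsigned short is from the text.
HEADER = struct.Struct("<III")
DTYPE_BY_CODE = {5: np.dtype("<u2")}


def encode(version: int, ptype: int, dtype_code: int, values: np.ndarray) -> bytes:
    """Build a byte string: header fields, then the values in the coded dtype."""
    data = values.astype(DTYPE_BY_CODE[dtype_code]).tobytes()
    return HEADER.pack(version, ptype, dtype_code) + data


def decode(blob: bytes):
    """Split off the header, then restore the values with np.frombuffer."""
    version, ptype, dtype_code = HEADER.unpack_from(blob, 0)
    if version != 1:
        # A new layout version would get its own decoding branch here.
        raise ValueError(f"unsupported format version {version}")
    values = np.frombuffer(blob, dtype=DTYPE_BY_CODE[dtype_code], offset=HEADER.size)
    return ptype, dtype_code, values


blob = encode(1, 2, 5, np.arange(110, 190, 10))
ptype, dtype_code, tas = decode(blob)
print(ptype, dtype_code, tas.tolist())
```

The version field lets a reader choose the matching layout before interpreting the remaining bytes, which is what makes multi-version support possible.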
In an embodiment, the flight parameter data further includes flight information and parameter definitions;
converting the storage format of the flight parameter data based on the flight parameter data in the parameter classification table and the data type table, and storing the converted flight parameter data as columnar files, further comprises:
Converting the flight information and the parameter definitions into JSON strings and storing them in the target data structure;
Building an initial storage path from the flight information, and joining the initial storage path with the root path of the file system to obtain the storage path of the flight parameter data.
In this embodiment, a flight information column and a parameter definition column are added to the flight parameter data stored in the target data structure. Each is wrapped in @ characters in its column name to avoid ambiguity with parameter columns:

- The flight information is converted into a JSON string and stored in new_df: new_df['@INFO@'] = [json.dumps(flight_info)].
- The parameter definitions are converted into a JSON string and stored in new_df: new_df['@PARAMDEF@'] = [json.dumps(params_def)].
Further, an initial storage path is built from the flight information using the pattern <aircraft model>/<flight year and month>/<flight day>/<flight ID>.parquet, for example: A320-232/202401/01/00688ff1aeff787f335f96fb02c37aa6.parquet. The initial storage path is then joined with the root path of the file system to obtain the final storage path of the flight parameter data, for example: /data/A320-232/202401/01/00688ff1aeff787f335f96fb02c37aa6.parquet.
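The following sketch shows how the storage path and the two @-wrapped JSON columns could be produced. The flight_info keys (`aircraft_model`, `date` as YYYYMMDD, `flight_id`) and the function names are hypothetical. The path layout and the /data root follow the text.

```python
import json
import posixpath

import pandas as pd

ROOT = "/data"  # kernel mount point of the distributed file system (from the text)


def build_storage_path(flight_info: dict, root: str = ROOT) -> str:
    """<root>/<aircraft model>/<YYYYMM>/<DD>/<flight ID>.parquet

    The keys aircraft_model, date (YYYYMMDD) and flight_id are hypothetical.
    """
    date = flight_info["date"]
    relative = posixpath.join(
        flight_info["aircraft_model"], date[:6], date[6:8],
        f"{flight_info['flight_id']}.parquet",
    )
    return posixpath.join(root, relative)


def add_meta_columns(new_df: pd.DataFrame, flight_info: dict, params_def: dict) -> pd.DataFrame:
    """Store flight info and parameter definitions as JSON strings in @-wrapped columns."""
    new_df["@INFO@"] = [json.dumps(flight_info)]
    new_df["@PARAMDEF@"] = [json.dumps(params_def)]
    return new_df


info = {"aircraft_model": "A320-232", "date": "20240101",
        "flight_id": "00688ff1aeff787f335f96fb02c37aa6"}
print(build_storage_path(info))
```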
In an embodiment, converting the storage format of the flight parameter data based on the flight parameter data in the parameter classification table and the data type table, and storing the converted flight parameter data as columnar files, further includes:
The data storage server judges whether the lock resource file of the target parquet file is successfully created;
if creation of the lock resource file fails, judging whether the lock resource file exists;
if the lock resource file does not exist, returning a flight write-out failure;
if the lock resource file exists, acquiring the update time of the lock resource file and the current time, and then judging whether the time difference between the current time and the update time is within the preset operation time;
when the time difference between the current time and the update time is within the preset operation time, returning a flight write-out failure;
when the time difference between the current time and the update time is not within the preset operation time, deleting the lock resource file and again judging whether the lock resource file of the target parquet file is successfully created, until creation succeeds.
Further, if the lock resource file of the target parquet file is successfully created, a first current timestamp is obtained and set as a first time variable, and it is then judged whether the target parquet file already exists;
if the target parquet file exists, reading the target parquet file into an initial data structure, merging the target data structure with the initial data structure to obtain a data structure to be written out, and writing the data structure to be written out into a preparation file;
if the target parquet file does not exist, writing the target data structure into a preparation file;
after the preparation file has been written, obtaining a second current timestamp and setting it as a second time variable;
judging whether the difference between the second time variable and the first time variable is within the preset operation time;
when the difference is not within the preset operation time, abandoning the write-out and deleting the corresponding lock resource file and preparation file;
when the difference is within the preset operation time, renaming the preparation file to the target parquet file, and then deleting the corresponding lock resource file.
In this embodiment, when the data storage server writes out data, write efficiency can be improved by using new_df.to_parquet. However, one flight may have multiple processing packages and incremental parameter packages (parquet), so this embodiment designs a merge-on-write strategy to avoid losing new data. Referring to FIG. 4, the lock is acquired as follows:

1. Determine whether the <flight_id>.parquet.lock file (i.e., the lock resource file of the target parquet file) was created successfully.
2. If creation fails, determine whether the <flight_id>.parquet.lock file exists. If it does not exist, a flight write-out failure is returned.
3. If it exists, obtain the update time ts1 of the <flight_id>.parquet.lock file and the current time ts2, and determine whether ts2-ts1 is less than the preset operation time TIME_OUT.
4. If it is, the flight write-out fails.
5. If it is not, the lock is regarded as timed out and is preempted: the <flight_id>.parquet.lock file is deleted, and the process again determines whether the <flight_id>.parquet.lock file can be created.
If creation succeeds, the current timestamp is acquired and set as the variable ts1, and it is then determined whether the <flight_id>.parquet file already exists. If it does, the <flight_id>.parquet file is read as the old data structure old_df. The new data structure new_df is merged onto old_df to obtain the updated data structure, which is written to the file <flight_id>.parquet.ready<4 random characters>; the random characters avoid name collisions. After writing completes, the current timestamp is acquired and set as the variable ts2. If the file does not exist, new_df is written to the ready file directly in the same way.

Next, it is determined whether ts2-ts1 is within the preset operation time TIME_OUT. If not, the operation is regarded as timed out and the write-out is abandoned, and the <flight_id>.parquet.lock and <flight_id>.parquet.ready<4 random characters> files are deleted. If so, <flight_id>.parquet.ready<4 random characters> is renamed to <flight_id>.parquet and the <flight_id>.parquet.lock file is deleted.
In an embodiment, each data access server constructing a storage information table in advance using a lightweight database and using the storage information table to store and update the received mapping relation includes:
The data access server stores the mapping relation in the storage information table through a HashMap, with the flight ID as the key and the storage path as the value;
When duplicate flight IDs exist in the storage information table, the previously stored mapping relation is overwritten according to storage time;
Polling the maximum primary key value in the lightweight database at a preset time interval, and updating the storage information table according to the polling result.
In this embodiment, the data access server only needs the database to store the mapping relationship between the flight ID and the storage path, so any lightweight relational database or key-value (KV) database may be chosen and deployed stand-alone or as a cluster according to the customer scenario. MySQL is used as an example below:
CREATE TABLE flight_info (
  `pk` BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  `flight_id` CHAR(32),
  `path` VARCHAR(255)
  -- other fields such as flight information (omitted)
);
Here, pk is the primary key, which is automatically incremented by 1 each time a record is added; flight_id is the unique ID of a flight, but because data may be recomputed, the same flight_id may appear in multiple records; path stores the storage path in the file storage system. Other flight information can of course be stored in the same table, which makes it convenient to filter data in some scenarios. It should be noted that in this embodiment the multiple data access servers are deployed independently of one another and share the mapping relationships between flight IDs and paths through the flight information table; each data access server goes through an initialization startup phase, a run phase, and a mapping-receiving phase.
Specifically, in the initialization startup phase, because pk is an auto-increment primary key, the rows returned when reading all rows of flight_info are ordered by pk in ascending order by default. With flight_id as the key and path as the value, the mappings are saved in memory in a HashMap called fp_mapping. Because flight_id may be repeated in the storage information table, an overwrite rule is followed: a newer mapping overwrites an older one. In addition, the largest pk loaded is saved as fp_max_pk.
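The initialization phase can be sketched with Python's built-in sqlite3 module standing in for MySQL, an assumption made only so the example runs without a database server; the SQLite schema is an adaptation of the MySQL table above, a Python dict plays the role of the HashMap, and the function name load_mapping is hypothetical. The query adds an explicit ORDER BY pk, because SQL does not guarantee row order without one and the overwrite rule depends on reading older rows first.

```python
import sqlite3
from typing import Dict, Tuple

# SQLite adaptation of the flight_info table (other fields omitted).
SCHEMA = """
CREATE TABLE IF NOT EXISTS flight_info (
    pk INTEGER PRIMARY KEY AUTOINCREMENT,
    flight_id CHAR(32),
    path VARCHAR(255)
)
"""


def load_mapping(conn: sqlite3.Connection) -> Tuple[Dict[str, str], int]:
    """Initialization phase: build fp_mapping and fp_max_pk from flight_info."""
    fp_mapping: Dict[str, str] = {}
    fp_max_pk = 0
    rows = conn.execute("SELECT pk, flight_id, path FROM flight_info ORDER BY pk")
    for pk, flight_id, path in rows:
        # Later rows (recomputed flights) overwrite earlier mappings.
        fp_mapping[flight_id] = path
        fp_max_pk = pk
    return fp_mapping, fp_max_pk
```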
In the run phase, the database is queried every N seconds (for example, every 3 seconds) for the largest pk value to determine whether an update is needed. This query has very little overhead, for example SELECT MAX(pk) FROM flight_info in MySQL. As shown in FIG. 5, depending on the query result there are the following cases:
If the database pk is less than or equal to the in-memory fp_max_pk, the update is skipped;
If the database pk is greater than the in-memory fp_max_pk, the records in the difference are queried and applied to fp_mapping, and fp_max_pk is updated.
For example, if the database returns a pk of 105 and the in-memory fp_max_pk is 100, only the records with pk > 100 need to be queried, specifically: SELECT flight_id, path, other fields FROM flight_info WHERE pk > 100.
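One polling round of the run phase might look as follows, again assuming sqlite3 in place of MySQL; refresh_mapping is a hypothetical name, and the caller would invoke it every N seconds (for example from a background loop with time.sleep(3)).

```python
import sqlite3
from typing import Dict


def refresh_mapping(conn: sqlite3.Connection, fp_mapping: Dict[str, str],
                    fp_max_pk: int) -> int:
    """One run-phase polling round; updates fp_mapping and returns fp_max_pk."""
    (db_max_pk,) = conn.execute("SELECT MAX(pk) FROM flight_info").fetchone()
    if db_max_pk is None or db_max_pk <= fp_max_pk:
        return fp_max_pk  # database pk <= fp_max_pk: skip
    # Query only the difference, oldest first, so newer rows overwrite older ones.
    rows = conn.execute(
        "SELECT pk, flight_id, path FROM flight_info WHERE pk > ? ORDER BY pk",
        (fp_max_pk,))
    for pk, flight_id, path in rows:
        fp_mapping[flight_id] = path
        fp_max_pk = pk
    return fp_max_pk
```

Because MAX(pk) is answered from the primary-key index, the common "nothing changed" case stays cheap, and only the difference is transferred when something has changed.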
In addition, the multiple data access servers process requests independently of one another, and their connection information may be configured as:
qardata01.reda-flight.com:4501,qardata02.reda-flight.com:4501,qardata03.reda-flight.com:4501。
When the data storage server notifies a data access server of the mapping between a flight ID and a path, it splits the connection information of the data access servers on commas to obtain an array of connection points, shuffles the array, pops one connection point at a time, and sends the flight ID-path mapping together with other flight information to that connection point using the HTTP POST method. If the sending succeeds, the loop exits and the notification is regarded as successful. Otherwise, sending is retried with the next connection point until the array is empty, at which point the program throws an exception and the notification is regarded as failed. When a data access server receives a flight ID-path mapping from the data storage server, it inserts the record into the database, as shown in Table 4 below:
Table 4
fp_mapping and fp_max_pk are then updated; after the update completes, a success flag is returned, and if any step fails, an exception is returned.
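The notification round trip can be sketched in Python as follows. The sender side follows the split, shuffle, pop, and retry steps described above; the receiver side inserts the record and updates the in-memory state. Several details are assumptions: the /mapping URL path and JSON payload are hypothetical, sqlite3 stands in for the database, and the function names are illustrative. The receiver also deliberately advances fp_max_pk only when the new pk directly follows it: if other data access servers inserted rows in between, jumping ahead would make the next polling round skip those rows.

```python
import json
import random
import sqlite3
import urllib.request
from typing import Callable, Dict

# Connection information from the description.
ENDPOINTS = ("qardata01.reda-flight.com:4501,qardata02.reda-flight.com:4501,"
             "qardata03.reda-flight.com:4501")


def http_post(node: str, payload: Dict) -> None:
    """POST the mapping as JSON; the /mapping path is hypothetical."""
    req = urllib.request.Request(
        f"http://{node}/mapping", data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    # urlopen raises an OSError subclass (URLError/HTTPError) on failure.
    with urllib.request.urlopen(req, timeout=5):
        pass


def notify_mapping(endpoints: str, payload: Dict,
                   send: Callable[[str, Dict], None] = http_post) -> str:
    """Sender side: try shuffled nodes until one accepts; return that node."""
    nodes = [n.strip() for n in endpoints.split(",") if n.strip()]
    random.shuffle(nodes)
    while nodes:
        node = nodes.pop()
        try:
            send(node, payload)
            return node  # notification succeeded
        except OSError:
            continue  # try the next connection point
    raise RuntimeError("notification failed: no data access server accepted")


def receive_mapping(conn: sqlite3.Connection, fp_mapping: Dict[str, str],
                    fp_max_pk: int, flight_id: str, path: str) -> int:
    """Receiver side: insert the record, update fp_mapping, return fp_max_pk."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO flight_info (flight_id, path) VALUES (?, ?)",
            (flight_id, path))
    fp_mapping[flight_id] = path
    # Advance only if no other server's rows lie in between; otherwise the
    # next polling round fetches the gap (and re-applies this row in order).
    return cur.lastrowid if cur.lastrowid == fp_max_pk + 1 else fp_max_pk
```

Passing the sender as a parameter keeps the failover logic testable without network access.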
In an embodiment, when any data access server receives a data reading instruction from a user, reading the corresponding target flight parameter data from the data storage server according to the data reading instruction and returning the target flight parameter data to the user includes:
the data access server provides an access interface in HTTP mode or gRPC mode according to the performance requirements of the scenario;
the target data structure is acquired through the access interface, and the inverse byte-string conversion is performed on it to obtain the corresponding target flight parameter data.
In this embodiment, first, to achieve better read performance, the Rust language combined with asynchronous processing is used as the basis of the read service. Rust is a new-generation high-performance language with no garbage-collection overhead at runtime, which reduces unnecessary overhead in highly concurrent scenarios. Second, for the interfaces, this embodiment offers differentiated usage modes that balance usability and performance: for scenarios that are not performance-critical, such as the interface that receives the mapping between flight IDs and storage paths, an HTTP mode is provided; for high-performance read scenarios, such as reading a flight's parquet file or reading specific parameters of a specified flight, a gRPC mode is provided. Specifically, the gRPC service is defined as follows:
syntax = "proto3";

// Define a gRPC service
service ParquetAccessService {
  // RPC interface for reading columns of a Parquet file
  rpc ReadColumns(ReadColumnsRequest) returns (ReadColumnsResponse);
}

// Request structure of the RPC interface for reading Parquet columns
message ReadColumnsRequest {
  // Airline, string type
  string airline = 1;
  // Unique ID of the flight, string type
  string flight_id = 2;
  // Names of the columns to read, an array of strings
  repeated string columns = 3;
}

// Response structure of the RPC interface for reading Parquet columns
message ReadColumnsResponse {
  // Key: column name; value: column data in byte format
  map<string, bytes> columns_map = 1;
  // Columns that cannot be found in the parquet file, an array of strings
  repeated string missing_columns = 2;
}
In a practical application scenario, parameter data can be read through the gRPC interface. Taking a Python call to rpc ReadColumns(ReadColumnsRequest) returns (ReadColumnsResponse) as an example, the returned columns_map corresponds to the Python type dict[str, bytes]. Referring to FIG. 6, the parameters are restored through the inverse operation; only the version-1 storage specification is handled here. The specific procedure is as follows:
First, the key-value pairs are extracted from the map one by one, where each key is a parameter name and each value is a byte string composed of version + ptype + dtype + parameter data;
Second, the first byte of the value is converted to an unsigned int to check whether it is version 1; if so, the subsequent steps continue;
Then, the second byte of the value is converted to an unsigned int, and the parameter type ptype of the current parameter is determined from the parameter classification table;
The third byte of the value is converted to an unsigned int, the data type of the current parameter is determined from its serial number in the data type table, and the corresponding numpy character code is looked up and saved as dtype, for example 5 -> H;
Next, the bytes of the value from the fourth byte to EOF (end of file) are read and saved as raw_param_data;
Finally, the data is restored using the numpy API: numpy.frombuffer(raw_param_data, dtype).
In this way, the corresponding target flight parameter data is obtained.
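The version-1 layout (one version byte, one ptype byte, one dtype byte, then the raw values) and its inverse can be sketched with Python's standard struct module, which shares numpy's character codes for these types (for example H for an unsigned short). Assumptions: the values are little-endian, each header field fits in one unsigned byte, and the lookup table DTYPE_TABLE is hypothetical except for the entry 5 -> H given above; the function names are illustrative. With numpy available, the last decoding step would instead be numpy.frombuffer(raw_param_data, dtype="<" + code).

```python
import struct
from typing import Dict, Sequence, Tuple

VERSION = 1
# Hypothetical data type table: serial number -> struct/numpy character code.
# Only 5 -> "H" comes from the description; 9 and 10 are illustrative.
DTYPE_TABLE: Dict[int, str] = {5: "H", 9: "f", 10: "d"}


def encode_column(ptype: int, dtype_no: int, values: Sequence) -> bytes:
    """Build version + ptype + dtype + parameter data as one byte string."""
    code = DTYPE_TABLE[dtype_no]
    header = struct.pack("<BBB", VERSION, ptype, dtype_no)  # bytes 1 to 3
    return header + struct.pack(f"<{len(values)}{code}", *values)


def decode_column(value: bytes) -> Tuple[int, str, tuple]:
    """Inverse operation: return (ptype, dtype code, restored values)."""
    version, ptype, dtype_no = struct.unpack_from("<BBB", value, 0)
    if version != VERSION:
        raise ValueError(f"unsupported storage version {version}")
    code = DTYPE_TABLE[dtype_no]
    raw_param_data = value[3:]  # fourth byte to EOF
    count = len(raw_param_data) // struct.calcsize("<" + code)
    return ptype, code, struct.unpack(f"<{count}{code}", raw_param_data)


def decode_columns_map(columns_map: Dict[str, bytes]) -> Dict[str, tuple]:
    """Decode every (parameter name, bytes) pair of a ReadColumnsResponse map."""
    return {name: decode_column(v)[2] for name, v in columns_map.items()}
```

Fixing the byte order with the "<" prefix keeps files portable between writer and reader hosts; numpy's bare "H" would instead use the host's native order.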
As shown in FIG. 2, an embodiment of the present invention further provides a flight data storage and reading system, which includes a data storage server 201 and a plurality of data access servers 202;
The data storage server 201 is configured to obtain flight data based on a distributed file system and perform decoding processing and derived-parameter processing on the flight data to obtain corresponding flight parameter data, wherein the flight parameter data includes a flight ID;
classify the flight parameter data and divide it into a preset parameter classification table and a data type table according to the classification result;
perform storage format conversion on the flight parameter data based on the flight parameter data in the parameter classification table and the data type table, and perform columnar file storage on the converted flight parameter data;
extract the storage path of the flight parameter data, establish the mapping relationship between the storage path and the flight ID, and then send the mapping relationship to the plurality of data access servers 202;
Each data access server 202 is configured to construct a storage information table in advance using a lightweight database and to use the storage information table to store and update the received mapping relationships;
when any data access server 202 receives a data reading instruction from a user, it reads the corresponding target flight parameter data from the data storage server 201 according to the data reading instruction and returns the target flight parameter data to the user.
Since the system embodiments correspond to the method embodiments, for the system embodiments refer to the description of the method embodiments; details are not repeated here.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may refer to one another. Because the system disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for relevant points, refer to the description of the method. It should be noted that those skilled in the art can make various improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of the present application.
It should also be noted that in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

Claims (7)

1. A method for storing and reading flight data, comprising:
the data storage server acquiring flight data based on a distributed file system, and performing decoding processing and derived-parameter processing on the flight data to obtain corresponding flight parameter data, wherein the flight parameter data includes a flight ID;
classifying the flight parameter data, and dividing the flight parameter data into a preset parameter classification table and a data type table according to the classification result;
performing storage format conversion on the flight parameter data based on the flight parameter data in the parameter classification table and the data type table, and performing columnar file storage on the converted flight parameter data;
extracting a storage path of the flight parameter data, establishing a mapping relationship between the storage path and the flight ID, and then sending the mapping relationship to a plurality of data access servers;
each data access server constructing a storage information table in advance using a lightweight database, and using the storage information table to store and update the received mapping relationship;
when any data access server receives a data reading instruction from a user, reading the corresponding target flight parameter data from the data storage server according to the data reading instruction, and returning the target flight parameter data to the user;
wherein the step of classifying the flight parameter data and dividing it into the preset parameter classification table and the data type table according to the classification result comprises:
storing the flight parameter data into the parameter classification table according to ptype, and storing the flight parameter data into the data type table according to dtype;
in the parameter classification table, dividing and storing the flight parameter data by status-code type, numerical type, and character type;
in the data type table, dividing and storing the flight parameter data by preset data types, the preset data types being obtained from characteristics of the flight data;
wherein the step of performing storage format conversion on the flight parameter data based on the flight parameter data in the parameter classification table and the data type table, and performing columnar file storage on the converted flight parameter data, comprises:
reading the flight parameter data into a columnar initial data structure (DataFrame) using the pandas library;
acquiring, using the API (application programming interface) of the pandas library, the parameter sequence number of the flight parameter data in the parameter classification table and the type sequence number in the data type table;
converting each column of data in the initial data structure into a byte string according to the parameter sequence number and the type sequence number;
constructing a target data structure, and using the target data structure to store the byte strings in columnar form as a parquet file;
wherein converting each column of data in the initial data structure into a byte string according to the parameter sequence number and the type sequence number comprises:
acquiring the current version number of the flight parameter data, and converting the current version number to an unsigned int to obtain the first byte;
converting the parameter sequence number to an unsigned int to obtain the second byte;
converting the type sequence number to an unsigned int to obtain the third byte;
converting the flight parameter data into byte format based on dtype, and using it in sequence as the fourth byte through the last byte;
and concatenating all the obtained bytes in order into the byte string.
2. The method for storing and reading flight data according to claim 1, wherein the flight parameter data further comprises flight information and parameter definitions;
the step of performing storage format conversion on the flight parameter data based on the flight parameter data in the parameter classification table and the data type table, and performing columnar file storage on the converted flight parameter data, further comprises:
converting the flight information and the parameter definition columns into JSON String format, and storing them in the target data structure;
splicing an initial storage path from the flight information, and joining the initial storage path with the root path of the target data structure to obtain the storage path of the flight parameter data.
3. The method for storing and reading flight data according to claim 1, wherein performing storage format conversion on the flight parameter data based on the flight parameter data in the parameter classification table and the data type table, and performing columnar file storage on the converted flight parameter data, further comprises:
the data storage server determining whether the lock resource file of the parquet file is newly created successfully;
if creation of the lock resource file fails, determining whether the lock resource file exists;
if the lock resource file does not exist, returning that write-out of the flight has failed;
if the lock resource file exists, acquiring the update time of the lock resource file and the current time, and then comparing whether the difference between the current time and the update time is within the preset operation time;
when the difference between the current time and the update time is within the preset operation time, returning that write-out of the flight has failed;
when the difference between the current time and the update time is not within the preset operation time, deleting the lock resource file, and again determining whether the lock resource file of the parquet file is newly created successfully, until creation succeeds.
4. The method for storing and reading flight data according to claim 3, wherein after the step in which the data storage server determines whether the lock resource file of the parquet file is newly created successfully, the method further comprises:
if the lock resource file of the target parquet file is created successfully, acquiring a first current timestamp, setting it as a first time variable, and then determining whether the target parquet file exists;
if the target parquet file exists, reading the target parquet file into an initial data structure, merging the target data structure with the initial data structure to obtain a data structure to be written out, and writing the data structure to be written out into a preparation file;
if the target parquet file does not exist, writing the target data structure into a preparation file;
after writing of the preparation file is completed, acquiring a second current timestamp and setting it as a second time variable;
comparing whether the difference between the second time variable and the first time variable is within the preset operation time;
when the difference is not within the preset operation time, abandoning the write-out and deleting the corresponding lock resource file and preparation file;
when the difference is within the preset operation time, renaming the preparation file to complete the write-out, and then deleting the corresponding lock resource file.
5. The method for storing and reading flight data according to claim 1, wherein each data access server constructing a storage information table in advance using a lightweight database and using the storage information table to store and update the received mapping relationship comprises:
the data access server storing the mapping relationship into the storage information table through a HashMap, with the flight ID as the key and the storage path as the value;
when duplicate flight IDs exist in the storage information table, overwriting the previously stored mapping relationship with the most recently stored one according to storage time;
polling the maximum primary key value in the lightweight database at preset time intervals, and updating the storage information table according to the polling result.
6. The method for storing and reading flight data according to claim 1, wherein, when any data access server receives a data reading instruction from a user, reading the corresponding target flight parameter data from the data storage server according to the data reading instruction and returning the target flight parameter data to the user comprises:
the data access server providing an access interface in HTTP mode or gRPC mode according to the performance requirements of the scenario;
acquiring the target data structure through the access interface, and performing the inverse byte-string conversion on the target data structure to obtain the target flight parameter data corresponding to the target data structure.
7. A system for storing and reading flight data, characterized by comprising a data storage server and a plurality of data access servers;
the data storage server is configured to acquire flight data based on a distributed file system, and perform decoding processing and derived-parameter processing on the flight data to obtain corresponding flight parameter data, wherein the flight parameter data includes a flight ID;
classify the flight parameter data, and divide the flight parameter data into a preset parameter classification table and a data type table according to the classification result;
perform storage format conversion on the flight parameter data based on the flight parameter data in the parameter classification table and the data type table, and perform columnar file storage on the converted flight parameter data;
extract a storage path of the flight parameter data, establish a mapping relationship between the storage path and the flight ID, and then send the mapping relationship to the plurality of data access servers;
each data access server is configured to construct a storage information table in advance using a lightweight database, and use the storage information table to store and update the received mapping relationship;
when any data access server receives a data reading instruction from a user, the corresponding target flight parameter data is read from the data storage server according to the data reading instruction and returned to the user;
wherein classifying the flight parameter data and dividing it into the preset parameter classification table and the data type table according to the classification result comprises:
storing the flight parameter data into the parameter classification table according to ptype, and storing the flight parameter data into the data type table according to dtype;
in the parameter classification table, dividing and storing the flight parameter data by status-code type, numerical type, and character type;
in the data type table, dividing and storing the flight parameter data by preset data types, the preset data types being obtained from characteristics of the flight data;
wherein performing storage format conversion on the flight parameter data based on the flight parameter data in the parameter classification table and the data type table, and performing columnar file storage on the converted flight parameter data, comprises:
reading the flight parameter data into a columnar initial data structure (DataFrame) using the pandas library;
acquiring, using the API (application programming interface) of the pandas library, the parameter sequence number of the flight parameter data in the parameter classification table and the type sequence number in the data type table;
converting each column of data in the initial data structure into a byte string according to the parameter sequence number and the type sequence number;
constructing a target data structure, and using the target data structure to store the byte strings in columnar form as a parquet file;
wherein converting each column of data in the initial data structure into a byte string according to the parameter sequence number and the type sequence number comprises:
acquiring the current version number of the flight parameter data, and converting the current version number to an unsigned int to obtain the first byte;
converting the parameter sequence number to an unsigned int to obtain the second byte;
converting the type sequence number to an unsigned int to obtain the third byte;
converting the flight parameter data into byte format based on dtype, and using it in sequence as the fourth byte through the last byte;
and concatenating all the obtained bytes in order into the byte string.
CN202410451486.4A 2024-04-16 2024-04-16 Flight data storage and reading method and system Active CN118051495B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410451486.4A CN118051495B (en) 2024-04-16 2024-04-16 Flight data storage and reading method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410451486.4A CN118051495B (en) 2024-04-16 2024-04-16 Flight data storage and reading method and system

Publications (2)

Publication Number Publication Date
CN118051495A CN118051495A (en) 2024-05-17
CN118051495B true CN118051495B (en) 2024-07-09

Family

ID=91050318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410451486.4A Active CN118051495B (en) 2024-04-16 2024-04-16 Flight data storage and reading method and system

Country Status (1)

Country Link
CN (1) CN118051495B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11768834B1 (en) * 2023-05-03 2023-09-26 Newday Database Technology, Inc. Storing and querying general data type documents in SQL relational databases

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI20002720A (en) * 2000-12-12 2002-06-13 Nokia Corp Procedure for performing conversions
JP7157716B2 (en) * 2019-08-09 2022-10-20 株式会社日立製作所 Database server device, server system and request processing method


Also Published As

Publication number Publication date
CN118051495A (en) 2024-05-17


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant