CN114610976A - Data query method, data storage method, data query device, data storage device, computing equipment and media

Info

Publication number: CN114610976A
Application number: CN202011419883.1A
Authority: CN
Inventors: 汪勇; 齐向东; 吴云坤
Original assignee: Qianxin Technology Group Co Ltd; Secworld Information Technology Beijing Co Ltd
Current assignee: Qianxin Technology Group Co Ltd; Secworld Information Technology Beijing Co Ltd
Priority date: 2020-12-07
Filing date: 2020-12-07
Publication date: 2022-06-10

Abstract

The present disclosure provides a data query method, including: receiving a query request, wherein the query request at least comprises attribute information of target message data; determining target index data from the plurality of index data based on the query request, wherein the target index data comprises a target file path associated with attribute information of the target message data, and each index data in the plurality of index data comprises attribute information of historical message data and a file path of a file in which the historical message data is located; determining a target file from at least one file based on the target index data, wherein the file path of the target file is a target file path, and at least one file is used for storing historical message data; and acquiring target message data from the target file. The present disclosure also provides a data storage method, apparatus, computing device, computer-readable storage medium, and computer program product.

Description

Data query method, data storage method, data query device, data storage device, computing equipment and media

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a data query method, a data storage method, a data query apparatus, a data storage apparatus, a computing device, and a computer-readable storage medium.

Background

In the face of the large amount of message data generated in a network, the related art generally stores the message data by using big data technologies such as Kafka or Spark in order to support subsequent queries. However, the storage approach of the related art usually processes the message data and stores the processed result in a database, so the original message data cannot be obtained in a subsequent query, and it is difficult to meet the evidence-collection requirement for message data. In addition, in the related art, querying the stored message data incurs high computing resource overhead, and the query speed is low.

Disclosure of Invention

In view of the above, the present disclosure provides an optimized data query method, data storage method, data query apparatus, data storage apparatus, computing device, and computer-readable storage medium.

One aspect of the present disclosure provides a data query method, including: receiving a query request, wherein the query request includes at least attribute information of target message data; determining target index data from a plurality of index data based on the query request, wherein the target index data includes a target file path associated with the attribute information of the target message data, and each index data in the plurality of index data includes attribute information of historical message data and a file path of a file in which the historical message data is located; determining a target file from at least one file based on the target index data, wherein the file path of the target file is the target file path, and the at least one file is used for storing the historical message data; and obtaining the target message data from the target file.

According to an embodiment of the present disclosure, the plurality of index data are stored in a plurality of first databases, and the database identifier and the index data identifier of each first database are stored in association in a second database, where the index data identifier characterizes the index data stored in the first database. Determining the target index data from the plurality of index data based on the query request includes: determining, from the second database based on the query request, the index data identifier indicated by the query request; determining, from the second database based on the index data identifier indicated by the query request, at least one database identifier associated with the indicated index data identifier; determining, based on the at least one database identifier, at least one first database corresponding to the at least one database identifier; and determining, from the index data stored in the at least one first database, index data whose attribute information matches the attribute information of the target message data as the target index data.

According to an embodiment of the present disclosure, the plurality of index data are stored in the plurality of first databases according to the timestamps of the historical message data, the index data identifier includes a timestamp range of the index data stored in the first database, the query request further includes a target timestamp range, and the timestamp of the target message data is within the target timestamp range. Determining, from the second database based on the query request, the index data identifier indicated by the query request includes: determining the index data identifier indicated by the query request from the second database based on the target timestamp range in the query request, wherein the timestamp range of the index data identifier indicated by the query request includes the target timestamp range.

According to an embodiment of the present disclosure, the target file includes a plurality of historical message data. Obtaining the target message data from the target file includes: determining, based on the attribute information of the target message data, at least one historical message data from the plurality of historical message data as the target message data, wherein the attribute information of the at least one historical message data matches the attribute information of the target message data.

According to the embodiment of the disclosure, the at least one file is a file in a distributed file system; the at least one file corresponds to at least one preset time range one by one, and for each file in the at least one file, the message generation time of each historical message data stored in the file is within the preset time range corresponding to the file.

According to an embodiment of the disclosure, for each file in the at least one file, the plurality of historical message data stored in the file are compressed into a plurality of subfiles, and for each subfile, the plurality of historical message data in the subfile are compressed in sequence. Compressing the plurality of historical message data in sequence includes: compressing at least one piece of received historical message data to obtain a primary compression subfile, and compressing at least one piece of newly received historical message data into the primary compression subfile. The file path further includes a file name of the subfile.
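One way to picture this sequential compression is a multi-member gzip stream, in which each newly received batch is compressed and appended to the existing compressed subfile without recompressing earlier data. This is only a sketch under stated assumptions: the disclosure does not name a compression format, and `append_to_subfile`, `read_subfile`, the length-prefix framing, and the sample payloads are hypothetical.

```python
import gzip

def append_to_subfile(subfile: bytes, messages: list[bytes]) -> bytes:
    """Compress newly received messages and append them to an existing subfile.

    Assumption: each batch becomes one gzip member. A gzip stream may hold
    several concatenated members, so earlier data is left untouched.
    """
    # Length-prefix each message so the raw bytes can be split apart later.
    batch = b"".join(len(m).to_bytes(4, "big") + m for m in messages)
    return subfile + gzip.compress(batch)

def read_subfile(subfile: bytes) -> list[bytes]:
    """Decompress every member and split the length-prefixed messages."""
    raw = gzip.decompress(subfile)
    messages, pos = [], 0
    while pos < len(raw):
        size = int.from_bytes(raw[pos:pos + 4], "big")
        messages.append(raw[pos + 4:pos + 4 + size])
        pos += 4 + size
    return messages

# First batch yields the "primary compression subfile"; later data is appended.
primary = append_to_subfile(b"", [b"packet-1", b"packet-2"])
updated = append_to_subfile(primary, [b"packet-3"])
```

Because the appended member follows the primary subfile byte for byte, the original compressed content is preserved and the message payloads come back unmodified on decompression.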

According to an embodiment of the present disclosure, the attribute information includes at least one of: source IP address, destination IP address, source port, destination port, data transfer protocol.

According to an embodiment of the present disclosure, the first database includes a bitmap database, and the second database includes a bitmap database.

Another aspect of the present disclosure provides a data storage method, including: obtaining historical message data to be stored; analyzing each historical message data in the historical message data to be stored to obtain attribute information of each historical message data; storing the historical message data to be stored to at least one file in a distributed file system, and recording a file path of the file in which each historical message data is located; for each historical message data, determining the attribute information of the historical message data and the file path of the file in which the historical message data is located as index information; and storing the index information in association in a bitmap database.

According to the embodiment of the disclosure, the at least one file corresponds to at least one preset time range one by one; the storing the historical packet data to be stored to at least one file in a distributed file system comprises, for each of the historical packet data: determining the message generation time of the historical message data, and storing the historical message data to one of the at least one file based on the message generation time and the at least one preset time range, wherein the message generation time is within the preset time range corresponding to the stored file.
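The mapping from message generation time to file can be sketched as follows. The one-hour preset time range, the `/pcap` root directory, the `.dat` suffix, and the function name `file_path_for` are all assumptions made for illustration; the disclosure does not fix any of them.

```python
from datetime import datetime

def file_path_for(generated_at: datetime, root: str = "/pcap") -> str:
    """Return the path of the file whose preset time range covers the message.

    Assumption: one file per hour, named after the start of the hour.
    """
    bucket_start = generated_at.replace(minute=0, second=0, microsecond=0)
    return f"{root}/{bucket_start:%Y%m%d%H}.dat"

# A message generated at 16:30:20 on January 1, 2020 lands in the 16:00 file.
path = file_path_for(datetime(2020, 1, 1, 16, 30, 20))
```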

According to the embodiment of the disclosure, for each file in the at least one file, the plurality of historical message data stored by the file are compressed into a plurality of subfiles; for each subfile: and compressing at least one piece of received historical message data to obtain a primary compression subfile, and compressing at least one piece of newly received historical message data into the primary compression subfile.

Another aspect of the present disclosure provides a data query apparatus including: the device comprises a receiving module, a first determining module, a second determining module and a first obtaining module. The receiving module is used for receiving a query request, wherein the query request at least comprises attribute information of target message data. The first determining module is configured to determine target index data from a plurality of index data based on the query request, where the target index data includes a target file path associated with attribute information of the target packet data, and each index data in the plurality of index data includes attribute information of history packet data and a file path of a file in which the history packet data is located. The second determining module is configured to determine a target file from at least one file based on the target index data, where a file path of the target file is the target file path, and the at least one file is used to store the historical packet data. The first obtaining module is used for obtaining the target message data from the target file.

Another aspect of the present disclosure provides a data storage device including: the device comprises a second acquisition module, an analysis module, a first storage module, a third determination module and a second storage module. The second obtaining module is used for obtaining historical message data to be stored. The analysis module is used for analyzing each historical message data in the historical message data to be stored to obtain the attribute information of each historical message data. The first storage module is used for storing the historical message data to be stored to at least one file in a distributed file system and recording a file path of a file where each historical message data is located. The third determining module is used for determining attribute information of the historical message data and a file path of a file where the historical message data is located as index information for each historical message data. The second storage module is used for storing the index information to the bitmap database in a correlation mode.

Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.

Another aspect of the disclosure provides a computer program product comprising computer executable instructions for implementing the method as described above when executed.

According to the embodiments of the disclosure, the data query method and the data storage method can at least partially solve the problems that the message data storage approach of the related art has difficulty meeting the evidence-collection requirement for message data, that the computing resource overhead of queries is high, and that the query speed is low. As a result, unprocessed original message data can be obtained during query and evidence collection, the query speed is improved, and the resource consumption of data queries is reduced.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates a system architecture of a data query method and a data storage method according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates a schematic diagram of a data query method and a data storage method according to an embodiment of the disclosure;

FIG. 3 schematically illustrates a flow diagram of a data query method according to an embodiment of the disclosure;

FIG. 4 schematically illustrates a schematic diagram of an index data store according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates a flow chart for determining target index data according to an embodiment of the present disclosure;

FIG. 6 schematically illustrates a flow chart of a data storage method according to an embodiment of the present disclosure;

FIG. 7 schematically shows a block diagram of a data querying device according to an embodiment of the present disclosure;

FIG. 8 schematically illustrates a block diagram of a data storage device according to an embodiment of the present disclosure; and

FIG. 9 schematically illustrates a block diagram of a computer system suitable for data querying and data storage according to an embodiment of the disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).

An embodiment of the present disclosure provides a data query method, including: receiving a query request, wherein the query request includes at least attribute information of target message data; determining target index data from a plurality of index data based on the query request, wherein the target index data includes a target file path associated with the attribute information of the target message data, and each index data in the plurality of index data includes attribute information of historical message data and a file path of a file in which the historical message data is located; determining a target file from at least one file based on the target index data, wherein the file path of the target file is the target file path, and the at least one file is used for storing the historical message data; and obtaining the target message data from the target file.

An embodiment of the present disclosure further provides a data storage method for storing historical message data, the method including: obtaining a plurality of historical message data to be stored; analyzing each historical message data in the plurality of historical message data to be stored to obtain attribute information of each historical message data; storing the plurality of historical message data to be stored to at least one file in a distributed file system; and, for each historical message data, storing the attribute information of the historical message data and the file path of the file in which the historical message data is located in association in a bitmap database, to obtain index data for that historical message data.

Fig. 1 schematically shows a system architecture of a data query method and a data storage method according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.

As shown in fig. 1, a system architecture 100 according to this embodiment may include forwarding devices 101, 102, 103, a network 104, and a server 105. Network 104 is used to provide a medium for communication links between forwarding devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The forwarding devices 101, 102, 103 interact with a server 105 over a network 104 to receive or send messages or the like. Forwarding devices 101, 102, 103 may include, but are not limited to, routers, switches, gateways, and the like.

Server 105 may be a server that provides various services, such as providing storage functions (for example only) for message data from forwarding devices 101, 102, 103. The server 105 may analyze and process the received query request, and obtain target packet data for the query request.

It should be noted that the data query method and the data storage method provided by the embodiments of the present disclosure may be generally executed by the server 105. Accordingly, the data query device and the data storage device provided by the embodiments of the present disclosure may be generally disposed in the server 105. The data query method and the data storage method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster different from the server 105 and capable of communicating with the forwarding devices 101, 102, 103 and/or the server 105. Accordingly, the data query device and the data storage device provided by the embodiments of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the forwarding devices 101, 102, 103 and/or the server 105.

It should be understood that the number of forwarding devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of forwarding devices, networks, and servers, as desired for implementation.

Fig. 2 schematically illustrates a schematic diagram of a data query method and a data storage method according to an embodiment of the present disclosure.

As shown in fig. 2, a plurality of historical message data 210 to be stored are obtained from, for example, a router, a switch, or the like. For example, the plurality of historical message data 210 includes historical message data 211, historical message data 212, historical message data 213, and historical message data 214.

Each historical message data is then analyzed (parsed) to obtain its attribute information. For example, the attribute information of the historical message data 211 is "attribute A", the attribute information of the historical message data 212 is "attribute B", the attribute information of the historical message data 213 is "attribute C", and the attribute information of the historical message data 214 is "attribute D". The attribute information may include the four-tuple of the message, that is, the source IP address, the destination IP address, the source port, and the destination port.

Next, the plurality of historical message data 210 is stored into a plurality of files 221, 222, and the plurality of files 221, 222 may be stored in a distributed file system. For example, the historical message data 211 and 212 are compressed and stored in the file 221, and the historical message data 213 and 214 are compressed and stored in the file 222. The file path of the file 221 is, for example, "path a", and the file path of the file 222 is, for example, "path b", and the corresponding file can be found through the file path.

The attribute information of each historical message data and the file path of the file in which that historical message data is stored are stored in association to obtain an index file 230, and the index file 230 is stored in a bitmap database, for example. The index file 230 includes a plurality of index data in one-to-one correspondence with the plurality of historical message data. The plurality of index data include, for example, data associating "attribute A" with "path a", data associating "attribute B" with "path a", data associating "attribute C" with "path b", and data associating "attribute D" with "path b".
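The storage side of fig. 2 can be sketched with in-memory stand-ins. This is an illustrative sketch only: a `dict` plays the role of the distributed file system, a `list` plays the role of the bitmap database, compression is omitted, and `IndexEntry`, `store`, and the `raw-…` payloads are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexEntry:
    attribute: str   # attribute information, e.g. a four-tuple rendered as text
    file_path: str   # path of the file holding the raw message

def store(messages, files, index):
    """Store raw messages and record one index entry per message.

    messages: iterable of (attribute, raw_payload, file_path) tuples.
    files:    dict standing in for the distributed file system (assumption).
    index:    list standing in for the bitmap database (assumption).
    """
    for attribute, payload, file_path in messages:
        # The raw payload is kept unmodified so it can be recovered later.
        files.setdefault(file_path, []).append((attribute, payload))
        index.append(IndexEntry(attribute, file_path))

files, index = {}, []
store(
    [
        ("attribute A", b"raw-211", "path a"),
        ("attribute B", b"raw-212", "path a"),
        ("attribute C", b"raw-213", "path b"),
        ("attribute D", b"raw-214", "path b"),
    ],
    files,
    index,
)
```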

After storing the historical message data to the plurality of files 221, 222 and generating the index file 230, the target message data 260 may be obtained from the plurality of historical message data based on the received query request 240. Specifically, the query request 240 includes, for example, attribute information of the required target message data; for example, the attribute information included in the query request 240 is "attribute A".

Then, based on "attribute A" in the query request 240, target index data 250 is determined from the plurality of index data in the index file 230; the target index data 250 is, for example, the data associating "attribute A" with "path a". Then, based on "path a" in the target index data 250, the file 221 whose file path is "path a" is determined from the plurality of files 221, 222 as the target file. The file 221 is decompressed next, and the historical message data 211 whose attribute information is "attribute A" is then acquired from the decompressed file 221, based on "attribute A" in the query request 240, as the target message data 260.

The data query method and the data storage method of the embodiment of the present disclosure are described below with reference to the schematic diagram of fig. 2.

Fig. 3 schematically shows a flow chart of a data query method according to an embodiment of the present disclosure.

As shown in fig. 3, the method may include, for example, the following operations S310 to S340.

In operation S310, a query request is received, where the query request includes at least attribute information of target message data. The query request is used, for example, to query the target message data from a plurality of stored historical message data.

In operation S320, target index data is determined from the plurality of index data based on the query request.

In the embodiment of the present disclosure, the plurality of index data correspond to the plurality of historical packet data one to one, for example, that is, each index data includes attribute information of the corresponding historical packet data and a file path of a file in which the historical packet data is located. Among them, a plurality of index data are stored in the bitmap database, for example, and the bitmap database has an advantage in query efficiency.
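To illustrate why a bitmap layout suits this kind of attribute lookup, the sketch below keeps one bitset per (field, value) pair and answers a multi-attribute query with bitwise AND. It is a toy model, not the internals of any particular bitmap database; the class `BitmapIndex`, the field names, and the sample values are assumptions.

```python
from collections import defaultdict

class BitmapIndex:
    """Toy bitmap index: one integer bitset per (field, value) pair."""

    def __init__(self):
        self.rows = []                    # row id -> file path
        self.bitmaps = defaultdict(int)   # (field, value) -> bitset of row ids

    def add(self, attrs: dict, file_path: str) -> None:
        row_id = len(self.rows)
        self.rows.append(file_path)
        for field, value in attrs.items():
            self.bitmaps[(field, value)] |= 1 << row_id

    def lookup(self, **conditions) -> list[str]:
        """Return the file paths of rows matching every condition."""
        if not conditions:
            return []
        result = -1  # all bits set; narrowed by each AND below
        for field, value in conditions.items():
            result &= self.bitmaps.get((field, value), 0)
        paths = {p for i, p in enumerate(self.rows) if (result >> i) & 1}
        return sorted(paths)

idx = BitmapIndex()
idx.add({"dst_port": 31, "src_ip": "10.0.0.1"}, "path a")
idx.add({"dst_port": 80, "src_ip": "10.0.0.1"}, "path b")
```

Each additional query condition costs one AND over the bitsets, rather than a scan over every index entry.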

Based on the attribute information in the query request, the index data having that attribute information is determined from the plurality of index data as the target index data, where the target index data includes a target file path associated with the attribute information.

In operation S330, a target file is determined from at least one file based on the target index data. The file path of the target file is a target file path, and at least one file is used for storing historical message data.

In an embodiment of the present disclosure, at least one file is stored, for example, in a distributed file system, each file including a plurality of history packet data that are stored compressed, each file having a file path. And determining a file with a file path consistent with the target file path from at least one file as a target file based on the target file path in the target index data.

In operation S340, target message data is acquired from the target file.

In an embodiment of the present disclosure, the target file includes a plurality of historical message data. At least one historical message data can be determined from the plurality of historical message data stored in the target file as the target message data based on the attribute information of the target message data in the query request, and the attribute information of the at least one historical message data is matched with the attribute information of the target message data.

For example, after the target file is determined, since the target file stores a plurality of historical packet data, the historical packet data whose attribute information is consistent with the attribute information in the query request can be acquired from the target file as the required target packet data based on the attribute information in the query request.
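Operations S310 to S340 can be sketched end to end as follows. The sketch assumes gzip-compressed JSON files held in a `dict` and an index held as a list of (attribute, file path) pairs; these storage choices, along with the names `query_messages`, `pack`, and the `raw-…` payloads, are hypothetical and stand in for the distributed file system and the bitmap database.

```python
import gzip
import json

def query_messages(attribute, index, files):
    """Return payloads of stored messages whose attribute matches the query."""
    # S320: find the target index data and collect the target file paths.
    target_paths = sorted({path for attr, path in index if attr == attribute})
    results = []
    for path in target_paths:
        # S330: locate the target file by its path, then decompress it.
        records = json.loads(gzip.decompress(files[path]))
        # S340: keep only the messages whose attribute matches the request.
        results.extend(r["payload"] for r in records if r["attribute"] == attribute)
    return results

def pack(records):
    """Compress a list of message records into one file body (assumed format)."""
    return gzip.compress(json.dumps(records).encode("utf-8"))

files = {
    "path a": pack([{"attribute": "attribute A", "payload": "raw-211"},
                    {"attribute": "attribute B", "payload": "raw-212"}]),
    "path b": pack([{"attribute": "attribute C", "payload": "raw-213"},
                    {"attribute": "attribute D", "payload": "raw-214"}]),
}
index = [("attribute A", "path a"), ("attribute B", "path a"),
         ("attribute C", "path b"), ("attribute D", "path b")]

# S310: the query request carries the attribute information "attribute A".
found = query_messages("attribute A", index, files)
```

Only the file named by the target index data is decompressed; the file at "path b" is never opened for this request.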

In the embodiment of the disclosure, the historical message data is stored in the distributed file system, so the historical message data is preserved in its original form, and unprocessed original message data can be obtained from the distributed file system when messages are later queried for evidence collection. In addition, after the historical message data is stored in the distributed file system, the file path and the attribute information of each message are stored in association in the bitmap database as index data. The index data can therefore be looked up in the bitmap database based on the attribute information to obtain the target file path at which the target message data is stored, and the unprocessed target message data can be obtained from the file at that path, which increases the query speed and reduces the resource consumption of data queries.

FIG. 4 schematically illustrates a schematic diagram of index data storage according to an embodiment of the disclosure.

As shown in fig. 4, the plurality of historical message data 410 includes, as an example, historical message data 411 to 418. Historical message data 411, 412 are stored in file 421, historical message data 413, 414 are stored in file 422, historical message data 415, 416 are stored in file 423, and historical message data 417, 418 are stored in file 424. The file path of file 421 is "path a", the file path of file 422 is "path b", the file path of file 423 is "path c", and the file path of file 424 is "path d".

A plurality of index data in one-to-one correspondence with the plurality of historical message data are stored in, for example, a plurality of first databases. For example, the index data corresponding one-to-one to the historical message data 411-414 are stored in the first database 431, and the index data corresponding one-to-one to the historical message data 415-418 are stored in the first database 432. The database identifier of the first database 431 is, for example, "first database P", and the database identifier of the first database 432 is, for example, "first database Q". The first database 431 includes a bitmap database, and the first database 432 includes a bitmap database.

The attribute information of each historical message data includes, for example, any one or more of a source IP address, a destination IP address, a source port, a destination port, and a data transmission protocol. The attribute information of each historical message data and the file path of the file in which that historical message data is stored are stored in association in a first database.

In addition, each historical message data has a timestamp, which characterizes, for example, the time at which the message was generated. For ease of understanding, take the timestamp of the historical message data 411 to be "20200101", which indicates that the historical message data 411 was generated on January 1, 2020. The timestamp of each historical message data may also be accurate to a specific time; for example, a timestamp of "20200101163020" indicates that the historical message data was generated at 16:30:20 on January 1, 2020.
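The two timestamp granularities in this example can be parsed as shown below. The helper name `parse_timestamp` is hypothetical, and the sketch assumes timestamps are either 8-digit dates or 14-digit date-times as in the examples above.

```python
from datetime import datetime

def parse_timestamp(ts: str) -> datetime:
    """Parse a day-level ("20200101") or second-level ("20200101163020") timestamp."""
    fmt = "%Y%m%d%H%M%S" if len(ts) == 14 else "%Y%m%d"
    return datetime.strptime(ts, fmt)

day_level = parse_timestamp("20200101")
second_level = parse_timestamp("20200101163020")
```

Because the fields run from most to least significant and have fixed widths, timestamps of the same granularity also sort chronologically when compared as plain strings.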

The index data are stored in the first databases according to the timestamps of the historical message data. Each index data includes, for example, the timestamp, the attribute information, and the file path of the historical message data. For example, if each first database can store 4 index data, the index data corresponding to the historical message data 411 to 414 are stored sequentially in the first database 431 according to their timestamps. After the first database 431 is full with 4 index data, the index data corresponding to the remaining historical message data 415 to 418 are stored sequentially in the first database 432 according to their timestamps.

Next, the database identifier and the index data identifier of each first database are stored in association in the second database 440. The index data identifier characterizes the index data stored by the first database; for example, the index data identifier includes a timestamp range of the index data stored by the first database. The timestamp range is characterized by, for example, a minimum timestamp and a maximum timestamp.

Taking the first database 431 as an example, the minimum timestamp of the index data in the first database 431 is "20200101", and the maximum timestamp is "20200104". The minimum timestamp "20200101", the maximum timestamp "20200104", and the database identification "first database P" of the first database 431 are stored in association to the second database 440. The process for the first database 432 is the same or similar and will not be described herein.
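The layout of fig. 4 can be sketched as a partitioning step that fills each first database in timestamp order and records its minimum and maximum timestamps in the second database. The function `partition`, the tuple layout, the attribute labels, and the timestamps "20200102", "20200103", and "20200105" through "20200108" are assumptions added for illustration; the text above gives only "20200101" and "20200104".

```python
def partition(index_entries, capacity=4):
    """Split index data into first databases and build the second database.

    index_entries: list of (timestamp, attribute, file_path) tuples.
    Returns (first_dbs, second_db), where second_db holds
    (min_timestamp, max_timestamp, database_identifier) records.
    """
    first_dbs, second_db = {}, []
    ordered = sorted(index_entries)  # timestamp is the first tuple field
    for n, start in enumerate(range(0, len(ordered), capacity)):
        chunk = ordered[start:start + capacity]
        db_id = f"first database {chr(ord('P') + n)}"  # P, Q, ... as in fig. 4
        first_dbs[db_id] = chunk
        second_db.append((chunk[0][0], chunk[-1][0], db_id))
    return first_dbs, second_db

entries = [
    ("20200101", "attr-411", "path a"), ("20200102", "attr-412", "path a"),
    ("20200103", "attr-413", "path b"), ("20200104", "attr-414", "path b"),
    ("20200105", "attr-415", "path c"), ("20200106", "attr-416", "path c"),
    ("20200107", "attr-417", "path d"), ("20200108", "attr-418", "path d"),
]
first_dbs, second_db = partition(entries)
```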

The process of determining target index data according to the embodiment of the present disclosure is described below with reference to the schematic diagram of fig. 4 and the flowchart of fig. 5.

FIG. 5 schematically illustrates a flow chart for determining target index data according to an embodiment of the present disclosure.

As shown in fig. 5, the determination of the target index data from the plurality of index data based on the query request in the above operation S320 includes the following operations S521 to S524.

In operation S521, based on the query request, the index data identifier indicated by the query request is determined from the second database.

In the embodiment of the present disclosure, the query request further includes a target time range, and the timestamp of the target message data is within the target time range. For example, when the target message data needs to be queried, a target time range in which the target message data was generated may be specified, for example from January 2, 2020 to January 3, 2020, and the generation time of the target message data is within this target time range.

Then, based on the target time range in the query request, the index data identification indicated by the query request, such as the minimum timestamp "20200101" and the maximum timestamp "20200104", is determined from the second database. It can be understood that the timestamp range of the index data identification indicated by the query request (January 1, 2020 to January 4, 2020) includes the target time range (January 2, 2020 to January 3, 2020).

In operation S522, at least one database identifier associated with the indicated index data identifier is determined from the second database based on the index data identifier indicated by the query request.

For example, the database identification associated with the index data identification ("20200101" and "20200104") indicated by the query request is "first database P".

In operation S523, at least one first database corresponding to the at least one database identification is determined based on the at least one database identification. For example, the first database 431 corresponding to the database identification "first database P" is determined.

In operation S524, index data, of which the attribute information matches the attribute information of the target packet data, is determined from the index data stored in the at least one first database as target index data.

In an example, when the attribute information included in the query request is "destination port 31", index data including "destination port 31" is determined from the first database 431 as target index data including, for example, file path "path a". Next, target packet data is obtained from the file 421 corresponding to the "path a", for example, one or more history packet data with a destination port of "31" are obtained from the file 421 as the target packet data.

In another example, when the attribute information included in the query request is "Telnet protocol", a plurality of index data including "Telnet protocol" is determined from the first database 431 as target index data including, for example, file paths "path a" and "path b". Next, target packet data is obtained from the file 421 corresponding to "path a" and the file 422 corresponding to "path b", for example, one or more history packet data with a data transfer protocol "Telnet protocol" are obtained from the file 421, one or more history packet data with a data transfer protocol "Telnet protocol" are obtained from the file 422, and the obtained history packet data are used as the target packet data.

In another example, when the attribute information included in the query request is "destination port 31 or 32" and "Telnet protocol", the first piece of index data, which includes "destination port 31 or 32" and "Telnet protocol", is determined from the first database 431 as the target index data, which includes, for example, the file path "path a". Next, target message data is obtained from the file 421 corresponding to "path a"; for example, one or more historical message data with a destination port of "31 or 32" and a data transmission protocol of "Telnet protocol" are obtained from the file 421 as the target message data.
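The attribute matching in the three examples above can be sketched as a small predicate. The representation is an assumption made for illustration: a query maps each attribute name to the set of accepted values, so alternatives within one attribute are combined with "or" and different attributes are combined with "and". The sample index entries, the port value 40, and the key names `dst_port` and `protocol` are hypothetical.

```python
def matches(index_attributes, query):
    """Return True when every queried attribute has one of its accepted values.

    `query` maps an attribute name to a set of accepted values, so
    {"dst_port": {31, 32}, "protocol": {"Telnet"}} means
    (dst_port is 31 or 32) AND (protocol is Telnet).
    """
    return all(index_attributes.get(name) in accepted for name, accepted in query.items())


def target_file_paths(index_entries, query):
    """Collect the distinct file paths of the matching index entries, in order."""
    paths = []
    for attributes, file_path in index_entries:
        if matches(attributes, query) and file_path not in paths:
            paths.append(file_path)
    return paths


# Hypothetical contents of one first database: (attribute information, file path).
index_entries = [
    ({"dst_port": 31, "protocol": "Telnet"}, "path a"),
    ({"dst_port": 40, "protocol": "Telnet"}, "path b"),
]

by_port = target_file_paths(index_entries, {"dst_port": {31}})
by_protocol = target_file_paths(index_entries, {"protocol": {"Telnet"}})
combined = target_file_paths(index_entries, {"dst_port": {31, 32}, "protocol": {"Telnet"}})
```

Under these sample entries, the port query and the combined query each resolve to "path a", while the protocol query resolves to both "path a" and "path b".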

In another example, when the target time range in the query request is from January 4, 2020 to January 5, 2020, the index data identifications indicated by the query request are, for example, the timestamp range "20200101" to "20200104" and the timestamp range "20200105" to "20200108". The first databases corresponding to these index data identifications include, for example, the first database 431 and the first database 432. Next, target index data including the attribute information is determined from the first database 431 and the first database 432 based on the attribute information in the query request, and target message data is obtained from the corresponding file based on the file path included in the target index data.

It can be understood that, in the embodiment of the present disclosure, the index data of the historical message data are stored in the plurality of first databases, and the second database is established to index the plurality of first databases. When the target message data is queried, the corresponding first database is first determined from the second database, and the file path at which the target message data is stored is then determined from that first database. Not all first databases need to be traversed, which improves query efficiency and reduces the computing resources consumed by the query.
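Operations S521 to S524 can be sketched as a two-level lookup. This is a minimal sketch under stated assumptions: the second database is modeled as a list of (minimum timestamp, maximum timestamp, database identification) tuples, the first databases as a dictionary of lists, and the identifier "first database Q" is hypothetical. A first database is selected when its timestamp range overlaps the target time range, which covers both the case where it contains the whole target range and the case where the target range spans two databases.

```python
def select_database_ids(second_db, start, end):
    """S521-S522: ids of the first databases whose timestamp range
    overlaps the target time range [start, end]."""
    return [db_id for lo, hi, db_id in second_db if lo <= end and start <= hi]


def find_target_index(second_db, first_dbs, start, end, query):
    """S523-S524: scan only the selected first databases and keep the
    index entries whose attribute information matches the query."""
    hits = []
    for db_id in select_database_ids(second_db, start, end):
        for entry in first_dbs[db_id]:
            if all(entry["attributes"].get(k) == v for k, v in query.items()):
                hits.append(entry)
    return hits


second_db = [
    ("20200101", "20200104", "first database P"),
    ("20200105", "20200108", "first database Q"),  # hypothetical identifier
]
first_dbs = {
    "first database P": [
        {"timestamp": "20200102", "attributes": {"dst_port": 31}, "file_path": "path a"},
    ],
    "first database Q": [
        {"timestamp": "20200105", "attributes": {"dst_port": 31}, "file_path": "path c"},
    ],
}

one_db = select_database_ids(second_db, "20200102", "20200103")
two_dbs = select_database_ids(second_db, "20200104", "20200105")
hits = find_target_index(second_db, first_dbs, "20200102", "20200103", {"dst_port": 31})
```

Only the first databases returned by `select_database_ids` are scanned, which is the point of keeping the second database.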

In the embodiment of the present disclosure, at least one file is a file in a distributed file system, the at least one file corresponds to at least one preset time range one to one, and for each file in the at least one file, the message generation time of each historical message data stored in the file is within the preset time range corresponding to the file.

For example, the at least one file includes file 1, file 2, file 3, and the like, and the message generation times of the historical message data stored in each file fall within, for example, a 1-hour range. For example, the preset time range corresponding to file 1 is from 00:00:00 to 00:59:59 on January 1, 2020. The preset time range corresponding to file 2 is, for example, from 01:00:00 to 01:59:59 on January 1, 2020. The preset time range corresponding to file 3 is, for example, from 02:00:00 to 02:59:59 on January 1, 2020. Taking file 1 as an example, the message generation time of each historical message data stored in file 1 is within the time range from 00:00:00 to 00:59:59 on January 1, 2020.
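The mapping from message generation time to file can be sketched as follows, assuming 1-hour preset time ranges as in the example above. The function names, the root directory `/messages`, and the file naming pattern are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def preset_range(generated_at: datetime):
    """Return the (start, end) of the 1-hour preset time range containing the time."""
    start = generated_at.replace(minute=0, second=0, microsecond=0)
    return start, start + timedelta(hours=1) - timedelta(seconds=1)


def file_path_for(generated_at: datetime, root: str = "/messages") -> str:
    """Derive the path of the file whose preset time range contains the time.

    The directory layout and naming pattern here are assumptions.
    """
    start, _ = preset_range(generated_at)
    return f"{root}/{start:%Y%m%d_%H}.dat"


path_1 = file_path_for(datetime(2020, 1, 1, 0, 59, 59))  # falls in the range of file 1
path_2 = file_path_for(datetime(2020, 1, 1, 1, 0, 0))    # falls in the range of file 2
```

Because the path is derived only from the generation time, all messages generated within the same hour land in the same file.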

For each of the at least one file, the plurality of historical message data stored by the file is compressed into a plurality of subfiles. Taking file 1 as an example, the plurality of historical message data stored in file 1 are compressed into, for example, a subfile 11, a subfile 12, a subfile 13, and the like. Taking as an example 3000 historical message data generated sequentially within the time range from 00:00:00 to 00:59:59 on January 1, 2020, each subfile may store, for example, 1000 historical message data. As the 3000 historical message data are generated, they are first stored sequentially in the subfile 11; after the subfile 11 is full with 1000 historical message data, subsequently generated historical message data are stored sequentially in the subfile 12 until it is full with 1000 historical message data, and the historical message data generated after that are stored sequentially in the subfile 13.

For each subfile, the plurality of historical message data in the subfile is compressed sequentially. Taking the subfile 11 as an example, the 1000 historical message data stored in the subfile 11 are, for example, compressed and stored sequentially.

In one embodiment, for the subfile 11, the received historical message data may be compressed at preset intervals, where the preset interval may be, for example, 1 minute. For example, 200 historical message data received within the first minute are compressed to obtain a preliminary compressed subfile, and 300 historical message data newly received in the following minute are then compressed into the preliminary compressed subfile. The newly received 300 historical message data can be compressed into the preliminary compressed subfile by a streaming compression technology, which allows new data to be compressed continuously into an already compressed file. Accordingly, the historical message data newly received in each subsequent minute can be compressed into the previously compressed subfile until the subfile 11 is full with 1000 historical message data, so that the 1000 historical message data stored in the subfile 11 are finally compressed into one file.

In another embodiment, for the subfile 11, compression may be performed each time a preset number of historical message data, for example 200, has been received. For example, the first 200 received historical message data are compressed to obtain a preliminary compressed subfile, and the next 200 newly received historical message data are then compressed into the preliminary compressed subfile by a streaming compression technology. Each subsequent group of 200 newly received historical message data can likewise be compressed into the previously compressed subfile until the subfile 11 is full with 1000 historical message data, so that the 1000 historical message data stored in the subfile 11 are finally compressed into one file.

In the embodiment of the disclosure, the subfile is obtained by compressing the historical message data multiple times through the streaming compression technology, so that the peak consumption of storage space can be reduced. By contrast, if compression into the subfile 11 were performed only once, after all 1000 historical message data had been received, the received message data would remain uncompressed while waiting and would therefore occupy a larger storage space. Compressing the historical message data multiple times through the streaming compression technology reduces the storage space occupied by the historical message data.
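The batch-by-batch compression into one subfile can be sketched with Python's standard `zlib` streaming compressor. The disclosure does not name a compression library or record format, so the choice of `zlib`, the class name `StreamingSubfile`, the 4-byte length prefix used to separate messages, and the in-memory buffer standing in for the subfile are all assumptions of this sketch.

```python
import zlib


class StreamingSubfile:
    """Compress batches of messages into a single compressed stream."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.count = 0
        self.sealed = False
        self._compressor = zlib.compressobj()
        self._compressed = bytearray()  # stands in for the subfile on disk

    def append_batch(self, messages):
        """Compress newly received messages into the existing stream."""
        if self.sealed or self.count + len(messages) > self.capacity:
            raise ValueError("subfile is full")
        for message in messages:
            # Length-prefix each message so it can be split out again (assumed format).
            record = len(message).to_bytes(4, "big") + message
            self._compressed += self._compressor.compress(record)
        # Emit everything compressed so far without ending the stream.
        self._compressed += self._compressor.flush(zlib.Z_SYNC_FLUSH)
        self.count += len(messages)
        if self.count == self.capacity:
            self._compressed += self._compressor.flush()  # finish the stream
            self.sealed = True

    def read_all(self):
        """Decompress the stream written so far and split it into messages."""
        raw = zlib.decompressobj().decompress(bytes(self._compressed))
        messages, pos = [], 0
        while pos < len(raw):
            size = int.from_bytes(raw[pos:pos + 4], "big")
            messages.append(raw[pos + 4:pos + 4 + size])
            pos += 4 + size
        return messages


subfile = StreamingSubfile(capacity=5)
subfile.append_batch([b"m1", b"m2"])         # preliminary compressed subfile
subfile.append_batch([b"m3", b"m4", b"m5"])  # newly received data joins the same stream
```

Only the current batch is ever held uncompressed, which is the effect on peak storage that the paragraph above describes.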

In the embodiment of the present disclosure, each file has a plurality of subfiles, and each subfile has, for example, its own file name. The file path of each file can also include the file name of the subfile, so that the corresponding subfile in the target file can be located directly when the target message data is acquired, which improves the data acquisition speed.

FIG. 6 schematically shows a flow chart of a data storage method according to an embodiment of the present disclosure.

As shown in fig. 6, the method may include, for example, the following operations S610 to S650.

In operation S610, history packet data to be stored is acquired.

In operation S620, each historical packet data in the historical packet data to be stored is parsed, so as to obtain attribute information of each historical packet data.

The attribute information includes any one or more of a source IP address, a destination IP address, a source port, a destination port, and a data transmission protocol.

In operation S630, the historical packet data to be stored is stored in at least one file in the distributed file system, and a file path of a file in which each historical packet data is located is recorded.

In operation S640, for each history packet data, attribute information of the history packet data and a file path of a file in which the history packet data is located are determined as index information.

In operation S650, the index information is stored in association in the bitmap database.
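Operations S610 to S650 can be sketched in miniature as follows. The disclosure names a bitmap database but does not describe its internals, so the `BitmapIndex` class below is a toy stand-in that uses one Python integer per (attribute, value) pair as a bitset. The message dictionaries with `raw`, `time`, and `attributes` keys, the `store` function, and the path scheme are hypothetical, and parsing (S620) is assumed to have been done already.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: bit i of a bitmap is set when row i has that attribute value."""

    def __init__(self):
        self.rows = []                   # row id -> file path
        self.bitmaps = defaultdict(int)  # (attribute, value) -> bitset

    def add(self, attributes, file_path):
        row_id = len(self.rows)
        self.rows.append(file_path)
        for item in attributes.items():
            self.bitmaps[item] |= 1 << row_id

    def lookup(self, **conditions):
        """AND the bitmaps of all conditions and return the matching file paths."""
        result = (1 << len(self.rows)) - 1
        for item in conditions.items():
            result &= self.bitmaps.get(item, 0)
        return sorted({path for i, path in enumerate(self.rows) if (result >> i) & 1})


def store(messages, index, files, path_for):
    """Store raw messages in files and index their attributes and file paths."""
    for message in messages:
        path = path_for(message["time"])                   # choose the file (S630)
        files.setdefault(path, []).append(message["raw"])  # keep the original bytes
        index.add(message["attributes"], path)             # S640 and S650


files = {}
index = BitmapIndex()
store(
    [
        {"raw": b"pkt-1", "time": "2020010100", "attributes": {"dst_port": 31, "protocol": "Telnet"}},
        {"raw": b"pkt-2", "time": "2020010101", "attributes": {"dst_port": 40, "protocol": "Telnet"}},
    ],
    index,
    files,
    path_for=lambda hour: "/messages/" + hour,  # hypothetical path scheme
)
telnet_paths = index.lookup(protocol="Telnet")
port_31_paths = index.lookup(dst_port=31, protocol="Telnet")
```

Because the files keep the unmodified message bytes and the index holds only attributes and paths, a query can return the original message data.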

In an embodiment of the present disclosure, the at least one file corresponds to at least one preset time range one to one. Storing the historical message data to be stored to at least one file in the distributed file system comprises, for each historical message data: determining the message generation time of the historical message data; and storing the historical message data to one of the at least one file based on the message generation time and the at least one preset time range, wherein the message generation time is within the preset time range corresponding to the file in which the historical message data is stored.

In an embodiment of the present disclosure, for each of the at least one file, the plurality of historical message data stored by the file is compressed into a plurality of subfiles. For each subfile, the received at least one historical message data is compressed to obtain a preliminary compressed subfile, and the newly received at least one historical message data is then compressed into the preliminary compressed subfile.

Fig. 7 schematically shows a block diagram of a data querying device according to an embodiment of the present disclosure.

As shown in fig. 7, the data querying device 700 may include: a receiving module 710, a first determining module 720, a second determining module 730, and a first obtaining module 740.

The receiving module 710 may be configured to receive a query request, where the query request includes at least attribute information of the target message data. According to the embodiment of the present disclosure, the receiving module 710 may, for example, perform the operation S310 described above with reference to fig. 3, which is not described herein again.

The first determining module 720 may be configured to determine target index data from a plurality of index data based on the query request, where the target index data includes a target file path associated with attribute information of the target packet data, and each index data in the plurality of index data includes attribute information of the history packet data and a file path of a file in which the history packet data is located. The first determining module 720 according to the embodiment of the disclosure may perform, for example, the operation S320 described above with reference to fig. 3, which is not described herein again.

The second determining module 730 may be configured to determine a target file from at least one file based on the target index data, where a file path of the target file is a target file path, and at least one file is used to store historical message data. According to an embodiment of the present disclosure, the second determining module 730 may perform, for example, the operation S330 described above with reference to fig. 3, which is not described herein again.

The first obtaining module 740 may be configured to obtain target packet data from a target file. According to the embodiment of the present disclosure, the first obtaining module 740 may, for example, perform the operation S340 described above with reference to fig. 3, which is not described herein again.

FIG. 8 schematically shows a block diagram of a data storage device according to an embodiment of the disclosure.

As shown in fig. 8, the data storage device 800 may include: a second obtaining module 810, a parsing module 820, a first storing module 830, a third determining module 840, and a second storing module 850.

The second obtaining module 810 may be configured to obtain historical message data to be stored. According to an embodiment of the present disclosure, the second obtaining module 810 may perform, for example, the operation S610 described above with reference to fig. 6, which is not described herein again.

The parsing module 820 may be configured to parse each historical packet data in the historical packet data to be stored, to obtain attribute information of each historical packet data. According to the embodiment of the present disclosure, the parsing module 820 may perform, for example, the operation S620 described above with reference to fig. 6, which is not described herein again.

The first storage module 830 may be configured to store a plurality of historical packet data to be stored in at least one file in the distributed file system, and record a file path of a file in which each historical packet data is located. According to the embodiment of the present disclosure, the first storage module 830 may perform, for example, the operation S630 described above with reference to fig. 6, which is not described herein again.

The third determining module 840 may be configured to determine, for each historical packet data, attribute information of the historical packet data and a file path of a file in which the historical packet data is located as index information. According to an embodiment of the present disclosure, the third determining module 840 may perform, for example, operation S640 described above with reference to fig. 6, which is not described herein again.

The second storage module 850 may be used to store the index information in association in the bitmap database. According to the embodiment of the present disclosure, the second storage module 850 may perform, for example, operation S650 described above with reference to fig. 6, which is not described herein again.

Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.

FIG. 9 schematically illustrates a block diagram of a computer system suitable for data querying and data storage according to an embodiment of the disclosure. The computer system illustrated in FIG. 9 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the disclosure.

As shown in fig. 9, a computer system 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. Processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.

In the RAM 903, various programs and data necessary for the operation of the system 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the system 900 may also include an input/output (I/O) interface 905, which is also connected to the bus 904. The system 900 may also include one or more of the following components connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is installed into the storage section 908 as necessary.

According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example and without limitation: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or RAM 903 described above and/or one or more memories other than the ROM 902 and RAM 903.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims

1. A method of data query, comprising:

receiving a query request, wherein the query request at least comprises attribute information of target message data;

determining target index data from a plurality of index data based on the query request, wherein the target index data comprises a target file path associated with attribute information of the target message data, and each index data in the plurality of index data comprises attribute information of historical message data and a file path of a file in which the historical message data is located;

determining a target file from at least one file based on the target index data, wherein the file path of the target file is the target file path, and the at least one file is used for storing the historical message data; and

acquiring the target message data from the target file.

2. The method of claim 1, wherein the plurality of index data are stored in a plurality of first databases, each of the first databases having a database identifier and an index data identifier associated therewith stored in a second database, the index data identifiers characterizing the index data stored in the first databases;

wherein the determining target index data from a plurality of index data based on the query request comprises:

determining, from the second database, an index data identification indicated by the query request based on the query request;

determining at least one database identifier associated with the indicated index data identifier from the second database based on the index data identifier indicated by the query request;

determining at least one first database corresponding to the at least one database identification based on the at least one database identification; and

determining, from the index data stored in the at least one first database, index data whose attribute information matches the attribute information of the target message data as the target index data.

3. The method of claim 2, wherein the plurality of index data are stored in the plurality of first databases according to timestamps of the historical message data; the index data identification includes a timestamp range of the index data stored by the first database; the query request further comprises a target time range, and the timestamp of the target message data is within the target time range;

wherein determining, based on the query request, the index data identification indicated by the query request from the second database comprises:

determining the index data identification indicated by the query request from the second database based on the target time range in the query request, wherein the timestamp range of the index data identification indicated by the query request comprises the target time range.

4. The method of claim 1, wherein the target file includes a plurality of historical message data;

wherein the obtaining the target packet data from the target file includes:

determining at least one historical message data from the plurality of historical message data as the target message data based on the attribute information of the target message data, wherein the attribute information of the at least one historical message data matches the attribute information of the target message data.

5. The method of claim 1, wherein the at least one file is a file in a distributed file system; the at least one file corresponds to at least one preset time range one by one, and for each file in the at least one file, the message generation time of each historical message data stored in the file is within the preset time range corresponding to the file.

6. The method of claim 1, wherein, for each of the at least one file, the plurality of historical message data stored by the file is compressed into a plurality of subfiles; for each subfile, the plurality of historical message data in the subfile is compressed sequentially;

wherein sequentially compressing the plurality of historical message data comprises: compressing at least one received historical message data to obtain a preliminary compressed subfile, and compressing at least one newly received historical message data into the preliminary compressed subfile;

wherein the file path further includes a file name of the subfile.

7. The method of any of claims 1-6, wherein the attribute information comprises at least one of:

source IP address, destination IP address, source port, destination port, data transfer protocol.

8. A method according to claim 2 or 3, wherein the first database comprises a bitmap database and the second database comprises a bitmap database.

9. A method of data storage, comprising:

acquiring historical message data to be stored;

analyzing each historical message data in the historical message data to be stored to obtain attribute information of each historical message data;

storing the historical message data to be stored into at least one file in a distributed file system, and recording the file path of the file in which each historical message data is located;

determining attribute information of the historical message data and a file path of a file where the historical message data is located as index information for each historical message data; and

storing the index information in association in a bitmap database.

10. The method of claim 9, wherein the at least one file corresponds one-to-one to at least one preset time range; the storing the historical packet data to be stored to at least one file in a distributed file system comprises, for each of the historical packet data:

determining the message generation time of the historical message data; and

storing the historical message data to one of the at least one file based on the message generation time and the at least one preset time range, wherein the message generation time is within the preset time range corresponding to the file in which the historical message data is stored.

11. The method of claim 9, wherein, for each of the at least one file, the plurality of historical message data stored by the file is compressed into a plurality of subfiles; for each subfile:

compressing at least one received historical message data to obtain a preliminary compressed subfile; and

compressing at least one newly received historical message data into the preliminary compressed subfile.

12. A data query apparatus, comprising:

a receiving module, configured to receive a query request, where the query request at least includes attribute information of target packet data;

a first determining module, configured to determine target index data from multiple index data based on the query request, where the target index data includes a target file path associated with attribute information of the target packet data, and each index data in the multiple index data includes attribute information of historical packet data and a file path of a file in which the historical packet data is located;

a second determining module, configured to determine a target file from at least one file based on the target index data, where a file path of the target file is the target file path, and the at least one file is used to store the historical packet data; and

a first obtaining module, configured to obtain the target message data from the target file.

13. A data storage device comprising:

a second obtaining module, configured to obtain historical message data to be stored;

a parsing module, configured to parse each historical message data in the historical message data to be stored to obtain attribute information of each historical message data;

a first storage module, configured to store the historical message data to be stored to at least one file in a distributed file system and record the file path of the file in which each historical message data is located;

a third determining module, configured to determine, for each historical message data, attribute information of the historical message data and a file path of the file in which the historical message data is located as index information; and

a second storage module, configured to store the index information in association in a bitmap database.

14. A computing device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-11.

15. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 11.

16. A computer program product comprising computer executable instructions for implementing a method according to any one of claims 1 to 11 when executed.