CN114297204A - Data storage and retrieval method and device for heterogeneous data source - Google Patents


Info

Publication number
CN114297204A
Authority
CN
China
Prior art keywords
data
storage
target
heterogeneous
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111677171.4A
Other languages
Chinese (zh)
Inventor
高羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qianxin Technology Group Co Ltd
Secworld Information Technology Beijing Co Ltd
Original Assignee
Qianxin Technology Group Co Ltd
Secworld Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qianxin Technology Group Co Ltd, Secworld Information Technology Beijing Co Ltd filed Critical Qianxin Technology Group Co Ltd
Priority to CN202111677171.4A priority Critical patent/CN114297204A/en
Publication of CN114297204A publication Critical patent/CN114297204A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data storage and data retrieval method and device for a heterogeneous data source, relates to the technical field of heterogeneous data source processing, and can optimize the acquisition, storage and retrieval operation of the heterogeneous data source, reduce the processing cost, improve the processing efficiency and facilitate subsequent maintenance and management. The main technical scheme of the invention is as follows: receiving a data acquisition instruction, wherein the data acquisition instruction comprises an identifier of a heterogeneous data source; based on the identification of the heterogeneous data source, finding a corresponding heterogeneous data source, wherein an annotation is embedded in the heterogeneous data source and is used for configuring a data acquisition mode of the heterogeneous data source; according to the annotation embedded in the heterogeneous data source, data acquisition operation is carried out to obtain target data; and storing the target data into a column type storage database by using a preset configuration file, and further providing retrieval operation on heterogeneous data sources by using the column type storage database.

Description

Data storage and retrieval method and device for heterogeneous data source
Technical Field
The invention relates to the technical field of heterogeneous data source processing, in particular to a data storage and data retrieval method and device for a heterogeneous data source.
Background
Data retrieval operations may span multiple data sources. A data source is where data originates, and different data sources hold large amounts of data (e.g., log data) generated by business activities in various industries. Data from different sources may differ in data structure, storage format, and so on, which makes the sources heterogeneous. Heterogeneous data sources hold huge amounts of data, which may be on the order of billions.
At present, a customized script generally has to be written for each data source to parse it and collect the required data, and the collected data must also be stored and made retrievable externally. This places high demands on program performance and reliability and amounts to custom development for each source, resulting in high development cost and making subsequent maintenance and management difficult.
Disclosure of Invention
In view of this, the present invention provides a data storage and data retrieval method and apparatus for a heterogeneous data source, and mainly aims to implement acquisition, storage and retrieval operations for the heterogeneous data source by using a general method, optimize a related data processing method for the heterogeneous data source, reduce data processing cost, improve data processing efficiency, and facilitate subsequent maintenance and management.
A first aspect of the present application provides a data storage method for heterogeneous data sources, where the method includes:
receiving a data acquisition instruction, wherein the data acquisition instruction comprises an identifier of a heterogeneous data source;
based on the identification of the heterogeneous data source, finding a corresponding heterogeneous data source, wherein an annotation is embedded in the heterogeneous data source and is used for configuring a data acquisition mode of the heterogeneous data source;
according to the annotation embedded in the heterogeneous data source, data acquisition operation is carried out to obtain target data;
and storing the target data into a column type storage database by using a preset configuration file.
In some modified embodiments of the first aspect of the present application, the obtaining target data by performing a data acquisition operation according to an annotation embedded in the heterogeneous data source includes:
acquiring an interface and a data conversion format of the heterogeneous data source from the annotation;
collecting data from the heterogeneous data sources according to the interface;
and according to the data conversion format, carrying out unified format conversion processing on the acquired data to obtain target data.
In some variations of the first aspect of the present application, the storing the target data in a columnar storage database using a preset profile includes:
acquiring a plurality of preset attribute fields, a plurality of storage volume thresholds and a plurality of storage addresses from the preset configuration file, wherein each storage volume threshold is associated with a different storage address;
selecting, from the plurality of storage volume thresholds, a storage volume threshold greater than the data volume of the target data as a target storage volume threshold;
determining, from the plurality of storage addresses, the target storage address associated with the target storage volume threshold;
according to the preset attribute fields, a column type storage database is established on a storage space corresponding to the target storage address;
storing the target data in the columnar storage database.
In some variations of the first aspect of the present application, after the creating the columnar storage database on the storage space corresponding to the target storage address according to the plurality of preset attribute fields, the method further includes:
acquiring a target attribute field from a plurality of preset attribute fields;
setting a data storage format corresponding to the target attribute field to store data information in the data storage format in the target attribute field; and/or,
if the number of the target attribute fields is multiple, analyzing the multiple target attribute fields to obtain the association relation existing among the multiple target attribute fields, so that when the target data is stored in the columnar storage database, data information is preferentially stored in the target attribute fields with the association relation.
In some variations of the first aspect of the application, the storing the target data in the columnar storage database comprises:
in the process of storing the target data by using the plurality of preset attribute fields, storing corresponding target data to the target attribute fields according to the data storage format;
and if there are multiple target attribute fields, storing corresponding target data to the multiple target attribute fields according to the association relation among the multiple target attribute fields.
A second aspect of the present application provides a data retrieval method for heterogeneous data sources, which is applied to a columnar storage database obtained by the data storage method for heterogeneous data sources as described above, and the method includes:
receiving a data retrieval instruction, wherein the data retrieval instruction carries retrieval information;
and searching a retrieval result matched with the retrieval information by traversing each attribute column in the column type storage database.
In some variations of the second aspect of the present application, the searching for the search result matching the search information by traversing each attribute column in the column-wise storage database includes:
analyzing the retrieval condition and the retrieval key word from the retrieval information;
according to the search keywords, traversing the attribute information under each attribute field in the column storage database one by one, and searching matched target attribute information;
determining a target attribute field corresponding to attribution according to the target attribute information;
and searching a retrieval result matched with the retrieval condition under the target attribute field.
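The retrieval steps above can be sketched in a few lines. This is only one possible reading of the steps, under stated assumptions: the columnar database is modeled as a plain dictionary mapping each attribute field to a list of values aligned by record position, the retrieval keyword is matched as a substring, and the retrieval condition is a predicate on values. The names `find_target_field` and `retrieve` are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, List, Optional


def find_target_field(columns: Dict[str, list], keyword: str) -> Optional[str]:
    """Traverse the attribute fields one by one and return the first field
    whose attribute information contains the retrieval keyword."""
    for field, values in columns.items():
        if any(keyword in str(value) for value in values):
            return field
    return None


def retrieve(columns: Dict[str, list], keyword: str,
             condition: Callable[[object], bool]) -> List[dict]:
    """Locate the target attribute field by keyword, then return the records
    whose value under that field satisfies the retrieval condition."""
    field = find_target_field(columns, keyword)
    if field is None:
        return []
    positions = [i for i, value in enumerate(columns[field]) if condition(value)]
    # Rebuild each matching record from the values at the same position in every column.
    return [{name: values[i] for name, values in columns.items()} for i in positions]


# Illustrative data only.
columns = {
    "source_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.1"],
    "action": ["deny", "allow", "allow"],
}
results = retrieve(columns, "10.0.0.1", lambda value: value == "10.0.0.1")
```

Only the column that contains the keyword is scanned against the condition; the other columns are read only at the matching positions.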
A third aspect of the present application provides a data storage device for heterogeneous data sources, the device comprising:
the receiving unit is used for receiving a data acquisition instruction, and the data acquisition instruction comprises an identifier of a heterogeneous data source;
the searching unit is used for searching a corresponding heterogeneous data source based on the identifier of the heterogeneous data source, wherein an annotation is embedded in the heterogeneous data source and is used for configuring a data acquisition mode of the heterogeneous data source;
the acquisition unit is used for executing data acquisition operation according to the embedded annotation in the heterogeneous data source to obtain target data;
and the storage unit is used for storing the target data into a column type storage database by using a preset configuration file.
In some variations of the third aspect of the present application, the acquisition unit comprises:
the obtaining module is used for obtaining the interface and the data conversion format of the heterogeneous data source from the annotation;
the collection module is used for collecting data from the heterogeneous data source according to the interface;
and the processing module is used for carrying out unified format conversion processing on the acquired data according to the data conversion format to obtain target data.
In some modified embodiments of the third aspect of the present application, the storage unit includes:
an obtaining module, configured to obtain a plurality of preset attribute fields, a plurality of storage volume thresholds, and a plurality of storage addresses from the preset configuration file, where each storage volume threshold is associated with a different storage address;
the obtaining module is further configured to select, from the plurality of storage volume thresholds, a storage volume threshold greater than the data volume of the target data as a target storage volume threshold;
the determining module is used for determining, from the plurality of storage addresses, the target storage address associated with the target storage volume threshold;
the creation module is used for creating a column type storage database on a storage space corresponding to the target storage address according to the preset attribute fields;
and the storage module is used for storing the target data into the column storage database.
In some modified embodiments of the third aspect of the present application, the storage unit further includes:
the obtaining module is further configured to obtain a target attribute field from the plurality of preset attribute fields;
the setting module is used for setting a data storage format corresponding to the target attribute field so as to store data information in the target attribute field in the data storage format;
and the establishing module is used for analyzing the target attribute fields to obtain the association relation existing among the target attribute fields when the target attribute fields are multiple, and is used for preferentially storing data information into the target attribute fields with the association relation when the target data is stored in the column storage database.
In some modified embodiments of the third aspect of the present application, the storage module is further specifically configured to:
in the process of storing the target data by using the plurality of preset attribute fields, storing corresponding target data to the target attribute fields according to the data storage format;
and when the target attribute fields are multiple, storing corresponding target data to the multiple target attribute fields according to the association relation existing among the multiple target attribute fields.
A fourth aspect of the present application provides a data retrieval apparatus for heterogeneous data sources, the apparatus including:
the receiving unit is used for receiving a data retrieval instruction, and the data retrieval instruction carries retrieval information;
and the searching unit is used for searching a searching result matched with the searching information by traversing each attribute column in the column type storage database.
In some modified embodiments of the fourth aspect of the present application, the search unit includes:
the analysis module is used for analyzing the retrieval conditions and the retrieval keywords from the retrieval information;
the searching module is used for traversing the attribute information under each attribute field in the column storage database one by one according to the search keyword and searching the matched target attribute information;
the determining module is used for determining a target attribute field corresponding to attribution according to the target attribute information;
the searching module is further used for searching the searching result matched with the searching condition under the target attribute field.
A fifth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data storage method for heterogeneous data sources as described above and the data retrieval method for heterogeneous data sources as described above.
A sixth aspect of the present application provides an electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the data storage method for heterogeneous data sources and the data retrieval method for heterogeneous data sources as described above.
By the technical scheme, the technical scheme provided by the invention at least has the following advantages:
the invention provides a data storage and data retrieval method and a device, which are characterized in that annotations are added into a heterogeneous data source to be acquired in advance, then when a data acquisition instruction is received, data acquisition operation is performed according to the annotations embedded in the heterogeneous data source to obtain target data, and then the target data is stored in a column type storage database by utilizing a preset configuration file, so that the acquisition and storage operation of each heterogeneous data source is realized in a universal mode. And when a retrieval instruction is received, the corresponding retrieval result can be fed back by using the column type storage database. Compared with the prior art, the method and the device solve the problems that the acquisition, storage and retrieval functions are provided for each heterogeneous data source in a customized development mode, so that the cost is high, and the subsequent maintenance management is not facilitated.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart of a data storage method for heterogeneous data sources according to an embodiment of the present invention;
FIG. 2 is a flowchart of another data storage method for heterogeneous data sources according to an embodiment of the present invention;
FIG. 3 is a flowchart of a data retrieval method for heterogeneous data sources according to an embodiment of the present invention;
FIG. 4 is a flowchart of another data retrieval method for heterogeneous data sources according to an embodiment of the present invention;
FIG. 5 is a block diagram of a data storage device for heterogeneous data sources according to an embodiment of the present invention;
FIG. 6 is a block diagram of another data storage device for heterogeneous data sources according to an embodiment of the present invention;
FIG. 7 is a block diagram of a data retrieval apparatus for heterogeneous data sources according to an embodiment of the present invention;
FIG. 8 is a block diagram of another data retrieval apparatus for heterogeneous data sources according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The embodiment of the invention provides a data storage method of a heterogeneous data source, as shown in fig. 1, the method realizes acquisition and storage functions in a general way aiming at different heterogeneous data sources, and the embodiment of the invention provides the following specific steps:
101. Receive a data acquisition instruction, wherein the data acquisition instruction comprises the identifier of the heterogeneous data source.
A data source is where data originates. Different data sources contain a large amount of data (e.g., log data) generated by business activities of various industries, and the data in different sources differ in data structure, storage manner, and the like, thereby forming heterogeneous data sources.
In the embodiment of the present invention, the received data acquisition instruction may include an identifier of one or more data sources, where the identifier is used to indicate a data acquisition object.
102. Find the corresponding heterogeneous data source based on the identifier of the heterogeneous data source.
103. Perform a data acquisition operation according to the annotation embedded in the heterogeneous data source to obtain target data.
The annotation is embedded in the heterogeneous data source and configures the data acquisition mode of that data source. Specifically, an annotation can be understood as a special mark in the code. Such marks can be read at compile time, at class loading time, and at runtime, and trigger corresponding processing. Through annotations, developers can embed supplementary information in the source code without changing the original code and logic.
In the embodiment of the invention, the annotations are common to different heterogeneous data sources, in contrast to scripts that are custom-developed for each data source to carry out specified operations. Once the acquisition objects (namely the data sources) are determined, the annotations are written into the data sources in advance.
Specifically, at the code level, a Software Development Kit (SDK) may be used to write annotations into the start of the data source, where the annotations correspond to special marks in the code, and these marks may be read during compiling, class loading, and runtime, and perform corresponding processing to facilitate information supplementation or deployment by other tools. For the embodiment of the invention, the target data is obtained by executing the annotation embedded in the data source to perform data acquisition operation.
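As a rough illustration of the annotation idea, the sketch below uses a Python decorator as a stand-in for a language-level annotation. All names (`collectable`, `SOURCE_REGISTRY`, `FirewallLogSource`, `handle_acquisition_instruction`) and the sample record are hypothetical and are not taken from the patent. The point is only that the collection settings travel with the data source, so a single generic routine can serve every source.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical registry: data source identifier -> annotated source class.
SOURCE_REGISTRY: Dict[str, type] = {}


def collectable(source_id: str, interface: str) -> Callable[[type], type]:
    """Attach collection metadata to a data source class (annotation stand-in)."""
    def wrap(cls: type) -> type:
        cls.collect_config = {"interface": interface}
        SOURCE_REGISTRY[source_id] = cls
        return cls
    return wrap


@collectable("fw-logs", interface="read_lines")
class FirewallLogSource:
    def read_lines(self) -> Iterable[dict]:
        # Stand-in for a real log reader.
        return [{"src": "10.0.0.1", "ts": "2021-12-31T00:00:00"}]


def handle_acquisition_instruction(source_ids: List[str]) -> List[dict]:
    """Generic collector: find each source by identifier, then follow its metadata."""
    collected: List[dict] = []
    for source_id in source_ids:
        source_cls = SOURCE_REGISTRY[source_id]      # find the data source
        config = source_cls.collect_config            # read the embedded metadata
        reader = getattr(source_cls(), config["interface"])
        collected.extend(reader())                    # collect through the named interface
    return collected


records = handle_acquisition_instruction(["fw-logs"])
```

Adding a new data source in this sketch means decorating one more class; the collector itself stays unchanged. Format conversion is deliberately left out here.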
104. Store the target data into a columnar storage database by using a preset configuration file.
The preset configuration file includes at least preset attribute fields and a storage address. The preset attribute fields indicate which attribute fields are used to store the target data; the storage address indicates where the target data is stored, such as a memory or a server. In addition, if the storage location requires an authorized login, the corresponding account and password may also be stored in the preset configuration file.
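To make the contents of such a configuration file concrete, the following is a minimal sketch that assumes a JSON layout. The patent does not specify a file format, so the key names (`attribute_fields`, `storage_address`, `account`, `password`), the sample path, and the `StorageConfig` class are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical configuration text; the login fields are optional.
CONFIG_TEXT = """
{
  "attribute_fields": ["timestamp", "source_ip", "action"],
  "storage_address": "/data/columnar/logs",
  "account": null,
  "password": null
}
"""


@dataclass
class StorageConfig:
    attribute_fields: List[str]       # which fields the target data is stored under
    storage_address: str              # where the target data is stored
    account: Optional[str] = None     # only needed if the location requires login
    password: Optional[str] = None


def load_config(text: str) -> StorageConfig:
    """Parse the configuration text into a typed object."""
    return StorageConfig(**json.loads(text))


config = load_config(CONFIG_TEXT)
```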
The columnar storage database stores the collected target data according to a column storage index. Columnar storage takes the attributes (columns) of a relational database as the unit of storage: in a data table, data of the same attribute is stored together, and the attribute values of different attributes within one record are stored in different storage units.
In the embodiment of the present invention, the preset configuration file serves as the standard according to which the target data is stored. A columnar storage database is combined with this storage standard for the following reason. Across a plurality of heterogeneous data sources, the data volume may reach billions, the collected attribute information is of many types, and the created database therefore needs many attribute fields to store it. If the data were stored row-wise, a retrieval operation would have to traverse the data information of a large number of attribute fields, which wastes retrieval processing cost and makes retrieval inefficient. If the data is stored column-wise, a retrieval operation can use the column storage index to traverse the attribute information under one attribute field at a time before moving on to the next attribute field, so that a matching retrieval result can be found without traversing all attribute fields.
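The difference between the two layouts can be shown with a small in-memory sketch. It is not a database implementation: the columnar layout is modeled as one Python list per attribute field, and the sample records and the helper `to_columns` are hypothetical.

```python
from typing import Dict, List

# Row-oriented records (illustrative data only).
rows = [
    {"timestamp": "2021-12-31T00:00:00", "source_ip": "10.0.0.1", "action": "deny"},
    {"timestamp": "2021-12-31T00:00:05", "source_ip": "10.0.0.2", "action": "allow"},
    {"timestamp": "2021-12-31T00:00:09", "source_ip": "10.0.0.1", "action": "allow"},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per attribute field."""
    columns: Dict[str, list] = {}
    for index, record in enumerate(records):
        for field, value in record.items():
            # A field first seen at this record is padded with None for earlier records.
            columns.setdefault(field, [None] * index).append(value)
        for values in columns.values():
            # A field missing from this record gets None so positions stay aligned.
            if len(values) <= index:
                values.append(None)
    return columns


columns = to_columns(rows)

# A query on one attribute reads only that attribute's column.
matches = [i for i, ip in enumerate(columns["source_ip"]) if ip == "10.0.0.1"]
# Other columns are read only at the matching record positions.
actions = [columns["action"][i] for i in matches]
```

In the row layout the same query would have to read every field of every record, which is the cost the paragraph above describes.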
The embodiment of the invention provides a data storage method, and the data storage method is characterized in that annotations are added into heterogeneous data sources to be acquired in advance, then when a data acquisition instruction is received, data acquisition operation is performed according to the annotations embedded in the heterogeneous data sources to obtain target data, and then the target data is stored into a column type storage database by using a preset configuration file, so that the acquisition and storage operation of each heterogeneous data source is realized in a universal mode. Compared with the prior art, the method and the device solve the problems that the acquisition and storage functions are provided for each heterogeneous data source in a customized manner, so that the cost is high, and the subsequent maintenance management is not facilitated.
In order to describe the above embodiment in more detail, another data storage method for heterogeneous data sources is further provided in the embodiment of the present invention, as shown in fig. 2, which is a detailed explanation of the above embodiment, and for this, the following specific steps are provided in the embodiment of the present invention:
201. Receive a data acquisition instruction, wherein the data acquisition instruction comprises the identifier of the heterogeneous data source.
For an explanation of this step, refer to step 101; it is not repeated here.
202. Find the corresponding heterogeneous data source based on the identifier of the heterogeneous data source.
203. Perform a data acquisition operation according to the annotation embedded in the heterogeneous data source to obtain target data.
The heterogeneous data source is embedded with an annotation, the annotation is used for configuring a data acquisition mode of the heterogeneous data source, and the annotation comprises an interface of the heterogeneous data source and a data conversion format.
In the embodiment of the present invention, a specific implementation method for implementing data acquisition by using an embedded annotation in a heterogeneous data source includes the following steps:
first, the interface and data conversion format of the heterogeneous data source are obtained from the annotation.
Secondly, data are collected from the heterogeneous data source according to the interface, and the collected data are subjected to unified format conversion processing according to the data conversion format to obtain target data.
It should be noted that, for a plurality of heterogeneous data sources, there are differences in data structures, storage formats, and the like, and therefore, it is necessary to implement standardized format processing on data information collected from the plurality of heterogeneous data sources by using the data conversion format mentioned in the embodiment of the present invention, so as to obtain standard and uniform data information, which is used as target data for subsequent storage operations.
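A minimal sketch of the unified format conversion follows. It assumes that a data conversion format can be expressed as a mapping from each target field to a source field plus a converter function. This representation, the two sample formats, and the field names are hypothetical; the patent does not define what a conversion format looks like.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Tuple

# Hypothetical conversion format: target field -> (source field, converter).
ConversionFormat = Dict[str, Tuple[str, Callable[[str], str]]]


def epoch_to_iso(value: str) -> str:
    """Convert a Unix epoch timestamp in seconds to ISO 8601 in UTC."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc).isoformat()


def dmy_to_iso(value: str) -> str:
    """Convert 'DD/MM/YYYY HH:MM:SS' (assumed to be UTC) to ISO 8601."""
    parsed = datetime.strptime(value, "%d/%m/%Y %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc).isoformat()


syslog_format: ConversionFormat = {
    "timestamp": ("ts", epoch_to_iso),
    "source_ip": ("src", str),
}
web_format: ConversionFormat = {
    "timestamp": ("time", dmy_to_iso),
    "source_ip": ("client", str),
}


def convert(record: dict, fmt: ConversionFormat) -> dict:
    """Apply a conversion format to one collected record."""
    return {target: converter(record[source]) for target, (source, converter) in fmt.items()}


a = convert({"ts": "1640908800", "src": "10.0.0.1"}, syslog_format)
b = convert({"time": "31/12/2021 00:00:00", "client": "10.0.0.1"}, web_format)
```

The two inputs differ in field names and timestamp encoding, yet both come out with the same fields and the same value encoding, which is what allows them to share one set of attribute fields downstream.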
204. A plurality of preset attribute fields, a plurality of storage volume thresholds and a plurality of storage addresses are obtained from a preset configuration file, wherein each storage volume threshold is associated with a different storage address.
The preset attribute field is a standardized and universal attribute field integrated from databases in a plurality of technical fields. Specifically, for the target data collected from the heterogeneous data sources in the embodiment of the present invention, because the target data come from different heterogeneous data sources and the number of attribute types included in a plurality of heterogeneous data sources is also large, in order to implement effective storage of more diverse data information, the embodiment of the present invention may obtain the attribute fields from the heterogeneous data sources to construct a standardized and general attribute field.
For example, first, common attribute fields are obtained by taking the intersection of the attribute fields contained in a plurality of heterogeneous data sources; second, further attribute fields are derived from the common attribute fields. For deriving attribute fields, a semantic analysis method may be adopted, for example merging attribute fields with similar semantics, or expanding frequently used attribute fields into more refined lower-level concept branches. Further, the attribute types contained in the multiple heterogeneous data sources are varied and numerous, and data of attribute types that appear at low frequency has little value. Such data can be cleaned from the collected target data and not stored in the columnar storage database; in particular, low-frequency attribute fields need not be set up in the database, which avoids occupying storage resources with useless redundant data.
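The intersection step and the low-frequency filter can be sketched with set operations. The source names, field names, and the `min_sources` cutoff are assumptions made for illustration; the patent does not state how "low frequency" is measured, and the semantic derivation step is not modeled here.

```python
from collections import Counter
from typing import Dict, Set


def common_fields(source_fields: Dict[str, Set[str]]) -> Set[str]:
    """Attribute fields present in every data source (set intersection)."""
    field_sets = list(source_fields.values())
    return set.intersection(*field_sets) if field_sets else set()


def frequent_fields(source_fields: Dict[str, Set[str]], min_sources: int) -> Set[str]:
    """Attribute fields that appear in at least `min_sources` data sources.
    Fields below the cutoff are treated as low frequency and dropped."""
    counts = Counter(field for fields in source_fields.values() for field in fields)
    return {field for field, count in counts.items() if count >= min_sources}


# Illustrative field inventories for three hypothetical sources.
sources = {
    "firewall": {"timestamp", "source_ip", "action"},
    "web": {"timestamp", "source_ip", "url"},
    "dns": {"timestamp", "source_ip", "query", "action"},
}
shared = common_fields(sources)
kept = frequent_fields(sources, min_sources=2)
```

Here `url` and `query` each appear in only one source, so they fall below the cutoff and would not get an attribute field in the database.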
The storage volume threshold is the data capacity threshold of a storage space, and the storage address is the address of that storage space. Each storage volume threshold is associated with a different storage address, so that a storage space can be selected according to the data volume of the collected target data. It should be noted that the collected target data may include dynamic data and static data. Dynamic data changes frequently and directly reflects transaction processes, such as website visits, number of users online, and daily sales. Because dynamic data has a short update period and a large data volume, the preferred approach is to process dynamic data and static data separately.
In the embodiment of the invention, the preset attribute fields, storage volume thresholds, and storage addresses in the preset configuration file can be used to automatically create a corresponding database for the target data and complete the storage.
205. Select, from the plurality of storage volume thresholds, a storage volume threshold greater than the data volume of the target data as the target storage volume threshold.
206. Determine, from the plurality of storage addresses, the target storage address associated with the target storage volume threshold.
In the embodiment of the present invention, steps 205 and 206 are explained as follows. A plurality of storage volume thresholds are stored in the preset configuration file, and each storage volume threshold is associated with a storage address. To avoid the situation where the storage space at a storage address is too small to hold the target data, the data volume of the target data is first compared with the different storage volume thresholds, and the storage address associated with a storage volume threshold greater than the data volume of the target data is looked up. Each storage address may include a start address and an end address stored as a pair.
In another embodiment, to improve the utilization of the storage space, when several storage volume thresholds are greater than the data volume of the target data, the storage address associated with the smallest of these thresholds may be selected. This ensures that the selected address space is filled as fully as possible and prevents wasted resources.
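Steps 205 and 206, together with the smallest-sufficient-threshold variant, reduce to a short selection routine. In this sketch the thresholds are plain integers in an unspecified unit and the storage addresses are path-like strings; both, along with the function name `select_storage_address`, are hypothetical.

```python
from typing import Dict, Optional, Tuple


def select_storage_address(data_volume: int,
                           thresholds: Dict[int, str]) -> Optional[Tuple[int, str]]:
    """Return the smallest storage volume threshold that is greater than the
    data volume, together with its associated storage address.
    Returns None if no threshold is large enough."""
    candidates = [threshold for threshold in thresholds if threshold > data_volume]
    if not candidates:
        return None
    target = min(candidates)
    return target, thresholds[target]


# Illustrative threshold -> address mapping.
thresholds = {
    10_000: "/data/small",
    1_000_000: "/data/medium",
    100_000_000: "/data/large",
}
choice = select_storage_address(250_000, thresholds)
```

Taking the minimum of the qualifying thresholds is what keeps small data sets from being placed in the largest storage space.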
207. Creating a columnar storage database on the storage space corresponding to the target storage address according to the plurality of preset attribute fields.
In the embodiment of the present invention, the columnar storage database is created mainly according to the preset attribute fields in the preset configuration file, so the columnar storage database contains those preset attribute fields. The preset attribute fields can further be optimized as follows:
A target attribute field, that is, a preset attribute field requiring optimization, is obtained from the plurality of preset attribute fields. The target attribute field is optimized in one of the following ways:
In one approach, the target attribute field is preprocessed to set the data storage format corresponding to it.
For example, when fields with special meanings in certain domains, or the names of provinces, cities, districts and counties, are stored according to actual business requirements, both the literal name and the code mapped to that name need to be stored.
Alternatively, as a parallel scheme, if there are a plurality of target attribute fields, the association relationship among them is obtained by analyzing the target attribute fields. Based on the association relationship, data information may be stored preferentially in the target attribute fields that are associated. For example, after data information is stored in one target attribute field, if that field is associated with another target attribute field, data information is next stored in that other target attribute field, and the storage operation on the database is then completed for the remaining attribute fields.
It should be noted that storing data according to the association relationship not only improves storage efficiency, but also allows a subsequent retrieval on one target attribute field to bring out the other target attribute fields associated with it, which improves query efficiency.
For example, as mentioned above, province, city, district and county each correspond to a different attribute field. The association relationship among these attribute fields may be established in advance in a given order, so that data information is stored into the attribute fields for province, city, district and county in that order.
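The two optimizations of the target attribute fields described for step 207 can be sketched in Python as follows. This is only an illustrative sketch: the name-to-code table, the field names, and the helper functions `format_field` and `store_associated` are hypothetical, and the association relationship is modeled simply as an ordered list of field names.

```python
# Hypothetical name-to-code mapping; real codes would come from the business system.
REGION_CODES = {"Province A": "P01", "City B": "P01-C02", "District C": "P01-C02-D03"}

# Association relationship, modeled as an ordered chain of target attribute fields.
ASSOCIATION_ORDER = ["province", "city", "district"]


def format_field(name: str) -> dict:
    """Data storage format of a target attribute field: literal name plus mapped code."""
    return {"name": name, "code": REGION_CODES.get(name)}


def store_associated(record: dict, columns: dict) -> None:
    """Store associated target fields first, in association order, then the other fields."""
    ordered = [f for f in ASSOCIATION_ORDER if f in record]
    remaining = [f for f in record if f not in ASSOCIATION_ORDER]
    for field in ordered:
        columns.setdefault(field, []).append(format_field(record[field]))
    for field in remaining:
        columns.setdefault(field, []).append(record[field])


columns = {}
store_associated(
    {"event": "login", "city": "City B", "province": "Province A", "district": "District C"},
    columns,
)
```

Although the input record lists `event` first, the associated fields are written in the order province, city, district, and each of them holds both the literal name and its code.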
208. Storing the target data in a columnar storage database.
In the embodiment of the present invention, the acquired target data is analyzed to determine which attribute fields need to be stored, and the storage operation is then completed. For the target attribute fields configured in step 207, the following applies:
When the target data is stored using the plurality of preset attribute fields, the corresponding data is stored in the target attribute field according to its data storage format. If there are a plurality of target attribute fields, the corresponding data is stored in them according to the association relationship among them.
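As a rough illustration of step 208, the following Python sketch stores records field by field in a toy in-memory column store. The `ColumnStore` class and its method names are assumptions for illustration and are not the storage engine of the embodiment; filling absent fields with `None` is one possible way to keep the columns aligned when heterogeneous data sources supply different attributes.

```python
from typing import Any, Dict, List


class ColumnStore:
    """Toy in-memory column store: one Python list per preset attribute field."""

    def __init__(self, preset_fields: List[str]) -> None:
        self.columns: Dict[str, List[Any]] = {f: [] for f in preset_fields}
        self.row_count = 0

    def insert(self, record: Dict[str, Any]) -> int:
        """Append one record; absent fields get None so all columns stay row-aligned."""
        for field, values in self.columns.items():
            values.append(record.get(field))
        self.row_count += 1
        return self.row_count - 1  # row index of the stored record

    def row(self, index: int) -> Dict[str, Any]:
        """Reassemble a full record from the aligned columns."""
        return {f: v[index] for f, v in self.columns.items()}


store = ColumnStore(["source", "province", "city", "event"])
store.insert({"source": "crm", "province": "Province A", "city": "City B"})
store.insert({"source": "log", "event": "login"})
```

Because every column has the same length, a complete record can be rebuilt from any row index, while each attribute column can still be read on its own.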
Further, on the basis of the columnar storage database obtained by the data storage method for heterogeneous data sources provided by the embodiment of the present invention, an embodiment of the present invention further provides a data retrieval method for heterogeneous data sources. As shown in fig. 3, the embodiment provides the following specific steps:
301. Receiving a data retrieval instruction, wherein the data retrieval instruction carries retrieval information.
302. Searching for a retrieval result matching the retrieval information by traversing each attribute column in the columnar storage database.
In the embodiment of the invention, the columnar storage database stores data information from a plurality of heterogeneous data sources, which amounts to integrating and preprocessing the data of those heterogeneous data sources through acquisition and storage. A received retrieval instruction therefore acts indirectly as a retrieval operation on those heterogeneous data sources, and the retrieval result found in the columnar storage database is equivalent to the result that would be obtained by issuing the retrieval instruction to the original heterogeneous data sources.
In the embodiment of the invention, the retrieval result is obtained by searching the columnar storage database. This replaces the original practice of issuing a retrieval instruction to each heterogeneous data source separately to obtain the related data information, and thereby improves retrieval efficiency and generality.
Further, to explain the retrieval operation in detail, an embodiment of the present invention provides another data retrieval method for heterogeneous data sources. As shown in fig. 4, the embodiment provides the following specific steps:
401. Receiving a data retrieval instruction, wherein the data retrieval instruction carries retrieval information.
402. Parsing a retrieval condition and a retrieval keyword from the retrieval information.
403. Traversing, according to the retrieval keyword, the attribute information under each attribute field in the columnar storage database one by one, and searching for matching target attribute information.
According to the embodiment of the invention, the columnar storage database can store the many types of attribute information collected from the heterogeneous data sources, so the number of attribute fields in the columnar storage database is large. Because of the column-oriented layout, data can be read a whole column at a time: looking up one attribute field reads the data information of that entire attribute column. Retrieval therefore need not traverse a large number of attribute fields; once the attribute field containing the retrieval keyword has been determined, the other attribute fields need not be traversed, which improves retrieval efficiency.
404. Determining, according to the target attribute information, the target attribute field to which it belongs.
405. Searching for a retrieval result matching the retrieval condition under the target attribute field.
In the embodiment of the invention, after the matching target attribute information is determined according to the retrieval keyword, the retrieval operation is completed under the target attribute field according to the retrieval condition, so the retrieval result is obtained efficiently.
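Steps 401 to 405 can be sketched in Python under some loudly stated assumptions: the retrieval information is modeled as a dictionary with a `keyword` string and a `condition` predicate, the columnar storage database is modeled as a dictionary of equal-length lists, and the function names are hypothetical. The embodiment does not specify the format of the retrieval information, so this is one possible reading rather than the method itself.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Columns = Dict[str, List[Any]]


def parse_retrieval_info(info: Dict[str, Any]) -> Tuple[Callable[[Any], bool], str]:
    """Step 402: split the retrieval information into a condition and a keyword."""
    return info["condition"], info["keyword"]


def find_target_field(columns: Columns, keyword: str) -> Optional[str]:
    """Steps 403-404: scan the columns one by one and stop at the first one holding the keyword."""
    for field, values in columns.items():
        if keyword in values:
            return field
    return None


def retrieve(columns: Columns, info: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Step 405: evaluate the condition only inside the target attribute column."""
    condition, keyword = parse_retrieval_info(info)
    field = find_target_field(columns, keyword)
    if field is None:
        return []
    hits = [i for i, v in enumerate(columns[field]) if v is not None and condition(v)]
    return [{f: vals[i] for f, vals in columns.items()} for i in hits]


columns = {
    "source": ["crm", "log", "crm"],
    "city": ["City B", None, "City C"],
    "event": [None, "login", "signup"],
}
results = retrieve(columns, {"keyword": "City B", "condition": lambda v: v.startswith("City")})
```

Once `city` is identified as the target attribute field, the condition is checked against that single column only, and the matching row indices are used to assemble the full records.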
Further, as an implementation of the methods shown in fig. 1 and fig. 2, an embodiment of the present invention provides a data storage device for heterogeneous data sources. The device embodiment corresponds to the foregoing method embodiment. For ease of reading, details of the method embodiment are not repeated one by one in the device embodiment, but it should be clear that the device in this embodiment can implement all of the contents of the method embodiment. The device is used to implement acquisition and storage operations for different heterogeneous data sources in a general manner. Specifically, as shown in fig. 5, the device includes:
a receiving unit 31, configured to receive a data acquisition instruction, where the data acquisition instruction includes an identifier of a heterogeneous data source;
the searching unit 32 is configured to search a corresponding heterogeneous data source based on the identifier of the heterogeneous data source, where an annotation is embedded in the heterogeneous data source, and the annotation is used to configure a data acquisition mode of the heterogeneous data source;
the acquisition unit 33 is configured to perform data acquisition operations according to the annotations embedded in the heterogeneous data source to obtain target data;
and the storage unit 34 is used for storing the target data into a column type storage database by using a preset configuration file.
Further, as shown in fig. 6, the acquisition unit 33 includes:
the obtaining module 331 is configured to obtain an interface and a data conversion format of the heterogeneous data source from the annotation;
an acquisition module 332, configured to acquire data from the heterogeneous data source according to the interface;
and the processing module 333 is configured to perform unified format conversion processing on the acquired data according to the data conversion format to obtain target data.
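The cooperation of modules 331 to 333 can be illustrated with a small Python sketch in which the embedded annotation is modeled as a class decorator. This modeling is an assumption: the embodiment does not state how annotations are implemented, and the names `data_source`, `SOURCES`, `CONVERTERS`, `CrmSource`, and `LogSource` are all hypothetical.

```python
import json
from typing import Any, Callable, Dict, List

SOURCES: Dict[str, type] = {}  # identifier -> annotated heterogeneous data source


def data_source(identifier: str, interface: str, convert: str):
    """Hypothetical annotation recording the interface and the data conversion format."""
    def wrap(cls):
        cls.annotation = {"interface": interface, "convert": convert}
        SOURCES[identifier] = cls
        return cls
    return wrap


# Converters that bring raw payloads into one unified format (a list of dicts).
CONVERTERS: Dict[str, Callable[[Any], List[dict]]] = {
    "json": lambda raw: json.loads(raw),
    "csv": lambda raw: [dict(zip(raw.splitlines()[0].split(","), line.split(",")))
                        for line in raw.splitlines()[1:]],
}


@data_source("crm", interface="read", convert="csv")
class CrmSource:
    def read(self) -> str:
        return "name,city\nalice,City B"


@data_source("log", interface="fetch", convert="json")
class LogSource:
    def fetch(self) -> str:
        return '[{"name": "bob", "event": "login"}]'


def collect(identifier: str) -> List[dict]:
    source_cls = SOURCES[identifier]                  # look up the source by identifier
    ann = source_cls.annotation                       # read interface and conversion format
    raw = getattr(source_cls(), ann["interface"])()   # collect data through the interface
    return CONVERTERS[ann["convert"]](raw)            # convert to the unified format
```

With this arrangement, supporting a further heterogeneous data source only requires annotating it; `collect` itself stays unchanged, which is the generality the device aims for.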
Further, as shown in fig. 6, the storage unit 34 includes:
an obtaining module 341, configured to obtain, from the preset configuration file, a plurality of preset attribute fields, a plurality of storage volume thresholds, and a plurality of storage addresses, where each storage volume threshold is associated with a different storage address;
the obtaining module 341 is further configured to obtain, from the plurality of storage volume thresholds, a storage volume threshold larger than the data volume of the target data as the target storage volume threshold;
a determining module 342, configured to determine, from the plurality of storage addresses, the target storage address associated with the target storage volume threshold;
a creating module 343, configured to create a column-wise storage database on the storage space corresponding to the target storage address according to the multiple preset attribute fields;
a storage module 344, configured to store the target data in the columnar storage database.
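To make the role of the preset configuration file more concrete, the following Python sketch parses one possible layout. The JSON format, the key names `attribute_fields`, `storage`, `threshold`, `start`, and `end`, and the function `load_preset_config` are assumptions for illustration only; the embodiment does not prescribe a file format.

```python
import json

# Hypothetical layout of the preset configuration file; all key names are assumptions.
PRESET_CONFIG = """
{
  "attribute_fields": ["source", "province", "city", "event"],
  "storage": [
    {"threshold": 1000,  "start": 0,    "end": 1023},
    {"threshold": 10000, "start": 4096, "end": 16383}
  ]
}
"""


def load_preset_config(text: str):
    """Return (preset attribute fields, {threshold: (start, end)}) from the config text."""
    cfg = json.loads(text)
    fields = list(cfg["attribute_fields"])
    thresholds = {}
    for entry in cfg["storage"]:
        if entry["threshold"] in thresholds:
            # each storage volume threshold is associated with its own storage address
            raise ValueError("duplicate storage volume threshold in configuration")
        thresholds[entry["threshold"]] = (entry["start"], entry["end"])
    return fields, thresholds


fields, thresholds = load_preset_config(PRESET_CONFIG)
```

Keeping the attribute fields, thresholds and addresses in a single file means that the storage behaviour can be changed by editing configuration rather than code.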
Further, as shown in fig. 6, the storage unit 34 further includes:
the obtaining module 341 is further configured to obtain a target attribute field from the plurality of preset attribute fields;
a setting module 345, configured to set a data storage format corresponding to the target attribute field, so as to store data information in the data storage format in the target attribute field;
an establishing module 346, configured to, when there are multiple target attribute fields, obtain an association relationship existing between the multiple target attribute fields by analyzing the multiple target attribute fields, so as to preferentially store data information into a target attribute field having the association relationship when the target data is stored in the columnar storage database.
Further, as shown in fig. 6, the storage module 344 is further specifically configured to:
in the process of storing the target data by using the plurality of preset attribute fields, storing corresponding target data to the target attribute fields according to the data storage format;
and when the target attribute fields are multiple, storing corresponding target data to the multiple target attribute fields according to the association relation existing among the multiple target attribute fields.
Further, as an implementation of the methods shown in fig. 3 and fig. 4, an embodiment of the present invention provides a data retrieval device for heterogeneous data sources. The device embodiment corresponds to the foregoing method embodiment. For ease of reading, details of the method embodiment are not repeated one by one in the device embodiment, but it should be clear that the device in this embodiment can implement all of the contents of the method embodiment. The device is used to perform retrieval operations on a database containing data from heterogeneous data sources. Specifically, as shown in fig. 7, the device includes:
a receiving unit 41, configured to receive a data retrieval instruction, where the data retrieval instruction carries retrieval information;
and the searching unit 42 is configured to search for a retrieval result matching the retrieval information by traversing each attribute column in the columnar storage database.
Further, as shown in fig. 8, the searching unit 42 includes:
the parsing module 421 is configured to parse a retrieval condition and a retrieval keyword from the retrieval information;
the searching module 422 is configured to traverse, according to the retrieval keyword, the attribute information under each attribute field in the columnar storage database one by one, and search for matching target attribute information;
a determining module 423, configured to determine, according to the target attribute information, the target attribute field to which it belongs;
the searching module 422 is further configured to search for a retrieval result matching the retrieval condition under the target attribute field.
In summary, embodiments of the present invention provide a data storage method, a data retrieval method and corresponding devices for heterogeneous data sources. Annotations are added in advance to each heterogeneous data source to be acquired. When a data acquisition instruction is received, the data acquisition operation is performed according to the annotation embedded in the heterogeneous data source to obtain target data, and the target data is stored in a columnar storage database using a preset configuration file, so that acquisition and storage are implemented for every heterogeneous data source in a general manner. When a retrieval instruction is received, the corresponding retrieval result can be returned from the columnar storage database. Compared with the prior art, this solves the problem that providing acquisition, storage and retrieval functions for each heterogeneous data source through customized development is costly and hinders subsequent maintenance and management.
The data storage device of the heterogeneous data source comprises a processor and a memory, wherein the receiving unit, the searching unit, the collecting unit, the storage unit and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The data retrieval device of the heterogeneous data source comprises a processor and a memory, wherein the receiving unit, the searching unit and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the acquisition, storage and retrieval operations on heterogeneous data sources are implemented in a general manner, the related data processing method for heterogeneous data sources is optimized, the data processing cost is reduced, the data processing efficiency is improved, and subsequent maintenance and management are facilitated.
An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the data storage method of the heterogeneous data source or the data retrieval method of the heterogeneous data source as described above.
An embodiment of the present invention provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the data storage method of the heterogeneous data source or the data retrieval method of the heterogeneous data source.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip. The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium does not include a transitory computer-readable medium such as a modulated data signal or a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (11)

1. A method for storing data from heterogeneous data sources, the method comprising:
receiving a data acquisition instruction, wherein the data acquisition instruction comprises an identifier of a heterogeneous data source;
based on the identification of the heterogeneous data source, finding a corresponding heterogeneous data source, wherein an annotation is embedded in the heterogeneous data source and is used for configuring a data acquisition mode of the heterogeneous data source;
performing a data acquisition operation according to the annotation embedded in the heterogeneous data source to obtain target data;
and storing the target data into a column type storage database by using a preset configuration file.
2. The method of claim 1, wherein the annotation includes an interface and a data conversion format of the heterogeneous data source, and the performing a data acquisition operation to obtain target data according to the annotation embedded in the heterogeneous data source comprises:
acquiring an interface and a data conversion format of the heterogeneous data source from the annotation;
collecting data from the heterogeneous data sources according to the interface;
and according to the data conversion format, carrying out unified format conversion processing on the acquired data to obtain target data.
3. The method of claim 1, wherein storing the target data in a columnar storage database using a preset configuration file comprises:
acquiring a plurality of preset attribute fields, a plurality of storage volume thresholds and a plurality of storage addresses from the preset configuration file, wherein each storage volume threshold is associated with a different storage address;
acquiring, from the plurality of storage volume thresholds, a storage volume threshold larger than the data volume of the target data as a target storage volume threshold;
determining, from the plurality of storage addresses, the target storage address associated with the target storage volume threshold;
creating a columnar storage database on the storage space corresponding to the target storage address according to the plurality of preset attribute fields;
storing the target data in the columnar storage database.
4. The method of claim 3, further comprising:
acquiring a target attribute field from a plurality of preset attribute fields;
setting a data storage format corresponding to the target attribute field, so as to store data information in the data storage format in the target attribute field; and/or,
if the number of the target attribute fields is multiple, analyzing the multiple target attribute fields to obtain the association relation existing among the multiple target attribute fields, so that when the target data is stored in the columnar storage database, data information is preferentially stored in the target attribute fields with the association relation.
5. The method of claim 4, wherein said storing said target data in said columnar storage database comprises:
in the process of storing the target data by using the plurality of preset attribute fields, storing corresponding target data to the target attribute fields according to the data storage format;
and if there are a plurality of target attribute fields, storing corresponding target data to the plurality of target attribute fields according to the association relationship among the plurality of target attribute fields.
6. A data retrieval method for heterogeneous data sources, which is applied to a column-type storage database obtained by the data storage method for heterogeneous data sources according to any one of claims 1 to 5, and comprises the following steps:
receiving a data retrieval instruction, wherein the data retrieval instruction carries retrieval information;
and searching a retrieval result matched with the retrieval information by traversing each attribute column in the column type storage database.
7. The method according to claim 6, wherein the searching for the search result matching the search information by traversing each attribute column in the column-wise storage database comprises:
analyzing the retrieval condition and the retrieval key word from the retrieval information;
according to the search keywords, traversing the attribute information under each attribute field in the column storage database one by one, and searching matched target attribute information;
determining, according to the target attribute information, the target attribute field to which the target attribute information belongs;
and searching a retrieval result matched with the retrieval condition under the target attribute field.
8. A data storage device for heterogeneous data sources, the device comprising:
the receiving unit is used for receiving a data acquisition instruction, and the data acquisition instruction comprises an identifier of a heterogeneous data source;
the searching unit is used for searching a corresponding heterogeneous data source based on the identifier of the heterogeneous data source, wherein an annotation is embedded in the heterogeneous data source and is used for configuring a data acquisition mode of the heterogeneous data source;
the acquisition unit is used for executing data acquisition operation according to the embedded annotation in the heterogeneous data source to obtain target data;
and the storage unit is used for storing the target data into a column type storage database by using a preset configuration file.
9. A data retrieval device for heterogeneous data sources, the device comprising:
the receiving unit is used for receiving a data retrieval instruction, and the data retrieval instruction carries retrieval information;
and the searching unit is used for searching a searching result matched with the searching information by traversing each attribute column in the column type storage database.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, implements the data storage method of a heterogeneous data source according to any one of claims 1 to 5;
or the computer program, when executed by a processor, implements the data retrieval method of a heterogeneous data source according to claim 6 or 7.
11. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a data storage method of a heterogeneous data source according to any one of claims 1-5 when executing the computer program;
or the processor, when executing the computer program, implements the data retrieval method of a heterogeneous data source according to claim 6 or 7.
CN202111677171.4A 2021-12-31 2021-12-31 Data storage and retrieval method and device for heterogeneous data source Pending CN114297204A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111677171.4A CN114297204A (en) 2021-12-31 2021-12-31 Data storage and retrieval method and device for heterogeneous data source

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111677171.4A CN114297204A (en) 2021-12-31 2021-12-31 Data storage and retrieval method and device for heterogeneous data source

Publications (1)

Publication Number Publication Date
CN114297204A true CN114297204A (en) 2022-04-08

Family

ID=80976335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111677171.4A Pending CN114297204A (en) 2021-12-31 2021-12-31 Data storage and retrieval method and device for heterogeneous data source

Country Status (1)

Country Link
CN (1) CN114297204A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115098581A (en) * 2022-08-26 2022-09-23 金联创网络科技有限公司 Method, device and equipment for storing numerical heterogeneous data and storage medium
CN115185939A (en) * 2022-09-07 2022-10-14 中航信移动科技有限公司 Data processing method of multi-source data
CN115185939B (en) * 2022-09-07 2022-11-18 中航信移动科技有限公司 Data processing method of multi-source data

Similar Documents

Publication Publication Date Title
CN107247808B (en) Distributed NewSQL database system and picture data query method
CN110019218B (en) Data storage and query method and equipment
CN106611044B (en) SQL optimization method and equipment
CN110673839B (en) Distributed tool configuration construction generation method and system
US20140046928A1 (en) Query plans with parameter markers in place of object identifiers
CN114297204A (en) Data storage and retrieval method and device for heterogeneous data source
CN110555035A (en) Method and device for optimizing query statement
EP3889797A1 (en) Database index and database query processing method, apparatus, and device
CN112860730A (en) SQL statement processing method and device, electronic equipment and readable storage medium
CN110955714A (en) Method and device for converting unstructured text into structured text
CN111680030A (en) Data fusion method and device, and data processing method and device based on meta information
CN111625728B (en) Method, device, equipment and medium for generating retrieval catalog from webpage document
CN116303628B (en) Alarm data query method, system and equipment based on elastic search
CN113297245A (en) Method and device for acquiring execution information
CN116610694A (en) Rule verification method and system based on relation between columns and access sentences
CN110019357B (en) Database query script generation method and device
CN115658680A (en) Data storage method, data query method and related device
CN105426676A (en) Drilling data processing method and system
CN114218347A (en) Method for quickly searching index of multiple file contents
US20170031909A1 (en) Locality-sensitive hashing for algebraic expressions
CN114428776A (en) Index partition management method and system for time sequence data
CN112749189A (en) Data query method and device
CN116431756B (en) Method, equipment and medium for highlighting search text based on Vue
US11995059B2 (en) Database index and database query processing method, apparatus, and device
CN115952203B (en) Data query method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination