CN110555021B - Data storage method, query method and related device - Google Patents


Publication number
CN110555021B
Authority
CN
China
Prior art keywords
query
data
information
hbase database
personnel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810252830.1A
Other languages
Chinese (zh)
Other versions
CN110555021A (en)
Inventor
蔡云鹏
张露
李奇
王莹莹
李烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201810252830.1A
Publication of CN110555021A
Application granted
Publication of CN110555021B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data storage method, a query method and a related device, applied to a server that includes a pre-constructed snowflake model. The data storage method comprises the following steps: acquiring pre-processed data information; importing the data information into the plurality of HBase database tables corresponding to the snowflake model; and compressing the data information imported into the HBase database to save storage space. The method combines the horizontal scalability of HBase, a column-oriented data storage system, with the snowflake model to store data. On the one hand, it adapts well to the storage capacity demands of rapidly growing data. On the other hand, when the business relations of the stored data are complex, it avoids full database scans and improves the efficiency of searching the stored data. Compressing the data information imported into the HBase database saves further storage space and copes better with mass data storage.

Description

Data storage method, query method and related device
Technical Field
The present application relates to the field of database technologies, and in particular, to a data storage method, a query method, and related devices.
Background
In recent years, with the development of the Internet and the information industry, big data has attracted more and more attention. The rapid growth in the volume and variety of data marks the gradual arrival of the big data era, which poses challenges for storing and querying data.
Conventional relational databases such as Oracle suffer from limited storage management capability, insufficient concurrency, poor scalability and high cost, and therefore cannot meet the requirements of concurrent access to, and analysis and mining of, massive data with complex business relations.
Disclosure of Invention
The embodiment of the application provides a data storage method, a query method and a related device, which are used for coping with the storage pressure of massive data and improving the query speed of data with complex business relations.
In order to achieve the above object, the technical scheme adopted by the embodiment of the application is as follows:
in a first aspect, an embodiment of the present application provides a data storage method, applied to a server, where the server includes a pre-constructed snowflake model, the data storage method includes: acquiring pre-processed data information; importing the data information into a plurality of tables of the snowflake model corresponding to an HBase database; and compressing the data information imported into the HBase database to realize data storage.
In a second aspect, an embodiment of the present application provides a data storage device, applied to a server, where the server includes a pre-constructed snowflake model, the data storage device includes: the acquisition module is used for acquiring the pre-processed data information; the storage module is used for importing the data information into a plurality of tables of the snowflake model corresponding to the HBase database; and the compression module is used for compressing the data information imported into the HBase database so as to realize data storage.
In a third aspect, the present application provides a data query method, applied to a server, where the server includes an HBase database, and data information and a corresponding index table written in the HBase database by using the foregoing data storage method, and the server is connected with a client in a communication manner, where the data query method includes: receiving a query request sent by a client; searching data information corresponding to the query request from the HBase database according to the query request and the index table; and feeding the searched data information back to the client.
In a fourth aspect, an embodiment of the present application provides a data query device, which is applied to a server, where the server includes an HBase database, and data information written in the HBase database by using the foregoing data storage method and a corresponding index table, and the server is connected with a client in a communication manner, where the data query device includes: the receiving module is used for receiving a query request sent by the client; the searching module is used for searching data information corresponding to the query request from the HBase database according to the query request and the index table; and the feedback module is used for feeding the searched data information back to the client.
Compared with the prior art, the data storage method provided by the embodiments of the application imports the pre-processed data information into the HBase database by means of the snowflake model; that is, it combines the horizontal scalability of HBase, a column-oriented data storage system, with the snowflake model to store data. On the one hand, this adapts well to the storage capacity demands of rapidly growing data. On the other hand, when the business relations of the stored data are complex, scanning all the data is avoided and the efficiency of searching the stored data is improved. Compressing the data information imported into the HBase database saves further storage space and copes better with mass data storage.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a block diagram of a server according to a preferred embodiment of the present application.
Fig. 2 shows a flowchart of a data storage method according to an embodiment of the present application.
Fig. 3 shows a schematic diagram of a snowflake model according to an embodiment of the present application.
Fig. 4 shows a flowchart of a data query method provided by an embodiment of the present application.
Fig. 5 is a schematic diagram of an index table, a lookup table, and a special lookup table according to an embodiment of the present application.
Fig. 6 is a schematic diagram of a data storage device according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a data query device according to an embodiment of the present application.
Reference numerals: 100-server; 111-memory; 112-processor; 113-communication unit; 200-data storage device; 201-acquisition module; 202-storage module; 203-compression module; 300-data query device; 301-receiving module; 302-search module; 303-feedback module.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
With the explosive growth of data and the increasing demand for data analysis and mining, conventional relational databases are challenged in terms of storage capacity, concurrency, cost, unstructured data storage and so on. To meet the requirements of big data, various NoSQL high-concurrency real-time databases have emerged to remedy the shortcomings of conventional relational databases; they offer high flexibility, high performance and simple models. The HBase database is an open-source, distributed, column-oriented data storage system. It scales horizontally, so it adapts well to the storage capacity demands of rapidly growing data; it runs in a distributed, parallel manner with high I/O speed, so it can meet the requirements of efficient querying, analysis and mining of big data. Its support for unstructured data and sparse matrix storage makes it well suited to storing and managing a wide variety of unstructured data.
In the related art, HBase-based databases typically store mass data in tall or wide tables, apply encoding, compression and design optimization to the row keys, and improve query efficiency through methods such as pre-partitioning and secondary indexing. However, the inventors found that the HBase database solutions in the related art give little consideration to the query requirements of complex services and offer limited support for queries with complex conditions. In other words, query efficiency can still be improved for data with complex business relations.
Therefore, the embodiment of the application provides a data storage method, a query method and related devices to solve the above problems.
Referring to fig. 1, a block diagram of a server 100 is shown. The server 100 includes a data storage device 200, a data query device 300, a memory 111, a processor 112, and a communication unit 113.
The memory 111, the processor 112 and the communication unit 113 are electrically connected to each other, directly or indirectly, to realize data transmission or interaction. For example, the components may be electrically connected via one or more communication buses or signal lines. The data storage device 200 and the data query device 300 each comprise at least one software function module, which may be stored in the memory 111 in the form of software or firmware, or embedded in the operating system (OS) of the server 100. The processor 112 is configured to execute the executable modules stored in the memory 111, such as the software function modules and computer programs included in the data storage device 200 and the data query device 300.
The memory 111 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), etc. The memory 111 is used for storing programs or data. The communication unit 113 is configured to establish a communication connection between the server 100 and other communication terminals through a network, and to transmit and receive data through the network. In one implementation, the memory 111 includes DDR4 RDIMM memory (16GB, 2133000KHz, 1.2V, ECC, 2Rank, 1G 4 bit) and a 1200GB SAS 12Gb/s, 10000rpm, 2.5 inch hot-plug hard disk. Further, the server 100 also includes an SR320BC RAID card (1GB cache; RAID 0, 1, 5, 6, 10, 50 and 60; super capacitor support) and an 850mm MiniSAS module (8 disk specification).
It should be understood that the architecture shown in fig. 1 is merely a schematic diagram of the architecture of the server 100, and that the server 100 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
First embodiment
Referring to fig. 2, fig. 2 is a flowchart of a data storage method according to a preferred embodiment of the application. The data storage method comprises the following steps:
Step S101, the pre-processed data information is acquired.
The pre-processed data information is obtained as follows: data information from different sources is received; the received data information is cleaned and integrated; and the processed data information is desensitized.
In an embodiment of the present application, the raw data, which includes data information from different sources, may be received through a secure cloud desktop system installed on the server 100. The received data is then cleaned and integrated to ensure the correctness, integrity, consistency and validity of the pre-processed data information. Desensitization may be applied to those items of the data information that belong to privacy data defined in advance. For example, if identity information is predefined as privacy data and the processed data information contains identity information, that information is desensitized to ensure the security of the data.
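The cleaning and desensitization steps can be illustrated with a minimal Python sketch. The field names, the salt and the choice of a truncated salted SHA-256 digest are assumptions made for illustration only; the source does not specify how desensitization is carried out.

```python
import hashlib

# Hypothetical set of fields treated as privacy data; the source only says
# that privacy data is defined in advance.
PRIVATE_FIELDS = {"id_number", "name"}


def clean(record):
    """Strip whitespace from string values and drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value not in ("", None):
            cleaned[key] = value
    return cleaned


def desensitize(record, salt="demo-salt"):
    """Replace each private field with a truncated salted SHA-256 digest."""
    masked = dict(record)
    for field in PRIVATE_FIELDS:
        if field in record:
            text = salt + str(record[field])
            masked[field] = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return masked


# Made-up sample record, not taken from the source.
raw = {"id_number": " ID-0001 ", "name": "person A", "dept": "cardiology", "note": ""}
processed = desensitize(clean(raw))
```

Non-private fields such as `dept` pass through unchanged, while the same private value always maps to the same digest, so records of one person can still be joined after masking.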
Step S102, importing the data information into a plurality of tables of the snowflake model corresponding to the HBase database.
The snowflake model includes a plurality of tables: a fact table and a plurality of dimension tables. In a snowflake model, one or more dimension tables are not connected to the fact table directly but through other dimension tables. The fact table and the dimension tables are all data tables. Row keys (Rowkey) of a data table in the HBase database are organized in lexicographic order. In the related art, when the start and stop row keys are unknown, querying massive data with scan combined with various filters is very expensive, so the use of filters should be avoided as far as possible. In this embodiment, the snowflake model is adopted and, for cases where a full table scan cannot be avoided, the row keys of each data table and the foreign keys associating it with other data tables are set reasonably, which reduces filter operations, saves query cost and improves query efficiency.
It should be noted that the fact table records, in time order, all the services that have occurred for each person. The dimension tables comprise a service dimension table, a personnel dimension table and a mechanism dimension table. The service dimension table records the detailed information of the related services. The personnel dimension table and the mechanism dimension table are supplementary dimension tables that record basic personal information, detailed mechanism information and other service-related information. In complex queries the date factor is particularly important, so the date is contained in the row keys of each data table of the snowflake model; whether other information is added to a row key is decided according to its importance. All data tables include row keys and column families.
Further, as an embodiment, the design of each data table of the snowflake model may be as shown in fig. 3.
The row key of the personnel dimension table is a unique personnel code generated randomly for each item of personnel information. Generating personnel codes randomly helps to avoid problems such as data hot spots in the HBase database.
The row key of the mechanism dimension table is a unique mechanism code generated from the executing mechanism information.
The row keys of the service dimension table are composed of the personnel code, the service occurrence time information and a sequence number code, where the sequence number code is the sequence number of a piece of record data within one item of service information. One item of service information may include record data for several operations; these records all belong to the same service information and share the same service occurrence time. Sequence numbers are assigned to the records in order of occurrence to distinguish them, so that each record has a unique row key value for querying. For example, suppose the doctor's advice service information of person A at 9 o'clock includes a treatment method and a medicine with its dosage. When this service information is written into the service dimension table, the row key value of the treatment record is A,9:00:00,1 and the row key value of the medicine record is A,9:00:00,2.
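The row key composition described above can be sketched in Python as follows. Using `uuid4` for the random personnel code, the timestamp format and zero-padding the sequence number are assumptions of this sketch, not details given in the source; the padding is added so that lexicographic order matches numeric order once a service has ten or more records.

```python
import uuid


def new_person_code():
    """Random, unique personnel code; random codes spread rows across the key space."""
    return uuid.uuid4().hex


def service_row_key(person_code, occurred_at, seq, width=4):
    """Join personnel code, service occurrence time and sequence number into a row key."""
    return f"{person_code},{occurred_at},{seq:0{width}d}"


person = new_person_code()
# Two records of one service, numbered in order of occurrence (made-up data).
records = ["treatment record", "medicine record"]
keys = [
    service_row_key(person, "2018-03-26 09:00:00", seq)
    for seq, _ in enumerate(records, start=1)
]
```

Because all records of one service share the personnel code and occurrence time, their row keys are adjacent in the sorted key space and differ only in the trailing sequence number.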
The row keys of the fact table are composed of the personnel code, the service occurrence time and the service code. The column family of the fact table comprises the personnel code, the service occurrence time information corresponding to the personnel code, and the corresponding maximum sequence number. The maximum sequence number is the largest of the sequence numbers assigned to the record data of one item of service information; for example, when the doctor's advice service information at 9 o'clock described above is written into the fact table, the maximum sequence number is 2. Storing the maximum sequence number in the column family of the fact table makes it possible to obtain the row key range of the corresponding service information in the service dimension table, which effectively speeds up queries. During a data query, the data of a service can be found in the service dimension table using only the personnel code, the service occurrence time and the service code to which the data belongs: the stop row key is obtained from the service occurrence time information corresponding to the personnel code and the corresponding maximum sequence number in the fact table, and the start row key is obtained from the service occurrence time information corresponding to the personnel code and the sequence number "1", so that the corresponding row key range in the service dimension table is given by the start row key and the stop row key.
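A minimal sketch of this range derivation, using a sorted in-memory dictionary as a stand-in for the service dimension table. The sample rows and the zero-padded sequence numbers are assumptions of the sketch; no HBase client is involved.

```python
from bisect import bisect_left, bisect_right


def service_key(person, time, seq):
    """Row key of the service dimension table (sequence number zero-padded)."""
    return f"{person},{time},{seq:04d}"


# In-memory stand-in for the service dimension table (made-up rows).
service_dim = {
    service_key("A", "09:00:00", 1): "treatment record",
    service_key("A", "09:00:00", 2): "medicine record",
    service_key("A", "10:30:00", 1): "examination record",
    service_key("B", "09:00:00", 1): "treatment record",
}
sorted_keys = sorted(service_dim)

# Column family content of one fact table row, as described in the text.
fact_row = {"person": "A", "time": "09:00:00", "max_seq": 2}


def row_key_range(fact):
    """Start key uses sequence number 1, stop key uses the maximum sequence number."""
    start = service_key(fact["person"], fact["time"], 1)
    stop = service_key(fact["person"], fact["time"], fact["max_seq"])
    return start, stop


def range_scan(start, stop):
    """Return all rows with start <= key <= stop without touching other rows."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_right(sorted_keys, stop)
    return [(key, service_dim[key]) for key in sorted_keys[lo:hi]]


start, stop = row_key_range(fact_row)
hits = range_scan(start, stop)
```

The stop key is treated as inclusive here. HBase scans normally exclude the stop row, so a real client would need to request an inclusive stop row or use a key just past the last record.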
The personnel codes in the service dimension table and the fact table serve as foreign keys associated with the personnel dimension table; the mechanism code in the service dimension table serves as a foreign key associated with the mechanism dimension table; and the service occurrence time information corresponding to the personnel code, together with the corresponding maximum sequence number, in the fact table serves as a foreign key associated with the service dimension table. Setting the foreign keys and row keys reasonably in this way effectively improves query efficiency.
In the embodiment of the present application, step S102 is implemented by the following manner:
Personnel information, the corresponding service information and the service occurrence time information of each item of service information are acquired from the data information. The personnel information is written into the column family of the personnel dimension table in the snowflake model, and the unique personnel code randomly generated for each item of personnel information is used as its row key value in the personnel dimension table.
The personnel code and the service information corresponding to it are written into the HBase database through the service dimension table in the snowflake model. Optionally, the corresponding executing mechanism information and detailed service information are obtained from the service information. The executing mechanism information is written into the mechanism dimension table of the snowflake model, whose row key is the unique mechanism code generated for each item of executing mechanism information. The personnel code, the mechanism code and the corresponding detailed service information are written into the column family of the service dimension table.
The personnel code, the service occurrence time information corresponding to it and the corresponding maximum sequence number are written into the column family of the fact table of the snowflake model, where the maximum sequence number is the largest sequence number code in the service dimension table for the same personnel code, service code and service occurrence time information.
Step S103, the data information imported into the HBase database is compressed. In a distributed file system, to ensure high availability of data and fault tolerance of the system, the same data block is usually replicated on several nodes, with 3 copies by default. Once mass data has been imported into the HBase database and replicated, it occupies considerable cluster space, so it needs to be compressed to save storage space.
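The import flow of step S102 can be sketched with plain Python dictionaries standing in for the HBase tables. The table variables, the `person_codes` helper, the use of a name as the person's identity and the zero-padded sequence numbers are all hypothetical choices for illustration; the mechanism dimension table is omitted for brevity.

```python
import uuid

# In-memory stand-ins for three tables of the snowflake model.
person_dim, service_dim, fact = {}, {}, {}
person_codes = {}  # hypothetical helper: person identity -> personnel code


def import_service(person_info, service_code, occurred_at, records):
    """Write one item of service information and return the personnel code."""
    identity = person_info["name"]
    code = person_codes.get(identity)
    if code is None:
        code = uuid.uuid4().hex  # randomly generated, unique personnel code
        person_codes[identity] = code
        person_dim[code] = dict(person_info)
    # One service dimension row per record, numbered in order of occurrence.
    for seq, detail in enumerate(records, start=1):
        service_dim[f"{code},{occurred_at},{seq:04d}"] = {
            "person_code": code,
            "detail": detail,
        }
    # The fact row keeps the maximum sequence number for later range queries.
    fact[f"{code},{occurred_at},{service_code}"] = {
        "person_code": code,
        "occurred_at": occurred_at,
        "max_seq": len(records),
    }
    return code


code = import_service(
    {"name": "person A"}, "advice", "09:00:00",
    ["treatment record", "medicine record"],
)
```

Importing a second service for the same person reuses the existing personnel code, so the personnel dimension table holds one row per person while the fact and service dimension tables grow with each service.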
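The effect of compressing repetitive stored data can be illustrated with Python's standard `zlib` module. This is only a stand-in: the source does not name a compression algorithm, and the sample rows are made up. In HBase itself, compression is normally enabled through table configuration rather than application code.

```python
import json
import zlib

# Made-up rows with the kind of repetition typical of record data.
rows = [
    {"person_code": "p1", "occurred_at": "09:00:00", "detail": "medicine record"}
    for _ in range(1000)
]
raw = json.dumps(rows).encode("utf-8")

packed = zlib.compress(raw, 6)  # lossless compression at level 6
restored = json.loads(zlib.decompress(packed).decode("utf-8"))

ratio = len(packed) / len(raw)  # fraction of the original size that remains
```

Since every replica of a block stores the compressed bytes, the saving applies to each of the copies kept by the distributed file system.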
Second embodiment
When the business relations among the stored data are complex and query demands are flexible and changeable, row keys should not be too long, in order to preserve storage efficiency and the memory utilization of HFiles; row keys alone therefore cannot meet actual query demands. To improve query efficiency, the embodiment of the present application provides a data query method, applied to the server 100, in which an index table, a lookup table and a special lookup table as shown in fig. 5 are also stored. As shown in fig. 4, the data query method may include the following steps:
Step S201, a query request sent by a client is received.
In this embodiment, the client is communicatively connected to the server 100. The client may be a Java client installed on an electronic device, through which the user obtains the services provided by the server 100. The electronic device may be a mobile terminal device or a computer, for example a smart phone, a tablet computer, a laptop computer or a desktop computer.
Step S202, according to the query request, the data information corresponding to the query request is searched from the HBase database in combination with the index table.
In a data query, a column value query in the HBase database needs to scan the whole table, which is inefficient. An index table is therefore introduced so that frequent queries return results quickly. The index table includes row keys and a column family. The row key of the index table consists of a column value and the row key value corresponding to that column value in the HBase database, and the column family of the index table is empty. The row key of the index table may thus be made up of a frequently queried field and the row key value corresponding to that field in the HBase database.
In the embodiment of the application, the field to be queried is obtained from the query request and looked up in the row keys of the index table. When a matching row key is found in the index table, the row key value of the corresponding query field in the data table of the HBase database is read from it, and the data information corresponding to the field to be queried is then acquired with that row key value.
Further, to handle query requests with several fields to be queried and to increase query speed further, the step of searching the HBase database for the data information corresponding to the query request according to the query request in combination with the index table may be: searching the HBase database for the data information corresponding to the query request according to the query request, in combination with the index table and the lookup table.
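A minimal sketch of such an index table, again with in-memory structures in place of HBase. The `|` separator between the column value and the data row key, the `dept` field and the sample rows are assumptions of the sketch.

```python
from bisect import bisect_left

# In-memory stand-in for a data table (made-up rows).
data_table = {
    "p1,09:00:00,0001": {"dept": "cardiology"},
    "p2,09:30:00,0001": {"dept": "neurology"},
    "p3,10:00:00,0001": {"dept": "cardiology"},
}

# Index row key = "<column value>|<data row key>"; nothing is stored in the
# column family, so the sorted list of row keys is the whole index table.
index_keys = sorted(f"{row['dept']}|{key}" for key, row in data_table.items())


def lookup(value):
    """Find rows whose indexed column equals value by a prefix scan of the index."""
    prefix = value + "|"
    i = bisect_left(index_keys, prefix)
    hits = []
    while i < len(index_keys) and index_keys[i].startswith(prefix):
        data_key = index_keys[i][len(prefix):]
        hits.append((data_key, data_table[data_key]))
        i += 1
    return hits


cardiology = lookup("cardiology")
```

Because the column value leads the index row key, all entries for one value are contiguous, and the query reads only those entries plus the matching data rows instead of scanning the whole data table.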
The row key of the lookup table consists of at least two query fields and the row key value corresponding to each query field in the HBase database, and the column family of the lookup table is empty. Specifically, when the query condition of the query request includes several fields to be queried, the table names of the lookup tables are searched to determine whether a lookup table exists for those fields. If one exists, the row key value of each query field in the HBase database is read from the matching row key of the lookup table, and the corresponding data information is then acquired from the HBase database. If none exists, the data information corresponding to each field to be queried in the query request is queried in turn using the index table. For example, when the query fields come from different data tables, a row key of the lookup table has the form "query field A, query field B|data table row key value a, data table row key value b", which indicates that the row key value corresponding to query field A in the data table of the HBase database is data table row key value a, and the row key value corresponding to query field B is data table row key value b. When the fields to be queried in the query request comprise the combination of query field A and query field B, the row key "query field A, query field B|data table row key value a, data table row key value b" can be found in the lookup table according to those fields, and the corresponding data information is then acquired from the HBase database according to data table row key value a and data table row key value b in that row key.
Further, when the query request contains a comparison condition (for example, the query request requires query field A to be greater than query field B), a special lookup table may be used to increase query speed. It should be noted that, when a query condition is used frequently, the query result for that condition (i.e., the data information found according to the condition) and the row key value of that result in the HBase database are combined into a row key of the special lookup table to facilitate searching. That is, the row key of the special lookup table is composed of the query result and the row key value corresponding to the query result in the HBase database, and the column family of the special lookup table is empty. The step of searching the HBase database for the data information corresponding to the query request according to the query request in combination with the index table may then be: searching the HBase database for the data information corresponding to the query request according to the query request, in combination with the index table and the special lookup table. Optionally, the table names of the special lookup tables are searched to determine whether a query result exists for the query condition of the query request. If one exists, the row key value of the query result in the HBase database is read from the row key of the special lookup table, and the corresponding data information is acquired from the HBase database. This reduces I/O overhead and improves query performance.
The index table, the lookup table and the special lookup table can be set up in advance according to user requirements and can also adapt flexibly to changes in those requirements. The data query method may further include: counting the received query requests to obtain the number of query requests with the same query condition. If that number exceeds a preset threshold, a new corresponding index table, lookup table or special lookup table is created from the query condition of those requests and the row key values obtained from the HBase database for them.
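The lookup-table path and its fallback to the index table can be sketched as follows. Naming each lookup table after its sorted field combination, the field names `dept` and `drug`, and the row keys `k1`, `k2`, `k7` are hypothetical; the row key layout follows the "field values|data row keys" form described above.

```python
# Hypothetical single-field index: field -> {value -> [data row keys]}.
index = {
    "dept": {"cardiology": ["k1", "k2"]},
    "drug": {"aspirin": ["k7"]},
}

# Each lookup table is named after its field combination. Every row key has
# the form "<value A>,<value B>|<row key a>,<row key b>"; the column family
# is empty, so only row keys are kept.
lookup_tables = {
    ("dept", "drug"): [
        "cardiology,aspirin|k1,k7",
        "cardiology,aspirin|k2,k7",
    ],
}


def query(conditions):
    """conditions maps field -> value; returns (source used, data row keys)."""
    fields = tuple(sorted(conditions))
    table = lookup_tables.get(fields)
    if table is not None:
        prefix = ",".join(conditions[f] for f in fields) + "|"
        # A real table would use a prefix scan; a linear filter keeps this short.
        keys = [rk[len(prefix):].split(",") for rk in table if rk.startswith(prefix)]
        return "lookup", keys
    # No lookup table for this combination: query each field through the index.
    return "index", [index.get(f, {}).get(v, []) for f, v in sorted(conditions.items())]


combined = query({"dept": "cardiology", "drug": "aspirin"})
single = query({"dept": "cardiology"})
```

With the lookup table, one read returns the data row keys for both fields at once; without it, each field costs a separate index query and the results still have to be combined.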
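A special lookup table can be thought of as a precomputed result set for one frequently used condition. The sketch below assumes a "field A greater than field B" condition, names the table after the condition, and uses made-up fields `measured` and `limit`; none of these names come from the source.

```python
# In-memory stand-in for a data table (made-up rows).
data_table = {
    "k1": {"measured": 150, "limit": 140},
    "k2": {"measured": 120, "limit": 140},
    "k3": {"measured": 160, "limit": 140},
}
special_tables = {}  # table name (the condition) -> sorted row keys


def build_special_table(left, right):
    """Precompute rows where left > right; row key = '<result>|<data row key>'."""
    name = f"{left}>{right}"
    special_tables[name] = sorted(
        f"{row[left]},{row[right]}|{key}"
        for key, row in data_table.items()
        if row[left] > row[right]
    )
    return name


def query_greater(left, right):
    """Use the special lookup table when it exists, else fall back to a full scan."""
    name = f"{left}>{right}"
    if name in special_tables:
        return [rk.split("|", 1)[1] for rk in special_tables[name]]
    return sorted(k for k, row in data_table.items() if row[left] > row[right])


scanned = query_greater("measured", "limit")   # full scan, no table yet
build_special_table("measured", "limit")
looked_up = query_greater("measured", "limit")  # served from the special table
```

Both paths return the same row keys, but the second never evaluates the comparison against the data table, which is where the I/O saving comes from.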
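The counting step can be sketched with a small helper class. The class name, the string used to identify a query condition and the threshold value are hypothetical; the sketch only decides when a new table should be created and leaves the creation itself out.

```python
from collections import Counter


class QueryStats:
    """Counts query conditions and flags those that deserve their own table."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()
        self.indexed = set()  # conditions for which a table was already requested

    def record(self, condition):
        """Count one query; return True the first time the threshold is exceeded."""
        self.counts[condition] += 1
        if self.counts[condition] > self.threshold and condition not in self.indexed:
            self.indexed.add(condition)
            return True
        return False


stats = QueryStats(threshold=2)
signals = [stats.record("dept=cardiology") for _ in range(4)]
```

The caller would create the index table, lookup table or special lookup table when `record` returns `True`; later queries with the same condition no longer trigger creation.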
And step S203, feeding the searched data information back to the client.
In this embodiment, the searched data information is sent to the Java client, and the Java client displays the data information to the user.
The present application has been verified experimentally. The same batch of data was stored using the snowflake model and a high table (74 columns) respectively. When all data of one person was queried, the snowflake model and the high table performed similarly, taking 2.46 s and 2.53 s respectively. When querying by column value, the snowflake model was markedly more efficient than the high table, taking 16 s versus 26.68 s. After the index table was introduced, the column value query time of the snowflake model dropped from 16 s to 2.55 s. For a combined query on two fields in different services, the query time of the snowflake model was 13.2 s before and 0.26 s after using the lookup table. For a query comparing the column values of two fields in different services, the query time of the snowflake model was 177.6 s before and 18.56 s after using the special lookup table, with 1000 records queried and displayed.
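For comparison, the relative speed-ups implied by these timings can be computed directly. The short sketch below uses only the figures reported above; the dictionary labels are descriptive names chosen for illustration.

```python
# (slower time in seconds, faster time in seconds), taken from the reported results
timings = {
    "column value query: high table vs snowflake model": (26.68, 16.0),
    "column value query: without vs with index table": (16.0, 2.55),
    "combined two-field query: without vs with lookup table": (13.2, 0.26),
    "column value comparison: without vs with special lookup table": (177.6, 18.56),
}

# Speed-up factor = slower time / faster time
speedups = {name: slow / fast for name, (slow, fast) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.1f}x faster")
```

The ratios are roughly 1.7, 6.3, 50.8 and 9.6 respectively, so the lookup table for combined queries gives the largest relative gain in this experiment.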
Third embodiment
Fig. 6 shows a data storage device 200 corresponding to the data storage method described above; details of the device may be implemented with reference to the data storage method provided in the first embodiment. The data storage device 200 includes:
an acquisition module 201, configured to acquire pre-processed data information;
a storage module 202, configured to import the data information into the plurality of tables of the snowflake model corresponding to the HBase database; and
a compression module 203, configured to compress the data information imported into the HBase database, so as to implement data storage.
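The import performed by the storage module 202 can be sketched with the row key layout recited in the claims: a randomly generated unique personnel code for the personnel dimension table, and a row key made of the personnel code, the service occurrence time and a sequence number for the service dimension table. The sketch is illustrative only. Dictionaries stand in for HBase tables, and the `_` separator, the zero-padded sequence number, the key used for the fact table and the function names are assumptions. Compression is omitted.

```python
import uuid
from typing import Dict, Tuple

person_dim: Dict[str, dict] = {}          # personnel code -> personnel information
service_dim: Dict[str, dict] = {}         # "<code>_<time>_<seq>" -> service information
fact: Dict[Tuple[str, str], int] = {}     # (code, time) -> maximum sequence number


def add_person(info: dict) -> str:
    """Write personnel information under a randomly generated unique code."""
    code = uuid.uuid4().hex
    person_dim[code] = info
    return code


def add_service(code: str, time: str, info: dict) -> str:
    """Write one service record and update the maximum sequence number."""
    seq = fact.get((code, time), 0) + 1   # next sequence number for this person and time
    # Zero padding (an assumption) keeps lexicographic order equal to numeric order.
    row_key = f"{code}_{time}_{seq:04d}"
    service_dim[row_key] = info
    fact[(code, time)] = seq              # fact table holds the current maximum
    return row_key
```

Because the fact table records the maximum sequence number, the row keys of all service records for a given person and time can be generated directly, which is one way a full scan of the service dimension table could be avoided.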
Fig. 7 shows a data query device 300 corresponding to the data query method described above; details of the device may be implemented with reference to the data query method provided in the second embodiment. The data query device 300 includes:
a receiving module 301, configured to receive a query request sent by a client;
a searching module 302, configured to search the HBase database for the data information corresponding to the query request according to the query request in combination with the index table; and
a feedback module 303, configured to feed the found data information back to the client.
In summary, the data storage method, the query method and the related devices provided by the present application are applied to a server that includes a pre-constructed snowflake model. The data storage method includes: acquiring pre-processed data information; importing the data information into the plurality of tables of the snowflake model corresponding to the HBase database; and compressing the data information imported into the HBase database to implement data storage. The method exploits the horizontal scalability of HBase, a column-oriented data storage system, and stores the data in combination with the snowflake model. On the one hand, this adapts well to the storage capacity demanded by rapidly growing data. On the other hand, when the stored data has complex business relationships, scanning the whole database is avoided and the efficiency of searching the stored data is improved. In addition, compressing the data information imported into the HBase database further saves storage space and copes better with mass data storage.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may also be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
It is noted that relational terms such as first and second, and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application; those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that like reference numerals and letters denote like items in the following figures; thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
The foregoing is merely illustrative of the present application, and the protection scope of the present application is not limited thereto. Any variation or substitution readily conceived by a person skilled in the art shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A data storage method applied to a server, wherein the server comprises a snowflake model constructed in advance, the data storage method comprising:
acquiring pre-processed data information;
importing the data information into a plurality of tables of the snowflake model corresponding to an HBase database;
compressing the data information imported into the HBase database to realize data storage;
wherein: the step of importing the data information into a plurality of tables of the snowflake model corresponding to the HBase database comprises the following steps:
acquiring personnel information, corresponding service information and service occurrence time information corresponding to each service information from the data information;
writing the personnel information into a column group of a personnel dimension table in the snowflake model, wherein row keys of the personnel dimension table are unique corresponding personnel codes randomly generated according to each personnel information;
writing the personnel code and the service information corresponding to the personnel code into the HBase database through a service dimension table in the snowflake model, wherein a row key of the service dimension table in the snowflake model consists of the personnel code, the service occurrence time information and a sequence number code, and the sequence number code is a sequence number of record data corresponding to the same service information;
writing the personnel code, the service occurrence time information corresponding to the personnel code and the corresponding maximum sequence number into a column group of a fact table of the snowflake model, wherein the maximum sequence number is the maximum value of the sequence number codes corresponding to the personnel code and the service occurrence time information in the service dimension table.
2. The data storage method of claim 1, wherein the step of writing the personnel code and the service information corresponding to the personnel code into the HBase database through the service dimension table in the snowflake model comprises:
acquiring corresponding executing mechanism information and service detailed information from the service information;
writing the executing mechanism information into a mechanism dimension table of the snowflake model, wherein a row key of the mechanism dimension table is a unique corresponding mechanism code generated according to each executing mechanism information;
and writing the personnel code, the mechanism code and the corresponding service detailed information into a column group of the service dimension table.
3. The data storage method of claim 2, wherein the pre-processed data information is obtained by:
receiving the data information from different sources;
cleaning and integrating the received data information from different sources;
and desensitizing the portion of the processed data information that belongs to predefined privacy data.
4. A data storage device for use with a server, said server including a pre-constructed snowflake model, said data storage device comprising:
the acquisition module is used for acquiring the pre-processed data information;
the storage module is used for importing the data information into a plurality of tables of the snowflake model corresponding to the HBase database;
the compression module is used for compressing the data information imported into the HBase database so as to realize data storage;
the step of importing the data information into a plurality of tables of the snowflake model corresponding to the HBase database comprises the following steps:
acquiring personnel information, corresponding service information and service occurrence time information corresponding to each service information from the data information;
writing the personnel information into a column group of a personnel dimension table in the snowflake model, wherein row keys of the personnel dimension table are unique corresponding personnel codes randomly generated according to each personnel information;
writing the personnel code and the service information corresponding to the personnel code into the HBase database through a service dimension table in the snowflake model, wherein a row key of the service dimension table in the snowflake model consists of the personnel code, the service occurrence time information and a sequence number code, and the sequence number code is a sequence number of record data corresponding to the same service information;
writing the personnel code, the service occurrence time information corresponding to the personnel code and the corresponding maximum sequence number into a column group of a fact table of the snowflake model, wherein the maximum sequence number is the maximum value of the sequence number codes corresponding to the personnel code and the service occurrence time information in the service dimension table.
5. A data query method applied to a server, wherein the server comprises an HBase database, and data information and a corresponding index table written into the HBase database by using the data storage method according to any one of claims 1 to 3, and the server is in communication connection with a client, and the data query method comprises:
receiving a query request sent by a client;
searching data information corresponding to the query request from the HBase database according to the query request and the index table;
and feeding the searched data information back to the client.
6. The data query method of claim 5, wherein the server further comprises a lookup table, a row key of the lookup table is composed of at least two query fields and row key values corresponding to each of the query fields in the HBase database, a column group of the lookup table is empty, and when the query condition corresponding to the query request includes a plurality of fields to be queried, the step of searching the HBase database for data information corresponding to the query request according to the query request in combination with the index table comprises:
inquiring whether the plurality of fields to be queried corresponding to the query request exist in the table name of the lookup table;
when they exist, acquiring, from the row keys of the lookup table, the row key values in the HBase database that correspond to the plurality of fields to be queried of the query request, so as to acquire the corresponding data information from the HBase database;
and when they do not exist, sequentially querying the data information corresponding to each field to be queried in the query request by using the index table.
7. The data query method of claim 6, wherein the server further comprises a special lookup table, a row key of the special lookup table is composed of a query result and a row key value corresponding to the query result in the HBase database, a column group of the special lookup table is empty, and the step of searching data information corresponding to the query request from the HBase database in combination with the index table according to the query request comprises:
inquiring whether a query field corresponding to the query request exists in the table name of the special lookup table;
and when it exists, acquiring, from the row key of the special lookup table, the row key value corresponding to the query result in the HBase database, so as to acquire the corresponding data information from the HBase database.
8. The data query method of claim 7, wherein the data query method further comprises:
counting the received query requests, and obtaining the number of the query requests with the same query conditions;
and if the number of the query requests with the same query conditions exceeds a preset threshold, creating a corresponding index table, lookup table or special lookup table according to the query conditions corresponding to the query requests and the row key values obtained from the HBase database according to the query requests.
9. A data query device applied to a server, wherein the server comprises an HBase database, data information written into the HBase database by using the data storage method according to any one of claims 1 to 3 and a corresponding index table, and the server is in communication connection with a client, and the data query device comprises:
the receiving module is used for receiving a query request sent by the client;
the searching module is used for searching data information corresponding to the query request from the HBase database according to the query request and the index table;
and the feedback module is used for feeding the searched data information back to the client.
CN201810252830.1A 2018-03-26 2018-03-26 Data storage method, query method and related device Active CN110555021B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810252830.1A CN110555021B (en) 2018-03-26 2018-03-26 Data storage method, query method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810252830.1A CN110555021B (en) 2018-03-26 2018-03-26 Data storage method, query method and related device

Publications (2)

Publication Number Publication Date
CN110555021A CN110555021A (en) 2019-12-10
CN110555021B true CN110555021B (en) 2023-09-19

Family

ID=68733660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810252830.1A Active CN110555021B (en) 2018-03-26 2018-03-26 Data storage method, query method and related device

Country Status (1)

Country Link
CN (1) CN110555021B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111078679B (en) * 2019-12-23 2023-06-16 用友网络科技股份有限公司 Method and device for generating data report and computer readable storage medium
CN113031878B (en) * 2021-05-20 2021-08-06 睿至科技集团有限公司 HBase-based data storage optimization method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104102710A (en) * 2014-07-15 2014-10-15 浪潮(北京)电子信息产业有限公司 Massive data query method
CN104219088A (en) * 2014-08-21 2014-12-17 南京邮电大学 Hive-based network alarm information OLAP method
CN105677826A (en) * 2016-01-04 2016-06-15 博康智能网络科技股份有限公司 Resource management method for massive unstructured data
CN106326361A (en) * 2016-08-10 2017-01-11 中国农业银行股份有限公司 HBase database-based data inquiry method and device
CN106528674A (en) * 2016-10-31 2017-03-22 厦门服云信息科技有限公司 Method and device for high-performance query based on Hbase row keys
CN107301206A (en) * 2017-06-01 2017-10-27 华南理工大学 A kind of distributed olap analysis method and system based on pre-computation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10353923B2 (en) * 2014-04-24 2019-07-16 Ebay Inc. Hadoop OLAP engine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Dimension storage technology in OLAP aggregation computing"; Song Aibo et al.; Journal of Southeast University; Vol. 42 (No. 5); pp. 797-802 *

Also Published As

Publication number Publication date
CN110555021A (en) 2019-12-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant