CN110263061A - A data query method and system - Google Patents


Info

Publication number: CN110263061A
Authority: CN (China)
Prior art keywords: data; file; partitioned; business data; server
Legal status: Pending
Application number: CN201910521866.XA
Other languages: Chinese (zh)
Inventors: 王李平, 李涛
Current assignee: Zhengzhou Apas Technology Co Ltd
Original assignee: Zhengzhou Apas Technology Co Ltd
Events: application CN201910521866.XA filed by Zhengzhou Apas Technology Co Ltd (priority to CN201910521866.XA); published as CN110263061A; current legal status: pending.


Classifications

    • G Physics
    • G06 Computing; calculating or counting
    • G06F Electric digital data processing
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The embodiments of the invention provide a data query method and system. Business data is stored on local disk, and the corresponding index information is cached in local memory. A data partition server first partitions the big data in advance into multiple partition files and distributes each partition file to a corresponding distributed server for storage; each distributed server then generates index information for the partition files it stores. When a data query request is later received, the index information in local memory is used directly to quickly locate the storage position of the requested data on local disk. This avoids occupying excessive local memory, which reduces data caching cost, while the data can still be read quickly from local disk, which improves the response efficiency of data query requests. In addition, zero-copy technology is used to send the data to the network card interface, reducing the number of data copies and further improving the client's data query efficiency.

Description

A data query method and system
Technical field
The present invention relates to the field of data query technology, and in particular to a data query method and system.
Background
At present, the prior art generally uses data caching techniques to improve data query efficiency.
Existing data caching approaches include in-memory caching, disk caching, and hybrid memory-disk caching. When the data volume is small, any of these approaches improves data query efficiency, and queries are fast. In the big-data field, however, the data to be cached is usually large, even reaching the PB level, and relying on these caching techniques to improve query efficiency then has many drawbacks.
With in-memory caching, a very large cached data set occupies a large amount of memory, which leads to high machine resource costs. With disk caching, writing large amounts of data to disk seriously degrades query efficiency. With hybrid memory-disk caching, when the queried data is on disk rather than in memory, query efficiency also suffers badly.
Consequently, in existing data query approaches, caching data in memory makes data storage expensive, while caching data on disk makes reads slow; caching cost and read performance cannot both be satisfied at the same time.
Summary of the invention
The purpose of the embodiments of the present application is to provide a data query method and system that avoid occupying excessive local memory, thereby reducing data caching cost, while still reading the corresponding data quickly from local disk, thereby improving the response efficiency of data query requests. In addition, zero-copy technology is used to send data to the network card interface, reducing the number of data copies and further improving the client's data query efficiency.
To solve the above technical problems, the embodiments of the present application are implemented as follows.
An embodiment of the present application provides a data query method, comprising: receiving a query request from a client for target business data, wherein the query request carries the data identifier of the target business data, the query request is sent by the client according to a target device identifier returned by a data partition server, and the target device identifier is the identifier of the distributed server, selected by the data partition server from multiple distributed servers according to the data identifier, on which the target business data is located;
looking up, according to the data identifier, the storage location information of the target business data in index file information stored in local memory, wherein the index file information includes the correspondence between data identifiers of business data and storage location information;
reading the target business data from local disk based on the storage location information corresponding to the target business data, wherein the local disk stores data partition files, each containing multiple business data records, pre-assigned by the data partition server; and
transmitting the read target business data to a preset network card interface using zero-copy technology, so that the target business data is sent to the client through the network card interface.
An embodiment of the present application provides a data query system, comprising a client and multiple distributed servers, wherein:
the client is configured to send a query request for target business data to a distributed server and to receive the target business data returned by the distributed server, wherein the query request is sent by the client according to a target device identifier returned by a data partition server, and the target device identifier is the identifier of the distributed server, selected by the data partition server from multiple distributed servers according to the data identifier, on which the target business data is located;
the distributed server is configured to receive the query request; to look up, according to the data identifier, the storage location information of the target business data in index file information stored in local memory, wherein the index file information includes the correspondence between data identifiers of business data and storage location information;
to read the target business data from local disk based on the storage location information corresponding to the target business data, wherein the local disk stores data partition files, each containing multiple business data records, pre-assigned by the data partition server; and
to transmit the read target business data to a preset network card interface using zero-copy technology, so that the target business data is sent to the client through the network card interface.
An embodiment of the present application provides a data query device, comprising: a processor; and
a memory arranged to store computer-executable instructions which, when executed, cause the processor to implement the following process:
receiving a query request from a client for target business data, wherein the query request carries the data identifier of the target business data, the query request is sent by the client according to a target device identifier returned by a data partition server, and the target device identifier is the identifier of the distributed server, selected by the data partition server from multiple distributed servers according to the data identifier, on which the target business data is located;
looking up, according to the data identifier, the storage location information of the target business data in index file information stored in local memory, wherein the index file information includes the correspondence between data identifiers of business data and storage location information;
reading the target business data from local disk based on the storage location information corresponding to the target business data, wherein the local disk stores data partition files, each containing multiple business data records, pre-assigned by the data partition server; and
transmitting the read target business data to a preset network card interface using zero-copy technology, so that the target business data is sent to the client through the network card interface.
An embodiment of the present application provides a storage medium for storing computer-executable instructions which, when executed, implement the following process:
receiving a query request from a client for target business data, wherein the query request carries the data identifier of the target business data, the query request is sent by the client according to a target device identifier returned by a data partition server, and the target device identifier is the identifier of the distributed server, selected by the data partition server from multiple distributed servers according to the data identifier, on which the target business data is located;
looking up, according to the data identifier, the storage location information of the target business data in index file information stored in local memory, wherein the index file information includes the correspondence between data identifiers of business data and storage location information;
reading the target business data from local disk based on the storage location information corresponding to the target business data, wherein the local disk stores data partition files, each containing multiple business data records, pre-assigned by the data partition server; and
transmitting the read target business data to a preset network card interface using zero-copy technology, so that the target business data is sent to the client through the network card interface.
In the data query method and system of the embodiments of the present application, a query request from a client for target business data is received; the storage location information of the target business data is looked up by data identifier in the index file information stored in local memory; the target business data is read from local disk based on that storage location information; and the read target business data is transmitted to a preset network card interface using zero-copy technology, so that it is sent to the client through the network card interface. Business data is stored on local disk while the corresponding index information is cached in local memory. A data partition server partitions the big data in advance into multiple partition files and distributes each partition file to a corresponding distributed server for storage, and each distributed server generates index information for the partition files it stores. When a data query request later arrives, the index information in local memory is used directly to quickly locate the requested data on local disk. This avoids occupying excessive local memory, reducing data caching cost, while the data can still be read quickly from local disk, improving the response efficiency of data query requests. Sending the data to the network card interface with zero-copy technology also reduces the number of data copies and further improves the client's data query efficiency.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some of the embodiments recorded in the present application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an application scenario of data query processing provided by an embodiment of the present application;
Fig. 2 is a first schematic flowchart of the data query method provided by an embodiment of the present application;
Fig. 3 is a second schematic flowchart of the data query method provided by an embodiment of the present application;
Fig. 4 is a schematic diagram of the principle of generating data partition files in the data query method provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of the principle of generating index file information in the data query method provided by an embodiment of the present application;
Fig. 6 is a schematic diagram of a specific implementation process of the data query method provided by an embodiment of the present application;
Fig. 7 is a first schematic diagram of the module composition of the data query system provided by an embodiment of the present application;
Fig. 8 is a second schematic diagram of the module composition of the data query system provided by an embodiment of the present application;
Fig. 9 is a schematic structural diagram of the data query device provided by an embodiment of the present application.
Detailed description of embodiments
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
The embodiments of the present application provide a data query method and system in which business data is stored on local disk and the corresponding index information is cached in local memory. A data partition server partitions the big data in advance into multiple partition files and distributes each partition file to a corresponding distributed server for storage, and each distributed server generates index information for the partition files it stores. When a data query request is later received, the index information in local memory is used directly to quickly locate the requested data on local disk. This avoids occupying excessive local memory, reducing data caching cost, while the data can still be read quickly from local disk, improving the response efficiency of data query requests. In addition, zero-copy technology is used to send the data to the network card interface, reducing the number of data copies and further improving the client's data query efficiency.
Fig. 1 is a schematic diagram of an application scenario of the data query system provided by one or more embodiments of this specification. As shown in Fig. 1, the system includes a data partition server, multiple distributed servers in a distributed data storage system, and a client, where the client may be a mobile terminal such as a smartphone or tablet computer. The data partition server partitions big data into multiple partition files and distributes each partition file to a corresponding distributed server for storage; the big data may comprise millions of records, and one or more embodiments of this specification are mainly aimed at processing massive data. There are multiple distributed servers; each one receives the data partition files sent by the data partition server, stores them on its local disk, and generates corresponding index information for the stored partition files. The data query method proceeds as follows:
(1) Using the partitioning algorithm and partitioning module provided by offline computing technology, the data partition server partitions the big data to be processed into multiple data partition files. Each data partition file carries a file identifier, and each file identifier corresponds to the device identifier of one distributed server. Based on these device identifiers, the data partition server sends each data partition file, together with its file identifier, to the corresponding distributed server;
(2) The distributed server receives the data partition files sent by the data partition server and stores them on its local disk. From the storage location on local disk of each business data record contained in a data partition file, it generates the index file information for that file and stores the index file information in local memory;
(3) The client sends a query request carrying the data identifier of the target business data to the data partition server. On receiving it, the data partition server uses that data identifier to return to the client the device identifier of the distributed server holding the target business data. After receiving the device identifier, the client sends the data query request to the distributed server corresponding to that device identifier;
(4) After the distributed server receives the client's query request for the target business data, it uses the identifier of the target business data to look up, in the index file information stored in local memory, the storage location information corresponding to that identifier, reads the target business data from local disk, and transmits it to the preset network card interface using zero-copy technology, so that the target business data is sent to the client through the network card interface.
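The two-hop lookup in steps (3) and (4) can be sketched as follows. This is a minimal in-process illustration under stated assumptions, not the patent's implementation: the class and function names (`PartitionServer.locate`, `DistributedServer.query`, `client_query`) are hypothetical, the routing table is a plain dictionary standing in for whatever mapping the data partition server derives during partitioning, and network calls are replaced by method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DistributedServer:
    """Hypothetical stand-in for one distributed server.

    A dict replaces the real "index in memory, data on local disk" layout
    so that this sketch focuses only on request routing.
    """
    device_id: str
    records: Dict[str, bytes] = field(default_factory=dict)

    def query(self, key: str) -> Optional[bytes]:
        return self.records.get(key)


@dataclass
class PartitionServer:
    """Hypothetical data partition server: maps a data identifier to the
    device identifier of the distributed server holding that record."""
    routes: Dict[str, str]

    def locate(self, key: str) -> Optional[str]:
        return self.routes.get(key)


def client_query(partition_server: PartitionServer,
                 servers: Dict[str, DistributedServer],
                 key: str) -> Optional[bytes]:
    # Hop 1: ask the data partition server which device holds `key`.
    device_id = partition_server.locate(key)
    if device_id is None:
        return None
    # Hop 2: send the query directly to that distributed server.
    return servers[device_id].query(key)


servers = {
    "dev-1": DistributedServer("dev-1", {"user:1": b"alice"}),
    "dev-2": DistributedServer("dev-2", {"user:2": b"bob"}),
}
partition_server = PartitionServer({"user:1": "dev-1", "user:2": "dev-2"})
print(client_query(partition_server, servers, "user:2"))  # b'bob'
```

Because the data partition server only answers "which device", it never touches the business data itself; the bulk transfer happens solely between the client and one distributed server.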
In the embodiments of the present application, business data is stored on local disk and the corresponding index information is cached in local memory. A data partition server partitions the big data in advance into multiple partition files and distributes each partition file to a corresponding distributed server for storage, and each distributed server generates index information for the partition files it stores. When a data query request is later received, the index information in local memory is used directly to quickly locate the requested data on local disk. This avoids occupying excessive local memory, reducing data caching cost, while the data can still be read quickly from local disk, improving the response efficiency of data query requests. In addition, zero-copy technology is used to send the data to the network card interface, reducing the number of data copies and further improving the client's data query efficiency.
Fig. 2 is a first schematic flowchart of the data query method provided by an embodiment of the present application. The method in Fig. 2 can be executed by the distributed server in Fig. 1. As shown in Fig. 2, the method includes at least the following steps:
S201: the distributed server receives a query request from the client for target business data, wherein the query request carries the data identifier of the target business data, the query request is sent by the client according to a target device identifier returned by the data partition server, and the target device identifier is the identifier of the distributed server, selected by the data partition server from multiple distributed servers according to the data identifier, on which the target business data is located;
There may be multiple distributed servers, and the business data stored on each can reach the TB level, so a cluster of such servers can easily cache data at the PB level. The business data is stored on the disks of the distributed servers: it is copied onto each server's disk using the operating system's built-in file copy function rather than being cached in the server's memory, which reduces data caching cost to some extent;
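A rough back-of-envelope estimate illustrates why keeping only the index in memory is cheaper than caching the data itself. The figures are illustrative assumptions, not values from the source: 1 TiB of business data per server, an average record size of 1 KiB, and about 32 bytes per in-memory index entry (key plus start and end offsets).

```latex
\begin{aligned}
N_{\text{records}} &= \frac{2^{40}\,\text{B}}{2^{10}\,\text{B}} = 2^{30} \approx 1.07 \times 10^{9},\\
M_{\text{index}} &= N_{\text{records}} \times 32\,\text{B} = 2^{35}\,\text{B} = 32\,\text{GiB},\\
\frac{M_{\text{index}}}{M_{\text{data}}} &= \frac{2^{35}}{2^{40}} = \frac{1}{32} \approx 3.1\%.
\end{aligned}
```

Under these assumptions the index needs about 32 GiB of memory per TiB of data, against 1 TiB if the data itself were cached in memory; larger average records shrink the ratio proportionally.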
Specifically, the client sends the data partition server a query request carrying the data identifier of the target business data. On receiving it, the data partition server returns to the client, according to that data identifier, the device identifier of the distributed server holding the target business data. After receiving the device identifier, the client sends the data query request to the distributed server corresponding to that device identifier. The identifier of the target business data thus corresponds to the distributed server, identified by the target device identifier, on which the target business data resides. Once the client has found the target distributed server holding the target business data, it sends that server a query request for the target business data; on receiving the request, the target distributed server returns the requested target business data from its disk to the client;
S202: in the index file information stored in local memory, look up the storage location information of the target business data according to the data identifier, wherein the index file information includes the correspondence between data identifiers of business data and storage location information;
Specifically, before receiving the client's query request for the target business data, the distributed server receives from the data partition server the data partition file containing the target business data and stores it on local disk. An index module reads that data partition file from local disk, generates index file information from the storage location on local disk of the data body corresponding to each data identifier, and caches the index file information in the distributed server's local memory;
The data identifier may be the key corresponding to the target business data, obtained by offline processing with technologies such as Spark, MapReduce, or the newer Flink. When the distributed server receives the client's query request for the target business data, it uses the data identifier key carried in the request to look up, in the index file information stored in its local memory, the storage location on local disk of the data body of the target business data;
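The in-memory lookup in S202 amounts to a key-to-location map. The sketch below is illustrative only: `Location`, `IndexFileInfo`, and `lookup` are hypothetical names, the half-open [start, end) convention is an assumption, and the `file_id` field is added by this sketch so that one map can cover several partition files (the source describes index information generated per data partition file).

```python
from typing import Dict, NamedTuple, Optional


class Location(NamedTuple):
    """Where one record's data body sits on local disk (half-open range)."""
    file_id: str  # which data partition file holds the record (assumed field)
    start: int    # byte offset of the first byte of the data body
    end: int      # byte offset just past the last byte of the data body


# Hypothetical in-memory index file information: data identifier -> location.
IndexFileInfo = Dict[str, Location]


def lookup(index: IndexFileInfo, key: str) -> Optional[Location]:
    """Resolve a data identifier to its storage location, or None if absent.

    A hash-table lookup is O(1) on average and touches only memory, so no
    disk I/O is spent finding the record.
    """
    return index.get(key)


index: IndexFileInfo = {
    "user:1": Location("part-00000", 14, 19),
    "user:2": Location("part-00000", 33, 36),
}
print(lookup(index, "user:2"))  # Location(file_id='part-00000', start=33, end=36)
print(lookup(index, "user:9"))  # None
```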
S203: based on the storage location information corresponding to the target business data, read the target business data from local disk, wherein the local disk stores data partition files, each containing multiple business data records, pre-assigned by the data partition server;
After receiving the client's query request for the target business data, the data query module of the distributed server looks up, by data identifier, the storage location information of the target business data in the index file information stored in local memory, and reads the target business data from local disk based on that storage location;
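Once the storage location is known, reading the record is a single positioned read of exactly the indexed byte range. The sketch assumes the location is a half-open [start, end) byte range within a partition file; `read_record` is a hypothetical helper, not a name from the source.

```python
import os
import tempfile


def read_record(path: str, start: int, end: int) -> bytes:
    """Read the data body stored in bytes [start, end) of a partition file.

    Only the requested byte range is read; the rest of the (possibly very
    large) partition file is never loaded into memory. A long-running
    server would more likely keep the file open and use os.pread() (Unix)
    so that concurrent readers do not share a file offset.
    """
    if not 0 <= start <= end:
        raise ValueError(f"invalid range [{start}, {end})")
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read(end - start)
    if len(data) != end - start:
        raise EOFError(f"{path}: expected {end - start} bytes at offset {start}, got {len(data)}")
    return data


# Usage: a tiny stand-in for a partition file on local disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "part-00000.dat")
    with open(demo_path, "wb") as f:
        f.write(b"helloworld")
    demo = read_record(demo_path, 5, 10)
print(demo)  # b'world'
```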
S204: transmit the read target business data to the preset network card interface using zero-copy technology, so that the target business data is sent to the client through the network card interface;
Specifically, after the data query module has found, according to the client's query request, the storage location on local disk of the target business data, the target business data is transmitted by zero-copy technology from that storage location to the preset network card interface and returned to the client through it. Returning the data this way avoids the conventional path in which the data is first copied into kernel space, then into user space, then from user space back into kernel space, and finally to the network interface. This relieves the CPU of copying large volumes of data and thus effectively improves data transmission efficiency.
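The difference between the two paths can be sketched with Python's standard library. `socket.socket.sendfile()` uses `os.sendfile()` where the platform supports it, which lets the kernel move file bytes to the socket without staging them in a user-space buffer; this is one way to realise the zero-copy step, not necessarily the mechanism the applicant used. The helper names are hypothetical, and a `socketpair` stands in for the client connection.

```python
import os
import socket
import tempfile


def send_record_zero_copy(sock: socket.socket, path: str, start: int, end: int) -> int:
    """Send bytes [start, end) of the partition file at `path` over `sock`.

    socket.sendfile() delegates to os.sendfile() where available, so the
    kernel moves the bytes from the page cache to the socket without a
    user-space copy. If sendfile cannot be used, Python silently falls back
    to an ordinary read-and-send loop.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f, offset=start, count=end - start)


def send_record_with_copy(sock: socket.socket, path: str, start: int, end: int) -> int:
    """Conventional path, for comparison: read() copies the bytes from the
    kernel into a user-space buffer, and sendall() copies them back into
    the kernel's socket buffer."""
    with open(path, "rb") as f:
        f.seek(start)
        buf = f.read(end - start)
    sock.sendall(buf)
    return len(buf)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Receive exactly n bytes from a stream socket."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


# Usage: a socketpair stands in for the client connection.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "part-00000.dat")
    with open(demo_path, "wb") as f:
        f.write(b"helloworld")
    server_side, client_side = socket.socketpair()
    with server_side, client_side:
        sent = send_record_zero_copy(server_side, demo_path, 5, 10)
        received = recv_exact(client_side, sent)
        send_record_with_copy(server_side, demo_path, 0, 5)
        received_copy = recv_exact(client_side, 5)
print(sent, received, received_copy)  # 5 b'world' b'hello'
```

Both functions deliver the same bytes; the zero-copy variant simply skips the round trip through user space, which matters most for large records.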
In the embodiments of the present application, a query request from a client for target business data is received; the storage location information of the target business data is looked up by data identifier in the index file information stored in local memory; the target business data is read from local disk based on that storage location information; and the read target business data is transmitted to a preset network card interface using zero-copy technology, so that it is sent to the client through the network card interface. Business data is stored on local disk while the corresponding index information is cached in local memory. A data partition server partitions the big data in advance into multiple partition files and distributes each partition file to a corresponding distributed server for storage, and each distributed server generates index information for the partition files it stores. When a data query request later arrives, the index information in local memory is used directly to quickly locate the requested data on local disk. This avoids occupying excessive local memory, reducing data caching cost, while the data can still be read quickly from local disk, improving the response efficiency of data query requests. Sending the data to the network card interface with zero-copy technology also reduces the number of data copies and further improves the client's data query efficiency.
The data partition server must partition the big data to be processed in advance; the big data may comprise millions of records, and one or more embodiments of this specification are mainly aimed at massive data. Specifically, as shown in Fig. 3, before the distributed server receives the client's query request for the target business data in S201, the method further includes:
S101: receive the data partition file sent by the data partition server, wherein the data partition file is obtained by the data partition server partitioning the big data and is sent according to the file identifier of the data partition file and the device identifier of the distributed server;
Specifically, before receiving the client's query request for target business data, the distributed server needs to receive the data partition files sent by the data partition server. The data partition server first processes the big data with an offline technology such as Spark, MapReduce, or the newer Flink to obtain a key-value pair for each business data item contained in the big data, where the key-value pair consists of a data identifier (Key) and a data body. It then computes a hash value of each item's Key in the range [0, 1048576), generates a data file from each item's data identifier, data body, and hash value, and uses its data partitioning module to partition the generated data file into multiple data partition files. Each data partition file has its own file identifier, and the data partition server uses that identifier to send each resulting data partition file to the distributed server with the target device identifier.
S102: Store the received data partition file on the local disk, where the data partition file contains multiple business data items.
Specifically, the distributed server with the target device identifier receives the data partition file sent by the data partition server and stores it on its local disk. The data partition server transfers the file by copying: it sends the distributed server a copy instruction carrying the data partition file, and the distributed server receives the instruction and stores the data partition file carried in it on the local disk.
S103: Generate the index file information of the data partition file according to the storage location information of each business data item on the local disk.
Specifically, after storing the received data partition file on the local disk, the distributed server uses an index construction module to read the stored file and generates its index file information from the start and end positions, on the local disk, of the data body of each business data item in the file. The index file information contains, for each business data item, its data identifier and the start and end positions at which it is stored on the local disk.
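The source does not specify the on-disk layout of a data partition file. The sketch below assumes, purely for illustration, that each record is framed as a 4-byte big-endian key length, the key, a 4-byte big-endian body length, and the body; `write_records` and `build_index` are hypothetical names. It shows the single sequential scan that turns a partition file into an index mapping each data identifier to the start and end (exclusive) byte offsets of its data body.

```python
import struct
from typing import BinaryIO, Dict, Iterable, Tuple

# Hypothetical record layout (not specified in the source):
#   [4-byte big-endian key length][key][4-byte big-endian body length][body]
_LEN = struct.Struct(">I")


def write_records(f: BinaryIO, records: Iterable[Tuple[str, bytes]]) -> None:
    """Append (key, body) pairs to f using the assumed layout."""
    for key, body in records:
        k = key.encode("utf-8")
        f.write(_LEN.pack(len(k)))
        f.write(k)
        f.write(_LEN.pack(len(body)))
        f.write(body)


def build_index(f: BinaryIO) -> Dict[str, Tuple[int, int]]:
    """Scan a partition file once and map each key to the (start, end)
    byte offsets of its body; end is exclusive."""
    index: Dict[str, Tuple[int, int]] = {}
    f.seek(0)
    while True:
        header = f.read(_LEN.size)
        if not header:
            break  # clean end of file
        (klen,) = _LEN.unpack(header)
        key = f.read(klen).decode("utf-8")
        (blen,) = _LEN.unpack(f.read(_LEN.size))
        start = f.tell()
        end = start + blen
        index[key] = (start, end)
        f.seek(end)  # skip over the body without reading it
    return index
```

Only the framing headers are read during the scan; the bodies are skipped with `seek`, so building the index costs little more than one pass over the headers.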
Generating the index file information of a data partition file runs asynchronously with data querying, and the two do not interfere with each other. The old index file information is replaced by the new one only after the index file information has been built for all business data in the data partition file. Without affecting query efficiency, this ensures that a query never observes an intermediate state: it never returns a mixture of old and new data, or data that is still being imported.
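One way to obtain the "no intermediate state" property described above is to build the new index completely and then publish it with a single reference swap. The Python sketch below is an assumption about how this could be done, not the patent's implementation; `SwappableIndex` is a hypothetical name. In CPython, rebinding an attribute is a single reference store, so a reader that takes one snapshot of the reference sees either the whole old index or the whole new one; in other languages an atomic reference type plays the same role.

```python
import threading
from typing import Dict, Optional, Tuple

Index = Dict[str, Tuple[int, int]]  # data identifier -> (start, end)


class SwappableIndex:
    """Holds the live index so that readers never see a half-built one."""

    def __init__(self, initial: Optional[Index] = None) -> None:
        self._current: Index = dict(initial or {})
        self._publish_lock = threading.Lock()  # serialises writers only

    def lookup(self, key: str) -> Optional[Tuple[int, int]]:
        index = self._current  # one snapshot of the reference per lookup
        return index.get(key)

    def publish(self, new_index: Index) -> None:
        # Copy first so the builder can no longer mutate the published dict,
        # then swap the reference in one assignment.
        frozen = dict(new_index)
        with self._publish_lock:
            self._current = frozen
```

In a real deployment the old partition file would also need to stay readable until lookups that took the old snapshot have finished reading from it.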
S104: Store the index file information in local memory. Specifically, if the distributed server identifies the received data partition file as a new file, it scans the entire file to obtain the start and end positions on the local disk of the data body of each business data item it contains, and then generates the index file information of the file from each item's data identifier, start position, and end position. Because reads from local memory are fast, the generated index file information is kept in local memory. In addition, so that no time is wasted rebuilding the index over the data bodies on the local disk after a restart, the index file information generated by the index construction module is also saved on the local disk. When a data partition file is recognized as already indexed, its index file stored on the local disk is loaded directly into local memory, which further speeds up building the index file information.
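The persistence path in S104 can be sketched as follows, assuming the index is a dict from data identifier to `(start, end)` and is saved as JSON next to the partition file. The JSON format and the function names are hypothetical. `save_index` writes to a temporary file and renames it into place so a crash cannot leave a truncated index behind; `ensure_index` returns the in-memory index when present, loads the on-disk copy for an already-indexed file, and only falls back to a full scan (the caller-supplied `rebuild`) for a new file.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Optional, Tuple

Index = Dict[str, Tuple[int, int]]  # data identifier -> (start, end)


def save_index(index: Index, path: str) -> None:
    """Persist the index atomically: write a temp file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({k: list(v) for k, v in index.items()}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise


def load_index(path: str) -> Index:
    with open(path, encoding="utf-8") as f:
        return {k: (v[0], v[1]) for k, v in json.load(f).items()}


def ensure_index(memory: Optional[Index], path: str,
                 rebuild: Callable[[], Index]) -> Index:
    """Prefer the in-memory index, then the on-disk copy, then a full scan."""
    if memory is not None:
        return memory
    if os.path.exists(path):
        return load_index(path)  # already indexed, e.g. after a restart
    index = rebuild()  # new partition file: scan it once
    save_index(index, path)
    return index
```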
Further, the data partition files are obtained as follows:
Step 1: The data partition server preprocesses the multiple business data items to be partitioned to obtain a key-value pair for each item, where the key-value pair includes a data identifier and a data body.
Specifically, preprocessing the business data items to be partitioned includes processing them with an offline technology such as Spark, MapReduce, or the newer Flink to obtain the key-value pair of each item, where the key-value pair includes the data identifier (Key) and the data body.
Step 2: For each business data item to be partitioned, hash its data identifier to obtain the hash value of the item.
Specifically, the data identifier of each business data item to be partitioned is hashed to a value in the range [0, 1048576). The purpose of computing this hash value is to determine the partition into which the item falls; the exact value range is not strictly limited.
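The text fixes the output range, [0, 1048576) = [0, 2^20), but not the hash function. The sketch below assumes MD5 from Python's `hashlib` purely because it is stable across processes and machines; Python's built-in `hash()` on strings is randomized per process and would send the same key to different partitions on different runs. `key_hash` and `HASH_SPACE` are hypothetical names.

```python
import hashlib

# Size of the hash range [0, 1048576) named in the text, i.e. 2**20.
HASH_SPACE = 1 << 20


def key_hash(key: str) -> int:
    """Map a data identifier (Key) to a stable integer in [0, HASH_SPACE).

    MD5 is an assumption: the text does not name a hash function. It is
    used here only for its stability across processes, not for security.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % HASH_SPACE
```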
Step 3: Partition the business data items to be partitioned according to their hash values to obtain multiple data partition files.
Specifically, a data file of the business data to be partitioned is generated from each item's data identifier (Key), data body (Value), and hash value. The hash value of each item is taken modulo the preset number of partitions, and the data partitioning module places items with the same remainder in the same partition, yielding the preset number of data partition files. Alternatively, a consistent hashing algorithm may be used to obtain the preset number of data partition files.
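The modulo step can be sketched as below. It assumes each record is already a `(Key, Value, hash value)` triple as described above; `partition_by_modulo` is a hypothetical name, and the consistent-hashing alternative mentioned in the text is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, bytes, int]  # (Key, Value/data body, hash value)


def partition_by_modulo(records: Iterable[Record],
                        num_partitions: int) -> Dict[int, List[Record]]:
    """Group records whose hash values leave the same remainder modulo
    num_partitions; remainders with no records are simply absent."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    partitions: Dict[int, List[Record]] = defaultdict(list)
    for record in records:
        _key, _body, h = record
        partitions[h % num_partitions].append(record)
    return dict(partitions)
```

Plain modulo placement is simple, but changing the number of partitions remaps most keys; that is the usual reason to prefer consistent hashing when servers are added or removed.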
To avoid interfering with the distributed servers' data queries, and to avoid the data format conversion that reading a data partition file would otherwise require, after the business data items are partitioned according to their hash values in Step 3 to obtain multiple data partition files, the method further includes:
The data partition server sends a copy instruction to the corresponding distributed server using a file copy facility provided by the operating system, where the copy instruction carries the data partition file.
Specifically, the data partition server uses a file copy facility provided by the operating system, such as rsync, cp, or scp, to send the copy instruction carrying the data partition file to the corresponding distributed server. Because no caching service takes part in the copy, the copy speed can be adjusted freely. Compared with going through a cache, storing the data partition file directly on the distributed server's local disk by copying ensures that query requests are not affected, makes it easy to control the speed at which data partition files are imported to the local disk, skips the data format conversion needed to read the file, and further improves the efficiency of importing data partition files to the local disk.
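The text relies on operating-system tools such as `rsync`, `cp`, or `scp` and notes that the copy speed can be adjusted freely (with `rsync`, for example, through its `--bwlimit` option). The sketch below shows the same idea for a local copy in plain Python, under the assumption that the destination is a path on the distributed server's file system: the copy is rate-limited and written under a temporary `.part` name, then renamed into place so that a half-copied partition file is never visible. `throttled_copy` is a hypothetical name.

```python
import os
import time


def throttled_copy(src: str, dst: str, max_bytes_per_sec: int,
                   chunk_size: int = 1 << 20) -> int:
    """Copy src to dst at roughly max_bytes_per_sec or less.

    Data is written to dst + ".part" and renamed into place at the end.
    Returns the number of bytes copied.
    """
    if max_bytes_per_sec <= 0:
        raise ValueError("max_bytes_per_sec must be positive")
    tmp = dst + ".part"
    copied = 0
    started = time.monotonic()
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            # Sleep just long enough to keep the average rate under the cap.
            target_elapsed = copied / max_bytes_per_sec
            elapsed = time.monotonic() - started
            if target_elapsed > elapsed:
                time.sleep(target_elapsed - elapsed)
        fout.flush()
        os.fsync(fout.fileno())
    os.replace(tmp, dst)  # atomic rename: readers see all or nothing
    return copied
```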
Step 3 above, partitioning the business data items according to their hash values to obtain multiple data partition files, includes:
Taking the hash value of each business data item to be partitioned modulo the preset number of partitions to obtain the item's remainder, where the preset number of partitions equals the number of distributed servers; and
Placing the business data items that share a remainder into one data partition file, where the file identifier of each data partition file corresponds one-to-one with the remainder of the business data it contains, and each remainder has a preset correspondence with the device identifier of a distributed server.
Fig. 4 illustrates how data partition files are generated in the data query method provided by the embodiments of the present application. As shown in Fig. 4, assume there are seven business data items to be partitioned and the preset number of partitions is three. An offline technology such as Spark, MapReduce, or Flink produces the key-value pair of each item, consisting of the data identifier Key and the data body Value. The Key of each item is hashed, and a data file of the business data to be partitioned is generated from each item's Key, Value, and hash value.
Hashing the Keys into the range [0, 1048576) gives hash values 0, 1, 2, 3, 4, 5, and 6 for Key0 through Key6, respectively. Dividing each of the seven hash values by the preset number of partitions, 3, gives the remainder of each item, and the data partitioning module places items with the same remainder in the same partition, producing three data partition files with file identifiers a, b, and c. The file identifier of each data partition file corresponds one-to-one with the remainder of the business data it contains, and each remainder (equivalently, each data partition file identifier) has a preset correspondence with one of the device identifiers P0, P1, and P2 of the distributed servers.
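The Fig. 4 example can be reproduced in a few lines. The mapping tables below encode the preset correspondences from the example (remainder 0 to file a to P0, and so on); the names `route`, `FILE_ID_BY_REMAINDER`, and `DEVICE_ID_BY_FILE` are hypothetical.

```python
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3
# Preset correspondences from the Fig. 4 example: remainder -> file -> device.
FILE_ID_BY_REMAINDER: Dict[int, str] = {0: "a", 1: "b", 2: "c"}
DEVICE_ID_BY_FILE: Dict[str, str] = {"a": "P0", "b": "P1", "c": "P2"}


def route(hash_value: int) -> Tuple[str, str]:
    """Return (file identifier, device identifier) for a hash value."""
    file_id = FILE_ID_BY_REMAINDER[hash_value % NUM_PARTITIONS]
    return file_id, DEVICE_ID_BY_FILE[file_id]


# The seven example keys Key0..Key6 hash to 0..6 respectively.
example_hashes = {f"Key{i}": i for i in range(7)}
layout: Dict[str, List[str]] = {"a": [], "b": [], "c": []}
for key, h in example_hashes.items():
    file_id, _device = route(h)
    layout[file_id].append(key)
# layout == {"a": ["Key0", "Key3", "Key6"], "b": ["Key1", "Key4"], "c": ["Key2", "Key5"]}
```

`route(3)` returns `("a", "P0")`, which is consistent with the Fig. 6 example, where the query for K3 is sent to the server P0.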
The business data includes a data identifier and a data body, and the data partition file contains the data bodies of the business data. Generating, in S103, the index file information of the data partition file according to the storage location information of each business data item on the local disk includes:
Step 1: For each data body in the data partition file, determine the correspondence between the storage location information of the data body on the local disk and the data identifier of that data body.
Step 2: Generate the index file information of the data partition file from the correspondences of all business data items.
Specifically, after receiving a data partition file from the data partition server, the distributed server stores it on the local disk. The index construction module of the distributed server, or any other module capable of reading data body positions on disk, reads the data body of each business data item in the stored file, determines the start and end positions at which the item is stored on the local disk, and records the correspondence between those positions and the data body's data identifier. The index file information of the data partition file is then generated from the data identifier of each business data item in the file and its storage location on the local disk.
Fig. 5 illustrates how index file information is generated in the data query method provided by the embodiments of the present application. As shown in Fig. 5, data partition file a contains the identifiers of three business data items, K0, K1, and K2, together with their locations on the local disk: the data body for K0 starts at 0 and ends at 2, the data body for K1 starts at 2 and ends at 4, and the data body for K2 starts at 4 and ends at 6. The index file information of data partition file a is generated from these data identifiers and locations. Similarly, data partition files b and c each contain the data identifiers of two business data items and their locations on the local disk, from which the index file information of files b and c is generated.
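If the data bodies in a partition file are stored back to back with no framing, as the offsets in Fig. 5 suggest (0-2, 2-4, 4-6), each body's start and end positions are just running sums of the body sizes. The sketch below assumes that layout; `index_from_body_sizes` is a hypothetical name, and it needs Python 3.8+ for the `initial` argument of `itertools.accumulate`.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def index_from_body_sizes(keys: List[str], sizes: List[int],
                          base: int = 0) -> Dict[str, Tuple[int, int]]:
    """Assign each data body a [start, end) range, assuming the bodies are
    stored back to back starting at byte offset `base`."""
    if len(keys) != len(sizes):
        raise ValueError("keys and sizes must have the same length")
    # Boundaries: base, base + s0, base + s0 + s1, ...
    bounds = list(accumulate(sizes, initial=base))
    return {k: (bounds[i], bounds[i + 1]) for i, k in enumerate(keys)}


# Fig. 5, file a: three 2-byte bodies.
fig5_index = index_from_body_sizes(["K0", "K1", "K2"], [2, 2, 2])
# fig5_index == {"K0": (0, 2), "K1": (2, 4), "K2": (4, 6)}
```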
After the distributed server has generated the index file information of the data partition file from the storage location of each business data item on the local disk, it receives query requests for target business data from clients. Fig. 6 is a schematic diagram of a concrete implementation of the data query method provided by the embodiments of the present application. As shown in Fig. 6, assume the client sends a query for the target business data K3 to the distributed server with device identifier P0. On receiving the request, the distributed server takes the data identifier K3 carried in the request and looks it up in the index file information held in local memory, finding that the corresponding data body starts at 2 and ends at 4 on the local disk. The server reads the data body stored at that location and uses zero-copy technology to transfer the target business data for K3 to the preset network card interface, which sends it to the client.
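The lookup-then-send path in Fig. 6 can be sketched with Python's `socket.sendfile`, which uses `os.sendfile` where the platform supports it, so on Linux the bytes go from the page cache to the socket without passing through a user-space buffer; elsewhere it falls back to an ordinary read-and-send loop. The function name, the in-memory index shape, and the absence of any response framing are assumptions for illustration; a real protocol would also need to tell the client how many bytes to expect.

```python
import socket
from typing import BinaryIO, Dict, Tuple


def serve_query(conn: socket.socket, data_file: BinaryIO,
                index: Dict[str, Tuple[int, int]], key: str) -> int:
    """Look up key in the in-memory index and stream the [start, end)
    byte range of the partition file straight to the connected socket.

    data_file must be opened in binary mode and conn must be a blocking
    SOCK_STREAM socket. Returns the number of bytes sent (0 if unknown).
    """
    location = index.get(key)
    if location is None:
        return 0
    start, end = location
    if end <= start:
        return 0  # sendfile() rejects a non-positive count
    return conn.sendfile(data_file, offset=start, count=end - start)
```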
For a data partition file whose index has already been built: once the index construction module has generated the index file information and saved it on the distributed server, the index file of the data partition file is also stored on the local disk, so that the distributed server does not need to rebuild the index when it restarts.
Further, if the index file information is also stored on the local disk,
then before S202, in which the storage location information of the target business data is looked up by data identifier in the index file information stored in local memory, the method further includes:
Step 1: Determine whether the index file information exists in local memory.
Step 2: If it does not, load the index file information stored on the local disk into local memory.
Specifically, when the distributed server receives a query request for target business data from a client, or when it restarts, it checks whether the index file information exists in local memory and, if not, loads the index file information stored on the local disk into local memory. Loading the stored index file information from the local disk in this way further speeds up index construction.
The data query method in the embodiments of the present application receives a client's query request for target business data; looks up the storage location information of the target business data by its data identifier in the index file information stored in local memory; reads the target business data from the local disk based on that storage location information; and transfers the data read to the preset network card interface using zero-copy technology, which sends it to the client. Business data is stored on the local disk, and the corresponding index information is cached in local memory. The data partition server partitions the big data in advance into multiple partition files and distributes each one to the corresponding distributed server for storage, and each distributed server generates index information for the partition files it stores. When a query request later arrives, the location of the requested data on the local disk is found directly from the index information in local memory. This avoids occupying excessive local memory and reduces data caching cost while still allowing the data to be read quickly from the local disk, improving the response efficiency for query requests; sending the data to the network card interface with zero-copy technology also reduces the number of data copies, further improving query efficiency for the client.
Corresponding to the data query method described with reference to Figs. 1 to 6, and based on the same technical concept, an embodiment of the present application further provides a data query system. Fig. 7 is a schematic diagram of a first structure of the data query system provided by the embodiments of the present application. The system is used to execute the data query method described in Figs. 1 to 6 and, as shown in Fig. 7, includes a client 20 and multiple distributed servers 30.
The client 20 is configured to send a query request for target business data to a distributed server 30 and to receive the target business data returned by that distributed server 30. The client 20 sends the query request according to a target device identifier returned by the data partition server; the target device identifier identifies the distributed server 30 on which the target business data resides, chosen by the data partition server from the multiple distributed servers 30 according to the data identifier.
The distributed server 30 is configured to receive the query request and to look up the storage location information of the target business data by the data identifier in the index file information stored in local memory, where the index file information includes the correspondence between the data identifiers of business data and their storage location information;
to read the target business data from the local disk based on its storage location information, where the local disk stores data partition files, pre-assigned by the data partition server, each containing multiple business data items; and
to transfer the target business data that has been read to the preset network card interface using zero-copy technology, which sends the target business data to the client 20.
The data query system in the embodiments of the present application stores business data on the local disk and caches the corresponding index information in local memory. The data partition server partitions the big data in advance into multiple partition files and distributes each one to the corresponding distributed server for storage, and each distributed server generates index information for the partition files it stores. When a query request later arrives, the location of the requested data on the local disk is found directly from the index information in local memory. This avoids occupying excessive local memory and reduces data caching cost while still allowing the data to be read quickly from the local disk, improving the response efficiency for query requests; sending the data to the network card interface with zero-copy technology also reduces the number of data copies, further improving query efficiency for the client.
Optionally, the distributed server 30 is further configured to:
receive the data partition file sent by the data partition server, where the data partition file is obtained by the data partition server partitioning the big data and is sent according to the file identifier of the data partition file and the device identifier of the distributed server 30;
store the received data partition file on the local disk, where the data partition file contains multiple business data items;
generate the index file information of the data partition file according to the storage location information of each business data item on the local disk; and
store the index file information in local memory.
Optionally, as shown in Fig. 8, the system further includes a data partition server 40, which is configured to:
preprocess the multiple business data items to be partitioned to obtain a key-value pair for each item, where the key-value pair includes a data identifier and a data body;
for each business data item to be partitioned, hash its data identifier to obtain the item's hash value; and
partition the business data items to be partitioned according to their hash values to obtain multiple data partition files.
Optionally, the data partition server 40 is further configured to:
send a copy instruction carrying the data partition file to the corresponding distributed server using a file copy facility provided by the operating system.
Optionally, the data partition server 40 is specifically configured to:
take the hash value of each business data item to be partitioned modulo the preset number of partitions to obtain the item's remainder, where the preset number of partitions equals the number of distributed servers 30; and
place the business data items that share a remainder into one data partition file, where the file identifier of each data partition file corresponds one-to-one with the remainder of the business data it contains, and each remainder has a preset correspondence with the device identifier of a distributed server 30.
Optionally, the business data includes a data identifier and a data body, and the data partition file contains the data bodies of the business data;
the distributed server 30 is further specifically configured to:
for each data body in the data partition file, determine the correspondence between the storage location information of the data body on the local disk and the data identifier of that data body; and
generate the index file information of the data partition file from the correspondences of all business data items.
Optionally, if the index file information is also stored on the local disk,
the distributed server 30 is further configured to:
determine whether the index file information exists in local memory; and
if it does not, load the index file information stored on the local disk into local memory.
The data query system in the embodiments of the present application stores business data on the local disk and caches the corresponding index information in local memory. The data partition server partitions the big data in advance into multiple partition files and distributes each one to the corresponding distributed server for storage, and each distributed server generates index information for the partition files it stores. When a query request later arrives, the location of the requested data on the local disk is found directly from the index information in local memory. This avoids occupying excessive local memory and reduces data caching cost while still allowing the data to be read quickly from the local disk, improving the response efficiency for query requests; sending the data to the network card interface with zero-copy technology also reduces the number of data copies, further improving query efficiency for the client 20.
It should be noted that the data query system and the data query method provided by the embodiments of the present application are based on the same inventive concept; for the specific implementation of this embodiment, refer to the implementation of the foregoing data query method. Repeated details are not described again.
Further, corresponding to the method shown in Figs. 1 to 6 and based on the same technical concept, an embodiment of the present application further provides a data query device for executing the data query method described above. Fig. 9 is a schematic structural diagram of the data query device provided by the embodiments of the present application.
As shown in Fig. 9, data query devices may differ considerably depending on configuration or performance. A data query device may include one or more processors 901 and a memory 902, which may store one or more application programs or data. The memory 902 may provide transient or persistent storage. An application program stored in the memory 902 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the data query device. Further, the processor 901 may be configured to communicate with the memory 902 and to execute, on the data query device, the series of computer-executable instructions in the memory 902. The data query device may further include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input/output interfaces 905, one or more keyboards 906, and so on.
In a specific embodiment, the data query device includes a memory and one or more programs. The one or more programs are stored in the memory and may include one or more modules, each of which may include a series of computer-executable instructions for the data query device. The one or more processors are configured to execute the one or more programs, which include computer-executable instructions for:
receiving, by the distributed server, a query request from a client for target business data, where the query request carries the data identifier of the target business data and is sent by the client according to a target device identifier returned by the data partition server, the target device identifier identifying the distributed server on which the target business data resides, chosen by the data partition server from multiple distributed servers according to the data identifier;
looking up the storage location information of the target business data by the data identifier in the index file information stored in local memory, where the index file information includes the correspondence between the data identifiers of business data and their storage location information;
reading the target business data from the local disk based on its storage location information, where the local disk stores data partition files, pre-assigned by the data partition server, each containing multiple business data items; and
transferring the target business data that has been read to the preset network card interface using zero-copy technology, which sends the target business data to the client.
Optionally, the computer-executable instructions, when executed, further perform the following before the query request from the client for the target business data is received:
receiving the data partition file sent by the data partition server, where the data partition file is obtained by the data partition server partitioning the big data and is sent according to the file identifier of the data partition file and the device identifier of the distributed server;
storing the received data partition file on the local disk, where the data partition file contains multiple business data items;
generating the index file information of the data partition file according to the storage location information of each business data item on the local disk; and
storing the index file information in local memory.
Optionally, when the computer-executable instructions are executed, the data partition files are obtained as follows:
the data partition server preprocesses the multiple business data items to be partitioned to obtain a key-value pair for each item, where the key-value pair includes a data identifier and a data body;
for each business data item to be partitioned, its data identifier is hashed to obtain the item's hash value; and
the business data items to be partitioned are partitioned according to their hash values to obtain multiple data partition files.
Optionally, computer executable instructions also include for carrying out following computer executable instructions when executed: In the cryptographic Hash according to each business datum to subregion, the multiple business datum to subregion is carried out at subregion Reason, after obtaining multiple data partitioned files, further includes:
The data partitioned server utilizes the file copy function of operating system offer to corresponding distributed server Send copy instruction, wherein the copy instruction carries the data partitioned file.
Optionally, when executed, the computer-executable instructions further perform the following: partitioning the multiple business data items to be partitioned according to the hash value of each item to obtain multiple data partition files includes:
Dividing the hash value of each business data item to be partitioned by a preset number of partitions to obtain the remainder corresponding to that item, wherein the preset number of partitions equals the number of distributed servers;
Placing all business data items that share the same remainder (at least one item) into one data partition file, wherein the file identifier of each data partition file corresponds one-to-one with the remainder of the business data it contains, and each remainder has a preset correspondence with the device identifier of a distributed server.
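The remainder-based mapping can be sketched as a small lookup class. The sketch makes three illustrative assumptions. The stable hash is the first 8 bytes of MD5. The preset correspondence between remainders and servers is list position, so remainder r belongs to `device_ids[r]`. File identifiers follow a hypothetical `part-{r}` scheme. `target_device` plays the role of the device identifier that the data partition server returns to a client.

```python
import hashlib
from typing import Dict, List


def stable_hash(data_id: str) -> int:
    # First 8 bytes of MD5 as a process-independent hash (an assumption).
    return int.from_bytes(hashlib.md5(data_id.encode("utf-8")).digest()[:8], "big")


class PartitionMap:
    """Preset correspondence: remainder -> file identifier -> device identifier.

    The number of partitions equals the number of distributed servers, so each
    remainder 0..N-1 owns exactly one partition file and one server.
    """

    def __init__(self, device_ids: List[str]):
        self.device_ids = list(device_ids)
        self.num_partitions = len(self.device_ids)
        # Hypothetical file-identifier naming scheme.
        self.file_ids: Dict[int, str] = {r: f"part-{r}" for r in range(self.num_partitions)}

    def remainder(self, data_id: str) -> int:
        return stable_hash(data_id) % self.num_partitions

    def file_id(self, data_id: str) -> str:
        return self.file_ids[self.remainder(data_id)]

    def target_device(self, data_id: str) -> str:
        """Device identifier the data partition server would return to a client."""
        return self.device_ids[self.remainder(data_id)]
```

Because the partition count equals the server count, changing the number of servers changes the remainder of most identifiers. This sketch does not handle rebalancing.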
Optionally, when executed, the computer-executable instructions further perform the following: the business data includes a data identifier and a data body, and the data partition file includes the data bodies of the business data;
Generating the index file information of the data partition file according to the storage location information of each business data item on the local disk includes:
For each data body in the data partition file, determining the correspondence between the storage location information of that data body on the local disk and the data identifier of that data body;
Generating the index file information of the data partition file from the correspondences of all business data items.
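Index generation for an already received partition file can be sketched by walking the file record by record. The application specifies neither the record layout nor the index format, so both are assumptions here. Each record is a 4-byte big-endian identifier length, the UTF-8 identifier, a 4-byte body length, then the body. The index is saved as JSON. `write_record`, `build_index`, and `save_index` are hypothetical names.

```python
import json
import struct
from typing import BinaryIO, Dict, Tuple

# Hypothetical record layout inside a partition file:
#   uint32 id_len | id bytes (UTF-8) | uint32 body_len | body bytes
HEADER = struct.Struct(">I")


def write_record(f: BinaryIO, data_id: str, body: bytes) -> None:
    raw_id = data_id.encode("utf-8")
    f.write(HEADER.pack(len(raw_id)) + raw_id + HEADER.pack(len(body)) + body)


def build_index(partition_path: str) -> Dict[str, Tuple[int, int]]:
    """Map each data identifier to the (offset, length) of its data body."""
    index: Dict[str, Tuple[int, int]] = {}
    with open(partition_path, "rb") as f:
        while True:
            head = f.read(HEADER.size)
            if not head:
                break  # clean end of file
            (id_len,) = HEADER.unpack(head)
            data_id = f.read(id_len).decode("utf-8")
            (body_len,) = HEADER.unpack(f.read(HEADER.size))
            index[data_id] = (f.tell(), body_len)  # body starts right here
            f.seek(body_len, 1)  # skip the body; only its location is needed
    return index


def save_index(index: Dict[str, Tuple[int, int]], index_path: str) -> None:
    """Persist the index file information (JSON chosen for illustration)."""
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f)
```

With the records `("a", b"xyz")` and `("bb", b"hello")`, `build_index` returns `{"a": (9, 3), "bb": (22, 5)}`. Each offset skips the two length headers and the identifier and points directly at the data body.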
Optionally, when executed, the computer-executable instructions further perform the following: if the index file information is also stored on the local disk,
then before the storage location information of the target business data is queried according to the data identifier in the index file information stored in local memory, the method further includes:
Determining whether index file information exists in local memory;
If it does not, loading the index file information stored on the local disk into local memory.
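The check-then-load step can be sketched as a lazily populated cache. The sketch assumes the on-disk index file information is a JSON object of the form `{data_id: [offset, length]}`. `IndexCache` and its methods are hypothetical names.

```python
import json
from typing import Dict, Optional, Tuple

Index = Dict[str, Tuple[int, int]]


class IndexCache:
    """Keeps index file information in memory, falling back to the on-disk copy."""

    def __init__(self, index_path: str):
        self.index_path = index_path  # hypothetical JSON index file on the local disk
        self._index: Optional[Index] = None

    def get(self) -> Index:
        # Step 1: is the index file information already in local memory?
        if self._index is None:
            # Step 2: if not, load it from the local disk into local memory.
            with open(self.index_path, "r", encoding="utf-8") as f:
                raw = json.load(f)
            self._index = {k: (v[0], v[1]) for k, v in raw.items()}
        return self._index

    def lookup(self, data_id: str) -> Optional[Tuple[int, int]]:
        """Return (offset, length) for data_id, or None if it is not indexed."""
        return self.get().get(data_id)
```

After the first lookup, the index stays in memory, so later queries never read the on-disk index file again.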
With the data query device in the embodiments of the present application, a distributed server receives a query request from a client for target business data. The query request carries the data identifier of the target business data and is sent by the client according to a target device identifier returned by the data partition server. The target device identifier identifies the distributed server on which the target business data resides; the data partition server selects it from the multiple distributed servers according to the data identifier. In the index file information stored in local memory, the distributed server looks up the storage location information of the target business data according to the data identifier, where the index file information contains the correspondence between the data identifiers of business data and their storage location information. Based on that storage location information, the server reads the target business data from the local disk, which stores the data partition file containing multiple business data items that the data partition server assigned in advance. Finally, the server uses zero-copy technology to transfer the read target business data to a preset network card interface, through which the target business data is sent to the client.
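The query path just described (index lookup, disk read, zero-copy send) can be sketched in Python under stated assumptions. The index maps a data identifier to an (offset, length) pair, and the client connection is an already accepted socket. Unix `os.sendfile` stands in for the unspecified zero-copy technology: it lets the kernel move file bytes to the socket without passing them through a user-space buffer. `serve_query` is a hypothetical name.

```python
import os
import socket
from typing import Dict, Tuple


def serve_query(data_id: str,
                index: Dict[str, Tuple[int, int]],
                partition_path: str,
                conn: socket.socket) -> int:
    """Send the data body of `data_id` from the partition file to `conn`.

    Raises KeyError for an identifier that is not in the in-memory index.
    Returns the number of bytes sent.
    """
    offset, length = index[data_id]  # storage location from local memory
    sent = 0
    with open(partition_path, "rb") as f:
        while sent < length:
            # os.sendfile(out_fd, in_fd, offset, count): kernel-side transfer.
            n = os.sendfile(conn.fileno(), f.fileno(), offset + sent, length - sent)
            if n == 0:
                break  # file shorter than the index claims
            sent += n
    return sent
```

`os.sendfile` is available only on Unix-like systems. The loop handles partial transfers, which `sendfile` may return for large bodies.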
It can thus be seen that the data query device in the embodiments of the present application stores business data on the local disk and caches the corresponding index information in local memory. The data partition server first partitions the big data into multiple partition files and distributes each partition file to the corresponding distributed server for storage, and each distributed server generates index information for the partition file it stores. When a data query request later arrives, the index information in local memory is used directly to locate the requested data on the local disk. This avoids occupying excessive local memory and lowers data caching cost, while still allowing the data to be read quickly from the local disk, which improves the response efficiency of data query requests. Using zero-copy technology to send the data to the network card interface also reduces the number of data copies, which further improves the data query efficiency for the client.
Preferably, the embodiments of the present application further provide a data query device, including a processor 901, a memory 902, and a computer program stored in the memory 902 and executable on the processor 901. When executed by the processor 901, the computer program implements each process of the above data query method embodiment and achieves the same technical effects; to avoid repetition, the details are not repeated here.
Further, corresponding to the methods shown in Fig. 1 to Fig. 6 and based on the same technical idea, the embodiments of the present application also provide a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements each process of the above data query method embodiment and achieves the same technical effects; to avoid repetition, the details are not repeated here. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described progressively. Identical or similar parts of the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is substantially similar to the method embodiment and is therefore described briefly; for relevant details, refer to the description of the method embodiment.
The above is merely an embodiment of the present application and is not intended to limit it. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (14)

1. A data query method, characterized in that the method comprises:
Receiving, by a distributed server, a query request from a client for target business data, wherein the query request carries the data identifier of the target business data, the query request is sent by the client according to a target device identifier returned by a data partition server, and the target device identifier is the identification information of the distributed server where the target business data is located, selected by the data partition server from multiple distributed servers according to the data identifier;
Querying, in index file information stored in local memory, the storage location information of the target business data according to the data identifier, wherein the index file information includes the correspondence between the data identifiers of business data and storage location information;
Reading the target business data from the local disk based on the storage location information corresponding to the target business data, wherein the local disk stores a data partition file that contains multiple business data items and was assigned in advance by the data partition server;
Transferring the read target business data to a preset network card interface using zero-copy technology, so that the target business data is sent to the client through the network card interface.
2. The method according to claim 1, characterized in that, before the query request from the client for the target business data is received, the method further comprises:
Receiving a data partition file sent by the data partition server, wherein the data partition file is obtained by the data partition server by partitioning big data and is sent according to the file identifier of the data partition file and the device identifier of the distributed server;
Storing the received data partition file on the local disk, wherein the data partition file includes multiple business data items;
Generating index file information for the data partition file according to the storage location information of each business data item on the local disk;
Storing the index file information in local memory.
3. The method according to claim 2, characterized in that the data partition file is obtained as follows:
The data partition server applies preset processing to multiple business data items to be partitioned to obtain a key-value pair for each of them, wherein the key-value pair includes a data identifier and a data body;
For each business data item to be partitioned, the data identifier of that item is hashed to obtain its hash value;
The multiple business data items to be partitioned are partitioned according to their hash values, yielding multiple data partition files.
4. The method according to claim 3, characterized in that, after the multiple business data items to be partitioned are partitioned according to their hash values to obtain multiple data partition files, the method further comprises:
The data partition server uses a file copy function provided by the operating system to send a copy instruction to the corresponding distributed server, wherein the copy instruction carries the data partition file.
5. The method according to claim 3, characterized in that partitioning the multiple business data items to be partitioned according to the hash value of each item to obtain multiple data partition files comprises:
Dividing the hash value of each business data item to be partitioned by a preset number of partitions to obtain the remainder corresponding to that item, wherein the preset number of partitions equals the number of distributed servers;
Placing all business data items that share the same remainder (at least one item) into one data partition file, wherein the file identifier of each data partition file corresponds one-to-one with the remainder of the business data it contains, and each remainder has a preset correspondence with the device identifier of a distributed server.
6. The method according to claim 2, characterized in that the business data includes a data identifier and a data body, and the data partition file includes the data bodies of the business data;
Generating the index file information of the data partition file according to the storage location information of each business data item on the local disk comprises:
For each data body in the data partition file, determining the correspondence between the storage location information of that data body on the local disk and the data identifier of that data body;
Generating the index file information of the data partition file from the correspondences of all business data items.
7. The method according to claim 2, characterized in that, if the index file information is also stored on the local disk,
then before the storage location information of the target business data is queried according to the data identifier in the index file information stored in local memory, the method further comprises:
Determining whether index file information exists in local memory;
If it does not, loading the index file information stored on the local disk into local memory.
8. A data query system, characterized in that the system comprises a client and multiple distributed servers;
The client is configured to send a query request for target business data to a distributed server and to receive the target business data returned by the distributed server, wherein the query request is sent by the client according to a target device identifier returned by a data partition server, and the target device identifier is the identification information of the distributed server where the target business data is located, selected by the data partition server from the multiple distributed servers according to the data identifier;
The distributed server is configured to receive the query request; to query, in index file information stored in local memory, the storage location information of the target business data according to the data identifier, wherein the index file information includes the correspondence between the data identifiers of business data and storage location information; and
to read the target business data from the local disk based on the storage location information corresponding to the target business data, wherein the local disk stores a data partition file that contains multiple business data items and was assigned in advance by the data partition server; and
to transfer the read target business data to a preset network card interface using zero-copy technology, so that the target business data is sent to the client through the network card interface.
9. The system according to claim 8, characterized in that the distributed server is further configured to:
Receive a data partition file sent by the data partition server, wherein the data partition file is obtained by the data partition server by partitioning big data and is sent according to the file identifier of the data partition file and the device identifier of the distributed server;
Store the received data partition file on the local disk, wherein the data partition file includes multiple business data items;
Generate index file information for the data partition file according to the storage location information of each business data item on the local disk;
Store the index file information in local memory.
10. The system according to claim 9, characterized in that the system further comprises a data partition server, wherein the data partition server is configured to:
Apply preset processing to multiple business data items to be partitioned to obtain a key-value pair for each of them, wherein the key-value pair includes a data identifier and a data body;
For each business data item to be partitioned, hash the data identifier of that item to obtain its hash value;
Partition the multiple business data items to be partitioned according to their hash values to obtain multiple data partition files.
11. The system according to claim 10, characterized in that the data partition server is further configured to:
Send a copy instruction to the corresponding distributed server using a file copy function provided by the operating system, wherein the copy instruction carries the data partition file.
12. The system according to claim 10, characterized in that the data partition server is specifically configured to:
Divide the hash value of each business data item to be partitioned by a preset number of partitions to obtain the remainder corresponding to that item, wherein the preset number of partitions equals the number of distributed servers;
Place all business data items that share the same remainder (at least one item) into one data partition file, wherein the file identifier of each data partition file corresponds one-to-one with the remainder of the business data it contains, and each remainder has a preset correspondence with the device identifier of a distributed server.
13. The system according to claim 9, characterized in that the business data includes a data identifier and a data body, and the data partition file includes the data bodies of the business data;
The distributed server is further specifically configured to:
For each data body in the data partition file, determine the correspondence between the storage location information of that data body on the local disk and the data identifier of that data body;
Generate the index file information of the data partition file from the correspondences of all business data items.
14. The system according to claim 9, characterized in that, if the index file information is also stored on the local disk, the distributed server is further configured to:
Determine whether index file information exists in local memory;
If it does not, load the index file information stored on the local disk into local memory.
CN201910521866.XA 2019-06-17 2019-06-17 A kind of data query method and system Pending CN110263061A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910521866.XA CN110263061A (en) 2019-06-17 2019-06-17 A kind of data query method and system

Publications (1)

Publication Number Publication Date
CN110263061A (en) 2019-09-20

Family

ID=67918675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910521866.XA Pending CN110263061A (en) 2019-06-17 2019-06-17 A kind of data query method and system

Country Status (1)

Country Link
CN (1) CN110263061A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1547714A (en) * 2001-08-03 2004-11-17 易斯龙系统公司 Systems and methods providing metadata for tracking of information on a distributed file system of storage devices
CN102968498A (en) * 2012-12-05 2013-03-13 华为技术有限公司 Method and device for processing data
CN103761275A (en) * 2014-01-09 2014-04-30 浪潮电子信息产业股份有限公司 Management method for metadata in distributed file system
CN104346458A (en) * 2014-10-31 2015-02-11 易准科技发展(上海)有限公司 Data storage method and device
US20160070754A1 (en) * 2014-09-10 2016-03-10 Umm Al-Qura University System and method for microblogs data management
CN108255958A (en) * 2017-12-21 2018-07-06 百度在线网络技术(北京)有限公司 Data query method, apparatus and storage medium
CN109189995A (en) * 2018-07-16 2019-01-11 哈尔滨理工大学 Data disappear superfluous method in cloud storage based on MPI
CN109299157A (en) * 2018-08-27 2019-02-01 杭州安恒信息技术股份有限公司 A kind of data export method and device of distributed big single table

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728317A (en) * 2019-09-30 2020-01-24 腾讯科技(深圳)有限公司 Training method and system of decision tree model, storage medium and prediction method
CN111008200A (en) * 2019-12-18 2020-04-14 北京数衍科技有限公司 Data query method and device and server
CN111008200B (en) * 2019-12-18 2024-01-16 北京数衍科技有限公司 Data query method, device and server
CN111107019A (en) * 2019-12-29 2020-05-05 浪潮电子信息产业股份有限公司 Data transmission method, device, equipment and computer readable storage medium
CN113568870A (en) * 2020-04-28 2021-10-29 西安理邦科学仪器有限公司 Storage method, server and monitoring system
CN113297266A (en) * 2020-07-08 2021-08-24 阿里巴巴集团控股有限公司 Data processing method, device, equipment and computer storage medium
CN112181900A (en) * 2020-09-04 2021-01-05 中国银联股份有限公司 Data processing method and device in server cluster
CN112162859A (en) * 2020-09-24 2021-01-01 成都长城开发科技有限公司 Data processing method and device, computer readable medium and electronic equipment
CN112199442B (en) * 2020-09-29 2023-07-21 中国平安人寿保险股份有限公司 Method, device, computer equipment and storage medium for distributed batch downloading files
CN112199442A (en) * 2020-09-29 2021-01-08 中国平安人寿保险股份有限公司 Distributed batch file downloading method and device, computer equipment and storage medium
CN112711580A (en) * 2020-12-30 2021-04-27 陈静 Big data mining method for cloud computing service and cloud computing financial server
CN112632129A (en) * 2020-12-31 2021-04-09 联想未来通信科技(重庆)有限公司 Code stream data management method, device and storage medium
CN112632129B (en) * 2020-12-31 2023-11-21 联想未来通信科技(重庆)有限公司 Code stream data management method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190920