CN117435560A - Data query method, device, electronic equipment and readable storage medium - Google Patents

Data query method, device, electronic equipment and readable storage medium

Info

Publication number
CN117435560A
Authority
CN
China
Prior art keywords
data
target
hash value
input
disk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311341492.6A
Other languages
Chinese (zh)
Inventor
梅凯
丛俊羽
赵辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Du Xiaoman Technology Beijing Co Ltd
Original Assignee
Du Xiaoman Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Du Xiaoman Technology Beijing Co Ltd filed Critical Du Xiaoman Technology Beijing Co Ltd
Priority to CN202311341492.6A priority Critical patent/CN117435560A/en
Publication of CN117435560A publication Critical patent/CN117435560A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/14Details of searching files based on file metadata
    • G06F16/148File search processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/13File access structures, e.g. distributed indices
    • G06F16/137Hash-based
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/0674Disk device
    • G06F3/0676Magnetic disk device
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data query method, a data query device, an electronic device, and a readable storage medium. The data query method includes: receiving a first input, wherein the first input is a data query request; in response to the first input, obtaining a target hash value of target data corresponding to the data query request, and querying an index section in memory to obtain a starting position and an ending position of a data file corresponding to the target hash value; receiving a second input, wherein the second input is the starting position and the ending position of the data file corresponding to the target hash value; and in response to the second input, reading the data file in the corresponding range on the disk according to the starting position and the ending position of the data file corresponding to the target hash value. The invention addresses the problems of frequent disk accesses and low query efficiency during data query.

Description

Data query method, device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of computer network communications technologies, and in particular, to a data query method, a data query device, an electronic device, and a readable storage medium.
Background
In common scenarios, for example online model scoring of customers in the financial field, the data volume is at the PB level, hot-spot data is not obvious, and access requires millisecond latency. Because SSDs are too expensive, SATA disks must be used to reduce cost, and the data is refreshed on a T+1 basis. In these scenarios the data does not need to be updated in real time, but the requirements on query performance are high; and because hot-spot data is not obvious, the disk has to be queried frequently, so the number of disk accesses per query needs to be reduced to improve query efficiency.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a data query method, apparatus, electronic device, and readable storage medium, so as to address the problems of frequent disk accesses and low query efficiency during data query.
According to an aspect of the present invention, there is provided a data query method including:
receiving a first input, wherein the first input is a data query request;
responding to the first input, obtaining a target hash value of target data corresponding to the data query request, and querying an index section of a memory to obtain a starting position and an ending position of a data file corresponding to the target hash value;
receiving a second input, wherein the second input is a starting position and an ending position of a data file corresponding to the target hash value;
and responding to the second input, and reading the data files in the corresponding range in the magnetic disk according to the starting position and the ending position of the data files corresponding to the target hash value.
Optionally, before the obtaining, in response to the first input, a target hash value of target data corresponding to the data query request, the method further includes:
constructing a routing table according to the first corresponding relation between the keywords and the hash value;
horizontally partitioning the disk into a plurality of fragments, and sequentially storing the keywords corresponding to the hash values in the corresponding fragments according to the first corresponding relation and the order of the first corresponding relations in the routing table;
the obtaining, in response to the first input, a target hash value of target data corresponding to the data query request, including:
acquiring the target data corresponding to the data query request;
and matching the target hash value of the target data according to the record of the routing table.
Optionally, before the querying the index section of the memory to obtain the starting position and the ending position of the data file corresponding to the target hash value, the method further includes:
and reading index section data stored in the disk file and storing the index section data into the memory, wherein the index section data comprises, but is not limited to, the maximum value of the hash value, the corresponding disk start offset and the data length.
Optionally, in response to the second input, reading the data file in the corresponding range in the disk according to the start position and the end position of the data file corresponding to the target hash value, including:
determining a corresponding target data service node and a target fragment according to the maximum value of the hash value obtained by indexing the target hash value in the index section of the memory, the corresponding disk initial offset and the data length;
and acquiring the data file stored in the target fragment.
According to a second aspect of the present invention, there is provided a data query apparatus comprising:
the first receiving module is used for receiving a first input, and the first input is a data query request;
the first acquisition module is used for responding to the first input, acquiring a target hash value of target data corresponding to the data query request, and querying an index section of a memory to obtain a starting position and an ending position of a data file corresponding to the target hash value;
the second receiving module is used for receiving a second input, wherein the second input is a starting position and an ending position of the data file corresponding to the target hash value;
and the query module is used for responding to the second input and reading the data files in the corresponding range in the magnetic disk according to the starting position and the ending position of the data files corresponding to the target hash value.
Optionally, the data query device further includes:
the construction module is used for constructing a routing table according to the first corresponding relation between the keywords and the hash value;
the first storage module is used for horizontally dividing the disk into a plurality of fragments, and sequentially storing the keywords corresponding to the hash values in the corresponding fragments according to the first corresponding relation and the sequence of the first corresponding relation in the routing table;
the first acquisition module includes:
the first acquisition sub-module is used for acquiring the target data corresponding to the data query request;
and the first matching module is used for matching the target hash value of the target data according to the record of the routing table.
Optionally, the data query device further includes:
the second storage module is used for reading the index section data stored in the disk file and storing the index section data into the memory, wherein the index section data comprises, but is not limited to, the maximum value of the hash value, the corresponding disk start offset and the data length.
Optionally, the query module includes:
the index module is used for determining corresponding target data service nodes and target fragments according to the maximum value of the hash value obtained by indexing the target hash value in the index section of the memory, the corresponding disk initial offset and the data length;
and the second acquisition module is used for acquiring the data file stored in the target fragment.
According to a third aspect of the present invention, there is provided an electronic device comprising:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to any of the first aspects of the invention.
According to a fourth aspect of the present invention there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method according to any one of the first aspects of the present invention.
According to one or more technical solutions provided by the embodiments of the present application, the starting position and the ending position of the data file corresponding to a data query request are determined in the index section in memory according to the hash value of the data to be queried, the offset of the file block corresponding to the data file is located on the disk, the file block containing the hash value is then read with a single disk IO, and the data is obtained by binary search. The invention can read the disk with one sequential IO, thereby achieving efficient data query and improving read performance.
Drawings
Further details, features and advantages of the invention are disclosed in the following description of exemplary embodiments with reference to the following drawings, in which:
FIG. 1 illustrates a schematic diagram of an example system that implements the various methods described herein;
FIG. 2 shows a schematic diagram of a conventional data query system according to an exemplary embodiment of the present invention;
FIG. 3 illustrates a flowchart of a data query method according to an exemplary embodiment of the present invention;
FIG. 4 illustrates a schematic diagram of a data node service according to an exemplary embodiment of the present invention;
FIG. 5 illustrates a data read schematic in accordance with an exemplary embodiment of the present invention;
FIG. 6 shows a schematic block diagram of a data querying device according to an exemplary embodiment of the present invention;
fig. 7 shows a block diagram of an exemplary electronic device that can be used to implement an embodiment of the invention.
Detailed Description
Embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be more thorough and complete. It should be understood that the drawings and embodiments of the invention are for illustration purposes only and are not intended to limit the scope of the present invention.
It should be understood that the various steps recited in the method embodiments of the present invention may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like herein are merely used for distinguishing between different devices, modules, or units and not for limiting the order or interdependence of the functions performed by such devices, modules, or units.
It should be noted that references to "a", "an", and "a plurality" in this disclosure are illustrative rather than limiting, and those skilled in the art will appreciate that they should be construed as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the devices in the embodiments of the present invention are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The following describes the solution of the present invention with reference to the drawings, and the technical solution provided in the embodiments of the present application is described in detail through specific embodiments and application scenarios thereof.
The common storage structures of disk-based KV databases are as follows:
1. LSM tree (Log-Structured Merge Tree): used by LevelDB, RocksDB, and similar systems. The main differences between a disk LSM tree and an in-memory LSM tree are that, due to disk IO constraints, the disk LSM tree has more levels and each level is larger. Its read-write characteristics are similar to those of an in-memory LSM tree: read performance is poor but write performance is very high.
2. B+ tree: commonly used by MySQL, Cassandra, and similar systems. The leaf nodes of a disk B+ tree are larger than those of an in-memory B+ tree, and the tree is taller. Read and write performance is worse than that of an in-memory B+ tree, but range queries are supported.
3. LSM tree + B+ tree: recently written data is held in an in-memory LSM tree and periodically merged into a disk B+ tree. This structure combines the advantages of both trees; for example, WiredTiger uses it. Read and write performance is good, and range queries are supported.
4. Log structure + B+ tree: recently written data is stored as a log or an LSM tree and then periodically consolidated into a B+ tree. This structure has high write performance but poor read performance.
As shown in FIG. 2, taking the common LevelDB and RocksDB as examples, the storage structure of a disk-based distributed KV database is an LSM tree; when hot-spot data is not obvious, queries must go to disk, and multiple disk accesses are generated per query. Here, a distributed KV database is a distributed key-value database.
Specifically, a piece of data is read level by level from top to bottom, moving to the next level if the target data is not found. The memtable and the immutable memtable reside in memory, so their lookups incur no IO overhead. If the target data is at level 2, 4 disk IO reads are needed: because level-0 data ranges can overlap, 2 IOs are needed to traverse all of its SSTs; for every layer other than level 0 the SST data ranges do not overlap, so level 1 generates 1 IO; finally, since the data is found in neither level 0 nor level 1, 1 more IO is spent in level 2, for a total of 4 IOs.
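A minimal sketch of the read path just described, counting disk IOs per level (illustrative only, not part of the claimed invention; the level layout and the two-IO assumption for level 0 are taken from the example above):

# Illustrative sketch of the LSM-tree read path described above.
# Level-0 SSTs may overlap, so every SST there is probed;
# deeper levels are non-overlapping, so each costs at most one IO.
def lsm_read_io_count(key, memtables, levels):
    """Return the number of disk IOs spent before `key` is found (or levels are exhausted)."""
    ios = 0
    for table in memtables:            # memtable / immutable memtable: in memory, no IO
        if key in table:
            return ios
    for depth, ssts in enumerate(levels):
        if depth == 0:
            for sst in ssts:           # overlapping SSTs: one IO each
                ios += 1
                if key in sst:
                    return ios
        else:
            ios += 1                   # non-overlapping ranges: a single SST is probed
            if any(key in sst for sst in ssts):
                return ios
    return ios

# Matches the example above: 2 IOs at level 0, 1 at level 1, 1 at level 2 = 4 IOs.
levels = [[{"a": 1}, {"b": 2}], [{"c": 3}], [{"x": 42}]]
print(lsm_read_io_count("x", [{}, {}], levels))   # -> 4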
As shown in FIG. 3, which is a schematic flowchart of a data query method provided in an embodiment of the present application, the method may include the following steps S301 to S304:
S301, receiving a first input, wherein the first input is a data query request.
S302, responding to the first input, acquiring a target hash value of target data corresponding to the data query request, and querying an index section of a memory to obtain a starting position and an ending position of a data file corresponding to the target hash value.
In this embodiment, a hash (generally translated as "hash") converts an input of arbitrary length (also called a pre-image) into an output of fixed length, the hash value, through a hash algorithm. This conversion is a compression mapping: the hash value typically occupies much less space than the input, and different inputs may hash to the same output, so a unique input cannot be determined from a hash value. In short, hashing is a function that compresses a message of arbitrary length to a message digest of fixed length.
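As a small illustration of this definition (the specific hash algorithm used by the embodiment is not stated; MD5 is used here only as an example), a key of arbitrary length can be mapped to a fixed-length value:

import hashlib

def hash_key(key: str) -> int:
    """Map an arbitrary-length key to a fixed-length 64-bit hash value."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")   # first 8 bytes as an unsigned integer

print(hash_key("user_12345"))    # the same input always yields the same value
print(hash_key("a" * 10_000))    # output size is fixed regardless of input length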
In an optional manner of this embodiment, before the obtaining, in response to the first input, a target hash value of target data corresponding to the data query request, the method further includes:
S302a, constructing a routing table according to a first corresponding relation between keywords and hash values;
S302b, horizontally partitioning the disk into a plurality of fragments, and sequentially storing the keywords corresponding to the hash values in the corresponding fragments according to the first corresponding relation and the order of the first corresponding relations in the routing table.
S302, the obtaining, in response to the first input, a target hash value of target data corresponding to the data query request includes:
S3021, acquiring the target data corresponding to the data query request;
S3022, matching the target hash value of the target data according to the records of the routing table.
In this embodiment, key-value pairs are stored: one key corresponds to one value, and the value has no fixed schema the way database objects do. Multiple routing tables may be created and managed in the system. Fragment: after a routing table is horizontally partitioned in a certain way, each resulting part is called a fragment. Each fragment contains a portion of the routing table's data, and fragments do not overlap or intersect, which guarantees query efficiency.
According to the records of the routing table, a hash value is computed from a certain characteristic of the data (the key), and a mapping (namely, taking the hash value modulo the number of fragments) is established between hash values and fragments, so that data with different hash values is distributed to different fragments.
In this embodiment, the mapping relationship between fragments and data node services is also recorded in the routing table service, so that traffic can be forwarded to the corresponding data node service for querying.
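A minimal sketch of this routing step, assuming a hypothetical RoutingTable class with a fixed fragment count (the class name, the MD5-based hash, and the round-robin node assignment are illustrative assumptions, not the embodiment's actual implementation):

import hashlib

class RoutingTable:
    """Hypothetical routing table: key -> hash value -> fragment -> data node service."""

    def __init__(self, num_fragments: int, data_nodes: list[str]):
        self.num_fragments = num_fragments
        # Fragment-to-node mapping recorded in the routing table service.
        self.fragment_to_node = {
            i: data_nodes[i % len(data_nodes)] for i in range(num_fragments)
        }

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little")

    def locate(self, key: str) -> tuple[int, str]:
        h = self._hash(key)                     # hash value computed from the key
        fragment = h % self.num_fragments       # modulo maps the hash to a fragment
        return fragment, self.fragment_to_node[fragment]

table = RoutingTable(num_fragments=8, data_nodes=["node-a", "node-b"])
print(table.locate("user_12345"))               # e.g. (fragment_id, "node-b")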
S303, receiving a second input, wherein the second input is a starting position and an ending position of the data file corresponding to the target hash value.
S304, responding to the second input, and reading the data files in the corresponding range in the magnetic disk according to the starting position and the ending position of the data files corresponding to the target hash value.
In this embodiment, as shown in FIG. 1, the number of data fragments is determined when the data route is created, and the corresponding data fragments are distributed across different data node services. The keys in each data file are hashed in advance and assigned to fragments by taking the hash modulo the number of fragments. When traffic is routed, the key is first hashed and the hash value is then taken modulo the number of fragments; the result identifies the fragment on which the key resides, so the information in the target fragment can be queried.
The data portion on the disk comprises a plurality of data blocks, and each data block contains multiple pieces of data. The data portion is ordered by hash value, so the data file can be regarded as one ordered large array. The data is sorted by hash value and may be partitioned by size, for example into 32 KB data blocks, with one data block containing multiple pieces of data. The data within a data block is also ordered by hash value.
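A minimal sketch of this layout, assuming records are (hash_value, key, value) tuples and JSON is used only as a stand-in encoding:

import json

BLOCK_SIZE = 32 * 1024   # 32 KB blocks, matching the example above

def build_blocks(records):
    """Pack (hash_value, key, value) records into roughly 32 KB data blocks.

    The whole file and every block stay ordered by hash value, as described above.
    """
    records = sorted(records, key=lambda r: r[0])
    blocks, current, current_size = [], [], 0
    for rec in records:
        encoded = json.dumps(rec).encode("utf-8")
        if current and current_size + len(encoded) > BLOCK_SIZE:
            blocks.append(current)
            current, current_size = [], 0
        current.append(rec)
        current_size += len(encoded)
    if current:
        blocks.append(current)
    return blocks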
In an optional manner of this embodiment, before the querying the index section of the memory to obtain the starting position and the ending position of the data file corresponding to the target hash value, the method further includes:
S302c, reading the index section data stored in the disk file and storing the index section data into the memory, wherein the index section data includes, but is not limited to, the maximum value of the hash value, the corresponding disk start offset, and the data length.
In this embodiment, the index is an array that is loaded into memory at initialization. The array contains, for each data block, the maximum hash value in the block, the block's start offset, and its length. The start offset of the data block corresponding to a hash value can be found in memory by binary search, and the block's data is then read with a single IO according to that offset and length. After the data block has been read, the target data is obtained by comparing its entries one by one.
In an alternative manner of this embodiment, as shown in FIG. 4, the KV database may have one or more data storage files. A data node file consists of two parts: an index section is placed at the tail of the file, and it records the start address and the length of each data block in the file. The whole data file is an ordered array arranged by hash value; the index section records the largest hash_key (k1) in the file and its address, and also records the starting hash_key and address of each data block. As shown in FIG. 1, the hash_key values satisfy k1 > k2 > k3 > k4.
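A minimal write-side sketch of such a file, pairing with the block layout above (the on-disk encoding, the fixed field widths, and the choice of recording each block's largest hash value are illustrative assumptions, not the actual file format of the embodiment):

import json
import struct

def write_data_file(path: str, blocks: list) -> None:
    """Write hash-ordered data blocks followed by an index section at the file tail.

    Each index entry records (largest hash value in the block, start offset, length);
    a trailing 8-byte field stores the offset where the index section begins.
    """
    index = []
    with open(path, "wb") as f:
        for block in blocks:                          # blocks are already hash-ordered
            payload = json.dumps(block).encode("utf-8")
            index.append((block[-1][0], f.tell(), len(payload)))
            f.write(payload)
        index_offset = f.tell()
        for max_hash, offset, length in index:        # fixed-width index entries
            f.write(struct.pack("<QQQ", max_hash, offset, length))
        f.write(struct.pack("<Q", index_offset))      # trailer: start of the index section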
In an optional manner of this embodiment, the reading, in response to the second input, the data file in the corresponding range on the disk according to the starting position and the ending position of the data file corresponding to the target hash value includes:
S3041, determining a corresponding target data service node and a target fragment according to the maximum value of the hash value obtained by indexing the target hash value in the index section of the memory, the corresponding disk start offset, and the data length;
S3042, obtaining the data file stored in the target fragment.
In this embodiment, as shown in FIG. 5, there may be one or more services storing the data files. The data node service reads the data node file at startup and records the index section of the data file in memory. When the routing table node routes a query to the data service node, the key is hashed, the index section in memory is binary-searched to find the starting and ending position range corresponding to the hash, and the data block is read once; because the hash keys within the data block are ordered, the corresponding data can be found by binary search within the block.
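A minimal read-side sketch matching the write-side sketch above (the DataNodeService name, file format, and hash-collision handling are illustrative assumptions): the index section is loaded once at startup, the target block is located by binary search in memory, the block is read with a single disk read, and the record is then found by binary search within the block.

import bisect
import json
import struct

class DataNodeService:
    """Sketch of the read path: in-memory index section plus one block read per query."""

    def __init__(self, path: str):
        self.path = path
        with open(path, "rb") as f:                    # done once at service startup
            f.seek(-8, 2)                              # trailer holds the index offset
            (index_offset,) = struct.unpack("<Q", f.read(8))
            f.seek(index_offset)
            raw = f.read()[:-8]                        # index entries, minus the trailer
        self.index = [struct.unpack("<QQQ", raw[i:i + 24])
                      for i in range(0, len(raw), 24)]  # (max_hash, offset, length)
        self.max_hashes = [entry[0] for entry in self.index]

    def get(self, hash_value: int, key: str):
        i = bisect.bisect_left(self.max_hashes, hash_value)   # binary search in memory
        if i == len(self.index):
            return None                                        # hash beyond every block
        _, offset, length = self.index[i]
        with open(self.path, "rb") as f:
            f.seek(offset)
            block = json.loads(f.read(length))                 # single disk read
        hashes = [rec[0] for rec in block]
        j = bisect.bisect_left(hashes, hash_value)             # binary search in the block
        while j < len(block) and block[j][0] == hash_value:
            if block[j][1] == key:                             # resolve hash collisions by key
                return block[j][2]
            j += 1
        return None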
Traffic is forwarded to the data node service corresponding to the fragment by the routing table service; the data node service binary-searches the hashed key in memory to find the offset of the file block corresponding to the data file, then reads the file block containing the hashed key with a single disk IO, and obtains the data by binary search.
According to the embodiments of the present application, data is stored in different hash buckets (namely, fragments) according to the hash value distribution of the keys, so that the hash bucket containing the target data can be quickly located. The data in each hash bucket is split into fixed-size file blocks for storage. The file block size is set to the maximum acceptable for the system's IO performance, so that a large amount of data can be obtained by reading one file block at a time. Within each file block, data is stored sequentially rather than randomly; when a file block is read, the disk can read the data sequentially, avoiding frequent head seeks and improving read performance. The minimum hash value of each file block is recorded in memory as the index of that file block, so the file block containing the target data can be quickly found by binary search.
According to the data query method provided by the embodiments of the present application, the starting position and the ending position of the data file corresponding to the data query request are determined in the index section in memory according to the hash value of the data to be queried, the offset of the file block corresponding to the data file is located on the disk, the file block containing the hash value is then read with a single disk IO, and the data is obtained by binary search. Reading the disk with one sequential IO achieves efficient data query and improves read performance. For scenarios where the data volume is at the PB level, hot-spot data is not obvious, and access requires millisecond latency, a distributed-KV-database query solution is provided: for non-hot-spot data, the multiple disk IOs that the original LSM tree required when querying a data file are reduced to 1 disk IO, improving data query efficiency.
Corresponding to the above embodiments, referring to fig. 6, the embodiment of the present application further provides a data query device 600, including:
a first receiving module 601, configured to receive a first input, where the first input is a data query request;
the first obtaining module 602 is configured to obtain, in response to the first input, a target hash value of target data corresponding to the data query request, and query an index segment of a memory to obtain a start position and an end position of a data file corresponding to the target hash value;
a second receiving module 603, configured to receive a second input, where the second input is a start position and an end position of the data file corresponding to the target hash value;
and the query module 604 is used for responding to the second input and reading the data files in the corresponding range in the magnetic disk according to the starting position and the ending position of the data files corresponding to the target hash value.
Optionally, the data query device 600 further includes:
a building module 605, configured to build a routing table according to the first correspondence between the key and the hash value;
a first storage module 606, configured to horizontally partition the disk into a plurality of fragments, and to sequentially store the keywords corresponding to the hash values in the corresponding fragments according to the first correspondence and the order of the first correspondences in the routing table;
the first obtaining module 602 includes:
a first obtaining submodule 6021, configured to obtain the target data corresponding to the data query request;
the first matching module 6022 is configured to match the target hash value of the target data according to the record of the routing table.
Optionally, the data query device 600 further includes:
the second storage module 607 is configured to read the index segment data stored in the disk file and store the index segment data in the memory, where the index segment data includes, but is not limited to, a maximum value of the hash value, a corresponding disk start offset, and a data length.
Optionally, the query module 604 includes:
an index module 6041, configured to determine a corresponding target data service node and a target fragment according to a maximum value of the hash value obtained by indexing in the index segment of the memory, a corresponding disk start offset, and a data length;
a second acquiring module 6042 is configured to acquire the data file stored in the target fragment.
According to the data query method provided by the embodiments of the present application, the starting position and the ending position of the data file corresponding to the data query request are determined in the index section in memory according to the hash value of the data to be queried, the offset of the file block corresponding to the data file is located on the disk, the file block containing the hash value is then read with a single disk IO, and the data is obtained by binary search. The invention can read the disk with one sequential IO, thereby achieving efficient data query and improving read performance.
The exemplary embodiment of the invention also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor for causing the electronic device to perform a method according to an embodiment of the invention when executed by the at least one processor.
The exemplary embodiments of the present invention also provide a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the present invention.
The exemplary embodiments of the invention also provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the invention.
Referring to fig. 7, a block diagram of an electronic device 700 that may be a server or a client of the present invention will now be described, which is an example of a hardware device that may be applied to aspects of the present invention. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706, an output unit 707, a storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to the electronic device 700, and the input unit 706 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 707 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 708 may include, but is not limited to, magnetic disks and optical disks. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through computer networks, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth (TM) devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above. For example, in some embodiments, the data query method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. In some embodiments, the computing unit 701 may be configured to perform the data query method by any other suitable means (e.g., by means of firmware).
Program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims (10)

1. A method of querying data, comprising:
receiving a first input, wherein the first input is a data query request;
responding to the first input, obtaining a target hash value of target data corresponding to the data query request, and querying an index section of a memory to obtain a starting position and an ending position of a data file corresponding to the target hash value;
receiving a second input, wherein the second input is a starting position and an ending position of a data file corresponding to the target hash value;
and responding to the second input, and reading the data files in the corresponding range in the magnetic disk according to the starting position and the ending position of the data files corresponding to the target hash value.
2. The data query method according to claim 1, wherein before the obtaining, in response to the first input, the target hash value of the target data corresponding to the data query request, the method further comprises:
constructing a routing table according to the first corresponding relation between the keywords and the hash value;
horizontally partitioning the disk into a plurality of fragments, and sequentially storing the keywords corresponding to the hash values in the corresponding fragments according to the first corresponding relation and the order of the first corresponding relations in the routing table;
the obtaining, in response to the first input, a target hash value of target data corresponding to the data query request, including:
acquiring the target data corresponding to the data query request;
and matching the target hash value of the target data according to the record of the routing table.
3. The data query method according to claim 1, wherein before the querying the index section of the memory to obtain the starting position and the ending position of the data file corresponding to the target hash value, the method further comprises:
and reading index section data stored in the disk file and storing the index section data into the memory, wherein the index section data comprises, but is not limited to, the maximum value of the hash value, the corresponding disk start offset and the data length.
4. The method of claim 3, wherein the reading the data file in the corresponding range of the disk according to the start position and the end position of the data file corresponding to the target hash value in response to the second input comprises:
determining a corresponding target data service node and a target fragment according to the maximum value of the hash value obtained by indexing the target hash value in the index section of the memory, the corresponding disk initial offset and the data length;
and acquiring the data file stored in the target fragment.
5. A data query device, comprising:
the first receiving module is used for receiving a first input, and the first input is a data query request;
the first acquisition module is used for responding to the first input, acquiring a target hash value of target data corresponding to the data query request, and querying an index section of a memory to obtain a starting position and an ending position of a data file corresponding to the target hash value;
the second receiving module is used for receiving a second input, wherein the second input is a starting position and an ending position of the data file corresponding to the target hash value;
and the query module is used for responding to the second input and reading the data files in the corresponding range in the magnetic disk according to the starting position and the ending position of the data files corresponding to the target hash value.
6. The data querying device of claim 5, wherein the data querying device further comprises:
the construction module is used for constructing a routing table according to the first corresponding relation between the keywords and the hash value;
the first storage module is used for horizontally dividing the disk into a plurality of fragments, and sequentially storing the keywords corresponding to the hash values in the corresponding fragments according to the first corresponding relation and the sequence of the first corresponding relation in the routing table;
the first acquisition module includes:
the first acquisition sub-module is used for acquiring the target data corresponding to the data query request;
and the first matching module is used for matching the target hash value of the target data according to the record of the routing table.
7. The data querying device of claim 5, wherein the data querying device further comprises:
the second storage module is used for reading the index section data stored in the disk file and storing the index section data into the memory, wherein the index section data comprises, but is not limited to, the maximum value of the hash value, the corresponding disk start offset and the data length.
8. The data querying device of claim 7, wherein the querying module comprises:
the index module is used for determining corresponding target data service nodes and target fragments according to the maximum value of the hash value obtained by indexing the target hash value in the index section of the memory, the corresponding disk initial offset and the data length;
and the second acquisition module is used for acquiring the data file stored in the target fragment.
9. An electronic device, comprising:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to any of claims 1-4.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-4.
CN202311341492.6A 2023-10-17 2023-10-17 Data query method, device, electronic equipment and readable storage medium Pending CN117435560A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311341492.6A CN117435560A (en) 2023-10-17 2023-10-17 Data query method, device, electronic equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN117435560A true CN117435560A (en) 2024-01-23

Family

ID=89549016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311341492.6A Pending CN117435560A (en) 2023-10-17 2023-10-17 Data query method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN117435560A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination