CN113312355A - Data management method and device - Google Patents

Publication number
CN113312355A
Authority
CN
China
Prior art keywords
metadata
grouping
index
determining
fragment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110660666.XA
Other languages
Chinese (zh)
Inventor
张晓阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202110660666.XA priority Critical patent/CN113312355A/en
Publication of CN113312355A publication Critical patent/CN113312355A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2264Multidimensional index structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data management method and device, and relates to the field of computer technology. One embodiment of the method comprises: acquiring a query request for a metadata table, wherein the query request comprises a query entry parameter; determining a grouping identifier and a fragment identifier of target data corresponding to the query entry parameter according to the query entry parameter and a constructed secondary index model; and determining the data storage range of the target data according to the grouping identifier and the fragment identifier. In this embodiment, a secondary index model is constructed from the metadata table and used for retrieval, which improves data retrieval efficiency and user experience.

Description

Data management method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for data management.
Background
Elasticsearch (ES), a distributed full-text search engine, supports distributed operation and inverted indexes, provides a rich search API, can serve as a storage tool for massive data, and is widely used for massive data retrieval, aggregation, log analysis, and similar services in the Internet field.
In the prior art, ES is generally adopted as a storage engine, with data synchronized from a database, to solve the list search problem caused by splitting the database into multiple databases and tables. However, when the data volume is large, the service retrieval range is wide, and the query scenarios are complex, the retrieval efficiency of ES is low, and ES cannot efficiently support and adapt to service requirements.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for data management, which can implement fast retrieval of data, improve retrieval efficiency, and improve user experience.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a data management method including:
acquiring a query request for a metadata table, wherein the query request comprises a query entry parameter;
determining a grouping identifier and a fragment identifier of target data corresponding to the query entry according to the query entry and the constructed secondary index model;
and determining the data storage range of the target data according to the grouping identification and the fragment identification.
Optionally, before determining the grouping identifier and the fragment identifier of the target data corresponding to the query entry according to the query entry and the constructed secondary index model, the method further includes:
screening index fields from each field of the metadata table, and constructing a secondary index table according to the index fields;
and grouping the metadata in the metadata table according to the index fields in the secondary index table, and fragmenting the metadata in each group to construct the secondary index model.
Optionally, grouping the metadata in the metadata table according to the index field in the secondary index table, and fragmenting the metadata in each group, includes:
screening an index field from the index fields of the secondary index table as a grouping field, and screening a field from the index fields as a fragmentation key value;
grouping the metadata according to the grouping field to obtain one or more groups;
fragmenting the metadata in each group according to the fragmentation key value to obtain one or more fragments;
and storing the grouping identification of the grouping and the fragment identification of the fragment corresponding to each piece of metadata.
Optionally, after storing the group identifier of the group and the fragment identifier of the fragment corresponding to each piece of metadata, the method further includes:
storing the storage path information for each piece of metadata.
Optionally, before storing the storage path information of each piece of metadata, the following is performed for any group including a plurality of fragments:
determining whether a difference or ratio of the data amounts of metadata in any two fragments of the group exceeds a preset threshold,
and when the preset threshold is exceeded, hashing all metadata in the group, re-fragmenting it, and storing the storage path information of all metadata in the group after re-fragmentation.
Optionally, re-fragmenting all metadata in the group after hashing includes:
performing a secondary hash on the metadata according to the fragment identifier of the metadata before re-fragmentation, determining the fragment identifier of the metadata after re-fragmentation, and re-fragmenting all the metadata in the group according to the result of the secondary hash.
Optionally, determining a grouping identifier and a fragmentation identifier of the target data corresponding to the query entry includes:
determining an index field corresponding to the query entry from the secondary index model according to the query entry, and determining the grouping field and the fragmentation key value corresponding to the query entry according to the index field;
and determining the grouping identification and the fragment identification of the target data corresponding to the query entry according to the corresponding grouping field and the fragment key value.
According to still another aspect of an embodiment of the present invention, there is provided an apparatus for data management, including:
the acquisition module is used for acquiring a query request aiming at the metadata table, wherein the query request comprises query entry parameters;
the query module is used for determining a grouping identifier and a fragment identifier of target data corresponding to the query entry parameter according to the query entry parameter and the constructed secondary index model;
and the determining module is used for determining the data storage range of the target data according to the grouping identification and the fragment identification.
According to another aspect of an embodiment of the present invention, there is provided an electronic apparatus including:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method for data management provided by the present invention.
According to a further aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method of data management provided by the present invention.
One embodiment of the above invention has the following advantages or benefits: through a secondary index model constructed from a metadata table, each piece of metadata in the metadata table has a corresponding grouping identifier and a corresponding fragment identifier; a query entry is obtained from a query request, and the grouping identifier and the fragment identifier of the target data corresponding to the query entry are obtained by means of the secondary index model, so that the storage range of the target data is obtained. The data management method of the embodiment of the invention can solve the problem of low ES retrieval efficiency when the data volume is large, the service retrieval range is wide, and the query scenarios are complex; it achieves fast data retrieval, meets service requirements, and improves retrieval efficiency and user experience.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a main flow of a method of data management according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a main process flow of constructing a secondary index model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another method of data management according to an embodiment of the invention;
FIG. 4 is a flow diagram illustrating a method of data management according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the main modules of an apparatus for data management according to an embodiment of the present invention;
FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 7 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a main flow of a data management method according to an embodiment of the present invention, as shown in fig. 1, the data management method includes:
step S101: acquiring a query request aiming at a metadata table, wherein the query request comprises query parameters;
step S102: determining a grouping identifier and a fragment identifier of target data corresponding to the query entry according to the query entry and the constructed secondary index model;
step S103: and determining the data storage range of the target data according to the grouping identification and the fragment identification.
The data management method provided by the embodiment of the invention addresses the problem that ES cannot retrieve quickly when the data volume is large, the retrieval range is wide, and the query scenarios are complex, and achieves efficient data retrieval by extending the storage and retrieval modes of ES.
In the embodiment of the invention, the query request comprises query entry parameters, which can be obtained by parsing the query request. A query entry parameter can be a query field; in the e-commerce field, for example, it can be an order number, order time, commodity name, or account number.
In the embodiment of the present invention, before step S102, the method includes: screening index fields from each field of the metadata table, and constructing a secondary index table according to the index fields; and grouping the metadata in the metadata table according to the index fields in the secondary index table, and fragmenting the metadata in each group to construct a secondary index (SecondardyIndex) model.
By reviewing the retrieval scenarios or the historical retrieval records for the metadata in the metadata table, one or more frequently retrieved fields are screened from the fields of the metadata table as index fields, and a secondary index table is constructed from these index fields. The index fields can be screened by selecting the fields whose retrieval count exceeds a preset count threshold. The storage range of the corresponding metadata can then be queried through one or more index fields. The secondary index table includes each index field and the value (Value) of that index field. Optionally, the index fields further include a routing time; using the routing time as an index field of the secondary index table allows the metadata to be archived by time later. The routing time may be the order time.
For example, the fields in the metadata table include: order number, order time, account number, commodity code, commodity name, selling price, tax rate, ordering account, organization ID, invoice number, and so on. By reviewing the retrieval scenarios of the metadata, the frequently retrieved fields (order number, ordering account, organization ID, and invoice number) are packaged into the secondary index table as index fields.
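The count-threshold screening described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_index_fields`, the shape of the query log, and the field names are all hypothetical, and the sketch assumes that the historical retrieval record is available as a list of the fields each past query filtered on.

```python
from collections import Counter


def select_index_fields(query_log, min_count):
    """Return the fields whose retrieval count exceeds min_count.

    query_log is an assumed format: one list of field names per past query.
    Fields are returned in the order they were first seen.
    """
    counts = Counter(field for query in query_log for field in query)
    return [field for field, n in counts.items() if n > min_count]


# Hypothetical history: each entry lists the fields one past query filtered on.
query_log = [
    ["order_number"],
    ["organization_id", "order_time"],
    ["order_number", "invoice_number"],
    ["organization_id"],
    ["invoice_number", "order_number"],
    ["commodity_name"],
]

# Fields retrieved more than once become index fields of the secondary index table.
index_fields = select_index_fields(query_log, min_count=1)
```

With this sample log, `order_time` and `commodity_name` are each used only once and are therefore left out of the secondary index table.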
In the embodiment of the present invention, grouping metadata in a metadata table according to an index field in a secondary index table, and fragmenting the metadata in each group includes: screening an index field from the index fields of the secondary index table as a grouping field, and screening a field from the index fields as a fragmentation key value; grouping the metadata according to the grouping field to obtain one or more groups; fragmenting the metadata in each group according to the fragmentation key values to obtain one or more fragments; and storing the grouping identification of the grouping of each piece of metadata and the fragment identification of the fragment.
In an optional implementation manner of the embodiment of the present invention, after storing the group identifier of the group where each piece of metadata corresponds to and the fragment identifier of the fragment where each piece of metadata corresponds to, the method further includes: storing the storage path information for each piece of metadata.
A grouping field and a fragmentation key value are screened from the index fields in the secondary index table; the grouping field is one of the index fields in the secondary index table, and the fragmentation key value is also one of the fields in the secondary index table. Optionally, the screening rule may be determined according to the service retrieval and query scenarios: the service "read" scenarios are reviewed and a reasonable mapping file is generated, so that the screened grouping field and fragmentation key value fit all retrieval domains as far as possible. The data can be split reasonably according to the existing data volume and the expected growth rate of the business, or according to retrieval efficiency and retrieval range, and a suitable grouping field is determined for archiving the data. The fragment of a piece of metadata is obtained from the fragmentation key value. For example, the routing time may be selected as the grouping field so that metadata is archived by year, by month, and so on. When determining the fragmentation key value, if the service query scenario retrieves only by order number, the order number can be selected as the fragmentation key value; when the service query scenario is complex, a field with a large data dimension can be selected as the fragmentation key value.
According to the screened grouping field, the metadata in the metadata table is grouped to obtain one or more groups. Each group has a corresponding grouping identifier (such as ES.index #0, ES.index #1, ……, ES.index #M, M ≥ 0), which can be determined from the grouping field. The metadata in each group is then fragmented according to the fragmentation key value (partitionkey) to obtain one or more fragments. Each fragment has a corresponding fragment identifier (such as Shard0, Shard1, Shard2, ……, ShardN, N ≥ 0), and one group corresponds to one or more fragments. The grouping identifier and fragment identifier corresponding to each piece of metadata are stored, together with the storage path information of each piece of metadata. The storage path information is the path from the grouping identifier to the fragment identifier of the metadata, that is, the information on which fragment of which group the metadata is stored in, so that a retrieval can go straight to the fragment where the metadata is located. The group where the metadata is located serves as the outer-layer index storage structure of the secondary index model, and the fragment where the metadata is located serves as the storage structure inside ES of the secondary index model.
As shown in fig. 2, a schematic diagram of the main process for constructing a secondary index model is provided. The secondary index model includes an ES data storage engine and an ES cluster. The ES data storage engine screens out the routing time (such as the order time) from the index fields of the secondary index table as the grouping field; if the metadata table holds order data, the order time is used as an index field so that the order data can be archived by order time. It then screens an index field (such as the organization ID) from the secondary index table as the fragmentation key value, and writes the determined grouping field and fragmentation key value into the metadata table and the secondary index table; for example, the order ID, order time, and organization ID can be written into the metadata table and the secondary index table. The ES cluster groups the metadata according to the grouping field, for example archiving the metadata by the year of the order time, in which case the grouping identifiers can be index_2019, index_2020, and index_2021; that is, by reading the value of the order time, metadata whose order time falls in 2019 is given the grouping identifier index_2019. The ES cluster then fragments the metadata in each group according to the organization ID; each group corresponds to a plurality of fragments, whose fragment identifiers can be Shard0, Shard1, Shard2, ……, ShardN, where N ≥ 0, so that metadata of different organization IDs can correspond to different fragment identifiers. Finally, the grouping identifier, fragment identifier, and storage path information corresponding to each piece of metadata are stored, so that the fragment where the metadata is located can be accessed quickly according to the storage path information in subsequent retrievals.
For example, if an ES index includes 16 shards, then once the storage path information is stored, each retrieval request for a piece of metadata can go directly to the shard holding that metadata instead of accessing all 16 shards. This saves 15 shard accesses, and the retrieval range is theoretically 1/16 of what it was before, which achieves fast retrieval and improves retrieval efficiency.
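A minimal sketch of the grouping and fragmenting step follows, assuming groups are formed by the year of the order time and fragments by a hash of the organization ID. The record layout, the helper names, the use of `zlib.crc32` as the hash, and the count of 16 fragments per group are illustrative assumptions; the source does not specify a hash function.

```python
import zlib
from datetime import datetime

NUM_SHARDS = 16  # assumed number of fragments per group


def group_id(order_time: datetime) -> str:
    """Grouping identifier derived from the grouping field (year of order time)."""
    return f"index_{order_time.year}"


def shard_id(partition_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Fragment identifier derived from the fragmentation key value.

    crc32 is used only because it is deterministic across runs.
    """
    return zlib.crc32(partition_key.encode("utf-8")) % num_shards


def build_routing(records):
    """Store the (group, fragment) storage path for each piece of metadata."""
    return {
        rec["order_id"]: (group_id(rec["order_time"]), shard_id(rec["organization_id"]))
        for rec in records
    }


# Hypothetical order metadata.
records = [
    {"order_id": "A1", "order_time": datetime(2019, 5, 1), "organization_id": "org-7"},
    {"order_id": "A2", "order_time": datetime(2021, 1, 9), "organization_id": "org-7"},
    {"order_id": "A3", "order_time": datetime(2021, 3, 2), "organization_id": "org-9"},
]
routing = build_routing(records)
```

Here orders A1 and A2 belong to different groups (2019 and 2021) but share the same fragment number, because the fragment depends only on the organization ID.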
In the embodiment of the present invention, before storing the storage path information of each piece of metadata, the following is performed for any group including a plurality of fragments: determining whether the difference or ratio of the data amounts of metadata in any two fragments of the group exceeds a preset threshold; when the preset threshold is exceeded, hashing all the metadata in the group, re-fragmenting it, and storing the storage path information of all the metadata in the group after re-fragmentation.
When a group includes a plurality of fragments, the data amounts of metadata in different fragments may be uneven, and this uneven distribution may reduce retrieval efficiency. In this case, the data in the group may be hashed and then re-fragmented so that the distribution of data amounts tends to be uniform. It is judged whether the difference or ratio of the data amounts of metadata in any two fragments of the group exceeds the preset threshold; if not, re-fragmentation is not needed, and if so, all the metadata in the group is hashed and re-fragmented. The preset threshold may be set according to the service scenario or service requirement; for example, the ratio of the data amounts of metadata in any two fragments may not exceed a preset threshold of 50, and when the ratio exceeds 50, re-fragmentation is required.
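The ratio check can be sketched as below. Comparing the largest fragment with the smallest one is equivalent to checking every pair of fragments. The function name and the treatment of an empty fragment as unbalanced are assumptions made for this illustration; the threshold of 50 is the example value from the text.

```python
def needs_resharding(shard_counts, max_ratio=50):
    """Return True if any two fragments differ by more than max_ratio.

    shard_counts holds the number of metadata records in each fragment of
    one group and must be non-empty.
    """
    largest, smallest = max(shard_counts), min(shard_counts)
    if smallest == 0:
        # Assumption: an empty fragment next to a non-empty one counts as unbalanced.
        return largest > 0
    return largest / smallest > max_ratio


balanced = needs_resharding([100, 120, 90])
skewed = needs_resharding([10000, 100, 150])
```

In the second call the largest fragment holds 100 times as many records as the smallest, so the group would be re-fragmented.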
In the embodiment of the present invention, re-fragmenting all metadata in a group after hashing includes: performing a secondary hash on the metadata according to the fragment identifier of the metadata before re-fragmentation, determining the fragment identifier of the metadata after re-fragmentation, and re-fragmenting all the metadata in the group according to the result of the secondary hash.
After all metadata in the group is hashed, a secondary hash is applied to re-fragment all the metadata in the group, the fragment where each piece of metadata is located after re-fragmentation is determined, and the storage path information of the re-fragmented metadata is stored so that it can be located quickly in subsequent retrievals.
By analyzing all metadata in the group, a secondary hash key value (secondhashkey) and a hash width (HashRange) are determined. The secondary hash key value is the field used for re-fragmentation, and the hash width is the number of fragments across which the metadata is spread after re-fragmentation. The secondary hash key value defaults to the ES id, and a random UUID can also be selected. The hash width is determined according to how the data amounts are distributed across the fragments produced by the fragmentation key value, so that after the secondary hash the data densities of the Shards are as close to each other as possible, that is, the data amount in each Shard is uniformly distributed. The larger the hash width, the more uniform the data distribution, but the greater the cost to retrieval performance, so data distribution and hash width need to be balanced.
The fragment identifier of the fragment in which the metadata is located after re-fragmentation can be determined from the secondary hash key value, the hash width, and the fragment identifier of the fragment in which the metadata was located before re-fragmentation. The identifier ShardA of the fragment after re-fragmentation can be obtained by the following formula: ShardA = hash(_partitionkey) % numPrimaryShards + hash(_secondhashkey) % HashRange, where hash(_partitionkey) % numPrimaryShards is the fragment where the metadata was located before re-fragmentation, numPrimaryShards is the number of fragments of the group before re-fragmentation, and hash(_secondhashkey) % HashRange is the offset of the metadata.
For example, suppose the metadata in one group is divided into 8 Shards (Shard0, Shard1, Shard2, ……, Shard7) with the organization ID as the fragmentation key value, that is, numPrimaryShards is 8. Because the order volumes of the organizations differ greatly, the metadata is unevenly distributed across the Shards. A UUID is selected as the secondary hash key value and the hash width is 3, so all the metadata in the group is hashed and then re-fragmented across 3 Shards. If the fragment where a piece of metadata is located before re-fragmentation is Shard2 (hash(_partitionkey) % 8 = 2), and the offset hash(_secondhashkey) % 3 is a random number between 0 and 2, then the fragment after re-fragmentation is Shard2 + offset, which is the fragment position where the metadata is stored.
By hashing the data in the group and then re-fragmenting it, the data is distributed more uniformly across the new fragments, the number of fragments is appropriately reduced, and retrieval efficiency can be improved.
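The secondary-hash formula can be illustrated with the sketch below, which follows the formula literally. The hash function (`zlib.crc32`), the helper names, and the sample organization ID are assumptions. The text does not state whether the sum wraps around when the base fragment plus the offset exceeds the last fragment number, so the sketch leaves the sum unwrapped.

```python
import uuid
import zlib


def _hash(value: str) -> int:
    """Deterministic stand-in for the unspecified hash function."""
    return zlib.crc32(value.encode("utf-8"))


def reshard(partition_key, second_hash_key, num_primary_shards, hash_range):
    """ShardA = hash(partition key) % numPrimaryShards + hash(second key) % HashRange."""
    base = _hash(partition_key) % num_primary_shards  # fragment before re-fragmentation
    offset = _hash(second_hash_key) % hash_range      # offset within the hash width
    return base + offset


# One busy organization: 1000 records, each with a random UUID as secondary key.
base = _hash("org-7") % 8
doc_ids = [str(uuid.uuid4()) for _ in range(1000)]
targets = {reshard("org-7", doc_id, 8, 3) for doc_id in doc_ids}
```

All records of the organization now land in at most three consecutive fragments starting at `base`, which is why a query for that organization has to visit up to `HashRange` fragments rather than one.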
In the embodiment of the present invention, as shown in fig. 3, determining the grouping identifier and the fragment identifier of the target data corresponding to the query entry includes:
step S301: determining an index field corresponding to the query entry from the secondary index model according to the query entry;
step S302: determining a grouping field and a fragmentation key value corresponding to the query entry according to the index field;
step S303: and determining the grouping identification and the fragment identification of the target data corresponding to the query entry according to the corresponding grouping field and the fragment key value.
According to the query entry parameter, the index field corresponding to the query entry parameter is determined from the secondary index model, and the grouping field (such as the order time) and the fragmentation key value (such as the organization ID) are then determined. From the query entry parameter and its corresponding grouping field and fragmentation key value, it can be determined in which fragment of which group the target data is stored; that is, the grouping identifier and fragment identifier of the target data are determined. The grouping identifier and fragment identifier of the target data are encapsulated into Routinginfo (an entity encapsulating the grouping identifier and the fragment identifier), and the data storage range of the target data is then determined according to the grouping identifier and the fragment identifier. For example, if the secondary index model groups by year and then fragments by organization ID, the order time and the organization ID are determined from the query entry parameter; the group is then determined from the order time, and the fragment is determined from the organization ID.
After a query request is received, the query entry parameter (QueryParam) is identified through a default initialization strategy, and it is judged whether the query entry parameter contains the grouping field and the fragmentation key value of the secondary index table. If so, the grouping identifier and fragment identifier corresponding to the query entry parameter are determined according to the grouping field and the fragmentation key value and encapsulated into Routinginfo, and the data storage range of the target data is determined from the metadata table according to the grouping identifier and the fragment identifier.
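The routing step can be sketched as follows, assuming the same year-based groups and organization-ID fragments as in the earlier example. The `RoutingInfo` class, the field names, the `zlib.crc32` hash, and returning `None` when the routing fields are absent are all assumptions for this illustration; the text does not describe what happens when the query entry parameter lacks the grouping field or fragmentation key value.

```python
import zlib
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

GROUPING_FIELD = "order_time"        # assumed grouping field
PARTITION_FIELD = "organization_id"  # assumed fragmentation key value
NUM_SHARDS = 16                      # assumed fragments per group


@dataclass(frozen=True)
class RoutingInfo:
    """Entity encapsulating the grouping identifier and the fragment identifier."""
    group_id: str
    shard_id: int


def resolve_routing(query_params: dict) -> Optional[RoutingInfo]:
    """Build RoutingInfo if the query carries both routing fields, else None."""
    if GROUPING_FIELD not in query_params or PARTITION_FIELD not in query_params:
        return None  # assumption: caller falls back to an unrouted search
    year = query_params[GROUPING_FIELD].year
    shard = zlib.crc32(query_params[PARTITION_FIELD].encode("utf-8")) % NUM_SHARDS
    return RoutingInfo(group_id=f"index_{year}", shard_id=shard)


info = resolve_routing({"order_time": datetime(2020, 6, 1), "organization_id": "org-7"})
fallback = resolve_routing({"commodity_name": "kettle"})
```

The first query resolves to one fragment of the group index_2020, while the second carries neither routing field and yields no routing information.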
In the embodiment of the invention, to improve retrieval efficiency, a limit (the set threshold) is placed on the number of fragmentation key values determined from the query entry. However, when the number of determined fragmentation key values exceeds the set threshold, part of the data may be missed in retrieval. Therefore, balancing user experience against retrieval efficiency for the service scenario, the user's retrieval range can be restricted at the front end, for example by not allowing cross-organization retrieval. Alternatively, after determining the grouping field and the fragmentation key value corresponding to the query entry, and before determining the grouping identifier and the fragment identifier of the target data corresponding to the query entry, the method includes: judging whether the number of fragmentation key values exceeds the set threshold; if not, retrieving the target data from the metadata table according to the grouping identifier and the fragment identifier; if so, resetting the Shards in the Routinginfo and configuring multi-Shard or full-Shard retrieval to avoid data loss and improve user experience.
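A sketch of the threshold check is given below, assuming the full-Shard fallback. The function name, the `zlib.crc32` hash, and the concrete numbers are hypothetical; the multi-Shard alternative mentioned in the text is not shown.

```python
import zlib


def shards_to_search(partition_keys, num_shards, max_keys):
    """Return the fragment numbers a query should visit.

    If the query names more fragmentation key values than max_keys, fall back
    to searching every fragment so that no data is missed.
    """
    if len(partition_keys) > max_keys:
        return list(range(num_shards))
    return sorted({zlib.crc32(key.encode("utf-8")) % num_shards for key in partition_keys})


narrow = shards_to_search(["org-7"], num_shards=16, max_keys=3)
wide = shards_to_search(["org-1", "org-2", "org-3", "org-4"], num_shards=16, max_keys=3)
```

The single-organization query visits one fragment, while the four-organization query exceeds the assumed threshold of 3 and is widened to all 16 fragments.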
Fig. 4 is a flowchart illustrating a method for data management according to an embodiment of the present invention, wherein the secondary index model includes M packets, and the packet identifiers are es.index #0, es.index #1, … …, es.index # i, … …, es.index # M, and es.index # i includes N slices, and the slice identifiers are Shard0, Shard1, … …, Shardi, … …, and ShardN. When a user sends a Query (Query request) to an application layer (such as a Client), wherein the Query request comprises a Query entry, the application layer calls an ES Client (ES Client), determines that a packet identifier corresponding to the Query entry is es.index # i and a corresponding fragment identifier is Shardj according to a secondary index model, determines a data storage range of target data corresponding to the Query entry according to the packet identifier and the fragment identifier corresponding to the Query entry, that is, determines that the target data is stored in Shardj of the es.index # i, then acquires the target data from a metadata table according to the data storage range, and returns the target data to the user, thereby completing a Query retrieval process.
According to the data management method provided by the embodiment of the invention, a secondary index model is constructed, and the grouping identifier and the fragment identifier of the target data corresponding to the query entry are determined from the query entry, so that the data storage range of the target data is determined. The method can improve data retrieval efficiency when the data volume is large, the retrieval range is wide and the retrieval scenario is complex, thereby improving the user experience.
According to another aspect of the embodiments of the present invention, as shown in fig. 5, there is provided an apparatus 500 for data management, including:
an obtaining module 501, configured to obtain a query request for the metadata table, where the query request includes a query entry;
a query module 502, configured to determine the grouping identifier and the fragment identifier of the target data corresponding to the query entry according to the query entry and the constructed secondary index model;
and a determining module 503, configured to determine the data storage range of the target data according to the grouping identifier and the fragment identifier.
In this embodiment of the present invention, the apparatus 500 for data management further includes: the building module is used for screening out index fields from each field of the metadata table and building a secondary index table according to the index fields; and grouping the metadata in the metadata table according to the index fields in the secondary index table, and fragmenting the metadata in each group to construct a secondary index model.
In an embodiment of the present invention, the building module is further configured to: screening an index field from the index fields of the secondary index table as a grouping field, and screening a field from the index fields as a fragmentation key value; grouping the metadata according to the grouping field to obtain one or more groups; fragmenting the metadata in each group according to the fragmentation key values to obtain one or more fragments; and storing the grouping identification of the grouping of each piece of metadata and the fragment identification of the fragment.
In an embodiment of the present invention, the building module is further configured to: storing the storage path information for each piece of metadata.
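The construction steps performed by the building module (choosing a grouping field and a fragmentation key, grouping, fragmenting, then storing the identifiers and the storage path) might look roughly like the sketch below. The function name `build_secondary_index`, the `group_id/shard_id` path format, and the CRC32-based fragmenting are hypothetical choices for the example, not details taken from the embodiment.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_secondary_index(
    rows: Iterable[dict], group_field: str, shard_field: str,
    shards_per_group: int,
) -> Tuple[Dict[str, str], Dict[str, List[dict]]]:
    """Group rows by group_field, fragment each group by shard_field.

    Returns (grouping value -> grouping identifier, storage path -> rows).
    """
    group_ids: Dict[str, str] = {}
    by_path: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        # One group per distinct value of the grouping field.
        group_value = str(row[group_field])
        group_id = group_ids.setdefault(group_value,
                                        f"es.index #{len(group_ids)}")
        # Fragment within the group by hashing the fragmentation key value.
        key = str(row[shard_field]).encode("utf-8")
        shard_id = f"Shard{zlib.crc32(key) % shards_per_group}"
        # Store both identifiers and an (assumed) storage path with the row.
        path = f"{group_id}/{shard_id}"
        by_path[path].append(
            {**row, "group_id": group_id, "shard_id": shard_id, "path": path})
    return group_ids, dict(by_path)
```

Keeping the grouping identifier, fragment identifier and path on every stored row means a later query can be answered from the index alone, without scanning the other groups.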
In an embodiment of the present invention, the building module is further configured to: before storing the storage path information of each piece of metadata, for any group containing a plurality of fragments, determine whether the difference or ratio between the amounts of metadata in any two fragments of the group exceeds a preset threshold; and when the difference or ratio exceeds the preset threshold, hash all the metadata in the group, re-fragment it, and store the storage path information of all the metadata in the group after re-fragmentation.
In an embodiment of the present invention, the building module is further configured to: perform a secondary hash on each piece of metadata according to its fragment identifier before re-fragmentation, determine its fragment identifier after re-fragmentation, and re-fragment all the metadata in the group according to the result of the secondary hash.
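A minimal sketch of the skew check and the secondary hash is given below. It uses the ratio variant of the threshold, and it assumes that the secondary hash combines the old fragment number with the metadata key through SHA-256; the embodiment does not specify a hash function, so the names `is_skewed` and `rehash_group` and the hashing details are illustrative only.

```python
import hashlib
from typing import Dict, List


def is_skewed(shards: Dict[int, List[str]], max_ratio: float) -> bool:
    """True if some pair of shards differs in size by more than max_ratio."""
    sizes = [len(keys) for keys in shards.values()]
    if not sizes:
        return False
    largest, smallest = max(sizes), min(sizes)
    if smallest == 0:
        # An empty shard next to a non-empty one counts as skewed.
        return largest > 0
    # The largest/smallest pair gives the worst ratio over all pairs.
    return largest / smallest > max_ratio


def rehash_group(shards: Dict[int, List[str]],
                 num_shards: int) -> Dict[int, List[str]]:
    """Re-fragment every key of a group using a secondary hash."""
    result: Dict[int, List[str]] = {i: [] for i in range(num_shards)}
    for old_id, keys in shards.items():
        for key in keys:
            # Secondary hash: the old fragment number is part of the input.
            digest = hashlib.sha256(f"{old_id}:{key}".encode("utf-8")).digest()
            new_id = int.from_bytes(digest[:8], "big") % num_shards
            result[new_id].append(key)
    return result
```

In use, `rehash_group` would be called only when `is_skewed` returns true, and the storage path information would then be written from the returned layout. A single pass of rehashing does not guarantee balance, so a real system might re-check the result.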
In this embodiment of the present invention, the query module 502 is further configured to: determine an index field corresponding to the query entry from the secondary index model according to the query entry, and determine a grouping field and a fragmentation key value corresponding to the query entry according to the index field; and determine the grouping identifier and the fragment identifier of the target data corresponding to the query entry according to the corresponding grouping field and fragmentation key value.
According to still another aspect of embodiments of the present invention, there is provided an electronic device including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the data management method of the embodiment of the invention.
Yet another aspect of the embodiments of the present invention provides a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing a method of data management of an embodiment of the present invention.
Fig. 6 shows an exemplary system architecture 600 of a data management apparatus or method to which embodiments of the invention may be applied.
As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 serves to provide a medium for communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages or the like. The terminal devices 601, 602, 603 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 605 may be a server providing various services, such as a background management server (for example only) that supports shopping websites browsed by users using the terminal devices 601, 602, 603. The background management server may analyze and otherwise process received data such as the query request, and feed back target data corresponding to the query request (for example, product information) to the terminal device.
It should be noted that the method for data management provided by the embodiment of the present invention is generally executed by the server 605, and accordingly, the apparatus for data management is generally disposed in the server 605.
It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Reference is now made to the schematic diagram of fig. 7. The terminal device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 701.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor including an obtaining module, a query module, and a determining module. The names of these modules do not in some cases constitute a limitation on the modules themselves; for example, the obtaining module may also be described as a "module that obtains a query request for a metadata table".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: acquiring a query request for a metadata table, wherein the query request comprises a query entry; determining a grouping identifier and a fragment identifier of target data corresponding to the query entry according to the query entry and the constructed secondary index model; and determining the data storage range of the target data according to the grouping identifier and the fragment identifier.
According to the technical scheme of the embodiment of the invention, the secondary index model constructed from the metadata table gives each piece of metadata in the metadata table a corresponding grouping identifier and fragment identifier. The query entry is acquired from the query request and combined with the secondary index model to obtain the grouping identifier and the fragment identifier of the target data corresponding to the query entry, from which the storage range of the target data is obtained. The data management method of the embodiment of the invention can address the low ES retrieval efficiency that arises when the data volume is large, the service retrieval range is wide and the query scenario is complex, enabling fast data retrieval that meets service requirements and improves both retrieval efficiency and user experience.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of data management, comprising:
acquiring a query request for a metadata table, wherein the query request comprises a query entry;
determining a grouping identifier and a fragment identifier of target data corresponding to the query entry according to the query entry and a constructed secondary index model;
and determining the data storage range of the target data according to the grouping identifier and the fragment identifier.
2. The method of claim 1, wherein before determining the grouping identifier and the fragment identifier of the target data corresponding to the query entry according to the query entry and the constructed secondary index model, the method further comprises:
screening index fields from each field of the metadata table, and constructing a secondary index table according to the index fields;
and grouping the metadata in the metadata table according to the index fields in the secondary index table, and fragmenting the metadata in each group to construct the secondary index model.
3. The method of claim 2, wherein grouping the metadata in the metadata table according to the index fields in the secondary index table and fragmenting the metadata within each group comprises:
screening an index field from the index fields of the secondary index table as a grouping field, and screening a field from the index fields as a fragmentation key value;
grouping the metadata according to the grouping field to obtain one or more groups;
fragmenting the metadata in each group according to the fragmentation key value to obtain one or more fragments;
and storing the grouping identifier of the group and the fragment identifier of the fragment corresponding to each piece of metadata.
4. The method according to claim 3, further comprising, after storing the grouping identifier of the group and the fragment identifier of the fragment corresponding to each piece of metadata:
storing the storage path information for each piece of metadata.
5. The method according to claim 4, wherein before storing the storage path information of each piece of metadata, for any group including a plurality of fragments, the following is performed:
determining whether a difference or ratio between the amounts of metadata in any two fragments of the group exceeds a preset threshold,
and when the preset threshold is exceeded, hashing all the metadata in the group, re-fragmenting it, and storing the storage path information of all the metadata in the group after re-fragmentation.
6. The method of claim 5, wherein hashing and re-fragmenting all the metadata in the group comprises:
performing a secondary hash on each piece of metadata according to its fragment identifier before re-fragmentation, determining its fragment identifier after re-fragmentation, and re-fragmenting all the metadata in the group according to the result of the secondary hash.
7. The method of claim 3, wherein determining the grouping identifier and the fragment identifier of the target data corresponding to the query entry comprises:
determining an index field corresponding to the query entry from the secondary index model according to the query entry, and determining the grouping field and the fragmentation key value corresponding to the query entry according to the index field;
and determining the grouping identifier and the fragment identifier of the target data corresponding to the query entry according to the corresponding grouping field and fragmentation key value.
8. An apparatus for data management, comprising:
an obtaining module, configured to obtain a query request for a metadata table, wherein the query request comprises a query entry;
a query module, configured to determine a grouping identifier and a fragment identifier of target data corresponding to the query entry according to the query entry and a constructed secondary index model;
and a determining module, configured to determine the data storage range of the target data according to the grouping identifier and the fragment identifier.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202110660666.XA 2021-06-15 2021-06-15 Data management method and device Pending CN113312355A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110660666.XA CN113312355A (en) 2021-06-15 2021-06-15 Data management method and device

Publications (1)

Publication Number Publication Date
CN113312355A true CN113312355A (en) 2021-08-27

Family

ID=77378730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110660666.XA Pending CN113312355A (en) 2021-06-15 2021-06-15 Data management method and device

Country Status (1)

Country Link
CN (1) CN113312355A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination