CN115730011A

CN115730011A - Data storage method, device and equipment of fragment type cluster

Info

Publication number: CN115730011A
Application number: CN202211504085.8A
Authority: CN
Inventors: 刘雷; 豆超平
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2022-11-28
Filing date: 2022-11-28
Publication date: 2023-03-03

Abstract

The embodiments of the invention provide a data storage method, apparatus and device for a sharded cluster. Before data to be stored is written into the shard set, the data and its unique field value are acquired, and it is determined whether that unique field value is recorded in the replica set. If it is, data with the same unique field value has already been stored in (or is being stored into) the shard set, so storing the data to be stored in the shard set can be refused. This effectively reduces cases in which the same data is stored multiple times and improves the reliability of data storage.

Description

Data storage method, device and equipment of fragment type cluster

Technical Field

The invention relates to the field of computer technology, and in particular to a data storage method, a data storage apparatus and a data storage device for a sharded cluster.

Background

A MongoDB sharded cluster uses a sharding mechanism to distribute the data in a database across different storage devices, achieving distributed storage and improving load balancing. The sharding mechanism works as follows: shard keys for the different shards are determined according to a sharding rule, and, using the determined shard key as the index basis, data belonging to the same shard key is stored in the shard represented by that shard key.

The hash sharding rule is widely used in MongoDB sharded clusters because it distributes the data in a database evenly. Under the hash sharding rule, a hash function computes the hash value of the shard key field, this hash value serves as the index basis, and the data to be stored is stored in the shard represented by that index basis.

However, a hash function does not map inputs to outputs one-to-one: different inputs may produce the same output. Under the hash sharding rule, different shard key field values may therefore yield the same hash value. On receiving data to be stored, the sharded cluster stores it directly in the shard represented by the hash value of its shard key field, without checking for duplicates. In a high-concurrency scenario, if multiple requests ask to store the same data at the same time, multiple identical records are easily created in the sharded cluster, which affects the reliability of the system's data storage.

Disclosure of Invention

The embodiments of the invention aim to provide a data storage method, apparatus and device for a sharded cluster, so as to ensure the reliability of its data storage. The specific technical solutions are as follows:

In a first aspect, an embodiment of the invention provides a data storage method for a sharded cluster, where the sharded cluster includes a shard set and a replica set, the replica set records the unique field values of all data stored in the shard set, and the method includes:

acquiring data to be stored and a unique field value of the data to be stored;

determining whether the unique field value is recorded in the replica set;

if the unique field value is recorded, refusing to store the data to be stored in the shard set.

With reference to the first aspect, in a second possible embodiment, the method further includes:

determining whether the unique field value already exists in the shard set;

if it exists, storing the data to be stored in the target shard corresponding to the unique field value;

if it does not exist, performing the step of determining whether the unique field value is recorded in the replica set.

With reference to the first aspect, in a third possible embodiment, the method further includes:

if the unique field value is not recorded, creating unique index information for the unique field value;

recording the unique field value into the replica set, using the unique index information as the routing basis.

With reference to the third possible embodiment of the first aspect, in a fourth possible embodiment, the method further includes:

if the unique index information satisfies a preset invalidation condition, deleting the unique index information and the unique field value from the replica set.

In a second aspect, an embodiment of the invention further provides a data storage apparatus for a sharded cluster, where the sharded cluster includes a shard set and a replica set, the replica set records the unique field values of all data stored in the shard set, and the apparatus includes:

an acquisition module, configured to acquire data to be stored and a unique field value of the data to be stored;

a first determining module, configured to determine whether the unique field value is recorded in the replica set;

a first execution module, configured to refuse to store the data to be stored in the shard set if the unique field value is recorded.

With reference to the second aspect, in a second possible embodiment, the apparatus further includes:

a second determining module, configured to determine whether the unique field value already exists in the shard set;

a second execution module, configured to store the data to be stored in the target shard corresponding to the unique field value if the unique field value exists;

a third execution module, configured to perform the step of determining whether the unique field value is recorded in the replica set if the unique field value does not exist.

With reference to the second aspect, in a third possible embodiment, the apparatus further includes:

an index information creation module, configured to create unique index information for the unique field value if the unique field value is not recorded;

With reference to the third possible embodiment of the second aspect, in a fourth possible embodiment, the apparatus further includes:

a fourth execution module, configured to delete the unique index information and the unique field value from the replica set if the unique index information satisfies a preset invalidation condition.

In a third aspect, an embodiment of the invention provides an electronic device including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

the processor, configured to implement the data storage method for a sharded cluster according to the first aspect when executing the program stored in the memory.

In a fourth aspect, an embodiment of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data storage method for a sharded cluster according to the first aspect.

The embodiment of the invention has the following beneficial effects:

The embodiments of the invention provide a data storage method, apparatus and device for a sharded cluster.

With the embodiments provided by the invention, the replica set, unlike the shard set, guarantees that all data stored in it are unique. Before the data to be stored is written into the shard set, the data and its unique field value are acquired, and it is determined whether that unique field value exists in the replica set. If it exists, data with that unique field value has already been stored in the shard set or is in the process of being stored there, so storing the data to be stored in the shard set can be refused. This effectively reduces repeated storage of the same data in the shard set and improves the reliability of data storage.

Drawings

To illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below.

FIG. 1a is a schematic diagram of a MongoDB database;

FIG. 1b is a schematic diagram of the data storage logic for a MongoDB shard collection;

FIG. 2 is a schematic flowchart of a first data storage method for a sharded cluster according to an embodiment of the invention;

FIG. 3 is a schematic flowchart of a second data storage method for a sharded cluster according to an embodiment of the invention;

FIG. 4 is a schematic structural diagram of a data storage apparatus for a sharded cluster according to an embodiment of the invention;

FIG. 5 is a schematic block diagram of an electronic device according to an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by one skilled in the art based on the embodiments of the present invention, are within the scope of the present invention.

The MongoDB database is a cross-platform, document-oriented database belonging to the family of document-type non-relational databases. A MongoDB database contains sets (collections, in MongoDB terminology); a set is equivalent to a table in a relational database, documents are stored in sets, and each document is equivalent to one row of data in a relational database. For example, data storage in a MongoDB database may be as shown in FIG. 1a: one MongoDB database contains set 1 and set 2, documents 1 and 2 are stored in set 1, and documents 3 and 4 are stored in set 2. Documents in MongoDB have a very loose structure with no fixed rows and columns, and are usually stored in BSON (Binary JSON), a binary format similar to JSON (JavaScript Object Notation). MongoDB can therefore store documents of any structure, which lets it hold more complex data types.

In the related art, a MongoDB database is deployed in one of three cluster modes: master-slave clusters, replica set clusters, and sharded clusters. A sharded cluster uses a sharding mechanism to distribute the data in a database across different storage devices. The sharding mechanism works as follows: according to the shard key field specified for the shard set, the MongoDB sharding rule routes data to different shards for storage.

For example, as shown in FIG. 1b, assume the sharded cluster contains one shard set with three shards: a first shard, a second shard and a third shard. In this shard set, data is routed using age as the shard key field: data with ages 0-25 is routed to the first shard, data with ages 26-50 to the second shard, and data with ages 51-75 to the third shard. When data to be stored is received, it is stored in the corresponding shard of the shard set, using the value of its shard key field (the age field) as the routing basis. That is, as shown in FIG. 1b, if the age field of the first data equals 20, the first data is stored directly in the first shard of the shard set with 20 as the routing basis.
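The range rule in the FIG. 1b example can be sketched as a small lookup. This is a minimal illustration, assuming inclusive integer bounds and the hypothetical names `AGE_RANGES` and `route_by_age`; MongoDB itself manages shard ranges as chunks, which this sketch does not model.

```python
# Hypothetical range-sharding table from the FIG. 1b example:
# each entry is (lowest age, highest age, shard name), bounds inclusive.
AGE_RANGES = [
    (0, 25, "first shard"),
    (26, 50, "second shard"),
    (51, 75, "third shard"),
]


def route_by_age(age: int) -> str:
    """Return the shard whose age range contains the shard key value."""
    for low, high, shard in AGE_RANGES:
        if low <= age <= high:
            return shard
    raise ValueError(f"no shard covers age {age}")


print(route_by_age(20))  # first shard
```

A value outside every range (for example 80) raises an error here; how such values are handled in practice is outside the scope of the example.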

Rules commonly used by MongoDB to determine shards include range rules and hash rules. Determining shards by age ranges, as above, is a range rule. Shards determined by a range rule easily become unbalanced: some shards store a large amount of data while others store only a little, so data distribution across the whole shard set is uneven.

Because the hash rule distributes data evenly across the shards of a MongoDB sharded cluster, the MongoDB database can store large amounts of data and handle heavier loads without relying on a single powerful machine, and the hash rule is therefore widely used in MongoDB sharded clusters.

Under the hash rule, a hash function computes the hash value of the shard key field of the data to be stored as the routing basis of the data, and the data is then stored in the target shard corresponding to that hash value. The shard key field may consist of one or more sub-fields of the uniqueness field of the data to be stored.
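A simplified sketch of hash routing follows. It assumes a hypothetical shard key built from the first two underscore-separated sub-fields of the uniqueness field, uses MD5 from Python's `hashlib`, and maps the hash to a shard with a modulo. MongoDB's own hashed sharding assigns hash ranges to chunks instead, so treat this only as an illustration of the idea; `shard_key` and `route_by_hash` are hypothetical names.

```python
import hashlib

NUM_SHARDS = 3  # assumed shard count, as in the three-shard example


def shard_key(unique_value: str, parts: int = 2) -> str:
    """Hypothetical shard key: the first `parts` underscore-separated
    sub-fields of the uniqueness field value."""
    return "_".join(unique_value.split("_")[:parts])


def route_by_hash(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash the shard key field and map the hash value to a shard index."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned 64-bit integer.
    return int.from_bytes(digest[:8], "big") % num_shards


key = shard_key("video1_epsilon1_20010102")
print(key, route_by_hash(key))  # the shard index depends on the digest
```

With only three shards, any four distinct shard keys must share at least one shard, so the routing result alone cannot tell distinct data apart; uniqueness has to come from somewhere else.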

However, because a hash function may produce the same output for different inputs, the hash values computed for two different shard key field values may be identical, and the shard set itself provides no uniqueness guarantee. Consequently, when the shard that stores data is determined by the hash rule, data uniqueness cannot be guaranteed. This is especially true in services with high write concurrency, where a very large amount of data is processed simultaneously and the same piece of data is easily stored several times. For example, when multiple requests simultaneously ask to store the same piece of data, that data may be stored as multiple documents, so its uniqueness cannot be guaranteed and the reliability of the system's data storage suffers.
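The duplicate-write problem can be reproduced without a real cluster. The sketch below replays, deterministically and in a single thread, one interleaving of two requests that each check for the data before either has written it; `shard` is a plain list standing in for a shard with no uniqueness guarantee, and all names are hypothetical.

```python
def unsafe_store(shard: list, doc: dict) -> None:
    """Append unconditionally, as a shard with no uniqueness check does."""
    shard.append(doc)


def already_stored(shard: list, uid: str) -> bool:
    """A request's own pre-write check against the shard."""
    return any(d["uid"] == uid for d in shard)


shard = []
doc = {"uid": "video1_epsilon1_20010102"}

# Interleaving: request A checks, request B checks, then both write.
a_sees_it = already_stored(shard, doc["uid"])  # False: nothing written yet
b_sees_it = already_stored(shard, doc["uid"])  # False: A has not written yet
if not a_sees_it:
    unsafe_store(shard, dict(doc))
if not b_sees_it:
    unsafe_store(shard, dict(doc))

print(len(shard))  # 2: the same data is now stored twice
```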

The invention aims to solve the problem of low data reliability caused by non-unique documents when documents are written concurrently in a sharded cluster that uses the hash rule. An embodiment of the invention provides a data storage method for a sharded cluster in which a replica set is deployed in the sharded cluster and records the unique field values of all data recorded in the shard set.

In the embodiments of the invention, the sharded cluster may be a data storage system composed of multiple devices. In that case, a shard in the sharded cluster may be a server or another device with storage resources in the system, or a storage unit such as a hard disk or memory of a device in the system. The replica set may likewise be a server or a device with storage resources in the system, or a storage unit formed by one or more devices in the system.

In the embodiments of the invention, the method may be executed by a management device (or node) in the sharded cluster, or by a forwarding device (or node) responsible for receiving, forwarding or delivering the data to be stored to the shard set so that the shard set stores it. The executing entity may also be any device in the sharded cluster that supports data message forwarding, or a processor in the sharded cluster capable of processing data messages; the invention does not limit its specific form.

Referring to FIG. 2, a data storage method for a sharded cluster according to an embodiment of the invention includes:

S21: acquiring data to be stored and a unique field value of the data to be stored;

S22: determining whether the unique field value of the data to be stored is recorded in the replica set;

S23: if it is recorded, refusing to store the data to be stored in the shard set.
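Steps S21 to S23 can be sketched in a few lines. This is a single-threaded model under stated assumptions: the replica set is a Python `set` of unique field values, the shard set is a list, and the class and method names are hypothetical. The membership test and the later `add` are two separate operations here, so under real concurrency they would need to be made atomic, which is what a unique index on the replica set provides.

```python
from dataclasses import dataclass, field


@dataclass
class ShardedClusterSketch:
    """Hypothetical in-memory model of the FIG. 2 flow."""
    replica_set: set = field(default_factory=set)   # recorded unique field values
    shard_set: list = field(default_factory=list)   # stored documents

    def store(self, data: dict, unique_value: str) -> bool:
        # S21: the data and its unique field value arrive together.
        # S22: is the unique field value already recorded in the replica set?
        if unique_value in self.replica_set:
            # S23: recorded, so refuse to store the data in the shard set.
            return False
        # Not recorded: record the value first, then write the data.
        self.replica_set.add(unique_value)
        self.shard_set.append(data)
        return True


cluster = ShardedClusterSketch()
print(cluster.store({"title": "episode 1"}, "video1_epsilon1_20010102"))  # True
print(cluster.store({"title": "episode 1"}, "video1_epsilon1_20010102"))  # False
```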

With this embodiment, the replica set guarantees the uniqueness of the data stored in it. Before the data to be stored is written into the shard set, the data and its unique field value are acquired, and it is determined whether that unique field value exists in the replica set. If it exists, data with that unique field value has already been stored in the shard set or is in the process of being stored there, so the data to be stored can be refused. This effectively reduces repeated storage of the same data and improves the reliability of data storage.

To explain the method more clearly, steps S21 to S23 are described in turn below:

In step S21, the unique field value of the data to be stored is the value of its uniqueness field. The uniqueness field is field information that identifies the data uniquely. It may be attribute information of the data, or a unique ID created by the system from that attribute information combined with other preset information.

Taking attribute information as the uniqueness field as an example: the attribute information exists when the data itself is generated and represents the data's uniqueness, much like an ID card number, so whether the data to be stored is the same as other data can be judged from its attribute information.

For example, if the attribute information of the first data to be stored is video1_epsilon1_20010102, indicating that the first data is a video of the first episode of a TV series recorded on January 2, 2001, and the attribute information of the second data to be stored is also video1_epsilon1_20010102, then the second data is the same as the first. If the attribute information of the third data to be stored is video1_epsilon2_20010102, which differs from that of the first, the third data is not the same as the first.

In an embodiment of the invention, the uniqueness field may be the original attribute text of the data, such as video1_epsilon1. It may also be a numeric string obtained by converting that original attribute text; for example, the text video1_epsilon1 may be converted into the corresponding binary string using ASCII codes. Likewise, the shard key field may be an original text field or a numeric string obtained by converting one. The invention does not limit the specific representation of the uniqueness field or the shard key field.
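The ASCII-based conversion mentioned above can be sketched as follows, assuming each character becomes its 8-bit binary code and the codes are simply concatenated. The function name is hypothetical, and the text does not specify padding or separators.

```python
def to_binary_string(text: str) -> str:
    """Convert ASCII attribute text into a concatenated string of
    8-bit binary codes, one per character."""
    # encode("ascii") raises UnicodeEncodeError for non-ASCII input.
    return "".join(format(byte, "08b") for byte in text.encode("ascii"))


print(to_binary_string("v1"))  # 0111011000110001
```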

Accordingly, in one possible case, step S21 acquires the data to be stored and its unique field value simultaneously. Specifically, the two may be packaged in the same data packet, so that acquiring the packet yields both at once.

In another possible case, step S21 first acquires the data to be stored, then obtains its attribute information, and then determines the unique field value from that attribute information; that is, the data and its unique field value are acquired at different times. The invention does not limit the order in which they are acquired.
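The two acquisition orders can be sketched as two small helpers. The packet keys (`data`, `unique_value`) and the attribute names (`series`, `episode`, `recorded_on`) are hypothetical; they are chosen only so that the derived value reproduces the `video1_epsilon1_20010102` example.

```python
def acquire_from_packet(packet: dict) -> tuple:
    """Simultaneous acquisition: one packet carries both the data and
    its unique field value."""
    return packet["data"], packet["unique_value"]


def derive_unique_value(data: dict) -> str:
    """Time-shared acquisition: derive the unique field value from the
    data's attribute information after the data has arrived."""
    return "_".join([data["series"], data["episode"], data["recorded_on"]])


video = {"series": "video1", "episode": "epsilon1", "recorded_on": "20010102"}
print(derive_unique_value(video))  # video1_epsilon1_20010102
```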

In the MongoDB database, a set in a replica set cluster (a replica set) works differently from a set in a sharded cluster (a shard set). A shard set that uses the hash rule provides no uniqueness guarantee: when data is received for storage, no mechanism verifies its uniqueness, so the shard set may hold several duplicate records of the same data. In a replica set, MongoDB does provide a uniqueness guarantee in the form of unique index information created on the replica set. When data is stored in the replica set, the unique index checks whether the value of the data's uniqueness field duplicates the value of any data already recorded there; if not, the data is stored, and if so, storage in the replica set is refused. The replica set can thus guarantee that the unique field values of all data it stores are unique.
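The replica set's unique index is what turns "check, then record" into a single step. The sketch below imitates that behavior in memory under assumptions: a dictionary plays the role of the unique index, a lock makes the check and the insert atomic, and `DuplicateKeyError` is a locally defined exception named after the error pymongo raises for duplicate keys (MongoDB reports an E11000 duplicate key error). The class and function names are hypothetical. Ten threads try to record the same value and exactly one succeeds.

```python
import threading


class DuplicateKeyError(Exception):
    """Raised when a value would violate the simulated unique index."""


class ReplicaSetCollection:
    """In-memory stand-in for a replica-set collection with a unique
    index on the uniqueness field."""

    def __init__(self):
        self._index = {}               # unique field value -> document
        self._lock = threading.Lock()  # makes check-and-insert atomic

    def insert_one(self, doc: dict, unique_field: str = "uid") -> None:
        key = doc[unique_field]
        with self._lock:
            if key in self._index:
                raise DuplicateKeyError(key)
            self._index[key] = doc

    def __len__(self) -> int:
        return len(self._index)


def concurrent_records(coll: ReplicaSetCollection, value: str, workers: int = 10) -> int:
    """Have `workers` threads record the same value; return how many succeeded."""
    successes = []

    def worker():
        try:
            coll.insert_one({"uid": value})
            successes.append(True)
        except DuplicateKeyError:
            pass  # another request already recorded this value

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(successes)


coll = ReplicaSetCollection()
print(concurrent_records(coll, "video1_epsilon1_20010102"))  # 1
```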

In the embodiments of the invention, a replica set is introduced into a sharded cluster that uses the hash rule, and it records the unique field values of the data to be stored in the shard set. In a high-concurrency write scenario, when multiple requests simultaneously ask to store the same data, the replica set's uniqueness guarantee is used to check whether the same unique field value is already recorded there. If such a record exists, another request is already storing the data; if not, no other request is storing it.

In one possible embodiment, when the unique field value of the data to be stored is not yet recorded, the replica set receives the value to be recorded, allocates storage space for it, generates a storage path for it (that is, the unique index information), and stores the value in the corresponding storage space according to that path. In this case, step S22 may traverse the unique field values stored in the replica set to find whether any equals the unique field value of the data to be stored, and thereby determine whether that value is recorded.

Alternatively, in another possible embodiment, step S22 may query the index table of the replica set for unique index information corresponding to the unique field value; if such information exists, the unique field value of the data to be stored is recorded.
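The two ways of performing S22 described above, traversal and index lookup, differ mainly in cost. A minimal sketch, assuming the recorded values are kept in a list for traversal and in a dictionary that maps each unique field value to a hypothetical storage path (the unique index information) for lookup:

```python
def recorded_by_traversal(recorded_values: list, unique_value: str) -> bool:
    """First approach: compare against every stored value (linear time)."""
    return any(value == unique_value for value in recorded_values)


def recorded_by_index(index_table: dict, unique_value: str) -> bool:
    """Second approach: look the value up in the index table
    (average constant time for a hash table)."""
    return unique_value in index_table


values = ["video1_epsilon1_20010102"]
index_table = {"video1_epsilon1_20010102": "replica/0001"}  # hypothetical path
print(recorded_by_traversal(values, "video1_epsilon2_20010102"))  # False
print(recorded_by_index(index_table, "video1_epsilon1_20010102"))  # True
```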

In step S22, if the unique field value of the data to be stored is not recorded in the replica set, the data to which it belongs has not yet been saved in the shard set.

If the unique field value of the data to be stored is recorded in the replica set, the data to which it belongs has already been stored in (or is being stored into) the shard set, and step S23 may therefore refuse to store the data to be stored in the shard set.

In one possible embodiment, refusing in step S23 may mean abandoning the storage of the data and sending an alarm indicating that an error occurred while storing it, so that it can be stored later with manual intervention.

In another possible embodiment, refusing in step S23 may mean suspending the processing flow of the data to be stored and restarting its storage flow after waiting for a preset time. For example, if the unique field value of the data to be stored is recorded in the replica set, its storage flow is suspended and restarted after waiting 1 ms.
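The two refusal strategies of S23 can be sketched side by side. The alarm list, the `max_attempts` limit and the function names are assumptions for illustration (the text specifies only the 1 ms wait, not how many times to retry); `is_recorded` and `store` are caller-supplied callbacks.

```python
import time
from typing import Callable, List


def refuse_with_alarm(unique_value: str, alarms: List[str]) -> bool:
    """Strategy 1: abandon the write and raise an alarm for manual follow-up."""
    alarms.append(f"duplicate write refused: {unique_value}")
    return False


def retry_after_wait(
    unique_value: str,
    is_recorded: Callable[[str], bool],
    store: Callable[[str], None],
    wait_s: float = 0.001,   # 1 ms, as in the example above
    max_attempts: int = 3,   # assumed limit; not specified in the text
) -> bool:
    """Strategy 2: suspend, wait, then restart the storage flow."""
    for _ in range(max_attempts):
        if not is_recorded(unique_value):
            store(unique_value)
            return True
        time.sleep(wait_s)  # suspend the flow before trying again
    return False
```

Bounding the retries keeps a permanently duplicated value from looping forever; after the last attempt the caller could fall back to the alarm strategy.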

In a possible embodiment, as shown in FIG. 3, the data storage method for a sharded cluster provided by the invention may further include the following steps:

S31: acquiring the unique field value of the data to be stored; see step S21 for the acquisition method, which is not repeated here.

S32: determining whether the unique field value of the data to be stored exists in the shard set; if so, performing step S33; otherwise, performing steps S34 and S35.

S33: storing the data to be stored in the target shard corresponding to the unique field value.

S34: determining whether the unique field value of the data to be stored is recorded in the replica set; see step S22 for the determination method, which is not repeated here.

S35: if it is recorded, refusing to store the data to be stored in the shard set. After the refusal, the flow may return to step S31 to determine again whether the data to be stored has meanwhile been stored in the shard set.
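The FIG. 3 flow (S31 to S35) can be sketched as a single method. This is an in-memory model under assumptions: shards are dictionaries, the `location` dictionary plays the role of a mapping table from unique field values to target shards, routing uses `zlib.crc32` as a stand-in for MongoDB's hashed routing, and all names are hypothetical. As with any check-then-act model, real concurrency safety would come from the replica set's unique index rather than from this code.

```python
import zlib


class ShardedClusterFlow:
    """Hypothetical model of the FIG. 3 flow."""

    def __init__(self, num_shards: int = 3):
        self.shards = [dict() for _ in range(num_shards)]  # uid -> data per shard
        self.location = {}        # mapping table: uid -> target shard index
        self.replica_set = set()  # recorded unique field values

    def _route(self, uid: str) -> int:
        # Stand-in for hashed routing on the shard key field.
        return zlib.crc32(uid.encode("utf-8")) % len(self.shards)

    def store(self, data: dict, uid: str) -> str:
        # S31: the caller has already acquired the unique field value `uid`.
        # S32: does the unique field value already exist in the shard set?
        if uid in self.location:
            # S33: existing data, so update it in its target shard.
            self.shards[self.location[uid]][uid] = data
            return "updated"
        # S34: new data; is the value recorded in the replica set?
        if uid in self.replica_set:
            # S35: another request is storing it, so refuse.
            return "refused"
        # Not recorded: record it in the replica set, then insert.
        self.replica_set.add(uid)
        target = self._route(uid)
        self.shards[target][uid] = data
        self.location[uid] = target
        return "inserted"


flow = ShardedClusterFlow()
print(flow.store({"v": 1}, "video1_epsilon1_20010102"))  # inserted
print(flow.store({"v": 2}, "video1_epsilon1_20010102"))  # updated
flow.replica_set.add("video1_epsilon2_20010102")  # simulate an in-flight request
print(flow.store({"v": 3}, "video1_epsilon2_20010102"))  # refused
```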

Specifically, in step S32, whether the unique field value of the data to be stored already exists in the shard set may be determined by querying the existing data in the shard set for an identical unique field value. If it exists, the data was stored in the shard set before, and the data in the shard set should be updated rather than a new record added.

On this basis, the shard that stores the data with the same unique field value is taken as the target shard, and in step S33 the data to be stored is stored in that target shard of the shard set.

Specifically, in one possible embodiment, the routing basis used when the existing data was stored is obtained, and the data to be stored is stored in the target shard corresponding to that routing basis. After data is stored successfully in the shard set, each stored item and its corresponding route may be saved separately as a mapping table; in step S33, the routing basis of the stored data is obtained by querying this mapping table, and the existing data with the same unique field value is then updated according to that routing basis.

When data is inserted into MongoDB, an object ID (ObjectId) field is generated for it automatically, and the order in which the inserted data was stored relative to the other stored data can be inferred from this field. In another possible embodiment, the ObjectId of the existing record whose uniqueness field value equals that of the data to be stored is queried, and the data to be stored is stored in the target shard corresponding to that ObjectId.
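For the ObjectId point above, a sketch of reading the creation time out of an ObjectId follows. It relies on the documented ObjectId layout (a 12-byte value whose first 4 bytes are a big-endian Unix timestamp in seconds), so the order it yields is only approximate, at one-second resolution. The helper names are hypothetical and the sample ids are made up.

```python
def objectid_timestamp(oid_hex: str) -> int:
    """Return the creation time (seconds since the Unix epoch) stored in
    the first 4 bytes of a 24-hex-digit ObjectId."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex digits")
    return int(oid_hex[:8], 16)


def by_creation_time(oid_hexes: list) -> list:
    """Sort ObjectIds by their embedded timestamp (stable for ties)."""
    return sorted(oid_hexes, key=objectid_timestamp)


ids = ["650000010000000000000000", "650000000000000000000001"]
print(objectid_timestamp(ids[1]))  # 1694498816
print(by_creation_time(ids))
```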

If it is determined in step S32 that the uniqueness field value of the data to be stored in the sharded set does not exist, it indicates that the data to be stored is not stored in the sharded set before, that is, the data to be stored belongs to newly added data, that is, it is necessary to further perform step S34 according to the uniqueness field value of the data to be stored, and determine whether the uniqueness field value already exists in the duplicate set. Namely, whether a plurality of requests simultaneously apply for storing the data to be stored with the same unique field value is judged.

When the embodiment of the invention is selected, when the data to be stored needs to be stored in the fragment set, whether the data with the same unique field value as the data to be stored is stored in the fragment set or not can be determined according to the acquired unique field value, if the stored data with the same unique field value exists, the data to be stored does not belong to newly added data, and the data to be stored can be directly stored in the storage space corresponding to the stored data with the same unique field value, so that the stored data can be updated.

In a high concurrency scenario, a large amount of data is requested to be stored at the same time, that is, there is a case where a plurality of requests apply for storing data to be stored with the same unique field value. If the uniqueness field value of the data to be stored exists in the fragment set, the data to be stored is indicated to be updated data, and is not newly added data, and at the moment, the data to be stored is directly stored into the target fragment of the target fragment set corresponding to the stored data with the same uniqueness field value. If the data to be stored is newly added data, a copy set is introduced at the moment, and the uniqueness of the data to be stored is ensured by the copy set.

After step S22 determines that the unique field value of the data to be stored does not exist in the replica set, in one possible embodiment the unique field value is stored in the replica set. Specifically, unique index information is created for the unique field value of the data to be stored, and the unique field value is recorded in the replica set.

Because unique index information is created for the unique field value, that value can appear only once in the replica set. Therefore, in this embodiment, whether the unique field value of the data to be stored has already been recorded in the replica set can be determined by querying whether the corresponding unique index information already exists in the replica set.
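What the unique index contributes is that "check whether the value exists" and "record the value" happen as one atomic step. The sketch below models this with a lock-protected set and races eight simulated requests for the same unique field value; exactly one of them wins. `UniqueIndexedSet`, `DuplicateKeyError`, `try_claim`, and `race` are hypothetical names for illustration. Against a real MongoDB deployment, the analogous pattern would be `create_index("uid", unique=True)` on the recording collection, with a `pymongo.errors.DuplicateKeyError` raised by `insert_one` treated as "already recorded" (`uid` is an assumed field name).

```python
import threading


class DuplicateKeyError(Exception):
    """Raised when a value violates the simulated unique index (hypothetical)."""


class UniqueIndexedSet:
    """Hypothetical stand-in for a replica-set collection with a unique index."""

    def __init__(self):
        self._values = set()
        self._lock = threading.Lock()

    def insert(self, value):
        # The lock turns "check + record" into a single atomic step, which is
        # the guarantee a unique index gives a database insert.
        with self._lock:
            if value in self._values:
                raise DuplicateKeyError(value)
            self._values.add(value)


def try_claim(index, value):
    """Return True if this request won the right to store `value`."""
    try:
        index.insert(value)
        return True
    except DuplicateKeyError:
        return False


def race(num_requests=8, value="order-42"):
    """Start `num_requests` threads that all try to claim the same value at once."""
    index = UniqueIndexedSet()
    barrier = threading.Barrier(num_requests)
    results = []
    results_lock = threading.Lock()

    def worker():
        barrier.wait()  # release all simulated requests at the same moment
        won = try_claim(index, value)
        with results_lock:
            results.append(won)

    threads = [threading.Thread(target=worker) for _ in range(num_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(race().count(True))  # 1
```

Attempting the insert and catching the duplicate error, rather than querying first and inserting second, closes the window in which two requests could both see the value as absent.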

In this embodiment, the unique field values stored in the replica set serve mainly to determine whether pieces of data to be stored share the same unique field value. With the help of the replica set, all data stored in the shard set corresponds one-to-one with its unique field values, and at that point the unique field values recorded in the replica set have served their purpose.

Because the shard set uses the hash value of the shard key field as its routing basis, cases in which data uniqueness cannot be guaranteed arise mainly in high-concurrency scenarios and are unlikely in ordinary scenarios. Therefore, to save storage resources in the replica set, in one possible embodiment an invalidation condition is set for the unique field values recorded in the replica set. When a recorded unique field value meets the preset invalidation condition, it is deleted, which reduces the replica-set system resources occupied by the unique field values of the data. Specifically, the method comprises the following step:

if a unique field value record stored in the replica set meets the preset invalidation condition, deleting that record from the replica set.

Specifically, a creation time field is added to each unique field value record saved in the replica set, and a TTL index is created on that field to specify how long each record remains valid. When the validity period specified by the TTL index elapses, the corresponding unique field value record is deleted from the replica set, together with the unique index information for that unique field value.
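A minimal in-memory sketch of the TTL-based invalidation, assuming a single fixed validity period for every record. In MongoDB, the comparable setup would be a TTL index such as `create_index("createdAt", expireAfterSeconds=...)` on a date field (`createdAt` is an assumed field name). MongoDB's TTL monitor deletes expired documents in a periodic background pass (by default about every 60 seconds), so removal there is not instantaneous. This sketch instead purges eagerly and takes an injectable clock so that expiry can be tested deterministically. `TTLUniqueRecords` is a hypothetical name.

```python
import time
from typing import Callable, Dict


class TTLUniqueRecords:
    """Hypothetical store of unique field value records with a creation-time TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.created_at: Dict[str, float] = {}  # unique field value -> creation time

    def record(self, unique_value: str) -> bool:
        """Record a value; return False if a still-valid record already exists."""
        self.purge_expired()
        if unique_value in self.created_at:
            return False
        self.created_at[unique_value] = self.clock()
        return True

    def purge_expired(self) -> int:
        """Delete every record whose validity period has elapsed; return the count."""
        now = self.clock()
        expired = [v for v, t in self.created_at.items() if now - t >= self.ttl_seconds]
        for v in expired:
            del self.created_at[v]
        return len(expired)


now = [0.0]
records = TTLUniqueRecords(ttl_seconds=60, clock=lambda: now[0])
records.record("order-42")   # True: first claim
records.record("order-42")   # False: still valid
now[0] = 60.0
records.purge_expired()      # 1: the record has expired and is removed
```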

Another aspect of the present invention provides a data storage device for a sharded cluster. As shown in Fig. 4, the data storage device includes:

an obtaining module 401, configured to obtain data to be stored and the unique field value of the data to be stored;

a first determining module 402, configured to determine whether the unique field value is recorded in the replica set;

a first execution module 403, configured to refuse to store the data to be stored in the shard set if the unique field value is recorded.

In one possible embodiment, the obtaining module 401 is further configured to obtain the unique field value of the data to be stored, and the data storage device further includes:

a second determining module 404, configured to determine whether the unique field value of the data to be stored already exists in the shard set;

a second execution module 405, configured to store the data to be stored in the target shard indicated by the unique field value if the unique field value already exists;

a third execution module 406, configured to perform the step of determining whether the unique field value is recorded in the replica set if the unique field value does not exist.

In one possible embodiment, the data storage device further comprises:

an index information creating module 407, configured to create unique index information for the unique field value if the unique field value is not recorded;

a recording module 408, configured to record the unique field value in the replica set, using the unique index information as the routing basis.

In one possible embodiment, the data storage device further comprises:

a fourth execution module 409, configured to delete the unique index information and the unique field value from the replica set if the unique index information meets a preset invalidation condition.

An embodiment of the present invention further provides an electronic device. As shown in Fig. 5, it includes a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 communicate with one another through the communication bus 504;

a memory 503 for storing a computer program;

the processor 501 is configured to implement the following steps when executing the program stored in the memory 503:

acquiring data to be stored and a unique field value of the data to be stored;

determining whether the unique field value of the data to be stored is recorded in the replica set;

if the unique field value is recorded, refusing to store the data to be stored in the shard set.

The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used to represent it in the figure, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic device and other devices.

The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the processor.

The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In a further embodiment of the present invention, a computer-readable storage medium is provided in which a computer program is stored; when executed by a processor, the computer program implements the data storage method for a sharded cluster described in any of the above embodiments.

In yet another embodiment of the present invention, a computer program product containing instructions is provided which, when run on a computer, causes the computer to perform the data storage method for a sharded cluster described in any of the above embodiments.

The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.

It should be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprise", "include", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The embodiments in this specification are described in a related manner; for identical or similar parts among the embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the others. In particular, since the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant points, refer to the corresponding description of the method embodiment.

The above descriptions are only preferred embodiments of the present invention and are not intended to limit its scope. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its protection scope.

Claims

1. A data storage method for a sharded cluster, characterized in that the sharded cluster comprises a shard set and a replica set, the replica set recording the unique field values of all data stored in the shard set, and the method comprises:

acquiring data to be stored and a unique field value of the data to be stored;

determining whether the unique field value is recorded in the replica set;

if the unique field value is recorded, refusing to store the data to be stored in the shard set.

2. The method of claim 1, further comprising:

determining whether the unique field value already exists in the shard set;

3. The method of claim 1, further comprising:

4. The method of claim 3, further comprising:

5. A data storage device for a sharded cluster, characterized in that the sharded cluster comprises a shard set and a replica set, the replica set recording the unique field values of all data stored in the shard set, the device comprising:

an acquisition module, configured to acquire data to be stored and the unique field value of the data to be stored;

a first determining module, configured to determine whether the unique field value is recorded in the replica set;

a first execution module, configured to refuse to store the data to be stored in the shard set if the unique field value is recorded.

6. The apparatus of claim 5, further comprising:

a second determining module, configured to determine whether the unique field value already exists in the shard set;

a third execution module, configured to perform the step of determining whether the unique field value is recorded in the replica set if the unique field value does not exist.

7. The apparatus of claim 5, further comprising:

an index information creating module, configured to create unique index information for the unique field value if the unique field value is not recorded;

a recording module, configured to record the unique field value in the replica set, using the unique index information as the routing basis.

8. The apparatus of claim 7, further comprising:

a fourth execution module, configured to delete the unique index information and the unique field value from the replica set if the unique index information meets a preset invalidation condition.

9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of any of claims 1 to 4 when executing a program stored in the memory.

10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 4.