CN110362590A - Data managing method, device, system, electronic equipment and computer-readable medium - Google Patents


Info

Publication number
CN110362590A
Authority
CN
China
Prior art keywords
filter
data
user
specified memory
character string
Prior art date
Legal status
Pending
Application number
CN201810283415.2A
Other languages
Chinese (zh)
Inventor
黄日成
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810283415.2A
Publication of CN110362590A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages

Abstract

This disclosure relates to a data management method, device, system, electronic equipment and computer-readable medium. The method includes the following steps. First, a user's data removal request, which contains a character string, is received through a first filter. Second, the data removal request is processed through a specified memory. Third, when the number of data removal requests exceeds a threshold, the specified memory is rebuilt through a second filter. The first filter and the second filter are both Bloom filters. The disclosed data management method, device, system, electronic equipment and computer-readable medium save storage resources in a username storage system. They also keep username lookups accurate even after a large number of usernames have been deleted.

Description

Data managing method, device, system, electronic equipment and computer-readable medium
Technical field
This disclosure relates to the field of computer information processing, and in particular to a data management method, device, system, electronic equipment and computer-readable medium.
Background art
With the development of the Internet, business systems of all kinds keep emerging. Systems that serve people almost always need to give each person a nickname, and nicknames play an important role in systems built on relationship chains. This is especially true of large business systems such as instant messaging and microblogging systems, where each user's nickname is unique within the system. In such systems the nicknames themselves occupy a large amount of storage space, and a separate nickname system is needed to store them.
When using such a network service system, a user must first apply to the nickname system for an exclusive nickname. While the system is in use, a large number of previously granted nicknames are deleted or modified through the nickname system. These frequent operations cause many reliability and processing-efficiency problems for the user nickname system.
Therefore, a new data management method, device, system, electronic equipment and computer-readable medium are needed.
The above information disclosed in this Background section is only intended to aid understanding of the background of the disclosure. It may therefore contain information that does not constitute prior art already known to a person of ordinary skill in the art.
Summary of the invention
In view of this, the disclosure provides a data management method, device, system, electronic equipment and computer-readable medium. They save the storage resources of a username storage system and keep username lookups accurate even after a large number of usernames have been deleted.
Other features and advantages of the disclosure will become apparent from the following detailed description, or will be learned in part through practice of the disclosure.
According to one aspect of the disclosure, a data management method is proposed. The method includes the following steps:
receiving a user's data removal request through a first filter, the data removal request containing a character string;
processing the data removal request through a specified memory and a log;
when the number of data removal requests exceeds a threshold, rebuilding the specified memory through a second filter.
The first filter and the second filter are both Bloom filters.
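The aspect above states what happens but not how the rebuild is carried out. The sketch below shows one plausible reading, under stated assumptions.

A standard Bloom filter cannot clear bits without risking false negatives for other strings. So in this sketch, deletions are only recorded, in an in-memory set standing in for the log. Once the deletion count passes the threshold, a fresh second filter is built from the strings that are still live and swapped in as the active filter.

The class names, the SHA-256 double-hashing scheme and the swap step are all hypothetical. None of them is taken from the disclosure.

```python
import hashlib


class BitBloom:
    """Minimal Bloom filter over a bytearray (m bits, k hash positions)."""

    def __init__(self, m_bits: int, k: int) -> None:
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, s: str):
        # Derive k positions from one SHA-256 digest via double hashing.
        d = hashlib.sha256(s.encode("utf-8")).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, s: str) -> None:
        for p in self._positions(s):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, s: str) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(s))


class DualBloomStore:
    """Hypothetical sketch: the first filter serves requests; a second is rebuilt."""

    def __init__(self, m_bits: int, k: int, threshold: int) -> None:
        self.m, self.k, self.threshold = m_bits, k, threshold
        self.first = BitBloom(m_bits, k)  # active filter
        self.log = set()                  # stand-in for the log of live strings
        self.deletions = 0                # deletions since the last rebuild

    def register(self, s: str) -> bool:
        if s in self.first:
            return False                  # reported as taken (possibly a false positive)
        self.first.add(s)
        self.log.add(s)
        return True

    def delete(self, s: str) -> None:
        # Bits are never cleared in place; only the log forgets the string.
        self.log.discard(s)
        self.deletions += 1
        if self.deletions > self.threshold:
            self._rebuild()

    def _rebuild(self) -> None:
        second = BitBloom(self.m, self.k)  # second filter built from live entries only
        for s in self.log:
            second.add(s)
        self.first = second                # swap: the second filter becomes active
        self.deletions = 0
```

Between rebuilds, a deleted nickname still tests as taken. This matches the trade-off the disclosure accepts: a rare false "already used" answer is preferred over a duplicate registration.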
In an exemplary embodiment of the disclosure, the first filter and the second filter are processing nodes in a distributed processing system.
In an exemplary embodiment of the disclosure, the distributed processing system is built on the ZooKeeper framework.
According to one aspect of the disclosure, a data management device is proposed. It includes three modules:
a data removal module, configured to receive a user's data removal request through a first filter, the data removal request containing a character string;
a memory processing module, configured to process the data removal request through a specified memory;
a data reconstruction module, configured to rebuild the specified memory through a second filter when the number of data removal requests exceeds a threshold.
The first filter and the second filter are both Bloom filters.
In an exemplary embodiment of the disclosure, the device further includes two more modules:
a data query module, configured to receive a user's data query request through the first filter, the data query request containing a character string;
a result return module, configured to process the data query request through the specified memory and return a processing result.
In an exemplary embodiment of the disclosure, the device further includes a memory construction module, configured to construct the specified memory through virtual memory mapping.
According to one aspect of the disclosure, a data management system is proposed. The system is distributed and includes multiple processing nodes, each of which contains a first filter, a second filter and a specified memory. The first filter is configured to receive a user's data query request, the data query request containing a character string. The specified memory is configured to process the data removal request. The second filter is configured to rebuild the specified memory when the number of data removal requests exceeds a threshold. The first filter and the second filter are both Bloom filters.
According to one aspect of the disclosure, an electronic device is proposed. It includes one or more processors and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described above.
According to one aspect of the disclosure, a computer-readable medium is proposed, storing a computer program that implements the method described above when executed by a processor.
The data management method, device, system, electronic equipment and computer-readable medium of the disclosure save the storage resources of a username storage system. They also keep username lookups accurate even after a large number of usernames have been deleted.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the disclosure.
Brief description of the drawings
The above and other objectives, features and advantages of the disclosure will become more apparent from the detailed description of its example embodiments with reference to the accompanying drawings. The drawings described below show only some embodiments of the disclosure. A person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a system block diagram of a data management method and device according to an exemplary embodiment.
Fig. 2 is a schematic diagram of an application scenario of a data management method according to an exemplary embodiment.
Fig. 3 is a flowchart of a data management method according to an exemplary embodiment.
Fig. 4 is a flowchart of a data management method according to another exemplary embodiment.
Fig. 5 is a flowchart of a data management method according to another exemplary embodiment.
Fig. 6 is a block diagram of a data management device according to an exemplary embodiment.
Fig. 7 is a block diagram of a data management device according to another exemplary embodiment.
Fig. 8 is a block diagram of a data management system according to an exemplary embodiment.
Fig. 9 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed description of the embodiments
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, so their repeated description is omitted.
Furthermore, the described features, structures or characteristics may be combined in one or more embodiments in any suitable manner. The following description provides many specific details for a thorough understanding of the embodiments of the disclosure. However, those skilled in the art will appreciate that the technical solutions of the disclosure may be practiced without one or more of these specific details, or with other methods, components, devices or steps. In other cases, well-known methods, devices, implementations or operations are not shown or described in detail, to avoid obscuring aspects of the disclosure.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. These functional entities may be implemented in any of the following ways:
in software;
in one or more hardware modules or integrated circuits;
across different networks and/or processor devices and/or microcontroller devices.
The flowcharts shown in the drawings are merely illustrative. They need not include all contents and operations/steps, and the steps need not be performed in the order described. For example, some operations/steps may be decomposed, and others may be merged fully or partially, so the actual order of execution may change according to circumstances.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. The terms only distinguish one component from another. A first component discussed below could therefore be termed a second component without departing from the teachings of the disclosure. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Those skilled in the art will understand that the drawings are only schematic diagrams of example embodiments. The modules or processes in the drawings are not necessarily required to implement the disclosure and cannot be used to limit its scope of protection.
The specific content of the invention is described in detail below with reference to specific application scenarios.
The inventors of the disclosure found that existing user nickname systems mainly work in the following ways:
1. Relational-database-based user nickname system. The nickname is stored in one field of a database table, and its CRC32 or hash value is stored in another field as an index. When a new nickname string is requested, its hash value H (or CRC32 value) is computed first and the database is queried by H. If records are found, the stored nicknames are compared with the requested one, and if one matches exactly, the system returns that the nickname is already in use. Otherwise, the new nickname and its hash value H are inserted into the database and success is returned.
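For comparison, scheme 1 can be sketched with Python's built-in `sqlite3` and `zlib.crc32`. This is a minimal illustration, not any particular production system, and the table and column names are hypothetical. Because CRC32 values can collide, the stored string must still be compared, as the scheme describes.

```python
import sqlite3
import zlib


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # One column holds the nickname; another holds its CRC32, indexed for lookup.
    conn.execute("CREATE TABLE nicknames (name TEXT NOT NULL, crc INTEGER NOT NULL)")
    conn.execute("CREATE INDEX idx_crc ON nicknames (crc)")
    return conn


def apply_nickname(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if the nickname was registered, False if it is already in use."""
    h = zlib.crc32(name.encode("utf-8"))
    # Different nicknames can share a CRC32, so compare the stored strings too.
    rows = conn.execute("SELECT name FROM nicknames WHERE crc = ?", (h,)).fetchall()
    if any(row[0] == name for row in rows):
        return False
    conn.execute("INSERT INTO nicknames (name, crc) VALUES (?, ?)", (name, h))
    conn.commit()
    return True
```

The separate check and insert are not atomic. Under concurrent requests, a `UNIQUE` constraint on the name column would be needed to rule out duplicates.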
Here, CRC (Cyclic Redundancy Check) is an error-detecting check code commonly used in data communications, in which the lengths of the information field and the check field can be chosen freely. A CRC works as follows:
the sender performs a polynomial computation on the data and appends the result to the frame;
the receiving device runs the same algorithm to verify the correctness and integrity of the transmitted data.
CRC32 produces a 32-bit check value (8 hexadecimal digits).
A hash function transforms an input of arbitrary length into a fixed-length output, called the hash value. This is a compressing map: the space of hash values is usually much smaller than the space of inputs. As a result, different inputs may hash to the same output, and the input cannot be uniquely determined from its hash value.
2. Hash-table-based user nickname system. Nicknames are stored in a hash table. When a new nickname is requested, the hash table is queried first. If the nickname is found, a duplicate-nickname failure is returned; otherwise the new nickname is inserted into the hash table and success is returned.
Here, a hash table is a data structure accessed directly by key value. It locates a record by mapping its key to a position in the table, which speeds up lookups. The mapping function is called the hash function, and the array holding the records is called the hash table.
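Scheme 2, together with the 50% fill-rate cap it is criticized for, can be illustrated with a small open-addressing (linear probing) table. This is a sketch under two assumptions: Python's `hash()` stands in for the hash function, and the table doubles whenever an insert would push the load factor above 0.5.

```python
class NicknameTable:
    """Open-addressing hash table (linear probing) kept at most half full."""

    MAX_LOAD = 0.5  # assumed fill-rate cap

    def __init__(self, capacity: int = 8) -> None:
        self.slots: list = [None] * capacity
        self.count = 0

    def _probe(self, name: str) -> int:
        # Walk from the hashed slot until we hit the name or an empty slot.
        i = hash(name) % len(self.slots)
        while self.slots[i] is not None and self.slots[i] != name:
            i = (i + 1) % len(self.slots)
        return i

    def apply(self, name: str) -> bool:
        """Insert a new nickname; return False if it is a duplicate."""
        i = self._probe(name)
        if self.slots[i] == name:
            return False
        if (self.count + 1) / len(self.slots) > self.MAX_LOAD:
            self._grow()
            i = self._probe(name)
        self.slots[i] = name
        self.count += 1
        return True

    def _grow(self) -> None:
        # Double the capacity and re-insert every stored nickname.
        old = [s for s in self.slots if s is not None]
        self.slots = [None] * (2 * len(self.slots))
        for s in old:
            self.slots[self._probe(s)] = s
```

Because the table never exceeds half full, at least half of its slots are always empty. That empty half is the memory cost attributed to this scheme.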
3. Key-value-store-based user nickname system. Each nickname is stored as a key in a key-value storage system. When a new nickname is requested, the key-value store is queried first. If the nickname is found, a duplicate-nickname failure is returned; otherwise the new nickname is inserted into the key-value store and success is returned.
Here, key-value storage is a NoSQL (non-relational) database model in which data is organized, indexed and stored as key-value pairs. It suits business data that does not involve complex relationships and can effectively reduce the number of disk reads and writes. For such workloads it can offer better read/write performance than SQL database storage.
The inventors of the disclosure found that the user nickname systems described above have the following shortcomings:
1. In the relational-database scheme, storing nicknames in a single table creates a read/write performance bottleneck, so high performance cannot be achieved. Sharding across databases and tables can relieve the bottleneck, but sharding introduces further problems.
2. In the hash-table scheme, storing nicknames in a hash table needs a lot of memory and therefore machines with large memory. Moreover, to preserve read/write efficiency the fill rate of the hash table is generally kept at or below 50%, so memory utilization is low.
3. In the key-value scheme, using nicknames as keys consumes excessive storage. Take 100 million nicknames with an average length of 12 bytes: each nickname plus other overhead takes 16 bytes, for a total of 1.6 GB.
The inventors of the disclosure observed that what matters most when a user nickname system checks nicknames is that no nickname is used twice. In other words, the nickname assigned to a user at registration must be unique within the nickname system. At registration, it is acceptable for an unused nickname to be judged, with low probability, as already used (existing). That outcome is far less harmful than letting a user register someone else's nickname. Therefore, while keeping the false-positive rate low, the disclosure introduces a Bloom filter to check nicknames for duplicates when users apply for them.
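The storage saving can be put in numbers with the standard Bloom filter sizing formulas:

m = -n·ln(p)/(ln 2)^2 bits
k = (m/n)·ln 2 hash functions

These are applied here to the 100 million nicknames from shortcoming 3. The 1% false-positive target p is an assumption chosen for illustration; the disclosure does not state one.

```python
import math


def bloom_size(n: int, p: float) -> tuple[int, int]:
    """Optimal bit count m and hash count k for n items at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


n = 100_000_000                 # 100 million nicknames, as in shortcoming 3
m, k = bloom_size(n, 0.01)      # assumed 1% false-positive target
bloom_mb = m / 8 / 1e6          # bits -> bytes -> MB
kv_gb = n * 16 / 1e9            # 16 bytes per nickname in the key-value scheme
print(f"Bloom filter: {bloom_mb:.0f} MB, k={k}; key-value store: {kv_gb:.1f} GB")
# prints: Bloom filter: 120 MB, k=7; key-value store: 1.6 GB
```

At about 9.6 bits per nickname, the filter needs roughly 120 MB instead of 1.6 GB, regardless of nickname length. The price is that about 1 in 100 unused nicknames would be reported as taken.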
It should be noted that the data management system of the disclosure is not limited to checking user nickname data. It can also be used in any scenario that involves data lookup and deletion.
For example, it can verify user ID data, query and verify item management data, and serve other data-verification scenarios. Below, the specific content of the invention is described in detail with reference to the drawings. The application scenario used throughout is a user registering and deleting a nickname on a network.
Fig. 1 is a system block diagram of a data management method and device according to an exemplary embodiment.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 provides the communication links between the terminal devices 101, 102, 103 and the server 105. It may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as shopping applications, web browsers, search applications, instant messaging tools, email clients and social platform software.
The terminal devices 101, 102, 103 may be any electronic devices that have a display screen and support web browsing. These include, but are not limited to, smartphones, tablet computers, laptop computers and desktop computers.
The server 105 may provide various services. For example, it may be a back-end management server that supports the websites users browse on the terminal devices 101, 102, 103. The back-end management server may analyze and otherwise process received data such as user requests, and feed the processing results back to the terminal devices.
The server 105 may, for example, receive a user's data query request through the first filter, the data query request containing a character string. It may then process the data query request through the specified memory and return a processing result.
The server 105 may also, for example, handle data removal requests through the following steps:
receive a user's data removal request, containing a character string, through the first filter;
process the data removal request through the specified memory and a log;
when the number of data removal requests exceeds a threshold, rebuild the specified memory through the second filter.
The first filter and the second filter are both Bloom filters.
The server 105 may be a single physical server or may, for example, consist of multiple servers. The first filter and the second filter may reside in the same server 105. The server 105 may act as one processing node in a larger system. Multiple servers 105 may form a distributed processing system, with each server 105 as one of its nodes. The distributed processing system may, for example, be built on the ZooKeeper framework, with the servers 105 together forming a user nickname system that processes user nicknames. At any given moment, one server 105 in the user nickname system works as the primary device, while the other servers 105 stand by as backups. The primary device handles users' username application and deletion requests in real time. If the user nickname system fails, ZooKeeper switches seamlessly from the primary to a standby device to keep the system highly available.
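ZooKeeper's usual leader-election recipe works like this:
each server creates an ephemeral, sequential znode;
the server holding the lowest sequence number acts as the primary;
when the primary's session ends, ZooKeeper deletes its znode and the next server takes over.

The sketch below simulates only that selection rule, in memory, so it runs without a ZooKeeper ensemble. The class and method names are hypothetical, and this is not ZooKeeper's client API.

```python
import itertools
from typing import Dict, Optional


class FakeElection:
    """In-memory stand-in for ZooKeeper ephemeral sequential znodes."""

    def __init__(self) -> None:
        self._seq = itertools.count()
        self.members: Dict[int, str] = {}  # sequence number -> server id

    def join(self, server_id: str) -> int:
        # Mimics creating an ephemeral sequential node: unique, increasing numbers.
        seq = next(self._seq)
        self.members[seq] = server_id
        return seq

    def session_lost(self, seq: int) -> None:
        # When a session expires, ZooKeeper removes that session's ephemeral node.
        self.members.pop(seq, None)

    def leader(self) -> Optional[str]:
        # The node with the lowest sequence number acts as the primary device.
        return self.members[min(self.members)] if self.members else None


election = FakeElection()
seq_a = election.join("server-a")
election.join("server-b")
election.join("server-c")
primary_before = election.leader()   # "server-a"
election.session_lost(seq_a)         # primary fails
primary_after = election.leader()    # "server-b" takes over
```

A real deployment would rely on an actual ZooKeeper client, which also watches only the predecessor node so that a failure does not wake every standby at once.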
It should be noted that the data management method provided by the embodiments of the disclosure may be executed by the server 105. Correspondingly, the user data management device may be arranged in the server 105. The client through which users register on web pages is generally located on the terminal devices 101, 102, 103.
In the data management system of the disclosure, duplicate checking of nicknames is done with a Bloom filter. A Bloom filter does not store the nicknames themselves, so each nickname needs only a small, fixed amount of storage no matter how long it is. This saves a large amount of the storage otherwise spent on user nicknames.
The data management system of the disclosure is a distributed processing system. A primary device and standby devices provide the nickname registration and deletion services, and ZooKeeper coordinates switchover between them, ensuring the high availability of the data management system. High availability usually describes a system specially designed to reduce downtime and keep its services available.
In the data management system of the disclosure, each processing node contains a first filter and a second filter. During normal operation, the first filter handles users' requests to register and delete nicknames in real time. When the number of deletion requests exceeds the threshold, the first and second Bloom filters synchronize data to keep the system's error probability from rising. This gives the Bloom filter a deletion capability, so the data management system keeps username lookups accurate even after a large number of usernames have been deleted.
Fig. 2 is a schematic diagram of an application scenario of a data management method according to an exemplary embodiment. It illustrates how a user goes through the data management system of the application.
When a user uses a network system, the system's interface may be displayed on the user's mobile terminal or host terminal. When the user logs in to the network system for the first time, the system prompts the user to create a username or user nickname. It may, for example, also prompt the user to provide other user-related data. The network system may also, for example, push a registration/login page to the user to assist with these operations.
In one embodiment, the user submits a nickname registration request through a user registration interface on the mobile terminal. The request contains the nickname the user wants to register, as a character string. The string is submitted to the server responsible for nickname processing, which returns, according to the query result, whether the nickname can be registered.
A maximum length may be set for the character string, for example according to the data-processing limits of different servers. If the username the user wants to register exceeds this maximum, the user is prompted to modify the nickname before submitting it. If the string is shorter than the maximum, it may, for example, be padded to a uniform length with a predetermined placeholder character, so that the back-end server can process all strings uniformly. The application is not limited in this respect.
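The length check and padding might look like the following. `MAX_LEN`, the NUL placeholder and the character-based (rather than byte-based) limit are all assumptions; the disclosure leaves these choices open.

```python
MAX_LEN = 16        # hypothetical maximum nickname length, in characters
PLACEHOLDER = "\0"  # hypothetical pad character; must never occur in real nicknames


def normalize_nickname(nickname: str) -> str:
    """Reject over-long nicknames and pad shorter ones to a fixed length."""
    if PLACEHOLDER in nickname:
        raise ValueError("nickname contains the reserved placeholder character")
    if len(nickname) > MAX_LEN:
        raise ValueError(f"nickname longer than {MAX_LEN} characters; please modify it")
    return nickname.ljust(MAX_LEN, PLACEHOLDER)
```

Rejecting the placeholder up front keeps padding unambiguous: two different nicknames can never pad to the same string. A server whose limit is in bytes would compare `len(nickname.encode("utf-8"))` instead.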
In addition, some names may be reserved in advance in the server responsible for nickname processing, so that a registration request for them cannot succeed. Examples include usernames whose registration is restricted, and nicknames that clearly violate relevant laws and regulations.
Alternatively, the user may submit a nickname registration request through a user registration interface on the host terminal. The request contains the nickname to be registered, as a character string, which is submitted to the server responsible for nickname processing. The server returns, according to the query result, whether the nickname can be registered.
In one embodiment, the user submits a nickname deletion request through a user information management interface on the mobile terminal. The request contains the nickname to be deleted, as a character string, which is submitted to the server responsible for nickname processing. The server returns, according to the query result, whether the nickname can be deleted.
Fig. 3 is a flowchart of a data management method according to another exemplary embodiment. As shown in Fig. 3, in a specific implementation the data management method of the application includes steps S302 to S304.
As shown in Fig. 3, in S302 a user's data query request is received through the first filter, the data query request containing a character string. The first filter is a Bloom filter. A Bloom filter is essentially a very long binary vector together with a series of random mapping (hash) functions. It is a probabilistic data structure that represents a set very compactly as a bit array and can test whether an element belongs to that set.
When an element is added to the set, K hash functions map it to K positions in a bit array, and those positions are set to 1. To test membership, it is enough to check whether all of those positions are 1 to know (approximately) whether the element is in the set:
if any of them is 0, the element is definitely not in the set;
if all of them are 1, the element is probably in the set.
Compared with other data structures, Bloom filters have large advantages in both space and time. Insertion and query each take constant time, O(k), and the bit array has a fixed size regardless of how long the elements are. Moreover, a Bloom filter does not store the elements themselves, which is an advantage where confidentiality requirements are strict.
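The insertion and membership test described above can be sketched as a small Python class. Salting SHA-256 with the function index to simulate K independent hash functions is an illustrative choice, not the disclosure's. Production filters usually use faster non-cryptographic hashes.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)  # m bits, all initially 0

    def _positions(self, item: str):
        # Simulate k independent hash functions by salting SHA-256 with i.
        for i in range(self.k):
            d = hashlib.sha256(i.to_bytes(4, "big") + item.encode("utf-8")).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Set all k positions for the item to 1.
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, item: str) -> bool:
        # Any 0 bit: definitely absent. All bits 1: probably present.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))
```

A string that was added always tests positive, so there are no false negatives. An unrelated string tests positive only with a small probability set by m, k and the number of items stored.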
In one embodiment, a data query request from the user is received through the first filter. The request may be, for example, a nickname registration request, in which case the nickname to be registered is passed to the first filter as a character string.
In S304, the data query request is processed through the specified memory and a processing result is returned. For example, the request may be processed against the flag bit group in the specified memory.
In one embodiment, the specified memory may be built, for example, with virtual memory mapping such as mmap. A block of local shared memory obtained via mmap can then serve as the specified memory of the Bloom filter. mmap maps a file or other object into memory across one or more pages. If the file size is not a multiple of the page size, the unused part of the last page is zero-filled. mmap is very useful for user-space mappings because it gives a user program direct access to device memory. This is more efficient than copying data back and forth between user space and kernel space.
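A file-backed bitmap mapped with Python's standard `mmap` module is one minimal way to realize this. The file name, the 1 Mi-bit size and the helper names are hypothetical. On Unix, `mmap.mmap(fd, length)` creates a shared mapping by default, so other processes mapping the same file would see the same bits.

```python
import mmap
import os
import tempfile

M_BITS = 1 << 20  # hypothetical filter size: 1 Mi bits = 128 KiB


def open_bitmap(path: str, m_bits: int = M_BITS) -> mmap.mmap:
    """Map a file of m_bits/8 bytes into memory as a writable bitmap."""
    size = (m_bits + 7) // 8
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)  # extended bytes read back as zeros
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)                # the mapping stays valid after the fd is closed


def set_bit(buf: mmap.mmap, pos: int) -> None:
    buf[pos >> 3] |= 1 << (pos & 7)


def get_bit(buf: mmap.mmap, pos: int) -> bool:
    return bool(buf[pos >> 3] & (1 << (pos & 7)))


# Demo: a bit set through one mapping is still set after re-mapping the file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bloom.bin")
    first = open_bitmap(path)
    set_bit(first, 12345)
    first.flush()
    first.close()
    second = open_bitmap(path)
    persisted = get_bit(second, 12345)
    untouched = get_bit(second, 12346)
    second.close()
```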
In one embodiment, a memory header (struct bloom) at the start of the specified memory stores the basic information of the Bloom filter. The contiguous memory that follows serves as flag bits indicating whether a nickname has already been used.
The information stored in the header of the specified memory may, for example, be as follows:
In S3042, the character string corresponding mark hyte in the specified memory is determined;
In S3044, when each of described mark hyte mark is the first numerical value, return the result as the word Symbol string is existing.
In one embodiment, the first numerical value is 1, and character string is mapped in a units group by K hash function K point determine whether these corresponding points of character string are all 1 just (about) to know the pet name then in first filter Either with or without it in system: if being all 1, tested character string is likely to.In the present embodiment, do not consider first filter Error condition, when character string it is corresponding these point be all 1 when, then it is assumed that the character string is necessarily present in user name system, The result for returning to user may be, for example, " user name is existing ".User can also be prompted to replace user name to apply again.
In S3046, when not every flag in the group has the first value, set each flag in the group to the first value and return the result that the character string does not exist.
In one embodiment, the first value is 1, and k hash functions map the character string to k points in a bit array. The first filter checks whether the flag bits in the string's group are all 1 to learn whether the string is in the set. If any of them is 0, the string is definitely not present. In that case the string is treated as absent from the user name system, and the result returned to the user may be, for example, "user name does not exist".
In one embodiment, when the flag bits in the string's group are not all 1, the string is definitely not in the user name system. The flag bits in that group are then set to 1, which stores the user nickname in the nickname system.
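Steps S3042 to S3046 amount to a check-and-set on the string's k flag bits. The sketch below makes two assumptions. It uses a plain in-memory `bytearray` in place of the mmap region. It also derives the k positions by double hashing over SHA-256, because the source does not name its hash functions. The class name is also an assumption.

```python
import hashlib


class NicknameFilter:
    """Illustrative Bloom filter implementing the S3042-S3046 check-and-set."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)  # all flags start at 0

    def _group(self, s: str) -> list[int]:
        # S3042: the k flag-bit positions for this string. Double hashing is
        # an assumption; the source does not specify the hash functions.
        d = hashlib.sha256(s.encode("utf-8")).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def _is_set(self, p: int) -> bool:
        return (self.bits[p >> 3] >> (p & 7)) & 1 == 1

    def contains(self, s: str) -> bool:
        return all(self._is_set(p) for p in self._group(s))

    def register(self, s: str) -> bool:
        """Return False if the nickname already exists, True if it was added."""
        group = self._group(s)
        if all(self._is_set(p) for p in group):
            return False  # S3044: every flag is 1, report "already exists"
        for p in group:  # S3046: set every flag to 1, report "did not exist"
            self.bits[p >> 3] |= 1 << (p & 7)
        return True
```

With `f = NicknameFilter(m=1 << 16, k=7)`, the first `f.register("alice")` returns `True` and the second returns `False`. Because of false positives, a brand-new nickname can occasionally be reported as taken. This is the error the embodiment chooses to ignore.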
In one embodiment, the operations of the first filter are recorded in a binary log.
In the data management method of the present disclosure, duplicate nicknames are detected with a Bloom filter. The Bloom filter does not store the nicknames themselves, so each nickname needs only a small, fixed amount of storage regardless of its length. This saves a large amount of the storage otherwise needed to hold user nicknames.
Fig. 4 is a flowchart of a data management method according to an exemplary embodiment. As shown in Fig. 4, the data management method of this application includes steps S402 to S406.
As shown in Fig. 4, in S402 the first filter receives a data deletion request from the user. The request contains a character string. The first filter and the second filter are Bloom filters.
In one embodiment, the first filter receives a data deletion request from the user, for example a request to delete a user nickname. The nickname to be deleted is passed to the first filter as a character string.
In S404, the data deletion request is processed through the specified memory and the log, for example using the flag bit group in the specified memory. As described above, the specified memory may be built with virtual memory mapping.
In one embodiment, processing the deletion request through the specified memory includes two steps. First, determine the flag bit group in the specified memory that corresponds to the character string. Second, when every flag in the group has the first value, record the group as deleted in a log, which is a binary log.
In one embodiment, the first value is 1, and k hash functions map the string to be deleted to k points in a bit array. The first filter then checks whether the flag bits in the string's group are all 1. If they are, the user nickname is considered found.
In one embodiment, if the flag bits in the string's group are not all 1, the nickname does not exist in the nickname system. A notification such as "nickname does not exist" may then be returned.
In one embodiment, once the group for the string to be deleted is found, the group is recorded as deleted in a binary log, such as a binlog. A binlog records the statements that modify a database's data (inserts, updates and deletes, but not reads such as queries). It is mainly used for master-slave replication and incremental recovery. The binlog is a binary log file. It also records potential updates, for example a DELETE statement that matched no rows and therefore deleted nothing.
The binlog format can be statement, row or mixed. This embodiment may use any of these formats or a combination of them, and the application does not limit the choice.
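The sketch below makes the log side concrete with a hypothetical row-style record layout for the filter's binlog. Each record holds a one-byte operation code, a two-byte length and the UTF-8 nickname. The layout and constant names are assumptions for illustration. The source does not specify a record format, and this is not MySQL's binlog format.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

OP_ADD, OP_DELETE = 1, 2
_HEADER = struct.Struct(">BH")  # op code (1 byte), payload length (2 bytes)


def append_record(log: BinaryIO, op: int, nickname: str) -> None:
    payload = nickname.encode("utf-8")  # at most 65535 bytes with this header
    log.write(_HEADER.pack(op, len(payload)) + payload)


def read_records(log: BinaryIO) -> Iterator[Tuple[int, str]]:
    while True:
        header = log.read(_HEADER.size)
        if len(header) < _HEADER.size:
            return  # end of log, or a torn header left by a crash
        op, length = _HEADER.unpack(header)
        payload = log.read(length)
        if len(payload) < length:
            return  # torn trailing record: ignore it
        yield op, payload.decode("utf-8")


buf = io.BytesIO()
append_record(buf, OP_ADD, "alice")
append_record(buf, OP_DELETE, "alice")
buf.seek(0)
print(list(read_records(buf)))  # [(1, 'alice'), (2, 'alice')]
```

The reader stops at an incomplete trailing record rather than raising. Replay after a crash therefore sees only records that were fully written.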
In one embodiment, once the group for the string to be deleted is found, the number of deleted strings is recorded in a designated area of the specified memory. For example, a reserved field "delete_count" in the header of the specified memory described above can count how many user nicknames have been deleted.
In S406, when the number of data deletion requests exceeds a threshold, the specified memory is rebuilt through the second filter. For example, the operations of the first filter are recorded in a binary log. The binary log is then used to synchronize data between the first filter and the second filter, which rebuilds the specified memory. The second filter is also a Bloom filter.
In one embodiment, " delete_count " enables the second filter for judging whether, can be set " delete_count " enables the second filter when being greater than certain threshold values.When enabling the second filter, the second filter Storage is rebuild by binlog, the storage of reconstruction will eliminate deleted element, after the second filter is rebuild, It will do it and switch between the second filter and first filter, after switching, " delete_count " resets to 0.
In the data management method of the present disclosure, the first filter serves users' requests to register and delete nicknames in real time. A standard Bloom filter cannot clear bits, so deleted nicknames keep their bits set and the error probability rises as deletions accumulate. To prevent this, the first and second Bloom filters synchronize their data whenever deletion requests exceed the threshold, which gives the Bloom filter a deletion capability. As a result, the data management system keeps user name lookups accurate even after many user names have been deleted.
In one embodiment, the memory space of the specified memory can be determined from the number of candidate strings and the error probability. A Bloom filter uses several hash functions to map the set into a bit array, and the number of hash functions involves a trade-off. With more hash functions, a query for an element not in the set checks more bits, so it is more likely to hit a 0 and be rejected. However, each insertion also sets more bits, so fewer 0s remain in the array. With fewer hash functions, more 0s remain, but each query checks fewer bits. The optimal number of hash functions can be calculated, for example, from the error-rate formula.
Initially, a Bloom filter is an array of m bits, all set to 0. To represent a set S = {x1, x2, ..., xn} of n elements, the Bloom filter uses k independent hash functions, each of which maps every element of the set into the range {1, ..., m}. For any element x, the position h_i(x) given by the i-th hash function is set to 1 (1 ≤ i ≤ k). If a position is set to 1 more than once, only the first time has any effect; later settings change nothing.
Assume kn < m and that every hash function is perfectly random. After the k hash functions have mapped all elements of S = {x1, x2, ..., xn} into the m-bit array, the probability that a given bit is still 0 is approximately e^(-kn/m). The error probability (false positive rate) of the Bloom filter is then:
$$F = \left(1 - e^{-kn/m}\right)^{k}$$
The size of the specified memory's storage space can be derived from the error probability. For example, if the allowed error probability is 0.01:
$$\left(1 - e^{-kn/m}\right)^{k} \le 0.01$$
From this inequality the required array size m can be solved, which determines the size of the specified memory's storage space.
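The standard closed form for the optimal number of hash functions makes that inequality easy to solve. For a fixed ratio m/n, F is minimized at k = (m/n) ln 2, which makes e^{-kn/m} = 1/2. Setting F = p then gives the required size. These are textbook Bloom filter results, not formulas stated in the source:

$$k_{\mathrm{opt}} = \frac{m}{n}\ln 2 \quad\Longrightarrow\quad F = \left(1 - e^{-k_{\mathrm{opt}} n/m}\right)^{k_{\mathrm{opt}}} = 2^{-k_{\mathrm{opt}}}$$

$$F \le p \quad\Longrightarrow\quad m \ge -\frac{n \ln p}{(\ln 2)^2}, \qquad k \approx -\log_2 p$$

For p = 0.01 this gives about 9.6 bits per element and k ≈ 6.6, which rounds to 7 hash functions.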
In the prior art, software cannot directly access a device's physical address from either user space or kernel space. To access a device's physical address in a kernel driver, the driver must first map that physical address onto a kernel virtual address (in the dynamic memory mapping region). When the driver later accesses this kernel virtual address, it is accessing the device's physical address indirectly. In this application, therefore, the Bloom filter is built on mmap shared memory, so it operates directly on memory while it runs, which ensures high performance. In addition, mmap can persist the data to a disk file, which ensures that the data in the Bloom filter is not lost when the user nickname system loses power.
In an exemplary embodiment of the present disclosure, the first filter and the second filter are processing nodes in a distributed processing system, and that system is built on the ZooKeeper framework. ZooKeeper is an open-source coordination service for distributed applications, that is, software that provides consistency services to them. ZooKeeper manages the cluster: it monitors the state of each node and decides the appropriate next step from the feedback that nodes submit. The result is an easy-to-use, efficient and stable system for users.
In one embodiment, the first filter and the second filter can reside on the same server, and a server can serve as one processing node in a larger system. Several servers can form a distributed processing system, with each server acting as one node, and the system is built on ZooKeeper. At any moment, one server in the user nickname system works as the primary, and the other servers wait on standby as backups. The primary handles users' user name registration and deletion requests in real time. When the user nickname system fails, ZooKeeper switches seamlessly from the primary to a standby, which keeps the system highly available.
In one embodiment, when the distributed processing system is initialized, the primary server starts from the estimated size n of the candidate nickname set and the required error rate. From these it calculates the required memory space and the number of hash functions k, and then creates the specified shared memory of that size with mmap.
In one embodiment, once the first filter is initialized on the primary node server, ZooKeeper stores that server's Internet Protocol (IP) address. The client used by the user obtains the IP address of the current primary node server from ZooKeeper. When the primary node server fails, ZooKeeper switches to a standby node server, so the system remains highly available.
In the data management method of the present disclosure, the data management system is a distributed system made up of multiple devices. These devices are divided into a primary, which provides real-time service, and standbys. ZooKeeper performs seamless switching between primary and standby, which ensures high availability.
The present disclosure describes how to make and use particular examples, but its principles are not limited to any details of these examples. Rather, the teachings of the present disclosure allow these principles to be applied to many other embodiments.
In a practical scenario, suppose a website has 100 million registered users and therefore more than 100 million user nicknames. Suppose also that the average nickname is 12 bytes long and that storing one nickname takes 16 bytes once other overhead is included. A user nickname system that maintains 100 million user names then needs more than 1.6 GB of storage. With the data management method of the present disclosure, the same system needs only 280 MB of memory at an error rate below one in ten thousand. Moreover, the space this scheme needs does not grow as the average nickname length increases, whereas the space needed by the other approaches described in the background grows with it.
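A few lines of code can check the arithmetic behind this comparison. The sketch assumes the textbook optimal sizing m = -n ln p / (ln 2)^2, because the source does not say how it derived its 280 MB figure. It also uses the 16 bytes per stored nickname stated above. The function names are illustrative.

```python
import math


def plain_storage_bytes(n: int, bytes_per_nickname: int = 16) -> int:
    # Storing every nickname outright (16 bytes each including overhead).
    return n * bytes_per_nickname


def bloom_storage_bytes(n: int, p: float) -> int:
    # Optimal bit count for n elements at false-positive rate p, in bytes.
    bits = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    return (bits + 7) // 8


def bloom_hash_count(p: float) -> int:
    # Optimal k is about -log2(p); round to a whole number of hash functions.
    return max(1, round(-math.log2(p)))


n = 100_000_000
print(plain_storage_bytes(n) / 1e9)        # 1.6 (GB)
print(bloom_storage_bytes(n, 1e-4) / 1e6)  # roughly 239.6 (MB)
print(bloom_hash_count(1e-4))              # 13
```

At the optimum, the estimate of about 240 MB is the same order of magnitude as the 280 MB in the text. The exact figure depends on how k is rounded and on any header space. Neither number depends on nickname length.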
In addition, a Bloom filter does not itself support deletion. The twin-filter design of the first and second filters lets the user nickname system of this application support deletion, and it reduces the error rate while still saving space.
Fig. 5 is a flowchart of a data management method according to another exemplary embodiment. Steps S502 to S512 in Fig. 5 further describe S406 in Fig. 4, "when the number of data deletion requests exceeds the threshold, rebuild the specified memory through the second filter".
In S502, the binary log is used to determine which flag bit groups in the specified memory are recorded as deleted.
In S504, the second filter synchronizes data with the first filter.
In S506, the second filter reads the flag bit groups in the first filter.
In S508, the binary log is used to determine whether each group has been marked as deleted.
In S510, a group that is not marked as deleted is copied.
In S512, a group that is marked as deleted is not copied.
In this embodiment, deletions in the first filter are recorded in the binlog, and the corresponding flag bit groups are recorded as deleted. This serves two purposes:
1. Leader-follower (master-slave) replication
The first filter and the second filter in this application can act as master and slave filters. Without loss of generality, the first filter can, for example, be set as the master filter. While working, the master filter writes to the binlog every operation that may change the stored state. The second filter, acting as the slave, replays the operations in the binlog in order so that its state matches the master's. This synchronizes data between the first filter and the second filter.
2. Data recovery
During data recovery, the first filter and the second filter can replay the binlog to rebuild the state they had at the time of the crash. Suppose both filters crash some time after a backup point. After they restart, the binlog entries recorded since the last backup point are replayed on top of that backup. This restores the data state for the period from the last backup point up to the crash.
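Both uses come down to replaying the log in order. The sketch below assumes a hypothetical in-memory list as the binlog and a bit-array state like the earlier sketches. It shows two cases. A follower reaches the leader's state by replaying the log from the start, and crash recovery replays only the entries after a backup point. Names such as `FilterState`, `backup` and `recover`, and the parameters `M` and `K`, are illustrative.

```python
import hashlib
from dataclasses import dataclass, field

M, K = 1 << 12, 5  # hypothetical filter size (bits) and hash count


def flag_positions(s: str) -> list[int]:
    # Assumption: double hashing over SHA-256.
    d = hashlib.sha256(s.encode("utf-8")).digest()
    h1 = int.from_bytes(d[:8], "big")
    h2 = int.from_bytes(d[8:16], "big") | 1
    return [(h1 + i * h2) % M for i in range(K)]


@dataclass
class FilterState:
    bits: bytearray = field(default_factory=lambda: bytearray(M // 8))
    delete_count: int = 0

    def apply(self, op: str, s: str) -> None:
        """Redo one logged operation (only successful operations are logged)."""
        if op == "add":
            for p in flag_positions(s):
                self.bits[p >> 3] |= 1 << (p & 7)
        elif op == "delete":
            self.delete_count += 1


Snapshot = tuple[FilterState, int]  # state copy + binlog offset it reflects


def backup(state: FilterState, binlog: list) -> Snapshot:
    return FilterState(bytearray(state.bits), state.delete_count), len(binlog)


def recover(snapshot: Snapshot, binlog: list) -> FilterState:
    saved, offset = snapshot
    state = FilterState(bytearray(saved.bits), saved.delete_count)
    for op, s in binlog[offset:]:  # replay everything after the backup point
        state.apply(op, s)
    return state
```

A follower is simply a `FilterState()` that calls `apply` on every log entry in order. Recovery does the same from the snapshot's offset, so both paths end in the same state as the leader.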
Those skilled in the art will appreciate that all or some of the steps of the above embodiments can be implemented as a computer program executed by a CPU. When the CPU executes the computer program, it performs the functions defined by the methods of the present disclosure described above. The program can be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc.
Note also that the above drawings are only schematic illustrations of the processing included in the methods of the exemplary embodiments of the present disclosure, not limitations. The processing shown in the drawings does not indicate or restrict the chronological order of the steps. The steps may also be executed, for example, synchronously or asynchronously in multiple modules.
The following are apparatus embodiments of the present disclosure, which can be used to carry out the method embodiments of the present disclosure. For details not disclosed in the apparatus embodiments, refer to the method embodiments.
Fig. 6 is a block diagram of a data management apparatus according to an exemplary embodiment. As shown in Fig. 6, the data management apparatus 60 includes a data deletion module 602, a memory processing module 604 and a data reconstruction module 606.
The data deletion module 602 receives the user's data deletion request through the first filter. The request contains a character string. In one embodiment, the first filter receives a data deletion request from the user, for example a request to delete a user nickname, and the nickname to be deleted is passed to the first filter as a character string.
The memory processing module 604 processes the data deletion request through the specified memory. In one embodiment, this includes two steps. First, determine the flag bit group in the specified memory that corresponds to the character string. Second, when every flag in the group has the first value, record the group as deleted in a binary log.
The data reconstruction module 606 rebuilds the specified memory through the second filter when the number of data deletion requests exceeds a threshold. For example, the operations of the first filter are recorded in a binary log. The binary log is then used to synchronize data between the first filter and the second filter, which rebuilds the specified memory. The second filter is a Bloom filter.
In one embodiment, a reserved space in the specified memory is used to decide whether to enable the second filter: the second filter is enabled when "delete_count" in the reserved space exceeds a threshold. Once enabled, the second filter rebuilds its storage from the binlog, and the rebuilt storage excludes deleted elements. After the second filter is rebuilt, the system switches from the first filter to the second filter and resets "delete_count" to 0.
In the data management apparatus of the present disclosure, the first filter serves users' requests to register and delete nicknames in real time during actual processing. To keep the system's error probability from rising, the first and second Bloom filters synchronize their data whenever deletion requests exceed the threshold. This gives the Bloom filter a deletion capability, so the data management system keeps user name lookups accurate even after many user names have been deleted.
Fig. 7 is a block diagram of a data management apparatus according to another exemplary embodiment. Data management apparatus 70 contains the data deletion module 602, memory processing module 604 and data reconstruction module 606 of data management apparatus 60. It also includes a data query module 702, a result return module 704 and a memory construction module 706.
The data query module 702 receives the user's data query request through the first filter. The request contains a character string. The first filter is a Bloom filter. A Bloom filter is essentially a very long binary vector plus a series of random mapping functions. It is a randomized data structure that represents a set very compactly with a bit array and can determine whether an element belongs to that set. A Bloom filter can therefore be used to test whether an element is in a set.
The result return module 704 processes the data query request through the specified memory and returns the result. In one embodiment, the first value is 1, and k hash functions map the character string to k points in a bit array. The first filter checks whether all of those points are 1, which tells it (approximately) whether the nickname is in the system: if they are all 1, the string being tested is very likely present. This embodiment ignores the error of the first filter. When all of the string's points are 1, the string is treated as already present in the user name system, and the result returned to the user may be, for example, "user name already exists". The user may also be offered the option of choosing another user name and applying again.
In one embodiment, the first value is 1, and k hash functions map the character string to k points in a bit array. The first filter checks whether the flag bits in the string's group are all 1 to learn whether the string is in the set. If any of them is 0, the string is definitely not present. In that case the string is treated as absent from the user name system, and the result returned to the user may be, for example, "user name does not exist".
The memory construction module 706 builds the specified memory with virtual memory mapping. In one embodiment, the memory space of the specified memory can be determined from the number of candidate strings and the error probability. A Bloom filter uses several hash functions to map the set into a bit array. With more hash functions, a query for an element not in the set checks more bits and is more likely to hit a 0, but fewer 0s remain in the array. With fewer hash functions, more 0s remain, but each query checks fewer bits. The optimal number of hash functions can be calculated, for example, from the error-rate formula.
In the data management apparatus of the present disclosure, duplicate nicknames are detected with a Bloom filter. The Bloom filter does not store the nicknames themselves, so each nickname needs only a small, fixed amount of storage regardless of its length. This saves a large amount of the storage otherwise needed to hold user nicknames.
In the prior art, software cannot directly access a device's physical address from either user space or kernel space. To access a device's physical address in a kernel driver, the driver must first map that physical address onto a kernel virtual address (in the dynamic memory mapping region). When the driver later accesses this kernel virtual address, it is accessing the device's physical address indirectly. In this application, therefore, the Bloom filter is built on mmap shared memory, so it operates directly on memory while it runs, which ensures high performance. In addition, mmap can persist the data to a disk file, which ensures that the data in the Bloom filter is not lost when the user nickname system loses power.
Fig. 8 is a block diagram of a data management system according to an exemplary embodiment. The data management system 80 includes a master processing node 802 and multiple slave processing nodes 804. Each processing node in turn includes a first filter 8062, a second filter 8064, a log module 808 and a shared memory 810.
The data management system 80 is a distributed system and can be built, for example, on the ZooKeeper framework. ZooKeeper is an open-source coordination service for distributed applications, that is, software that provides consistency services to them. ZooKeeper manages the cluster: it monitors the state of each node and decides the appropriate next step from the feedback that nodes submit. The result is an easy-to-use, efficient and stable system for users. In the ZooKeeper framework, the master node process tracks the state of the slave nodes and the validity of tasks, and it assigns tasks to the slave nodes.
A ZooKeeper server can play one of three roles: leader, follower or observer.
The leader node is the primary node of the whole ZooKeeper cluster and handles every request that changes ZooKeeper state. It orders and numbers every state update request, which guarantees first-in, first-out (FIFO) message processing within the cluster.
Besides answering read requests on its own server, a follower processes the leader's proposals and commits each one locally when the leader commits it. The leader and the followers form the quorum of the ZooKeeper cluster. Only they take part in electing a new leader and respond to the leader's proposals.
If the ZooKeeper cluster has a very high read load, or many clients connect across data centers, observer nodes can be added to increase read throughput. Observers resemble followers, with a few small differences. First, an observer is not part of the quorum, so it neither votes in elections nor responds to proposals. Second, an observer does not need to persist transactions to disk. Once restarted, it must resynchronize the entire namespace from the leader.
In the data management system 80, the master processing node 802 receives users' data query and deletion requests. As one processing node in the distributed system 80, it coordinates task flow with the other slave processing nodes 804 through ZooKeeper. The master processing node 802 may be, for example, a follower node in the ZooKeeper cluster described above.
The slave processing nodes 804 act as standbys for the master processing node 802. When the master processing node 802 fails, its tasks are switched to a slave processing node 804. A slave processing node 804 may be, for example, an observer node in the ZooKeeper cluster described above.
The master processing node 802 and each of the slave processing nodes 804 include a first filter 8062, a second filter 8064, a log module 808 and a shared memory 810.
The first filter 8062 handles users' data query and deletion requests.
The second filter 8064 synchronizes data with the first filter 8062 when the number of deletions exceeds the threshold.
The log module 808 records the operations performed in the first filter 8062.
The shared memory 810 is built with virtual memory mapping and stores the flag bit group data that the first filter searches.
The following uses the example of Fig. 8 to illustrate how the data management system 80 of this application processes data:
At initialization, the master processing node 802 starts from the estimated size n of the candidate nickname set and the required error rate. From these it calculates the required storage of m bits and the number of hash functions k. It then maps shared memory of that size with mmap and initializes the first filter 8062. ZooKeeper stores the IP address of the master processing node 802, and the client used by the user obtains the current master's IP address from ZooKeeper. When the master processing node 802 fails, ZooKeeper switches from it to a slave processing node 804, so the system remains highly available.
In normal operation, while the master processing node 802 is serving, the client used by the user may request to add a nickname. The first filter 8062 checks whether all k flag bits for the nickname are already set, that is, whether each has the value 1. If so, it tells the client that the nickname is already taken. Otherwise, the first filter 8062 records the operation in the binlog, sets these k flag bits and tells the client that the nickname was not in use.
In normal operation, while the master processing node 802 is serving, the client used by the user may request to delete a nickname. The first filter 8062 checks whether all k flag bits for the nickname are set, that is, whether each has the value 1. If so, the first filter 8062 increments delete_count by 1, records the operation in the binlog and tells the client that the nickname was deleted successfully. Otherwise, it tells the client that the nickname was never used and the deletion failed. The second filter 8064 is enabled when delete_count exceeds a threshold d_num.
The first filter 8062 and the second filter 8064 are synchronized through the binlog. If the master processing node 802 crashes or hits a network problem or another fault, ZooKeeper promotes one of the slave processing nodes 804 to master. At the same time, other processing nodes that are not currently working can be started as slaves. The client used by the user obtains the new master's IP address from ZooKeeper.
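The failover path can be modeled with a toy registry that plays the coordination role. This is not ZooKeeper's API. In a real ZooKeeper deployment, the primary's address would typically be held in an ephemeral znode that disappears when its session ends, and clients would watch that znode. The class, its methods and the IP addresses below are hypothetical, and promoting the first live standby is an arbitrary policy chosen for the sketch.

```python
class ToyRegistry:
    """In-process stand-in for the coordination service (not ZooKeeper's API)."""

    def __init__(self, node_ips: list[str]) -> None:
        self.alive = {ip: True for ip in node_ips}
        self.primary = node_ips[0]          # first node starts as master
        self.standbys = list(node_ips[1:])  # the rest wait as slaves

    def primary_ip(self) -> str:
        # Clients look this up before sending add/delete requests.
        return self.primary

    def report_failure(self, ip: str) -> None:
        self.alive[ip] = False
        if ip != self.primary:
            return  # a failed standby just drops out of the candidate list
        candidates = [n for n in self.standbys if self.alive[n]]
        if not candidates:
            raise RuntimeError("no live standby to promote")
        self.primary = candidates[0]        # promote a slave to master
        self.standbys.remove(self.primary)


reg = ToyRegistry(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
reg.report_failure("10.0.0.1")
print(reg.primary_ip())  # 10.0.0.2
```

Clients never cache the master's address for long. They ask the registry again after a failure, which is the behavior the paragraph above describes for the ZooKeeper lookup.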
Fig. 9 is a block diagram of an electronic device according to an exemplary embodiment.
An electronic device 900 according to this embodiment of the present disclosure is described below with reference to Fig. 9. The electronic device 900 shown in Fig. 9 is only an example and should not limit the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 9, the electronic device 900 takes the form of a general-purpose computing device. Its components can include, but are not limited to: at least one processing unit 910, at least one storage unit 920, a bus 930 connecting the system components (including the storage unit 920 and the processing unit 910), a display unit 940 and so on.
The storage unit stores program code that the processing unit 910 can execute. This causes the processing unit 910 to perform the steps of the various exemplary embodiments of the present disclosure described in the method section of this specification. For example, the processing unit 910 can perform the steps shown in Fig. 3, Fig. 4 and Fig. 5.
The storage unit 920 can include readable media in the form of volatile storage, such as random access memory (RAM) 9201 and/or cache memory 9202. It can also include read-only memory (ROM) 9203.
The storage unit 920 can also include a program/utility 9204 having a set of (at least one) program modules 9205. These program modules 9205 include, but are not limited to, an operating system, one or more application programs, other program modules and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 930 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit or local bus using any of a variety of bus structures.
The electronic device 900 may also communicate with one or more external devices 1000 (such as a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 900, and/or with any device (such as a router or a modem) that enables the electronic device 900 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 950. The electronic device 900 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 960. The network adapter 960 may communicate with the other modules of the electronic device 900 via the bus 930. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Accordingly, the technical solution according to the embodiments of the disclosure may be embodied in the form of a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and includes instructions that cause a computing device (such as a personal computer, a server, a mobile terminal, or a network device) to perform the above method according to the embodiments of the disclosure.
In another aspect, the disclosure further provides a computer-readable medium, which may be included in the electronic device described in the above embodiments or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the method described in the embodiments. For example, the electronic device may implement the steps described above.
Those skilled in the art will understand that the modules described above may be distributed in the device as described in the embodiments, or may, with corresponding changes, reside in one or more devices different from those of this embodiment. The modules of the above embodiments may be merged into one module or further split into multiple sub-modules.
Exemplary embodiments of the disclosure have been particularly shown and described above. It should be understood that the disclosure is not limited to the detailed structures, arrangements, or implementation methods described herein; rather, the disclosure is intended to cover the various modifications and equivalent arrangements falling within the spirit and scope of the appended claims.
In addition, the structures, proportions, sizes, and the like shown in the drawings of this specification serve only to accompany the content disclosed in the specification, for the understanding and reading of those skilled in the art. They are not intended to limit the conditions under which the disclosure can be implemented and therefore have no substantive technical significance. Any structural modification, change of proportional relationship, or adjustment of size that does not affect the technical effects and objectives achievable by the disclosure shall still fall within the scope covered by the technical content disclosed herein. Likewise, terms such as "upper", "first", "second", and "a/one" used in this specification are only for clarity of description and are not intended to limit the implementable scope of the disclosure; changes or adjustments of their relative relationships, without substantive change to the technical content, shall also be regarded as within the implementable scope of the disclosure.

Claims (14)

1. A data management method, characterized by comprising:
receiving a data removal request from a user through a first filter, the data removal request including a character string;
processing the data removal request through a specified memory and a log; and
when the number of data removal requests is greater than a threshold, rebuilding the specified memory through a second filter;
wherein the first filter and the second filter are Bloom filters.
2. The method according to claim 1, further comprising:
receiving a data query request from a user through the first filter, the data query request including a character string; and
processing the data query request through the specified memory and returning a processing result.
3. The method according to claim 1, further comprising:
constructing the specified memory through virtual memory mapping.
4. The method according to claim 3, wherein constructing the specified memory through virtual memory mapping comprises:
determining the storage space of the specified memory from the number of candidate character strings and an error probability.
5. The method according to claim 1, wherein processing the data removal request through the specified memory and the log comprises:
determining the flag bit group corresponding to the character string in the specified memory; and
when every flag in the flag bit group is a first value, recording the flag bit group in the log as deleted, the log comprising a binary log.
6. The method according to claim 1, wherein rebuilding the specified memory through the second filter comprises:
recording the operations of the first filter in the log; and
synchronizing data between the first filter and the second filter through the log to rebuild the specified memory.
7. The method according to claim 6, wherein synchronizing data between the first filter and the second filter through the log to rebuild the specified memory comprises:
determining, from the log, the flag bit groups recorded as deleted in the specified memory; and
when the second filter synchronizes data with the first filter, synchronizing only the flag bit groups that have not been deleted.
8. The method according to claim 2, wherein processing the data query request through the specified memory and returning a processing result comprises:
processing the data query request through the flag bit groups in the specified memory.
9. The method according to claim 8, wherein processing the data query request through the flag bit groups in the specified memory comprises:
determining the flag bit group corresponding to the character string in the specified memory; and
when every flag in the flag bit group is the first value, returning a result that the character string exists.
10. The method according to claim 9, further comprising:
when not every flag in the flag bit group is the first value, setting every flag in the flag bit group to the first value; and
returning a result that the character string does not exist.
11. A data management apparatus, characterized by comprising:
a data removal module, configured to receive a data removal request from a user through a first filter, the data removal request including a character string;
a memory processing module, configured to process the data removal request through a specified memory and a log; and
a data reconstruction module, configured to rebuild the specified memory through a second filter when the number of data removal requests is greater than a threshold;
wherein the first filter and the second filter are Bloom filters.
12. A data management system, characterized in that the system is a distributed system comprising a plurality of processing nodes, each processing node comprising a first filter, a second filter, and a specified memory; wherein the first filter is configured to receive a data query request from a user, the data query request including a character string; the specified memory is configured to process a data removal request; and the second filter is configured to rebuild the specified memory when the number of data removal requests is greater than a threshold; wherein the first filter and the second filter are Bloom filters.
13. An electronic device, characterized by comprising:
one or more processors; and
a storage device, configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 10.
14. A computer-readable medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 10.
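Claims 3, 4 and 8 to 10 describe a specified memory built through virtual memory mapping and sized from the number of candidate strings and an error probability. They also describe a query that reports a string as present only if every flag in its group is set, and otherwise sets those flags and reports the string as absent. The Python sketch below illustrates that behaviour under assumptions the claims do not state: the standard textbook Bloom-filter sizing m = ceil(-n ln p / (ln 2)^2) and k = round((m / n) ln 2), an anonymous mmap region as the virtually mapped memory, and SHA-256 double hashing to derive each string's flag bit group. The names size_for and MappedBloomFilter are hypothetical.

```python
import hashlib
import math
import mmap


def size_for(n: int, p: float) -> tuple[int, int]:
    """Textbook sizing: m bits and k hashes for n strings at false-positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


class MappedBloomFilter:
    """Bloom filter whose bit array lives in an anonymous memory mapping (assumption)."""

    def __init__(self, n: int, p: float):
        self.m, self.k = size_for(n, p)
        # mmap.mmap(-1, length) maps anonymous memory; the OS zero-fills it.
        self.mem = mmap.mmap(-1, (self.m + 7) // 8)

    def flag_group(self, s: str) -> list[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of SHA-256.
        d = hashlib.sha256(s.encode("utf-8")).digest()
        h1 = int.from_bytes(d[:8], "little")
        h2 = int.from_bytes(d[8:16], "little") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def _is_set(self, pos: int) -> bool:
        return bool(self.mem[pos >> 3] & (1 << (pos & 7)))

    def _set(self, pos: int) -> None:
        self.mem[pos >> 3] |= 1 << (pos & 7)

    def query(self, s: str) -> bool:
        """Report presence; if absent, set the string's flags and report absent."""
        group = self.flag_group(s)
        if all(self._is_set(p) for p in group):
            return True  # every flag is 1: the string (probably) exists
        for p in group:
            self._set(p)  # record the string so later queries see it
        return False


bf = MappedBloomFilter(n=1000, p=0.01)
first = bf.query("alice")   # False: absent, and its flags are set as a side effect
second = bf.query("alice")  # True: every flag in its group is now set
```

For n = 1000 and p = 0.01 these formulas give m = 9586 bits (1199 bytes of mapped memory) and k = 7. Because a query for an absent string also records it, a repeated query for the same string reports it as present.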
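Claims 1 and 5 to 7 describe deletions that are only recorded in a binary log until their number exceeds a threshold. After that, a second filter is rebuilt by synchronizing only the flag bit groups not marked as deleted. The following is a minimal sketch of that idea, not the disclosed implementation. It assumes an in-memory list in place of the binlog, a hypothetical LoggedFilter class, illustrative sizes M and K, and SHA-256 double hashing as an assumed way to derive flag bit groups.

```python
import hashlib

M, K = 1024, 4  # hypothetical filter size (bits) and number of hash functions


def flag_group(s: str) -> tuple:
    """K bit positions for s via double hashing over SHA-256 (an assumed scheme)."""
    d = hashlib.sha256(s.encode("utf-8")).digest()
    h1 = int.from_bytes(d[:8], "little")
    h2 = int.from_bytes(d[8:16], "little") | 1
    return tuple((h1 + i * h2) % M for i in range(K))


class LoggedFilter:
    """First filter plus a log; rebuilt as a second filter after too many deletions."""

    def __init__(self, threshold: int):
        self.bits = bytearray(M)  # one byte per flag, for readability
        self.binlog = []          # in-memory stand-in for the binary log: (op, group)
        self.threshold = threshold
        self.deletions = 0

    def add(self, s: str) -> None:
        g = flag_group(s)
        for p in g:
            self.bits[p] = 1
        self.binlog.append(("add", g))

    def contains(self, s: str) -> bool:
        return all(self.bits[p] for p in flag_group(s))

    def delete(self, s: str) -> bool:
        g = flag_group(s)
        if not all(self.bits[p] for p in g):
            return False                   # not present: nothing to record
        self.binlog.append(("delete", g))  # only logged; the bits stay set for now
        self.deletions += 1
        if self.deletions > self.threshold:
            self._rebuild()
        return True

    def _rebuild(self) -> None:
        # Replay the log to find the flag bit groups that are still live.
        live = set()
        for op, g in self.binlog:
            if op == "add":
                live.add(g)
            else:
                live.discard(g)
        # Second filter: synchronize only the non-deleted groups, then swap it in.
        second = bytearray(M)
        for g in live:
            for p in g:
                second[p] = 1
        self.bits = second
        self.binlog = [("add", g) for g in live]  # compacted log
        self.deletions = 0


f = LoggedFilter(threshold=1)
for s in ("a", "b", "c"):
    f.add(s)
f.delete("a")                    # logged only: 1 deletion, not above the threshold
still_visible = f.contains("a")  # deleted strings still test present until the rebuild
f.delete("b")                    # 2 deletions > threshold: triggers the rebuild
```

Because the rebuild replays the whole groups of strings that are still live instead of clearing individual bits, a bit shared by a deleted string and a live one is set again from the live string's group, so the live string still tests as present.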

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810283415.2A CN110362590A (en) 2018-04-02 2018-04-02 Data managing method, device, system, electronic equipment and computer-readable medium

Publications (1)

Publication Number Publication Date
CN110362590A 2019-10-22

Family

ID=68213385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810283415.2A Pending CN110362590A (en) 2018-04-02 2018-04-02 Data managing method, device, system, electronic equipment and computer-readable medium

Country Status (1)

Country Link
CN (1) CN110362590A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination