CN114443703A - Data processing method and device, electronic equipment and storage medium - Google Patents

Publication number
CN114443703A
Authority
CN
China
Prior art keywords
data
read
state
bitmap
discarded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111534833.2A
Other languages
Chinese (zh)
Inventor
刘占山
张存义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111534833.2A
Publication of CN114443703A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a data processing method, a data processing apparatus, an electronic device, and a storage medium. The method comprises: when a preset discard event is detected, determining a to-be-discarded data fragment from at least one read-only data fragment; acquiring a read-only state bitmap associated with the to-be-discarded data fragment; and deleting the to-be-discarded data fragment together with the read-only state bitmap associated with it. The technical solution provided by the disclosure effectively supports the elimination of data fragments and prevents data from being deleted by mistake.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of computer applications, data processing has become increasingly important. Data must be maintained (added, deleted, queried, eliminated, and so on) in a way that keeps data processing accurate. In the related art, a log-structured merge tree policy is commonly adopted, and deletion is handled in one of two ways. In the first, a deletion operation is written into the readable and writable data fragment as a special record (Doc), and the data is actually deleted in the merge stage; when the special Doc is encountered during a query, the data is determined to have been deleted. In the second, a global bitmap is added, and each bit in the bitmap marks whether a Doc has been deleted. With the first approach, the number of Docs keeps growing as more Docs are deleted, which degrades query efficiency; in addition, the special Doc is invasive to the underlying index implementation and generalizes poorly. The second approach offers high query efficiency and does not affect the underlying index implementation. However, as more Docs are added, the space required by the bitmap keeps growing, and data elimination operations are not well supported.
Disclosure of Invention
The present disclosure provides a data processing method, an apparatus, an electronic device, and a storage medium, so as to at least solve the problem of how to improve data processing accuracy in the related art. The technical scheme of the disclosure is as follows:
According to a first aspect of the embodiments of the present disclosure, there is provided a data processing method, including:
when a preset discarding event is monitored, determining to-be-discarded data fragments from at least one read-only data fragment;
acquiring a read-only state bitmap associated with the to-be-discarded data fragment;
and deleting the data fragments to be discarded and the read-only state bitmap associated with the data fragments to be discarded.
In one possible implementation, the method further includes:
acquiring to-be-discarded identifier mapping information and bitmap index record information associated with the to-be-discarded data fragment, wherein the to-be-discarded identifier mapping information represents the correspondence between the data index information and the bitmap index information of each unit of to-be-discarded data in the to-be-discarded data fragment;
deleting the to-be-discarded identifier mapping information;
and, in the bitmap index record information, marking the bitmap index information contained in the to-be-discarded identifier mapping information as unused.
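The optional steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the `Fragment` class, the `used_localids` set standing in for the bitmap index record information, and the function `discard_fragment` are all hypothetical names, and plain dicts stand in for real bitmaps.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """Hypothetical read-only data fragment with its own bitmap and id mapping."""
    docs: dict = field(default_factory=dict)          # docid -> unit data
    state_bitmap: dict = field(default_factory=dict)  # localid -> 1 (initial) / 0 (deleted)
    id_mapping: dict = field(default_factory=dict)    # docid -> localid


def discard_fragment(fragments, name, used_localids):
    """Drop a whole fragment with its bitmap and mapping, then release its localids."""
    fragment = fragments.pop(name)  # data, state bitmap and mapping go away together
    released = set(fragment.id_mapping.values())
    used_localids.difference_update(released)  # mark these bitmap indexes as unused
    return released


fragments = {
    "ro1": Fragment({"doc0": "a", "doc1": "b"}, {0: 1, 1: 0}, {"doc0": 0, "doc1": 1}),
    "ro2": Fragment({"doc2": "c"}, {2: 1}, {"doc2": 2}),
}
used_localids = {0, 1, 2}  # assumed stand-in for the bitmap index record information
released = discard_fragment(fragments, "ro1", used_localids)
```

Under these assumptions, no per-record state needs to be touched: the released bitmap indexes simply become available for later allocation.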
In one possible implementation, the method further includes:
responding to a data deletion request, wherein the data deletion request comprises first data index information of unit data to be deleted;
and in the readable and writable state bitmap, marking the state corresponding to the first data index information as a deleted state.
In one possible implementation, the method further includes:
responding to a data query request, wherein the data query request comprises second data index information of unit data to be queried;
acquiring the read-only state bitmap associated with each read-only data fragment and the readable and writable state bitmap associated with the readable and writable data fragment;
querying, in the readable and writable state bitmap and the read-only state bitmaps, target state information corresponding to the second data index information;
in a case where the target state information is the initial state, acquiring query data matching the second data index information;
and sending the query data to a terminal corresponding to the data query request.
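A minimal sketch of the deletion and query steps above, assuming each state bitmap can be modeled as a Python dict from data index information to a state flag. The names `mark_deleted` and `query`, the flag values, and the rule that a record is visible only when no bitmap marks it as deleted are illustrative assumptions rather than details stated in the disclosure.

```python
INITIAL, DELETED = 1, 0  # assumed flag encoding


def mark_deleted(rw_bitmap, docid):
    """Deletion only flips a flag in the readable and writable state bitmap."""
    rw_bitmap[docid] = DELETED


def query(docid, fragments, bitmaps):
    """Return the unit data for docid, or None if it is deleted or unknown."""
    for bitmap in bitmaps:
        # Assumption: a docid a bitmap does not track counts as not deleted there.
        if bitmap.get(docid, INITIAL) == DELETED:
            return None
    for fragment in fragments:
        if docid in fragment:
            return fragment[docid]
    return None


ro_fragment = {"d1": "x", "d2": "y"}  # read-only data fragment
rw_fragment = {"d3": "z"}             # readable and writable data fragment
ro_bitmap = {"d1": INITIAL, "d2": INITIAL}
rw_bitmap = {"d1": INITIAL, "d2": INITIAL, "d3": INITIAL}

mark_deleted(rw_bitmap, "d1")
fragments = [ro_fragment, rw_fragment]
bitmaps = [ro_bitmap, rw_bitmap]
```

In this sketch the deleted record stays physically present in `ro_fragment`; only the state flag hides it from queries.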
In one possible implementation, the at least one read-only data slice is a plurality of read-only data slices; the method further comprises the following steps:
monitoring a first number of unit data in a deleted state in the readable and writable state bitmap and the read-only state bitmap, and a second number of unit data in the readable and writable data slice and the at least one read-only data slice;
determining at least two read-only data fragments from the plurality of read-only data fragments under the condition that the ratio of the first number to the second number is larger than a preset threshold;
merging the unit data in the non-deletion state in the at least two read-only data fragments to obtain merged data fragments;
replacing the at least two read-only data fragments with the merged data fragment.
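The merge condition and the merge itself can be sketched as below. This is a hedged illustration: the set `deleted` stands in for the deleted marks spread across the state bitmaps, the threshold value is arbitrary, and `deleted_ratio` and `merge_read_only` are hypothetical helper names. How the state bitmaps are rebuilt after a merge is outside this sketch.

```python
def deleted_ratio(fragments, deleted):
    """First number (deleted unit data) divided by second number (all unit data)."""
    total = sum(len(docs) for docs in fragments.values())
    return len(deleted) / total if total else 0.0


def merge_read_only(fragments, names, deleted, merged_name):
    """Merge the non-deleted unit data of the named fragments and replace them."""
    merged = {}
    for name in names:
        for docid, data in fragments.pop(name).items():
            if docid not in deleted:
                merged[docid] = data
    fragments[merged_name] = merged
    return merged


fragments = {
    "ro1": {"d1": "a", "d2": "b"},
    "ro2": {"d3": "c", "d4": "d"},
    "rw": {"d5": "e"},
}
deleted = {"d1", "d3", "d4"}
THRESHOLD = 0.5  # assumed preset threshold

ratio = deleted_ratio(fragments, deleted)
merged = None
if ratio > THRESHOLD:
    merged = merge_read_only(fragments, ["ro1", "ro2"], deleted, "merged")
```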
In one possible implementation manner, the method further includes:
responding to a data adding request, and dividing data to be added in the data adding request into at least one unit data to be added;
configuring third data index information for the at least one unit data to be added respectively;
adding the at least one unit data to be added and the third data index information into the readable and writable data fragment;
and adding the third data index information to a readable and writable state bitmap associated with the readable and writable data fragment, and marking a state corresponding to the third data index information as an initial state in the readable and writable state bitmap.
In a possible implementation manner, after configuring the third data index information for the at least one unit data to be added, the method further includes:
configuring target bitmap index information for the unit data to be added;
storing the third data index information and the target bitmap index information in association in the readable and writable identifier mapping information associated with the readable and writable data fragment;
the adding the third data index information to a readable and writable state bitmap, and marking a state corresponding to the third data index information in the readable and writable state bitmap as an initial state includes:
adding the target bitmap index information to the readable and writable state bitmap, and marking the state corresponding to the target bitmap index information in the readable and writable state bitmap as an initial state.
According to a second aspect of the embodiments of the present disclosure, there is provided a data processing apparatus including:
a monitoring module configured to determine, when a preset discard event is detected, a to-be-discarded data fragment from at least one read-only data fragment;
a first obtaining module configured to perform obtaining a read-only state bitmap associated with the to-be-discarded data slice;
and the data discarding module is configured to delete the to-be-discarded data fragment and the read-only state bitmap associated with the to-be-discarded data fragment.
In one possible implementation, the apparatus further includes:
a second obtaining module, configured to perform obtaining to-be-discarded identifier mapping information and bitmap index recording information associated with the to-be-discarded data slice; the to-be-discarded identification mapping information represents the corresponding relation between the data index information and the bitmap index information of each to-be-discarded data in the to-be-discarded data fragment;
a deletion module configured to perform deletion of the to-be-discarded identification mapping information;
and the multiplexing module is configured to modify the bitmap index information in the to-be-discarded identification mapping information into an unused state in the bitmap index recording information.
In one possible implementation, the apparatus further includes:
a deletion request response module configured to respond to a data deletion request, wherein the data deletion request comprises first data index information of unit data to be deleted;
and a data deletion module configured to mark, in the readable and writable state bitmap, the state corresponding to the first data index information as a deleted state.
In one possible implementation, the apparatus further includes:
the query request response module is configured to respond to a data query request, wherein the data query request comprises second data index information of unit data to be queried;
a third obtaining module configured to perform obtaining a read-only state bitmap associated with each of the at least one read-only data slice and a readable-writable state bitmap associated with a readable-writable data slice;
the query module is configured to execute query of target state information corresponding to the second data index information in the readable and writable state bitmap and the read-only state bitmap;
the query data acquisition module is configured to acquire the query data matched with the second data index information under the condition that the target state information is the initial state;
and the sending module is configured to send the query data to a terminal corresponding to the data query request.
In one possible implementation, the at least one read-only data slice is a plurality of read-only data slices; the device further comprises:
a monitoring module configured to perform monitoring a first amount of unit data in a deleted state in the readable and writable state bitmap and the read-only state bitmap, and a second amount of unit data in the readable and writable data slice and the at least one read-only data slice;
a merged segment determining module configured to determine at least two read-only data segments from the plurality of read-only data segments if a ratio of the first number to the second number is greater than a preset threshold;
the merging processing module is configured to perform merging processing on unit data in a non-deleted state in the at least two read-only data fragments to obtain merged data fragments;
a replacement module configured to perform a replacement of the at least two read-only data fragments with the merged data fragment.
In one possible implementation, the apparatus further includes:
an addition request response module configured to, in response to a data addition request, divide the data to be added in the data addition request into at least one unit data to be added;
the data index configuration module is configured to configure third data index information for the at least one unit data to be added respectively;
a first adding module configured to perform adding the at least one unit data to be added and the third data index information to the readable and writable data slice;
and the second adding module is configured to add the third data index information to the readable and writable state bitmap associated with the readable and writable data fragment, and mark a state corresponding to the third data index information in the readable and writable state bitmap as an initial state.
In one possible implementation, the apparatus further includes:
the bitmap index configuration module is configured to configure target bitmap index information for the unit data to be added;
the storage module is configured to perform associated storage of the third data index information and target bitmap index information into the readable and writable identifier mapping information associated with the readable and writable data fragment;
the second adding module comprises:
an adding unit configured to add the target bitmap index information to the readable and writable state bitmap, and mark the state corresponding to the target bitmap index information in the readable and writable state bitmap as an initial state.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the method of any of the first aspects above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of the first aspects of the embodiments of the present disclosure.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by a processor, cause a computer to perform the method of any one of the first aspects of the embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
An associated state bitmap is set for each data fragment. When a read-only data fragment is to be eliminated, the elimination can be carried out by directly deleting the to-be-discarded data fragment and the read-only state bitmap associated with it, that is, by deleting the data fragment as a whole. There is no need to mark a large amount of unit data as deleted in a state bitmap, so the elimination of data fragments is effectively supported at low resource cost. Because no state marks in the state bitmaps need to be changed, data is prevented from being deleted by mistake, and the accuracy of data processing is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a schematic diagram illustrating an application environment in accordance with an exemplary embodiment.
Fig. 2 is a flow chart illustrating a method of data addition according to an example embodiment.
Fig. 3 is a schematic diagram illustrating data fragmentation and data fragmentation operations in accordance with an example embodiment.
FIG. 4 is a flow chart illustrating a method of data addition according to an example embodiment.
Fig. 5 is a flow diagram illustrating a method of data discarding in accordance with an exemplary embodiment.
FIG. 6 is a flow diagram illustrating another method of data discarding in accordance with an exemplary embodiment.
FIG. 7 is a flow diagram illustrating a method for data deletion in accordance with an exemplary embodiment.
FIG. 8 is a flowchart illustrating a method of data querying, according to an example embodiment.
FIG. 9 is a flow diagram illustrating a method of data consolidation in accordance with an exemplary embodiment.
FIG. 10 is a block diagram illustrating a data processing apparatus according to an example embodiment.
FIG. 11 is a block diagram illustrating an electronic device for data processing in accordance with an illustrative embodiment.
FIG. 12 is a block diagram illustrating an electronic device for data processing in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment according to an exemplary embodiment, which may include a server 01 and a terminal 02, as shown in fig. 1.
In an alternative embodiment, server 01 may be used for data processing. Specifically, the server 01 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), a big data and artificial intelligence platform, and the like.
In an alternative embodiment, the terminal 02 may be used by a background manager to issue read, write, and delete requests for data stored in the server. Specifically, the terminal 02 may include, but is not limited to, a smart phone, a desktop computer, a tablet computer, a notebook computer, a smart speaker, a digital assistant, an Augmented Reality (AR)/Virtual Reality (VR) device, a smart wearable device, and other types of electronic devices. Optionally, the operating system running on the electronic device may include, but is not limited to, Android, iOS, Linux, Windows, and the like.
In addition, it should be noted that fig. 1 shows only one application environment of the data processing method provided by the present disclosure.
In the embodiment of the present specification, the server 01 and the terminal 02 may be directly or indirectly connected through a wired or wireless communication method, and the present application is not limited herein.
It should be noted that the following figures show a possible sequence of steps, and in fact do not limit the order that must be followed. Some steps may be performed in parallel without being dependent on each other. User information (including but not limited to user device information, user personal information, user behavior information, etc.) and data (including but not limited to data for presentation, training, etc.) to which the present disclosure relates are both information and data that are authorized by the user or sufficiently authorized by various parties.
Fig. 2 is a flow chart illustrating a method of data addition according to an example embodiment. As shown in fig. 2, the following steps may be included.
In step S201, in response to a data addition request, data to be added in the data addition request is divided into at least one unit data to be added.
In practical application, background management personnel need to update and maintain data stored in the database, such as data addition, deletion, elimination, combination and the like, so as to improve the read-write efficiency and reduce the storage pressure.
In this embodiment, a background manager may add data to the database to provide richer data for a user to use. Based on this, a data addition request may be triggered to the server, and accordingly, a data addition request may be received, which may include data to be added. Further, the data to be added can be divided into at least one unit data to be added, so as to facilitate the storage of the data to be added. Wherein the unit data may be a record in the database, and one record may be represented by docid.
In step S203, third data index information is configured for each of the at least one unit data to be added.
In the embodiment of the present specification, data index information, such as docid1, may be configured for each unit data to be added. The data index information may be used to uniquely identify a unit of data in the database, so that each unit of data in the database can be identified and indexed. As an example, the data index information may be allocated to each unit data to be added by the identifier allocator shown in fig. 3.
In step S205, the at least one unit data to be added and the configured third data index information are added to the readable and writable data slice.
In step S207, the third data index information is added to the readable and writable state bitmap associated with the readable and writable data slice, and a state corresponding to the third data index information is marked as an initial state in the readable and writable state bitmap.
In this embodiment of the present specification, based on the data organization policy of a log-structured merge tree, data may be stored by setting up data slices in the database. The data slices may include one readable and writable data slice and at least one read-only data slice, as shown in FIG. 3. It should be noted that time T0 shown in fig. 3 may be the creation time of read-only data slice 1, and that before T1 there may be only one data slice, a readable and writable one, namely the slice that later becomes read-only data slice 1.
As shown in fig. 3, there may be one readable and writable data slice at the current time, which corresponds to the current time period; the at least one read-only data slice may correspond to at least one historical time period, respectively. In time order, the historical time periods may include T0-T1, T1-T2, and T2-T3. The data stored in each data slice may be as follows:
T0-T1 historical time period, read-only data slice 1, data added within T0-T1: docidN0-docidN1;
T1-T2 historical time period, read-only data slice 2, data added within T1-T2: docidN2-docidN3;
T2-T3 historical time period, read-only data slice 3, data added within T2-T3: docidN4-docidN5;
T3-current time, readable and writable data slice 4, data added within T3-current time: docidN6-docidN7.
Here docidN0-docidN7 may refer to ranges of data index information, and docidN7 may be the data index information of the most recently stored unit data in readable and writable data slice 4. In practical application, the at least one unit data to be added and the configured third data index information may be added to the current readable and writable data slice. That is, the unit data to be added and the configured third data index information may be stored in association. Note that the third data index information is the index information of the unit data to be added.
Accordingly, each data slice may have an associated state bitmap (bitmap), and each state bitmap covers the same period as its associated data slice, that is, the two correspond to the same time period. The same period here may mean the same period of writable operation. The state bitmap associated with a data slice is a bitmap that records the state of data during the time period corresponding to that data slice. As shown in fig. 3, the state bitmap associated with read-only data slice 1 may be bitmap1, the state bitmap associated with read-only data slice 2 may be bitmap2, and the state bitmap associated with read-only data slice 3 may be bitmap3; the state bitmap associated with the readable and writable data slice from T3 to the current time may be a readable and writable state bitmap (which may be denoted as bitmap4). It should be noted that bitmap1, bitmap2, and bitmap3 are the read-only state bitmaps referred to below. The state information recorded by each state bitmap may be as follows:
read-only data slice 1: bitmap1 records the docids deleted during T0-T1 within the range docidN0-docidN1;
read-only data slice 2: bitmap2 records the docids deleted during T1-T2 within the range docidN0-docidN3;
read-only data slice 3: bitmap3 records the docids deleted during T2-T3 within the range docidN0-docidN5;
readable and writable data slice 4: bitmap4 records the docids deleted during T3-current time among all docids (docidN0-docidN7).
Taking the above-mentioned read-only data fragments 1-3 and readable/writable data fragment 4 as an example, at least one unit data to be added may be added to the current readable/writable data fragment 4, and third data index information of each of the at least one unit data to be added is added to the readable/writable state bitmap4, and a state corresponding to these third data index information is marked as an initial state (a default state, i.e., a non-deleted state) in the readable/writable state bitmap. The initial state may be represented by "1", and thus the deletion state may be represented by "0", which is not limited by the present disclosure.
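Steps S201 to S207 can be sketched as follows. This is a minimal illustration, assuming a dict-based readable and writable data fragment and state bitmap; the `Store` class, its attribute names, and the docid format are hypothetical, and the flag values follow the "1"/"0" example given above.

```python
import itertools

INITIAL, DELETED = 1, 0  # "1" for the initial state, "0" for the deleted state


class Store:
    """Hypothetical store holding one readable and writable data fragment."""

    def __init__(self):
        self._next_docid = itertools.count()  # assumed monotonically increasing ids
        self.rw_fragment = {}  # docid -> unit data
        self.rw_bitmap = {}    # docid -> state flag

    def add(self, records):
        """Treat each record as one unit of data and register it as not deleted."""
        docids = []
        for record in records:
            docid = f"docid{next(self._next_docid)}"  # S203: configure index info
            self.rw_fragment[docid] = record          # S205: store data with its index
            self.rw_bitmap[docid] = INITIAL           # S207: mark the initial state
            docids.append(docid)
        return docids


store = Store()
ids = store.add(["first record", "second record"])
```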
Through data adding processing and maintaining the state of data in the state bitmap associated with the readable and writable data fragment, the operation accuracy of data adding can be ensured.
FIG. 4 is a flow chart illustrating a method of data addition according to an example embodiment. In one possible implementation, as shown in fig. 4, the method may further include:
In step S401, target bitmap index information is configured for the unit data to be added.
In step S403, the third data index information and the target bitmap index information are stored in association in the readable and writable identifier mapping information associated with the readable and writable data slice.
Accordingly, step S207 may include:
In step S405, the target bitmap index information is added to the readable and writable state bitmap, and the state corresponding to the target bitmap index information is marked as the initial state in the readable and writable state bitmap.
As shown in fig. 3, an identifier allocator may be provided. The identifier allocator may be used to configure data index information (docid) and bitmap index information (localid) for the unit data to be added; it may also maintain the identifier mapping information associated with each data slice, as well as the bitmap index record information. The identifier mapping information associated with each data slice may take the form of a table that records the mapping between docid and localid within that data slice. The bitmap index record information may record the bitmap index information that is currently configured. As an example, the bitmap index information may be configured by incrementing continuously from 0: the bitmap index information of the first unit data added may be localid0, that of the second unit data added may be localid1, and so on.
In this embodiment of the present description, in order to keep the data index information independent of the bitmap index information, and the data slice operations independent of the state bitmap operations, the docid may be used in the data slice and the localid may be used in the state bitmap. On this basis, target bitmap index information may be configured for the unit data to be added, and the third data index information and the target bitmap index information may be stored in association in the readable and writable identifier mapping information associated with the readable and writable data slice, so that the mapping between the two kinds of index identifiers is maintained. The target bitmap index information is then added to the readable and writable state bitmap, and the state corresponding to the target bitmap index information is marked as the initial state in the readable and writable state bitmap.
Optionally, target bitmap index information may be added to the bitmap index record information to implement maintenance of used (configured) bitmap index information.
By using different index information in the data fragment and the state bitmap, state operations in the state bitmap do not affect the data index information, so that data misoperation is avoided and the accuracy of data processing is improved.
Fig. 5 is a flow diagram illustrating a method of data discarding in accordance with an exemplary embodiment. In a possible implementation manner, as shown in fig. 5, the following steps of discarding a read-only data fragment may also be included:
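The division of labour between docid and localid described above can be illustrated with a minimal sketch. The class name `IdAllocator`, the fragment name `"rw"`, and the use of a plain dict in place of a real bitmap are assumptions made for illustration and are not taken from the specification.

```python
class IdAllocator:
    """Hypothetical identifier assignor: hands out localids and keeps
    one docid -> localid table per data fragment."""

    def __init__(self):
        self.next_localid = 0   # bitmap index record: next never-used localid
        self.mappings = {}      # fragment name -> {docid: localid}

    def assign(self, fragment, docid):
        # Localids increase continuously from 0, as in the example above.
        localid = self.next_localid
        self.next_localid += 1
        self.mappings.setdefault(fragment, {})[docid] = localid
        return localid


allocator = IdAllocator()
state_bitmap = {}  # localid -> "initial" | "deleted"; a dict stands in for a bitmap

for docid in ("doc-a", "doc-b"):
    localid = allocator.assign("rw", docid)
    state_bitmap[localid] = "initial"   # the bitmap only ever sees localids
```

In this sketch the data fragment would be keyed by docid only, while the state bitmap is keyed by localid only, so a state change never touches the data index.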
in step S501, a preset discarding event is monitored, and a to-be-discarded data fragment is determined from at least one read-only data fragment;
in step S503, a read-only status bitmap associated with the data slice to be discarded is obtained;
in step S505, the to-be-discarded data slice and the read-only status bitmap associated with the to-be-discarded data slice are deleted.
In the prior art, bitmaps are global, that is, all data fragments (read-only data fragments and read-write data fragments) only use one bitmap to maintain the state (deletion state or initial state) of all unit data, which cannot effectively support the discarding operation (elimination operation) of a certain read-only data fragment. In order to solve this problem, each data slice in the embodiments of the present specification is provided with an independent state bitmap associated with each data slice, which can effectively support slice discarding.
The preset discard event may include at least one of the following: the time length from the creation time of a read-only data fragment to the current time is greater than a preset time length threshold; or the total number of read-only data fragments and readable and writable data fragments is greater than a number threshold. Correspondingly, when the time length from the creation time of a read-only data fragment to the current time is greater than the preset time length threshold, that read-only data fragment may be determined as the data fragment to be discarded. When the total number of fragments is greater than the number threshold, the read-only data fragment whose creation time is furthest from the current time (that is, the first fragment when the fragments are sorted by time length from high to low) may be determined as the data fragment to be discarded; for example, as shown in fig. 3, read-only data fragment 1 may be determined as the data fragment to be discarded. Further, the read-only state bitmap associated with the to-be-discarded data fragment may be obtained. The to-be-discarded data fragment and its associated read-only state bitmap can then be deleted, so that the to-be-discarded data fragment is eliminated.
Because an associated state bitmap is set for each data fragment, a read-only data fragment can be eliminated by directly deleting the to-be-discarded data fragment together with its associated read-only state bitmap. The whole fragment is removed in one operation, so there is no need to mark a large amount of unit data as deleted in a state bitmap; fragment elimination is therefore effectively supported with low resource consumption. In addition, because the remaining state bitmaps do not need to be changed, mistaken deletion of data can be avoided and the accuracy of data processing is improved.
The above describes data elimination in the case where the data fragment and the state bitmap use the same index information (such as docid). In the case where the data fragment and the state bitmap use different index information, the steps in fig. 6 may further be executed: the to-be-discarded identifier mapping information is deleted, and the bitmap index information contained in it is modified to an unused state in the bitmap index record information. In this way, bitmap indexes can be reused, and unbounded growth of the bitmap index information in the state bitmap can be avoided. See the following description for details.
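Steps S501 to S505 can be sketched as follows. The `Shard` type, the function names, the threshold parameters, and the choice to check the age condition before the count condition are all assumptions for illustration; the specification only states that at least one of the two discard events may apply.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    created_at: float                          # creation time, in seconds
    docs: dict = field(default_factory=dict)
    deleted: set = field(default_factory=set)  # per-fragment state bitmap, as a set


def pick_shards_to_discard(read_only, total_shards, now, max_age, max_shards):
    """S501: return the read-only fragments selected by a preset discard event."""
    expired = [s for s in read_only if now - s.created_at > max_age]
    if expired:
        return expired
    if total_shards > max_shards and read_only:
        return [min(read_only, key=lambda s: s.created_at)]  # oldest fragment
    return []


def discard(read_only, victims):
    """S503/S505: drop each victim together with its own state bitmap.
    No per-document deletion marks are written anywhere."""
    names = {s.name for s in victims}
    return [s for s in read_only if s.name not in names]


read_only = [Shard("ro-1", 0.0), Shard("ro-2", 50.0)]
victims = pick_shards_to_discard(read_only, total_shards=3, now=100.0,
                                 max_age=80.0, max_shards=5)
remaining = discard(read_only, victims)
```

Because the bitmap lives inside each `Shard` object in this sketch, removing the fragment removes its bitmap with it, which is the property a single global bitmap cannot offer.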
FIG. 6 is a flow diagram illustrating another method of data discarding in accordance with an exemplary embodiment. In a possible implementation manner, after step S505, as shown in fig. 6, the method may further include:
in step S601, to-be-discarded identifier mapping information and bitmap index recording information associated with the to-be-discarded data slice are obtained; the to-be-discarded identification mapping information represents the corresponding relation between the data index information and the bitmap index information of each to-be-discarded data in the to-be-discarded data fragment;
in step S603, deleting identifier mapping information to be discarded;
in step S605, in the bitmap index record information, the bitmap index information in the to-be-discarded identification map information is modified to an unused state.
In this embodiment of the present specification, the to-be-discarded identifier mapping information associated with the to-be-discarded data fragment and the bitmap index record information may be obtained from the identifier assignor. The to-be-discarded identifier mapping information can then be deleted, and the bitmap index information contained in it can be modified to an unused state in the bitmap index record information, so that this bitmap index information can be reused.
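A minimal sketch of steps S601 to S605 follows, assuming the bitmap index record is kept as a counter plus a free list. The class `BitmapIndexRecord`, its methods, and the fragment names are hypothetical; the specification does not say how the unused state is represented.

```python
class BitmapIndexRecord:
    """Hypothetical bitmap index record that tracks used and reusable localids."""

    def __init__(self):
        self.next_localid = 0
        self.free = []          # localids released by discarded fragments

    def acquire(self):
        # Prefer a released localid so the index range does not grow forever.
        if self.free:
            return self.free.pop()
        localid = self.next_localid
        self.next_localid += 1
        return localid

    def release(self, localids):
        self.free.extend(localids)


record = BitmapIndexRecord()
mappings = {"ro-1": {}, "rw": {}}   # fragment -> {docid: localid}
for docid in ("a", "b"):
    mappings["ro-1"][docid] = record.acquire()
mappings["rw"]["c"] = record.acquire()

# S601-S605: fetch the discarded fragment's mapping, delete it,
# and mark its localids as unused in the bitmap index record.
discarded = mappings.pop("ro-1")
record.release(discarded.values())

reused = record.acquire()   # comes from the free list, not from the counter
```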
FIG. 7 is a flow diagram illustrating a method for data deletion in accordance with an exemplary embodiment. In one possible implementation, as shown in fig. 7, the method may include:
in step S701, a data deletion request is received, where the data deletion request includes first data index information of unit data to be deleted;
in step S703, in the readable/writable state bitmap, the state corresponding to the first data index information is marked as a deleted state.
In this embodiment of the present specification, the data corresponding to the first data index information may be data to be deleted. In response to the data deletion request, the state corresponding to the first data index information may be marked as deleted in the readable and writable state bitmap. Therefore, when data is queried or merged, the state can be used for determining that the data corresponding to the first data index information is deleted.
Alternatively, in the case where the status bitmap uses bitmap index information, S703 may be replaced with: determining first bitmap index information corresponding to first data index information from existing identification mapping information; therefore, the state corresponding to the first bitmap index information can be marked as a deleted state in the readable and writable state bitmap.
When data is deleted, the state corresponding to the first data index information is marked as a deleted state in the readable and writable state bitmap. Compared with the prior art, in which a special doc is added to the readable and writable fragment to represent the deleted state, the deletion operation does not increase the number of docs, so that an impact on data query efficiency can be avoided.
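The two deletion variants (state bitmap keyed by docid, or keyed by localid through the identifier mapping information) can be sketched with one helper. The function name `mark_deleted` and the dict-based bitmap are assumptions for illustration.

```python
def mark_deleted(rw_bitmap, docid, id_mapping=None):
    """Mark one unit data as deleted in the readable and writable state bitmap.

    id_mapping is the optional docid -> localid table; when it is given,
    the bitmap is keyed by localid instead of docid.
    """
    key = docid if id_mapping is None else id_mapping[docid]
    rw_bitmap[key] = "deleted"   # no extra document is written to the fragment
    return key


# Case 1 (S703): bitmap keyed directly by docid.
bitmap_by_docid = {"doc-a": "initial", "doc-b": "initial"}
mark_deleted(bitmap_by_docid, "doc-a")

# Case 2: bitmap keyed by localid, resolved through the mapping first.
bitmap_by_localid = {0: "initial", 1: "initial"}
mark_deleted(bitmap_by_localid, "doc-b", id_mapping={"doc-a": 0, "doc-b": 1})
```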
FIG. 8 is a flowchart illustrating a method of data querying, according to an example embodiment. In one possible implementation, as shown in fig. 8, the method may include:
in step S801, responding to a data query request, where the data query request includes second data index information of unit data to be queried;
in step S803, a read-only state bitmap associated with each read-only data slice and a readable-writable state bitmap associated with each readable-writable data slice are obtained;
in step S805, target state information corresponding to the second data index information is queried in the readable and writable state bitmap and the read-only state bitmap.
In practical application, because the state bitmap associated with each data fragment only records the deletion state in the corresponding time period of each data fragment, the state of data needs to be queried in all data fragments. Based on this, in response to the data query request, at least one read-only data fragment and a read-only state bitmap associated with the at least one read-only data fragment can be acquired; and target state information corresponding to the second data index information can be inquired in the readable and writable state bitmap and the read-only state bitmap.
Alternatively, in the case where the status bitmap uses bitmap index information, S805 may be replaced with: determining second bitmap index information corresponding to second data index information from the existing identification mapping information; therefore, target state information corresponding to the second bitmap index information is inquired in the readable and writable state bitmap and the read-only state bitmap.
In step S807, in the case where the target state information is in the non-deletion state, query data matched with the second data index information is acquired;
in step S809, the query data is sent to the terminal corresponding to the data query request.
In this embodiment of the present specification, when the target state information is a non-deleted state, the data to be queried in the data query request is valid, and the query data matched with the second data index information may be acquired.
Alternatively, in the case that the target state information is a deleted state, information indicating that the data has been deleted may be sent directly to the terminal, to inform the administrator on the terminal side that the queried data has been deleted.
Under the condition that each data fragment is provided with a corresponding state bitmap, the state accuracy of the query data can be ensured by determining the state of the query data in each state bitmap in response to a data query request, so that the processing accuracy of the query data can be ensured.
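The query flow of steps S801 to S809 can be sketched as below, assuming each data fragment is represented as a pair of a document dict and a set of deleted identifiers. The function name `query` and the sample data are hypothetical.

```python
def query(docid, shards):
    """shards: list of (docs, deleted_ids) pairs, one per data fragment.

    Returns the stored value, or None if the document is deleted or unknown.
    """
    # S803/S805: the deleted state may be recorded in any fragment's bitmap,
    # so every state bitmap is consulted.
    if any(docid in deleted for _, deleted in shards):
        return None
    # S807: non-deleted, so fetch the matching data from whichever fragment holds it.
    for docs, _ in shards:
        if docid in docs:
            return docs[docid]
    return None


read_only_1 = ({"doc-a": "alpha", "doc-b": "beta"}, set())
read_write = ({"doc-c": "gamma"}, {"doc-a"})   # doc-a deleted after ro-1 became read-only
shards = [read_only_1, read_write]
```

Here `doc-a` still physically exists in the read-only fragment, but the deletion mark held in the readable and writable state bitmap hides it from the result.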
FIG. 9 is a flowchart illustrating a method of data merging, according to an example embodiment. In one possible implementation, as shown in fig. 9, the method may include:
in step S901, a first quantity of unit data in the deleted state in the readable and writable state bitmap and the read-only state bitmap, and a second quantity of unit data in the readable and writable data fragment and the at least one read-only data fragment, are monitored.
In this embodiment, in order to reduce storage pressure, the total amount of data may be monitored so that data fragments can be merged. For example, a first quantity of unit data in the deleted state in the readable and writable state bitmap and the read-only state bitmap, and a second quantity of unit data in the readable and writable data fragment and the at least one read-only data fragment (i.e., the total quantity of unit data in all data fragments), may be monitored.
In step S903, in a case that a ratio of the first number to the second number is greater than a preset threshold, at least two read-only data fragments are determined from the plurality of read-only data fragments.
In this embodiment of the present specification, in a case that the ratio of the first quantity to the second quantity is greater than a preset threshold, at least two read-only data fragments are determined from the plurality of read-only data fragments. For example, the time length from the creation time of each read-only data fragment to the current time may be determined, and the time lengths may be sorted in descending order, so that the read-only data fragments in the first and second positions of the sorting may be taken as the at least two read-only data fragments.
In step S905, merging unit data in a non-deleted state in at least two read-only data fragments to obtain merged data fragments;
in step S907, at least two read-only data slices are replaced with merged data slices.
In this embodiment of the present description, unit data in a non-deleted state in at least two read-only data fragments may be merged to obtain merged data fragments; and at least two read-only data slices may be replaced with merged data slices. Optionally, an associated state bitmap may also be constructed for the merged data slice, such as "bitmap-merge" shown in fig. 3.
By merging the read-only data fragments, the storage space can be effectively saved, and the data retrieval efficiency can be improved; in addition, the combined data fragment is provided with an independent state bitmap by constructing a related state bitmap for the combined data fragment, so that the combined data fragment is similar to the original data fragment, and the deleting, inquiring, discarding and combining operations can be performed.
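Steps S901 to S907 can be sketched as follows. The tuple layouts, the function names, and the decision to give the merged fragment the older creation time are assumptions for illustration only.

```python
def should_merge(shards, threshold):
    """shards: list of (docs, deleted_ids) pairs covering every data fragment."""
    deleted = sum(len(d) for _, d in shards)       # first quantity
    total = sum(len(docs) for docs, _ in shards)   # second quantity
    return total > 0 and deleted / total > threshold


def merge_two_oldest(read_only, all_deleted):
    """read_only: list of (created_at, docs); returns the new read-only list."""
    ordered = sorted(read_only, key=lambda s: s[0])   # oldest first
    (t1, docs1), (t2, docs2) = ordered[:2]
    # S905: keep only the unit data that is not in the deleted state.
    merged_docs = {k: v for docs in (docs1, docs2)
                   for k, v in docs.items() if k not in all_deleted}
    merged = (min(t1, t2), merged_docs)
    # S907: the merged fragment replaces the two source fragments.
    return [merged] + ordered[2:]


ro = [(1.0, {"a": 1, "b": 2}), (2.0, {"c": 3}), (3.0, {"d": 4})]
deleted_ids = {"b"}
result = merge_two_oldest(ro, deleted_ids)
```

A fuller version would also build a fresh state bitmap for the merged fragment, corresponding to "bitmap-merge" in fig. 3.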
FIG. 10 is a block diagram illustrating a data processing apparatus according to an example embodiment. Referring to fig. 10, the apparatus may include:
a monitoring module 1001 configured to perform monitoring of a preset discarding event, and determine a to-be-discarded data fragment from the read-only data fragment;
a first obtaining module 1003 configured to perform obtaining a read-only status bitmap associated with the to-be-discarded data slice;
a data discarding module 1005 configured to execute deleting the to-be-discarded data slice and the read-only status bitmap associated with the to-be-discarded data slice.
In a possible implementation manner, the apparatus may further include:
a second obtaining module, configured to perform obtaining to-be-discarded identifier mapping information and bitmap index recording information associated with the to-be-discarded data slice; the to-be-discarded identification mapping information represents the corresponding relation between the data index information and the bitmap index information of each to-be-discarded data in the to-be-discarded data fragment;
a deletion module configured to perform deletion of the to-be-discarded identification mapping information;
and the multiplexing module is configured to modify the bitmap index information in the to-be-discarded identification mapping information into an unused state in the bitmap index recording information.
In a possible implementation manner, the apparatus may further include:
a deletion request response module configured to respond to a data deletion request, wherein the data deletion request comprises first data index information of unit data to be deleted;
and a data deleting module configured to mark the state corresponding to the first data index information as a deleted state in the readable and writable state bitmap.
In a possible implementation manner, the apparatus may further include:
the query request response module is configured to respond to a data query request, wherein the data query request comprises second data index information of unit data to be queried;
a third obtaining module configured to perform obtaining a read-only state bitmap associated with each of the at least one read-only data slice and a readable-writable state bitmap associated with a readable-writable data slice;
the query module is configured to execute query of target state information corresponding to the second data index information in the readable and writable state bitmap and the read-only state bitmap;
the query data acquisition module is configured to acquire the query data matched with the second data index information under the condition that the target state information is the initial state;
and the sending module is configured to send the query data to a terminal corresponding to the data query request.
In one possible implementation, the at least one read-only data slice is a plurality of read-only data slices; the above apparatus may further include:
a monitoring module configured to perform monitoring a first amount of unit data in a deleted state in the readable and writable state bitmap and the read-only state bitmap, and a second amount of unit data in the readable and writable data slice and the at least one read-only data slice;
a merged segment determining module configured to determine at least two read-only data segments from the plurality of read-only data segments if a ratio of the first number to the second number is greater than a preset threshold;
the merging processing module is configured to perform merging processing on unit data in a non-deleted state in the at least two read-only data fragments to obtain merged data fragments;
a replacement module configured to perform a replacement of the at least two read-only data fragments with the merged data fragment.
In a possible implementation manner, the apparatus may further include:
an adding request response module configured to respond to a data adding request and divide the data to be added in the data adding request into at least one unit data to be added;
the data index configuration module is configured to configure third data index information for the at least one unit data to be added respectively;
a first adding module configured to perform adding the at least one unit data to be added and the third data index information to the readable and writable data slice;
and the second adding module is configured to add the third data index information to the readable and writable state bitmap associated with the readable and writable data fragment, and mark a state corresponding to the third data index information in the readable and writable state bitmap as an initial state.
In a possible implementation manner, the apparatus may further include:
a bitmap index configuration module configured to configure target bitmap index information for the unit data to be added;
the storage module is configured to perform associated storage of the third data index information and target bitmap index information into the readable and writable identifier mapping information associated with the readable and writable data fragment;
accordingly, the second adding module may include:
an adding unit configured to add the target bitmap index information to the readable and writable state bitmap, and mark the state corresponding to the target bitmap index information in the readable and writable state bitmap as an initial state.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 11 is a block diagram illustrating an electronic device for data processing, which may be a terminal, according to an exemplary embodiment, and an internal structure thereof may be as shown in fig. 11. The electronic device comprises a processor, a memory, a network interface, a display screen and an input device which are connected through a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement a method of data processing. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the electronic equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 11 is merely a block diagram of some of the structures associated with the disclosed aspects and does not constitute a limitation on the electronic devices to which the disclosed aspects apply, as a particular electronic device may include more or less components than those shown, or combine certain components, or have a different arrangement of components.
Fig. 12 is a block diagram illustrating an electronic device for data processing, which may be a server, according to an example embodiment, and an internal structure thereof may be as shown in fig. 12. The electronic device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement a method of data processing.
Those skilled in the art will appreciate that the architecture shown in fig. 12 is merely a block diagram of some of the structures associated with the disclosed aspects and does not constitute a limitation on the electronic devices to which the disclosed aspects apply, as a particular electronic device may include more or less components than those shown, or combine certain components, or have a different arrangement of components.
In an exemplary embodiment, there is also provided an electronic device including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the data processing method as in the embodiments of the present disclosure.
In an exemplary embodiment, there is also provided a computer-readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a data processing method in an embodiment of the present disclosure. The computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product containing instructions is also provided, which when run on a computer, causes the computer to perform the method of data processing in the embodiments of the present disclosure.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A data processing method, comprising:
when a preset discarding event is monitored, determining to-be-discarded data fragments from at least one read-only data fragment;
acquiring a read-only state bitmap associated with the to-be-discarded data fragment;
and deleting the data fragments to be discarded and the read-only state bitmap associated with the data fragments to be discarded.
2. The data processing method of claim 1, wherein the method further comprises:
acquiring mapping information of the to-be-discarded identifier associated with the to-be-discarded data fragment and bitmap index recording information; the to-be-discarded identification mapping information represents the corresponding relation between the data index information and the bitmap index information of each to-be-discarded data in the to-be-discarded data fragment;
deleting the mapping information of the identifier to be discarded;
and in the bitmap index recording information, modifying the bitmap index information in the to-be-discarded identification mapping information into an unused state.
3. The data processing method of claim 1, wherein the method further comprises:
responding to a data deletion request, wherein the data deletion request comprises first data index information of unit data to be deleted;
and in the readable and writable state bitmap, marking the state corresponding to the first data index information as a deleted state.
4. The data processing method of claim 1, wherein the method further comprises:
responding to a data query request, wherein the data query request comprises second data index information of unit data to be queried;
acquiring a read-only state bitmap associated with each read-only data fragment and a readable and writable state bitmap associated with the readable and writable data fragment;
inquiring target state information corresponding to the second data index information in the readable and writable state bitmap and the read-only state bitmap;
acquiring query data matched with the second data index information under the condition that the target state information is in the initial state;
and sending the query data to a terminal corresponding to the data query request.
5. The data processing method of claim 4, wherein the at least one read-only data slice is a plurality of read-only data slices; the method further comprises the following steps:
monitoring a first quantity of unit data in a deleted state in the readable and writable state bitmap and the read-only state bitmap, and a second quantity of unit data in the readable and writable data fragment and the at least one read-only data fragment;
determining at least two read-only data fragments from the plurality of read-only data fragments under the condition that the ratio of the first number to the second number is larger than a preset threshold;
merging the unit data in the non-deletion state in the at least two read-only data fragments to obtain merged data fragments;
replacing the at least two read-only data fragments with the merged data fragment.
6. The data processing method of claim 1, further comprising:
responding to a data adding request, and dividing data to be added in the data adding request into at least one unit data to be added;
configuring third data index information for the at least one unit data to be added respectively;
adding the at least one unit data to be added and the third data index information into the readable and writable data fragment;
and adding the third data index information to a readable and writable state bitmap associated with the readable and writable data fragment, and marking a state corresponding to the third data index information as an initial state in the readable and writable state bitmap.
7. A data processing apparatus, comprising:
the monitoring module is configured to monitor a preset discarding event and determine data fragments to be discarded from the read-only data fragments;
a first obtaining module configured to obtain a read-only state bitmap associated with the to-be-discarded data slice;
and the data discarding module is configured to delete the to-be-discarded data fragment and the read-only state bitmap associated with the to-be-discarded data fragment.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the data processing method of any one of claims 1 to 6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the data processing method of any of claims 1 to 6.
10. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by a processor, implement the data processing method of any of claims 1 to 6.
CN202111534833.2A 2021-12-15 2021-12-15 Data processing method and device, electronic equipment and storage medium Pending CN114443703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111534833.2A CN114443703A (en) 2021-12-15 2021-12-15 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114443703A true CN114443703A (en) 2022-05-06

Family

ID=81363990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111534833.2A Pending CN114443703A (en) 2021-12-15 2021-12-15 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114443703A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8793290B1 (en) * 2010-02-24 2014-07-29 Toshiba Corporation Metadata management for pools of storage disks
CN110059090A (en) * 2019-04-19 2019-07-26 阿里巴巴集团控股有限公司 A kind of write-in/dump/merging/the querying method and device of bitmap index
CN110162672A (en) * 2019-05-10 2019-08-23 上海赜睿信息科技有限公司 Data processing method and device, electronic equipment and readable storage medium storing program for executing
CN111782134A (en) * 2019-06-14 2020-10-16 北京京东尚科信息技术有限公司 Data processing method, device, system and computer readable storage medium
CN111984651A (en) * 2020-08-21 2020-11-24 苏州浪潮智能科技有限公司 Column type storage method, device and equipment based on persistent memory
CN112069089A (en) * 2020-09-11 2020-12-11 杭州海康威视系统技术有限公司 Method and device for recycling storage blocks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孟必平; 王腾蛟; 李红燕; 杨冬青: "Sliced bitmap index: an auxiliary index mechanism suitable for cloud data management", Chinese Journal of Computers (《计算机学报》), vol. 35, no. 11, 15 November 2012 (2012-11-15), pages 2306-2316 *

Similar Documents

Publication Publication Date Title
CN108287669B (en) Data storage method, device and storage medium
CN108053863B (en) Mass medical data storage system and data storage method suitable for large and small files
US9817835B2 (en) Efficient data synchronization for storage containers
CN109558404B (en) Data storage method, device, computer equipment and storage medium
KR102031588B1 (en) Method and system for implementing index when saving file
CN112039979B (en) Distributed data cache management method, device, equipment and storage medium
CN106951375B (en) Method and device for deleting snapshot volume in storage system
US11436194B1 (en) Storage system for file system objects
CN113515487B (en) Directory query method, computing device and distributed file system
CN111177302A (en) Business document processing method and device, computer equipment and storage medium
US10846338B2 (en) Data processing device, data processing method, and non-transitory computer readable medium
CN113297278B (en) Time sequence database, data processing method, storage device and computer program product
CN109542894B (en) User data centralized storage method, device, medium and computer equipment
CN112306993A (en) Data reading method, device and equipment based on Redis and readable storage medium
CN112052218B (en) Snapshot implementation method and distributed storage cluster
CN114443703A (en) Data processing method and device, electronic equipment and storage medium
US9626378B2 (en) Method for handling requests in a storage system and a storage node for a storage system
CN112379831A (en) Data management method and device, computer equipment and storage medium
CN113515518A (en) Data storage method and device, computer equipment and storage medium
CN118069611A (en) Cloning method and device of file system
CN113905252B (en) Data storage method and device for live broadcasting room, electronic equipment and storage medium
CN113220713B (en) Data query method and device, electronic equipment and storage medium
CN116795872A (en) Data query method, device, computer equipment and storage medium
CN110472167B (en) Data management method, device and computer readable storage medium
CN114217741A (en) Storage method of storage device and storage device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination